Keyword Spotting¶

Keyword Spotting vs Speed Recognition¶
| Keyword Spotting | Speed Recognition | |
|---|---|---|
| Power-Usage | Low | High |
| Type | Continuous | |
| Location | On-Device | On-Device/ Online |
Types¶
| Single Shot | Streaming | |
|---|---|---|
| Only keyword spoken | Keyword within a sentence |
Challenges¶
| Aspect | Constraint | Comment | Metrics |
|---|---|---|---|
| System performance | Latency | Listening animation | |
| Bandwidth | |||
| Preserving | Security | Safeguarding data being sent to cloud | |
| Privacy | |||
| Model | Accuracy | Listen continuously, but only trigger at the right time Pick operating point accordingly | ![]() |
| Personalization | Trigger only for user, not for other users or for background noise | ||
| Resource constraints | Battery | ||
| Memory | ![]() |
Model¶
Spectrogram is just an image

TinyConv¶
Since we only we are only focused on recognizing a few keywords, we can just use One Conv2D followed by single dense layer
flowchart LR
Input --> Conv --> FC --> Softmax --> Output Limitations¶
- Limited vocabulary
- Lower accuracy
- Limited UX
Cascading¶


Multiple Inferences¶
- Average inferences across multiple time slices
This is to avoid False Positives for group of words. For eg: - No - No good - Notion - Notice - Notable

