Keyword Spotting

Keyword Spotting vs Speech Recognition

	Single Shot	Streaming
	Only keyword spoken	Keyword within a sentence
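In the streaming case the keyword is buried in continuous speech, so one common approach is to score overlapping audio windows and trigger on a smoothed score. The sketch below is only an illustration of that idea. The per-window scores are assumed to come from some classifier, and `smooth_frames`, `threshold`, and `refractory` are hypothetical parameters, not values from these notes.

```python
from collections import deque


def streaming_triggers(frame_scores, smooth_frames=3, threshold=0.8, refractory=5):
    """Return window indices at which the smoothed keyword score fires.

    frame_scores: per-window keyword probabilities (assumed classifier output).
    smooth_frames: number of recent windows averaged together.
    refractory: windows to skip after a trigger, so one utterance fires once.
    """
    recent = deque(maxlen=smooth_frames)
    cooldown = 0
    triggers = []
    for index, score in enumerate(frame_scores):
        recent.append(score)
        smoothed = sum(recent) / len(recent)
        if cooldown > 0:
            # Still inside the refractory period of the previous trigger.
            cooldown -= 1
        elif smoothed >= threshold:
            triggers.append(index)
            cooldown = refractory
    return triggers
```

Averaging over a few windows means a single spiky window does not fire the detector, while a sustained run of high scores does.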

Aspect	Constraint	Comment
System performance	Latency	Listening animation
	Bandwidth
Preserving	Security	Safeguarding data being sent to cloud
	Privacy
Model	Accuracy	Listen continuously, but trigger only at the right time. Pick the operating point accordingly
	Personalization	Trigger only for user, not for other users or for background noise
Resource constraints	Battery
	Memory
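Picking the operating point usually means choosing a score threshold that trades false accepts against false rejects. A minimal sketch, assuming a labeled validation set of detector scores and a budget on the false-accept rate (the function name and the budget parameter are illustrative, not from these notes):

```python
def pick_operating_point(scores, labels, max_false_accept_rate):
    """Return (threshold, false_accept_rate, false_reject_rate).

    Picks the lowest threshold whose false-accept rate fits the budget,
    which also gives the lowest false-reject rate under that budget.
    scores: detector confidences; labels: 1 = keyword spoken, 0 = not.
    Returns None if no observed score meets the budget.
    """
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    # Candidate thresholds are the observed scores, tried from low to high.
    for threshold in sorted(set(scores)):
        false_accepts = sum(
            1 for s, y in zip(scores, labels) if s >= threshold and y == 0
        )
        false_rejects = sum(
            1 for s, y in zip(scores, labels) if s < threshold and y == 1
        )
        far = false_accepts / negatives
        frr = false_rejects / positives
        if far <= max_false_accept_rate:
            return threshold, far, frr
    return None
```

A tighter false-accept budget pushes the threshold up and costs more missed keywords, which is the trade-off the operating point encodes.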

A spectrogram is just an image, so keyword spotting can be treated as an image classification problem.
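To make the "image" view concrete, here is a rough sketch of turning a 1-D audio signal into a 2-D log-magnitude spectrogram with NumPy. The frame length, hop length, and Hann window are assumptions chosen for illustration. Real pipelines often use mel or MFCC features instead.

```python
import numpy as np


def log_spectrogram(signal, frame_length=256, hop_length=128):
    """Return a 2-D array (time frames x frequency bins) of log magnitudes."""
    window = np.hanning(frame_length)
    num_frames = 1 + (len(signal) - frame_length) // hop_length
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length] * window
        for i in range(num_frames)
    ])
    # One FFT per frame; rfft keeps only the non-negative frequencies.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # Small constant avoids log(0) on silent frames.
    return np.log(magnitudes + 1e-6)


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)  # 1 second of a 1 kHz tone
spec = log_spectrogram(tone)
print(spec.shape)  # (time frames, frequency bins)
```

The result is a plain 2-D array, so it can be fed to a convolutional layer the same way a grayscale image would be.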

Since we are only focused on recognizing a few keywords, we can use just one Conv2D layer followed by a single dense layer.

flowchart LR

Input --> Conv --> FC --> Softmax --> Output
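The pipeline above can be sketched as a plain NumPy forward pass. This is an untrained illustration only: the input shape, kernel size, number of kernels, number of classes, and the ReLU after the convolution are all assumptions, and in practice the same structure would be built and trained with a deep learning framework's Conv2D and dense layers.

```python
import numpy as np


def conv2d(image, kernels, bias):
    """Valid 2-D cross-correlation followed by ReLU.

    image: (H, W); kernels: (K, kh, kw); bias: (K,)
    Returns an array of shape (K, H - kh + 1, W - kw + 1).
    """
    _, kh, kw = kernels.shape
    # All (kh, kw) patches of the image; requires NumPy >= 1.20.
    patches = np.lib.stride_tricks.sliding_window_view(image, (kh, kw))
    out = np.einsum("ijkl,nkl->nij", patches, kernels) + bias[:, None, None]
    return np.maximum(out, 0.0)


def softmax(logits):
    # Subtract the max for numerical stability.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def init_params(input_shape, num_kernels, kernel_size, num_classes, seed=0):
    """Random (untrained) parameters, for shape-checking only."""
    rng = np.random.default_rng(seed)
    h, w = input_shape
    kh, kw = kernel_size
    flat = num_kernels * (h - kh + 1) * (w - kw + 1)
    return {
        "kernels": rng.normal(0.0, 0.1, (num_kernels, kh, kw)),
        "conv_bias": np.zeros(num_kernels),
        "fc_weights": rng.normal(0.0, 0.01, (flat, num_classes)),
        "fc_bias": np.zeros(num_classes),
    }


def forward(spectrogram, params):
    """Input -> Conv -> FC -> Softmax -> class probabilities."""
    features = conv2d(spectrogram, params["kernels"], params["conv_bias"])
    logits = features.reshape(-1) @ params["fc_weights"] + params["fc_bias"]
    return softmax(logits)


# Hypothetical sizes: 49 time frames x 40 frequency bins, 4 output classes.
params = init_params((49, 40), num_kernels=8, kernel_size=(10, 4), num_classes=4)
probs = forward(np.random.default_rng(1).normal(size=(49, 40)), params)
print(probs)  # one probability per class, summing to 1
```

Almost all of the parameters sit in the dense layer, so the size of the convolution output largely determines the memory footprint.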

This is to avoid false positives for groups of similar words. For example:
- No
- No good
- Notion
- Notice
- Notable

2025-12-01