graph LR
Wake_Word_Inference_Orchestrator["Wake Word Inference Orchestrator"]
Audio_to_Mel_Spectrogram_Converter["Audio-to-Mel Spectrogram Converter"]
Mel_Spectrogram_to_Feature_Extractor["Mel Spectrogram-to-Feature Extractor"]
Wake_Word_Model_Inference["Wake Word Model Inference"]
Wake_Word_ONNX_Models["Wake Word ONNX Models"]
Wake_Word_Inference_Orchestrator -- "orchestrates" --> Audio_to_Mel_Spectrogram_Converter
Wake_Word_Inference_Orchestrator -- "orchestrates" --> Mel_Spectrogram_to_Feature_Extractor
Wake_Word_Inference_Orchestrator -- "orchestrates" --> Wake_Word_Model_Inference
Audio_to_Mel_Spectrogram_Converter -- "produces mel spectrograms for" --> Mel_Spectrogram_to_Feature_Extractor
Mel_Spectrogram_to_Feature_Extractor -- "produces speech embeddings for" --> Wake_Word_Model_Inference
Wake_Word_Model_Inference -- "utilizes" --> Wake_Word_ONNX_Models
Wake_Word_Model_Inference -- "provides detection results to" --> Wake_Word_Inference_Orchestrator
The openWakeWord-cpp system is structured around a multi-threaded pipeline for real-time wake word detection. The Wake Word Inference Orchestrator (represented by the main function) initializes the ONNX Runtime environment and coordinates three primary processing stages: Audio-to-Mel Spectrogram Converter, Mel Spectrogram-to-Feature Extractor, and Wake Word Model Inference. The Audio-to-Mel Spectrogram Converter transforms raw audio input into mel spectrograms, which are then consumed by the Mel Spectrogram-to-Feature Extractor to generate speech embeddings. Finally, the Wake Word Model Inference component utilizes these embeddings, along with pre-trained Wake Word ONNX Models, to determine the presence of wake words. Data flows sequentially through these components, with the orchestrator managing the overall execution and synchronization.
This component is the central coordinator for the wake word detection process. It initializes the ONNX Runtime environment, loads multiple wake word ONNX models, creates and manages Ort::Session instances for each model, and orchestrates the inference calls. It receives pre-processed speech embeddings and dispatches them for actual inference execution.
Related Classes/Methods:
main
This component is responsible for converting raw audio samples into mel spectrograms, a frequency-domain representation suitable for further speech processing. It utilizes an ONNX model for this transformation.
Related Classes/Methods:
audioToMels
This component takes the generated mel spectrograms and extracts higher-level speech embeddings (features) that are used as input for the wake word detection models. It also uses an ONNX model for this process.
Related Classes/Methods:
melsToFeatures
This component performs the actual wake word detection. It receives speech embeddings, prepares them for inference, executes the wake word ONNX models, and processes their outputs to determine wake word probabilities or scores.
Related Classes/Methods:
featuresToOutput
These are passive data components that represent the trained machine learning models for specific wake words. They are stored in the ONNX format and define the computational graph that the Wake Word Model Inference component executes. They are the "knowledge base" of the wake word detection system.
Related Classes/Methods:
models/*.onnx