```mermaid
graph LR
    AudioBufferProcessor["AudioBufferProcessor"]
    VADAnalyzer["VADAnalyzer"]
    TranscriptProcessor["TranscriptProcessor"]
    DTMFAggregator["DTMFAggregator"]
    AudioBufferProcessor -- "provides processed audio to" --> VADAnalyzer
    VADAnalyzer -- "provides speech activity context to" --> TranscriptProcessor
```

Details

This subsystem processes the diverse inputs derived from an audio stream, primarily speech and DTMF signals. The AudioBufferProcessor forms the foundation by managing raw audio data. The processed audio is then passed to the VADAnalyzer for voice activity detection, which supplies speech/no-speech context. The TranscriptProcessor uses this context to accurately aggregate textual transcriptions, while the DTMFAggregator independently handles DTMF tones. This separation allows parallel, specialized processing of the different audio-derived data types, keeping conversational AI interactions responsive and accurate.
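The flow above can be sketched end to end. This is an illustrative toy, not the actual library API: the class and function names (`_Buffer`, `_VAD`, `speech_frames`) are hypothetical stand-ins for the components in the diagram.

```python
import math

class _Buffer:
    """Stand-in for AudioBufferProcessor: fixed-size framing of raw samples."""
    def __init__(self, frame_size):
        self.frame_size, self._pending = frame_size, []

    def push(self, chunk):
        self._pending.extend(chunk)
        frames = []
        while len(self._pending) >= self.frame_size:
            frames.append(self._pending[:self.frame_size])
            del self._pending[:self.frame_size]
        return frames

class _VAD:
    """Stand-in for VADAnalyzer: RMS volume plus a threshold decision."""
    def __init__(self, threshold):
        self.threshold = threshold

    def analyze(self, frame):
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        return rms, rms >= self.threshold

def speech_frames(chunks, frame_size=4, threshold=0.1):
    """Run raw chunks through buffer -> VAD and keep only the frames
    flagged as speech -- the context the transcript stage consumes."""
    buf, vad, kept = _Buffer(frame_size), _VAD(threshold), []
    for chunk in chunks:
        for frame in buf.push(chunk):
            _, is_speech = vad.analyze(frame)
            if is_speech:
                kept.append(frame)
    return kept
```

DTMF frames would take a separate path past the VAD stage, matching the independent DTMFAggregator node in the diagram.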

AudioBufferProcessor

Manages the buffering, recording, and initial processing of raw audio data, serving as the entry point for audio streams into the system.

Related Classes/Methods:
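The buffering role can be sketched as follows. This is a hypothetical minimal stand-in, not the library class: it only shows the core idea of accumulating raw samples and releasing fixed-size frames.

```python
class SimpleAudioBuffer:
    """Toy stand-in for an audio buffer: accumulates raw PCM samples
    and releases fixed-size frames for downstream analysis."""

    def __init__(self, frame_size: int):
        self.frame_size = frame_size
        self._pending: list[int] = []

    def push(self, samples: list[int]) -> list[list[int]]:
        """Append samples; return any complete frames now available."""
        self._pending.extend(samples)
        frames = []
        while len(self._pending) >= self.frame_size:
            frames.append(self._pending[:self.frame_size])
            del self._pending[:self.frame_size]
        return frames
```

Partial frames stay pending until enough samples arrive, so downstream consumers always see uniformly sized frames.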

VADAnalyzer

Analyzes processed audio frames to perform voice activity detection (VAD) and compute volume, providing crucial signals for subsequent speech-related processing.

Related Classes/Methods:
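A minimal sketch of the two outputs this component produces, per-frame volume and a speech decision. This toy uses a simple RMS-energy threshold; a real VAD is typically model-based, and the names here are illustrative, not the library API.

```python
import math

class EnergyVAD:
    """Toy energy-based analyzer (illustrative only): reports per-frame
    RMS volume and a speech/no-speech decision against a fixed threshold."""

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold

    def analyze(self, frame: list[float]) -> tuple[float, bool]:
        if not frame:
            return 0.0, False
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        return rms, rms >= self.threshold
```

The boolean decision is the "speech activity context" that the transcript stage consumes; the volume value is useful for metrics and interruption handling.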

TranscriptProcessor

Aggregates and processes textual transcription frames, distinguishing between speakers and preparing aggregated text for further use. It operates on data derived from speech-to-text (STT) output.

Related Classes/Methods:
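The per-speaker aggregation can be sketched like this. This is an illustrative stand-in (hypothetical names, not the library class): STT fragments are buffered per speaker and emitted as one line when the utterance is marked final.

```python
class TranscriptAggregator:
    """Illustrative per-speaker transcript aggregator: collects STT
    fragments and emits one joined line when the utterance is final."""

    def __init__(self):
        self._partial: dict[str, list[str]] = {}

    def add(self, speaker: str, text: str, final: bool = False):
        """Buffer a fragment; return the full utterance once final."""
        self._partial.setdefault(speaker, []).append(text)
        if not final:
            return None
        return f"{speaker}: {' '.join(self._partial.pop(speaker))}"
```

Keying the partial buffers by speaker is what lets interleaved fragments from different speakers aggregate correctly.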

DTMFAggregator

Specializes in aggregating DTMF (Dual-Tone Multi-Frequency) signals: it manages their detection, handles interruptions, and provides a pathway for non-speech input.

Related Classes/Methods:
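A minimal sketch of the aggregation behavior, under assumed conventions (the `#` terminator and the flush-on-interruption rule are illustrative choices, not documented behavior of the library class):

```python
class DTMFCollector:
    """Illustrative DTMF aggregator: accumulates keypad digits and
    flushes the sequence on a '#' terminator or when an interruption
    (e.g. the caller starting to speak) forces a flush."""

    TERMINATOR = "#"

    def __init__(self):
        self._digits: list[str] = []

    def press(self, key: str):
        """Record one keypress; return the sequence on the terminator."""
        if key == self.TERMINATOR:
            return self.flush()
        self._digits.append(key)
        return None

    def flush(self):
        """Emit whatever has accumulated (also called on interruption)."""
        if not self._digits:
            return None
        sequence = "".join(self._digits)
        self._digits.clear()
        return sequence
```

Because this path never touches the VAD or transcript stages, DTMF input can be handled in parallel with speech, as the architecture overview describes.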