graph LR
Detector["Detector"]
detect["detect"]
get_probabilities["get_probabilities"]
_detect_block["_detect_block"]
cleaning_text["cleaning_text"]
_extract_ngrams["_extract_ngrams"]
_update_lang_prob["_update_lang_prob"]
NGram["NGram"]
detect -- "calls" --> get_probabilities
get_probabilities -- "delegates to" --> _detect_block
_detect_block -- "invokes" --> cleaning_text
_detect_block -- "calls" --> _extract_ngrams
_detect_block -- "updates probabilities via" --> _update_lang_prob
_extract_ngrams -- "relies on" --> NGram
The Core Detection Engine operates as a pipeline:

1. External callers initiate detection through the detect function.
2. detect delegates to get_probabilities, which manages the overall calculation of language likelihoods.
3. get_probabilities orchestrates the core analysis by repeatedly calling _detect_block for segments of the input text.
4. _detect_block performs the detailed work for each text segment. It first cleans the text with cleaning_text. It then extracts features (n-grams) with _extract_ngrams, which in turn relies on the langdetect.utils.ngram.NGram class and the add_char utility for character handling. Finally, it updates the language probabilities from the extracted n-grams with _update_lang_prob.
5. After all text blocks are processed, get_probabilities sorts the final results with _sort_probability so that the most probable languages come first.

The Detector class is the container and state manager for this entire process, maintaining context and state throughout the detection lifecycle. The flow is sequential and data-driven: text is transformed and probabilities are refined at each step.
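The call chain described above can be sketched with a toy class. This is a minimal illustration, not the library's implementation: ToyDetector, the profile format, the bigram-only features, and the smoothing constant are all assumptions made for the example. Only the method names mirror the components in the diagram.

```python
class ToyDetector:
    """Illustrative stand-in for the Detector flow; not the library's code."""

    def __init__(self, profiles):
        # profiles: {lang: {ngram: relative frequency}} (hypothetical format)
        self.profiles = profiles
        self.text = ""
        self.langprob = None  # filled in by _detect_block on first use

    def append(self, text):
        self.text += text

    def detect(self):
        # Public entry point: return the most probable language code.
        probabilities = self.get_probabilities()
        return probabilities[0][0] if probabilities else "unknown"

    def get_probabilities(self):
        # Run the analysis once, then return (lang, prob) pairs, best first.
        if self.langprob is None:
            self._detect_block()
        return sorted(self.langprob.items(), key=lambda kv: kv[1], reverse=True)

    def _detect_block(self):
        # Clean, extract features, then refine a uniform prior gram by gram.
        cleaned = self.cleaning_text(self.text)
        ngrams = self._extract_ngrams(cleaned)
        prob = {lang: 1.0 / len(self.profiles) for lang in self.profiles}
        for gram in ngrams:
            self._update_lang_prob(prob, gram)
        self.langprob = prob

    @staticmethod
    def cleaning_text(text):
        # Toy normalization: lowercase and collapse whitespace.
        return " ".join(text.lower().split())

    @staticmethod
    def _extract_ngrams(text, n=2):
        # Toy feature extraction: character bigrams only.
        return [text[i:i + n] for i in range(len(text) - n + 1)]

    def _update_lang_prob(self, prob, gram, smoothing=0.01):
        # Multiply by the smoothed n-gram frequency, then renormalize.
        for lang, profile in self.profiles.items():
            prob[lang] *= profile.get(gram, 0.0) + smoothing
        total = sum(prob.values())
        for lang in prob:
            prob[lang] /= total


# Tiny hand-made profiles, invented for this example.
profiles = {
    "en": {"th": 0.3, "he": 0.3, "e ": 0.2},
    "de": {"ch": 0.3, "ei": 0.3, "en": 0.2},
}
detector = ToyDetector(profiles)
detector.append("The  the")
top_language = detector.detect()
```

Because the toy renormalizes after every n-gram, the probabilities always sum to 1 and long inputs do not underflow.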
Detector: The central orchestrator and state manager for the language detection process. It holds the internal state and coordinates the overall detection flow.
detect: Provides the high-level public API for initiating language detection on a given text. It serves as the primary external interface.
get_probabilities: Manages the overall calculation and retrieval of language probabilities, orchestrating the detailed text analysis and the sorting of results.
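The result-sorting step can be illustrated as follows. This is a sketch under stated assumptions: the Language record, the sort_probability helper, and the 0.1 cutoff are hypothetical names and values chosen for the example, not taken from the source.

```python
from typing import List, NamedTuple


class Language(NamedTuple):
    """Hypothetical result record: a language code and its probability."""
    lang: str
    prob: float


PROB_THRESHOLD = 0.1  # assumed cutoff for this sketch


def sort_probability(langlist: List[str], prob: List[float]) -> List[Language]:
    """Pair each language with its score, drop weak candidates, sort descending."""
    candidates = [
        Language(lang, p) for lang, p in zip(langlist, prob) if p > PROB_THRESHOLD
    ]
    return sorted(candidates, key=lambda item: item.prob, reverse=True)


ranked = sort_probability(["en", "de", "fr"], [0.25, 0.7, 0.05])
```

Here "fr" falls below the cutoff and is dropped, so the ranked list contains "de" followed by "en".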
_detect_block: The core algorithmic component, responsible for detailed, block-by-block text analysis. It coordinates text cleaning, n-gram extraction, and probability updates.
cleaning_text: Preprocesses the input text by removing irrelevant characters or applying normalization rules before n-gram extraction.
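The source does not say which characters count as irrelevant, so the following is only one plausible cleaning step. It assumes that URLs and e-mail addresses carry no language signal; the clean_text name and the regular expressions are hypothetical.

```python
import re

# Assumed noise patterns for this sketch; the real rules may differ.
URL_RE = re.compile(r"https?://\S+")
MAIL_RE = re.compile(r"\S+@\S+\.\S+")
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Strip URLs and e-mail addresses, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = MAIL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


cleaned = clean_text("Visit https://example.com  or mail me@example.org today")
```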
_extract_ngrams: Generates n-grams from the cleaned text. These n-grams are the fundamental features for language identification.
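As a rough illustration of character n-gram extraction, the sketch below collects every substring of length 1 to 3 at each position. The extract_ngrams name and the maximum length of 3 are assumptions for the example. Any filtering against the loaded language profiles is omitted.

```python
from typing import List


def extract_ngrams(text: str, max_n: int = 3) -> List[str]:
    """Return all character n-grams of length 1..max_n, in text order."""
    grams = []
    for start in range(len(text)):
        for n in range(1, max_n + 1):
            # Skip windows that would run past the end of the text.
            if start + n <= len(text):
                grams.append(text[start:start + n])
    return grams


grams = extract_ngrams("abc")
```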
_update_lang_prob: Adjusts and refines the language probability scores based on the extracted n-grams and the pre-loaded language profiles.
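One way to picture the update is as a naive-Bayes style step: each language's score is multiplied by the smoothed frequency of the observed n-gram in that language's profile. In the sketch below, the update_lang_prob name, the profile layout (n-gram mapped to one frequency per language), and the alpha and base_freq values are all assumptions for illustration.

```python
from typing import Dict, List


def update_lang_prob(prob: List[float], gram: str,
                     profile: Dict[str, List[float]],
                     alpha: float = 0.5, base_freq: float = 10000.0) -> bool:
    """Update prob in place for one n-gram; return False if the gram is unknown."""
    if gram not in profile:
        return False
    weight = alpha / base_freq  # small smoothing term so no score hits zero
    for i, freq in enumerate(profile[gram]):
        prob[i] *= weight + freq
    total = sum(prob)
    for i in range(len(prob)):
        prob[i] /= total  # renormalize so the scores stay a distribution
    return True


prob = [0.5, 0.5]                # uniform prior over two languages
profile = {"th": [0.03, 0.001]}  # invented frequencies for the n-gram "th"
updated = update_lang_prob(prob, "th", profile)
```

After this single update the first language holds most of the probability mass, because "th" is far more frequent in its invented profile.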
NGram: Provides core utilities for n-gram generation and text normalization, supporting the main detection components.
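A character-by-character n-gram buffer can be sketched as a small sliding window. This is a simplified illustration: NGramWindow is a hypothetical class, the window size of 3 is assumed, and the character normalization mentioned above is left out entirely.

```python
class NGramWindow:
    """Keeps the last N_GRAM characters seen and serves n-grams from them."""

    N_GRAM = 3  # assumed maximum n-gram length

    def __init__(self):
        # Start with a single space so the first character forms a word-initial bigram.
        self.grams = " "

    def add_char(self, ch: str) -> None:
        # Append the character and keep only the most recent N_GRAM characters.
        self.grams = (self.grams + ch)[-self.N_GRAM:]

    def get(self, n: int):
        # Return the trailing n characters, or None if n is out of range.
        if n < 1 or n > self.N_GRAM or len(self.grams) < n:
            return None
        return self.grams[-n:]


window = NGramWindow()
for ch in "abc":
    window.add_char(ch)
trigram = window.get(3)
```

A caller would typically feed the cleaned text one character at a time and collect get(1), get(2), and get(3) after each add_char call.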