graph LR
ebooklib_plugins_standard["ebooklib.plugins.standard"]
ebooklib_plugins_tidyhtml["ebooklib.plugins.tidyhtml"]
ebooklib_utils["ebooklib.utils"]
ebooklib_plugins_standard_html_before_write["ebooklib.plugins.standard.html_before_write"]
ebooklib_plugins_tidyhtml_html_before_write["ebooklib.plugins.tidyhtml.html_before_write"]
ebooklib_plugins_tidyhtml_html_after_read["ebooklib.plugins.tidyhtml.html_after_read"]
ebooklib_utils_parse_html_string["ebooklib.utils.parse_html_string"]
ebooklib_utils_get_pages["ebooklib.utils.get_pages"]
ebooklib_utils_get_pages_for_items["ebooklib.utils.get_pages_for_items"]
ebooklib_utils_get_pages_for_items -- "calls" --> ebooklib_utils_get_pages
ebooklib_utils_get_pages_for_items -- "calls" --> ebooklib_utils_parse_html_string
ebooklib_plugins_standard -- "provides" --> ebooklib_plugins_standard_html_before_write
ebooklib_plugins_tidyhtml -- "provides" --> ebooklib_plugins_tidyhtml_html_before_write
ebooklib_plugins_tidyhtml -- "provides" --> ebooklib_plugins_tidyhtml_html_after_read
ebooklib_utils -- "provides" --> ebooklib_utils_parse_html_string
ebooklib_utils -- "provides" --> ebooklib_utils_get_pages
ebooklib_utils -- "provides" --> ebooklib_utils_get_pages_for_items
ebooklib_plugins_standard_html_before_write -- "calls" --> ebooklib_utils_parse_html_string
ebooklib_utils_get_pages -- "calls" --> ebooklib_utils_parse_html_string
The content-processing subsystem of ebooklib consists of three modules: ebooklib.plugins.standard, ebooklib.plugins.tidyhtml, and ebooklib.utils. The two plugin modules transform and clean HTML at fixed points in the EPUB pipeline: standard exposes an html_before_write hook, and tidyhtml exposes both html_before_write and html_after_read. ebooklib.utils supplies the shared helpers: parse_html_string converts raw HTML into a tree that can be inspected and modified, and get_pages and get_pages_for_items extract page-related information. According to the diagram, only the standard plugin's html_before_write hook calls into ebooklib.utils (through parse_html_string); the tidyhtml hooks have no edge to it. Within ebooklib.utils, get_pages calls parse_html_string, and get_pages_for_items calls both get_pages and parse_html_string. Because each processing step is a hook on a plugin, further HTML transformations can be added through the plugin system without touching the shared utility functions.
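The hook-based design described above can be illustrated with a minimal sketch. Everything here is hypothetical: the class names, the `run_hook` helper, and the simplified signatures (each hook takes and returns an HTML string) are assumptions made for illustration, and the real ebooklib hooks may take different arguments.

```python
class BasePlugin:
    """Hypothetical base class: every hook returns the HTML unchanged."""

    def html_before_write(self, html):
        return html

    def html_after_read(self, html):
        return html


class AddLangPlugin(BasePlugin):
    """Hypothetical write-only plugin: implements a single hook."""

    def html_before_write(self, html):
        # Add a lang attribute to the first bare <html> tag.
        return html.replace("<html>", '<html lang="en">', 1)


class WhitespacePlugin(BasePlugin):
    """Hypothetical cleaning plugin: implements both hooks."""

    def html_before_write(self, html):
        # Drop trailing whitespace from every line before serialization.
        return "\n".join(line.rstrip() for line in html.splitlines())

    def html_after_read(self, html):
        # Trim surrounding whitespace right after the content is read.
        return html.strip()


def run_hook(plugins, hook_name, html):
    """Pass the HTML through the named hook of each plugin, in order."""
    for plugin in plugins:
        html = getattr(plugin, hook_name)(html)
    return html


plugins = [AddLangPlugin(), WhitespacePlugin()]
written = run_hook(plugins, "html_before_write", "<html>  \n<body/>  ")
loaded = run_hook(plugins, "html_after_read", "  <p>x</p>\n")
```

A plugin that only cares about one stage overrides one hook and inherits the no-op for the other, which is why the two plugin modules in the diagram can expose different sets of hooks.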
ebooklib.plugins.standard groups the standard, predefined HTML transformations. It is the container for common plugin behavior that modifies HTML before serialization.
Related Classes/Methods: ebooklib.plugins.standard
ebooklib.plugins.tidyhtml groups the HTML cleaning and tidying operations. It contains the hooks that keep HTML content well-formed and consistent.
Related Classes/Methods: ebooklib.plugins.tidyhtml
ebooklib.utils provides general-purpose helpers for manipulating content and extracting information from it: HTML parsing (parse_html_string) and page extraction (get_pages, get_pages_for_items).
Related Classes/Methods: ebooklib.utils
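ebooklib.plugins.standard.html_before_write is the hook that applies the standard HTML modifications to content before it is serialized. It is an extension point in the content-writing pipeline and, per the diagram, calls ebooklib.utils.parse_html_string.
Related Classes/Methods: ebooklib.plugins.standard.html_before_write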
ebooklib.plugins.tidyhtml.html_before_write is the hook that tidies HTML content before serialization, so that content is clean before it is written to an EPUB file.
Related Classes/Methods: ebooklib.plugins.tidyhtml.html_before_write
ebooklib.plugins.tidyhtml.html_after_read is the hook that tidies HTML content right after it is read, so that content is clean as soon as it enters the system.
Related Classes/Methods: ebooklib.plugins.tidyhtml.html_after_read
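ebooklib.utils.parse_html_string parses a raw HTML string into a structured tree (for example, a DOM-like object) that callers can traverse and modify. The diagram shows three callers: the standard plugin's html_before_write hook, get_pages, and get_pages_for_items.
Related Classes/Methods: ebooklib.utils.parse_html_string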
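As a rough illustration of the parse, modify, serialize cycle that such a parsing utility enables, the sketch below uses the standard library's xml.etree.ElementTree on a well-formed XHTML string. The function name `parse_xhtml_string` is a hypothetical stand-in and not ebooklib's function; the real utility may use a different parser and return a different tree type.

```python
import xml.etree.ElementTree as ET


def parse_xhtml_string(markup):
    """Hypothetical stand-in: parse well-formed XHTML into an element tree."""
    return ET.fromstring(markup)


doc = parse_xhtml_string(
    "<html><body><h1>Title</h1><p>First</p></body></html>"
)

# Once parsed, the content can be queried by structure instead of by regex.
heading = doc.find("body/h1").text

# The tree can also be modified in place before it is serialized again.
for paragraph in doc.iter("p"):
    paragraph.set("class", "body-text")

serialized = ET.tostring(doc, encoding="unicode")
```

Working on a tree rather than on the raw string is what lets a write-time hook make targeted changes (here, adding an attribute to every paragraph) without disturbing the rest of the markup.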
ebooklib.utils.get_pages extracts page-related information from HTML content and returns it in structured form. It parses the content with parse_html_string and derives page breaks or content sections from the resulting tree.
Related Classes/Methods: ebooklib.utils.get_pages
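ebooklib.utils.get_pages_for_items collects page information across multiple content items. Per the diagram, it does this by calling get_pages and parse_html_string rather than reimplementing the extraction.
Related Classes/Methods: ebooklib.utils.get_pages_for_items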
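The relationship between the per-item extractor and the multi-item aggregator can be sketched as follows. This is an illustration under stated assumptions, not ebooklib's implementation: the function names `find_pages` and `find_pages_for_items`, the `(document name, element id, label)` tuple shape, and the choice to detect pages through `epub:type="pagebreak"` markers are all hypothetical, and the standard library's ElementTree stands in for the real parser.

```python
import xml.etree.ElementTree as ET

# Namespace of the epub:type attribute in EPUB 3 content documents.
EPUB_NS = "http://www.idpf.org/2007/ops"
EPUB_TYPE = "{%s}type" % EPUB_NS


def find_pages(name, markup):
    """Hypothetical per-item extractor: one tuple per page-break marker."""
    root = ET.fromstring(markup)
    pages = []
    for elem in root.iter():
        if elem.get(EPUB_TYPE) == "pagebreak" and elem.get("id"):
            # Fall back to the id when the marker carries no title.
            label = elem.get("title") or elem.get("id")
            pages.append((name, elem.get("id"), label))
    return pages


def find_pages_for_items(items):
    """Hypothetical aggregator: run the extractor on each item and flatten."""
    pages = []
    for name, markup in items:
        pages.extend(find_pages(name, markup))
    return pages


CHAPTER_1 = (
    '<html xmlns:epub="http://www.idpf.org/2007/ops"><body>'
    '<span epub:type="pagebreak" id="page1" title="1"/>'
    "<p>Text</p>"
    '<span epub:type="pagebreak" id="page2" title="2"/>'
    "</body></html>"
)
CHAPTER_2 = (
    '<html xmlns:epub="http://www.idpf.org/2007/ops"><body>'
    '<span epub:type="pagebreak" id="page3"/>'
    "</body></html>"
)

all_pages = find_pages_for_items(
    [("ch1.xhtml", CHAPTER_1), ("ch2.xhtml", CHAPTER_2)]
)
```

Keeping the aggregator as a thin loop over the per-item function mirrors the "calls" edge in the diagram: any change to how pages are detected is made in one place.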