```mermaid
graph LR
BrowserSession["BrowserSession"]
DOMService["DOMService"]
BrowserUseServer["BrowserUseServer"]
DOMWatchdog["DOMWatchdog"]
DefaultActionWatchdog["DefaultActionWatchdog"]
BrowserProfile["BrowserProfile"]
BrowserSession -- "utilizes" --> DOMService
BrowserProfile -- "configures" --> BrowserSession
DOMWatchdog -- "utilizes" --> DOMService
BrowserUseServer -- "manages" --> BrowserSession
BrowserUseServer -- "orchestrates" --> DOMWatchdog
BrowserUseServer -- "orchestrates" --> DefaultActionWatchdog
DOMWatchdog -- "interacts with" --> BrowserSession
DefaultActionWatchdog -- "interacts with" --> BrowserSession
```
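The Browser Automation Module is a self-contained subsystem that handles all browser interaction in the project. It uses Playwright for low-level browser control and gives AI agents a structured interface for interacting with web pages. It comprises six components: BrowserSession, DOMService, BrowserUseServer, DOMWatchdog, DefaultActionWatchdog, and BrowserProfile.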
BrowserSession manages the lifecycle of a browser instance: launching and closing it, managing tabs (creating, switching, and closing them), navigating to URLs, and communicating with the browser at a low level through Playwright's Chrome DevTools Protocol (CDP) support. It is the primary interface for direct browser control.
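To make the tab-management responsibility concrete, here is a minimal, browser-free sketch of the bookkeeping a session object of this kind performs: which tabs exist, which one is active, and where navigation is applied. The `TabManager` and `Tab` names and their methods are hypothetical and do not reflect the project's actual BrowserSession API; a real session would issue these operations through Playwright/CDP rather than updating an in-memory dictionary.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Tab:
    tab_id: int
    url: str = "about:blank"


@dataclass
class TabManager:
    """Hypothetical in-memory model of a session's tabs (no real browser involved)."""

    tabs: dict[int, Tab] = field(default_factory=dict)
    active_id: int | None = None
    _ids: itertools.count = field(default_factory=itertools.count, repr=False)

    def new_tab(self, url: str = "about:blank") -> Tab:
        # A newly created tab becomes the active one, as in most browsers.
        tab = Tab(next(self._ids), url)
        self.tabs[tab.tab_id] = tab
        self.active_id = tab.tab_id
        return tab

    def switch_to(self, tab_id: int) -> Tab:
        if tab_id not in self.tabs:
            raise KeyError(f"no tab with id {tab_id}")
        self.active_id = tab_id
        return self.tabs[tab_id]

    def navigate(self, url: str) -> None:
        # Navigation always targets the currently active tab.
        if self.active_id is None:
            raise RuntimeError("no open tab")
        self.tabs[self.active_id].url = url

    def close_tab(self, tab_id: int) -> None:
        del self.tabs[tab_id]
        if self.active_id == tab_id:
            # Fall back to the most recently created remaining tab, if any.
            self.active_id = max(self.tabs) if self.tabs else None
```

Keeping the "active tab" as explicit state is what lets higher-level commands such as "navigate" omit a tab argument.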
DOMService extracts and processes the Document Object Model (DOM) of a web page, including its accessibility information, and provides a structured, enriched representation of it. It is the authoritative source for the page's current state and content.
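The idea of a "structured representation" can be illustrated by reducing raw HTML to an indexed list of interactive elements that an agent can refer to by number. This sketch uses the standard-library `html.parser` on static markup rather than a live browser DOM, and the element set, the `[index] <tag> name` output format, and the `InteractiveElementExtractor` name are all assumptions, not DOMService's actual output.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass
class Element:
    index: int
    tag: str
    attrs: dict[str, str] = field(default_factory=dict)
    text: str = ""


class InteractiveElementExtractor(HTMLParser):
    """Hypothetical: collects interactive elements in document order and indexes them."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[Element] = []
        self._open: Element | None = None  # element whose inner text is being collected

    def handle_starttag(self, tag, attrs):
        attr_map = {name: value or "" for name, value in attrs}
        if tag in INTERACTIVE_TAGS or "role" in attr_map:
            el = Element(len(self.elements), tag, attr_map)
            self.elements.append(el)
            # <input> is a void element, so there is no inner text to collect.
            self._open = None if tag == "input" else el

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open.tag:
            self._open = None

    def handle_data(self, data):
        if self._open is not None and data.strip():
            self._open.text = f"{self._open.text} {data.strip()}".strip()

    def render(self) -> str:
        """Compact text view for an LLM: one indexed line per element."""
        lines = []
        for el in self.elements:
            # Prefer the accessible name (aria-label) over visible text.
            name = el.attrs.get("aria-label") or el.text or el.attrs.get("placeholder", "")
            lines.append(f"[{el.index}] <{el.tag}> {name}".rstrip())
        return "\n".join(lines)
```

Preferring `aria-label` shows why accessibility data matters here: it often names a control more precisely than its visible text.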
BrowserUseServer is the Model Context Protocol (MCP) server endpoint for external browser-automation requests. It receives high-level commands (e.g., navigate, click, type) from an AI agent or other client and translates them into calls on the underlying BrowserSession and other components. It also creates and manages the BrowserSession instances those calls run against.
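The translation step can be sketched as a table mapping high-level tool names to session calls, with a session created lazily on the first request. This does not use any MCP SDK; the tool names, argument shapes, the `BrowserCommandServer` class, and the `FakeSession` stand-in (which records calls instead of driving a browser) are all hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable


class FakeSession:
    """Stand-in for BrowserSession that records calls instead of driving a browser."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple]] = []

    def navigate(self, url: str) -> None:
        self.calls.append(("navigate", (url,)))

    def click(self, index: int) -> None:
        self.calls.append(("click", (index,)))

    def type_text(self, index: int, text: str) -> None:
        self.calls.append(("type_text", (index, text)))


class BrowserCommandServer:
    """Hypothetical request handler: validates a tool call and forwards it to a session."""

    def __init__(self, session_factory: Callable[[], FakeSession]) -> None:
        self._factory = session_factory
        self.session: FakeSession | None = None  # created on the first command

    def _tools(self, s: FakeSession) -> dict[str, Callable[[dict[str, Any]], None]]:
        # Each high-level tool maps onto one session method with converted arguments.
        return {
            "navigate": lambda args: s.navigate(args["url"]),
            "click": lambda args: s.click(int(args["index"])),
            "type": lambda args: s.type_text(int(args["index"]), args["text"]),
        }

    def handle(self, tool: str, args: dict[str, Any]) -> dict[str, Any]:
        if self.session is None:
            self.session = self._factory()
        tools = self._tools(self.session)
        if tool not in tools:
            return {"ok": False, "error": f"unknown tool: {tool}"}
        try:
            tools[tool](args)
        except KeyError as exc:
            # A required argument was absent from the request.
            return {"ok": False, "error": f"missing argument: {exc.args[0]}"}
        return {"ok": True}
```

Returning structured errors instead of raising keeps a malformed request from one client from taking down the server.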
DOMWatchdog monitors and responds to browser events that affect the DOM and page state, such as network stability, page-load completion, and DOM mutations. It keeps DOMService supplied with current information and provides mechanisms for waiting until specific page conditions hold.
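One such wait condition is "the network has been quiet for a while". The sketch below tracks in-flight requests from start/finish callbacks and waits, with a timeout, for a quiet window of a given length. The `NetworkIdleTracker` class, its method names, and the default thresholds are hypothetical; in a real browser the callbacks would be driven by events such as CDP's `Network.requestWillBeSent` and `Network.loadingFinished`.

```python
from __future__ import annotations

import asyncio


class NetworkIdleTracker:
    """Hypothetical: counts in-flight requests reported by browser events."""

    def __init__(self) -> None:
        self._in_flight: set[str] = set()
        self._changed = asyncio.Event()

    def on_request_started(self, request_id: str) -> None:
        self._in_flight.add(request_id)
        self._changed.set()

    def on_request_finished(self, request_id: str) -> None:
        self._in_flight.discard(request_id)
        self._changed.set()

    async def wait_for_idle(self, quiet: float = 0.5, timeout: float = 10.0) -> bool:
        """True once no request has been in flight for `quiet` seconds; False on timeout."""
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        while True:
            remaining = deadline - loop.time()
            if remaining <= 0:
                return False
            self._changed.clear()
            if self._in_flight:
                # Still busy: wait for the next start/finish event.
                try:
                    await asyncio.wait_for(self._changed.wait(), remaining)
                except asyncio.TimeoutError:
                    return False
            else:
                window = min(quiet, remaining)
                try:
                    await asyncio.wait_for(self._changed.wait(), window)
                except asyncio.TimeoutError:
                    # Nothing happened during the window; it only counts as idle
                    # if the window was the full quiet period, not a truncated one.
                    return window >= quiet
```

Requiring a sustained quiet window, rather than a single moment with zero requests, avoids declaring a page ready between two bursts of loading.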
DefaultActionWatchdog implements the default execution logic for common user interactions in the browser, such as clicking elements, typing text into input fields, scrolling, and handling file uploads. It provides a single, standardized way to perform these actions.
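A "standardized way to perform actions" can be sketched as typed action objects, shared validation, and one result type for every outcome. Everything here is hypothetical: the action classes, `ActionResult`, the `execute` function, and the `PageState` stand-in, which models element values and scroll position instead of a real page.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Union


@dataclass
class Click:
    index: int


@dataclass
class TypeText:
    index: int
    text: str


@dataclass
class Scroll:
    pixels: int  # positive scrolls down, negative scrolls up


@dataclass
class UploadFile:
    index: int
    path: str


Action = Union[Click, TypeText, Scroll, UploadFile]


@dataclass
class ActionResult:
    success: bool
    message: str


class PageState:
    """Stand-in page: element values, clicks, and scroll position (no real browser)."""

    def __init__(self, n_elements: int, page_height: int, viewport: int) -> None:
        self.values = [""] * n_elements
        self.clicked: list[int] = []
        self.scroll_y = 0
        self.max_scroll = max(0, page_height - viewport)


def execute(page: PageState, action: Action) -> ActionResult:
    """Hypothetical default executor: one code path per action type, one result type."""
    # Every element-targeting action is validated the same way first.
    index = getattr(action, "index", None)
    if index is not None and not 0 <= index < len(page.values):
        return ActionResult(False, f"element {index} does not exist")

    if isinstance(action, Click):
        page.clicked.append(action.index)
        return ActionResult(True, f"clicked element {action.index}")
    if isinstance(action, TypeText):
        page.values[action.index] = action.text  # replaces any existing value
        return ActionResult(True, f"typed {len(action.text)} chars into element {action.index}")
    if isinstance(action, Scroll):
        # Clamp so scrolling past either end of the page is a no-op, not an error.
        page.scroll_y = min(max(page.scroll_y + action.pixels, 0), page.max_scroll)
        return ActionResult(True, f"scrolled to y={page.scroll_y}")
    if isinstance(action, UploadFile):
        if not os.path.isfile(action.path):
            return ActionResult(False, f"file not found: {action.path}")
        page.values[action.index] = os.path.basename(action.path)
        return ActionResult(True, f"attached {action.path} to element {action.index}")
    return ActionResult(False, f"unsupported action: {type(action).__name__}")
```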
BrowserProfile holds browser-specific configuration, including launch arguments, user data directories, and browser extensions. It prepares the environment in which a BrowserSession runs so that browser behavior is consistent and customizable.
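A profile of this kind can be sketched as an immutable settings object rendered into Chromium command-line switches at launch time. The `ProfileConfig` class and its fields are hypothetical and are not BrowserProfile's actual fields; the switches used (`--headless=new`, `--user-data-dir`, `--load-extension`, `--disable-extensions-except`, `--window-size`) are standard Chromium flags.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileConfig:
    """Hypothetical profile: declarative settings rendered into Chromium switches."""

    headless: bool = True
    user_data_dir: str | None = None  # None: let the launcher use a throwaway profile
    extensions: tuple[str, ...] = ()  # paths to unpacked extension directories
    window_size: tuple[int, int] = (1280, 800)
    extra_args: tuple[str, ...] = ()

    def to_launch_args(self) -> list[str]:
        width, height = self.window_size
        args = [f"--window-size={width},{height}"]
        if self.headless:
            args.append("--headless=new")
        if self.user_data_dir:
            # Persistent profile: cookies, logins, and storage survive restarts.
            args.append(f"--user-data-dir={self.user_data_dir}")
        if self.extensions:
            joined = ",".join(self.extensions)
            args += [f"--disable-extensions-except={joined}", f"--load-extension={joined}"]
        # User-supplied flags are appended after the generated ones.
        args.extend(self.extra_args)
        return args
```

Making the profile frozen means one configuration can be shared safely across several sessions without one session mutating another's settings.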