graph LR
pathway_io__utils["pathway.io._utils"]
pathway_io_airbyte["pathway.io.airbyte"]
pathway_io_bigquery["pathway.io.bigquery"]
pathway_io_deltalake["pathway.io.deltalake"]
pathway_io_s3["pathway.io.s3"]
pathway_io_kafka["pathway.io.kafka"]
pathway_io_postgres["pathway.io.postgres"]
pathway_io_http["pathway.io.http"]
pathway_io_airbyte -- "depends on" --> pathway_io__utils
pathway_io_bigquery -- "depends on" --> pathway_io__utils
pathway_io_deltalake -- "depends on" --> pathway_io__utils
pathway_io_s3 -- "depends on" --> pathway_io__utils
The pathway.io subsystem provides a unified interface for ingesting data into, and egressing data out of, the Pathway data processing framework. It does so through a collection of specialized connector components, each tailored to a specific external technology such as Airbyte, BigQuery, Delta Lake, S3, Kafka, PostgreSQL, and HTTP. A central _utils component supplies common helper functions, ensuring consistency and reducing duplication across these connectors. Each connector typically exposes a read (and, where applicable, write) function as its primary entry point, abstracting source-specific protocols, authentication, and data format conversions. This modular design makes it easy to add new data sources while keeping the interaction pattern with the Pathway core engine consistent. The connectors depend primarily on the _utils component for shared functionality, forming a clear ingestion and egress architecture.
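As a sketch of that shared pattern, the snippet below wires two of the connectors described later into one pipeline: Kafka for ingestion and PostgreSQL for egress. All connection settings are placeholders, and the exact keyword arguments may vary between Pathway versions.

```python
import pathway as pw

# Schema of the incoming JSON messages.
class InputSchema(pw.Schema):
    key: str = pw.column_definition(primary_key=True)
    value: float

# Ingress: a connector-specific `read` turns an external system into a table.
events = pw.io.kafka.read(
    {"bootstrap.servers": "localhost:9092",   # placeholder broker
     "group.id": "pathway-demo",
     "auto.offset.reset": "earliest"},
    topic="events",
    format="json",
    schema=InputSchema,
)

# Transformations are connector-agnostic.
totals = events.groupby(pw.this.key).reduce(
    pw.this.key, total=pw.reducers.sum(pw.this.value)
)

# Egress: a connector-specific `write` ships the continuously updated result.
pw.io.postgres.write(
    totals,
    {"host": "localhost", "port": "5432", "dbname": "analytics",
     "user": "pathway", "password": "***"},
    "totals",
)

pw.run()  # start the streaming computation
```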
pathway.io._utils: Provides foundational utilities and helper functions commonly used across the specific I/O connectors, including helpers for schema/data format construction, connection settings preparation, and other shared routines that standardize connector development.
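These helpers are internal, so the snippet below is purely hypothetical: the function name and signature are invented for illustration and are not the actual pathway.io._utils API. It only sketches the kind of argument validation such a shared module might centralize for the connectors.

```python
import pathway as pw

# Hypothetical helper, NOT the real pathway.io._utils API: connectors could
# delegate format/schema validation to one shared routine like this.
def check_format_and_schema(fmt: str, schema: type[pw.Schema] | None) -> None:
    supported = {"json", "csv", "plaintext"}
    if fmt not in supported:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(supported)}")
    if fmt != "plaintext" and schema is None:
        raise ValueError(f"format {fmt!r} requires an explicit schema")
```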
pathway.io.airbyte: Manages connections to Airbyte data sources, handling data ingestion and egress, protocol specifics, authentication, error handling, and data serialization/deserialization for Airbyte-compatible data streams.
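A minimal ingestion sketch, assuming an Airbyte source described by a local YAML configuration file; the configuration path and stream name are placeholders.

```python
import pathway as pw

# Selected Airbyte streams are ingested into a Pathway table; the connector
# drives the underlying Airbyte source and handles its record protocol.
commits = pw.io.airbyte.read(
    "./github-source.yaml",   # Airbyte source configuration file (placeholder)
    streams=["commits"],      # Airbyte streams to ingest
)

# Persist the raw records locally just to complete the pipeline.
pw.io.jsonlines.write(commits, "commits.jsonl")
pw.run()
```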
pathway.io.bigquery: Facilitates data exchange with Google BigQuery, managing connection establishment, data ingestion and egress, BigQuery-specific protocol handling, authentication, error management, and data format conversions.
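A minimal egress sketch; the dataset, table, and service-account file are placeholders, and the keyword argument names below are assumptions to verify against the installed Pathway version.

```python
import pathway as pw

class Metric(pw.Schema):
    name: str
    value: float

# A small in-memory table to write out; in practice this would come from another connector.
metrics = pw.debug.table_from_rows(Metric, [("latency_ms", 12.5), ("qps", 340.0)])

# Stream the table's updates into a BigQuery table.
pw.io.bigquery.write(
    metrics,
    dataset_name="analytics",                          # placeholder dataset
    table_name="metrics",                              # placeholder table
    service_user_credentials_file="credentials.json",  # service-account key file
)
pw.run()
```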
pathway.io.deltalake: Provides capabilities for reading from and writing to Delta Lake tables, including managing connections, handling Delta Lake protocol specifics, ensuring data consistency, and performing serialization/deserialization.
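A minimal sketch, assuming read/write entry points of the shape shown below; all paths and the schema are placeholders, and object-store URIs would additionally need credentials.

```python
import pathway as pw

class Order(pw.Schema):
    order_id: str = pw.column_definition(primary_key=True)
    amount: float

# Ingest an existing Delta Lake table as a Pathway table.
orders = pw.io.deltalake.read("./lake/orders", schema=Order)

# Keep only large orders and materialize them as another Delta Lake table.
large_orders = orders.filter(pw.this.amount > 100.0)
pw.io.deltalake.write(large_orders, "./lake/large_orders")

pw.run()
```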
pathway.io.s3: Handles interactions with Amazon S3 (and compatible object storage services), managing file-based data ingestion and egress, S3 protocol specifics, authentication, error handling, and data serialization/deserialization for various file formats.
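A minimal sketch of streaming JSON objects from a bucket prefix; the bucket name, region, credentials, and prefix are placeholders.

```python
import pathway as pw

class Event(pw.Schema):
    user: str
    action: str

events = pw.io.s3.read(
    "logs/2024/",                      # key prefix to watch inside the bucket
    aws_s3_settings=pw.io.s3.AwsS3Settings(
        bucket_name="my-bucket",
        region="eu-central-1",
        access_key="***",
        secret_access_key="***",
    ),
    format="json",
    schema=Event,
    mode="streaming",                  # keep polling the prefix for new objects
)

pw.io.jsonlines.write(events, "events.jsonl")
pw.run()
```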
pathway.io.kafka: Manages real-time data streaming with Apache Kafka, responsible for establishing and maintaining connections, handling message production and consumption, Kafka protocol specifics, and serialization/deserialization of stream data.
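A minimal round-trip sketch: consume JSON messages, aggregate them, and produce the results to another topic. The broker address, topics, and consumer-group settings are placeholders.

```python
import pathway as pw

rdkafka_settings = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "pathway-readers",
    "auto.offset.reset": "earliest",
    "session.timeout.ms": "6000",
}

class Reading(pw.Schema):
    sensor: str
    value: float

# Consume JSON messages from the input topic as a streaming table.
readings = pw.io.kafka.read(
    rdkafka_settings,
    topic="sensor-readings",
    format="json",
    schema=Reading,
)

# Maintain a per-sensor average and publish every update to an output topic.
averages = readings.groupby(pw.this.sensor).reduce(
    pw.this.sensor, avg=pw.reducers.avg(pw.this.value)
)
pw.io.kafka.write(averages, rdkafka_settings, topic_name="sensor-averages", format="json")

pw.run()
```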
pathway.io.postgres: Provides connectivity to PostgreSQL databases, handling data ingestion from and egress to tables, managing SQL protocol specifics, authentication, error handling, and data type mapping between Python and PostgreSQL.
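A minimal egress sketch; the connection settings are placeholders and the target table is assumed to already exist in the database.

```python
import pathway as pw

# A small static table to write out; in practice this would come from another connector.
stats = pw.debug.table_from_markdown(
    """
    country | visitors
    PL      | 120
    DE      | 310
    """
)

postgres_settings = {
    "host": "localhost",
    "port": "5432",
    "dbname": "analytics",
    "user": "pathway",
    "password": "***",
}

# Every update to `stats` is propagated to the visitors_by_country table.
pw.io.postgres.write(stats, postgres_settings, "visitors_by_country")
pw.run()
```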
pathway.io.http: Enables data exchange over HTTP/HTTPS, managing requests and responses for data ingestion (e.g., from REST APIs) and egress, handling HTTP protocol specifics, authentication, and serialization/deserialization of data payloads (e.g., JSON, XML).
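A minimal sketch using the module's REST endpoint: incoming HTTP requests become rows of a streaming table, and the paired writer sends each computed row back as the response. The host, port, schema, and the `result` response column are placeholders following the pattern used in Pathway's examples.

```python
import pathway as pw

class Query(pw.Schema):
    prompt: str

# Expose an HTTP endpoint; each request becomes a row in `queries`.
queries, response_writer = pw.io.http.rest_connector(
    host="0.0.0.0",
    port=8080,
    schema=Query,
)

# Compute a response per request; the `result` column is returned to the caller.
responses = queries.select(result=pw.this.prompt.str.upper())
response_writer(responses)

pw.run()
```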