Skip to content

Latest commit

 

History

History
98 lines (52 loc) · 6.01 KB

File metadata and controls

98 lines (52 loc) · 6.01 KB
graph LR
    pathway_io__utils["pathway.io._utils"]
    pathway_io_airbyte["pathway.io.airbyte"]
    pathway_io_bigquery["pathway.io.bigquery"]
    pathway_io_deltalake["pathway.io.deltalake"]
    pathway_io_s3["pathway.io.s3"]
    pathway_io_kafka["pathway.io.kafka"]
    pathway_io_postgres["pathway.io.postgres"]
    pathway_io_http["pathway.io.http"]
    pathway_io_airbyte -- "depends on" --> pathway_io__utils
    pathway_io_bigquery -- "depends on" --> pathway_io__utils
    pathway_io_deltalake -- "depends on" --> pathway_io__utils
    pathway_io_s3 -- "depends on" --> pathway_io__utils
Loading

CodeBoardingDemoContact

Details

The pathway.io subsystem is designed to provide a unified interface for ingesting and egressing data from various external sources into the Pathway data processing framework. It achieves this through a collection of specialized connector components, each tailored to interact with a specific data source technology such as Airbyte, BigQuery, Delta Lake, S3, Kafka, PostgreSQL, and HTTP. A central _utils component provides common functionalities and helper methods, ensuring consistency and reducing redundancy across these diverse connectors. Each connector component typically exposes a read function as its primary entry point, abstracting the complexities of data source-specific protocols, authentication, and data format conversions. This modular design allows for easy extension to new data sources while maintaining a consistent interaction pattern for the Pathway core engine. The connectors primarily depend on the _utils component for shared functionalities, forming a clear and efficient data ingestion and egress architecture.

pathway.io._utils

Provides foundational utilities and helper functions commonly used across multiple specific I/O connectors. This includes functionalities for schema/data format construction, connection settings preparation, and other shared helper functions that standardize connector development.

Related Classes/Methods:

pathway.io.airbyte

Manages connections to Airbyte data sources, handling data ingestion and egress, protocol specifics, authentication, error handling, and data serialization/deserialization for Airbyte-compatible data streams.

Related Classes/Methods:

pathway.io.bigquery

Facilitates data exchange with Google BigQuery, managing connection establishment, data ingestion and egress, BigQuery-specific protocol handling, authentication, error management, and data format conversions.

Related Classes/Methods:

pathway.io.deltalake

Provides capabilities for reading from and writing to Delta Lake tables, including managing connections, handling Delta Lake protocol specifics, ensuring data consistency, and performing serialization/deserialization.

Related Classes/Methods:

pathway.io.s3

Handles interactions with Amazon S3 (and compatible object storage services), managing file-based data ingestion and egress, S3 protocol specifics, authentication, error handling, and data serialization/deserialization for various file formats.

Related Classes/Methods:

pathway.io.kafka

Manages real-time data streaming with Apache Kafka, responsible for establishing and maintaining connections, handling message production and consumption, Kafka protocol specifics, and serialization/deserialization of stream data.

Related Classes/Methods:

pathway.io.postgres

Provides connectivity to PostgreSQL databases, handling data ingestion from and egress to tables, managing SQL protocol specifics, authentication, error handling, and data type mapping between Python and PostgreSQL.

Related Classes/Methods:

pathway.io.http

Enables data exchange over HTTP/HTTPS, managing requests and responses for data ingestion (e.g., from REST APIs) and egress, handling HTTP protocol specifics, authentication, and serialization/deserialization of data payloads (e.g., JSON, XML).

Related Classes/Methods: