:::{warning}
The Python API surface is not yet complete and is subject to change. Many operations available in
the Rust API are not yet exposed. See the {doc}`/api/python/index` for the full reference.
:::
```bash
pip install vortex-data
```
```bash
uv add vortex-data
```
{func}`~vortex.array` constructs a Vortex array from Python values:
>>> import vortex as vx
>>> arr = vx.array([1, 2, 3, 4])
>>> arr.dtype
int(64, nullable=False)
>>> len(arr)
4
Python's {obj}`None` represents a missing value and makes the dtype nullable:
>>> arr = vx.array([1, 2, None, 4])
>>> arr.dtype
int(64, nullable=True)
A list of {class}`dict` produces a struct array. Missing values may appear at any level:
>>> arr = vx.array([
... {'name': 'Joseph', 'age': 25},
... {'name': None, 'age': 31},
... None,
... ])
>>> arr.dtype
struct({"age": int(64, nullable=True), "name": utf8(nullable=True)}, nullable=True)
{func}`~vortex.array` also accepts {class}`pyarrow.Array`, {class}`pyarrow.Table`,
{class}`pandas.DataFrame`, and {class}`range` objects.
DType factory functions are available at the top level of the `vortex` module:
>>> vx.int_(32)
int(32, nullable=False)
>>> vx.utf8(nullable=True)
utf8(nullable=True)
>>> vx.list_(vx.float_(64))
list(float(64, nullable=False), nullable=False)
>>> vx.struct({'x': vx.int_(32), 'y': vx.int_(32)})
struct({"x": int(32, nullable=False), "y": int(32, nullable=False)}, nullable=False)
Available types: {func}`~vortex.null`, {func}`~vortex.bool_`,
{func}`~vortex.int_`, {func}`~vortex.uint`, {func}`~vortex.float_`,
{func}`~vortex.decimal`, {func}`~vortex.utf8`, {func}`~vortex.binary`,
{func}`~vortex.struct`, {func}`~vortex.list_`,
{func}`~vortex.fixed_size_list`, {func}`~vortex.date`,
{func}`~vortex.time`, {func}`~vortex.timestamp`.
>>> arr = vx.array([10, 20, 30, 40, 50])
>>> arr.scalar_at(0).as_py()
10
>>> arr.to_arrow_array().to_pylist()
[10, 20, 30, 40, 50]
>>> arr.slice(1, 3).to_arrow_array().to_pylist()
[20, 30]
>>> indices = vx.array([0, 2, 4])
>>> arr.take(indices).to_arrow_array().to_pylist()
[10, 30, 50]
>>> mask = vx.array([True, False, True, False, True])
>>> arr.filter(mask).to_arrow_array().to_pylist()
[10, 30, 50]
>>> other = vx.array([10, 25, 25, 45, 50])
>>> (arr > other).to_arrow_array().to_pylist()
[False, False, True, False, False]
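
For readers new to columnar APIs, the semantics of the operations above can be modeled in plain Python. This is only an illustrative sketch: the helper names `slice_`, `take`, `filter_`, and `greater_than` are hypothetical, operate on ordinary lists, and say nothing about how Vortex implements these operations internally.

```python
# Plain-Python model of slice, take, filter, and element-wise comparison.
# These helpers are hypothetical and do not call into Vortex.
values = [10, 20, 30, 40, 50]


def slice_(xs, start, stop):
    """Return the half-open range [start, stop), like arr.slice(1, 3)."""
    return xs[start:stop]


def take(xs, indices):
    """Gather the elements at the given positions, in index order."""
    return [xs[i] for i in indices]


def filter_(xs, mask):
    """Keep the elements whose mask entry is True."""
    if len(mask) != len(xs):
        raise ValueError("mask and values must have the same length")
    return [x for x, keep in zip(xs, mask) if keep]


def greater_than(xs, ys):
    """Compare position by position, producing one Boolean per element."""
    if len(xs) != len(ys):
        raise ValueError("both inputs must have the same length")
    return [x > y for x, y in zip(xs, ys)]


sliced = slice_(values, 1, 3)
taken = take(values, [0, 2, 4])
kept = filter_(values, [True, False, True, False, True])
compared = greater_than(values, [10, 25, 25, 45, 50])
```

Note that `take` and `filter_` return the same elements here only because the chosen indices and mask select the same positions; `take` can also reorder or repeat elements, which a Boolean mask cannot.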
The `vortex.expr` module provides expressions for filtering and projecting. Use `vx.col` or
`vortex.expr.col` to build the stable predicate DSL for pushdown:
>>> import vortex.expr as ve
>>> arr = vx.array([
... {'name': 'Alice', 'age': 30},
... {'name': 'Bob', 'age': 25},
... {'name': 'Carol', 'age': 35},
... ])
>>> expr = vx.col('age') > 28
>>> arr.apply(expr).to_arrow_array().to_pylist()
[True, False, True]
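
The key idea in the example above is that `vx.col('age') > 28` does not compare anything immediately; it builds an expression object that is evaluated later against data. The sketch below models that deferred style in plain Python. The `Col` and `Predicate` classes are hypothetical stand-ins written for this illustration and are not part of the Vortex API.

```python
import operator


class Predicate:
    """Hypothetical deferred comparison: column name, operator, literal."""

    def __init__(self, column, op, literal):
        self.column = column
        self.op = op
        self.literal = literal

    def evaluate(self, rows):
        # Apply the stored comparison to the named field of every row.
        return [self.op(row[self.column], self.literal) for row in rows]


class Col:
    """Hypothetical column reference; comparing it builds a Predicate."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, literal):
        # Return a description of the comparison instead of a bool.
        return Predicate(self.name, operator.gt, literal)


rows = [
    {'name': 'Alice', 'age': 30},
    {'name': 'Bob', 'age': 25},
    {'name': 'Carol', 'age': 35},
]

expr = Col('age') > 28        # builds a Predicate; no rows are touched yet
result = expr.evaluate(rows)  # the comparison runs only here
```

Because the expression is data rather than an already-computed result, a reader can inspect it before running it, which is what makes it possible to hand a predicate to a file scan instead of filtering after the fact.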
When a filter is used to read a file, PyVortex plans it against the file schema. Planning inserts the casts required by the Vortex expression engine, simplifies the expression, and validates that filters return Boolean values. You can run the same step directly:
>>> planned = ve.plan(vx.col('age') > 28, schema=arr.dtype.to_arrow_schema(), kind="filter")
>>> isinstance(planned, ve.Expr)
True
{func}`~vortex.open` lazily opens a Vortex file for reading:
>>> import pyarrow.parquet as pq
>>> vx.io.write(pq.read_table("_static/example.parquet"), 'example.vortex')
>>>
>>> f = vx.open('example.vortex')
>>> len(f)
1000
Use {meth}`.VortexFile.to_table` or {meth}`.VortexFile.to_arrow` to read Arrow data with optional
column projection, filtering, and limit:
>>> table = f.to_table(columns=['tip_amount'], limit=3)
>>> table.to_pydict()
{'tip_amount': [0.0, 5.1, 16.54]}
>>> filtered = f.to_table(columns=['tip_amount'], filter=vx.col('tip_amount') > 10)
>>> filtered.num_rows > 0
True
{class}`.ArrayIterator` streams batches of arrays from a scan or other source. It supports
iteration, collecting into a single array, and conversion to Arrow.

{meth}`.ArrayIterator.read_all` collects all batches into a single in-memory {class}`.Array`:
>>> arr = f.scan(['tip_amount'], limit=5).read_all()
>>> len(arr)
5
{meth}`.ArrayIterator.to_arrow` converts to a {class}`pyarrow.RecordBatchReader` for use with
Arrow-based tools:
>>> reader = f.scan(['tip_amount']).to_arrow()
>>> reader.schema
tip_amount: double
>>> table = reader.read_all()
>>> len(table)
1000
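
The difference between iterating over batches and collecting them with `read_all` can be sketched in plain Python. This is a rough model only: `scan_batches`, `read_all`, and the `batch_size` parameter are hypothetical names for this illustration, and the batch sizes Vortex actually produces are not specified here.

```python
from itertools import islice


def scan_batches(rows, batch_size):
    """Hypothetical generator that lazily yields batches of rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def read_all(batches):
    """Concatenate every batch into one in-memory list."""
    out = []
    for batch in batches:
        out.extend(batch)
    return out


rows = list(range(10))

# Streaming: only one batch is held in memory at a time.
batch_sizes = [len(batch) for batch in scan_batches(rows, 4)]

# Collecting: the whole result is materialized at once.
collected = read_all(scan_batches(rows, 4))
```

Streaming suits results that are large or consumed incrementally, while collecting is convenient when the full result fits comfortably in memory.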
Arrays convert to other formats:
| Method | Result |
|---|---|
| {meth}`.Array.to_arrow_array` | {class}`pyarrow.Array` |
| {meth}`.Array.to_arrow_table` | {class}`pyarrow.Table` |
| {meth}`.Array.to_numpy` | {class}`numpy.ndarray` |
| {meth}`.Array.to_pandas` | {class}`pandas.DataFrame` |
| {meth}`.Array.to_pylist` | {class}`list` |