38 changes: 37 additions & 1 deletion docs/features/profile_values.md
@@ -1,7 +1,43 @@
# Accessing profile files

## Json output structure
ydata-profiling allows you to access and export the computed profile data
programmatically, beyond just the HTML report.

## JSON output structure

You can export the full profile as a JSON file:
```python
import pandas as pd
from ydata_profiling import ProfileReport

df = pd.read_csv("your_data.csv")
profile = ProfileReport(df, title="My Report")
profile.to_file("report.json")
```

The JSON output contains the computed results, including per-variable statistics
(type, missing values, descriptive statistics) and correlations.
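
As a rough sketch of how the exported file might be consumed, the snippet below loads a report with the standard `json` module and lists the variables it describes. The helper name `list_profiled_variables` is hypothetical, and the top-level `variables` key is an assumption about the JSON layout; check it against a report produced by your installed version.
```python
import json
from pathlib import Path


def list_profiled_variables(report_path):
    """Return the sorted variable names found in an exported JSON report.

    Assumption: per-variable statistics live under a top-level
    "variables" key. If the key is absent, an empty list is returned.
    """
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    return sorted(report.get("variables", {}))
```

Calling `list_profiled_variables("report.json")` on the file written above would then give the column names that were profiled, provided the assumed key is present.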

## Univariate variables statistics through description_set

You can access per-variable statistics directly in Python through the description set returned by `get_description()`:
```python
description = profile.get_description()
# Access stats for a specific variable
print(description.variables["your_column_name"])
```

This returns a dictionary of the metrics computed for that variable, such as
its type, missing count, distinct count, mean, standard deviation, and quantiles.
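
To compare a few metrics across all columns, one option is a small helper like the one below. It is only a sketch: `summarize_variables` is a hypothetical name, it assumes `description.variables` behaves like a mapping from column name to a dictionary of metrics, and the default metric names (`type`, `n_missing`, `n_distinct`) are assumptions to verify against your own output.
```python
def summarize_variables(variables, keys=("type", "n_missing", "n_distinct")):
    """Pick a few metrics per variable from a mapping of name -> stats.

    Metrics that a variable does not define are reported as None,
    since not every metric applies to every variable type.
    """
    return {
        name: {key: stats.get(key) for key in keys}
        for name, stats in variables.items()
    }
```

With the `description` object from the example above, `summarize_variables(description.variables)` would return one small dictionary per column.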

## Correlation matrices through description_set

Correlation matrices computed during profiling are also accessible:
```python
description = profile.get_description()
# Pearson correlation matrix
print(description.correlations["pearson"])
```

The available keys depend on which correlations are enabled in your configuration;
they can include `auto`, `pearson`, `spearman`, `kendall`, `phi_k`, and `cramers`.
A correlation that was not computed has no entry.
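
Because the set of keys varies with configuration, a lookup can be guarded so that a missing correlation produces a readable error. The helper below is a minimal sketch; `get_correlation` is a hypothetical name, and it assumes `description.correlations` behaves like a dictionary keyed by correlation name.
```python
def get_correlation(correlations, name):
    """Return the matrix stored under `name`, or raise a descriptive KeyError.

    `correlations` is expected to be a mapping from correlation name
    to matrix (an assumption about `description.correlations`).
    """
    if name not in correlations:
        available = ", ".join(sorted(correlations)) or "none"
        raise KeyError(f"{name!r} was not computed; available: {available}")
    return correlations[name]
```

For example, `get_correlation(description.correlations, "pearson")` either returns the Pearson matrix or reports which correlations were actually computed.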
16 changes: 15 additions & 1 deletion docs/getting-started/concepts.md
@@ -62,7 +62,21 @@ This section provides a comprehensive overview of individual variables within a
as it automatically calculates detailed statistics, visualizations, and insights for each variable in the dataset. It offers information such as data type, missing values, unique values, basic descriptive statistics,
histogram plots, and distribution plots. This allows data analysts and scientists to quickly understand the characteristics of each variable, identify potential data quality issues, and gain initial insights into the data's distribution and variability.

For more details about the different metrics and visualizations check the Univariate section details page.

**Univariate analysis** examines each variable individually. For every column in your dataset, ydata-profiling automatically computes:

- **Descriptive statistics** — count, mean, median, standard deviation, min/max
- **Missing values** — count and percentage of null entries
- **Unique values** — number and percentage of distinct values
- **Distribution plots** — histogram and density curve
- **Data type** — inferred type (Numerical, Categorical, Date, etc.)

**Multivariate analysis** examines relationships between variables. ydata-profiling computes:

- **Correlations** — Pearson, Spearman, Kendall, and Cramér's V matrices
- **Interactions** — pairwise scatter plots between numerical variables
- **Missing data patterns** — which variables tend to be missing together
- **Duplicate rows** — detection of identical records across the dataset
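
To make a few of the quantities above concrete, here is a dependency-free sketch that computes a missing count, a distinct count, a mean, a Pearson correlation, and a duplicate-row count for a tiny hand-written table. It only illustrates what the numbers mean; it is not how ydata-profiling computes them, and the sample rows and function names are made up for the example.
```python
from math import sqrt
from statistics import mean

# Hypothetical sample data; None marks a missing value.
rows = [
    {"age": 30, "income": 40.0},
    {"age": 40, "income": 50.0},
    {"age": None, "income": 65.0},
    {"age": 40, "income": 50.0},  # exact duplicate of the second row
]


def univariate_summary(rows, column):
    """Count, missing count, distinct count and mean for one column."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "n_missing": len(values) - len(present),
        "n_distinct": len(set(present)),
        "mean": mean(present),
    }


def pearson(rows, x, y):
    """Pearson correlation over rows where both columns are present."""
    pairs = [(r[x], r[y]) for r in rows if r[x] is not None and r[y] is not None]
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def count_duplicate_rows(rows):
    """Number of rows that repeat an earlier row exactly."""
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return duplicates
```

Here `univariate_summary(rows, "age")` covers the univariate side, while `pearson(rows, "age", "income")` and `count_duplicate_rows(rows)` correspond to the multivariate items in the list.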

## Multivariate profiling

4 changes: 2 additions & 2 deletions src/ydata_profiling/profile_report.py
@@ -8,7 +8,7 @@

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    import pkg_resources
    from importlib.metadata import version

if not is_pyspark_installed():
    from typing import TypeVar
Expand Down Expand Up @@ -359,7 +359,7 @@ def to_file(self, output_file: Union[str, Path], silent: bool = True) -> None:
        """
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            pillow_version = pkg_resources.get_distribution("Pillow").version
            pillow_version = version("Pillow")
        version_tuple = tuple(map(int, pillow_version.split(".")))
        if version_tuple < (9, 5, 0):
            warnings.warn(
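
One thing worth noting about the version check in this hunk: `tuple(map(int, pillow_version.split(".")))` raises `ValueError` if any dot-separated component is not purely numeric (for example a pre-release suffix such as `rc1`). A more tolerant variant could look like the sketch below. It is not part of this change; the helper names `numeric_version_tuple` and `installed_version_tuple` are hypothetical, and only the standard-library `importlib.metadata` calls are real.
```python
import re
from importlib.metadata import PackageNotFoundError, version


def numeric_version_tuple(version_string):
    """Convert the leading numeric release segment into a tuple of ints.

    Anything after the leading dotted digits (e.g. "rc1", ".dev0")
    is ignored rather than parsed.
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"no numeric release segment in {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def installed_version_tuple(distribution):
    """Return the installed version as a tuple, or None if not installed."""
    try:
        return numeric_version_tuple(version(distribution))
    except PackageNotFoundError:
        return None
```

With these helpers the comparison would read `installed_version_tuple("Pillow") < (9, 5, 0)`, after first checking that the result is not `None`.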