validatex.profiler package

Submodules

validatex.profiler.profiler module

Data Profiler — analyse a dataset and auto-suggest expectations.

The profiler computes summary statistics for every column and proposes a reasonable set of expectations that can serve as a starting point for a quality suite.

class validatex.profiler.profiler.ColumnProfile(name: str, dtype: str = '', total_count: int = 0, null_count: int = 0, null_percent: float = 0.0, unique_count: int = 0, unique_percent: float = 0.0, min_value: Any = None, max_value: Any = None, mean_value: float | None = None, std_value: float | None = None, median_value: float | None = None, min_length: int | None = None, max_length: int | None = None, top_values: List[Dict[str, Any]] = <factory>, sample_values: List[Any] = <factory>)[source]

Bases: object

Statistical profile of a single column.

dtype: str = ''
max_length: int | None = None
max_value: Any = None
mean_value: float | None = None
median_value: float | None = None
min_length: int | None = None
min_value: Any = None
name: str
null_count: int = 0
null_percent: float = 0.0
sample_values: List[Any]
std_value: float | None = None
to_dict() → Dict[str, Any][source]
top_values: List[Dict[str, Any]]
total_count: int = 0
unique_count: int = 0
unique_percent: float = 0.0
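
As a rough illustration of what the ColumnProfile fields describe, the sketch below computes the same kind of statistics for a single column held in a plain Python list. It is not the library's implementation: profile_values is a hypothetical helper, None stands in for a null, percentages are assumed to be on a 0-100 scale relative to total_count, std_value is assumed to be the sample standard deviation, and the "value"/"count" keys inside top_values are illustrative.

```python
import statistics
from collections import Counter
from typing import Any, Dict, List


def profile_values(name: str, values: List[Any], top_n: int = 3) -> Dict[str, Any]:
    """Compute ColumnProfile-style statistics for one column (hypothetical helper)."""
    total = len(values)
    present = [v for v in values if v is not None]  # None stands in for null
    null_count = total - len(present)
    unique_count = len(set(present))

    profile: Dict[str, Any] = {
        "name": name,
        "total_count": total,
        "null_count": null_count,
        # Assumption: percentages use a 0-100 scale over total_count.
        "null_percent": 100.0 * null_count / total if total else 0.0,
        "unique_count": unique_count,
        "unique_percent": 100.0 * unique_count / total if total else 0.0,
        # Assumption: each entry pairs a value with its frequency.
        "top_values": [
            {"value": v, "count": c}
            for v, c in Counter(present).most_common(top_n)
        ],
    }

    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if present and len(numeric) == len(present):
        # Numeric column: range and central-tendency statistics.
        profile["min_value"] = min(numeric)
        profile["max_value"] = max(numeric)
        profile["mean_value"] = statistics.fmean(numeric)
        profile["median_value"] = statistics.median(numeric)
        # Sample standard deviation needs at least two observations.
        profile["std_value"] = (
            statistics.stdev(numeric) if len(numeric) > 1 else None
        )
    elif present and all(isinstance(v, str) for v in present):
        # String column: only length statistics apply.
        lengths = [len(v) for v in present]
        profile["min_length"] = min(lengths)
        profile["max_length"] = max(lengths)
    return profile


ages = profile_values("age", [30, 40, None, 50])
names = profile_values("first_name", ["ann", "bob", "ann"])
```

Here ages carries numeric statistics and no length fields, while names carries length fields and no mean or median, which mirrors why those attributes default to None on ColumnProfile.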
class validatex.profiler.profiler.DataProfile(row_count: int = 0, column_count: int = 0, columns: List[ColumnProfile] = <factory>)[source]

Bases: object

Full profile of a DataFrame.

column_count: int = 0
columns: List[ColumnProfile]
row_count: int = 0
summary() → str[source]

Return a human-readable summary.

to_dict() → Dict[str, Any][source]
class validatex.profiler.profiler.DataProfiler[source]

Bases: object

Analyse a Pandas DataFrame and produce a DataProfile.

Usage

>>> profiler = DataProfiler()
>>> profile = profiler.profile(df)
>>> print(profile.summary())
>>> suite = profiler.suggest_expectations(df, suite_name="auto_suite")
profile(df: DataFrame) → DataProfile[source]

Profile every column in df.

Return type:

DataProfile

suggest_expectations(df: DataFrame, suite_name: str = 'auto_generated_suite') → ExpectationSuite[source]

Auto-generate an ExpectationSuite based on the data profile.

Heuristics

  • If a column has zero nulls → expect_column_to_not_be_null

  • If a column is fully unique → expect_column_values_to_be_unique

  • For numeric columns → expect_column_values_to_be_between with observed min/max

  • For string columns with few distinct values → expect_column_values_to_be_in_set

  • For string columns → expect_column_value_lengths_to_be_between
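
The heuristics above can be sketched as a plain function over a per-column profile dictionary. This is only an approximation of the documented rules, not the code behind suggest_expectations: suggest_for_column, the max_set_size cutoff for "few distinct values", the distinct_values argument, and every keyword name in the returned dictionaries are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# (expectation name, keyword arguments); the keyword names are hypothetical.
Suggestion = Tuple[str, Dict[str, Any]]


def suggest_for_column(
    profile: Dict[str, Any],
    distinct_values: Optional[Iterable[str]] = None,
    max_set_size: int = 10,  # assumed cutoff for "few distinct values"
) -> List[Suggestion]:
    """Apply the documented heuristics to one column profile (sketch)."""
    column = profile["name"]
    suggestions: List[Suggestion] = []
    if profile.get("total_count", 0) == 0:
        return suggestions  # nothing can be inferred from an empty column

    # Zero nulls: the column is expected to stay non-null.
    if profile["null_count"] == 0:
        suggestions.append(("expect_column_to_not_be_null", {"column": column}))

    # Fully unique: every row holds a distinct value.
    if profile["unique_count"] == profile["total_count"]:
        suggestions.append(
            ("expect_column_values_to_be_unique", {"column": column})
        )

    # Numeric column (detected here by a populated mean): observed range.
    if profile.get("mean_value") is not None:
        suggestions.append((
            "expect_column_values_to_be_between",
            {
                "column": column,
                "min_value": profile["min_value"],
                "max_value": profile["max_value"],
            },
        ))

    # String column (detected here by populated length statistics).
    if profile.get("min_length") is not None:
        if distinct_values is not None and profile["unique_count"] <= max_set_size:
            suggestions.append((
                "expect_column_values_to_be_in_set",
                {"column": column, "value_set": sorted(distinct_values)},
            ))
        suggestions.append((
            "expect_column_value_lengths_to_be_between",
            {
                "column": column,
                "min_value": profile["min_length"],
                "max_value": profile["max_length"],
            },
        ))
    return suggestions


id_profile = {
    "name": "id", "total_count": 3, "null_count": 0, "unique_count": 3,
    "min_value": 1, "max_value": 3, "mean_value": 2.0,
}
status_profile = {
    "name": "status", "total_count": 4, "null_count": 1, "unique_count": 2,
    "min_length": 2, "max_length": 6,
}
id_suggestions = suggest_for_column(id_profile)
status_suggestions = suggest_for_column(status_profile, {"ok", "failed"})
```

Because the bounds and value sets come from the observed data, a suite generated this way describes the sample that was profiled; it is a starting point to review and loosen or tighten, which matches the profiler's stated purpose.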

Module contents

Profiler module — automatic data profiling and expectation suggestion.