validatex.profiler package
Submodules
validatex.profiler.profiler module
Data Profiler — analyse a dataset and auto-suggest expectations.
The profiler computes summary statistics for every column and proposes a reasonable set of expectations that can serve as a starting point for a quality suite.
- class validatex.profiler.profiler.ColumnProfile(name: str, dtype: str = '', total_count: int = 0, null_count: int = 0, null_percent: float = 0.0, unique_count: int = 0, unique_percent: float = 0.0, min_value: Any = None, max_value: Any = None, mean_value: float | None = None, std_value: float | None = None, median_value: float | None = None, min_length: int | None = None, max_length: int | None = None, top_values: List[Dict[str, Any]] = <factory>, sample_values: List[Any] = <factory>)
Bases: object
Statistical profile of a single column.
- dtype: str = ''
- max_length: int | None = None
- max_value: Any = None
- mean_value: float | None = None
- median_value: float | None = None
- min_length: int | None = None
- min_value: Any = None
- name: str
- null_count: int = 0
- null_percent: float = 0.0
- sample_values: List[Any]
- std_value: float | None = None
- top_values: List[Dict[str, Any]]
- total_count: int = 0
- unique_count: int = 0
- unique_percent: float = 0.0
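The fields above can be illustrated with a small standard-library sketch. This is not the validatex implementation: `sketch_column_stats` is a hypothetical helper, and it assumes that percentages are on a 0-100 scale relative to `total_count`, that `std_value` is the sample standard deviation, and that each `top_values` entry is a dict with `value` and `count` keys. None of these details are confirmed by the reference above.

```python
import statistics
from collections import Counter
from typing import Any, Dict, List


def sketch_column_stats(name: str, values: List[Any]) -> Dict[str, Any]:
    """Hypothetical helper (not part of validatex): derive
    ColumnProfile-style fields from a plain list of values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_count = total - len(non_null)
    unique_count = len(set(non_null))

    stats: Dict[str, Any] = {
        "name": name,
        "total_count": total,
        "null_count": null_count,
        # Assumption: percentages are 0-100 and relative to total_count.
        "null_percent": 100.0 * null_count / total if total else 0.0,
        "unique_count": unique_count,
        "unique_percent": 100.0 * unique_count / total if total else 0.0,
        # Assumption: each entry pairs a value with its frequency.
        "top_values": [
            {"value": v, "count": c}
            for v, c in Counter(non_null).most_common(3)
        ],
    }

    # Numeric-only fields (bool is excluded because it subclasses int).
    if non_null and all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_null
    ):
        stats["min_value"] = min(non_null)
        stats["max_value"] = max(non_null)
        stats["mean_value"] = statistics.mean(non_null)
        stats["median_value"] = statistics.median(non_null)
        # The sample standard deviation needs at least two data points.
        stats["std_value"] = (
            statistics.stdev(non_null) if len(non_null) > 1 else None
        )

    # String-only fields.
    if non_null and all(isinstance(v, str) for v in non_null):
        lengths = [len(v) for v in non_null]
        stats["min_length"] = min(lengths)
        stats["max_length"] = max(lengths)

    return stats
```

In this sketch the numeric and length fields are simply absent when they do not apply, whereas ColumnProfile defaults them to None.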
- class validatex.profiler.profiler.DataProfile(row_count: int = 0, column_count: int = 0, columns: List[ColumnProfile] = <factory>)
Bases: object
Full profile of a DataFrame.
- column_count: int = 0
- columns: List[ColumnProfile]
- row_count: int = 0
- class validatex.profiler.profiler.DataProfiler
Bases: object
Analyse a Pandas DataFrame and produce a DataProfile.
Usage
>>> profiler = DataProfiler()
>>> profile = profiler.profile(df)
>>> print(profile.summary())
>>> suite = profiler.suggest_expectations(df, suite_name="auto_suite")
- profile(df: DataFrame) → DataProfile
Profile every column in df.
- Return type: DataProfile
- suggest_expectations(df: DataFrame, suite_name: str = 'auto_generated_suite') → ExpectationSuite
Auto-generate an ExpectationSuite based on the data profile.
Heuristics
- If a column has zero nulls → expect_column_to_not_be_null
- If a column is fully unique → expect_column_values_to_be_unique
- For numeric columns → expect_column_values_to_be_between with observed min/max.
- For string columns with few distinct values → expect_column_values_to_be_in_set
- For string columns → expect_column_value_lengths_to_be_between
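The heuristics above can be sketched as a plain function over a column profile. This is an illustration under stated assumptions, not the library's code: `ProfileStub`, `sketch_suggestions` and the `max_distinct` threshold are hypothetical, and the reference does not say how "few distinct values" is defined or how numeric and string columns are detected. The sketch guesses at pandas-style dtype names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProfileStub:
    """Stand-in with only the ColumnProfile fields the heuristics read."""
    name: str
    dtype: str = ""
    total_count: int = 0
    null_count: int = 0
    unique_count: int = 0


def sketch_suggestions(col: ProfileStub, max_distinct: int = 10) -> List[str]:
    """Return the expectation names the documented heuristics would pick.

    max_distinct is a hypothetical cut-off for "few distinct values".
    """
    names: List[str] = []
    if col.total_count == 0:
        return names  # nothing observed, so nothing to suggest

    if col.null_count == 0:
        names.append("expect_column_to_not_be_null")
    if col.unique_count == col.total_count:
        names.append("expect_column_values_to_be_unique")

    # Assumption: dtype holds pandas-style names such as "int64" or "object".
    if col.dtype.startswith(("int", "float")):
        names.append("expect_column_values_to_be_between")
    elif col.dtype in ("object", "string", "str"):
        if col.unique_count <= max_distinct:
            names.append("expect_column_values_to_be_in_set")
        names.append("expect_column_value_lengths_to_be_between")

    return names
```

A fully populated, unique integer column would pick up the not-null, unique and between expectations, while a low-cardinality string column containing nulls would pick up only the in-set and length expectations.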
Module contents
Profiler module — automatic data profiling and expectation suggestion.