validatex package
Subpackages
- validatex.cli package
- validatex.config package
- Submodules
- validatex.config.loader module
CheckpointConfig: name, suite_path, data_source, engine, report, load_data(), load_suite()
load_checkpoint()
- Module contents
- validatex.core package
- Submodules
- validatex.core.expectation module
- validatex.core.result module
ColumnHealthSummary: checks, column, errors, failed, health_score, null_count, null_percent, passed, to_dict(), total_rows, unique_count, unique_percent
ExpectationResult: column, details, element_count, exception_info, expectation_type, human_observed, meta, observed_value, severity, severity_icon, status, status_icon, success, to_dict(), unexpected_count, unexpected_percent, unexpected_values
ValidationResult: column_health(), compute_quality_score(), compute_statistics(), data_source, engine, errored_expectations, failed_expectations, results, run_duration_seconds, run_time, statistics, success, success_percent, successful_expectations, suite_name, summary(), to_dict(), to_html(), to_json(), to_json_file(), total_expectations
get_severity()
to_native()
- validatex.core.suite module
ExpectationSuite: add(), add_expectation(), clear(), expectations, from_dict(), load(), meta, name, remove(), save(), to_dict(), to_json(), to_yaml()
- validatex.core.validator module
- Module contents
Expectation
ExpectationResult: column, details, element_count, exception_info, expectation_type, human_observed, meta, observed_value, severity, severity_icon, status, status_icon, success, to_dict(), unexpected_count, unexpected_percent, unexpected_values
ExpectationSuite: add(), add_expectation(), clear(), expectations, from_dict(), load(), meta, name, remove(), save(), to_dict(), to_json(), to_yaml()
ValidationResult: column_health(), compute_quality_score(), compute_statistics(), data_source, engine, errored_expectations, failed_expectations, results, run_duration_seconds, run_time, statistics, success, success_percent, successful_expectations, suite_name, summary(), to_dict(), to_html(), to_json(), to_json_file(), total_expectations
Validator
- validatex.datasources package
- Submodules
- validatex.datasources.base_source module
- validatex.datasources.csv_source module
- validatex.datasources.database_source module
- validatex.datasources.dataframe_source module
- validatex.datasources.parquet_source module
- Module contents
- validatex.expectations package
- Submodules
- validatex.expectations.aggregate_expectations module
- validatex.expectations.column_expectations module
ExpectColumnDistinctValuesToBeInSet
ExpectColumnMaxToBeBetween
ExpectColumnMeanToBeBetween
ExpectColumnMinToBeBetween
ExpectColumnProportionOfUniqueValuesToBeBetween
ExpectColumnStdevToBeBetween
ExpectColumnToExist
ExpectColumnToNotBeNull
ExpectColumnValueLengthsToBeBetween
ExpectColumnValuesToBeBetween
ExpectColumnValuesToBeDateutilParseable
ExpectColumnValuesToBeInSet
ExpectColumnValuesToBeOfType
ExpectColumnValuesToBeUnique
ExpectColumnValuesToMatchRegex
ExpectColumnValuesToNotBeInSet
- validatex.expectations.table_expectations module
- Module contents
- validatex.profiler package
- Submodules
- validatex.profiler.profiler module
ColumnProfile: dtype, max_length, max_value, mean_value, median_value, min_length, min_value, name, null_count, null_percent, sample_values, std_value, to_dict(), top_values, total_count, unique_count, unique_percent
DataProfile
DataProfiler
- Module contents
- validatex.reporting package
Module contents
ValidateX - A powerful data quality validation framework.
ValidateX provides a comprehensive suite of tools for validating, profiling, and monitoring data quality across Pandas and PySpark DataFrames.
- Usage:
>>> import validatex as vx
>>> suite = vx.ExpectationSuite("my_suite")
>>> suite.add("expect_column_to_not_be_null", column="user_id")
>>> suite.add("expect_column_values_to_be_between", column="age", min_value=0, max_value=150)
>>> result = vx.validate(df, suite)
>>> result.to_html("report.html")
- class validatex.ColumnHealthSummary(column: str, checks: int = 0, passed: int = 0, failed: int = 0, errors: int = 0, null_count: int | None = None, null_percent: float | None = None, unique_count: int | None = None, unique_percent: float | None = None, total_rows: int | None = None)[source]
Bases: object
Aggregated health metrics for a single column.
- checks: int = 0
- column: str
- errors: int = 0
- failed: int = 0
- property health_score: float
- null_count: int | None = None
- null_percent: float | None = None
- passed: int = 0
- total_rows: int | None = None
- unique_count: int | None = None
- unique_percent: float | None = None
- class validatex.DataProfiler[source]
Bases: object
Analyse a Pandas DataFrame and produce a DataProfile.
Usage
>>> profiler = DataProfiler()
>>> profile = profiler.profile(df)
>>> print(profile.summary())
>>> suite = profiler.suggest_expectations(df, suite_name="auto_suite")
- profile(df: DataFrame) DataProfile[source]
Profile every column in df.
- Return type:
DataProfile
- suggest_expectations(df: DataFrame, suite_name: str = 'auto_generated_suite') ExpectationSuite[source]
Auto-generate an ExpectationSuite based on the data profile.
Heuristics
- If a column has zero nulls → expect_column_to_not_be_null
- If a column is fully unique → expect_column_values_to_be_unique
- For numeric columns → expect_column_values_to_be_between with observed min/max.
- For string columns with few distinct values → expect_column_values_to_be_in_set
- For string columns → expect_column_value_lengths_to_be_between
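The heuristics above can be approximated with plain pandas. The following is a rough sketch, not the library's implementation: `suggest_rules`, the `MAX_SET_SIZE` cutoff for "few distinct values", and the keyword names `value_set`, `min_value`, and `max_value` on the in-set and length rules are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

import pandas as pd

# Hypothetical cutoff: the documentation does not define "few distinct values".
MAX_SET_SIZE = 10

Rule = Tuple[str, str, Dict[str, Any]]  # (expectation_type, column, kwargs)


def suggest_rules(df: pd.DataFrame) -> List[Rule]:
    """Derive candidate expectations for each column from simple column statistics."""
    rules: List[Rule] = []
    for col in df.columns:
        s = df[col]
        non_null = s.dropna()
        if len(s) == 0 or len(non_null) == 0:
            continue  # nothing observable to base a rule on
        if s.isna().sum() == 0:
            rules.append(("expect_column_to_not_be_null", col, {}))
        # "Fully unique" is read here as: every row holds a distinct non-null value.
        if s.nunique(dropna=True) == len(s):
            rules.append(("expect_column_values_to_be_unique", col, {}))
        is_number = pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s)
        if is_number:
            bounds = {"min_value": non_null.min(), "max_value": non_null.max()}
            rules.append(("expect_column_values_to_be_between", col, bounds))
        elif pd.api.types.is_string_dtype(s):
            distinct = sorted(non_null.unique().tolist())
            if len(distinct) <= MAX_SET_SIZE:
                # "value_set" is an assumed keyword name.
                rules.append(("expect_column_values_to_be_in_set", col, {"value_set": distinct}))
            lengths = non_null.astype(str).str.len()
            length_bounds = {"min_value": int(lengths.min()), "max_value": int(lengths.max())}
            rules.append(("expect_column_value_lengths_to_be_between", col, length_bounds))
    return rules


df = pd.DataFrame({"id": [1, 2, 3], "city": ["a", "b", "a"], "score": [1.0, None, 3.0]})
rules = suggest_rules(df)
```

Each tuple could then be passed to `ExpectationSuite.add(expectation_type, column=..., **kwargs)`, provided the keyword names match what the registered expectations accept.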
- class validatex.DriftDetector(psi_threshold: float = 0.2, bins: int = 10)[source]
Bases: object
Detects data drift between a baseline and a current Pandas DataFrame. Calculates Population Stability Index (PSI) to detect statistical shifts in distributions.
- compare(df_base: DataFrame, df_current: DataFrame) DriftReport[source]
Run schema and statistical drift comparison between two DataFrames.
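For readers unfamiliar with PSI, here is a minimal NumPy sketch of the standard formula, PSI = Σ (q − p) · ln(q / p), where p and q are the per-bin proportions of the baseline and current samples. How `DriftDetector` bins values internally is not documented here, so the equal-width bins taken from the baseline, the `eps` floor, and the function name `psi` are assumptions; only the defaults `bins=10` and the 0.2 threshold come from the signature above.

```python
import numpy as np


def psi(base, current, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two 1-D numeric samples."""
    base = np.asarray(base, dtype=float)
    current = np.asarray(current, dtype=float)
    # Equal-width edges derived from the baseline (assumed binning strategy).
    edges = np.histogram_bin_edges(base, bins=bins)
    # Digitizing against interior edges only means values outside the baseline
    # range fall into the first or last bin instead of being dropped.
    interior = edges[1:-1]
    base_counts = np.bincount(np.digitize(base, interior), minlength=bins)
    cur_counts = np.bincount(np.digitize(current, interior), minlength=bins)
    # Floor the proportions so empty bins do not produce log(0) or division by zero.
    p = np.clip(base_counts / base_counts.sum(), eps, None)
    q = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((q - p) * np.log(q / p)))


rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)
shifted = rng.normal(1.5, 1.0, 5000)
drifted = psi(baseline, shifted) > 0.2  # compare against the default psi_threshold
```

A score of zero means the two binned distributions are identical; larger values indicate a larger shift.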
- class validatex.DriftReport(schema_added_columns: List[str], schema_removed_columns: List[str], schema_type_changes: Dict[str, Dict[str, str]], column_drifts: Dict[str, ColumnDriftResult])[source]
Bases: object
Represents a full data drift comparison report.
- column_drifts: Dict[str, ColumnDriftResult]
- schema_added_columns: List[str]
- schema_removed_columns: List[str]
- schema_type_changes: Dict[str, Dict[str, str]]
- class validatex.Expectation(column: str | None = None, kwargs: Dict[str, Any] = <factory>, meta: Dict[str, Any] = <factory>)[source]
Bases: ABC
Abstract base class for all expectations.
- Subclasses must:
Set the class attribute expectation_type (a unique string id).
Implement _validate_pandas() and/or _validate_spark().
- column: str | None = None
- expectation_type: str = 'base_expectation'
- classmethod from_dict(d: Dict[str, Any]) Expectation[source]
Deserialize from a dictionary.
- kwargs: Dict[str, Any]
- meta: Dict[str, Any]
- validate(data: Any, engine: str = 'pandas') ExpectationResult[source]
Run this expectation against data using the specified engine.
- Parameters:
data (Any) – The dataset (pd.DataFrame or pyspark.sql.DataFrame).
engine (str) – "pandas" or "spark".
- Return type:
ExpectationResult
- class validatex.ExpectationResult(expectation_type: str, success: bool, column: str | None = None, observed_value: Any = None, element_count: int = 0, unexpected_count: int = 0, unexpected_percent: float = 0.0, unexpected_values: List[Any] = <factory>, details: Dict[str, Any] = <factory>, exception_info: str | None = None, meta: Dict[str, Any] = <factory>)[source]
Bases: object
Result of a single expectation evaluation.
- column: str | None = None
- details: Dict[str, Any]
- element_count: int = 0
- exception_info: str | None = None
- expectation_type: str
- property human_observed: str
Return a human-readable string for the observed value.
Converts raw dicts / technical strings into executive-friendly text.
- meta: Dict[str, Any]
- observed_value: Any = None
- property severity: str
Return severity level for this expectation.
- property severity_icon: str
- property status: str
- property status_icon: str
- success: bool
- unexpected_count: int = 0
- unexpected_percent: float = 0.0
- unexpected_values: List[Any]
- class validatex.ExpectationSuite(name: str, expectations: List[Expectation] = <factory>, meta: Dict[str, Any] = <factory>)[source]
Bases: object
A named collection of expectations.
Examples
>>> suite = ExpectationSuite("user_data_quality")
>>> suite.add("expect_column_to_not_be_null", column="user_id")
>>> suite.add("expect_column_values_to_be_between",
...           column="age", min_value=0, max_value=150)
- add(expectation_type: str, column: str | None = None, meta: Dict[str, Any] | None = None, **kwargs: Any) ExpectationSuite[source]
Add an expectation to this suite.
- Parameters:
expectation_type (str) – The registered name of the expectation (e.g. "expect_column_to_not_be_null").
column (str, optional) – Target column name.
meta (dict, optional) – Arbitrary metadata to attach.
**kwargs – Additional arguments forwarded to the expectation (e.g. min_value, regex).
- Returns:
self, for fluent chaining.
- Return type:
ExpectationSuite
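Because add() returns the suite itself, calls can be chained. The sketch below illustrates the pattern with a hypothetical stand-in class, `MiniSuite`, which is not part of validatex and stores plain tuples instead of Expectation objects; it only mirrors the documented add() signature.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class MiniSuite:
    """Hypothetical stand-in for ExpectationSuite, for illustration only."""

    name: str
    expectations: List[Tuple[str, Optional[str], Dict[str, Any]]] = field(default_factory=list)

    def add(self, expectation_type: str, column: Optional[str] = None, **kwargs: Any) -> "MiniSuite":
        # Record the expectation, then return self so the next call can be chained.
        self.expectations.append((expectation_type, column, kwargs))
        return self


suite = (
    MiniSuite("user_data_quality")
    .add("expect_column_to_not_be_null", column="user_id")
    .add("expect_column_values_to_be_between", column="age", min_value=0, max_value=150)
)
```

The same chained form should work with the real ExpectationSuite, since its add(), add_expectation(), clear(), and remove() methods are all documented as returning ExpectationSuite.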
- add_expectation(expectation: Expectation) ExpectationSuite[source]
Add a pre-built Expectation instance.
- clear() ExpectationSuite[source]
Remove all expectations.
- expectations: List[Expectation]
- classmethod from_dict(data: Dict[str, Any]) ExpectationSuite[source]
Create a suite from a plain dictionary.
- classmethod load(filepath: str) ExpectationSuite[source]
Load from a YAML or JSON file.
- meta: Dict[str, Any]
- name: str
- remove(index: int) ExpectationSuite[source]
Remove an expectation by index.
- class validatex.ValidationResult(suite_name: str, results: List[ExpectationResult] = <factory>, run_time: datetime | None = None, run_duration_seconds: float = 0.0, data_source: str | None = None, engine: str = 'pandas', statistics: Dict[str, Any] = <factory>)[source]
Bases: object
Aggregate result of running an entire expectation suite.
- column_health() List[ColumnHealthSummary][source]
Aggregate expectation results by column.
Extracts null % and unique % from specific expectation types when present.
- compute_quality_score() float[source]
Compute a weighted data quality score (0–100).
- Severity weights:
Critical: ×3
Warning : ×2
Info : ×1
Score = 100 × (weighted_passed / weighted_total)
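As a worked illustration of the formula above, the following sketch computes the weighted score from (severity, passed) pairs. The function name `quality_score`, the lowercase severity keys, and the value returned for an empty result list are assumptions; only the weights and the formula come from the description.

```python
from typing import Iterable, Tuple

# Severity weights as documented: Critical x3, Warning x2, Info x1.
WEIGHTS = {"critical": 3, "warning": 2, "info": 1}


def quality_score(results: Iterable[Tuple[str, bool]]) -> float:
    """Return 100 * weighted_passed / weighted_total for (severity, passed) pairs."""
    pairs = list(results)
    weighted_total = sum(WEIGHTS[severity] for severity, _ in pairs)
    weighted_passed = sum(WEIGHTS[severity] for severity, passed in pairs if passed)
    if weighted_total == 0:
        return 100.0  # assumed convention: nothing checked means nothing failed
    return 100.0 * weighted_passed / weighted_total


# One failed warning among three checks: (3 + 1) / (3 + 2 + 1) of the weight passed.
score = quality_score([("critical", True), ("warning", False), ("info", True)])
```

Here the score is 100 × 4 / 6, roughly 66.7, whereas an unweighted pass rate over the same three checks would also be 66.7 only by coincidence; failing the critical check instead would drop the score to 50.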
- data_source: str | None = None
- engine: str = 'pandas'
- property errored_expectations: int
- property failed_expectations: int
- results: List[ExpectationResult]
- run_duration_seconds: float = 0.0
- run_time: datetime | None = None
- statistics: Dict[str, Any]
- property success: bool
True only if every expectation passed.
- property success_percent: float
- property successful_expectations: int
- suite_name: str
- property total_expectations: int
- class validatex.Validator(suite: ExpectationSuite, engine: str = 'pandas')[source]
Bases: object
Runs an ExpectationSuite against a dataset.
- Parameters:
suite (ExpectationSuite) – The suite of expectations to evaluate.
engine (str) – "pandas" or "spark".
- run(data: Any, data_source: str | None = None) ValidationResult[source]
Execute every expectation in the suite against data.
- Parameters:
data (pd.DataFrame | pyspark.sql.DataFrame) – The dataset to validate.
data_source (str, optional) – A label describing where the data came from.
- Return type:
ValidationResult
- validatex.validate(data: Any, suite: ExpectationSuite, engine: str = 'pandas', data_source: str | None = None) ValidationResult[source]
Convenience function to validate data against a suite.
- Parameters:
data (pd.DataFrame | pyspark.sql.DataFrame)
suite (ExpectationSuite)
engine (str) – "pandas" or "spark".
data_source (str, optional)
- Return type:
ValidationResult