validatex.core package

Submodules

validatex.core.expectation module

Expectation base classes and registry.

This module defines the base Expectation class and the global registry that maps expectation type names to their implementation classes.

class validatex.core.expectation.Expectation(column: str | None = None, kwargs: Dict[str, Any] = <factory>, meta: Dict[str, Any] = <factory>)

Bases: ABC

Abstract base class for all expectations.

Subclasses must:

Set the class attribute expectation_type (a unique string id).
Implement _validate_pandas() and/or _validate_spark().

column: str | None = None

expectation_type: str = 'base_expectation'

classmethod from_dict(d: Dict[str, Any]) → Expectation: Deserialize from a dictionary.

kwargs: Dict[str, Any]

meta: Dict[str, Any]

to_dict() → Dict[str, Any]: Serialize to a plain dictionary (for YAML / JSON configs).

validate(data: Any, engine: str = 'pandas') → ExpectationResult

Run this expectation against data using the specified engine.

Parameters:

data (Any) – The dataset (pd.DataFrame or pyspark.sql.DataFrame).
engine (str) – "pandas" or "spark".

Return type:

ExpectationResult
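The dispatch from validate() to the engine-specific hooks can be pictured with a minimal sketch. This is not the library's implementation: SketchExpectation and NotNullSketch are hypothetical stand-ins, rows are plain dicts rather than a DataFrame, and the result is a plain dict rather than an ExpectationResult. The error raised for an unknown engine is also an assumption.

```python
from abc import ABC
from typing import Any, Dict, List, Optional


class SketchExpectation(ABC):
    """Hypothetical stand-in for the documented base class."""

    expectation_type: str = "base_expectation"

    def __init__(self, column: Optional[str] = None, **kwargs: Any) -> None:
        self.column = column
        self.kwargs = kwargs

    def validate(self, data: Any, engine: str = "pandas") -> Dict[str, Any]:
        # Route the call to the hook that matches the requested engine.
        handlers = {
            "pandas": self._validate_pandas,
            "spark": self._validate_spark,
        }
        if engine not in handlers:
            # Assumption: the real class may report this differently.
            raise ValueError(f"Unknown engine: {engine!r}")
        return handlers[engine](data)

    # Subclasses override one or both hooks ("and/or" in the docs above).
    def _validate_pandas(self, data: Any) -> Dict[str, Any]:
        raise NotImplementedError("pandas engine not implemented")

    def _validate_spark(self, data: Any) -> Dict[str, Any]:
        raise NotImplementedError("spark engine not implemented")


class NotNullSketch(SketchExpectation):
    expectation_type = "expect_column_to_not_be_null"

    def _validate_pandas(self, data: List[dict]) -> Dict[str, Any]:
        # Simplification: rows are plain dicts here, not a pd.DataFrame.
        unexpected = [row for row in data if row.get(self.column) is None]
        return {
            "expectation_type": self.expectation_type,
            "success": not unexpected,
            "element_count": len(data),
            "unexpected_count": len(unexpected),
        }


result = NotNullSketch(column="user_id").validate(
    [{"user_id": 1}, {"user_id": None}]
)
print(result["success"], result["unexpected_count"])  # False 1
```

Because NotNullSketch overrides only the pandas hook, calling it with engine="spark" falls through to the base hook and raises NotImplementedError.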

validatex.core.expectation.get_expectation_class(name: str) → Type[Expectation]: Look up an expectation class by its registered type name.

validatex.core.expectation.list_expectations() → List[str]: Return a sorted list of all registered expectation type names.

validatex.core.expectation.register_expectation(cls: Type[Expectation]) → Type[Expectation]: Decorator that registers an expectation class under its expectation_type name.
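The three registry functions above follow a common decorator-registry pattern. The sketch below illustrates that pattern only: register, lookup and registered_names are hypothetical local names rather than the library's functions, and the behaviour on an unknown name (a KeyError) is an assumption.

```python
from typing import Dict, List

# Module-level mapping from expectation type name to implementation class.
_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator: store cls under its expectation_type attribute."""
    _REGISTRY[cls.expectation_type] = cls
    return cls  # return the class unchanged so it stays usable as normal


def lookup(name: str) -> type:
    """Return the class registered under name."""
    try:
        return _REGISTRY[name]
    except KeyError:
        # Assumption: the real lookup may raise a different error type.
        raise KeyError(f"No expectation registered as {name!r}") from None


def registered_names() -> List[str]:
    """Return all registered type names in sorted order."""
    return sorted(_REGISTRY)


@register
class UniqueSketch:
    expectation_type = "expect_column_values_to_be_unique_sketch"


@register
class NotNullRegistered:
    expectation_type = "expect_column_to_not_be_null_sketch"


print(registered_names())
print(lookup("expect_column_to_not_be_null_sketch") is NotNullRegistered)  # True
```

Returning the class from the decorator is what lets the same function serve both as a decorator and as a plain call.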

validatex.core.result module

Validation result data models.

Every expectation run produces an ExpectationResult. A full validation run aggregates them into a ValidationResult.

class validatex.core.result.ColumnHealthSummary(column: str, checks: int = 0, passed: int = 0, failed: int = 0, errors: int = 0, null_count: int | None = None, null_percent: float | None = None, unique_count: int | None = None, unique_percent: float | None = None, total_rows: int | None = None)

Bases: object

Aggregated health metrics for a single column.

checks: int = 0

column: str

errors: int = 0

failed: int = 0

property health_score: float

null_count: int | None = None

null_percent: float | None = None

passed: int = 0

to_dict() → Dict[str, Any]

total_rows: int | None = None

unique_count: int | None = None

unique_percent: float | None = None

class validatex.core.result.ExpectationResult(expectation_type: str, success: bool, column: str | None = None, observed_value: Any = None, element_count: int = 0, unexpected_count: int = 0, unexpected_percent: float = 0.0, unexpected_values: List[Any] = <factory>, details: Dict[str, Any] = <factory>, exception_info: str | None = None, meta: Dict[str, Any] = <factory>)

Bases: object

Result of a single expectation evaluation.

column: str | None = None

details: Dict[str, Any]

element_count: int = 0

exception_info: str | None = None

expectation_type: str

property human_observed: str

Return a human-readable string for the observed value.

Converts raw dicts / technical strings into executive-friendly text.

meta: Dict[str, Any]

observed_value: Any = None

property severity: str: Return severity level for this expectation.

property severity_icon: str

property status: str

property status_icon: str

success: bool

to_dict() → Dict[str, Any]

unexpected_count: int = 0

unexpected_percent: float = 0.0

unexpected_values: List[Any]

class validatex.core.result.ValidationResult(suite_name: str, results: List[ExpectationResult] = <factory>, run_time: datetime | None = None, run_duration_seconds: float = 0.0, data_source: str | None = None, engine: str = 'pandas', statistics: Dict[str, Any] = <factory>)

Bases: object

Aggregate result of running an entire expectation suite.

column_health() → List[ColumnHealthSummary]

Aggregate expectation results by column.

Extracts null % and unique % from specific expectation types when present.

compute_quality_score() → float

Compute a weighted data quality score (0–100).

Severity weights:

Critical: ×3
Warning: ×2
Info: ×1

Score = 100 × (weighted_passed / weighted_total)
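The formula can be worked through with a small standalone sketch. It assumes that weighted_total is the sum of the severity weights of all expectations and weighted_passed is the same sum over those that passed. The lowercase severity labels, the quality_score name and the score returned for an empty run are illustrative assumptions, not taken from the library.

```python
from typing import Iterable, Tuple

# Weights from the table above; the lowercase keys are an assumption.
SEVERITY_WEIGHTS = {"critical": 3, "warning": 2, "info": 1}


def quality_score(results: Iterable[Tuple[str, bool]]) -> float:
    """Score = 100 * weighted_passed / weighted_total.

    Each item is a (severity, passed) pair.
    """
    weighted_total = 0
    weighted_passed = 0
    for severity, passed in results:
        weight = SEVERITY_WEIGHTS[severity]
        weighted_total += weight
        if passed:
            weighted_passed += weight
    if weighted_total == 0:
        # Assumption: an empty run scores 100; the library may differ.
        return 100.0
    return 100.0 * weighted_passed / weighted_total


# One critical check passes (3), one warning fails (2), one info passes (1):
# weighted_passed = 4, weighted_total = 6.
score = quality_score([("critical", True), ("warning", False), ("info", True)])
print(round(score, 2))  # 66.67
```

Under this weighting a single failed critical check costs three times as much as a failed info check, so the score falls fastest when high-severity expectations fail.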

compute_statistics() → Dict[str, Any]: Compute summary statistics and store them.

data_source: str | None = None

engine: str = 'pandas'

property errored_expectations: int

property failed_expectations: int

results: List[ExpectationResult]

run_duration_seconds: float = 0.0

run_time: datetime | None = None

statistics: Dict[str, Any]

property success: bool: True only if every expectation passed.

property success_percent: float

property successful_expectations: int

suite_name: str

summary() → str: Return a human-readable summary string.

to_dict() → Dict[str, Any]

to_html(filepath: str) → None: Generate a rich HTML report and write to filepath.

to_json(indent: int = 2) → str: Serialize the full result to a JSON string.

to_json_file(filepath: str) → None: Write the validation result to a JSON file.

property total_expectations: int

validatex.core.result.get_severity(expectation_type: str, meta: Dict | None = None) → str: Return severity for an expectation type (user meta overrides default).

validatex.core.result.to_native(value: Any) → Any

Convert numpy / pandas scalar types to native Python types.

This keeps internal types such as np.int64(20) from leaking into results and reports.
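One way to perform this kind of conversion without importing NumPy is to rely on the item() method that NumPy scalars expose. The sketch below is an assumption-laden illustration, not the library's code: to_native_sketch is a hypothetical name, the recursion into dicts, lists and tuples is an assumption, and FakeInt64 merely imitates a NumPy scalar so the example runs with the standard library alone.

```python
import json
from typing import Any


def to_native_sketch(value: Any) -> Any:
    """Best-effort conversion of scalar wrappers to plain Python values."""
    if isinstance(value, dict):
        return {key: to_native_sketch(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists, which is how JSON would store them.
        return [to_native_sketch(val) for val in value]
    item = getattr(value, "item", None)
    if callable(item):
        try:
            return item()  # NumPy scalars return a native Python value here
        except (TypeError, ValueError):
            return value  # e.g. a multi-element array; leave it untouched
    return value


class FakeInt64:
    """Stand-in for np.int64: holds an int and exposes item()."""

    def __init__(self, raw: int) -> None:
        self._raw = raw

    def item(self) -> int:
        return self._raw


observed = {"count": FakeInt64(20), "bounds": (FakeInt64(0), 1.5)}
print(json.dumps(to_native_sketch(observed)))
# {"count": 20, "bounds": [0, 1.5]}
```

Without the conversion, json.dumps would raise TypeError on the wrapped values, which is the leak the function is meant to prevent.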

validatex.core.suite module

Expectation Suite — a named, ordered collection of expectations.

Suites can be built programmatically or loaded from YAML / JSON configs.

class validatex.core.suite.ExpectationSuite(name: str, expectations: List[Expectation] = <factory>, meta: Dict[str, Any] = <factory>)

Bases: object

A named collection of expectations.

Examples

>>> suite = ExpectationSuite("user_data_quality")
>>> suite.add("expect_column_to_not_be_null", column="user_id")
>>> suite.add("expect_column_values_to_be_between",
...           column="age", min_value=0, max_value=150)

add(expectation_type: str, column: str | None = None, meta: Dict[str, Any] | None = None, **kwargs: Any) → ExpectationSuite

Add an expectation to this suite.

Parameters:

expectation_type (str) – The registered name of the expectation (e.g. "expect_column_to_not_be_null").
column (str, optional) – Target column name.
meta (dict, optional) – Arbitrary metadata to attach.
**kwargs – Additional arguments forwarded to the expectation (e.g. min_value, regex).

Returns:

self for fluent chaining.

Return type:

ExpectationSuite
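Returning self is what makes calls to add() chainable. The following sketch shows the pattern with a hypothetical SuiteSketch class. It stores each expectation as a plain dict, and both that storage layout and the dict keys are assumptions made for illustration rather than the library's internal representation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SuiteSketch:
    """Hypothetical miniature of a named, ordered expectation collection."""

    name: str
    expectations: List[Dict[str, Any]] = field(default_factory=list)

    def add(
        self,
        expectation_type: str,
        column: Optional[str] = None,
        meta: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> "SuiteSketch":
        # Record the expectation in insertion order.
        self.expectations.append(
            {
                "expectation_type": expectation_type,
                "column": column,
                "kwargs": kwargs,
                "meta": meta or {},
            }
        )
        return self  # returning self enables fluent chaining


suite = (
    SuiteSketch("user_data_quality")
    .add("expect_column_to_not_be_null", column="user_id")
    .add("expect_column_values_to_be_between",
         column="age", min_value=0, max_value=150)
)
print(len(suite.expectations))  # 2
```

Extra keyword arguments such as min_value and max_value are collected into kwargs, mirroring the **kwargs forwarding described in the parameter list.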

add_expectation(expectation: Expectation) → ExpectationSuite: Add a pre-built Expectation instance.

clear() → ExpectationSuite: Remove all expectations.

expectations: List[Expectation]

classmethod from_dict(data: Dict[str, Any]) → ExpectationSuite: Create a suite from a plain dictionary.

classmethod load(filepath: str) → ExpectationSuite: Load from a YAML or JSON file.

meta: Dict[str, Any]

name: str

remove(index: int) → ExpectationSuite: Remove an expectation by index.

save(filepath: str) → None: Save to YAML or JSON based on file extension.

to_dict() → Dict[str, Any]

to_json(indent: int = 2) → str

to_yaml() → str

validatex.core.validator module

Validator — orchestrates expectation suite execution against a dataset.

The validate() convenience function is the primary public entry point.

class validatex.core.validator.Validator(suite: ExpectationSuite, engine: str = 'pandas')

Bases: object

Runs an ExpectationSuite against a dataset.

Parameters:

suite (ExpectationSuite) – The suite of expectations to evaluate.
engine (str) – "pandas" or "spark".

run(data: Any, data_source: str | None = None) → ValidationResult

Execute every expectation in the suite against data.

Parameters:

data (pd.DataFrame | pyspark.sql.DataFrame) – The dataset to validate.
data_source (str, optional) – A label describing where the data came from.

Return type:

ValidationResult

validatex.core.validator.validate(data: Any, suite: ExpectationSuite, engine: str = 'pandas', data_source: str | None = None) → ValidationResult

Convenience function to validate data against a suite.

Parameters:

data (pd.DataFrame | pyspark.sql.DataFrame)
suite (ExpectationSuite)
engine (str) – "pandas" or "spark".
data_source (str, optional)

Return type:

ValidationResult

Module contents

Core module for ValidateX - contains the fundamental building blocks.

class validatex.core.Expectation(column: str | None = None, kwargs: Dict[str, Any] = <factory>, meta: Dict[str, Any] = <factory>)

Re-exported from validatex.core.expectation; documented above under validatex.core.expectation.Expectation.

class validatex.core.ExpectationResult(expectation_type: str, success: bool, column: str | None = None, observed_value: Any = None, element_count: int = 0, unexpected_count: int = 0, unexpected_percent: float = 0.0, unexpected_values: List[Any] = <factory>, details: Dict[str, Any] = <factory>, exception_info: str | None = None, meta: Dict[str, Any] = <factory>)

Re-exported from validatex.core.result; documented above under validatex.core.result.ExpectationResult.

class validatex.core.ExpectationSuite(name: str, expectations: List[Expectation] = <factory>, meta: Dict[str, Any] = <factory>)

Re-exported from validatex.core.suite; documented above under validatex.core.suite.ExpectationSuite.

class validatex.core.ValidationResult(suite_name: str, results: List[ExpectationResult] = <factory>, run_time: datetime | None = None, run_duration_seconds: float = 0.0, data_source: str | None = None, engine: str = 'pandas', statistics: Dict[str, Any] = <factory>)

Re-exported from validatex.core.result; documented above under validatex.core.result.ValidationResult.

class validatex.core.Validator(suite: ExpectationSuite, engine: str = 'pandas')

Re-exported from validatex.core.validator; documented above under validatex.core.validator.Validator.