validatex.core package

Submodules

validatex.core.expectation module

Expectation base classes and registry.

This module defines the base Expectation class and the global registry that maps expectation type names to their implementation classes.

class validatex.core.expectation.Expectation(column: str | None = None, kwargs: Dict[str, Any] = <factory>, meta: Dict[str, Any] = <factory>)

Bases: ABC

Abstract base class for all expectations.

Subclasses must:
  1. Set the class attribute expectation_type (a unique string id).

  2. Implement _validate_pandas() and/or _validate_spark().

column: str | None = None
expectation_type: str = 'base_expectation'
classmethod from_dict(d: Dict[str, Any]) → Expectation

Deserialize from a dictionary.

kwargs: Dict[str, Any]
meta: Dict[str, Any]
to_dict() → Dict[str, Any]

Serialize to a plain dictionary (for YAML / JSON configs).
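The <factory> defaults in the class signature suggest dataclass-style fields built with default_factory. The sketch below shows what a to_dict() / from_dict() round trip of that shape could look like. MiniExpectation and the dictionary key names are hypothetical stand-ins chosen for illustration; they are not taken from ValidateX itself.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MiniExpectation:
    """Hypothetical stand-in for an expectation; not the ValidateX class."""

    column: Optional[str] = None
    # default_factory gives each instance its own dict (rendered as <factory>).
    kwargs: Dict[str, Any] = field(default_factory=dict)
    meta: Dict[str, Any] = field(default_factory=dict)

    # No annotation, so this is a plain class attribute rather than a field.
    expectation_type = "mini_expectation"

    def to_dict(self) -> Dict[str, Any]:
        # Key names are assumed for this sketch.
        return {
            "expectation_type": self.expectation_type,
            "column": self.column,
            "kwargs": dict(self.kwargs),
            "meta": dict(self.meta),
        }

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "MiniExpectation":
        return cls(
            column=d.get("column"),
            kwargs=dict(d.get("kwargs", {})),
            meta=dict(d.get("meta", {})),
        )


original = MiniExpectation(column="age", kwargs={"min_value": 0, "max_value": 150})
restored = MiniExpectation.from_dict(original.to_dict())
```

Because the dictionary holds only plain Python values, it can be passed straight to a JSON or YAML serializer.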

validate(data: Any, engine: str = 'pandas') → ExpectationResult

Run this expectation against data using the specified engine.

Parameters:
  • data (Any) – The dataset (pd.DataFrame or pyspark.sql.DataFrame).

  • engine (str) – "pandas" or "spark".

Return type:

ExpectationResult

validatex.core.expectation.get_expectation_class(name: str) → Type[Expectation]

Look up an expectation class by its registered type name.

validatex.core.expectation.list_expectations() → List[str]

Return a sorted list of all registered expectation type names.

validatex.core.expectation.register_expectation(cls: Type[Expectation]) → Type[Expectation]

Decorator that registers an expectation class under its expectation_type name.
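The three registry functions follow a common decorator-registry pattern. The stand-alone sketch below illustrates that pattern only: _REGISTRY, register, get_class and list_names are hypothetical names, and the two classes are empty placeholders rather than real expectations.

```python
from typing import Dict, List

# Maps an expectation type name to the class that implements it.
_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator: store cls under its expectation_type, return it unchanged."""
    _REGISTRY[cls.expectation_type] = cls
    return cls


def get_class(name: str) -> type:
    """Look up a registered class, with a readable error for unknown names."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"Unknown expectation type: {name!r}") from None


def list_names() -> List[str]:
    """Return all registered type names in sorted order."""
    return sorted(_REGISTRY)


@register
class ExpectBetween:
    expectation_type = "expect_column_values_to_be_between"


@register
class ExpectNotNull:
    expectation_type = "expect_column_to_not_be_null"
```

Returning the class from the decorator leaves the decorated name bound to the original class, so registration has no effect on normal use of the class.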

validatex.core.result module

Validation result data models.

Every expectation run produces an ExpectationResult. A full validation run aggregates them into a ValidationResult.

class validatex.core.result.ColumnHealthSummary(column: str, checks: int = 0, passed: int = 0, failed: int = 0, errors: int = 0, null_count: int | None = None, null_percent: float | None = None, unique_count: int | None = None, unique_percent: float | None = None, total_rows: int | None = None)

Bases: object

Aggregated health metrics for a single column.

checks: int = 0
column: str
errors: int = 0
failed: int = 0
property health_score: float
null_count: int | None = None
null_percent: float | None = None
passed: int = 0
to_dict() → Dict[str, Any]
total_rows: int | None = None
unique_count: int | None = None
unique_percent: float | None = None
class validatex.core.result.ExpectationResult(expectation_type: str, success: bool, column: str | None = None, observed_value: Any = None, element_count: int = 0, unexpected_count: int = 0, unexpected_percent: float = 0.0, unexpected_values: List[Any] = <factory>, details: Dict[str, Any] = <factory>, exception_info: str | None = None, meta: Dict[str, Any] = <factory>)

Bases: object

Result of a single expectation evaluation.

column: str | None = None
details: Dict[str, Any]
element_count: int = 0
exception_info: str | None = None
expectation_type: str
property human_observed: str

Return a human-readable string for the observed value.

Converts raw dicts / technical strings into executive-friendly text.

meta: Dict[str, Any]
observed_value: Any = None
property severity: str

Return severity level for this expectation.

property severity_icon: str
property status: str
property status_icon: str
success: bool
to_dict() → Dict[str, Any]
unexpected_count: int = 0
unexpected_percent: float = 0.0
unexpected_values: List[Any]
class validatex.core.result.ValidationResult(suite_name: str, results: List[ExpectationResult] = <factory>, run_time: datetime | None = None, run_duration_seconds: float = 0.0, data_source: str | None = None, engine: str = 'pandas', statistics: Dict[str, Any] = <factory>)

Bases: object

Aggregate result of running an entire expectation suite.

column_health() → List[ColumnHealthSummary]

Aggregate expectation results by column.

Extracts null % and unique % from specific expectation types when present.
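A per-column aggregation of this kind can be sketched as follows. Plain dicts stand in for ExpectationResult objects, summarize_by_column is a hypothetical helper, and two rules are assumptions made for the sketch: a result with exception_info set counts as an error, and results without a column are skipped.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Plain dicts stand in for ExpectationResult objects.
results: List[Dict[str, Any]] = [
    {"column": "user_id", "success": True, "exception_info": None},
    {"column": "age", "success": False, "exception_info": None},
    {"column": "age", "success": True, "exception_info": None},
    {"column": "age", "success": False, "exception_info": "KeyError: 'age'"},
    {"column": None, "success": True, "exception_info": None},  # table-level check
]


def summarize_by_column(results: List[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Count checks, passes, failures and errors for each column."""
    summary: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"checks": 0, "passed": 0, "failed": 0, "errors": 0}
    )
    for r in results:
        if r["column"] is None:
            continue  # assumption: table-level results belong to no column
        entry = summary[r["column"]]
        entry["checks"] += 1
        if r["exception_info"] is not None:
            entry["errors"] += 1  # assumption: a captured exception is an error
        elif r["success"]:
            entry["passed"] += 1
        else:
            entry["failed"] += 1
    return dict(summary)


health = summarize_by_column(results)
```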

compute_quality_score() → float

Compute a weighted data quality score (0–100).

Severity weights:
  • Critical: ×3

  • Warning : ×2

  • Info : ×1

Score = 100 × (weighted_passed / weighted_total)
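The formula above can be worked through in a few lines. This is an illustrative calculation, not the library's code: quality_score is a hypothetical helper, the lowercase severity labels are assumed, and returning 100.0 for an empty run is an arbitrary choice made for the sketch.

```python
from typing import Iterable, Tuple

# Weights as documented: Critical x3, Warning x2, Info x1.
SEVERITY_WEIGHTS = {"critical": 3, "warning": 2, "info": 1}


def quality_score(outcomes: Iterable[Tuple[str, bool]]) -> float:
    """Return 100 * weighted_passed / weighted_total for (severity, passed) pairs."""
    weighted_total = 0
    weighted_passed = 0
    for severity, passed in outcomes:
        weight = SEVERITY_WEIGHTS[severity]
        weighted_total += weight
        if passed:
            weighted_passed += weight
    if weighted_total == 0:
        return 100.0  # assumption: an empty run scores as fully passing
    return 100.0 * weighted_passed / weighted_total


outcomes = [
    ("critical", True),
    ("critical", False),
    ("warning", True),
    ("info", True),
]
# weighted_passed = 3 + 2 + 1 = 6, weighted_total = 3 + 3 + 2 + 1 = 9
score = quality_score(outcomes)
```

Three of the four checks pass (75%), but the weighted score is about 66.7 because the single failure carries a critical weight.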

compute_statistics() → Dict[str, Any]

Compute summary statistics and store them.

data_source: str | None = None
engine: str = 'pandas'
property errored_expectations: int
property failed_expectations: int
results: List[ExpectationResult]
run_duration_seconds: float = 0.0
run_time: datetime | None = None
statistics: Dict[str, Any]
property success: bool

True only if every expectation passed.

property success_percent: float
property successful_expectations: int
suite_name: str
summary() → str

Return a human-readable summary string.

to_dict() → Dict[str, Any]
to_html(filepath: str) → None

Generate a rich HTML report and write to filepath.

to_json(indent: int = 2) → str

Serialize the full result to a JSON string.

to_json_file(filepath: str) → None

Write the validation result to a JSON file.

property total_expectations: int
validatex.core.result.get_severity(expectation_type: str, meta: Dict | None = None) → str

Return severity for an expectation type (user meta overrides default).

validatex.core.result.to_native(value: Any) → Any

Convert numpy / pandas scalar types to native Python types.

This keeps internal representations such as np.int64(20) out of reports and serialized output.
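One common way to perform such a conversion relies on the item() method that NumPy scalars expose, which returns the equivalent built-in Python value. The sketch below is a guess at the general approach, not the library's implementation: to_native_sketch is a hypothetical name, recursing into dicts and lists is an added assumption, and FakeScalar stands in for a NumPy scalar so the example runs without NumPy installed.

```python
from typing import Any


def to_native_sketch(value: Any) -> Any:
    """Convert scalar wrappers to built-in Python values, recursing into containers."""
    if isinstance(value, dict):
        return {k: to_native_sketch(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native_sketch(v) for v in value]
    # NumPy scalars expose .item(); built-in int, float and str do not.
    item = getattr(value, "item", None)
    if callable(item):
        return item()
    return value


class FakeScalar:
    """Stand-in for a NumPy scalar so the sketch needs no third-party import."""

    def __init__(self, v: Any) -> None:
        self._v = v

    def item(self) -> Any:
        return self._v


converted = to_native_sketch({"max": FakeScalar(20), "values": [FakeScalar(1.5), "x"]})
```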

validatex.core.suite module

Expectation Suite: a named, ordered collection of expectations.

Suites can be built programmatically or loaded from YAML / JSON configs.

class validatex.core.suite.ExpectationSuite(name: str, expectations: List[Expectation] = <factory>, meta: Dict[str, Any] = <factory>)

Bases: object

A named collection of expectations.

Examples

>>> suite = ExpectationSuite("user_data_quality")
>>> suite.add("expect_column_to_not_be_null", column="user_id")
>>> suite.add("expect_column_values_to_be_between",
...           column="age", min_value=0, max_value=150)
add(expectation_type: str, column: str | None = None, meta: Dict[str, Any] | None = None, **kwargs: Any) → ExpectationSuite

Add an expectation to this suite.

Parameters:
  • expectation_type (str) – The registered name of the expectation (e.g. "expect_column_to_not_be_null").

  • column (str, optional) – Target column name.

  • meta (dict, optional) – Arbitrary metadata to attach.

  • **kwargs – Additional arguments forwarded to the expectation (e.g. min_value, regex).

Returns:

self for fluent chaining.

Return type:

ExpectationSuite
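Returning self is what makes method chaining possible. The stand-alone sketch below shows the mechanism with a hypothetical MiniSuite class that stores plain dicts; it mirrors the documented add() signature but is not the ValidateX implementation.

```python
from typing import Any, Dict, List, Optional


class MiniSuite:
    """Hypothetical stand-in that records expectations as plain dicts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.expectations: List[Dict[str, Any]] = []

    def add(
        self,
        expectation_type: str,
        column: Optional[str] = None,
        **kwargs: Any,
    ) -> "MiniSuite":
        self.expectations.append(
            {"expectation_type": expectation_type, "column": column, "kwargs": kwargs}
        )
        return self  # returning the suite itself lets calls be chained


suite = (
    MiniSuite("user_data_quality")
    .add("expect_column_to_not_be_null", column="user_id")
    .add("expect_column_values_to_be_between", column="age", min_value=0, max_value=150)
)
```

Each call mutates the same suite and hands it back, so the chained form and a sequence of separate add() calls produce the same result.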

add_expectation(expectation: Expectation) → ExpectationSuite

Add a pre-built Expectation instance.

clear() → ExpectationSuite

Remove all expectations.

expectations: List[Expectation]
classmethod from_dict(data: Dict[str, Any]) → ExpectationSuite

Create a suite from a plain dictionary.

classmethod load(filepath: str) → ExpectationSuite

Load from a YAML or JSON file.

meta: Dict[str, Any]
name: str
remove(index: int) → ExpectationSuite

Remove an expectation by index.

save(filepath: str) → None

Save to YAML or JSON based on file extension.

to_dict() → Dict[str, Any]
to_json(indent: int = 2) → str
to_yaml() → str

validatex.core.validator module

Validator: orchestrates expectation suite execution against a dataset.

The validate() convenience function is the primary public entry point.

class validatex.core.validator.Validator(suite: ExpectationSuite, engine: str = 'pandas')

Bases: object

Runs an ExpectationSuite against a dataset.

Parameters:
  • suite (ExpectationSuite) – The suite of expectations to evaluate.

  • engine (str) – "pandas" or "spark".

run(data: Any, data_source: str | None = None) → ValidationResult

Execute every expectation in the suite against data.

Parameters:
  • data (pd.DataFrame | pyspark.sql.DataFrame) – The dataset to validate.

  • data_source (str, optional) – A label describing where the data came from.

Return type:

ValidationResult
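Orchestration of this kind usually means looping over the checks, timing the run, and capturing exceptions so that one broken check does not abort the rest. The sketch below illustrates that general shape under those assumptions; run_checks is a hypothetical function, plain callables stand in for expectations, and a dict stands in for ValidationResult.

```python
import time
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


def run_checks(checks: Dict[str, Callable[[Any], bool]], data: Any) -> Dict[str, Any]:
    """Run every check against data, recording failures and raised exceptions."""
    started = time.perf_counter()
    results: List[Dict[str, Any]] = []
    for name, check in checks.items():
        try:
            passed = bool(check(data))
            results.append({"expectation_type": name, "success": passed, "exception_info": None})
        except Exception as exc:
            # Assumption: an exception marks the check as unsuccessful and is kept as text.
            info = f"{type(exc).__name__}: {exc}"
            results.append({"expectation_type": name, "success": False, "exception_info": info})
    return {
        "results": results,
        "run_time": datetime.now(timezone.utc),
        "run_duration_seconds": time.perf_counter() - started,
        "success": all(r["success"] for r in results),
    }


rows = [{"user_id": 1, "age": 34}, {"user_id": 2, "age": 151}]
outcome = run_checks(
    {
        "ids_present": lambda d: all(r["user_id"] is not None for r in d),
        "age_in_range": lambda d: all(0 <= r["age"] <= 150 for r in d),
        "has_email": lambda d: all(r["email"] for r in d),  # raises KeyError
    },
    rows,
)
```

Here the first check passes, the second fails on the age of 151, and the third is recorded as an error because the rows have no email key.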

validatex.core.validator.validate(data: Any, suite: ExpectationSuite, engine: str = 'pandas', data_source: str | None = None) → ValidationResult

Convenience function to validate data against a suite.

Parameters:
  • data (pd.DataFrame | pyspark.sql.DataFrame)

  • suite (ExpectationSuite)

  • engine (str) – "pandas" or "spark".

  • data_source (str, optional)

Return type:

ValidationResult

Module contents

Core module for ValidateX - contains the fundamental building blocks.

The following classes are also available directly from validatex.core. Each is documented in full under its submodule above:
  • Expectation (validatex.core.expectation)

  • ExpectationResult (validatex.core.result)

  • ExpectationSuite (validatex.core.suite)

  • ValidationResult (validatex.core.result)

  • Validator (validatex.core.validator)