validatex.datasources package
Submodules
validatex.datasources.base_source module
Abstract base class for data sources.
- class validatex.datasources.base_source.DataSource(name: str | None = None)
Bases: ABC
Base class for all data sources.
A DataSource knows how to load data into either a Pandas or PySpark DataFrame depending on the requested engine.
- load(engine: str = 'pandas', spark_session: Any = None) → Any
Load data using the specified engine.
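The engine-dispatch contract described above can be sketched with a stand-in class. Everything in this sketch is an assumption made for illustration: `SketchSource`, `InMemorySource`, the `_load_pandas` / `_load_spark` hook names, and the `"spark"` engine string are hypothetical and not taken from ValidateX. Plain lists of dicts stand in for DataFrames so the sketch runs without pandas or PySpark.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class SketchSource(ABC):
    """Hypothetical stand-in for a DataSource-like base class."""

    def __init__(self, name: Optional[str] = None) -> None:
        # Assumption: fall back to the class name when no label is given.
        self.name = name or type(self).__name__

    def load(self, engine: str = "pandas", spark_session: Any = None) -> Any:
        # Dispatch on the requested engine and reject anything unknown.
        if engine == "pandas":
            return self._load_pandas()
        if engine == "spark":  # assumed engine name
            if spark_session is None:
                raise ValueError("engine='spark' requires a spark_session")
            return self._load_spark(spark_session)
        raise ValueError(f"Unsupported engine: {engine!r}")

    @abstractmethod
    def _load_pandas(self) -> Any:
        """Return the data in the pandas-style representation."""

    @abstractmethod
    def _load_spark(self, spark_session: Any) -> Any:
        """Return the data in the Spark-style representation."""


class InMemorySource(SketchSource):
    """Toy subclass that serves rows already held in memory."""

    def __init__(self, rows: List[Dict[str, Any]], name: Optional[str] = None) -> None:
        super().__init__(name)
        self._rows = rows

    def _load_pandas(self) -> List[Dict[str, Any]]:
        # A real source would build a pandas DataFrame here.
        return list(self._rows)

    def _load_spark(self, spark_session: Any) -> List[Dict[str, Any]]:
        # A real source would use spark_session to build a Spark DataFrame.
        return list(self._rows)


source = InMemorySource([{"id": 1}, {"id": 2}], name="demo")
loaded = source.load()  # default engine is 'pandas'
```

Keeping the dispatch in the base class means each concrete source only has to supply the per-engine loading logic.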
validatex.datasources.csv_source module
CSV data source.
- class validatex.datasources.csv_source.CSVDataSource(filepath: str, read_options: Dict[str, Any] | None = None, name: str | None = None)
Bases: DataSource
Load data from a CSV file.
- Parameters:
filepath (str) – Path to the CSV file.
read_options (dict, optional) – Extra keyword arguments forwarded to pd.read_csv / Spark reader.
name (str, optional) – A label for this source.
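Forwarding `read_options` to the underlying reader amounts to keyword-argument unpacking. The sketch below illustrates that pattern with the standard library's `csv.DictReader` as a stand-in for `pd.read_csv`. `read_rows` is a hypothetical helper, not part of ValidateX, and `delimiter` here is a `csv` module option.

```python
import csv
import io
from typing import Any, Dict, List, Optional, TextIO


def read_rows(
    handle: TextIO, read_options: Optional[Dict[str, Any]] = None
) -> List[Dict[str, str]]:
    """Read CSV rows, forwarding any extra options to the reader."""
    # Copy so the caller's dict is never mutated; None means "no extras".
    options = dict(read_options or {})
    return list(csv.DictReader(handle, **options))


# A semicolon-delimited sample held in memory instead of a file on disk.
sample = io.StringIO("id;city\n1;Oslo\n2;Lima\n")
rows = read_rows(sample, read_options={"delimiter": ";"})
```

With the pandas engine, the comparable option would be `read_options={"sep": ";"}`, since `sep` is the `pd.read_csv` keyword for the field separator.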
validatex.datasources.database_source module
Database (SQL) data source using SQLAlchemy.
- class validatex.datasources.database_source.DatabaseDataSource(connection_string: str, query: str, name: str | None = None)
Bases: DataSource
Load data from a SQL database.
- Parameters:
connection_string (str) – SQLAlchemy connection string, e.g. "postgresql://user:pass@host/db" or "sqlite:///data.db".
query (str) – SQL query to execute.
name (str, optional)
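A database source is described by two things: where to connect and which query to run. The sketch below shows that pairing with the standard library's `sqlite3` module against an in-memory database, used here as a stand-in for SQLAlchemy so that it runs without extra dependencies. `run_query` and the `orders` table are hypothetical.

```python
import sqlite3
from typing import Any, List, Tuple


def run_query(
    conn: sqlite3.Connection, query: str
) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Execute one SQL query and return (column names, rows)."""
    cursor = conn.execute(query)
    # cursor.description holds one 7-tuple per column; item 0 is the name.
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


# Build a throwaway in-memory database with a small example table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(1, 9.5), (2, 20.0), (3, 42.0)],
)

columns, rows = run_query(
    conn, "SELECT id, amount FROM orders WHERE amount > 10 ORDER BY id"
)
conn.close()
```

Following the signature documented above, the same query text would be passed as the second argument, for example `DatabaseDataSource("sqlite:///data.db", "SELECT id, amount FROM orders WHERE amount > 10")`.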
validatex.datasources.dataframe_source module
Direct DataFrame data source (pass an already-loaded DataFrame).
- class validatex.datasources.dataframe_source.DataFrameSource(dataframe: Any, name: str | None = None)
Bases: DataSource
Wraps an existing Pandas or PySpark DataFrame as a DataSource.
- Parameters:
dataframe (pd.DataFrame | pyspark.sql.DataFrame) – The DataFrame to validate.
name (str, optional)
validatex.datasources.parquet_source module
Parquet data source.
- class validatex.datasources.parquet_source.ParquetDataSource(filepath: str, read_options: Dict[str, Any] | None = None, name: str | None = None)
Bases: DataSource
Load data from a Parquet file.
- Parameters:
filepath (str) – Path to the Parquet file or directory.
read_options (dict, optional) – Extra keyword arguments forwarded to pd.read_parquet / Spark reader.
name (str, optional)
Module contents
Data source connectors for ValidateX.
- class validatex.datasources.CSVDataSource(filepath: str, read_options: Dict[str, Any] | None = None, name: str | None = None)
Bases: DataSource
Load data from a CSV file.
- Parameters:
filepath (str) – Path to the CSV file.
read_options (dict, optional) – Extra keyword arguments forwarded to pd.read_csv / Spark reader.
name (str, optional) – A label for this source.
- class validatex.datasources.DataFrameSource(dataframe: Any, name: str | None = None)
Bases: DataSource
Wraps an existing Pandas or PySpark DataFrame as a DataSource.
- Parameters:
dataframe (pd.DataFrame | pyspark.sql.DataFrame) – The DataFrame to validate.
name (str, optional)
- class validatex.datasources.DataSource(name: str | None = None)
Bases: ABC
Base class for all data sources.
A DataSource knows how to load data into either a Pandas or PySpark DataFrame depending on the requested engine.
- load(engine: str = 'pandas', spark_session: Any = None) → Any
Load data using the specified engine.
- class validatex.datasources.DatabaseDataSource(connection_string: str, query: str, name: str | None = None)
Bases: DataSource
Load data from a SQL database.
- Parameters:
connection_string (str) – SQLAlchemy connection string, e.g. "postgresql://user:pass@host/db" or "sqlite:///data.db".
query (str) – SQL query to execute.
name (str, optional)
- class validatex.datasources.ParquetDataSource(filepath: str, read_options: Dict[str, Any] | None = None, name: str | None = None)
Bases: DataSource
Load data from a Parquet file.
- Parameters:
filepath (str) – Path to the Parquet file or directory.
read_options (dict, optional) – Extra keyword arguments forwarded to pd.read_parquet / Spark reader.
name (str, optional)