bauplan.standard_expectations
This module contains standard expectations that can be used to test data artifacts in a Bauplan pipeline. Using these expectations instead of hand-made ones will make your pipeline easier to maintain, and significantly faster and more memory-efficient.
Each function returns a boolean, so that the wrapping function can assert or print out messages in case of failure.
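Because every expectation returns a plain boolean, the caller decides how a failure is reported. The sketch below shows one possible wrapping function. Note that run_expectation is a hypothetical helper, not part of Bauplan, and the lambda stands in for a real call such as expect_column_no_nulls(table, 'id') so that the snippet runs without Bauplan installed.

```python
from typing import Callable


def run_expectation(name: str, check: Callable[[], bool], strict: bool = True) -> bool:
    """Hypothetical wrapper: evaluate a boolean check and report a failure.

    With strict=True a failed check raises AssertionError; otherwise a
    message is printed and the result is returned to the caller.
    """
    passed = bool(check())
    if not passed:
        message = f"expectation failed: {name}"
        if strict:
            raise AssertionError(message)
        print(message)
    return passed


# Stand-in for a real call such as expect_column_no_nulls(table, "id").
ids = [1, 2, 3]
ok = run_expectation("id has no nulls", lambda: all(v is not None for v in ids))
```

Asserting stops the run on bad data, while printing lets a run continue with a warning. Which one fits depends on how critical the checked artifact is.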
Expect all values in the column to come from the list of accepted values.
Returns:
a boolean.
def expect_column_accepted_values(
    table: pa.Table,
    column_name: str,
    accepted_values: list,
) -> bool: ...
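To make the accepted-values check concrete, here is a plain-Python sketch of its semantics. It is not the library's implementation: a list stands in for the pa.Table column, column_values_accepted is a hypothetical name, and the handling of nulls shown here is an assumption that the real function may not share.

```python
from typing import Any, Iterable, List


def column_values_accepted(values: Iterable[Any], accepted_values: List[Any]) -> bool:
    """Reference semantics (sketch): True when every value is in accepted_values.

    Assumption of this sketch: a null (None) passes only if None is itself
    listed in accepted_values. The library may treat nulls differently.
    """
    accepted = set(accepted_values)  # hashable values only; O(1) membership test
    return all(v in accepted for v in values)


statuses = ["paid", "refunded", "paid"]

# Every status is in the accepted list, so the check passes.
all_known = column_values_accepted(statuses, ["paid", "refunded", "pending"])

# "chargeback" is not accepted, so the check fails.
mixed_ok = column_values_accepted(statuses + ["chargeback"], ["paid", "refunded"])
```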
Expect the column to have all null values.
Returns:
a boolean.
def expect_column_all_null(
    table: pa.Table,
    column_name: str,
) -> bool: ...
Expect the column to have all unique values (i.e. no duplicates).
Returns:
a boolean.
def expect_column_all_unique(
    table: pa.Table,
    column_name: str,
) -> bool: ...
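The uniqueness check can be illustrated with a short plain-Python sketch. A list stands in for the column, and both function names are hypothetical. Since the expectation itself only returns a boolean, the second helper shows one way a wrapping function might find the offending values for its failure message. In this sketch, repeated nulls count as duplicates, which is an assumption.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def column_all_unique(values: Iterable[Hashable]) -> bool:
    """Sketch: True when no value occurs more than once."""
    values = list(values)
    return len(set(values)) == len(values)


def duplicated_values(values: Iterable[Hashable]) -> List[Hashable]:
    """Hypothetical helper: values that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value, n in counts.items() if n > 1]


order_ids = [101, 102, 102, 103, 103]
unique = column_all_unique(order_ids)
duplicates = duplicated_values(order_ids)
```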
Expect the target column to be equal to the concatenation of the columns in the list.
If the columns are not of type pa.string(), the function attempts to convert them to string. A custom separator (default: the empty string) can be passed through the separator argument.
Returns:
a boolean.
def expect_column_equal_concatenation(
    table: pa.Table,
    target_column: str,
    columns: list,
    separator: str = '',
) -> bool: ...
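The concatenation rule, including the string conversion and the separator, can be sketched in plain Python as follows. Rows are modelled as dictionaries rather than a pa.Table, column_equals_concatenation is a hypothetical name, and Python's str() stands in for the library's string conversion, which may format some types (floats, for example) differently.

```python
from typing import Any, Dict, List


def column_equals_concatenation(
    rows: List[Dict[str, Any]],
    target_column: str,
    columns: List[str],
    separator: str = "",
) -> bool:
    """Sketch: each target value equals the source values joined by separator."""
    return all(
        row[target_column] == separator.join(str(row[c]) for c in columns)
        for row in rows
    )


rows = [
    {"year": 2024, "month": "01", "period": "2024-01"},
    {"year": 2024, "month": "02", "period": "2024-02"},
]

# The integer year is converted to a string before joining with "-".
with_dash = column_equals_concatenation(rows, "period", ["year", "month"], separator="-")

# With the default empty separator the joined value is "202401", which does not match.
without_separator = column_equals_concatenation(rows, "period", ["year", "month"])
```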
Expect the mean of a column to be greater than or equal to the supplied value.
Returns:
a boolean.
def expect_column_mean_greater_or_equal_than(
    table: pa.Table,
    column_name: str,
    value: float,
) -> bool: ...
Expect the mean of a column to be greater than the supplied value.
Returns:
a boolean.
def expect_column_mean_greater_than(
    table: pa.Table,
    column_name: str,
    value: float,
) -> bool: ...
Expect the mean of a column to be smaller than or equal to the supplied value.
Returns:
a boolean.
def expect_column_mean_smaller_or_equal_than(
    table: pa.Table,
    column_name: str,
    value: float,
) -> bool: ...
Expect the mean of a column to be smaller than the supplied value.
Returns:
a boolean.
def expect_column_mean_smaller_than(
    table: pa.Table,
    column_name: str,
    value: float,
) -> bool: ...
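The four mean expectations differ only in the comparison operator, and the difference matters only when the mean is exactly equal to the supplied value. The plain-Python sketch below illustrates that boundary case. The column_mean helper is hypothetical, a list stands in for the column, and skipping nulls when averaging is an assumption of the sketch rather than documented behaviour.

```python
from statistics import fmean
from typing import Iterable, Optional


def column_mean(values: Iterable[Optional[float]]) -> float:
    """Mean of the non-null values (assumption: nulls are skipped)."""
    return fmean(v for v in values if v is not None)


prices = [10.0, 20.0, None, 30.0]  # the non-null values average to 20.0
threshold = 20.0

greater_or_equal = column_mean(prices) >= threshold
greater = column_mean(prices) > threshold
smaller_or_equal = column_mean(prices) <= threshold
smaller = column_mean(prices) < threshold
```

At the boundary only the two "or equal" variants hold, so the choice between the strict and non-strict functions should follow from whether the threshold itself is an acceptable value.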
Expect the column to have no null values.
Returns:
a boolean.
def expect_column_no_nulls(
    table: pa.Table,
    column_name: str,
) -> bool: ...
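The all-null and no-nulls expectations are not complements of each other: a column that mixes nulls and values fails both. A plain-Python sketch of the two conditions, with None standing in for an Arrow null and hypothetical function names, makes this visible. How the real functions treat an empty column is not stated here; in this sketch both conditions hold vacuously.

```python
from typing import Any, Iterable


def column_no_nulls(values: Iterable[Any]) -> bool:
    """Sketch: True when no value is null."""
    return all(v is not None for v in values)


def column_all_null(values: Iterable[Any]) -> bool:
    """Sketch: True when every value is null."""
    return all(v is None for v in values)


mixed = [1, None, 3]
mixed_no_nulls = column_no_nulls(mixed)   # False: one null is present
mixed_all_null = column_all_null(mixed)   # False: two values are not null
```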
Expect the column to have at least one duplicate value.
Returns:
a boolean.
def expect_column_not_unique(
    table: pa.Table,
    column_name: str,
) -> bool: ...