bauplan.standard_expectations
This module contains standard expectations for testing data artifacts in a Bauplan pipeline. Using these expectations instead of hand-written ones will make your pipeline easier to maintain, and significantly faster and more memory-efficient.
Each function returns a boolean, so that the wrapping function can assert or print a message in case of failure.
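Because each expectation only reports True or False, the caller decides what a failure means. The wrapping pattern described above might look like the following minimal sketch. To keep it runnable without Bauplan or pyarrow, `no_nulls_stand_in` is a hypothetical plain-Python stand-in for a library expectation, the table is modelled as a dict of lists, and `check_no_nulls` is an illustrative wrapper name that is not part of this module.

```python
def no_nulls_stand_in(table: dict, column_name: str) -> bool:
    """Hypothetical stand-in for an expectation: True if the column has no None."""
    return all(value is not None for value in table[column_name])


def check_no_nulls(table: dict, column_name: str) -> bool:
    """Wrap the boolean result: print a message on failure, then assert."""
    passed = no_nulls_stand_in(table, column_name)
    if not passed:
        print(f"expectation failed: column '{column_name}' contains nulls")
    assert passed, f"column '{column_name}' contains nulls"
    return passed


# Assumption: a table is modelled as a dict of column name -> list of values.
trips = {"trip_id": [1, 2, 3], "tip": [0.5, None, 2.0]}
check_no_nulls(trips, "trip_id")  # passes silently
```

Calling `check_no_nulls(trips, "tip")` would print the message and raise `AssertionError`. In a real pipeline the stand-in would be replaced by one of the expectations documented below, applied to a pyarrow table.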
def expect_column_accepted_values
Expect all values in the column to come from the list of accepted values.
Parameters
table: Table
the pyarrow table to test.
column_name: str
the column to test.
accepted_values: list
the list of accepted values.
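The check above can be read as a membership test applied to every value. The following is a plain-Python sketch of that semantics only, not the library's implementation: the table is modelled as a dict of lists, `accepted_values_sketch` is a hypothetical name, and the treatment of nulls (here, `None` fails unless it is listed as accepted) is an assumption the source does not address.

```python
def accepted_values_sketch(table: dict, column_name: str, accepted_values: list) -> bool:
    """True if every value in the column appears in accepted_values."""
    return all(value in accepted_values for value in table[column_name])


# Assumption: a table is modelled as a dict of column name -> list of values.
payments = {"payment_type": ["cash", "card", "cash"]}

all_accepted = accepted_values_sketch(payments, "payment_type", ["cash", "card"])
some_rejected = accepted_values_sketch(payments, "payment_type", ["card"])
print(all_accepted, some_rejected)
```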
def expect_column_all_null
Expect the column to have all null values.
Parameters
table: Table
the pyarrow table to test.
column_name: str
the column to test.
def expect_column_all_unique
Expect the column to have all unique values (i.e. no duplicates).
Parameters
table: Table
the pyarrow table to test.
column_name: str
the column to test.
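Uniqueness can be pictured as comparing the number of distinct values with the number of rows. The sketch below illustrates this in plain Python, alongside the complementary check documented later as expect_column_not_unique. The function names are hypothetical, the table is modelled as a dict of lists, and how the library counts nulls is not stated in the source, so the example avoids them.

```python
def all_unique_sketch(table: dict, column_name: str) -> bool:
    """True if no value in the column occurs more than once."""
    values = table[column_name]
    return len(set(values)) == len(values)


def not_unique_sketch(table: dict, column_name: str) -> bool:
    """True if at least one value in the column is duplicated."""
    values = table[column_name]
    return len(set(values)) < len(values)


# Assumption: a table is modelled as a dict of column name -> list of values.
orders = {"order_id": [101, 102, 103], "customer": ["ann", "bob", "ann"]}

print(all_unique_sketch(orders, "order_id"), not_unique_sketch(orders, "order_id"))
print(all_unique_sketch(orders, "customer"), not_unique_sketch(orders, "customer"))
```

For any given column exactly one of the two sketches returns True, which is why the two expectations are useful as a pair: one for keys, one for columns where repetition is expected.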
def expect_column_equal_concatenation
Expect the target column to be equal to the concatenation of the columns in the list.
If the columns are not of type pa.string(), the function will attempt to convert them to strings. A custom separator can be passed as an argument; the default is the empty string.
Parameters
table: Table
the pyarrow table to test.
target_column: str
the column to compare with the concatenation of the columns.
columns: list
the list of columns to concatenate.
separator: str
the separator to use when concatenating the columns (default: the empty string).
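The role of the separator and of the string conversion is easiest to see on a small example. This is a plain-Python sketch of the described behaviour, not the library's code: `equal_concatenation_sketch` is a hypothetical name, the table is modelled as a dict of lists, and Python's `str()` is used as a stand-in for the conversion to pa.string(), which may format some types (for example floats) differently.

```python
def equal_concatenation_sketch(
    table: dict, target_column: str, columns: list, separator: str = ""
) -> bool:
    """True if target_column equals the row-wise join of the given columns."""
    # Walk the source columns row by row.
    rows = zip(*(table[name] for name in columns))
    # Convert every value to a string, then join with the separator.
    expected = [separator.join(str(value) for value in row) for row in rows]
    return expected == list(table[target_column])


# Assumption: a table is modelled as a dict of column name -> list of values.
periods = {
    "year": [2023, 2024],          # integers, converted to strings by the check
    "month": ["01", "12"],
    "period": ["2023-01", "2024-12"],
}

with_dash = equal_concatenation_sketch(periods, "period", ["year", "month"], separator="-")
with_default = equal_concatenation_sketch(periods, "period", ["year", "month"])
print(with_dash, with_default)
```

With the default empty separator the expected values would be "202301" and "202412", so the second call fails against a target column that contains dashes.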
def expect_column_mean_greater_or_equal_than
Expect the mean of a column to be greater than or equal to the supplied value.
Parameters
table: Table
the pyarrow table to test.
column_name: str
the column to calculate the mean of.
value: float
the value to compare the mean with.
def expect_column_mean_greater_than
Expect the mean of a column to be greater than the supplied value.
Parameters
table: Table
the pyarrow table to test.
column_name: str
the column to calculate the mean of.
value: float
the value to compare the mean with.
def expect_column_mean_smaller_or_equal_than
Expect the mean of a column to be smaller than or equal to the supplied value.
Parameters
table: Table
the pyarrow table to test.
column_name: str
the column to calculate the mean of.
value: float
the value to compare the mean with.
def expect_column_mean_smaller_than
Expect the mean of a column to be smaller than the supplied value.
Parameters
table: Table
the pyarrow table to test.
column_name: str
the column to calculate the mean of.
value: float
the value to compare the mean with.
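The four mean expectations differ only in the comparison operator, which matters when the mean lands exactly on the threshold. The sketch below shows all four comparisons side by side in plain Python. `mean_checks_sketch` is a hypothetical helper, the table is modelled as a dict of lists, and the example column contains no nulls because the source does not say how nulls affect the mean.

```python
from statistics import fmean


def mean_checks_sketch(table: dict, column_name: str, value: float) -> dict:
    """Evaluate the four mean comparisons against the same threshold."""
    mean = fmean(table[column_name])
    return {
        "greater_or_equal_than": mean >= value,
        "greater_than": mean > value,
        "smaller_or_equal_than": mean <= value,
        "smaller_than": mean < value,
    }


# Assumption: a table is modelled as a dict of column name -> list of values.
fares = {"fare": [10.0, 20.0, 30.0]}  # the mean is exactly 20.0

at_boundary = mean_checks_sketch(fares, "fare", 20.0)
below_mean = mean_checks_sketch(fares, "fare", 15.0)
print(at_boundary)
print(below_mean)
```

At the boundary only the two "or equal" variants hold, so the strict variants are the ones to pick when the threshold itself should count as a failure.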
def expect_column_no_nulls
Expect the column to have no null values.
Parameters
table: Table
the pyarrow table to test.
column_name: str
the column to test.
def expect_column_not_unique
Expect the column to have at least one duplicate value.
Parameters
table: Table
the pyarrow table to test.
column_name: str
the column to test.
def expect_column_some_null
Expect the column to have at least one null.
Parameters
table: Table
the pyarrow table to test.
column_name: str
the column to test.
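The three null-related expectations in this module (expect_column_all_null, expect_column_no_nulls and expect_column_some_null) can all be expressed in terms of a null count. The sketch below is a plain-Python reading of their one-line descriptions, not the library's implementation: `null_checks_sketch` is a hypothetical helper, the table is modelled as a dict of lists, and `None` plays the role of a pyarrow null.

```python
def null_checks_sketch(table: dict, column_name: str) -> dict:
    """Evaluate the three null checks from a single null count."""
    values = table[column_name]
    null_count = sum(value is None for value in values)
    return {
        "all_null": null_count == len(values),
        "no_nulls": null_count == 0,
        "some_null": null_count > 0,
    }


# Assumption: a table is modelled as a dict of column name -> list of values.
readings = {
    "full": [1, 2, 3],
    "partial": [1, None, 3],
    "empty": [None, None, None],
}

for name in readings:
    print(name, null_checks_sketch(readings, name))
```

Under this reading a column made entirely of nulls satisfies both the "all null" and the "at least one null" checks, while "no nulls" and "some null" are exact opposites.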