bauplan.standard_expectations module

This module contains standard expectations that can be used to test data artifact in a Bauplan pipeline. Using these expectations instead of hand-made ones will make your pipeline easier to maintain, and significantly faster and more memory-efficient.

Each function returns a boolean, so that the wrapping function can assert or print out messages in case of failure.

bauplan.standard_expectations.expect_column_accepted_values(table: Table, column_name: str, accepted_values: list) bool

Expect all values in the column to come from the list of accepted values.

Parameters:
  • table – the pyarrow table to test.

  • column_name – the column to test.

  • accepted_values – the list of accepted values.

Returns:

a boolean.

bauplan.standard_expectations.expect_column_all_null(table: Table, column_name: str) bool

Expect the column to have all null values.

Parameters:
  • table – the pyarrow table to test.

  • column_name – the column to test.

Returns:

a boolean.

bauplan.standard_expectations.expect_column_all_unique(table: Table, column_name: str) bool

Expect the column to have all unique values (i.e. no duplicates).

Parameters:
  • table – the pyarrow table to test.

  • column_name – the column to test.

Returns:

a boolean.

bauplan.standard_expectations.expect_column_equal_concatenation(table: Table, target_column: str, columns: list, separator: str = '') bool

Expect the target column to be equal to the concatenation of the columns in the list.

If the columns are not of type pa.string(), the function will attempt to convert them to string. If a custom separator is needed (default: the empty string), it can be passed as an argument.

Parameters:
  • table – the pyarrow table to test.

  • target_column – the column to compare with the concatenation of the columns.

  • columns – the list of columns to concatenate.

  • separator – the separator to use when concatenating the columns.

Returns:

a boolean.

bauplan.standard_expectations.expect_column_mean_greater_or_equal_than(table: Table, column_name: str, value: float) bool

Expect the mean of a column to be equal or greater than the supplied value.

Parameters:
  • table – the pyarrow table to test.

  • column_name – the column to calculate the mean of.

  • value – the value to compare the mean with.

Returns:

a boolean.

bauplan.standard_expectations.expect_column_mean_greater_than(table: Table, column_name: str, value: float) bool

Expect the mean of a column to be greater than the supplied value.

Parameters:
  • table – the pyarrow table to test.

  • column_name – the column to calculate the mean of.

  • value – the value to compare the mean with.

Returns:

a boolean.

bauplan.standard_expectations.expect_column_mean_smaller_or_equal_than(table: Table, column_name: str, value: float) bool

Expect the mean of a column to be equal or smaller than the supplied value.

Parameters:
  • table – the pyarrow table to test.

  • column_name – the column to calculate the mean of.

  • value – the value to compare the mean with.

Returns:

a boolean.

bauplan.standard_expectations.expect_column_mean_smaller_than(table: Table, column_name: str, value: float) bool

Expect the mean of a column to be smaller than the supplied value.

Parameters:
  • table – the pyarrow table to test.

  • column_name – the column to calculate the mean of.

  • value – the value to compare the mean with.

Returns:

a boolean.

bauplan.standard_expectations.expect_column_no_nulls(table: Table, column_name: str) bool

Expect the column to have no null values.

Parameters:
  • table – the pyarrow table to test.

  • column_name – the column to test.

Returns:

a boolean.

bauplan.standard_expectations.expect_column_not_unique(table: Table, column_name: str) bool

Expect the column to have at least one duplicate value.

Parameters:
  • table – the pyarrow table to test.

  • column_name – the column to test.

Returns:

a boolean.

bauplan.standard_expectations.expect_column_some_null(table: Table, column_name: str) bool

Expect the column to have at least one null.

Parameters:
  • table – the pyarrow table to test.

  • column_name – the column to test.

Returns:

a boolean.