Choose result format
When you validate data with GX 1.0, you can set the level of detail returned in your Validation Results by specifying a value for the optional result_format parameter. These settings are applied to the results returned by each validated Expectation.
Typical use cases for customizing Result Format settings include summarizing the values that cause Expectations to fail during data exploration, retrieving failed rows to facilitate data cleaning, and excluding excess Validation Result data from published Data Docs.
Define a Result Format configuration
The result_format parameter takes in a dictionary of configuration settings.
- Create a dictionary and set the verbosity of returned Validation Results.
The verbosity of your Validation Results can be set as the value of the key "result_format" in your Result Format dictionary. In order from least verbosity to greatest detail, the valid values for the "result_format" key are: "BOOLEAN_ONLY", "BASIC", "SUMMARY", and "COMPLETE".
The default verbosity level of Validation Results generated by Expectations is "SUMMARY".
The following sections provide example code for each Result Format and describe the information returned at that level of verbosity:
- "BOOLEAN_ONLY"
- "BASIC"
- "SUMMARY"
- "COMPLETE"
When the result_format is "BOOLEAN_ONLY", Validation Results do not include additional information in a result dictionary. Whether the Expectation evaluated successfully is reported exclusively through the True or False value of the success key in the returned Validation Result.
To create a "BOOLEAN_ONLY" result format configuration, use the following code:
    boolean_only_rf_dict = {"result_format": "BOOLEAN_ONLY"}
When the result_format is set to "BASIC", the Validation Results of each Expectation include a result dictionary with information that provides a basic explanation for why it failed or succeeded. This format is intended for quick feedback and works well in Jupyter Notebooks.
You can check the Validation Results reference tables to see what information is provided in the result dictionary.
To create a "BASIC" result format configuration, use the following code:
    basic_rf_dict = {"result_format": "BASIC"}
When the result_format key is set to "SUMMARY", the Validation Results of each Expectation include a result dictionary with information that summarizes values to show why it failed or succeeded. This format is intended for more detailed exploratory work and includes additional information beyond what is included by "BASIC".
You can check the Validation Results reference tables to see what information is provided in the result dictionary.
To create a "SUMMARY" result format configuration, use the following code:
    summary_rf_dict = {"result_format": "SUMMARY"}
When the result_format key is set to "COMPLETE", the Validation Results of each Expectation include a result dictionary with all available information to explain why it failed or succeeded. This format is intended for debugging pipelines or developing detailed regression tests and includes additional information beyond what is provided by "SUMMARY".
You can check the Validation Results reference tables to see what information is provided in the result dictionary.
To create a "COMPLETE" result format configuration, use the following code:
    complete_rf_dict = {"result_format": "COMPLETE"}
- Optional. Specify configurations for additional settings available to the base result_format.
Once you have defined the base configuration in your result_format key, you can further tailor the format of your Validation Results by defining additional key/value pairs in your Result Format dictionary.
Reference the following for the valid keys at each verbosity level and how they influence the format of generated Validation Results:
"BOOLEAN_ONLY"
A "BOOLEAN_ONLY" result format does not support additional settings.
"BASIC"
Dictionary key | Purpose |
---|---|
"unexpected_index_column_names" | Defines the columns that can be used to identify unexpected results. For example, primary key (PK) column(s) or other columns with unique identifiers. Supports multiple column names as a list. |
"return_unexpected_index_query" | When running validations, a query (or a set of indices) is returned that allows you to retrieve the full set of unexpected results as well as the values of the identifying columns specified in "unexpected_index_column_names". Setting this value to False suppresses the output (default is True). |
"partial_unexpected_count" | Sets the number of results to include in "partial_unexpected_list". Set the value to zero to suppress the unexpected counts. |
"SUMMARY"
Dictionary key | Purpose |
---|---|
"unexpected_index_column_names" | Defines the columns that can be used to identify unexpected results. For example, primary key (PK) column(s) or other columns with unique identifiers. Supports multiple column names as a list. |
"return_unexpected_index_query" | When running validations, a query (or a set of indices) is returned that allows you to retrieve the full set of unexpected results as well as the values of the identifying columns specified in "unexpected_index_column_names". Setting this value to False suppresses the output (default is True). |
"partial_unexpected_count" | Sets the number of results to include in "partial_unexpected_counts", "partial_unexpected_list", and "partial_unexpected_index_list". Set the value to zero to suppress the unexpected counts. |
"COMPLETE"
Dictionary key | Purpose |
---|---|
"unexpected_index_column_names" | Defines the columns that can be used to identify unexpected results. For example, primary key (PK) column(s) or other columns with unique identifiers. Supports multiple column names as a list. |
"return_unexpected_index_query" | When running validations, a query (or a set of indices) is returned that allows you to retrieve the full set of unexpected results as well as the values of the identifying columns specified in "unexpected_index_column_names". Setting this value to False suppresses the output (default is True). |
"partial_unexpected_count" | Sets the number of results to include in "partial_unexpected_counts", "partial_unexpected_list", and "partial_unexpected_index_list". Set the value to zero to suppress the unexpected counts. |
"exclude_unexpected_values" | When running validations, a set of unexpected results' indices and values is returned. Setting this value to True suppresses values from the output to only have indices (default is False). |
"include_unexpected_rows" | When True this returns the entire row for each unexpected value in dictionary form. This setting only applies when "result_format" has been explicitly set to a value other than "BOOLEAN_ONLY". |
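The two steps above can be sketched as a small helper that builds a Result Format dictionary and checks each additional key against the verbosity level. This is only an illustration: build_result_format and ADDITIONAL_KEYS_BY_LEVEL are hypothetical names, not part of the GX API, the column name "event_id" is made up, and rejecting unsupported keys is this sketch's own choice rather than documented GX behavior.

```python
# Hypothetical helper, not part of GX: it only builds a plain dictionary.
# The key sets are transcribed from the per-level settings described above.
COMMON_KEYS = {
    "unexpected_index_column_names",
    "return_unexpected_index_query",
    "partial_unexpected_count",
}
ADDITIONAL_KEYS_BY_LEVEL = {
    "BOOLEAN_ONLY": set(),  # supports no additional settings
    "BASIC": set(COMMON_KEYS),
    "SUMMARY": set(COMMON_KEYS),
    "COMPLETE": COMMON_KEYS | {"exclude_unexpected_values", "include_unexpected_rows"},
}


def build_result_format(level="SUMMARY", **settings):
    """Return a Result Format dictionary, rejecting keys the level does not list."""
    if level not in ADDITIONAL_KEYS_BY_LEVEL:
        raise ValueError(f"Unknown result_format level: {level!r}")
    unsupported = set(settings) - ADDITIONAL_KEYS_BY_LEVEL[level]
    if unsupported:
        raise ValueError(f"{level} does not list these keys: {sorted(unsupported)}")
    return {"result_format": level, **settings}


my_result_format = build_result_format(
    "COMPLETE",
    unexpected_index_column_names=["event_id"],  # hypothetical column name
    return_unexpected_index_query=True,
)
print(my_result_format)
```

Writing the dictionary literal by hand works just as well; the helper only makes the per-level differences explicit.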
Apply a Result Format configuration
You can pass a result_format configuration to a Validation Definition's .run(...) method.
Prerequisites
- Python version 3.8 to 3.11.
- An installation of GX 1.0.
- A preconfigured Data Context.
- A Validation Definition.
- A Result Format configuration. In these examples, your result format is stored as a dictionary in the variable my_result_format.
Apply a Result Format configuration to a Validation Definition
- Retrieve your Validation Definition.
Update the value of definition_name in the following code and execute it to retrieve your Validation Definition:
    import great_expectations as gx
    # Load the preconfigured Data Context.
    context = gx.get_context()
    # Retrieve the Validation Definition by name.
    definition_name = "my_validation_definition"
    my_validation_definition = context.validation_definitions.get(name=definition_name)
- Pass the Result Format to the Validation Definition at runtime:
    validation_result = my_validation_definition.run(result_format=my_result_format)
Tip: You can also create a persisting Result Format configuration by passing it in as the result_format parameter when a Checkpoint is created. The Result Format will be applied every time the Checkpoint is run.
- Review your results:
    print(validation_result)
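Beyond printing the whole result, you may want only the Expectations that failed. The following is a minimal sketch, assuming the object returned by .run(...) exposes a results list whose entries each carry success and result attributes. FakeValidationDefinition is a hypothetical stand-in with made-up values so the example runs without a Data Context; with GX installed you would pass your real Validation Definition instead.

```python
from types import SimpleNamespace


def failed_expectation_results(validation_definition, result_format):
    """Run the Validation Definition and return only the failed Expectation results."""
    validation_result = validation_definition.run(result_format=result_format)
    return [r for r in validation_result.results if not r.success]


class FakeValidationDefinition:
    """Hypothetical stand-in for a Validation Definition; returns made-up results."""

    def run(self, result_format=None):
        # Record what was passed so the example can show the argument arriving.
        self.received_result_format = result_format
        return SimpleNamespace(
            success=False,
            results=[
                SimpleNamespace(success=True, result={}),
                SimpleNamespace(success=False, result={"unexpected_count": 3}),
            ],
        )


fake_definition = FakeValidationDefinition()
failures = failed_expectation_results(fake_definition, {"result_format": "BASIC"})
print(len(failures))
```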
Validation Results reference tables
- Information in result fields
- Result fields provided by verbosity level
- Result Format keys
The following table lists the fields that can be found in the result dictionary of a Validation Result, and what information is provided by that field.
Field within result | Value |
---|---|
element_count | The total number of values in the column. |
missing_count | The number of missing values in the column. |
missing_percent | The total percent of rows missing values for the column. |
unexpected_count | The total count of unexpected values in a column. |
unexpected_percent | The overall percent of unexpected values in a column. |
unexpected_percent_nonmissing | The percent of unexpected values in a column, excluding rows that have no value for that column. |
observed_value | The aggregate statistic computed for the column. This only applies to Expectations that pertain to the aggregate value of a column, rather than the individual values in each row for the column. |
partial_unexpected_list | A partial list of values that violate the Expectation. (Up to 20 values by default.) |
partial_unexpected_index_list | A partial list of the indices of the unexpected values in the column, as defined by the columns in unexpected_index_column_names. (Up to 20 indices by default.) |
partial_unexpected_counts | A partial list of values and counts, showing the number of times each of the unexpected values occurs. (Up to 20 unexpected value/count pairs by default.) |
unexpected_index_list | A list of the indices of the unexpected values in the column, as defined by the columns in unexpected_index_column_names. |
unexpected_index_query | A query that can be used to retrieve all unexpected values (SQL and Spark), or the full list of unexpected indices (Pandas). |
unexpected_list | A list of all values that violate the Expectation. |
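As an illustration of reading these fields, the sketch below turns a result dictionary into a one-line summary. The function summarize_unexpected and the sample values are hypothetical. It uses .get(...) because a given field may be absent, depending on the verbosity level and the type of Expectation.

```python
def summarize_unexpected(result):
    """Build a short summary from the fields of a `result` dictionary."""
    element_count = result.get("element_count")
    unexpected_count = result.get("unexpected_count")
    if element_count is None or unexpected_count is None:
        return "no unexpected-value details available"
    summary = f"{unexpected_count} of {element_count} values were unexpected"
    examples = result.get("partial_unexpected_list") or []
    if examples:
        summary += f" (examples: {examples})"
    return summary


# Made-up values for illustration only; not output from a real validation run.
sample_result = {
    "element_count": 100,
    "unexpected_count": 2,
    "partial_unexpected_list": [-1, -7],
}
print(summarize_unexpected(sample_result))
```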
The following table lists the fields that can be found in the result dictionary of a Validation Result and the Result Format verbosity levels that return that field.
Fields within result | BOOLEAN_ONLY | BASIC | SUMMARY | COMPLETE |
---|---|---|---|---|
element_count | no | yes | yes | yes |
missing_count | no | yes | yes | yes |
missing_percent | no | yes | yes | yes |
unexpected_count | no | yes | yes | yes |
unexpected_percent | no | yes | yes | yes |
unexpected_percent_nonmissing | no | yes | yes | yes |
observed_value | no | yes | yes | yes |
partial_unexpected_list | no | yes | yes | yes |
partial_unexpected_index_list | no | no | yes | yes |
partial_unexpected_counts | no | no | yes | yes |
unexpected_index_list | no | no | no | yes |
unexpected_index_query | no | no | no | yes |
unexpected_list | no | no | no | yes |
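Because every field in this table stays available at all higher verbosity levels, the table can be used to pick the least verbose level that still returns the fields you need. The following sketch encodes that lookup. FIRST_LEVEL_FOR_FIELD and minimum_level_for are hypothetical names, and the mapping is transcribed from the table, not read from GX. Whether a field such as observed_value actually appears also depends on the type of Expectation.

```python
# Verbosity levels in order from least to most detailed.
LEVELS = ("BOOLEAN_ONLY", "BASIC", "SUMMARY", "COMPLETE")

# Lowest verbosity level at which each field is returned, per the table above.
FIRST_LEVEL_FOR_FIELD = {
    "element_count": "BASIC",
    "missing_count": "BASIC",
    "missing_percent": "BASIC",
    "unexpected_count": "BASIC",
    "unexpected_percent": "BASIC",
    "unexpected_percent_nonmissing": "BASIC",
    "observed_value": "BASIC",
    "partial_unexpected_list": "BASIC",
    "partial_unexpected_index_list": "SUMMARY",
    "partial_unexpected_counts": "SUMMARY",
    "unexpected_index_list": "COMPLETE",
    "unexpected_index_query": "COMPLETE",
    "unexpected_list": "COMPLETE",
}


def minimum_level_for(fields):
    """Return the least verbose level whose results include every requested field."""
    needed = max((LEVELS.index(FIRST_LEVEL_FOR_FIELD[f]) for f in fields), default=0)
    return LEVELS[needed]


print(minimum_level_for(["unexpected_count", "partial_unexpected_counts"]))
```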
The following table lists the valid keys for a Result Format dictionary and what their purpose is. Not all keys are used by every verbosity level.
Dictionary key | Purpose |
---|---|
"result_format" | Sets the fields to return in Validation Results. Valid values are "BASIC", "BOOLEAN_ONLY", "COMPLETE", and "SUMMARY". The default value is "SUMMARY". |
"unexpected_index_column_names" | Defines the columns that can be used to identify unexpected results. For example, primary key (PK) column(s) or other columns with unique identifiers. Supports multiple column names as a list. |
"return_unexpected_index_query" | When running validations, a query (or a set of indices) is returned that allows you to retrieve the full set of unexpected results as well as the values of the identifying columns specified in "unexpected_index_column_names". Setting this value to False suppresses the output (default is True). |
"partial_unexpected_count" | Sets the number of results to include in "partial_unexpected_counts", "partial_unexpected_list", and "partial_unexpected_index_list" if applicable. Set the value to zero to suppress the unexpected counts. |
"exclude_unexpected_values" | When running validations, a set of unexpected results' indices and values is returned. Setting this value to True suppresses values from the output to only have indices (default is False). |
"include_unexpected_rows" | When True this returns the entire row for each unexpected value in dictionary form. This setting only applies when "result_format" has been explicitly set to a value other than "BOOLEAN_ONLY". |
include_unexpected_rows returns EVERY row for each unexpected value. In large tables, this could result in an unmanageable amount of data.