checksit.generic
Generic functions to be called by specs.
These functions are intended as the entry points for spec checks. They can perform direct checks (e.g. is a equal to b) or call vocab and rule checks. All functions called by the specs MUST return two lists, errors and warnings, even if one will always be empty, and MUST accept skip_spellcheck as a parameter, even if it is not used.
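A minimal sketch of a function that follows this contract. The function name check_title_length, its max_length parameter and the message wording are hypothetical and used only for illustration; they are not part of checksit.

```python
from typing import Any, Dict, List, Tuple


def check_title_length(
    dct: Dict[str, Dict[str, Any]],
    max_length: int = 80,
    skip_spellcheck: bool = False,  # required by the contract, unused here
) -> Tuple[List[str], List[str]]:
    """Hypothetical direct check on the 'title' global attribute."""
    errors: List[str] = []
    warnings: List[str] = []

    title = dct.get("global_attributes", {}).get("title")
    if title is None:
        errors.append("[global-attributes] 'title' is missing")
    elif len(str(title)) > max_length:
        warnings.append(
            f"[global-attributes] 'title' is longer than {max_length} characters"
        )

    # Always return both lists, even when one of them is empty.
    return errors, warnings
```

Returning a pair of lists in every case lets the caller merge results from many checks without special-casing any of them.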
Functions
| check_defined_only | Checks that only defined global attributes, dimensions and variables are present. |
| check_dim_exists | Check that dimensions exist in file. |
| check_dim_regex | Check dimension exists matching regex. |
| check_file_name | Checks format of NCAS-GENERAL file name. |
| check_generic_file_name | Checks file name against series of vocab checks. |
| check_global_attrs | Run checks against global attributes. |
| check_radar_moment_variables | Finds moment variables in radar file and checks attributes of those variables. |
| check_var | Check variable exists and attributes defined and/or meet rules. |
| check_var_attrs | Check that variable attributes are defined. |
| check_var_exists | Check that variables exist in file. |
| one_spelling_mistake | All edits that are one edit away from word. |
| search_close_match | Find potential misspelt strings. |
| two_spelling_mistakes | All edits that are two edits away from word. |
- checksit.generic.check_defined_only(dct: Dict[str, Dict[str, Any]], all_global_attrs: List[str], all_dimensions: List[str], all_variables: List[str], skip_spellcheck: bool = False)
Checks that only defined global attributes, dimensions and variables are present.
- Parameters:
dct – dictionary of file data, as made by the to_dict() function in each reader class, with “variables”, “dimensions” and “global_attributes” as keys.
all_global_attrs – list of all allowed global attributes.
all_dimensions – list of all allowed dimensions.
all_variables – list of all allowed variables.
- Returns:
A list of errors and a list of warnings
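The idea behind this check can be sketched as a set difference between what the file contains and what is allowed. The helper name find_undefined and the example file contents are hypothetical, and the exact shape of the values inside dct is an assumption; this is not checksit's implementation.

```python
from typing import Any, Dict, List


def find_undefined(
    dct: Dict[str, Dict[str, Any]], section: str, allowed: List[str]
) -> List[str]:
    """Return names present in dct[section] that are not in the allowed list."""
    return sorted(set(dct.get(section, {})) - set(allowed))


# Hypothetical file data with the three keys described above.
file_data = {
    "global_attributes": {"title": "demo", "sorce": "typo"},
    "dimensions": {"time": 10},
    "variables": {"time": {}, "temperature": {}},
}

extra_attrs = find_undefined(file_data, "global_attributes", ["title", "source"])
extra_vars = find_undefined(file_data, "variables", ["time", "temperature"])
```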
- checksit.generic.check_dim_exists(dct: Dict[str, Dict[str, Any]], dimensions: List[str], skip_spellcheck: bool = False) → Tuple[List[str], List[str]]
Check that dimensions exist in file.
Checks a list of dimensions to see if they exist in given file. Optional dimensions can be defined by having “:__OPTIONAL__” after the dimension name. Missing optional dimensions will be returned as warnings, and other missing dimensions will be returned as errors.
- Parameters:
dct – dictionary of file data, as made by the to_dict() function in each reader class, with “dimensions” as a key.
dimensions – list of dimension names to check exist
skip_spellcheck – skip looking for close misspelling of attribute if not found in variable. Default False.
- Returns:
A list of errors and a list of warnings
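The handling of the “:__OPTIONAL__” marker described above could look roughly like the following. The function name dims_exist and the message wording are hypothetical, and the sketch leaves out the spellcheck step; it only illustrates how missing optional and required dimensions end up in different lists.

```python
from typing import Any, Dict, List, Tuple

OPTIONAL_SUFFIX = ":__OPTIONAL__"


def dims_exist(
    dct: Dict[str, Dict[str, Any]], dimensions: List[str]
) -> Tuple[List[str], List[str]]:
    """Report missing dimensions: optional ones as warnings, others as errors."""
    errors: List[str] = []
    warnings: List[str] = []
    present = dct.get("dimensions", {})

    for dim in dimensions:
        optional = dim.endswith(OPTIONAL_SUFFIX)
        name = dim[: -len(OPTIONAL_SUFFIX)] if optional else dim
        if name in present:
            continue
        if optional:
            warnings.append(f"optional dimension '{name}' does not exist")
        else:
            errors.append(f"dimension '{name}' does not exist")

    return errors, warnings
```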
- checksit.generic.check_dim_regex(dct: Dict[str, Dict[str, Any]], regex_dims: List[str], skip_spellcheck: bool = False) → Tuple[List[str], List[str]]
Check dimension exists matching regex.
For each regex string in regex_dims, checks if a dimension exists matching that regex. Optional dimensions can be specified by appending “:__OPTIONAL__” to the end of the regex string.
- Parameters:
dct – dictionary of file data, as made by the to_dict() function in each reader class, with “dimensions” as a key.
regex_dims – list of regex strings to check dimensions for matches.
- Returns:
A list of errors and a list of warnings
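A rough sketch of regex-based dimension matching. Two points are assumptions rather than documented behaviour: that a pattern must match the whole dimension name (re.fullmatch), and that an unmatched optional pattern produces a warning, by analogy with check_dim_exists. The function name dims_match_regex is hypothetical.

```python
import re
from typing import Any, Dict, List, Tuple

OPTIONAL_SUFFIX = ":__OPTIONAL__"


def dims_match_regex(
    dct: Dict[str, Dict[str, Any]], regex_dims: List[str]
) -> Tuple[List[str], List[str]]:
    """For each pattern, require at least one dimension name that matches it."""
    errors: List[str] = []
    warnings: List[str] = []
    names = list(dct.get("dimensions", {}))

    for entry in regex_dims:
        optional = entry.endswith(OPTIONAL_SUFFIX)
        pattern = entry[: -len(OPTIONAL_SUFFIX)] if optional else entry
        # Assumption: the whole dimension name has to match the pattern.
        if any(re.fullmatch(pattern, name) for name in names):
            continue
        message = f"no dimension matching regex '{pattern}'"
        (warnings if optional else errors).append(message)

    return errors, warnings
```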
- checksit.generic.check_file_name(file_name: str, vocab_checks: Dict[str, str] | None = None, rule_checks: Dict[str, str] | None = None, skip_spellcheck: bool = False) → Tuple[List[str], List[str]]
Checks format of NCAS-GENERAL file name.
Checks format of NCAS-GENERAL file name is correct. Requires vocab checks for “instrument” and “data_product”, plus rule_check for “platform”, to be defined.
- Parameters:
file_name – Name of NCAS-GENERAL file.
vocab_checks – Dictionary with “instrument” and “data_product” as keys, and vocabs for each as values.
rule_checks – Dictionary with “platform” as key, and rule check for platform as value.
skip_spellcheck – skip looking for close misspelling of attribute if not found in variable. Default False.
- Returns:
A list of errors and a list of warnings
- checksit.generic.check_generic_file_name(file_name: str, vocab_checks: Dict[str, str] | None = None, segregator: Dict[str, str] | None = None, extension: Dict[str, str] | None = None, spec_verbose: Dict[str, str] | None = None, skip_spellcheck: bool = False) → Tuple[List[str], List[str]]
Checks file name against series of vocab checks.
For a given file_name, splits name into parts based on the segregator and checks each part based on vocab_checks.
- Parameters:
file_name – Name of the file to check.
vocab_checks – Dictionary of vocab checks for each part of the file name. Keys must be “field00”, “field01”, etc., and the value for each key is the vocab check for that section.
segregator – Character on which to split the file name. Should be dictionary with key “seg” and value being the character to separate on. Default segregator is “_”.
extension – File extension. Should be dictionary with key “ext” and value being the file extension. Default file extension is “.test”.
spec_verbose – Print additional information. Can be defined in the spec file, which gets passed through as dictionary. Should have key “spec_verb” and value True/False.
skip_spellcheck – skip looking for close misspelling of attribute if not found in variable. Default False.
- Returns:
A list of errors and a list of warnings
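To illustrate how the parts of a file name could line up with the “field00”, “field01”, ... keys, here is a minimal sketch. The helper name split_file_name is hypothetical, and stripping the extension before splitting is an assumption; only the “seg” and “ext” keys and their defaults come from the description above.

```python
from typing import Dict, Optional


def split_file_name(
    file_name: str,
    segregator: Optional[Dict[str, str]] = None,
    extension: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Map each part of the file name to a 'fieldNN' key."""
    seg = (segregator or {}).get("seg", "_")
    ext = (extension or {}).get("ext", ".test")

    # Assumption: the extension is removed before the name is split.
    if ext and file_name.endswith(ext):
        stem = file_name[: -len(ext)]
    else:
        stem = file_name

    return {f"field{i:02d}": part for i, part in enumerate(stem.split(seg))}


fields = split_file_name("inst_site_20240101.nc", extension={"ext": ".nc"})
```

Each value in fields would then be checked against the entry in vocab_checks that has the same key.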
- checksit.generic.check_global_attrs(dct: Dict[str, Dict[str, Any]], defined_attrs: List[str] | None = None, vocab_attrs: Dict[str, str] | None = None, regex_attrs: Dict[str, str] | None = None, rules_attrs: Dict[str, str] | None = None, skip_spellcheck: bool = False) → Tuple[List[str], List[str]]
Run checks against global attributes.
- Run series of checks against global attributes in file. Can check for any or all of:
defined_attrs (i.e. does the attribute exist),
vocab_attrs (i.e. does the value of the attribute match value defined in controlled vocabulary),
regex_attrs (i.e. does the value of the attribute match a regular expression),
rules_attrs (i.e. does the attribute value pass a defined rule).
- Parameters:
dct – dictionary of file data, as made by the to_dict() function in each reader class, with “global_attributes” as a key.
defined_attrs – list of attributes to check exist and are defined.
vocab_attrs – dictionary with attribute to check as keys and vocab rule to check against as value.
regex_attrs – dictionary with attribute to check as keys and regex rule to check against as value.
rules_attrs – dictionary with attribute to check as keys and rule to check against, and any options needed, as string value (e.g. “rule-func:string-of-length:3+”). See documentation on the check function in the Rules class for more information on formatting.
skip_spellcheck – skip looking for close misspelling of attribute if not found in variable. Default False.
- Returns:
A list of errors and a list of warnings
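A simplified sketch of two of these checks, defined_attrs and regex_attrs. The function name check_attrs and the message wording are hypothetical. Treating an empty string as "not defined" and requiring a full-string regex match are both assumptions, and the vocab and rule checks are left out because they depend on other parts of checksit.

```python
import re
from typing import Any, Dict, List, Optional, Tuple


def check_attrs(
    dct: Dict[str, Dict[str, Any]],
    defined_attrs: Optional[List[str]] = None,
    regex_attrs: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], List[str]]:
    """Check that attributes exist with a value, and that values match a regex."""
    errors: List[str] = []
    warnings: List[str] = []
    attrs = dct.get("global_attributes", {})

    for name in defined_attrs or []:
        if name not in attrs:
            errors.append(f"attribute '{name}' does not exist")
        elif str(attrs[name]).strip() == "":
            # Assumption: an empty value counts as not defined.
            errors.append(f"attribute '{name}' has no value")

    for name, pattern in (regex_attrs or {}).items():
        if name not in attrs:
            errors.append(f"attribute '{name}' does not exist")
        elif not re.fullmatch(pattern, str(attrs[name])):
            errors.append(f"attribute '{name}' does not match '{pattern}'")

    return errors, warnings
```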
- checksit.generic.check_radar_moment_variables(dct: Dict[str, Dict[str, Any]], exist_attrs: List[str] | None = None, rule_attrs: Dict[str, str] | None = None, one_of_attrs: List[str] | None = None, skip_spellcheck: bool = False) → Tuple[List[str], List[str]]
Finds moment variables in radar file and checks attributes of those variables.
Finds all the moment variables in a radar file based on the existence of the “coordinates” attribute. For each of those variables, checks that all the attributes listed in “exist_attrs” exist, that all of the rules listed in “rule_attrs” are met, and that one of the attributes in each string in “one_of_attrs” is defined.
- Parameters:
dct – dictionary of file data, as made by the to_dict() function in each reader class, with “global_attributes” as a key.
exist_attrs – list of attributes to check exist.
rule_attrs – dictionary with attribute to check as keys and rule to check against, and any options needed, as string value (e.g. “rule-func:string-of-length:3+”). See documentation on the check function in the Rules class for more information on formatting.
one_of_attrs – list of attribute choices. Each string in the list should have a number of attributes separated by “|”, and one of those attributes in each string should be present as an attribute in each variable.
skip_spellcheck – skip looking for close misspelling of attribute if not found in variable. Default False.
- Returns:
A list of errors and a list of warnings
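The selection of moment variables and the “|”-separated one_of_attrs choices can be sketched as follows. Both helper names are hypothetical, and the sketch assumes that each variable in dct["variables"] is a dictionary with its attributes stored directly as keys, which may differ from what the reader classes produce.

```python
from typing import Any, Dict, List


def find_moment_variables(dct: Dict[str, Dict[str, Any]]) -> List[str]:
    """Names of variables that carry a 'coordinates' attribute."""
    return [
        name
        for name, var in dct.get("variables", {}).items()
        if isinstance(var, dict) and "coordinates" in var
    ]


def missing_one_of(var_attrs: Dict[str, Any], one_of_attrs: List[str]) -> List[str]:
    """Return the choice strings for which none of the options is present."""
    missing = []
    for choice in one_of_attrs:
        if not any(option in var_attrs for option in choice.split("|")):
            missing.append(choice)
    return missing


# Hypothetical radar file data.
radar = {
    "variables": {
        "DBZ": {"coordinates": "elevation azimuth range", "standard_name": "x"},
        "time": {"units": "seconds since 1970-01-01"},
    }
}
moments = find_moment_variables(radar)
unmet = missing_one_of(
    radar["variables"]["DBZ"], ["standard_name|proposed_standard_name", "units"]
)
```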
- checksit.generic.check_var(dct: Dict[str, Dict[str, Any]], variable: str | List[str], defined_attrs: List[str], rules_attrs: Dict[str, str] | None = None, additional_attrs_allowed: bool = True, skip_spellcheck: bool = False) → Tuple[List[str], List[str]]
Check variable exists and attributes defined and/or meet rules.
For a given variable, check that it exists, that all defined_attrs exist as variable attributes, and that all rules_attrs are met for variable attributes. A variable can be marked as optional by appending “:__OPTIONAL__” to the variable name. If an optional variable does not exist, that message is returned as a warning; all other messages are returned as errors.
- Parameters:
dct – dictionary of file data, as made by the to_dict() function in each reader class, with “global_attributes” as a key.
variable – variable to check. If list, only first variable is checked.
defined_attrs – list of attributes to check exist and are defined.
rules_attrs – dictionary with attribute to check as keys and rule to check against, and any options needed, as string value (e.g. “rule-func:string-of-length:3+”). See documentation on the check function in the Rules class for more information on formatting.
additional_attrs_allowed – if False, will return an error if variable has any attributes not defined in defined_attrs or rules_attrs. Default True.
skip_spellcheck – skip looking for close misspelling of attribute if not found in variable. Default False.
- Returns:
A list of errors and a list of warnings
- checksit.generic.check_var_attrs(dct: Dict[str, Dict[str, Any]], defined_attrs: List[str], ignore_bounds: bool = True, skip_spellcheck: bool = False) → Tuple[List[str], List[str]]
Check that variable attributes are defined.
Checks that all given attributes are defined for all variables in file.
- Parameters:
dct – dictionary of file data, as made by the to_dict() function in each reader class, with “variables” as a key.
defined_attrs – list of attributes to check exist in each variable in dct.
ignore_bounds – ignore checking attributes in boundary variables. Default True.
skip_spellcheck – skip looking for close misspelling of attribute if not found in variable. Default False.
- Returns:
A list of errors and a list of warnings
- checksit.generic.check_var_exists(dct: Dict[str, Dict[str, Any]], variables: List[str], skip_spellcheck: bool = False) → Tuple[List[str], List[str]]
Check that variables exist in file.
Checks a list of variables to see if they exist in given file. Optional variables can be defined by having “:__OPTIONAL__” after the variable name. Missing optional variables will be returned as warnings, and other missing variables will be returned as errors.
- Parameters:
dct – dictionary of file data, as made by the to_dict() function in each reader class, with “variables” as a key.
variables – list of variable names to check exist
skip_spellcheck – skip looking for close misspelling of attribute if not found in variable. Default False.
- Returns:
A list of errors and a list of warnings
- checksit.generic.one_spelling_mistake(word: str) → Set[str]
All edits that are one edit away from word.
Part of spell checking. Finds all possible strings that have one error in them, for example one character missing, one extra character, two characters switched positions, or one character replaced with another. Letters are considered to be lower case a-z, digits 0-9, and the characters ., _, and -. Adapted from https://norvig.com/spell-correct.html
- Parameters:
word – string to find all single edits from.
- Returns:
Set of all possible single edits from word.
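The single-edit generation described above follows the pattern of the edits1 function from the linked Norvig article. The sketch below applies that pattern to the extended alphabet; the name single_edits is hypothetical and the code is not copied from checksit.

```python
import string
from typing import Set

# Lower case a-z, digits 0-9, and the characters '.', '_' and '-'.
LETTERS = string.ascii_lowercase + string.digits + "._-"


def single_edits(word: str) -> Set[str]:
    """All strings that are one edit away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [
        left + char + right[1:] for left, right in splits if right for char in LETTERS
    ]
    inserts = [left + char + right for left, right in splits for char in LETTERS]
    return set(deletes + transposes + replaces + inserts)
```

Because a character can be replaced with itself, the returned set also contains the original word.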
- checksit.generic.search_close_match(search_for: str, search_in: Iterable[str]) → str
Find potential misspelt strings.
Search within search_in to identify a string that is close to search_for as a potential misspelling.
- Parameters:
search_for – correctly spelt string to search against.
search_in – list of strings to search within for potentially misspelt string.
- Returns:
String with message if potential misspelling found, otherwise empty string.
- checksit.generic.two_spelling_mistakes(word: str) → Set[str]
All edits that are two edits away from word.
Part of spell checking, finds all possible strings that have two errors in them, taking the results from one_spelling_mistake(word) and checking for one spelling mistake in all those values. From https://norvig.com/spell-correct.html
- Parameters:
word – string to find all double edits from.
- Returns:
Set of all possible double edits from word.
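Double edits can be built by applying the single-edit step to every result of the single-edit step, as in edits2 from the linked Norvig article. The names single_edits and double_edits are hypothetical; the single-edit helper is repeated here in compact form so that the sketch is self-contained.

```python
import string
from typing import Set

LETTERS = string.ascii_lowercase + string.digits + "._-"


def single_edits(word: str) -> Set[str]:
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    edits = set()
    for left, right in splits:
        if right:
            edits.add(left + right[1:])  # delete one character
            edits.update(left + c + right[1:] for c in LETTERS)  # replace
        if len(right) > 1:
            edits.add(left + right[1] + right[0] + right[2:])  # swap neighbours
        edits.update(left + c + right for c in LETTERS)  # insert
    return edits


def double_edits(word: str) -> Set[str]:
    """All strings reachable by applying two single edits to word."""
    return {second for first in single_edits(word) for second in single_edits(first)}


# Names found in a hypothetical file that are within two edits of "time".
candidates = {"tm", "tim3s", "xyz"} & double_edits("time")
```

The set grows quickly with word length, so in practice it is usually intersected with a small collection of known names, as in the last line.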