Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add settings validation docs #1648

Merged
merged 6 commits into from
Nov 8, 2023
Merged

Conversation

ThomasHepworth
Copy link
Contributor

Type of PR

  • BUG
  • FEAT
  • MAINT
  • DOC

Is your Pull Request linked to an existing Issue or Pull Request?

#1252

Give a brief description for the solution you have provided

Adds documentation to help in the process of extending our existing settings validation code.

These docs cover:

  • The settings schema and some basic information on how to update it
  • Our additional validation checks, giving an overview of the current code and information on how to extend the existing checks.

PR Checklist

  • Added documentation for changes
  • Added feature to example notebooks or tutorial (if appropriate)
  • Added tests (if appropriate)
  • Made changes based off the latest version of Splink
  • Run the linter

@RossKen
Copy link
Contributor

RossKen commented Oct 17, 2023

Can we add a link and one liner about this to the index.md in dev guides?

To extend the column checks, you simply need to add an additional validation method to the `InvalidColValidator` class. The implementation of this will vary depending on whether you merely need to assess the validity of a single column or columns within a SQL statement.

#### Single Column Checks
To assess single columns, the `validate_settings_column` should typically be used. This takes in a `setting_id` (analogous to the title you want to give your log string) and a list of columns to be checked.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should valid_settings_column have a link? Also, it is not clear what it is. A function? A class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, here I think it may be helpful to state the context at the start i.e. Checks are applied on all columns by default, but there is functionality available to apply more specific checks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it helpful here to refer to as "specific column checks" or "named column checks" as this mentions being able to refer to multiple columns, not just one? So I'm not sure if "single column checks" is the clearest naming. However I may be misunderstanding!

Copy link
Contributor

@RossKen RossKen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really helpful content - just a couple of stylistic/wording changes

Copy link
Contributor

@RossKen RossKen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really helpful content - just a couple of stylistic/wording changes

Base automatically changed from settings_validation_refactor_and_improved_logging to master October 23, 2023 12:54
@@ -29,4 +29,4 @@ Splink is quite a large, complex codebase. The guides in this section lay out so
* [Comparison and Comparison Level Libraries](./comparisons/new_library_comparisons_and_levels.md) - demonstrates how `Comparison` Library and `ComparisonLevel` Library functions are structured within Splink, including how to add new functions and edit existing functions.
* [Charts](./charts.ipynb) - demonstrates how charts are built in Splink, including how to add new charts and edit existing charts.
* [User-Defined Functions](./udfs.md) - demonstrates how User Defined Functions (UDFs) are used to provide functionality within Splink that is not native to a given SQL backend.

* [Settings Validation](./settings_validation.md) - summarises how to use and expand the existing settings schema and validation functions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link is broken

Copy link
Contributor

@RossKen RossKen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making those changes - I think it is much clearer now. 🙌

Approving, but have left some minor comments to tweak before merging

@ThomasHepworth ThomasHepworth merged commit d673eaa into master Nov 8, 2023
2 checks passed
@ThomasHepworth ThomasHepworth deleted the settings_validation_docs branch November 8, 2023 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants