implementing QC on delayed mode data sets #355

Open
leilabbb opened this issue Jun 25, 2024 · 11 comments

@leilabbb
Contributor

leilabbb commented Jun 25, 2024

Data providers or data users should be directly involved in implementing QC on delayed-mode datasets. They should be responsible for submitting their QC results to the DAC, stating transparently which library they used (ioos_qc or another) and the process they followed.

Having the Glider DAC implement QC on delayed-mode datasets requires adding a human-in-the-loop capacity. Otherwise, unchecked QC results will always be less than 50% complete and not ready for users.

@kerfoot
Contributor

kerfoot commented Jul 15, 2024

A few questions related to this:

  1. You mention a human in the loop for delayed mode data sets. Does this mean that the QC results from the real-time data sets are being sanity checked? If so, how are they being checked? How would this be any different from the delayed-mode data sets sanity checks?
  2. If the DAC is not going to apply QC to delayed mode data sets, can you please remove the empty QC variables from the XML used to serve the delayed-mode data sets? Including these variables, which will always be empty, will only cause confusion for end users. This means that the template being used for real-time data sets cannot be the same template used for delayed mode data sets. For example, this delayed mode dataset has a number of QC variables that start with qartod*_, but the variables are empty (not even _FillValue). So there is no need to include these variables in the delayed mode data sets as they will never contain any useful information and would likely just cause confusion as to which QC results should be used.
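One way to spot the problem described in point 2 is to scan a dataset for QC variables that hold no values at all. The following is a minimal sketch, not the DAC's actual tooling: it assumes the variables have already been read into a plain mapping of name to values (for example from a NetCDF or ERDDAP client), with `None` standing in for a missing value. The function name `find_empty_qc_variables`, the `qartod_` prefix default, and the sample data are all hypothetical.

```python
from typing import Mapping, Optional, Sequence


def find_empty_qc_variables(
    variables: Mapping[str, Sequence[Optional[float]]],
    prefix: str = "qartod_",
) -> list[str]:
    """Return the names of prefixed QC variables that contain no usable values."""
    empty = []
    for name, values in variables.items():
        if not name.startswith(prefix):
            continue
        # Treat a variable as empty when it has no entries
        # or when every entry is missing (None).
        if all(value is None for value in values):
            empty.append(name)
    return sorted(empty)


# Hypothetical delayed-mode dataset: names and values are illustrative only.
dataset = {
    "conductivity": [3.1, 3.2, 3.3],
    "qartod_conductivity_spike_flag": [None, None, None],
    "qartod_temperature_spike_flag": [],
    "conductivity_qartod_spike_test": [1, 1, 4],
}

print(find_empty_qc_variables(dataset))
```

A report like this could be attached to a delayed-mode submission so that it is clear which QC variables carry results and which are placeholders.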

@lgarzio

lgarzio commented Jul 25, 2024

@leilabbb - I'm ready to push several delayed-mode datasets to the DAC that have our QC variables in them. What's the status of removing empty DAC QC variables from delayed-mode XMLs so the QC variables aren't duplicated?

@leilabbb
Contributor Author

Please see my comments below:

A few questions related to this:

  1. You mention a human in the loop for delayed mode data sets. Does this mean that the QC results from the real-time data sets are being sanity checked? If so, how are they being checked? How would this be any different from the delayed-mode data sets sanity checks?

The real-time data sets are not currently undergoing sanity checks. The quick filtering process should give the GTS a reliable subset of the data. This process is still incomplete because we have encountered problems.

My commentary above was about answering the following questions:

  • How useful are automated flags to end-users?
  • Should the data providers' QC flags be prioritized in the DAC, provided that effort has been invested in generating them?

  2. If the DAC is not going to apply QC to delayed mode data sets, can you please remove the empty QC variables from the XML used to serve the delayed-mode data sets? Including these variables, which will always be empty, will only cause confusion for end users. This means that the template being used for real-time data sets cannot be the same template used for delayed mode data sets. For example, this delayed mode dataset has a number of QC variables that start with qartod*_, but the variables are empty (not even _FillValue). So there is no need to include these variables in the delayed mode data sets as they will never contain any useful information and would likely just cause confusion as to which QC results should be used.

Correct, changes will be applied to fix the final product. Discussions are underway to address the issues and implement the necessary improvements.

@leilabbb
Contributor Author

@leilabbb - I'm ready to push several delayed-mode datasets to the DAC that have our QC variables in them. What's the status of removing empty DAC QC variables from delayed-mode XMLs so the QC variables aren't duplicated?

Hey Lori, submitting the delayed-mode data should not pose any issues. I recommend using QC variable names that are distinct from those the GDAC uses for QARTOD, to avoid any potential confusion.

@lgarzio

lgarzio commented Jul 26, 2024

@leilabbb Thanks Leila! Our QC variable names are different from the GDAC's QC variable names. However, I still think it will be confusing for users if the files contain QC variables that are empty (added by the GDAC, e.g. qartod_conductivity_spike_flag) alongside additional QC variables with slightly different names that aren't empty (from us, e.g. conductivity_qartod_spike_test). That just makes the variable lists in these datasets unnecessarily long and confusing. So just to clarify, if I submit a delayed mode dataset, is the GDAC still adding empty QC variables to the dataset?
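To make the overlap concrete, here is a rough sketch that pairs GDAC-style and provider-style QC variable names covering the same parameter and test. The two naming patterns are inferred only from the two example names in this comment (`qartod_<parameter>_<test>_flag` and `<parameter>_qartod_<test>_test`), so they are assumptions, as are the helper name `find_overlapping_qc_names` and the short list of test names.

```python
import re

# Assumed set of QARTOD test names; extend as needed.
TESTS = "spike|gross_range|flat_line|rate_of_change|climatology"

# Patterns inferred from the two example names in this thread (assumptions).
GDAC_PATTERN = re.compile(rf"^qartod_(?P<param>.+)_(?P<test>{TESTS})_flag$")
PROVIDER_PATTERN = re.compile(rf"^(?P<param>.+)_qartod_(?P<test>{TESTS})_test$")


def find_overlapping_qc_names(names: list[str]) -> list[tuple[str, str]]:
    """Pair GDAC-style and provider-style names for the same parameter and test."""
    gdac = {}
    provider = {}
    for name in names:
        match = GDAC_PATTERN.match(name)
        if match:
            gdac[(match["param"], match["test"])] = name
            continue
        match = PROVIDER_PATTERN.match(name)
        if match:
            provider[(match["param"], match["test"])] = name
    # Keep only the (parameter, test) keys present under both conventions.
    shared_keys = sorted(gdac.keys() & provider.keys())
    return [(gdac[key], provider[key]) for key in shared_keys]


names = [
    "conductivity",
    "qartod_conductivity_spike_flag",
    "conductivity_qartod_spike_test",
    "qartod_temperature_spike_flag",
]
print(find_overlapping_qc_names(names))
```

Each pair it reports is a place where a user would have to guess which of the two variables holds the QC result to trust.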

@leilabbb
Contributor Author

@leilabbb Thanks Leila! Our QC variable names are different from the GDAC's QC variable names. However, I still think it will be confusing for users if the files contain QC variables that are empty (added by the GDAC, e.g. qartod_conductivity_spike_flag) alongside additional QC variables with slightly different names that aren't empty (from us, e.g. conductivity_qartod_spike_test). That just makes the variable lists in these datasets unnecessarily long and confusing. So just to clarify, if I submit a delayed mode dataset, is the GDAC still adding empty QC variables to the dataset?

Yes, that’s likely. The QARTOD GDAC variables will stay on the list of variables for delayed mode data sets until the necessary changes are implemented.
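For illustration only, the change could amount to filtering the QARTOD entries out of the XML that serves delayed-mode data sets. The sketch below assumes an ERDDAP-style `<dataset>` block with `<dataVariable>`/`<sourceName>` children; the DAC's real template is not shown in this thread, and the `datasetID`, the sample variables, and the helper name `drop_variables_with_prefix` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed ERDDAP-style fragment; not the DAC's actual template.
DATASET_XML = """
<dataset type="EDDTableFromNcFiles" datasetID="example-delayed">
    <dataVariable>
        <sourceName>conductivity</sourceName>
        <destinationName>conductivity</destinationName>
    </dataVariable>
    <dataVariable>
        <sourceName>qartod_conductivity_spike_flag</sourceName>
        <destinationName>qartod_conductivity_spike_flag</destinationName>
    </dataVariable>
    <dataVariable>
        <sourceName>conductivity_qartod_spike_test</sourceName>
        <destinationName>conductivity_qartod_spike_test</destinationName>
    </dataVariable>
</dataset>
"""


def drop_variables_with_prefix(xml_text: str, prefix: str = "qartod_") -> ET.Element:
    """Remove <dataVariable> entries whose <sourceName> starts with the prefix."""
    root = ET.fromstring(xml_text.strip())
    # Iterate over a copy of the child list because elements are removed in the loop.
    for variable in list(root.findall("dataVariable")):
        source_name = variable.findtext("sourceName", default="")
        if source_name.startswith(prefix):
            root.remove(variable)
    return root


cleaned = drop_variables_with_prefix(DATASET_XML)
remaining = [v.findtext("sourceName") for v in cleaned.findall("dataVariable")]
print(remaining)
```

Because the filter keys on the `qartod_` prefix, provider-supplied variables such as conductivity_qartod_spike_test are left in place.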

@lgarzio

lgarzio commented Jul 26, 2024

Ok, if I submit these datasets now, will the empty QARTOD GDAC variables be removed from the datasets when that change is implemented? Or will the change only be implemented for datasets moving forward after the change is made?

@leilabbb
Contributor Author

Ok, if I submit these datasets now, will the empty QARTOD GDAC variables be removed from the datasets when that change is implemented? Or will the change only be implemented for datasets moving forward after the change is made?

The changes will be applied to all files including the ones already in the system.

@leilabbb leilabbb self-assigned this Aug 13, 2024
@leilabbb leilabbb added the QC label Aug 13, 2024
@leilabbb
Contributor Author

leilabbb commented Aug 13, 2024

@sarinamann-noaa

Requesting review by @kbailey-noaa for confirmation on this process. Can be closed per KB input.

@kbailey-noaa
Contributor

Publish guidance on variable names, etc., for other users to follow as they attempt to submit delayed mode data sets.


5 participants