
Data Flow #27

Open
Patowhiz opened this issue Mar 15, 2024 · 1 comment

Comments

@Patowhiz
Collaborator

Patowhiz commented Mar 15, 2024

Overview:

This proposal aims to standardise the data flow within Climsoft from initial entry to the generation of final products, emphasizing the need for consistent data source identification, unified storage, robust QC checks, and transparent logging for auditability.

Detailed Description:

Data Ingestion:
Data is ingested through three source types, each of which defines an ingestion method:

  • Forms: Allow users to manually input data via forms, capturing real-time observations.
  • Machine: Enable automated data capture from instruments and sensors.
  • Import: Provide functionality for batch imports of data from external sources.

Each entry method must clearly record the source of the data to ensure traceability, and every data source is associated with one of these source types.
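A minimal sketch of how the three source types, and the association of each data source with one of them, could be modelled. The names `SourceType` and `DataSource` and the example sources are hypothetical and are not taken from the Climsoft codebase.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    """The three ingestion methods listed above."""
    FORM = "form"        # manual entry through forms
    MACHINE = "machine"  # automated capture from instruments and sensors
    IMPORT = "import"    # batch import from external sources


@dataclass(frozen=True)
class DataSource:
    """A named data source; each source belongs to exactly one source type."""
    source_id: int
    name: str
    source_type: SourceType


# Hypothetical example sources.
daily_form = DataSource(1, "Daily synoptic form", SourceType.FORM)
csv_import = DataSource(2, "Historical CSV import", SourceType.IMPORT)
```

Because every source carries a `source_type`, a traceability question such as "which data came in through forms?" reduces to a filter on that field.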

Observations Table:

  • Centralise data storage by saving entries from all sources into one Observations table, maintaining data in its original form.
  • Ensure that the Observations table structure is conducive to identifying and querying the data source.
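The single Observations table could look roughly like the SQLite sketch below; the table and column names are assumptions for illustration. Two choices reflect the bullets above: `source_id` is `NOT NULL`, so every row identifies its source, and it is part of the primary key, so the same observation arriving from a form and from an import can both be kept in their original form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
CREATE TABLE sources (
    source_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    source_type TEXT NOT NULL CHECK (source_type IN ('form', 'machine', 'import'))
);
CREATE TABLE observations (
    station_id   TEXT    NOT NULL,
    element_id   INTEGER NOT NULL,
    obs_datetime TEXT    NOT NULL,
    value        REAL,
    source_id    INTEGER NOT NULL REFERENCES sources(source_id),
    PRIMARY KEY (station_id, element_id, obs_datetime, source_id)
);
""")

conn.executemany("INSERT INTO sources VALUES (?, ?, ?)", [
    (1, "Daily synoptic form", "form"),
    (2, "Historical CSV import", "import"),
])
# The same station/element/time is stored once per source, values untouched.
conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?)", [
    ("ST001", 101, "2024-03-15T06:00", 24.3, 1),
    ("ST001", 101, "2024-03-15T06:00", 24.1, 2),
    ("ST002", 101, "2024-03-15T06:00", 19.8, 2),
])

# Which observations came in through imports?
rows = conn.execute("""
    SELECT o.station_id, o.value
    FROM observations AS o
    JOIN sources AS s ON s.source_id = o.source_id
    WHERE s.source_type = 'import'
    ORDER BY o.station_id
""").fetchall()
print(rows)  # [('ST001', 24.1), ('ST002', 19.8)]
```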

Quality Control (QC) Protocol:

  • Establish a comprehensive QC protocol that scrutinises data for accuracy and consistency.
  • Make corrections directly within the Observations table, so that data integrity is improved in real time.
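As an illustration of a QC check whose corrections are applied to the stored observation itself, here is a simple range test. The limits, status values and element ID are hypothetical, and a real QC protocol would typically combine several tests (range, step, internal consistency and so on).

```python
from dataclasses import dataclass


@dataclass
class Observation:
    station_id: str
    element_id: int
    value: float
    qc_status: str = "pending"  # hypothetical statuses: pending / passed / failed


# Hypothetical plausible limits per element: element_id -> (lower, upper).
LIMITS = {101: (-40.0, 60.0)}  # e.g. air temperature in °C


def range_check(obs: Observation) -> bool:
    """True when the value lies within the element's limits."""
    lower, upper = LIMITS[obs.element_id]
    return lower <= obs.value <= upper


def run_qc(observations: list) -> list:
    """Set each observation's QC status in place and return those that failed."""
    for obs in observations:
        obs.qc_status = "passed" if range_check(obs) else "failed"
    return [obs for obs in observations if obs.qc_status == "failed"]


observations = [Observation("ST001", 101, 24.3), Observation("ST002", 101, 243.0)]
failed = run_qc(observations)  # the 243.0 reading fails the range check

failed[0].value = 24.3         # correction made on the stored record itself
run_qc(observations)           # re-running QC now passes both readings
```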

Logging and Audit Trails:

  • Create a robust logging system that captures every action taken on the data, including QC checks and edits.
  • Ensure that data change logs and QC test logs are transparent and easily retrievable for audit purposes.
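A logging system along these lines could record every QC test and edit as an append-only entry, so that the full history of an observation can be retrieved during an audit. The `AuditLog` class, its field names and the user names are assumptions made for this sketch; in practice the log would live in a database table rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class LogEntry:
    """One immutable audit record; the field names are illustrative."""
    timestamp: datetime
    user: str
    action: str             # e.g. "qc_test" or "edit"
    observation_key: tuple  # identifies the observation, e.g. (station, element, datetime)
    old_value: Any = None
    new_value: Any = None
    comment: str = ""


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, user: str, action: str, key: tuple,
               old: Any = None, new: Any = None, comment: str = "") -> LogEntry:
        """Append a new entry; this class never modifies or removes existing entries."""
        entry = LogEntry(datetime.now(timezone.utc), user, action, key, old, new, comment)
        self.entries.append(entry)
        return entry

    def history(self, key: tuple) -> list:
        """Every action taken on one observation, in the order it was recorded."""
        return [e for e in self.entries if e.observation_key == key]


log = AuditLog()
key = ("ST002", 101, "2024-03-15T06:00")
log.record("qc_service", "qc_test", key, comment="range check failed")
log.record("jdoe", "edit", key, old=243.0, new=24.3, comment="misplaced decimal point")
for e in log.history(key):
    print(e.timestamp.isoformat(), e.user, e.action, e.old_value, e.new_value, e.comment)
```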

Final Product Generation:

  • Define a clear pathway for data to be classified as 'final' post-QC for use in Climsoft's product generation.
  • Emphasize that final products are based on the highest quality, QC-verified observations.
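One way to make the 'final' label explicit is to derive it from the QC results: an observation becomes final only when every required test has run and passed, and product generation reads only final observations. The test names and the all-tests-pass rule are assumptions for illustration, not agreed Climsoft criteria.

```python
from dataclasses import dataclass, field

# Hypothetical set of QC tests that must pass before data can be labelled final.
REQUIRED_TESTS = ("range", "step", "internal_consistency")


@dataclass
class ObservationQC:
    value: float
    test_results: dict = field(default_factory=dict)  # test name -> True (passed) / False (failed)
    final: bool = False


def mark_final(obs: ObservationQC) -> bool:
    """Label the observation final only if every required test has run and passed."""
    obs.final = all(obs.test_results.get(test) is True for test in REQUIRED_TESTS)
    return obs.final


def product_inputs(observations: list) -> list:
    """Product generation consumes final observations only."""
    return [obs.value for obs in observations if obs.final]


complete = ObservationQC(24.3, {"range": True, "step": True, "internal_consistency": True})
partial = ObservationQC(19.8, {"range": True})  # step and consistency tests not yet run
for obs in (complete, partial):
    mark_final(obs)
print(product_inputs([complete, partial]))  # [24.3]
```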

Proposal for Enhancements:

  1. Streamlined Data Entry:

    • Formalize data entry procedures that require source identification for every data input.
  2. Quality Control Reinforcement:

    • Implement a unified QC system that is both rigorous and standardized across all data types.
  3. Auditability and Transparency:

    • Develop an enhanced logging system for full transparency and accountability of data modifications and QC results.
  4. Finality in Product Creation:

    • Introduce criteria within Climsoft to determine and label data as 'final' for the production of climatological outputs.

Rationale:

The integrity of Climsoft's data and the trust in its climatological products hinge on a clear, accountable, and verifiable data management process. This proposal seeks to reinforce these aspects, ensuring Climsoft remains a reliable and authoritative tool for meteorological and hydrological data processing.

Request for Team Feedback:

I request feedback from the development community to refine this proposal. Contributions from the development team are essential to the successful enhancement of Climsoft's data workflow.


Patowhiz changed the title from "Data Flow in Climsoft" to "Climsoft Data Flow" on Mar 18, 2024
Patowhiz changed the title from "Climsoft Data Flow" to "Data Flow" on Mar 30, 2024
@Patowhiz
Collaborator Author

Patowhiz commented Jun 3, 2024

After reflecting on this, I think we can assume that there are only two primary sources: Form and Import.
Import covers data that comes from a file (retrieved through HTTP, FTP, etc.) or from an API.

This means there is no need for the Machine/Digital data source. Note also that automatic stations record data and save it to their data loggers, and it is from these data loggers that we import the data. So there is no need for a machine-to-machine concept.
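Under this revised model the source types reduce to two, and the transport (a file fetched over HTTP or FTP, or an API) becomes a property of an import source rather than a separate source type. A possible sketch follows; all names are hypothetical, and the data logger example reflects the point that automatic station data is imported from the loggers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceType(Enum):
    FORM = "form"      # manual entry through forms
    IMPORT = "import"  # data from a file or an API


class ImportChannel(Enum):
    """How an import source is retrieved (illustrative, not exhaustive)."""
    HTTP_FILE = "http_file"
    FTP_FILE = "ftp_file"
    API = "api"


@dataclass(frozen=True)
class DataSource:
    name: str
    source_type: SourceType
    channel: Optional[ImportChannel] = None  # only meaningful for imports

    def __post_init__(self) -> None:
        # Imports must say how they are retrieved; forms have no channel.
        if (self.source_type is SourceType.IMPORT) != (self.channel is not None):
            raise ValueError("an import source needs a channel; a form source must not have one")


# An automatic weather station is not a separate type: its logger files are imported.
aws_logger = DataSource("AWS logger download", SourceType.IMPORT, ImportChannel.FTP_FILE)
daily_form = DataSource("Daily synoptic form", SourceType.FORM)
```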
