Add more cell info checks #7067
base: master
## Walkthrough
This update introduces new quality assurance (QA) checks for the FlowETL pipeline, specifically targeting cell information in the staging phase. It adds checks for missing or invalid latitude and longitude, new and duplicate cell IDs, and a general count of staging table rows. The QA check discovery and DAG creation logic are extended to support stage-specific QA checks.
## Changes
| File(s) | Change Summary |
|-------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql,<br>garbage_lon.sql,<br>garbage_lat_or_lon.sql | Added SQL QA checks for missing/invalid latitude, longitude, or either coordinate in cell info staging tables. |
| flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql,<br>count_duped_ids.sql | Added SQL QA checks for new cell IDs and duplicate cell IDs in cell info staging tables. |
| flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql | Added SQL QA check to count all rows in a staging table. |
| flowetl/flowetl/flowetl/util.py | Introduced ETLStage enum, updated QA check discovery and DAG creation to support stage-specific QA checks and new parameters. |
| flowetl/tests/unit/test_qa_check_discovery.py | Updated and parameterised tests to support and validate multi-stage QA check discovery. |
| flowetl/flowetl/MANIFEST.in | Updated to include deeper directory levels for packaging QA check SQL files. |
| flowetl/flowetl/flowetl/qa_checks/final/*,<br>flowetl/flowetl/flowetl/qa_checks/staging/* | Added or refactored SQL QA checks for various CDR types and stages, including new checks for duplicates, counts, and references. |
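The stage-specific discovery described in the walkthrough can be pictured roughly as follows. This is a minimal sketch, not the code in `util.py`: the `ETLStage` member names, the `discover_qa_checks` function, and the dotted naming of checks are all assumptions made for illustration, and only the `staging/` and `final/` directory names come from the change summary.

```python
import tempfile
from enum import Enum
from pathlib import Path


class ETLStage(Enum):
    """Pipeline stages; member names here are assumptions, not the PR's code."""

    STAGING = "staging"
    EXTRACT = "extract"
    FINAL = "final"


def discover_qa_checks(root: Path, stage: ETLStage) -> dict[str, Path]:
    """Map dotted check names (e.g. 'cell_info.garbage_lat') to SQL files.

    Only files under <root>/<stage>/ are considered, so a check written for
    one stage is not picked up for another.
    """
    stage_dir = root / stage.value
    checks: dict[str, Path] = {}
    for sql_file in sorted(stage_dir.rglob("*.sql")):
        relative = sql_file.relative_to(stage_dir).with_suffix("")
        checks[".".join(relative.parts)] = sql_file
    return checks


def demo() -> list[str]:
    """Build a throwaway tree and list the checks found for the staging stage."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in (
            "staging/cell_info/garbage_lat.sql",
            "staging/count_staging_rows.sql",
            "final/calls/count_imeis.sql",
        ):
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("SELECT 1")
        return sorted(discover_qa_checks(root, ETLStage.STAGING))
```

Calling `demo()` lists only the two staging checks; the file under `final/` is ignored because discovery is scoped to a single stage directory.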
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant DAG
participant Sensor
participant QA_Staging
participant Extract
participant QA_Extract
participant Gather
participant Attach
participant QA_Final
Sensor->>QA_Staging: Run staging QA checks
QA_Staging->>Extract: Proceed if passed
Extract->>QA_Extract: Run extract QA checks
QA_Extract->>Gather: Proceed if passed
Gather->>Attach: Continue DAG
Attach->>QA_Final: Run final QA checks
```
d4a4b3c to 4e5ca24
Actionable comments posted: 8
🔭 Outside diff range comments (11)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imeis.sql (1)

1-7: Use a symbolic link instead of duplicating SQL

This file duplicates the CDR-level `count_null_imeis.sql` logic rather than linking to it. Per the PR objectives, it should be a Git symlink to `../cdr/count_null_imeis.sql` to keep the checks DRY and simplify maintenance.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imsis.sql (1)

1-7: Remove duplicated SQL and rely solely on the delegated script.

This file currently contains both the include reference and inline SQL that duplicates `../cdr/count_null_imsis.sql`. For consistency, strip out lines 2–7 so the stub only references the shared script, and add the standard licence header at the top.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imei.sql (1)

1-19: Remove inline logic and defer entirely to the shared script.

The wrapper duplicates the core query in lines 2–19. Replace this stub with only the reference to `../cdr/max_msisdns_per_imei.sql` and prepend the project’s licence header.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_location_ids.sql (1)

1-5: Consolidate to a single include stub.

Inline SQL in lines 2–5 duplicates `../cdr/count_location_ids.sql` and risks divergence. Remove those lines so the file only contains the include, and add the FlowKit licence header.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1)
2-2: Missing SELECT before aggregation

The line `max(msisdn_count)` needs a leading `SELECT`. Without it, the SQL will error.

Suggested diff:
`- max(msisdn_count)`
`+ SELECT max(msisdn_count)`

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_location_ids.sql (2)

2-2: Missing SELECT before count(*)

The aggregation `count(*)` must be prefixed by `SELECT`. Otherwise, the SQL will fail.

Suggested diff:
`- count(*)`
`+ SELECT count(*)`

12-12: Use proper NOT NULL syntax

Replace `cl.geom_point NOTNULL` with `cl.geom_point IS NOT NULL` to adhere to SQL standards.

Suggested diff:
`- AND cl.geom_point NOTNULL`
`+ AND cl.geom_point IS NOT NULL`

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imsis.sql (1)

2-2: Missing SELECT before count(*)

The line `count(*)` needs to be preceded by `SELECT` for the query to execute.

Suggested diff:
`- count(*)`
`+ SELECT count(*)`

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1)

1-9: Insert project licence header

All new SQL files must include the organisation’s licence header at the top to ensure proper attribution and compliance.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1)
1-14: Inconsistent wrapper: duplicate query logic

This file both references the shared script (`../cdr/count_locatable_location_ids.sql`) and re-implements its SQL inline. To keep the code DRY and maintain consistency with other CDR-type directories, it should be a pure wrapper (or symlink) that includes only the relative path to the `cdr` script.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imei.sql (1)

1-19: Inconsistent wrapper: duplicate query logic

This wrapper references `../cdr/max_msisdns_per_imei.sql` but also contains the full query inline. Wrapper files should exclusively include (or symlink to) the shared CDR script to avoid duplication and simplify maintenance.
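The symlink layout that several of these comments ask for can be sketched as below. This is only an illustration under assumptions: the helper name `make_symlinked_check` is hypothetical, the SQL body is a placeholder rather than the real shared query, and the sketch assumes a filesystem that permits symlinks (creating them on Windows may need extra privileges).

```python
import os
import tempfile
from pathlib import Path


def make_symlinked_check(root: Path, cdr_type: str, check: str) -> Path:
    """Create <root>/<cdr_type>/<check> as a relative symlink to ../cdr/<check>."""
    link = root / cdr_type / check
    link.parent.mkdir(parents=True, exist_ok=True)
    # The target is stored relative to the link's own directory.
    os.symlink(Path("..") / "cdr" / check, link)
    return link


def demo() -> tuple[bool, str]:
    """Return whether the per-type file is a symlink and what reading it yields."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared = root / "cdr" / "count_null_imeis.sql"
        shared.parent.mkdir(parents=True)
        # Placeholder SQL; the real shared query is not shown in the review.
        shared.write_text("SELECT count(*) FROM {{ final_table }} WHERE imei IS NULL")
        link = make_symlinked_check(root, "topups", "count_null_imeis.sql")
        return link.is_symlink(), link.read_text()
```

Reading through the link returns the shared file's content, so there is a single copy of the query to maintain while each CDR type still has its own path.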
♻️ Duplicate comments (13)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imsis.sql (1)

1-7: Duplicate check: wrap CDR logic via symlink

As with the Topups version, this SMS-specific check should be a symlink to `../cdr/count_null_imsis.sql` instead of embedding or referencing the SQL directly.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imeis.sql (1)

1-11: Duplicate check: convert to symlink

The IMEI-count logic is currently inlined here. It ought to be replaced by a symlink pointing to `../cdr/count_imeis.sql` to centralise the query and avoid fragmentation.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imsis.sql (1)

1-7: Duplicate check: wrap CDR logic via symlink

Consistent with other CDR-type checks, this file should link to `../cdr/count_null_imsis.sql` rather than maintaining its own copy of the SQL.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imeis.sql (1)

1-11: Duplicate check: convert to symlink

The MDS-specific IMEI-count is implemented inline here. Replace this with a Git symlink to `../cdr/count_imeis.sql` to unify the logic.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_events.sql (2)
1-9: Duplicate: Explicitly include or symlink CDR-level SQL

Same as in the SMS version: please confirm the include mechanism or switch to a proper symlink.

1-9: Duplicate: Insert project licence header

New SQL files require the standard licence header.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_location_ids.sql (2)

1-14: Duplicate: Explicitly include or symlink CDR-level SQL

Ensure this wrapper either uses a proper include directive or is a true symlink to avoid code duplication.

1-14: Duplicate: Insert project licence header

Please add the project’s licence header to this new file.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_events.sql (2)

1-9: Duplicate: Explicitly include or symlink CDR-level SQL

As above, confirm the include strategy or convert to a real filesystem symlink.

1-9: Duplicate: Insert project licence header

Add the standard licence header to comply with project guidelines.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_location_ids.sql (2)

1-7: Duplicate: Explicitly include or symlink CDR-level SQL

Verify that the relative path is correctly included at execution time or replace with a proper symlink.

1-7: Duplicate: Insert project licence header

Don't forget to prepend the organisation’s licence header to this newly added SQL file.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imei.sql (1)

1-19: Duplicate of calls wrapper issue

This SMS wrapper exhibits the same pattern: it includes the shared CDR script and duplicates its logic inline. Please convert to a pure wrapper or symlink to avoid code duplication.
🧹 Nitpick comments (31)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql (1)

1-5: Add standard license header.

New QA SQL files should include the project’s license header and a brief file-level comment for context, consistent with existing checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imeis.sql (1)

1-11: Add standard license header.

Include the FlowKit license header and a file-level comment to maintain consistency across QA checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/latest_timestamp.sql (1)

1-1: Add standard license header.

New wrapper files must include the project’s license header and a short description of the check.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1)

1-7: Add standard license header.

Please add the FlowKit project license header and a brief file description.

flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql (1)

1-1: Add standard license header.

Include the project’s license header and a short description to align with existing QA checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_location_ids.sql (1)

1-1: Add a descriptive header and licence boilerplate.

Wrapper files should start with a brief comment describing the QA check (e.g. “-- QA check: count rows with null location_id for SMS events”) and include the project’s standard licence header.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imsi.sql (1)

1-1: Add a descriptive header and licence boilerplate.

Please prepend a short comment summarising the purpose of this check (e.g. “-- QA check: maximum distinct MSISDNs per IMSI for SMS staging”) along with the standard project licence header.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imeis.sql (1)

1-1: Add a descriptive header and licence boilerplate.

As with other QA-wrapper files, add a one-line comment describing the check (e.g. “-- QA check: count rows with null IMEI for calls”) and include the project’s licence header at the top.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_events.sql (2)

1-1: Add a descriptive header and licence boilerplate.

Include a high-level comment like “-- QA check: count locatable call events joined to active cells” and the standard licence header before referencing the shared script.

9-9: Use standard `IS NOT NULL` syntax for clarity.

PostgreSQL idiom prefers `cl.geom_point IS NOT NULL` over `NOTNULL`. This improves readability and consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_location_ids.sql (1)

1-1: Add a descriptive header and licence boilerplate.

Please add a comment indicating “-- QA check: distinct location_id count for SMS staging” plus the required project licence header at the top of this wrapper.
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/latest_timestamp.sql (1)

1-1: Verify reference and add license header.

Ensure that the referenced file `flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/latest_timestamp.sql` exists at the correct path. Also include the project’s license header at the top of this new SQL file to maintain consistency with existing QA checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imeis.sql (1)

1-1: Verify reference and add license header.

Confirm that `flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/count_null_imeis.sql` is present and correctly referenced. Add the standard license header to this new file for consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/earliest_timestamp.sql (1)

1-1: Verify reference and add license header.

Ensure that `flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/earliest_timestamp.sql` exists and is referenced correctly. Add the project license header to this new file to align with other QA scripts.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/earliest_timestamp.sql (1)

1-1: Verify reference and add license header.

Confirm the existence of `flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/earliest_timestamp.sql` and correct the include path if needed. Include the project’s license header at the top for consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imsi.sql (1)

1-1: Verify reference and add license header.

Check that `flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/max_msisdns_per_imsi.sql` exists at the expected location. Add the standard license header to this new file to match other QA checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1)

13-14: Use a descriptive alias

The subquery alias `_` is non-descriptive. Please rename it to something like `locatable_ids` to improve readability and self-documentation.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicates.sql (1)
1-17: Clarify duplicate-count query naming

This query sums all extra duplicate records (`sum(n_dupes)`). With `count_duplicated.sql` counting duplicate groups, file names can be easily confused. Consider renaming this to something like `sum_duplicate_records.sql` for clearer intent.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicated.sql (1)

1-17: Clarify duplicate-group count naming

This script counts duplicate groups rather than individual extra records. To distinguish it from `count_duplicates.sql`, consider renaming it to `count_duplicate_groups.sql`.
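The difference between the two duplicate checks is easiest to see on a tiny table. The sketch below uses Python's built-in `sqlite3` rather than the project's PostgreSQL database, and the table `events`, its two columns, and both query strings are simplified stand-ins written for this example, not the contents of `count_duplicated.sql` or `count_duplicates.sql`.

```python
import sqlite3

# "A" appears three times, "B" twice, "C" once.
ROWS = [("A", 1), ("A", 1), ("A", 1), ("B", 2), ("B", 2), ("C", 3)]

# Number of distinct row values that occur more than once.
COUNT_DUPLICATE_GROUPS = """
SELECT count(*) FROM (
    SELECT msisdn, ts FROM events GROUP BY msisdn, ts HAVING count(*) > 1
) AS dupes
"""

# Number of surplus rows, i.e. occurrences beyond the first in each group.
SUM_EXTRA_RECORDS = """
SELECT coalesce(sum(n_dupes), 0) FROM (
    SELECT count(*) - 1 AS n_dupes
    FROM events GROUP BY msisdn, ts HAVING count(*) > 1
) AS dupes
"""


def run_checks() -> tuple[int, int]:
    """Return (duplicate groups, extra duplicate records) for ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (msisdn TEXT, ts INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", ROWS)
    groups = conn.execute(COUNT_DUPLICATE_GROUPS).fetchone()[0]
    extra = conn.execute(SUM_EXTRA_RECORDS).fetchone()[0]
    conn.close()
    return groups, extra
```

For this data there are two duplicated groups but three surplus rows, which is why the two file names are easy to mix up and yet report different numbers.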
1-17
: Consider simplifying subquery layers
You can remove the extra nesting by directly grouping and then counting, for example:WITH dupes AS ( SELECT msisdn, datetime, imsi, imei, tac, location_id, msisdn_counterpart, outgoing, duration, network, operator_code, country_code FROM {{ final_table }} GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12 HAVING COUNT(*) > 1 ) SELECT COUNT(*) FROM dupes;This flattens the query and may improve readability.
CHANGELOG.md (1)

10-16: Fix markdown list indentation

Nested bullet items under the “Added” section use four spaces; Keep a Changelog style expects two. Please adjust indentation to two spaces for consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_msisdns.sql (2)

1-10: Consider using COUNT(DISTINCT) for clarity

You could streamline the logic to a single pass over the data: `SELECT COUNT(DISTINCT msisdn) FROM ( SELECT msisdn FROM {{ final_table }} {% if cdr_type == "calls" or cdr_type == "sms" %} UNION ALL SELECT msisdn_counterpart FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL {% endif %} ) AS all_msisdns;`

This reduces nesting and may perform better on large tables.

1-10: Use descriptive subquery aliases

Aliasing both inner and outer queries as `_` can be confusing. Consider clearer names like `raw_msisdns` and `unique_msisdns`.

flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/new_cell_ids.sql (1)

1-5: Optional: Switch to NOT EXISTS for new IDs

For potentially better performance and semantic clarity, try: `SELECT COUNT(*) FROM {{ extract_table }} e WHERE NOT EXISTS ( SELECT 1 FROM infrastructure.cells c WHERE c.id = e.id AND c.version IS NOT NULL );`

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_msisdns.sql (2)

1-10: Simplify MSISDN count with COUNT(DISTINCT)

Rather than nested DISTINCTs, consider: `SELECT COUNT(DISTINCT msisdn) FROM ( SELECT msisdn FROM {{ final_table }} {% if cdr_type in ["calls", "sms"] %} UNION ALL SELECT msisdn_counterpart FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL {% endif %} ) AS all_msisdns;`

This is more concise and may yield performance gains.

5-8: Refine Jinja conditional syntax

For readability, replace `cdr_type == "calls" or cdr_type == "sms"` with `cdr_type in ["calls", "sms"]`.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql (1)
1-10: Add licence header, documentation and improve aliases.

This new SQL file lacks the project’s standard licence header and a descriptive comment at the top. Also, using `_` for both inner and outer subquery aliases reduces readability. Consider renaming aliases (e.g. `inner_msisdn`, `distinct_msisdn`) and adding a brief explanation of the check.

Would you like me to draft the header and summary?

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_duplicates.sql (2)

11-32: Remove redundant CDR-type branches.

Since this file is in the `sms` directory, `cdr_type` will always be `'sms'`. The `calls`, `mds` and `topups` branches are unreachable and add noise. Simplify the `GROUP BY` to only the SMS-relevant columns: `GROUP BY msisdn, datetime, imsi, imei, tac, location_id, msisdn_counterpart, outgoing, network, operator_code, country_code`

1-4: Add licence header and file description.

Please include the project’s licence header and a top-of-file comment summarising the purpose of this duplicate-count check.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicated.sql (1)

1-3: Add licence header and comment.

This file also needs the standard licence header and a brief description at the top explaining that it counts the number of top-up groups with duplicates.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicates.sql (1)

1-35: Simplify for `topups` only and add header.

The `cdr_type` conditionals for other types are redundant in this `topups`-specific file. Restrict the `GROUP BY` to the `topups` fields only. Additionally, include the project licence header and a brief summary at the top.
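The `COUNT(DISTINCT)` suggestion made for the `count_msisdns.sql` files can be checked against a nested-`DISTINCT` form on a small table. This is a sketch under assumptions: it runs on `sqlite3` instead of PostgreSQL, the Jinja template is replaced by a plain Python function (`build_query`, a hypothetical name), and the nested query is only a guess at the shape of the original, which the review does not reproduce.

```python
import sqlite3

# (msisdn, msisdn_counterpart); one row has no counterpart.
ROWS = [("A", "B"), ("A", "C"), ("B", None), ("D", "A")]


def build_query(cdr_type: str, table: str = "events") -> str:
    """Plain-Python stand-in for the Jinja template in the suggestion."""
    parts = [f"SELECT msisdn FROM {table}"]
    if cdr_type in ("calls", "sms"):  # mirrors `cdr_type in ["calls", "sms"]`
        parts.append(
            f"SELECT msisdn_counterpart FROM {table} "
            "WHERE msisdn_counterpart IS NOT NULL"
        )
    union = " UNION ALL ".join(parts)
    return f"SELECT COUNT(DISTINCT msisdn) FROM ({union}) AS all_msisdns"


def build_nested_query(table: str = "events") -> str:
    """Assumed shape of a nested-DISTINCT version, for comparison only."""
    return (
        "SELECT count(*) FROM (SELECT DISTINCT msisdn FROM ("
        f"SELECT msisdn FROM {table} UNION ALL "
        f"SELECT msisdn_counterpart FROM {table} "
        "WHERE msisdn_counterpart IS NOT NULL"
        ") AS raw_msisdns) AS unique_msisdns"
    )


def count(sql: str) -> int:
    """Run one counting query against a fresh in-memory copy of ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (msisdn TEXT, msisdn_counterpart TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", ROWS)
    result = conn.execute(sql).fetchone()[0]
    conn.close()
    return result
```

On this data the flattened and nested forms agree for `sms`, while `topups` skips the counterpart branch and therefore counts fewer subscribers. Whether the flattened form is actually faster on large tables is not something this sketch can show.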
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (65)
- CHANGELOG.md (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/count_duped_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat_or_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/new_cell_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/count_staging_rows.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/util.py (0 hunks)
💤 Files with no reviewable changes (1)
- flowetl/flowetl/flowetl/util.py
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md
11-11: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
12-12: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
13-13: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (27)
flowetl/flowetl/flowetl/qa_checks/qa_checks/count_staging_rows.sql (1)

1-1: Simple row count QA check is correctly implemented

This basic `SELECT COUNT(*)` check is fine and consistent with existing QA patterns.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/latest_timestamp.sql (1)

1-1: Approve wrapper for shared timestamp QA logic

This lightweight include of `../cdr/latest_timestamp.sql` aligns with the modular QA checks pattern.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/latest_timestamp.sql (1)

1-1: Approve wrapper for shared timestamp QA logic

Consistent reference to the core timestamp check under `cdr` makes maintenance easier.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql (1)

1-1: Verify include syntax and subquery pattern.

The line `../cdr/count_location_ids.sql` presumes the QA runner will inline this file. Confirm that this matches the established include mechanism (e.g. Jinja `{% include %}`) and that the inner subquery is correctly consumed by the generic count logic.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imeis.sql (1)

1-11: Ensure consistent wrapper structure.

Wrapper files normally define only the subquery (e.g. `SELECT imei … GROUP BY imei`), relying on the generic CDR macro to supply `SELECT count(*) FROM`. Here, the `count(*)` and `FROM` clauses are in the wrapper itself, which may lead to double counting. Please confirm the intended include behaviour and remove redundant clauses if necessary.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/latest_timestamp.sql (1)

1-1: Confirm include mechanism.

This file contains only `../cdr/latest_timestamp.sql`. Ensure that this matches the loader’s include syntax and that the generic timestamp check is applied correctly.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1)
1-7: Clarify wrapper responsibilities.

This wrapper explicitly includes `count(*) FROM {{ final_table }}` where other wrappers only provide the subquery. Confirm whether the generic null-check macro already wraps with `count(*)`, and remove the redundant clauses if so.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imsis.sql (1)

1-1: Include path reference is correct

The relative include to `../cdr/count_imsis.sql` aligns with the existing QA-check structure and will resolve properly during execution.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imeis.sql (1)

1-1: Include directive follows established pattern

Referencing `../cdr/count_null_imeis.sql` here is consistent with other CDR-type QA checks and maintains modularity.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imsi.sql (1)

1-1: Consistent inclusion of generic SQL logic

The include of `../cdr/max_msisdns_per_imsi.sql` matches the pattern used across CDR types, ensuring reuse and consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imsis.sql (1)

1-1: Modular reference is correct

Using `../cdr/count_imsis.sql` here mirrors the approach in other subdirectories and promotes shared logic.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imsi.sql (1)

1-1: File inclusion aligns with QA framework conventions

The path `../cdr/max_msisdns_per_imsi.sql` is accurate and consistent with the overall QA-check design.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imeis.sql (1)

2-7: Remove duplicate SQL logic.

This wrapper file should only include the single-line reference to the shared CDR QA query. The inline query block (lines 2–7) duplicates the logic in `cdr/count_null_imeis.sql` and should be removed.

Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imsi.sql (1)

2-19: Remove duplicate SQL logic.

According to the PR’s design, this wrapper should only contain the reference to the shared CDR query. The inline subquery block (lines 2–19) duplicates the logic in `cdr/max_msisdns_per_imsi.sql` and must be removed.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_location_ids.sql (1)

1-5: Inconsistent AI summary vs code: query logic present

The AI summary states this file only references `../cdr/count_location_ids.sql`, but it includes an additional subquery with `SELECT DISTINCT location_id`. Please reconcile the summary with the actual content or adjust the file to remove unintended logic.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1)

1-1: Inconsistent AI summary vs code: unexpected SQL block

The AI summary claims this file only references `../cdr/max_msisdns_per_imei.sql`, yet there’s an inline aggregation query below. Please update the summary or remove unintended code.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_location_ids.sql (1)

1-1: Inconsistent AI summary vs code: file contains more than an include

The summary indicates this file only references `../cdr/count_locatable_location_ids.sql`, but it defines the query logic as well. Please align the summary with the file content or remove unintended logic.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imsis.sql (1)

1-1: Inconsistent AI summary vs code: extra query content

The AI summary suggests only a reference to `../cdr/count_imsis.sql`, but this file also contains inline SQL. Please update the summary or trim the file to the intended include.

Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_location_ids.sql (1)
1-14
: Consistent reuse of shared CDR logic
This thin wrapper correctly delegates the locatable location ID count to the shared `../cdr/count_locatable_location_ids.sql` script, promoting modularity and avoiding duplication.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imsis.sql (1)
1-11: Thin wrapper delegating IMSI count to shared script
The wrapper correctly references `../cdr/count_imsis.sql` and scopes it to the SMS final table, maintaining consistency with the project’s modular QA design.
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_location_ids.sql (1)
1-7: Wrapper correctly targets null location IDs
This new file cleanly delegates null location ID counting to the shared `../cdr/count_null_location_ids.sql` script, fitting the QA framework’s DRY approach.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imsis.sql (1)
1-7: Null IMSI count wrapper is consistent
Good reuse of `../cdr/count_null_imsis.sql`, applying it to the calls CDR table and aligning with the existing QA check pattern.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imeis.sql (1)
1-11: IMEI count wrapper follows established pattern
This wrapper cleanly delegates the IMEI count to the central `../cdr/count_imeis.sql` script, preserving modularity and consistency across QA checks.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1)
1-9: Explicitly include or symlink CDR-level SQL
The file begins with a bare relative path but no explicit include directive (e.g. `\i`, `-- include`) or actual filesystem symlink. Please verify that this is intended or adjust to use a proper include mechanism so the shared CDR logic is correctly pulled in at runtime.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_duplicated.sql (1)
1-17: Verify duplication semantics
The query currently returns the number of groups with more than one occurrence. If the QA requirement is to count total duplicated rows (i.e. all extra occurrences beyond the first), please confirm whether this aligns with expectations.
CHANGELOG.md (1)
19-19: Verify CDR task id naming consistency
Ensure all generic CDR checks have been updated to include the `<cdr_type>.` prefix (e.g. `calls.count_imeis`) as described here.
flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/new_cell_ids.sql (1)
3-5: Verify `version IS NULL` semantics
Confirm that a null `version` truly indicates a new cell ID and that records with any non-null `version` should be excluded.
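To make the duplication-semantics question on `calls/count_duplicated.sql` concrete, here is a minimal sketch of the two possible readings. It runs against an in-memory SQLite database rather than the project's PostgreSQL tables, and the `events` table, its two columns, and the sample rows are all hypothetical stand-ins for `{{ final_table }}`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down stand-in for {{ final_table }}.
conn.execute("CREATE TABLE events (msisdn TEXT, datetime TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("a", "t1"),                              # unique row
        ("b", "t1"), ("b", "t1"),                 # one pair
        ("c", "t2"), ("c", "t2"), ("c", "t2"),    # one triple
    ],
)

# Reading 1: number of distinct groups that occur more than once.
duplicated_groups = conn.execute(
    """
    SELECT count(*) FROM
        (SELECT count(*) AS n FROM events
         GROUP BY msisdn, datetime
         HAVING count(*) > 1) table_with_count
    """
).fetchone()[0]

# Reading 2: number of surplus rows, i.e. occurrences beyond the first.
extra_rows = conn.execute(
    """
    SELECT COALESCE(sum(n - 1), 0) FROM
        (SELECT count(*) AS n FROM events
         GROUP BY msisdn, datetime
         HAVING count(*) > 1) table_with_count
    """
).fetchone()[0]

print(duplicated_groups, extra_rows)
```

With this sample data the two readings give different numbers (two duplicated groups versus three surplus rows), so the check's name and documentation should say which one is intended.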
@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat ISNULL
No newline at end of file
Use the standard NULL check syntax
The condition `cell_lat ISNULL` is non-standard syntax (PostgreSQL accepts it as an equivalent of `IS NULL`); use the standard `IS NULL` instead (and consider adding a semicolon at the end).
-SELECT count(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat ISNULL
+SELECT COUNT(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat IS NULL;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat.sql at line
1, the SQL condition uses the non-standard `cell_lat ISNULL` spelling for checking
NULL values. Replace `ISNULL` with the standard `IS NULL` syntax and add a
semicolon at the end of the SQL statement to properly terminate it.
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 as n_dupes
FROM {{ staging_table }}
GROUP BY
cell_id
HAVING count(*) - 1 > 1) tableWithCount
No newline at end of file
Correct HAVING clause to capture all duplicates
The current `HAVING count(*) - 1 > 1` filter only flags cell IDs appearing more than twice. To detect any duplicates (i.e., count > 1), adjust to either `HAVING count(*) > 1` or `HAVING count(*) - 1 > 0`. For clarity:
-HAVING count(*) - 1 > 1
+HAVING count(*) > 1
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/count_duped_ids.sql
lines 1 to 6, the HAVING clause currently filters for cell IDs with more than
two occurrences by using "HAVING count(*) - 1 > 1". To correctly capture all
duplicates, change this condition to "HAVING count(*) > 1" or "HAVING count(*) -
1 > 0" so that any cell ID appearing more than once is included.
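The effect of the two `HAVING` thresholds can be checked with a small sketch. It uses an in-memory SQLite database as a stand-in for the staging table; the `staging` table name and the sample cell IDs are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for {{ staging_table }}.
conn.execute("CREATE TABLE staging (cell_id TEXT)")
# Cell "A" appears once, "B" twice, "C" three times.
conn.executemany(
    "INSERT INTO staging VALUES (?)",
    [("A",), ("B",), ("B",), ("C",), ("C",), ("C",)],
)

QUERY = """
SELECT COALESCE(sum(n_dupes), 0) FROM
    (SELECT count(*) - 1 AS n_dupes
     FROM staging
     GROUP BY cell_id
     HAVING {having}) table_with_count
"""


def count_dupes(having: str) -> int:
    """Run the duplicate-count query with the given HAVING condition."""
    return conn.execute(QUERY.format(having=having)).fetchone()[0]


original = count_dupes("count(*) - 1 > 1")  # skips "B", which appears exactly twice
suggested = count_dupes("count(*) > 1")     # includes both "B" and "C"

print(original, suggested)
```

With this data the original condition reports 2 surplus rows (only from "C"), while the suggested condition reports 3 (one from "B" plus two from "C"), so cell IDs that appear exactly twice are missed by the original filter.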
@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL
No newline at end of file
SQL syntax fix: use `IS NULL`.
Replace the non-standard `ISNULL` check with standard SQL `IS NULL` for ANSI compliance.
Apply this diff:
-SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL
+SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long IS NULL
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql at line
1, replace the non-standard SQL syntax `ISNULL` with the standard `IS NULL` to
ensure ANSI SQL compliance. Update the WHERE clause condition from `cell_long
ISNULL` to `cell_long IS NULL`.
@@ -0,0 +1 @@
../cdr/earliest_timestamp.sql
No newline at end of file
🛠️ Refactor suggestion
Add licence header and consider symlink for wrapper file.
This stub correctly delegates to `../cdr/earliest_timestamp.sql` but is missing the project’s standard licence header. You could also replace it with a symbolic link to avoid managing duplicated stubs.
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/earliest_timestamp.sql at
line 1, add the project's standard licence header at the top of the file to
comply with licensing requirements. Additionally, instead of keeping this stub
SQL file with duplicated content, replace it with a symbolic link pointing to
../cdr/earliest_timestamp.sql to avoid duplication and simplify maintenance.
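As a rough illustration of the symlink option, the sketch below builds a throwaway directory tree and links a per-type file to the shared one. The directory layout mirrors the paths in this review, but the temporary root and the placeholder query text are assumptions, and the sketch assumes a POSIX filesystem where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    base = Path(root)
    (base / "cdr").mkdir()
    (base / "calls").mkdir()

    # Shared definition; the query text here is only a placeholder.
    shared = base / "cdr" / "earliest_timestamp.sql"
    shared.write_text("SELECT min(datetime) FROM {{ final_table }}")

    # Per-type wrapper as a symlink. A relative target is resolved
    # relative to the directory that contains the link itself.
    wrapper = base / "calls" / "earliest_timestamp.sql"
    os.symlink("../cdr/earliest_timestamp.sql", wrapper)

    is_link = wrapper.is_symlink()
    link_target = os.readlink(wrapper)
    same_content = wrapper.read_text() == shared.read_text()

print(is_link, link_target, same_content)
```

Reading the wrapper follows the link, so any loader that simply opens the per-type path sees the shared query without a second copy to maintain. Note that a symlink has no content of its own, so a licence header could only live in the shared file.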
@@ -0,0 +1 @@
../cdr/earliest_timestamp.sql
No newline at end of file
🛠️ Refactor suggestion
Add licence header and consider symlink for wrapper file.
As above, include the standard FlowKit licence header and optionally use a symbolic link to `../cdr/earliest_timestamp.sql` rather than a separate stub.
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/earliest_timestamp.sql at
line 1, add the standard FlowKit licence header at the top of the file.
Additionally, consider replacing this file with a symbolic link to
../cdr/earliest_timestamp.sql to avoid duplication and maintain consistency.
@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL OR cell_lat = -1 OR cell_lat ISNULL
No newline at end of file
Use the standard NULL check syntax
`ISNULL` is a non-standard spelling that PostgreSQL accepts as an equivalent of `IS NULL`. Change both `cell_long ISNULL` and `cell_lat ISNULL` to the standard `IS NULL` form and consider adding a trailing semicolon for consistency.
Suggested diff:
-SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL OR cell_lat = -1 OR cell_lat ISNULL
+SELECT count(*)
+FROM {{ staging_table }}
+WHERE cell_long = -1
+ OR cell_long IS NULL
+ OR cell_lat = -1
+ OR cell_lat IS NULL;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat_or_lon.sql
at line 1, replace the non-standard ISNULL checks with the standard
IS NULL syntax for both cell_long and cell_lat. Also, add a trailing
semicolon at the end of the SQL statement for consistency.
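A quick way to confirm that switching from `ISNULL` to `IS NULL` does not change the result is to run both spellings over the same rows. The sketch below does this with an in-memory SQLite database, which happens to accept both forms; it is a stand-in for the PostgreSQL staging table, and the `staging` table and its sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for {{ staging_table }}.
conn.execute("CREATE TABLE staging (cell_id TEXT, cell_lat REAL, cell_long REAL)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [
        ("ok", 27.7, 85.3),        # valid coordinates
        ("no_lat", None, 85.3),    # missing latitude
        ("no_long", 27.7, None),   # missing longitude
        ("sentinel", -1, -1),      # -1 placeholder in both columns
    ],
)

# Non-standard spelling, as in the file under review.
nonstandard = conn.execute(
    "SELECT count(*) FROM staging "
    "WHERE cell_long = -1 OR cell_long ISNULL OR cell_lat = -1 OR cell_lat ISNULL"
).fetchone()[0]

# Standard spelling, as in the suggested diff.
standard = conn.execute(
    "SELECT count(*) FROM staging "
    "WHERE cell_long = -1 OR cell_long IS NULL OR cell_lat = -1 OR cell_lat IS NULL"
).fetchone()[0]

print(nonstandard, standard)
```

Both queries count the same three rows here, so the change is about portability and readability rather than behaviour.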
{{ final_table }}
{% if cdr_type == "calls" or cdr_type == "sms" %}
UNION ALL
(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
Correct SQL null‐check syntax.
The clause `WHERE msisdn_counterpart NOTNULL` is not valid in standard SQL; it should read `IS NOT NULL`.
Apply this diff:
- (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
+ (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql at line 7,
replace the non-standard null-check syntax `WHERE msisdn_counterpart NOTNULL`
with the standard SQL syntax `WHERE msisdn_counterpart IS NOT NULL`.
SELECT count(*) FROM
(SELECT count(*) as n_dupes
FROM {{ final_table }}
GROUP BY
msisdn,
datetime,
imsi,
imei,
tac,
location_id,
{% if cdr_type == 'calls' %}
msisdn_counterpart,
outgoing,
duration,
network,
{% elif cdr_type == 'sms' %}
msisdn_counterpart,
outgoing,
network,
{% elif cdr_type == 'mds' %}
duration,
volume_total,
volume_upload,
volume_download,
{% elif cdr_type == 'topups' %}
type,
recharge_amount,
airtime_fee,
tax_and_fee,
pre_event_balance,
post_event_balance,
{% endif %}
operator_code,
country_code
HAVING count(*) > 1) tableWithCount
No newline at end of file
Unify duplicate‐count filename.
The filename `count_duplicated.sql` deviates from the `count_duplicates.sql` naming used in other CDR‐type directories. This discrepancy may break automated discovery of QA checks. Please rename to `count_duplicates.sql` (and update any references).
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/topups/count_duplicated.sql lines 1 to 35,
the filename count_duplicated.sql does not match the naming convention
count_duplicates.sql used in other CDR-type directories. Rename the file to
count_duplicates.sql and update any references to this file in the codebase or
configuration to ensure automated QA check discovery works correctly.
Actionable comments posted: 8
🔭 Outside diff range comments (20)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_location_ids.sql (1)
1-5: Invalid SQL inclusion and incomplete query structure
Simply placing a relative file path (`../cdr/count_location_ids.sql`) is not valid SQL. You need to use the templating engine’s include or import directive, and wrap the `SELECT DISTINCT` block inside a proper `SELECT count(*)` if you intend to count unique `location_id`s. Please clarify the intended templating mechanism or replace it with an explicit query.
Could you confirm how shared CDR-level SQL files are included here? For example, with Jinja you might use:
{% include "../cdr/count_location_ids.sql" %}
Or rewrite explicitly:
SELECT COUNT(*) AS distinct_location_ids
FROM (
    SELECT DISTINCT location_id
    FROM {{ final_table }}
) AS locations;
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imsi.sql (1)
1-19: Avoid duplicating SQL logic
This file should solely reference the shared `../cdr/max_msisdns_per_imsi.sql` definition. The embedded SQL block duplicates the CDR-level query and may drift out of sync. Remove the inline SQL and rely on the include only.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_location_ids.sql (1)
1-14: Remove inline SQL duplication
This type-specific file should solely reference `../cdr/count_locatable_location_ids.sql`. The inline `SELECT` block duplicates shared logic and risks inconsistency. Remove the embedded SQL and rely on the single include.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_location_ids.sql (1)
1-6: Remove inline SQL duplication
The inline `SELECT` block duplicates the shared definition in `../cdr/count_null_location_ids.sql`. Keep only the include line to ensure a single source of truth.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imsi.sql (1)
1-19: Remove duplicate SQL logic: use only the shared CDR reference
This type‐level QA file should only include the `../cdr/max_msisdns_per_imsi.sql` reference. The inline SQL (lines 2–19) duplicates the shared query and breaks the include pattern. Please strip out lines 2–19 so that the file only contains:
../cdr/max_msisdns_per_imsi.sql
Apply this diff:
-../cdr/max_msisdns_per_imsi.sql
-    max(msisdn_count)
-FROM (
-    SELECT
-        imsi,
-        count(*) AS msisdn_count
-    FROM (
-        SELECT
-            imsi,
-            msisdn
-        FROM
-            {{ final_table }}
-        WHERE imsi IS NOT NULL
-        GROUP BY
-            imsi,
-            msisdn) _
-    GROUP BY
-        imsi) _
+../cdr/max_msisdns_per_imsi.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imsis.sql (1)
1-7: Remove inline SQL after the CDR include
Following the established pattern, this file should only reference the shared CDR query. Remove lines 2–7 so that only the include remains:
../cdr/count_null_imsis.sql
-../cdr/count_null_imsis.sql
-    count(*)
-FROM
-    {{ final_table }}
-WHERE
-    imsi IS NULL
+../cdr/count_null_imsis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imsis.sql (1)
1-7: Remove inline SQL after the CDR include
To match other QA files, strip out lines 2–7 so that the file only contains:
../cdr/count_null_imsis.sql
-../cdr/count_null_imsis.sql
-    count(*)
-FROM
-    {{ final_table }}
-WHERE
-    imsi IS NULL
+../cdr/count_null_imsis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imsis.sql (1)
1-11: Remove inline SQL after the CDR include
This SMS-type QA file should only include the shared reference. Remove lines 2–11 so that only the include remains:
../cdr/count_imsis.sql
-../cdr/count_imsis.sql
-    count(*)
-FROM (
-    SELECT
-        imsi
-    FROM
-        {{ final_table }}
-    WHERE imsi IS NOT NULL
-    GROUP BY
-        imsi) _
+../cdr/count_imsis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imeis.sql (1)
1-11: Remove duplicated SQL logic from wrapper file
This wrapper file should only reference the shared `cdr/count_imeis.sql`. Embedding the full query duplicates logic and risks maintenance drift. Please remove lines 2–11 and retain only the reference line.
@@ -1,11 +1,1 @@
-../cdr/count_imeis.sql
-    count(*)
-FROM (
-    SELECT
-        imei
-    FROM
-        {{ final_table }}
-    WHERE imei IS NOT NULL
-    GROUP BY
-        imei) _
+../cdr/count_imeis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imsis.sql (1)
1-11: Remove duplicated SQL logic from wrapper file
This wrapper file should only reference the shared `cdr/count_imsis.sql`. Embedding the full query duplicates logic and risks maintenance drift. Please remove lines 2–11 and retain only the reference line.
@@ -1,11 +1,1 @@
-../cdr/count_imsis.sql
-    count(*)
-FROM (
-    SELECT
-        imsi
-    FROM
-        {{ final_table }}
-    WHERE imsi IS NOT NULL
-    GROUP BY
-        imsi) _
+../cdr/count_imsis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imsi.sql (1)
1-19: Remove duplicated SQL logic from wrapper file
This wrapper file should only reference the shared `cdr/max_msisdns_per_imsi.sql`. Embedding the full query duplicates logic and risks maintenance drift. Please remove lines 2–19 and retain only the reference line.
@@ -1,19 +1,1 @@
-../cdr/max_msisdns_per_imsi.sql
-    max(msisdn_count)
-FROM (
-    SELECT
-        imsi,
-        count(*) AS msisdn_count
-    FROM (
-        SELECT
-            imsi,
-            msisdn
-        FROM
-            {{ final_table }}
-        WHERE imsi IS NOT NULL
-        GROUP BY
-            imsi,
-            msisdn) _
-    GROUP BY
-        imsi) _
+../cdr/max_msisdns_per_imsi.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imsi.sql (1)
1-19: Remove duplicated SQL logic from wrapper file
This wrapper file should only reference the shared `cdr/max_msisdns_per_imsi.sql`. Embedding the full query duplicates logic and risks maintenance drift. Please remove lines 2–19 and retain only the reference line.
@@ -1,19 +1,1 @@
-../cdr/max_msisdns_per_imsi.sql
-    max(msisdn_count)
-FROM (
-    SELECT
-        imsi,
-        count(*) AS msisdn_count
-    FROM (
-        SELECT
-            imsi,
-            msisdn
-        FROM
-            {{ final_table }}
-        WHERE imsi IS NOT NULL
-        GROUP BY
-            imsi,
-            msisdm) _
-    GROUP BY
-        imsi) _
+../cdr/max_msisdns_per_imsi.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imsis.sql (1)
1-11: Remove duplicated SQL logic from wrapper file
This wrapper file should only reference the shared `cdr/count_imsis.sql`. Embedding the full query duplicates logic and risks maintenance drift. Please remove lines 2–11 and retain only the reference line.
@@ -1,11 +1,1 @@
-../cdr/count_imsis.sql
-    count(*)
-FROM (
-    SELECT
-        imsi
-    FROM
-        {{ final_table }}
-    WHERE imsi IS NOT NULL
-    GROUP BY
-        imsi) _
+../cdr/count_imsis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imsis.sql (1)
1-11: Wrapper file should only reference shared CDR SQL.
The `count_imsis.sql` under `topups` currently inlines the full query (lines 2–11) instead of delegating entirely to `../cdr/count_imsis.sql`. This duplicates code and breaks the established pattern. Please remove the embedded SQL and leave only the relative-path reference.
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imei.sql (1)
1-19: Remove inlined SQL logic from wrapper.
This MDS wrapper embeds the entire `max_msisdns_per_imei` query rather than deferring to `../cdr/max_msisdns_per_imei.sql`. To maintain consistency and avoid duplication, strip out lines 2–19 so the file only contains the path reference.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_location_ids.sql (1)
1-7: Remove inlined SQL logic from wrapper.
The SMS wrapper currently includes the full `count_null_location_ids` query (lines 2–7) instead of solely pointing to `../cdr/count_null_location_ids.sql`. Please remove the extra SQL so only the relative-path reference remains.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_events.sql (1)
7-9: Non-standard SQL syntax for null check.
`NOTNULL` is a non-standard spelling that PostgreSQL accepts as an equivalent of `IS NOT NULL`; prefer the standard `IS NOT NULL`.
Apply this patch:
-  AND cl.geom_point NOTNULL
+  AND cl.geom_point IS NOT NULL
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1)
7-9: Use standard null check syntax.
Replace `cl.geom_point NOTNULL` with `cl.geom_point IS NOT NULL` to conform with standard SQL syntax.
Apply this patch:
-  AND cl.geom_point NOTNULL
+  AND cl.geom_point IS NOT NULL
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_events.sql (1)
9-9: Non-standard null check syntax
Standard SQL uses `IS NOT NULL` in `WHERE` clauses; `NOTNULL` is a PostgreSQL-specific alternative.
Apply this diff:
-  AND cl.geom_point NOTNULL
+  AND cl.geom_point IS NOT NULL
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_location_ids.sql (1)
12-12: Non-standard null check syntax
Replace the non-standard `NOTNULL` with `IS NOT NULL` in the `WHERE` clause.
-  AND cl.geom_point NOTNULL
+  AND cl.geom_point IS NOT NULL
🧹 Nitpick comments (26)
flowetl/flowetl/flowetl/qa_checks/qa_checks/count_staging_rows.sql (1)
1-1: Add a column alias for clarity
Returning an unnamed `COUNT(*)` can make downstream parsing ambiguous. Consider aliasing the result to `total_rows` and formatting the SQL for readability.
-SELECT COUNT(*) FROM {{ staging_table }}
+SELECT COUNT(*) AS total_rows
+FROM {{ staging_table }}
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_location_ids.sql (1)
1-5: Add a descriptive header and licence notice.
All SQL QA scripts should begin with a brief description of their purpose and include the project’s licence header for consistency.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/earliest_timestamp.sql (1)
1-1: Include licence header and description.
Please add the project licence header and a concise comment summarising the QA check’s intent for uniformity across all SQL scripts.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1)
1-19: Add header and improve alias naming.
Include the project’s licence header and a brief description. Also, replace the generic alias `_` with a more descriptive identifier for better readability.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1)
1-7: Add documentation and test coverage
These newly added QA checks lack accompanying documentation, ADR entries, and test cases for validation. Please update the QA docs and include tests to ensure maintainability and coverage.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/latest_timestamp.sql (1)
1-1: Add standard license header
All new SQL files should include the project’s standard license header for consistency.
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_location_ids.sql (1)
1-5: Include standard license header
Please add the repository’s SQL license header to this new file.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imeis.sql (1)
1-10: Add project license header
Ensure the standard license statement appears at the top of this new SQL file.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1)
1-14: Include standard license header
Add the repository’s SQL license header at the top to maintain compliance.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imeis.sql (1)
1-6: Add standard license header
Please prepend the project’s license header to this new SQL file for consistency.
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_events.sql (1)
1-1: Missing license header
Please add the standard Flowminder license header at the top of this SQL file to comply with project conventions.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imeis.sql (1)
1-1: Missing license header
Add the standard Flowminder license header at the top of this SQL file to align with project guidelines.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_location_ids.sql (1)
1-1: Missing license header
Please include the standard Flowminder license header at the top of this SQL file.
flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql (1)
1-1: Missing license header
Add the standard Flowminder license header at the top of this SQL file to ensure compliance.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_duplicated.sql (2)
1-1: Missing license header
Please add the standard Flowminder license header at the top of this SQL file.
17-17: Use snake_case for table alias
The alias `tableWithCount` mixes camelCase; prefer snake_case (`table_with_count`) for consistency.
-) tableWithCount
+) table_with_count
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicated.sql (1)
1-17: Consider factoring out repeated grouping keys
The `GROUP BY` clause here duplicates the same set of columns used across several QA checks. Extract these common grouping columns into a shared Jinja include to reduce repetition and simplify future updates.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicated.sql (1)
1-35: Extract common grouping logic into an include file
This query’s `GROUP BY` block mirrors those in other CDR-type checks with only minor variations. Moving the shared column list into a Jinja include (and then extending it per type) will DRY up the code and make maintenance easier.
CHANGELOG.md (2)
10-15: Fix unordered list indentation
The sub-items under `- Added new cell info qa checks to flowetl:` are indented by 4 spaces, but markdownlint expects 2-space indentation for nested lists. Please adjust the indent to 2 spaces for proper formatting.
19-20: Remove trailing comma
The entry `'count_imeis' will now be identified as '<cdr_type>.count_imeis',` ends with an extraneous comma. Drop the comma to match the style elsewhere in the changelog.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_msisdns.sql (1)
2-8: Correct NULL predicate and standardise SQL style
Use `IS NOT NULL` instead of the non-standard `NOTNULL`, switch to single quotes in the Jinja conditional to match other checks, and uppercase the `DISTINCT` keyword. For example:
-SELECT count(*) FROM (SELECT distinct msisdn FROM
+SELECT count(*) FROM (
+  SELECT DISTINCT msisdn FROM
 {{ final_table }}
-{% if cdr_type == "calls" or cdr_type == "sms" %}
-UNION ALL
-(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
+{% if cdr_type == 'calls' or cdr_type == 'sms' %}
+UNION ALL
+SELECT msisdn_counterpart AS msisdn
+FROM {{ final_table }}
+WHERE msisdn_counterpart IS NOT NULL
 {% endif %}
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_msisdns.sql (1)
1-10: Use descriptive aliases instead of `_`
Aliasing both subqueries as `_` can be confusing and may lead to conflicts. Consider more meaningful names, for example:
-SELECT count(*) FROM (SELECT distinct msisdn FROM
-  ( … ) _
-) _
+SELECT count(*) FROM (
+  SELECT DISTINCT msisdn
+  FROM (
+    …
+  ) AS msisdn_union
+) AS distinct_msisdns
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql (1)
1-10: Use descriptive aliases instead of `_`
Both nested subqueries use `_` as an alias, which reduces readability. A clearer pattern is:
-SELECT count(*) FROM (SELECT distinct msisdn FROM
-  ( … ) _
-) _
+SELECT count(*) FROM (
+  SELECT DISTINCT msisdn
+  FROM (
+    …
+  ) AS msisdn_union
+) AS distinct_msisdns
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicates.sql (2)
11-20: Consolidate duplicate grouping logic for calls and sms
Both the `cdr_type == 'calls'` and `cdr_type == 'sms'` branches group by the same columns (`msisdn_counterpart, outgoing, network`). Merge these into a single condition:
-{% if cdr_type == 'calls' %}
-  msisdn_counterpart,
-  outgoing,
-  network,
-{% elif cdr_type == 'sms' %}
-  msisdn_counterpart,
-  outgoing,
-  network,
-{% endif %}
+{% if cdr_type in ['calls', 'sms'] %}
+  msisdn_counterpart,
+  outgoing,
+  network,
+{% endif %}
1-35: Standardise Jinja string quoting
This template uses single quotes (`'calls'`), whereas other QA SQL templates (e.g., `count_msisdns.sql`) use double quotes (`"calls"`). For consistency and to reduce confusion, choose and apply one quoting style across all QA checks.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_duplicates.sql (1)
11-20: Consolidate duplicate grouping logic for calls and sms
Identical grouping is used for both `cdr_type == 'calls'` and `cdr_type == 'sms'`. Simplify by merging into a single branch:
-{% if cdr_type == 'calls' %}
-  msisdn_counterpart,
-  outgoing,
-  network,
-{% elif cdr_type == 'sms' %}
-  msisdn_counterpart,
-  outgoing,
-  network,
-{% endif %}
+{% if cdr_type in ['calls', 'sms'] %}
+  msisdn_counterpart,
+  outgoing,
+  network,
+{% endif %}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (65)
CHANGELOG.md (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_duplicated.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_events.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/earliest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/latest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imei.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imsi.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/count_duped_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat_or_lon.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/new_cell_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/count_staging_rows.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicated.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicates.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_events.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/earliest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/latest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imei.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imsi.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_duplicates.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_msisdns.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/earliest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/latest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imei.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imsi.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicated.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicates.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_events.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_msisdns.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/earliest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/latest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imsi.sql (1 hunks)
flowetl/flowetl/flowetl/util.py (0 hunks)
💤 Files with no reviewable changes (1)
- flowetl/flowetl/flowetl/util.py
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md
11-11: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
12-12: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
13-13: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (30)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql (1)
1-1:
```shell
#!/bin/bash
set -e

# 1. List all occurrences of count_location_ids.sql in the repo
echo "=== All count_location_ids.sql files ==="
find . -type f -name count_location_ids.sql

# 2. Show the first 10 lines of the calls version to inspect header and include
CALLS_FILE="flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql"
echo "=== First 10 lines of ${CALLS_FILE} ==="
sed -n '1,10p' "${CALLS_FILE}"

# 3. Check for a relative include of the CDR version
echo "=== Include directives in calls version ==="
grep -n "../cdr/count_location_ids.sql" "${CALLS_FILE}" || echo "No '../cdr/count_location_ids.sql' include found"

# 4. Check for usage of alias '_' in the calls version
echo "=== Alias '_' usage in calls version ==="
grep -n " as _" "${CALLS_FILE}" || echo "No alias ' as _' found"

# 5. Verify that the CDR version exists at the expected path
CDR_FILE="flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/count_location_ids.sql"
if [ -f "${CDR_FILE}" ]; then
  echo "✅ CDR file exists at ${CDR_FILE}"
else
  echo "❌ Expected CDR file missing at ${CDR_FILE}"
fi
```
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/earliest_timestamp.sql (1)
1-1:
```shell
#!/usr/bin/env bash
set -euo pipefail

echo "Finding all earliest_timestamp.sql files:"
fd earliest_timestamp.sql || true
echo

echo "Displaying first 50 lines of the SMS script:"
sms_file=$(fd earliest_timestamp.sql | grep '/sms/' || true)
if [[ -n "$sms_file" ]]; then
  sed -n '1,50p' "$sms_file"
else
  echo "⛔ SMS earliest_timestamp.sql not found"
fi
echo

echo "Displaying first 50 lines of the CDR script:"
cdr_file=$(fd earliest_timestamp.sql | grep '/cdr/' || true)
if [[ -n "$cdr_file" ]]; then
  sed -n '1,50p' "$cdr_file"
else
  echo "⛔ CDR earliest_timestamp.sql not found"
fi
echo

echo "Checking for a license header in the SMS script:"
if [[ -n "$sms_file" ]]; then
  head -n5 "$sms_file" | rg -i 'license|copyright' || echo "No license header detected in first 5 lines"
else
  echo "Skipping license check; SMS file missing"
fi
```
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/latest_timestamp.sql (1)
1-1: Correct include-only pattern
This file correctly references the shared CDR query without additional SQL logic. LGTM.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/latest_timestamp.sql (1)
1-1: Wrapper correctly references shared CDR SQL.
This file follows the established pattern by containing only `../cdr/latest_timestamp.sql`. No further changes needed.
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/earliest_timestamp.sql (1)
1-1: Wrapper correctly references shared CDR SQL.
This file correctly delegates to `../cdr/earliest_timestamp.sql` without additional logic. No updates required.
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_location_ids.sql (1)
1-1:
```shell
#!/usr/bin/env bash
# Verify the symlink points to the expected shared CDR QA check and resolves correctly
target="flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_location_ids.sql"
expected="../cdr/count_locatable_location_ids.sql"

actual=$(readlink "$target")
echo "Symlink target: $actual"

if [ "$actual" != "$expected" ]; then
  echo "Unexpected symlink target: $actual (expected $expected)"
  exit 1
fi

if [ ! -e "$(dirname "$target")/$actual" ]; then
  echo "Broken target path: $(dirname "$target")/$actual"
  exit 1
fi
```
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imeis.sql (1)
1-1:
```shell
#!/usr/bin/env bash
# Detailed validation of the count_null_imeis.sql symlink
symlink="flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imeis.sql"

echo "1. Symlink details:"
ls -l "$symlink" || { echo "Symlink not found"; exit 1; }
echo

echo "2. Linked target (readlink):"
linked=$(readlink "$symlink")
echo " -> $linked"
echo

echo "3. Resolved reference path and existence check:"
ref_path="$(dirname "$symlink")/$linked"
echo " Resolved to: $ref_path"
if [ -e "$ref_path" ]; then
  echo " ✅ Target file exists"
else
  echo " ❌ Target file is missing"
fi
echo

echo "4. All occurrences of count_null_imeis.sql in the repo:"
rg -l "count_null_imeis\.sql" || echo "No other instances found"
```
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imsis.sql (1)
1-1: Further verify symlink resolution and locate the shared SQL file
#!/usr/bin/env bash
# Inspect the symlink target and ensure the shared SQL exists
target="flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imsis.sql"
echo "Symlink target:"
readlink -f "$target"
echo -e "\nAll instances of count_null_imsis.sql in the repo:"
rg 'count_null_imsis.sql' -g '*.sql'

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_events.sql (1)
1-1: Let's verify that the symlink points to the expected shared QA check:
#!/usr/bin/env bash
# Confirm the symlink destination for count_locatable_events.sql
target="flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_events.sql"
expected="../cdr/count_locatable_events.sql"
actual=$(readlink "$target")
if [ "$actual" != "$expected" ]; then
  echo "Symlink '$target' points to '$actual', expected '$expected'"
  exit 1
fi

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/earliest_timestamp.sql (1)
1-1: Verify symlink for earliest_timestamp.sql
Confirm that `../cdr/earliest_timestamp.sql` resolves correctly and is not broken in CI or the packaged distribution.
#!/usr/bin/env bash
# Validate earliest timestamp QA check symlink
target="flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/earliest_timestamp.sql"
if [ ! -L "$target" ] || [ ! -e "$target" ]; then
  echo "Broken or missing symlink: $target"
  exit 1
fi

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imei.sql (1)
1-19: Invalid file content; expected symlink
This file contains both the relative-path pointer and the full SQL query, which will lead to SQL parsing errors. According to the PR’s restructuring, this should be a pure symbolic link to `../cdr/max_msisdns_per_imei.sql`.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_location_ids.sql (1)
1-7: Unexpected mixed pointer and SQL; revert to symlink
The file mixes the symlink indicator (`../cdr/count_null_location_ids.sql`) with actual query content. It should be a standalone symbolic link or contain only the SQL logic, not both.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imei.sql (1)
1-19: Invalid file content; expected symlink
As with the SMS version, this file erroneously includes both the path pointer and the SQL body. Please replace it with a pure symbolic link to `../cdr/max_msisdns_per_imei.sql`.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/latest_timestamp.sql (1)
1-1: Symlink pattern is correct
This file correctly points to the shared CDR QA check at `../cdr/latest_timestamp.sql`, aligning with the new modular structure.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_location_ids.sql (1)
1-1: Inconsistent AI summary regarding file contents.
The AI-generated summary states this file contains only a symlink reference, but it also includes the `SELECT DISTINCT location_id` logic. Please update the summary or refactor the file to match project conventions.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/earliest_timestamp.sql (1)
1-1: Verify symlink usage.
This file appears to act as a placeholder to the CDR-level `earliest_timestamp.sql`. Confirm that it’s implemented as a filesystem symlink (mode `120000`) so that SQL runners follow it correctly, rather than duplicating the content.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_events.sql (1)
1-1: Inconsistent AI summary: additional SQL logic present.
The summary indicates only a reference, but this file includes a JOIN with `infrastructure.cells` and a filter on `geom_point`. Update the summary or split the logic appropriately.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1)
1-1: Inconsistent AI summary: full query included.
The AI summary describes only a symlink, but the file contains the full subquery logic. Please update the summary or adjust the file to match how generic CDR checks are handled.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1)
1-1: Inconsistent AI summary: logic beyond a reference.
The summary suggests just a relative include, but the file includes JOIN logic and a `geom_point` filter. Please correct the summary or reorganise this file to align with project structure.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imeis.sql (1)
1-7: Consistent wrapper for null IMEIs
This thin wrapper correctly references the shared CDR-level query (`count_null_imeis.sql`) and uses the `{{ final_table }}` placeholder as intended.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imsis.sql (1)
1-7: Consistent wrapper for null IMSIs
This wrapper accurately delegates to the shared CDR-level query (`count_null_imsis.sql`), and the relative path and templating usage are correct.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imeis.sql (1)
1-7: Consistent wrapper for calls null IMEIs
Reuses the shared CDR-level query for null IMEIs in the calls namespace. The path and `{{ final_table }}` placeholder are correctly applied.
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imeis.sql (1)
1-11: Distinct IMEIs count is correctly implemented
The subquery for distinct non-null IMEIs is appropriately included via `count_imeis.sql`, and the outer `count(*)` is applied correctly against `{{ final_table }}`.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1)
1-7: Consistent wrapper for null location IDs
This wrapper delegates to the shared CDR-level query (`count_null_location_ids.sql`) as expected, with correct templating.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/latest_timestamp.sql (1)
1-1: Wrapper file referencing shared QA logic
This file correctly delegates to the common `cdr/latest_timestamp.sql` check.
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_location_ids.sql (1)
1-5: Inconsistent with AI summary: wrapper vs inlined logic
The AI summary indicates this file should only reference `../cdr/count_location_ids.sql`, but it also contains an inlined subquery. Please confirm the intended pattern: either drop the duplicated SQL block here or remove the in-file logic and rely solely on the shared CDR script.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imeis.sql (1)
1-10: Inconsistent with AI summary: wrapper vs inlined logic
The summary states this should be a pure reference to `../cdr/count_imeis.sql`, yet the file includes the query body. Confirm whether the inline SQL belongs here or should reside exclusively in the shared CDR folder.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1)
1-14: Inconsistent with AI summary: wrapper vs inlined logic
According to the summary, this file should only include `../cdr/count_locatable_location_ids.sql`, but it also carries the full query. Please clarify and either remove the inline SQL or shift it entirely to the shared CDR script.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imeis.sql (1)
1-6: Inconsistent with AI summary: wrapper vs inlined logic
The file both references `../cdr/count_null_imeis.sql` and provides its own query. Confirm whether the inline SQL is intended here or if it should be consolidated in the shared CDR directory.
Likely an incorrect or invalid review comment.
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicates.sql (1)
1-17: LGTM
The query correctly computes the total number of duplicate rows by summing `(count(*) - 1)` per group and defaulting to zero. It aligns with existing QA checks.
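Several of the comments above ask for confirmation that a per-type check file is a real filesystem symlink that resolves to the shared `cdr` version. The following is a minimal Python sketch of that check, not the project's own tooling. The helper name `check_symlink` and the throwaway `cdr/` and `calls/` directories are hypothetical, and the sketch assumes a POSIX filesystem where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path


def check_symlink(path: Path, expected_target: str) -> list[str]:
    """Return a list of problems found with `path`; an empty list means it looks fine."""
    problems = []
    if not path.is_symlink():
        problems.append(f"{path} is not a symlink")
        return problems
    actual = os.readlink(path)
    if actual != expected_target:
        problems.append(f"{path} points to {actual!r}, expected {expected_target!r}")
    # Path.exists() follows the link, so a dangling link reports False.
    if not path.exists():
        problems.append(f"{path} is a broken symlink")
    return problems


# Build a throwaway tree that mimics the assumed layout (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cdr").mkdir()
    (root / "calls").mkdir()
    (root / "cdr" / "latest_timestamp.sql").write_text("SELECT 1")

    good = root / "calls" / "latest_timestamp.sql"
    good.symlink_to("../cdr/latest_timestamp.sql")

    dangling = root / "calls" / "earliest_timestamp.sql"
    dangling.symlink_to("../cdr/earliest_timestamp.sql")  # target never created

    good_problems = check_symlink(good, "../cdr/latest_timestamp.sql")
    dangling_problems = check_symlink(dangling, "../cdr/earliest_timestamp.sql")
    print(good_problems)
    print(dangling_problems)
```

Checking both the literal link target and whether it resolves mirrors the two `readlink` and `-e` tests in the shell scripts above.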
@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat ISNULL
\ No newline at end of file
🛠️ Refactor suggestion
Non-standard NULL check syntax
`ISNULL` is a PostgreSQL-specific spelling rather than standard SQL; prefer the standard `IS NULL` for portability and readability. Additionally, consider adding an `AS` alias to the count for consistency with other checks.
- SELECT count(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat ISNULL
+ SELECT count(*) AS invalid_lat_count
+ FROM {{ staging_table }}
+ WHERE cell_lat = -1 OR cell_lat IS NULL
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat.sql at line
1, replace the non-standard `ISNULL` with the standard SQL syntax `IS NULL`
for checking null values. Also, add an `AS` alias to the count result for
consistency, for example `count(*) AS invalid_lat_count`.
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 as n_dupes
FROM {{ staging_table }}
GROUP BY
cell_id
HAVING count(*) - 1 > 1) tableWithCount
\ No newline at end of file
HAVING clause logic excludes single duplicates
The condition `HAVING count(*) - 1 > 1` only flags IDs with more than two occurrences, omitting valid single-duplicate cases (2 occurrences yields 1 duplicate). Update the filter to `> 0` so any duplicate is captured, and alias the outer result.
- SELECT COALESCE(sum(n_dupes), 0) FROM
- (SELECT count(*) - 1 as n_dupes
- FROM {{ staging_table }}
- GROUP BY
- cell_id
- HAVING count(*) - 1 > 1) tableWithCount
+ SELECT COALESCE(SUM(n_dupes), 0) AS duplicate_count
+ FROM (
+ SELECT COUNT(*) - 1 AS n_dupes
+ FROM {{ staging_table }}
+ GROUP BY cell_id
+ HAVING COUNT(*) - 1 > 0
+ ) AS table_with_count;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/count_duped_ids.sql
lines 1 to 6, the HAVING clause uses `count(*) - 1 > 1`, which excludes IDs with
exactly two occurrences (single duplicates). Change the condition to `count(*) -
1 > 0` to include all duplicates, and add an alias to the outer SELECT result
for clarity.
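The effect of the `HAVING` threshold can be seen on a small in-memory table. This is only an illustrative sketch: it uses SQLite as a stand-in for the PostgreSQL staging table, and the table name `staging`, the sample rows, and the helper `count_dupes` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (cell_id TEXT)")
# 'a' appears once, 'b' twice (1 duplicate), 'c' three times (2 duplicates).
conn.executemany(
    "INSERT INTO staging VALUES (?)",
    [("a",), ("b",), ("b",), ("c",), ("c",), ("c",)],
)

QUERY = """
SELECT COALESCE(SUM(n_dupes), 0) FROM (
    SELECT COUNT(*) - 1 AS n_dupes
    FROM staging
    GROUP BY cell_id
    HAVING COUNT(*) - 1 > {threshold}
) AS table_with_count
"""


def count_dupes(threshold: int) -> int:
    """Run the duplicate-count query with the given HAVING threshold."""
    return conn.execute(QUERY.format(threshold=int(threshold))).fetchone()[0]


original = count_dupes(1)   # condition as written in the PR: misses 'b'
suggested = count_dupes(0)  # condition proposed in the review: includes 'b'
print(original, suggested)  # prints "2 3"
```

With the `> 1` condition the single duplicate of `b` is dropped, so the reported total falls from 3 to 2.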
SELECT
count(*)
FROM {{ extract_table }} LEFT OUTER JOIN infrastructure.cells
USING (id)
WHERE version ISNULL
\ No newline at end of file
Non-standard NULL check syntax and missing alias
`ISNULL` is non-standard (PostgreSQL accepts it, but it is not standard SQL); use `IS NULL`. Also alias the count and align the join formatting.
- SELECT
- count(*)
- FROM {{ extract_table }} LEFT OUTER JOIN infrastructure.cells
- USING (id)
- WHERE version ISNULL
+ SELECT COUNT(*) AS new_cell_id_count
+ FROM {{ extract_table }}
+ LEFT JOIN infrastructure.cells USING (id)
+ WHERE version IS NULL;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/new_cell_ids.sql at
lines 1 to 5, replace the non-standard `ISNULL` with the standard SQL syntax `IS
NULL` for the null check. Add an alias to the count result for clarity, such as
`AS new_cell_id_count`. Also, adjust the JOIN clause formatting to improve
readability by placing the `FROM` table and the `LEFT JOIN` on separate lines.
@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL OR cell_lat = -1 OR cell_lat ISNULL
\ No newline at end of file
Fix SQL syntax for NULL checks
The use of `ISNULL` is invalid in standard SQL. Replace with `IS NULL` and terminate the statement with a semicolon. For example:
SELECT count(*) FROM {{ staging_table }}
WHERE cell_long = -1
OR cell_long IS NULL
OR cell_lat = -1
OR cell_lat IS NULL;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat_or_lon.sql
at line 1, the SQL uses the invalid syntax ISNULL for null checks. Replace all
occurrences of ISNULL with the correct standard SQL syntax IS NULL and add a
semicolon at the end of the statement to properly terminate it.
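The three coordinate checks (`garbage_lat`, `garbage_lon`, `garbage_lat_or_lon`) overlap, which a small worked example makes concrete. The sketch below uses SQLite in place of the PostgreSQL staging table; the table name `staging`, the sample rows, and the helper `count_where` are illustrative assumptions, while the `-1` sentinel and the `cell_lat` and `cell_long` column names come from the queries under review.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (cell_id TEXT, cell_lat REAL, cell_long REAL)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [
        ("ok", 27.7, 85.3),          # valid coordinates
        ("sentinel_lat", -1, 85.3),  # latitude uses the -1 sentinel
        ("null_lon", 27.7, None),    # longitude missing
        ("both_bad", None, -1),      # both coordinates unusable
    ],
)

BAD_LAT = "cell_lat = -1 OR cell_lat IS NULL"
BAD_LON = "cell_long = -1 OR cell_long IS NULL"


def count_where(condition: str) -> int:
    """Count rows matching a condition; conditions here are fixed constants, not user input."""
    return conn.execute(f"SELECT count(*) FROM staging WHERE {condition}").fetchone()[0]


garbage_lat = count_where(BAD_LAT)
garbage_lon = count_where(BAD_LON)
garbage_lat_or_lon = count_where(f"{BAD_LAT} OR {BAD_LON}")
print(garbage_lat, garbage_lon, garbage_lat_or_lon)  # prints "2 2 3"
```

The combined check counts the `both_bad` row once, so it reports 3 rather than the sum of the two single-column checks.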
@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL
\ No newline at end of file
Non-standard null check syntax
Use the standard `IS NULL` instead of the PostgreSQL-specific `ISNULL` in the `WHERE` clause.
-SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL
+SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long IS NULL
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql at line
1, the SQL query uses the incorrect syntax `ISNULL` to check for null values.
Replace `cell_long ISNULL` with the correct syntax `cell_long IS NULL` to
properly check for nulls in the WHERE clause.
{% if cdr_type == "calls" or cdr_type == "sms" %}
UNION ALL
(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
Prefer standard null-check syntax
The clause `WHERE msisdn_counterpart NOTNULL` uses a PostgreSQL-specific spelling that is not standard SQL. Replace it with `IS NOT NULL` for portability and readability.
Apply this diff:
-(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
+(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_msisdns.sql around
lines 5 to 7, replace the invalid null check syntax `WHERE msisdn_counterpart
NOTNULL` with the correct SQL syntax `WHERE msisdn_counterpart IS NOT NULL` to
properly filter out null values.
{% if cdr_type == "calls" or cdr_type == "sms" %}
UNION ALL
(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
Prefer standard null-check syntax
The clause `WHERE msisdn_counterpart NOTNULL` relies on PostgreSQL's non-standard `NOTNULL` operator, so it runs on PostgreSQL but is not portable. Use the standard `IS NOT NULL` instead:
-(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
+(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql around
lines 5 to 7, the null check `WHERE msisdn_counterpart NOTNULL` uses a
non-standard, PostgreSQL-specific operator. Replace `NOTNULL` with the standard SQL
syntax `IS NOT NULL` to filter non-null values portably.
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 as n_dupes
FROM {{ final_table }}
GROUP BY
msisdn,
datetime,
imsi,
imei,
tac,
location_id,
{% if cdr_type == 'calls' %}
msisdn_counterpart,
outgoing,
duration,
network,
{% elif cdr_type == 'sms' %}
msisdn_counterpart,
outgoing,
network,
{% elif cdr_type == 'mds' %}
duration,
volume_total,
volume_upload,
volume_download,
{% elif cdr_type == 'topups' %}
type,
recharge_amount,
airtime_fee,
tax_and_fee,
pre_event_balance,
post_event_balance,
{% endif %}
operator_code,
country_code
HAVING count(*) > 1) tableWithCount
\ No newline at end of file
🛠️ Refactor suggestion
Unify file naming for duplicate counts
The Calls QA check uses `count_duplicated.sql` while SMS and Topups use `count_duplicates.sql`. Align filenames and identifiers (e.g. standardise on `count_duplicates.sql`) to ensure consistency across all CDR types.
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_duplicates.sql lines 1
to 35, the filename and identifiers for counting duplicates differ from those
used in the Calls QA check, which uses count_duplicated.sql. Rename the Calls QA
check file and any related identifiers to count_duplicates.sql to standardize
naming conventions across all CDR types, ensuring consistency in file names and
references.
FlowAuth

| Property | Value |
|---|---|
| Project | FlowAuth |
| Branch Review | add-garbage-cell-info-check |
| Run status | |
| Run duration | 00m 46s |
| Commit | |
| Committer | Jonathan Gray |

| Test results | |
|---|---|
| | 0 |
| | 0 |
| | 0 |
| | 0 |
| | 4 |

4e5ca24 to 771d862
Actionable comments posted: 4
🧹 Nitpick comments (3)
flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql (1)
1-10: Simplify SQL and reduce nested subqueries
Consider replacing the double-nested `UNION ALL` plus outer `DISTINCT` with a single-level `COUNT(DISTINCT ...)` on a `UNION` of `msisdn` and `msisdn_counterpart`. This reduces query complexity and improves readability.
Suggested diff:
-SELECT count(*) FROM (SELECT distinct msisdn FROM
- (SELECT msisdn
- FROM
- {{ final_table }}
- {% if cdr_type == "calls" or cdr_type == "sms" %}
- UNION ALL
- (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
- {% endif %}
- ) _
- ) _
+SELECT COUNT(DISTINCT msisdn) FROM (
+ SELECT msisdn FROM {{ final_table }}
+ {% if cdr_type == "calls" or cdr_type == "sms" %}
+ UNION
+ SELECT msisdn_counterpart FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL
+ {% endif %}
+) sub_msisdn;
flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql (1)
1-10: Simplify SQL by using `COUNT(DISTINCT)` and `UNION`
Instead of nested `DISTINCT` subqueries, use a direct `COUNT(DISTINCT ...)` on a `UNION` of `msisdn` and `msisdn_counterpart`, and give the subquery a clear alias. This flattens the structure and may improve performance.
Proposed diff:
-SELECT count(*) FROM (SELECT distinct msisdn FROM
- (SELECT msisdn
- FROM
- {{ final_table }}
- {% if cdr_type == "calls" or cdr_type == "sms" %}
- UNION ALL
- (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
- {% endif %}
- ) _
- ) _
+SELECT COUNT(DISTINCT msisdn) FROM (
+ SELECT msisdn FROM {{ final_table }}
+ {% if cdr_type == "calls" or cdr_type == "sms" %}
+ UNION
+ SELECT msisdn_counterpart FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL
+ {% endif %}
+) sub_msisdn;
CHANGELOG.md (1)
10-16: Fix markdown list indentation for consistency.
The unordered list items have inconsistent indentation. According to the markdown linting rules, list items should use 2-space indentation rather than 4-space indentation.
Apply this diff to fix the indentation:
- Added new cell info qa checks to flowetl:
-    - Missing latitude
-    - Missing longitude
-    - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
-    - Number of new cell ids
-    - Number of duplicate ids
+- Added new cell info qa checks to flowetl:
+  - Missing latitude
+  - Missing longitude
+  - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
+  - Number of new cell ids
+  - Number of duplicate ids
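One caveat on the `count_msisdns.sql` simplification suggested in the nitpicks above: `count(*)` over a `SELECT DISTINCT` subquery counts a NULL `msisdn` as one row, whereas `COUNT(DISTINCT msisdn)` ignores NULLs, so the two forms can disagree. The sketch below illustrates this with SQLite standing in for PostgreSQL. The `events` table and its rows are invented for the example, and whether `msisdn` can actually be NULL in the real final tables is not stated in this review.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (msisdn TEXT, msisdn_counterpart TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("A", "B"), ("B", None), (None, "C"), ("A", "C")],
)

# Shape of the existing check: count rows of a DISTINCT subquery.
NESTED = """
SELECT count(*) FROM (
    SELECT DISTINCT msisdn FROM (
        SELECT msisdn FROM events
        UNION ALL
        SELECT msisdn_counterpart AS msisdn FROM events
        WHERE msisdn_counterpart IS NOT NULL
    ) AS inner_q
) AS outer_q
"""

# Shape of the suggested rewrite: COUNT(DISTINCT ...) over a UNION.
FLATTENED = """
SELECT COUNT(DISTINCT msisdn) FROM (
    SELECT msisdn FROM events
    UNION
    SELECT msisdn_counterpart FROM events
    WHERE msisdn_counterpart IS NOT NULL
) AS sub_msisdn
"""

nested = conn.execute(NESTED).fetchone()[0]
flattened = conn.execute(FLATTENED).fetchone()[0]
print(nested, flattened)  # prints "4 3": the NULL msisdn is counted only by the nested form
```

If `msisdn` is guaranteed non-null, the two queries return the same number and the rewrite is purely cosmetic.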
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (68)
- CHANGELOG.md (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql (0 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql (0 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql (1 hunks)
- flowetl/flowetl/flowetl/util.py (7 hunks)
- flowetl/tests/unit/test_qa_check_discovery.py (7 hunks)
💤 Files with no reviewable changes (2)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (57)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name DAG
(F821)
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md
11-11: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
12-12: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
13-13: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (4)
flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql (1)
1-16: Correctly counts SMS duplicate records
The query calculates per-group duplicate counts (`count(*) - 1`), sums them, and defaults to zero if none exist. It aligns with the intended metric and is consistent with other CDR types.
flowetl/tests/unit/test_qa_check_discovery.py (1)
10-60: Well-structured test refactoring for multi-stage QA checks.
The refactoring properly separates QA checks by stage and uses parameterised testing to ensure comprehensive coverage across all ETL stages. The code is clean and maintains good test practices.
flowetl/flowetl/flowetl/util.py (2)
26-34: Well-designed enum for ETL stages.
The `ETLStage` enum provides a clean way to represent the different ETL stages and integrates well with the enhanced QA check functionality.
367-447: Excellent implementation of stage-aware QA checks in DAG creation.
The modification to the `create_dag` function properly integrates QA checks at different ETL stages, ensuring comprehensive quality assurance throughout the pipeline. The placement of staging checks after flux sensors and extract/final checks at appropriate points in the workflow demonstrates good understanding of the ETL process.
SELECT COALESCE(sum(n_dupes), 0) FROM
    (SELECT count(*) - 1 as n_dupes
     FROM {{ staging_table }}
     GROUP BY
         cell_id
     HAVING count(*) - 1 > 1) tableWithCount
Fix duplicate count logic
The `HAVING count(*) - 1 > 1` condition excludes groups with exactly two occurrences, so those duplicates aren’t counted. It should use `HAVING count(*) > 1` to capture all duplicates.
Apply this diff:
- HAVING count(*) - 1 > 1
+ HAVING count(*) > 1
🤖 Prompt for AI Agents
In
flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql
lines 1 to 6, the HAVING clause uses `count(*) - 1 > 1`, which misses groups
with exactly two occurrences (one duplicate each). Change the condition to
`HAVING count(*) > 1` to include every group that contains duplicates. This will
ensure all duplicate counts are captured accurately.
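To make the difference between the two `HAVING` conditions concrete, here is a minimal sketch that runs both variants with Python's built-in `sqlite3` module instead of the project's database. The table name `staging` and the sample `cell_id` values are made up for illustration and stand in for the templated `{{ staging_table }}`.

```python
import sqlite3

# In-memory table standing in for {{ staging_table }} (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (cell_id TEXT)")
# "a" appears once, "b" twice (1 surplus row), "c" three times (2 surplus rows).
conn.executemany(
    "INSERT INTO staging VALUES (?)",
    [("a",), ("b",), ("b",), ("c",), ("c",), ("c",)],
)

QUERY = """
SELECT COALESCE(sum(n_dupes), 0) FROM
    (SELECT count(*) - 1 AS n_dupes
     FROM staging
     GROUP BY cell_id
     HAVING {having}) tableWithCount
"""


def count_dupes(having: str) -> int:
    """Run the check with the given HAVING condition and return the total."""
    return conn.execute(QUERY.format(having=having)).fetchone()[0]


# Condition under review: only groups with three or more rows qualify.
original = count_dupes("count(*) - 1 > 1")
# Suggested condition: every group with two or more rows qualifies.
suggested = count_dupes("count(*) > 1")
print(original, suggested)
```

With this sample data the original condition skips the pair of `b` rows, so the two totals differ by exactly that one surplus row.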
SELECT count(*) FROM
    (SELECT count(*) as n_dupes
     FROM {{ final_table }}
     GROUP BY
         msisdn,
         datetime,
         imsi,
         imei,
         tac,
         location_id,
         duration,
         volume_total,
         volume_upload,
         volume_download,
         operator_code,
         country_code
     HAVING count(*) > 1) tableWithCount
Align duplicate counting logic with other CDR types
This query currently returns the number of duplicate groups instead of the total duplicate records. For consistency with the SMS implementation it should subtract one per group and sum those values, wrapping with `COALESCE`.
Recommended diff:
-SELECT count(*) FROM
- (SELECT count(*) as n_dupes
+SELECT COALESCE(sum(n_dupes), 0) FROM (
+ SELECT count(*) - 1 as n_dupes
FROM {{ final_table }}
GROUP BY
msisdn,
datetime,
imsi,
imei,
tac,
location_id,
duration,
volume_total,
volume_upload,
volume_download,
operator_code,
country_code
- HAVING count(*) > 1) tableWithCount
+ HAVING count(*) > 1
+) sub_duplicates;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql
lines 1 to 17, the query counts the number of duplicate groups rather than the
total duplicate records. To fix this, modify the query to sum (count(*) - 1) for
each group to get the total duplicates, and wrap the sum with COALESCE to handle
nulls, ensuring consistency with the SMS implementation.
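The distinction between counting duplicate groups and counting duplicate records can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the two-column `final_table` and its rows are hypothetical, whereas the real check groups on every column of `{{ final_table }}`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column stand-in for {{ final_table }}.
conn.execute("CREATE TABLE final_table (msisdn TEXT, location_id TEXT)")
rows = [("m1", "l1")] * 3 + [("m2", "l2")] * 2 + [("m3", "l3")]
conn.executemany("INSERT INTO final_table VALUES (?, ?)", rows)

# Current form: counts how many groups contain duplicates.
GROUPS = """
SELECT count(*) FROM
    (SELECT count(*) AS n_dupes
     FROM final_table
     GROUP BY msisdn, location_id
     HAVING count(*) > 1) tableWithCount
"""

# Suggested form: sums the surplus rows of each duplicated group.
RECORDS = """
SELECT COALESCE(sum(n_dupes), 0) FROM
    (SELECT count(*) - 1 AS n_dupes
     FROM final_table
     GROUP BY msisdn, location_id
     HAVING count(*) > 1) sub_duplicates
"""

n_groups = conn.execute(GROUPS).fetchone()[0]
n_records = conn.execute(RECORDS).fetchone()[0]

# With no rows at all, sum() yields NULL; COALESCE turns that into 0.
conn.execute("DELETE FROM final_table")
n_empty = conn.execute(RECORDS).fetchone()[0]
print(n_groups, n_records, n_empty)
```

Here two groups are duplicated, but they hold three surplus rows between them, which is the number the SMS-style query reports.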
{{ final_table }}
{% if cdr_type == "calls" or cdr_type == "sms" %}
UNION ALL
(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
🛠️ Refactor suggestion
Use standard SQL syntax for null checking.
The `NOTNULL` condition should use the standard SQL syntax `IS NOT NULL` for better compatibility and clarity.
Apply this diff to fix the SQL syntax:
- (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
+ (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql at
line 7, replace the non-standard `NOTNULL` condition with the standard SQL
syntax `IS NOT NULL` to ensure better compatibility and clarity in the null
check.
def get_qa_checks(
    *,
    dag: Optional["DAG"] = None,
Fix undefined type annotation.
The `DAG` type is used in the type hint but not imported, causing a static analysis error.
Apply this diff to fix the type annotation:
 def get_qa_checks(
     *,
-    dag: Optional["DAG"] = None,
+    dag: Optional["airflow.DAG"] = None,
     additional_qa_check_paths: Optional[List[str]] = None,
     stage: Optional[ETLStage] = ETLStage.FINAL,
 ) -> List["QACheckOperator"]:
Alternatively, add the import at the top of the file:
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
🧰 Tools
🪛 Ruff (0.11.9)
38-38: Undefined name DAG
(F821)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/util.py at line 38, the type annotation uses DAG
which is not imported, causing a static analysis error. Fix this by importing
DAG from its module at the top of the file, ensuring the type annotation is
recognized and the error is resolved.
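The `TYPE_CHECKING` alternative can be sketched as follows. This is a hedged illustration, not the project's code: `get_qa_checks_sketch` is a hypothetical stand-in that only echoes its search paths, and the `stage` parameter is left out. Note also that the quoted `"airflow.DAG"` form would likely still be reported unless `airflow` itself is imported, since linters such as Ruff resolve names inside string annotations.

```python
from typing import TYPE_CHECKING, List, Optional

if TYPE_CHECKING:
    # Seen by static analysers only; this import never runs at runtime,
    # so the module loads even where Airflow is not installed.
    from airflow import DAG


def get_qa_checks_sketch(
    *,
    # The annotation stays quoted because DAG is not bound at runtime.
    dag: Optional["DAG"] = None,
    additional_qa_check_paths: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical stand-in that returns the extra search paths unchanged."""
    return list(additional_qa_check_paths or [])


print(get_qa_checks_sketch(additional_qa_check_paths=["/opt/extra_checks"]))
```

Keeping the annotation as a string means the name is only resolved by tools that evaluate the `TYPE_CHECKING` branch, which is what silences F821 without adding a runtime dependency on the import.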
771d862
to
b07e934
Compare
Actionable comments posted: 0
♻️ Duplicate comments (2)
flowetl/flowetl/flowetl/util.py (1)
40-40: Undefined type annotation issue persists.
The `DAG` type annotation issue flagged in the past review remains unresolved.
flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql (1)
7-7: SQL syntax issue remains unaddressed.
The non-standard `NOTNULL` condition should use the standard SQL syntax `IS NOT NULL` as flagged in the previous review.
🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)
8-8: Remove unused import.
The `StrEnum` import is not used in the code. The `ETLStage` class uses `str, Enum` instead.
-from enum import Enum, StrEnum
+from enum import Enum
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💤 Files with no reviewable changes (2)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql
🚧 Files skipped from review as they are similar to previous changes (63)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
8-8: enum.StrEnum imported but unused
Remove unused import: enum.StrEnum
(F401)
38-38: Undefined name DAG
(F821)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (7)
flowetl/flowetl/flowetl/util.py (6)
26-34: Well-implemented enum for ETL stages.
The `ETLStage` enum is properly implemented with clear documentation and appropriate inheritance from `str` and `Enum` for string-based operations.
57-59: Clear documentation for the new stage parameter.
The docstring properly documents the new `stage` parameter with appropriate default value and explanation.
73-89: Robust stage-aware path handling.
The implementation correctly:
- Validates the stage parameter using `ETLStage(stage)`
- Constructs stage-specific paths for both default and additional QA check paths
- Maintains backward compatibility with existing template search paths
260-261: Enhanced documentation for stage-specific paths.
The updated docstring clearly explains the requirement for organising additional QA check paths under stage subdirectories.
367-378: Proper integration of staging QA checks.
The staging QA checks are correctly integrated into both flux sensor workflows, running after flux checks but before extraction. The consistent pattern between FILE and TABLE flux sensor types ensures uniform behaviour.
Also applies to: 386-397
428-433: Appropriate placement of extract stage QA checks.
The extract stage QA checks are properly positioned after extraction (and optional clustering) but before constraints and indexes, which is the logical sequence for data quality validation.
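The stage validation and stage-specific path handling described in the comments above could look roughly like the sketch below. It is not the code from `util.py`: the member values are assumptions (`"staging"` and `"final"` mirror the directory names in this PR, `"extract"` is a guess), and `stage_search_paths` is a hypothetical helper name.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import List, Optional, Union


class ETLStage(str, Enum):
    """ETL stages; the string values here are assumed, not taken from util.py."""

    STAGING = "staging"
    EXTRACT = "extract"
    FINAL = "final"


def stage_search_paths(
    default_path: str,
    additional_paths: Optional[List[str]] = None,
    stage: Union[str, ETLStage] = ETLStage.FINAL,
) -> List[str]:
    """Return one stage subdirectory per root (hypothetical helper)."""
    # ETLStage(...) accepts a member or its string value and raises
    # ValueError for anything else, which doubles as input validation.
    stage = ETLStage(stage)
    roots = [default_path, *(additional_paths or [])]
    return [str(PurePosixPath(root) / stage.value) for root in roots]


print(stage_search_paths("qa_checks", ["/opt/extra_checks"], "staging"))
```

Inheriting from both `str` and `Enum` lets a member be built from a plain string such as `"staging"`, so callers can pass either form and still get a validated stage.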
flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql (1)
1-10: Well-structured query logic for MSISDN counting.
The query design correctly:
- Uses nested subqueries to ensure proper deduplication
- Conditionally includes counterpart MSISDNs only for calls and SMS types
- Handles the union and distinct operations in the appropriate order
The overall approach for counting unique MSISDNs across both regular and counterpart fields is sound.
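As a rough illustration of that approach, the sketch below rebuilds a similar query in plain Python and runs it with the built-in `sqlite3` module. The full template is not shown in this review, so the query shape, the helper `build_count_msisdns`, the table name `events` and the sample rows are all assumptions. An `if` statement stands in for the Jinja conditional, and the inner selects are left unparenthesised because SQLite does not accept parentheses around the members of a compound select.

```python
import sqlite3


def build_count_msisdns(final_table: str, cdr_type: str) -> str:
    """Build a distinct-MSISDN count; a stand-in for the Jinja template."""
    counterpart = ""
    # Only calls and SMS carry a counterpart MSISDN.
    if cdr_type in ("calls", "sms"):
        counterpart = (
            f" UNION ALL SELECT msisdn_counterpart AS msisdn FROM {final_table}"
            " WHERE msisdn_counterpart IS NOT NULL"
        )
    return (
        "SELECT count(*) FROM (SELECT DISTINCT msisdn FROM "
        f"(SELECT msisdn FROM {final_table}{counterpart}) all_msisdns"
        ") distinct_msisdns"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (msisdn TEXT, msisdn_counterpart TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("A", "B"), ("A", None), ("C", "A")],
)

# Calls include counterparts: A, C and B are the distinct subscribers.
n_calls = conn.execute(build_count_msisdns("events", "calls")).fetchone()[0]
# Topups ignore the counterpart column: only A and C remain.
n_topups = conn.execute(build_count_msisdns("events", "topups")).fetchone()[0]
print(n_calls, n_topups)
```

Taking `DISTINCT` after the `UNION ALL` matters here: subscriber `A` appears both as caller and as counterpart but is counted once.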
30eee7b
to
42eb441
Compare
Actionable comments posted: 2
♻️ Duplicate comments (1)
flowetl/flowetl/flowetl/util.py (1)
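38-38: Fix undefined type annotation.
The `DAG` type is used in the type hint but not imported, causing a static analysis error.
Apply this diff to fix the type annotation:
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
Then update the type hint:
- dag: Optional["DAG"] = None,
+ dag: Optional[DAG] = None,
Note that the unquoted `Optional[DAG]` is only safe when annotations are not evaluated at definition time, for example with `from __future__ import annotations`. Otherwise the `TYPE_CHECKING`-only import leaves `DAG` undefined at runtime, and the quoted `"DAG"` form should be kept.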
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (67)
flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql
(0 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql
(0 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql
(1 hunks)flowetl/flowetl/flowetl/util.py
(7 hunks)flowetl/tests/unit/test_qa_check_discovery.py
(7 hunks)
💤 Files with no reviewable changes (2)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (1)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql
🚧 Files skipped from review as they are similar to previous changes (63)
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql
- flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name DAG
(F821)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (6)
flowetl/flowetl/flowetl/util.py (6)
26-34: Well-designed enum for ETL stage management.
The `ETLStage` enum provides a clean way to represent different ETL phases and will help organise QA checks systematically. The string inheritance allows for easy serialisation and template path construction.
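As a rough illustration of the string-inheriting enum pattern this comment describes, here is a minimal sketch. The member names `STAGING`, `EXTRACT` and `FINAL` are taken from the stage names mentioned elsewhere in this review; the exact definition in `util.py` is not shown here and may differ.

```python
from enum import Enum


class ETLStage(str, Enum):
    """Assumed reconstruction: ETL phases that QA checks can attach to."""

    STAGING = "staging"
    EXTRACT = "extract"
    FINAL = "final"


# Members are also str instances, so .value drops straight into a path.
template_dir = f"qa_checks/{ETLStage.STAGING.value}/cell_info"

# Looking a member up by its raw string validates the input;
# an unknown stage name raises ValueError.
stage = ETLStage("final")
```

Using `.value` explicitly avoids relying on how a mixed-in enum is rendered by `str()` or `format()`, which has changed between Python versions.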
40-40: Good addition of stage parameter with sensible default.
The stage parameter with `ETLStage.FINAL` as default maintains backward compatibility whilst enabling stage-specific QA checks.
73-73: Proper validation of stage parameter.
The explicit conversion to `ETLStage` ensures type safety and will raise a clear error if an invalid stage is provided.
75-89: Well-implemented stage-aware path construction.
The modification to include stage subdirectories in both default and additional QA check paths enables proper organisation of checks by ETL stage.
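The validation and path construction described in these comments could look roughly like the sketch below. `qa_check_dirs` is a hypothetical helper written for illustration, not a function from `util.py`, and the `ETLStage` definition is assumed from the stage names used in this review.

```python
from enum import Enum
from pathlib import Path
from typing import List, Optional, Union


class ETLStage(str, Enum):  # assumed definition, not copied from util.py
    STAGING = "staging"
    EXTRACT = "extract"
    FINAL = "final"


def qa_check_dirs(
    base_dir: Union[str, Path],
    stage: Union[str, ETLStage] = ETLStage.FINAL,
    additional_qa_check_paths: Optional[List[str]] = None,
) -> List[Path]:
    """Return the stage-specific template directories to search."""
    # Explicit conversion: accepts a member or its string value,
    # and raises ValueError for anything else.
    stage = ETLStage(stage)
    roots = [Path(base_dir), *(Path(p) for p in additional_qa_check_paths or [])]
    # Both the default root and any additional roots get the stage subdirectory.
    return [root / stage.value for root in roots]
```

For example, `qa_check_dirs("qa_checks", "staging", ["/extra"])` yields `qa_checks/staging` and `/extra/staging`, while an unknown stage such as `"bogus"` fails immediately rather than silently searching a directory that does not exist.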
430-435: Verify the purpose of the EmptyOperator 'gather' task.
The `EmptyOperator` named "gather" is inserted after extract QA checks but its purpose isn't immediately clear from the context. Ensure this is necessary for the DAG structure.
Could you clarify the purpose of the "gather" task? Is it used to synchronise multiple QA check tasks before proceeding to constraints and indexes?
448-448: Final QA checks correctly default to FINAL stage.
The final QA checks appropriately use the default stage parameter (FINAL), maintaining the existing behaviour whilst being explicit about the stage.
7144700
to
c646b91
Compare
Actionable comments posted: 2
♻️ Duplicate comments (5)
flowetl/flowetl/flowetl/util.py (4)
38-38: The DAG type annotation issue remains unresolved.
377-377: Use enum values consistently instead of string literals.
396-396: Use enum values consistently instead of string literals.
435-435: Use enum value for extract stage.
flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1)
1-10: Duplicate: apply the same SQL enhancements as for final/topups/count_msisdns.sql
All critical fixes, optional refactors, and nitpicks around `IS NOT NULL`, `COUNT(DISTINCT ...)`, alias improvements, and macro extraction apply identically here.
🧹 Nitpick comments (5)
flowetl/flowetl/flowetl/util.py (2)
78-78: Remove debug print statement from production code.
This debug print statement should be removed before merging to production.
- print(default_path)
92-92: Remove debug print statement from production code.
This debug print statement should be removed before merging to production.
- print(jinja_env.list_templates())
flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (3)
1-10: Optional refactor: simplify using COUNT(DISTINCT)
The nested subqueries can be replaced with a single `COUNT(DISTINCT msisdn)` over the unioned set for improved readability and performance.
-SELECT count(*) FROM (
-    SELECT distinct msisdn
-    FROM
-    {{ final_table }}
-    {% if cdr_type == "calls" or cdr_type == "sms" %}
-    UNION ALL
-    (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL)
-    {% endif %}
-    ) _
-) _
+SELECT COUNT(DISTINCT msisdn)
+FROM (
+    SELECT msisdn
+    FROM {{ final_table }}
+    {% if cdr_type == "calls" or cdr_type == "sms" %}
+    UNION ALL
+    SELECT msisdn_counterpart
+    FROM {{ final_table }}
+    WHERE msisdn_counterpart IS NOT NULL
+    {% endif %}
+) AS all_msisdns;
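To check that the two query shapes agree, the following sketch runs both against an in-memory SQLite table. The table name `events` and the sample rows are made up for illustration, the Jinja templating is replaced by a fixed table name, and SQLite stands in for the project's actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (msisdn TEXT, msisdn_counterpart TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", "b"), ("a", None), ("b", "c"), ("d", "a")],
)

# Nested form: deduplicate the unioned set, then count the rows.
nested = """
SELECT count(*) FROM (
    SELECT DISTINCT msisdn FROM (
        SELECT msisdn FROM events
        UNION ALL
        SELECT msisdn_counterpart AS msisdn FROM events
        WHERE msisdn_counterpart IS NOT NULL
    ) AS unioned
) AS deduped
"""

# Flat form: let COUNT(DISTINCT ...) deduplicate directly.
flat = """
SELECT COUNT(DISTINCT msisdn) FROM (
    SELECT msisdn FROM events
    UNION ALL
    SELECT msisdn_counterpart FROM events
    WHERE msisdn_counterpart IS NOT NULL
) AS all_msisdns
"""

nested_count = conn.execute(nested).fetchone()[0]
flat_count = conn.execute(flat).fetchone()[0]
```

Both queries return the same count for this data. One caveat worth keeping in mind: `COUNT(DISTINCT msisdn)` ignores NULLs, whereas `SELECT DISTINCT` followed by `count(*)` counts a NULL group as one row, so the two forms can differ by one if `msisdn` itself may be NULL.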
9-10: Nitpick: improve alias naming and terminate statement
Using `_` as an alias is ambiguous; consider a descriptive alias (e.g. `all_msisdns`) and ensure the statement ends with a semicolon.
1-10: Advice: consolidate duplicated logic via Jinja macro
This MSISDN-counting logic is repeated across multiple QA checks. Extract it into a shared Jinja template or macro to DRY up the SQL and simplify future maintenance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (68)
flowetl/flowetl/MANIFEST.in
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
(0 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
(0 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
(1 hunks)flowetl/flowetl/flowetl/util.py
(8 hunks)flowetl/tests/unit/test_qa_check_discovery.py
(9 hunks)
💤 Files with no reviewable changes (2)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
✅ Files skipped from review due to trivial changes (58)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
- flowetl/flowetl/MANIFEST.in
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
🚧 Files skipped from review as they are similar to previous changes (1)
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name DAG
(F821)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (3)
flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1)
1-1: LGTM! The query correctly identifies invalid longitude values.
The logic properly checks for both explicit invalid values (-1) and null longitude values.
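A small sketch of why both conditions are needed, run against SQLite with invented data. The column name `lon` and the table name `staging` are assumptions; the actual query in `garbage_lon.sql` is not reproduced in this review.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (cell_id TEXT, lon REAL)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("c1", 36.8), ("c2", -1), ("c3", None), ("c4", -1.0)],
)

# Comparing NULL with = never yields true, so the sentinel test alone
# would miss missing values; IS NULL has to be checked separately.
sentinel_only = conn.execute(
    "SELECT count(*) FROM staging WHERE lon = -1"
).fetchone()[0]
garbage_lon = conn.execute(
    "SELECT count(*) FROM staging WHERE lon = -1 OR lon IS NULL"
).fetchone()[0]
```

Here the sentinel-only filter finds two rows, and adding the NULL check brings the total to three.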
flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1)
1-5: LGTM! The query correctly identifies new cell IDs.
The LEFT OUTER JOIN with NULL check properly identifies staging records that don't have corresponding entries in the infrastructure.cells table.
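The anti-join pattern described here can be sketched as follows. The table `cells` stands in for `infrastructure.cells`, and the column names `id` and `cell_id` are assumptions made for the example rather than details taken from `new_cell_ids.sql`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cells (id TEXT);
    CREATE TABLE staging (cell_id TEXT);
    INSERT INTO cells VALUES ('c1'), ('c2');
    INSERT INTO staging VALUES ('c1'), ('c3'), ('c4');
    """
)

# Every staging row is kept by the LEFT OUTER JOIN; rows with no match
# in cells get NULL for c.id, so filtering on that leaves only new IDs.
new_ids = conn.execute(
    """
    SELECT count(*)
    FROM staging AS s
    LEFT OUTER JOIN cells AS c ON s.cell_id = c.id
    WHERE c.id IS NULL
    """
).fetchone()[0]
```

With this data, `c3` and `c4` are the only staging IDs without a counterpart in `cells`.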
flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1)
1-10: LGTM! The query correctly counts distinct MSISDNs with proper conditional logic.
The implementation properly:
- Combines MSISDNs from both primary and counterpart columns for relevant CDR types
- Uses UNION ALL followed by DISTINCT for efficient deduplication
- Handles NULL counterpart values appropriately
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 as n_dupes
FROM {{ staging_table }}
GROUP BY
cell_id
HAVING count(*) - 1 > 1) tableWithCount
Fix the duplicate detection logic.
The HAVING condition `count(*) - 1 > 1` misses groups containing exactly 2 records with the same cell_id (which should count as 1 duplicate). It only includes groups with 3 or more records.
Apply this diff to fix the logic:
- HAVING count(*) - 1 > 1) tableWithCount
+ HAVING count(*) > 1) tableWithCount
This change ensures that groups with 2 or more records (i.e., at least 1 duplicate) are included in the count.
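The off-by-one can be reproduced with a few rows in SQLite. This is only a sketch: the table name `staging` replaces the `{{ staging_table }}` template variable and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (cell_id TEXT)")
# c1 appears once, c2 twice (1 duplicate), c3 three times (2 duplicates).
conn.executemany(
    "INSERT INTO staging VALUES (?)",
    [("c1",), ("c2",), ("c2",), ("c3",), ("c3",), ("c3",)],
)

QUERY = """
SELECT COALESCE(sum(n_dupes), 0) FROM (
    SELECT count(*) - 1 AS n_dupes
    FROM staging
    GROUP BY cell_id
    HAVING {condition}
) AS tableWithCount
"""

# Original condition: the c2 pair is filtered out before summing.
original = conn.execute(QUERY.format(condition="count(*) - 1 > 1")).fetchone()[0]
# Suggested condition: any group with more than one row contributes.
fixed = conn.execute(QUERY.format(condition="count(*) > 1")).fetchone()[0]
```

The original condition reports 2 duplicates (from `c3` only), while the corrected one reports 3 (one from `c2` plus two from `c3`).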
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql lines
1 to 6, the HAVING clause uses `count(*) - 1 > 1`, which excludes groups with
exactly 2 records and thus misses some duplicates. Change the HAVING condition
to `count(*) - 1 >= 1` to include groups with at least one duplicate, ensuring
all duplicate cases are counted correctly.
{{ final_table }}
{% if cdr_type == "calls" or cdr_type == "sms" %}
UNION ALL
(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
Critical: use proper SQL null check syntax
The condition `msisdn_counterpart NOTNULL` is non-standard SQL. PostgreSQL and SQLite accept `NOTNULL` as an extension, but it is not portable to other dialects. The standard form is `msisdn_counterpart IS NOT NULL`.
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql at line 7,
replace the non-standard condition "msisdn_counterpart NOTNULL" with the standard SQL
syntax "msisdn_counterpart IS NOT NULL" to properly check for non-null values.
4788654
to
490afbc
Compare
Actionable comments posted: 0
♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)
38-38: Fix the undefined type annotation.
The `DAG` type is used in the type hint but not imported, causing a static analysis error.
Apply this diff to fix the type annotation:
 def get_qa_checks(
     *,
-    dag: Optional["DAG"] = None,
+    dag: Optional["airflow.DAG"] = None,
     additional_qa_check_paths: Optional[List[str]] = None,
     stage: Optional[ETLStage] = ETLStage.FINAL,
 ) -> List["QACheckOperator"]:
Alternatively, add the import at the top of the file:
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
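The second option can be sketched as below. The function body and the `List[str]` return type are stand-ins so that the example runs without Airflow installed; only the guarded import and the quoted annotation reflect the suggestion itself.

```python
from typing import TYPE_CHECKING, List, Optional

if TYPE_CHECKING:
    # Seen only by static analysers such as Ruff or mypy. TYPE_CHECKING is
    # False at runtime, so loading this module never imports Airflow.
    from airflow import DAG


def get_qa_checks(
    *,
    dag: Optional["DAG"] = None,
    additional_qa_check_paths: Optional[List[str]] = None,
) -> List[str]:
    """Stand-in body; the real function returns QACheckOperator instances."""
    return list(additional_qa_check_paths or [])
```

Because the annotation is a string, Python never evaluates `DAG` when the function is defined, while the analyser can still resolve the name through the guarded import, which should be enough to clear the F821 warning.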
369-379: Use enum values consistently instead of string literals.
The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.
Apply this diff to use enum values consistently:
- stage="staging",
+ stage=ETLStage.STAGING,
388-398: Use enum values consistently instead of string literals.
The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.
Apply this diff to use enum values consistently:
- stage="staging",
+ stage=ETLStage.STAGING,
431-436: Use enum value for extract stage.
Consistent with the staging stage issue, use the enum value instead of a string literal.
Apply this diff:
- additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+ additional_qa_check_paths=additional_qa_check_paths, stage=ETLStage.EXTRACT
🧹 Nitpick comments (1)
CHANGELOG.md (1)
10-22: Fix markdown list indentation for consistency.
The changelog content is comprehensive and well-structured, but the list indentation is inconsistent with the expected markdown formatting.
Apply this diff to fix the indentation:
- Added new cell info qa checks to flowetl:
- - Missing latitude
- - Missing longitude
- - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
- - Number of new cell ids
- - Number of duplicate ids
+- Added new cell info qa checks to flowetl:
+ - Missing latitude
+ - Missing longitude
+ - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
+ - Number of new cell ids
+ - Number of duplicate ids
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (75)
CHANGELOG.md
(1 hunks)flowdb/testdata/bin/run_qa_checks.py
(1 hunks)flowdb_synthetic_data.Dockerfile
(1 hunks)flowdb_synthetic_data.Dockerfile.dockerignore
(1 hunks)flowdb_testdata.Dockerfile
(1 hunks)flowdb_testdata.Dockerfile.dockerignore
(1 hunks)flowetl/flowetl/MANIFEST.in
(1 hunks)flowetl/flowetl/flowetl/operators/qa_check_operator.py
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
(0 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
(0 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
(1 hunks)flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
(1 hunks)flowetl/flowetl/flowetl/util.py
(8 hunks)flowetl/tests/unit/test_qa_check_discovery.py
(9 hunks)
💤 Files with no reviewable changes (2)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (2)
- flowdb_testdata.Dockerfile
- flowdb_synthetic_data.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (68)
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
- flowetl/flowetl/MANIFEST.in
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
- flowdb_synthetic_data.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
- flowdb_testdata.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
- flowetl/flowetl/flowetl/operators/qa_check_operator.py
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name `DAG` (F821)
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md
11-11: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
12-12: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
13-13: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
14-14: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
15-15: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (5)
flowetl/flowetl/flowetl/util.py (4)
26-34: LGTM: Well-defined enum for ETL stages. The `ETLStage` enum provides clear type safety for the different ETL stages and follows good practices by inheriting from both `str` and `Enum`.
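A minimal sketch of the `str`-plus-`Enum` pattern this comment refers to. The member values are assumptions taken from the string literals and `qa_checks/<stage>/` directory names that appear elsewhere in this review, not from the actual `util.py`.

```python
from enum import Enum


class ETLStage(str, Enum):
    """ETL stages; member values are assumed, not copied from util.py."""

    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


# The str mix-in makes members compare equal to plain strings.
assert ETLStage.STAGING == "staging"
# Members also support ordinary str methods.
assert ETLStage.FINAL.upper() == "FINAL"
# Looking a member up by value returns the singleton member.
assert ETLStage("extract") is ETLStage.EXTRACT
```

When the stage name is needed as text (for example as a path component), `.value` is the unambiguous choice, since the result of formatting a mixed-in enum member has varied between Python versions.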
40-40: LGTM: Good addition of stage parameter. The addition of the `stage` parameter with a sensible default enhances the flexibility of the QA check system.
73-89: LGTM: Stage-aware path construction. The logic correctly constructs paths based on the ETL stage, enabling stage-specific QA checks. The use of `ETLStage(stage)` ensures type validation.
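The stage-aware path construction described above could look roughly like the sketch below. The helper name `stage_qa_check_dirs`, its signature, and the enum values are hypothetical; the only behaviour taken from the review is that the stage value is appended to the default and additional QA check paths, and that `ETLStage(stage)` rejects unknown stages.

```python
from enum import Enum
from pathlib import Path
from typing import List, Optional, Sequence, Union


class ETLStage(str, Enum):
    # Assumed values, mirroring the qa_checks/<stage>/ directory names.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def stage_qa_check_dirs(
    default_path: Union[str, Path],
    additional_paths: Optional[Sequence[Union[str, Path]]] = None,
    stage: Union[str, ETLStage] = ETLStage.FINAL,
) -> List[Path]:
    """Return one <root>/<stage> directory per QA check root (hypothetical helper)."""
    stage = ETLStage(stage)  # raises ValueError for an unknown stage
    roots = [default_path, *(additional_paths or [])]
    return [Path(root) / stage.value for root in roots]
```

Coercing once at the top of the function means callers may pass either a member or its string value, while a bad value fails before any path is built.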
186-187: LGTM: Additional QA check path parameters. The new parameters `additional_staging_qa_check_paths` and `additional_extract_qa_check_paths` provide good flexibility for stage-specific QA checks.
flowdb/testdata/bin/run_qa_checks.py (1)
110-110: LGTM: Simplified display name construction. The change to always use the filename stem for the display name is consistent with the stage-aware QA check reorganisation. This simplification improves clarity whilst maintaining unique identification across the new directory structure.
490afbc to 0ab5ddd
Actionable comments posted: 0
♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)
38-38: Fix undefined type annotation. The `DAG` type is used in the type hint but not imported, causing a static analysis error. Apply this diff to fix the type annotation:

     def get_qa_checks(
         *,
    -    dag: Optional["DAG"] = None,
    +    dag: Optional["airflow.DAG"] = None,
         additional_qa_check_paths: Optional[List[str]] = None,
         stage: Optional[ETLStage] = ETLStage.FINAL,
     ) -> List["QACheckOperator"]:

Alternatively, add the import at the top of the file:

    +from typing import TYPE_CHECKING
    +
    +if TYPE_CHECKING:
    +    from airflow import DAG
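The `TYPE_CHECKING` option can be sketched as follows. This is an illustration rather than the real `util.py`: the function body and its `List[str]` return type are placeholders, and only the guarded import and the quoted `"DAG"` annotation matter. Because `TYPE_CHECKING` is `False` at runtime, the guarded import never executes, so the module loads without importing Airflow, while linters and type checkers still treat `DAG` as a bound name. The quoted `"airflow.DAG"` variant, by contrast, probably only satisfies the linter if `airflow` itself is imported in the module.

```python
from typing import TYPE_CHECKING, List, Optional

if TYPE_CHECKING:
    # Seen by linters and type checkers only; never executed at runtime.
    from airflow import DAG


def get_qa_checks(
    *,
    dag: Optional["DAG"] = None,
    additional_qa_check_paths: Optional[List[str]] = None,
) -> List[str]:
    """Placeholder body: echoes the extra paths instead of building operators."""
    return list(additional_qa_check_paths or [])
```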
369-379: Use enum values consistently instead of string literals. The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies. Apply this diff to use enum values consistently:

    -    stage="staging",
    +    stage=ETLStage.STAGING,

388-398: Use enum values consistently instead of string literals. The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies. Apply this diff to use enum values consistently:

    -    stage="staging",
    +    stage=ETLStage.STAGING,

430-437: Use enum value for extract stage and approve the gather operator. The gather operator provides a clean synchronisation point after extract-stage QA checks. However, maintain consistency by using enum values instead of string literals. Apply this diff:

    -    additional_qa_check_paths=additional_qa_check_paths, stage="extract"
    +    additional_qa_check_paths=additional_qa_check_paths, stage=ETLStage.EXTRACT
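A small sketch of why the enum member is the safer spelling at these call sites. It assumes `ETLStage` is a `str`-based `Enum` whose values match the literals in the diffs above; the `"final"` value and the `resolve_stage` helper are hypothetical. A string literal still works because it is coerced on lookup, but a misspelt literal is only rejected when that coercion runs, whereas a misspelt member name is an attribute error that static analysis can flag before the DAG is ever parsed.

```python
from enum import Enum
from typing import Union


class ETLStage(str, Enum):
    # Values assumed from the literals and directory names seen in this review.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def resolve_stage(stage: Union[str, ETLStage]) -> ETLStage:
    """Hypothetical helper: coerce a literal or a member to an ETLStage."""
    return ETLStage(stage)


# Both spellings resolve to the same member.
assert resolve_stage("staging") is resolve_stage(ETLStage.STAGING)

# A misspelt literal is only rejected once the coercion runs.
try:
    resolve_stage("stagin")
except ValueError as exc:
    print(f"rejected: {exc}")
```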
🧹 Nitpick comments (1)
CHANGELOG.md (1)
10-21: Fix list indentation for consistency. The changelog content accurately documents the new QA check features, but the list indentation is inconsistent according to Markdown standards. Apply this diff to fix the indentation:

     ### Added
    -    - Missing latitude
    -    - Missing longitude
    -    - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
    -    - Number of new cell ids
    -    - Number of duplicate ids
    +  - Missing latitude
    +  - Missing longitude
    +  - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
    +  - Number of new cell ids
    +  - Number of duplicate ids
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (75)
- CHANGELOG.md (1 hunks)
- flowdb/testdata/bin/run_qa_checks.py (1 hunks)
- flowdb_synthetic_data.Dockerfile (1 hunks)
- flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
- flowdb_testdata.Dockerfile (1 hunks)
- flowdb_testdata.Dockerfile.dockerignore (1 hunks)
- flowetl/flowetl/MANIFEST.in (1 hunks)
- flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
- flowetl/flowetl/flowetl/util.py (8 hunks)
- flowetl/tests/unit/test_qa_check_discovery.py (9 hunks)
💤 Files with no reviewable changes (2)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (2)
- flowdb_testdata.Dockerfile
- flowdb_synthetic_data.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (69)
- flowetl/flowetl/MANIFEST.in
- flowdb_synthetic_data.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
- flowdb_testdata.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
- flowetl/flowetl/flowetl/operators/qa_check_operator.py
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
- flowdb/testdata/bin/run_qa_checks.py
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name `DAG` (F821)
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md
11-11: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
12-12: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
13-13: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
14-14: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
15-15: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
🔇 Additional comments (4)
flowetl/flowetl/flowetl/util.py (4)
26-34: LGTM! Well-structured enum definition. The `ETLStage` enum is properly defined with clear string values representing the different ETL stages.
40-40: Good enhancement for stage-specific QA checks. The addition of the `stage` parameter with proper default value and documentation enables stage-specific QA check filtering, which aligns well with the new ETL stage architecture. Also applies to: 56-59
73-89: Approve the path construction logic. The modification to construct stage-specific paths by appending the stage value to the default and additional QA check paths is correctly implemented and maintains backward compatibility.
186-187: Well-documented new parameters for stage-specific QA checks. The addition of the `additional_staging_qa_check_paths` and `additional_extract_qa_check_paths` parameters enhances the flexibility of the DAG creation function and is properly documented. Also applies to: 261-262
ce43417 to 4d2b9cb
Actionable comments posted: 0
♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)
38-38: Fix undefined type annotation. The `DAG` type is used in the type hint but not imported, causing a static analysis error. Apply this diff to fix the type annotation:

    +from typing import TYPE_CHECKING
    +
    +if TYPE_CHECKING:
    +    from airflow import DAG

Alternatively, use the fully qualified name:

    -    dag: Optional["DAG"] = None,
    +    dag: Optional["airflow.DAG"] = None,
369-379: Use enum values consistently instead of string literals. The staging QA checks integration is well-structured, but the stage parameter uses a string literal rather than the defined enum value, reducing type safety. Apply this diff to use enum values consistently:

    -    stage="staging",
    +    stage=ETLStage.STAGING,

388-398: Use enum values consistently instead of string literals. Similar to the file flux sensor integration, this section should use the enum value rather than a string literal for type safety. Apply this diff:

    -    stage="staging",
    +    stage=ETLStage.STAGING,

429-436: Good task flow design but use enum value for extract stage. The task flow with the `EmptyOperator` for gathering is well-designed, but the stage parameter should use the enum value for consistency. Apply this diff:

    -    additional_qa_check_paths=additional_qa_check_paths, stage="extract"
    +    additional_qa_check_paths=additional_qa_check_paths, stage=ETLStage.EXTRACT
🧹 Nitpick comments (3)
flowetl/flowetl/flowetl/operators/qa_check_operator.py (1)
10-10: Fix line length violation whilst maintaining correct logic. The logic for extracting the task ID prefix is correct and aligns with the new task ID format that includes CDR type and stage suffixes. However, the line exceeds the maximum length limit. Consider breaking the line to improve readability:

    -    (date '{{{{ ds }}}}', '{{{{ params.cdr_type }}}}', '{{{{ task.task_id.split(".")[0] }}}}', ({sql}), NOW())
    +    (date '{{{{ ds }}}}', '{{{{ params.cdr_type }}}}',
    +     '{{{{ task.task_id.split(".")[0] }}}}', ({sql}), NOW())

CHANGELOG.md (1)
10-16: Fix list indentation for consistency. The changelog entries comprehensively document the new QA checks and related changes. However, the nested list items should use 2-space indentation rather than 4-space indentation for consistency with markdown best practices. Apply this diff to fix the indentation:

    -- Added new cell info qa checks to flowetl:
    -    - Missing latitude
    -    - Missing longitude
    -    - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
    -    - Number of new cell ids
    -    - Number of duplicate ids
    +- Added new cell info qa checks to flowetl:
    +  - Missing latitude
    +  - Missing longitude
    +  - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
    +  - Number of new cell ids
    +  - Number of duplicate ids

flowetl/flowetl/flowetl/util.py (1)
57-58: Fix typo in docstring. There's a typo in the docstring - "ETLSTage" should be "ETLStage". Apply this diff:

    -    stage : ETLStage, default ETLSTage.FINAL
    +    stage : ETLStage, default ETLStage.FINAL
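For the `qa_check_operator.py` nitpick above, the sketch below illustrates how the quadruple braces and the task ID prefix behave. It assumes the operator builds its SQL with Python string formatting, so each doubled brace collapses to one and `{{{{ ds }}}}` reaches Airflow as the Jinja expression `{{ ds }}`. The `TEMPLATE` constant, the helper names, and the example task ID `count_imeis.calls.final` are hypothetical stand-ins, not the operator's real code.

```python
# Hypothetical stand-in for the operator's SQL template. str.format() collapses
# each doubled brace, so "{{{{ ds }}}}" becomes the Jinja expression "{{ ds }}".
TEMPLATE = (
    "(date '{{{{ ds }}}}', '{{{{ params.cdr_type }}}}', "
    "'{{{{ task.task_id.split(\".\")[0] }}}}', ({sql}), NOW())"
)


def build_values_clause(sql: str) -> str:
    """Fill in the check's SQL, leaving the Jinja placeholders for Airflow."""
    return TEMPLATE.format(sql=sql)


def task_id_prefix(task_id: str) -> str:
    """Mirror the Jinja expression: keep everything before the first dot."""
    return task_id.split(".")[0]


rendered = build_values_clause("SELECT 1")
prefix = task_id_prefix("count_imeis.calls.final")  # assumed task ID shape
```

Splitting the template across two adjacent string literals, as here, keeps each source line short without introducing a newline into the generated SQL.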
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (77)
- CHANGELOG.md (1 hunks)
- flowdb/testdata/bin/run_qa_checks.py (1 hunks)
- flowdb_synthetic_data.Dockerfile (1 hunks)
- flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
- flowdb_testdata.Dockerfile (1 hunks)
- flowdb_testdata.Dockerfile.dockerignore (1 hunks)
- flowetl/flowetl/MANIFEST.in (1 hunks)
- flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
- flowetl/flowetl/flowetl/util.py (8 hunks)
- flowetl/tests/integration/conftest.py (1 hunks)
- flowetl/tests/integration/test_qa.py (29 hunks)
- flowetl/tests/unit/test_qa_check_discovery.py (9 hunks)
💤 Files with no reviewable changes (2)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (2)
- flowdb_testdata.Dockerfile
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
🚧 Files skipped from review as they are similar to previous changes (68)
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
- flowetl/flowetl/MANIFEST.in
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
- flowdb_testdata.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
- flowdb_synthetic_data.Dockerfile
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
- flowdb/testdata/bin/run_qa_checks.py
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
- flowdb_synthetic_data.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py
[error] 38-38: undefined name 'DAG'
(F821)
[error] 58-58: line too long (93 > 79 characters)
(E501)
[error] 71-71: line too long (80 > 79 characters)
(E501)
[error] 261-261: line too long (113 > 79 characters)
(E501)
[error] 433-433: line too long (88 > 79 characters)
(E501)
flowetl/tests/integration/test_qa.py
[error] 10-10: line too long (103 > 79 characters)
(E501)
[error] 11-11: line too long (97 > 79 characters)
(E501)
[error] 26-26: line too long (103 > 79 characters)
(E501)
[error] 27-27: line too long (97 > 79 characters)
(E501)
[error] 43-43: line too long (103 > 79 characters)
(E501)
[error] 44-44: line too long (97 > 79 characters)
(E501)
[error] 62-62: line too long (103 > 79 characters)
(E501)
[error] 63-63: line too long (103 > 79 characters)
(E501)
[error] 64-64: line too long (87 > 79 characters)
(E501)
[error] 65-65: line too long (89 > 79 characters)
(E501)
[error] 79-79: line too long (103 > 79 characters)
(E501)
[error] 80-80: line too long (103 > 79 characters)
(E501)
[error] 81-81: line too long (87 > 79 characters)
(E501)
[error] 83-83: line too long (81 > 79 characters)
(E501)
[error] 97-97: line too long (103 > 79 characters)
(E501)
[error] 98-98: line too long (103 > 79 characters)
(E501)
[error] 99-99: line too long (87 > 79 characters)
(E501)
[error] 100-100: line too long (89 > 79 characters)
(E501)
[error] 114-114: line too long (103 > 79 characters)
(E501)
[error] 115-115: line too long (103 > 79 characters)
(E501)
[error] 116-116: line too long (87 > 79 characters)
(E501)
[error] 118-118: line too long (81 > 79 characters)
(E501)
[error] 132-132: line too long (103 > 79 characters)
(E501)
[error] 133-133: line too long (97 > 79 characters)
(E501)
[error] 138-138: line too long (81 > 79 characters)
(E501)
[error] 148-148: line too long (83 > 79 characters)
(E501)
[error] 149-149: line too long (103 > 79 characters)
(E501)
[error] 150-150: line too long (97 > 79 characters)
(E501)
[error] 156-156: line too long (86 > 79 characters)
(E501)
[error] 166-166: line too long (88 > 79 characters)
(E501)
[error] 184-184: line too long (91 > 79 characters)
(E501)
[error] 194-194: line too long (82 > 79 characters)
(E501)
[error] 212-212: line too long (85 > 79 characters)
(E501)
[error] 223-223: line too long (103 > 79 characters)
(E501)
[error] 224-224: line too long (97 > 79 characters)
(E501)
[error] 238-238: line too long (95 > 79 characters)
(E501)
[error] 239-239: line too long (103 > 79 characters)
(E501)
[error] 240-240: line too long (117 > 79 characters)
(E501)
[error] 241-241: line too long (91 > 79 characters)
(E501)
[error] 242-242: line too long (93 > 79 characters)
(E501)
[error] 255-255: line too long (103 > 79 characters)
(E501)
[error] 256-256: line too long (97 > 79 characters)
(E501)
[error] 261-261: line too long (81 > 79 characters)
(E501)
[error] 271-271: line too long (103 > 79 characters)
(E501)
[error] 272-272: line too long (97 > 79 characters)
(E501)
[error] 286-286: line too long (80 > 79 characters)
(E501)
[error] 287-287: line too long (103 > 79 characters)
(E501)
[error] 288-288: line too long (103 > 79 characters)
(E501)
[error] 289-289: line too long (87 > 79 characters)
(E501)
[error] 290-290: line too long (87 > 79 characters)
(E501)
[error] 291-291: line too long (89 > 79 characters)
(E501)
[error] 294-294: line too long (83 > 79 characters)
(E501)
[error] 304-304: line too long (80 > 79 characters)
(E501)
[error] 305-305: line too long (103 > 79 characters)
(E501)
[error] 306-306: line too long (103 > 79 characters)
(E501)
[error] 307-307: line too long (87 > 79 characters)
(E501)
[error] 308-308: line too long (87 > 79 characters)
(E501)
[error] 309-309: line too long (89 > 79 characters)
(E501)
[error] 312-312: line too long (83 > 79 characters)
(E501)
[error] 322-322: line too long (85 > 79 characters)
(E501)
[error] 323-323: line too long (103 > 79 characters)
(E501)
[error] 324-324: line too long (107 > 79 characters)
(E501)
[error] 327-327: line too long (82 > 79 characters)
(E501)
[error] 340-340: line too long (83 > 79 characters)
(E501)
[error] 341-341: line too long (103 > 79 characters)
(E501)
[error] 342-342: line too long (117 > 79 characters)
(E501)
[error] 343-343: line too long (87 > 79 characters)
(E501)
[error] 345-345: line too long (81 > 79 characters)
(E501)
[error] 359-359: line too long (103 > 79 characters)
(E501)
[error] 360-360: line too long (117 > 79 characters)
(E501)
[error] 361-361: line too long (91 > 79 characters)
(E501)
[error] 362-362: line too long (93 > 79 characters)
(E501)
[error] 365-365: line too long (94 > 79 characters)
(E501)
[error] 374-374: line too long (88 > 79 characters)
(E501)
[error] 395-395: line too long (88 > 79 characters)
(E501)
[error] 417-417: line too long (103 > 79 characters)
(E501)
[error] 418-418: line too long (107 > 79 characters)
(E501)
[error] 419-419: line too long (81 > 79 characters)
(E501)
[error] 420-420: line too long (81 > 79 characters)
(E501)
[error] 421-421: line too long (83 > 79 characters)
(E501)
[error] 424-424: line too long (87 > 79 characters)
(E501)
[error] 434-434: line too long (103 > 79 characters)
(E501)
[error] 435-435: line too long (107 > 79 characters)
(E501)
[error] 436-436: line too long (81 > 79 characters)
(E501)
[error] 437-437: line too long (81 > 79 characters)
(E501)
[error] 438-438: line too long (83 > 79 characters)
(E501)
[error] 441-441: line too long (90 > 79 characters)
(E501)
[error] 450-450: line too long (80 > 79 characters)
(E501)
[error] 451-451: line too long (103 > 79 characters)
(E501)
[error] 452-452: line too long (107 > 79 characters)
(E501)
[error] 455-455: line too long (81 > 79 characters)
(E501)
[error] 458-458: line too long (95 > 79 characters)
(E501)
flowetl/flowetl/flowetl/operators/qa_check_operator.py
[error] 10-10: line too long (114 > 79 characters)
(E501)
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name DAG
(F821)
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md
11-11: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
12-12: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
13-13: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (11)
flowetl/tests/integration/conftest.py (1)
543-549
:
```shell
#!/bin/bash
# Re-run search excluding the new fixture name without PCRE look-arounds
echo "Searching for any remaining references to the old fixture name 'jinja_env' (excluding 'jinja_env_final_qa'):"
rg jinja_env --type py | grep -v jinja_env_final_qa

# If PCRE2 support is available, a more precise search:
echo "Searching with PCRE2 look-ahead for any 'jinja_env' not followed by '_final_qa':"
rg --pcre2 'jinja_env(?!_final_qa)' --type py
```
CHANGELOG.md (1)
20-21
: Clear documentation of breaking changes. The entries properly document the breaking changes to task IDs and the `type_of_query_or_check` field, which will help users understand the impact of these modifications on existing systems.
flowetl/flowetl/flowetl/util.py (8)
26-34
: Well-designed enum for ETL stages. The `ETLStage` enum implementation is excellent - inheriting from both `str` and `Enum` provides good type safety whilst allowing string interoperability. The three stages logically represent the ETL pipeline flow.
40-40
: Good addition of stage parameter with sensible default. The stage parameter enhances the function's flexibility whilst maintaining backwards compatibility by defaulting to `ETLStage.FINAL`.
73-73
: Excellent defensive programming with stage validation. Using `ETLStage(stage)` ensures type safety and provides clear error messages for invalid stage values.
75-77
: Clean stage-based path organisation. Using `stage.value` in the path construction effectively organises QA checks into stage-specific subdirectories, improving maintainability.
79-86
: Good refactoring for improved readability. The restructured template search path construction using generator expressions and unpacking improves code readability whilst maintaining the same functionality.
186-187
: Logical extension of QA check path configuration. The new stage-specific parameters follow the established pattern and provide necessary flexibility for configuring QA checks at different ETL stages.
261-262
: Clear and consistent parameter documentation. The documentation for the new stage-specific parameters follows the established style and clearly explains their purpose.
272-272
: Appropriate import for task synchronisation. The `EmptyOperator` import is correctly placed and necessary for the task gathering functionality in the DAG.
flowetl/tests/integration/test_qa.py (1)
9-465
: Excellent consistency in fixture updates for stage-based QA checks. All test functions have been consistently updated to use the `jinja_env_final_qa` fixture, which properly aligns with the new stage-based QA check architecture. The changes maintain test functionality whilst supporting the enhanced pipeline structure.
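The `ETLStage` comments above describe a `str`/`Enum` mix-in, coercion through `ETLStage(stage)`, and the use of `stage.value` when building paths. The following is a minimal sketch of how that pattern behaves, not the code from `util.py`: the member values `"extract"`, `"staging"` and `"final"` are assumed from the string literals and member names quoted in this review, and `qa_checks_dir` is a hypothetical helper introduced only for illustration.

```python
from enum import Enum
from pathlib import Path


class ETLStage(str, Enum):
    # Member values are assumptions; the review only names the members.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def qa_checks_dir(base: Path, stage: "ETLStage | str" = ETLStage.FINAL) -> Path:
    """Hypothetical helper: resolve the stage-specific QA check folder."""
    # Coercing through the enum accepts either a member or its string value
    # and raises ValueError for anything that is not a known stage.
    stage = ETLStage(stage)
    return base / "qa_checks" / stage.value


print(qa_checks_dir(Path("flowetl"), "staging"))
print(qa_checks_dir(Path("flowetl")))
```

Because the enum mixes in `str`, a member compares equal to its plain string value, which is why code that still passes `"staging"` keeps working while new code can use `ETLStage.STAGING`.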
3acfcc8
to
4415971
Compare
Actionable comments posted: 0
♻️ Duplicate comments (3)
flowetl/flowetl/flowetl/util.py (3)
38-38
: Fix undefined type annotation (duplicate issue). The `DAG` type is still undefined in the type hint, as flagged in previous reviews.
375-375
: Use enum values instead of string literals (duplicate issue). These lines still use string literals `"staging"` instead of the enum value `ETLStage.STAGING`, as flagged in previous reviews. Also applies to: 394-394
433-433
: Use enum value for extract stage (duplicate issue). This line still uses the string literal `"extract"` instead of `ETLStage.EXTRACT`, as flagged in previous reviews.
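One common way to resolve an F821 "undefined name" report on a type hint is to import the name only for type checkers. The sketch below assumes the hint refers to Airflow's `DAG` class; `describe_dag` is a hypothetical stand-in for the function annotated at line 38 of `util.py`, which is not shown in this review.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by linters and type checkers only, so Airflow is not imported
    # at runtime. Assumes the annotation means Airflow's DAG class.
    from airflow import DAG


def describe_dag(dag: "DAG") -> str:
    """Hypothetical function; the quoted annotation is never evaluated."""
    return f"DAG id: {dag.dag_id}"
```

A plain top-level `from airflow import DAG` would also clear the warning; the guarded import is only worth it if the module deliberately avoids importing Airflow at load time.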
🧹 Nitpick comments (2)
flowetl/flowetl/flowetl/operators/qa_check_operator.py (1)
10-10
: LGTM! Correct extraction of base check name from task ID. The change to use `task.task_id.split(".")[0]` correctly extracts the base check name, which aligns with the new task ID format that includes CDR type and stage suffixes. However, please address the line length issue:
- (date '{{{{ ds }}}}', '{{{{ params.cdr_type }}}}', '{{{{ task.task_id.split(".")[0] }}}}', ({sql}), NOW())
+ (date '{{{{ ds }}}}', '{{{{ params.cdr_type }}}}',
+ '{{{{ task.task_id.split(".")[0] }}}}', ({sql}), NOW())
CHANGELOG.md (1)
10-22
: LGTM! Comprehensive documentation of QA check enhancements. The changelog entries thoroughly document all the new cell info QA checks and related changes, including the new `create_dag` parameters and organisational structure changes. Please fix the markdown list indentation to follow the expected 2-space format:
-- Added new cell info qa checks to flowetl:
-- - Missing latitude
-- - Missing longitude
-- - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
-- - Number of new cell ids
-- - Number of duplicate ids
+- Added new cell info qa checks to flowetl:
+  - Missing latitude
+  - Missing longitude
+  - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
+  - Number of new cell ids
+  - Number of duplicate ids
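The `qa_check_operator.py` nitpick above relies on `task_id.split(".")[0]` returning everything before the first dot. A small sketch of that behaviour follows; the example task IDs and their suffix layout are assumptions for illustration, since the review only says that the new IDs carry CDR type and stage suffixes.

```python
def base_check_name(task_id: str) -> str:
    """Return the part of a task ID that precedes the first dot."""
    return task_id.split(".")[0]


# Hypothetical task IDs; the exact suffix layout is an assumption.
print(base_check_name("count_duped_ids.cell_info.staging"))
print(base_check_name("count_imeis"))
```

An ID with no dot is returned unchanged, so the expression stays safe for any task whose ID has no suffix.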
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (77)
CHANGELOG.md (1 hunks)
flowdb/testdata/bin/run_qa_checks.py (1 hunks)
flowdb_synthetic_data.Dockerfile (1 hunks)
flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
flowdb_testdata.Dockerfile (1 hunks)
flowdb_testdata.Dockerfile.dockerignore (1 hunks)
flowetl/flowetl/MANIFEST.in (1 hunks)
flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
flowetl/flowetl/flowetl/util.py (8 hunks)
flowetl/tests/integration/conftest.py (1 hunks)
flowetl/tests/integration/test_qa.py (18 hunks)
flowetl/tests/unit/test_qa_check_discovery.py (9 hunks)
💤 Files with no reviewable changes (2)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (2)
- flowdb_synthetic_data.Dockerfile
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
🚧 Files skipped from review as they are similar to previous changes (68)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
- flowetl/flowetl/MANIFEST.in
- flowdb/testdata/bin/run_qa_checks.py
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
- flowdb_synthetic_data.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
- flowdb_testdata.Dockerfile
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
- flowdb_testdata.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Flake8 (7.2.0)
flowetl/tests/integration/test_qa.py
[error] 10-10: line too long (103 > 79 characters)
(E501)
[error] 11-11: line too long (97 > 79 characters)
(E501)
[error] 26-26: line too long (103 > 79 characters)
(E501)
[error] 27-27: line too long (97 > 79 characters)
(E501)
[error] 43-43: line too long (103 > 79 characters)
(E501)
[error] 44-44: line too long (97 > 79 characters)
(E501)
[error] 62-62: line too long (103 > 79 characters)
(E501)
[error] 63-63: line too long (103 > 79 characters)
(E501)
[error] 64-64: line too long (87 > 79 characters)
(E501)
[error] 65-65: line too long (89 > 79 characters)
(E501)
[error] 68-68: line too long (86 > 79 characters)
(E501)
[error] 79-79: line too long (103 > 79 characters)
(E501)
[error] 80-80: line too long (103 > 79 characters)
(E501)
[error] 81-81: line too long (87 > 79 characters)
(E501)
[error] 83-83: line too long (81 > 79 characters)
(E501)
[error] 97-97: line too long (103 > 79 characters)
(E501)
[error] 98-98: line too long (103 > 79 characters)
(E501)
[error] 99-99: line too long (87 > 79 characters)
(E501)
[error] 100-100: line too long (89 > 79 characters)
(E501)
[error] 103-103: line too long (86 > 79 characters)
(E501)
[error] 114-114: line too long (103 > 79 characters)
(E501)
[error] 115-115: line too long (103 > 79 characters)
(E501)
[error] 116-116: line too long (87 > 79 characters)
(E501)
[error] 118-118: line too long (81 > 79 characters)
(E501)
[error] 132-132: line too long (103 > 79 characters)
(E501)
[error] 133-133: line too long (97 > 79 characters)
(E501)
[error] 148-148: line too long (83 > 79 characters)
(E501)
[error] 149-149: line too long (103 > 79 characters)
(E501)
[error] 150-150: line too long (97 > 79 characters)
(E501)
[error] 166-166: line too long (88 > 79 characters)
(E501)
[error] 196-196: line too long (82 > 79 characters)
(E501)
[error] 227-227: line too long (103 > 79 characters)
(E501)
[error] 228-228: line too long (97 > 79 characters)
(E501)
[error] 233-233: line too long (88 > 79 characters)
(E501)
[error] 245-245: line too long (103 > 79 characters)
(E501)
[error] 246-246: line too long (117 > 79 characters)
(E501)
[error] 247-247: line too long (91 > 79 characters)
(E501)
[error] 248-248: line too long (93 > 79 characters)
(E501)
[error] 251-251: line too long (88 > 79 characters)
(E501)
[error] 261-261: line too long (103 > 79 characters)
(E501)
[error] 262-262: line too long (97 > 79 characters)
(E501)
[error] 277-277: line too long (103 > 79 characters)
(E501)
[error] 278-278: line too long (97 > 79 characters)
(E501)
[error] 292-292: line too long (80 > 79 characters)
(E501)
[error] 293-293: line too long (103 > 79 characters)
(E501)
[error] 294-294: line too long (103 > 79 characters)
(E501)
[error] 295-295: line too long (87 > 79 characters)
(E501)
[error] 296-296: line too long (87 > 79 characters)
(E501)
[error] 297-297: line too long (89 > 79 characters)
(E501)
[error] 310-310: line too long (80 > 79 characters)
(E501)
[error] 311-311: line too long (103 > 79 characters)
(E501)
[error] 312-312: line too long (103 > 79 characters)
(E501)
[error] 313-313: line too long (87 > 79 characters)
(E501)
[error] 314-314: line too long (87 > 79 characters)
(E501)
[error] 315-315: line too long (89 > 79 characters)
(E501)
[error] 328-328: line too long (85 > 79 characters)
(E501)
[error] 329-329: line too long (103 > 79 characters)
(E501)
[error] 330-330: line too long (107 > 79 characters)
(E501)
[error] 333-333: line too long (82 > 79 characters)
(E501)
[error] 346-346: line too long (83 > 79 characters)
(E501)
[error] 347-347: line too long (103 > 79 characters)
(E501)
[error] 348-348: line too long (117 > 79 characters)
(E501)
[error] 349-349: line too long (87 > 79 characters)
(E501)
[error] 351-351: line too long (81 > 79 characters)
(E501)
[error] 365-365: line too long (103 > 79 characters)
(E501)
[error] 366-366: line too long (117 > 79 characters)
(E501)
[error] 367-367: line too long (91 > 79 characters)
(E501)
[error] 368-368: line too long (93 > 79 characters)
(E501)
[error] 380-380: line too long (88 > 79 characters)
(E501)
[error] 401-401: line too long (88 > 79 characters)
(E501)
[error] 423-423: line too long (103 > 79 characters)
(E501)
[error] 424-424: line too long (107 > 79 characters)
(E501)
[error] 425-425: line too long (81 > 79 characters)
(E501)
[error] 426-426: line too long (81 > 79 characters)
(E501)
[error] 427-427: line too long (83 > 79 characters)
(E501)
[error] 430-430: line too long (87 > 79 characters)
(E501)
[error] 440-440: line too long (103 > 79 characters)
(E501)
[error] 441-441: line too long (107 > 79 characters)
(E501)
[error] 442-442: line too long (81 > 79 characters)
(E501)
[error] 443-443: line too long (81 > 79 characters)
(E501)
[error] 444-444: line too long (83 > 79 characters)
(E501)
[error] 456-456: line too long (80 > 79 characters)
(E501)
[error] 457-457: line too long (103 > 79 characters)
(E501)
[error] 458-458: line too long (107 > 79 characters)
(E501)
[error] 461-461: line too long (81 > 79 characters)
(E501)
flowetl/flowetl/flowetl/util.py
[error] 38-38: undefined name 'DAG'
(F821)
[error] 58-58: line too long (93 > 79 characters)
(E501)
[error] 71-71: line too long (80 > 79 characters)
(E501)
[error] 261-261: line too long (113 > 79 characters)
(E501)
[error] 433-433: line too long (88 > 79 characters)
(E501)
flowetl/flowetl/flowetl/operators/qa_check_operator.py
[error] 10-10: line too long (114 > 79 characters)
(E501)
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name DAG
(F821)
🪛 LanguageTool
CHANGELOG.md
[style] ~22-~22: Would you like to use the Oxford spelling “organized”? The spelling ‘organised’ is also correct.
Context: ...rovided alongside dag files must now be organised into the correct subfolder for the ETL ...
(OXFORD_SPELLING_Z_NOT_S)
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md
11-11: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
12-12: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
13-13: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (7)
flowetl/tests/integration/conftest.py (1)
543-549: LGTM! Fixture properly updated for new QA check organisation.
The renaming from `jinja_env` to `jinja_env_final_qa` and the path change to the "final" subdirectory correctly aligns with the new stage-specific QA check structure described in the PR objectives.
flowetl/flowetl/flowetl/util.py (4)
26-34: Well-implemented ETL stage enum.
The `ETLStage` enum provides a clean way to represent the different ETL stages with proper typing support.
40-40: Good implementation of stage-aware QA check discovery.
The addition of the `stage` parameter with proper enum conversion and path construction correctly supports the stage-specific QA check structure.
Also applies to: 73-89
186-187: Clean parameter additions for stage-specific QA checks.
The new parameters `additional_staging_qa_check_paths` and `additional_extract_qa_check_paths` are well-integrated with appropriate documentation.
Also applies to: 261-262
369-379: Excellent integration of stage-specific QA checks into DAG flow.
The implementation correctly integrates QA checks at appropriate points in the ETL pipeline:
- Staging QA checks after flux sensors
- Extract QA checks after extraction/clustering
- Proper use of EmptyOperator for task gathering
This provides comprehensive quality assurance throughout the ETL process.
Also applies to: 388-398, 429-437
flowetl/tests/integration/test_qa.py (2)
9-9: Consistent fixture renaming aligns with stage-based structure.
All test functions have been systematically updated to use `jinja_env_final_qa` instead of `jinja_env`, correctly reflecting the new stage-specific QA check organisation.
Also applies to: 25-25, 42-42, 61-61, 78-78, 96-96, 113-113, 131-131, 148-148, 166-166, 196-196, 226-226, 242-244, 260-260, 276-276, 292-292, 310-310, 328-328, 346-346, 364-364, 380-380, 401-401, 422-422, 439-439, 456-456
16-18: Template paths correctly updated for new directory structure.
All template path references have been systematically updated to include the CDR type subdirectory (e.g., `f"{cdr_type}/count_added_rows.sql"`), properly aligning with the new stage-based QA check organisation introduced in the broader PR.
Also applies to: 30-32, 49-51, 68-70, 86-88, 103-105, 121-123, 138-140, 156-158, 184-188, 214-218, 233-235, 251-253, 267-269, 283-285, 300-302, 318-320, 336-338, 354-356, 371-373, 392-394, 413-415, 430-432, 447-449, 464-466
Actionable comments posted: 0
♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)
38-38: Fix undefined type annotation.
The `DAG` type is used in the type hint but not imported, causing a static analysis error.
Apply this diff to fix the type annotation:
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
Or use a string literal:
- dag: Optional["DAG"] = None,
+ dag: Optional["airflow.DAG"] = None,
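The first option in the comment above can be sketched as follows. This is a minimal illustration, not the code in `util.py`: the function `describe_dag` is a hypothetical stand-in for whichever function carries the `dag` parameter. Because the import sits under `TYPE_CHECKING`, it is never executed at runtime, so the snippet runs even where Airflow is not installed.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Seen by linters and type checkers only; skipped at runtime.
    from airflow import DAG


def describe_dag(dag: Optional["DAG"] = None) -> str:
    """Hypothetical helper: the quoted annotation is resolved lazily,
    so the name DAG does not need to exist when the module is imported."""
    if dag is None:
        return "no dag"
    return str(dag.dag_id)


label = describe_dag()
```

With the guarded import in place, the quoted `"DAG"` annotation refers to a name that static analysis tools can see, which is what the F821 report is about.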
370-380: Use enum values consistently instead of string literals.
The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.
Apply this diff to use enum values consistently:
- stage="staging",
+ stage=ETLStage.STAGING,
387-397: Use enum values consistently instead of string literals.
Same issue as the previous comment - use enum values instead of string literals for type safety.
Apply this diff:
- stage="staging",
+ stage=ETLStage.STAGING,
432-437: Use enum value for extract stage and approve the gather operator addition.
The extract stage should use the enum value instead of a string literal. However, the addition of the "gather" `EmptyOperator` is a good design pattern for coordinating parallel tasks.
Apply this diff:
- additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+ additional_qa_check_paths=additional_qa_check_paths, stage=ETLStage.EXTRACT
🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)
97-97: Remove debug print statement.
This print statement appears to be leftover debug code and should be removed before merging.
- print(templates)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
flowetl/flowetl/flowetl/util.py (9 hunks)
flowetl/tests/integration/test_full_pipeline.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name DAG
(F821)
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py
[error] 38-38: undefined name 'DAG'
(F821)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (3)
flowetl/tests/integration/test_full_pipeline.py (1)
109-109: LGTM! Test update aligns with implementation changes.
The addition of "gather" to the expected QA check types correctly reflects the new `EmptyOperator` introduced in the DAG construction logic.
flowetl/flowetl/flowetl/util.py (2)
26-34: Well-structured enum addition.
The `ETLStage` enum provides clear type safety for representing different ETL stages and follows good naming conventions.
187-189: Well-structured parameter additions.
The addition of stage-specific QA check path parameters provides good flexibility for customising QA checks per ETL stage whilst maintaining backwards compatibility.
Actionable comments posted: 0
♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)
38-38: Fix undefined type annotation.
The `DAG` type is used in the type hint but not imported, causing a static analysis error.
Apply this diff to fix the type annotation:
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
Or alternatively:
- dag: Optional["DAG"] = None,
+ dag: Optional["airflow.DAG"] = None,
372-382: Use enum values consistently instead of string literals.
The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.
Apply this diff to use enum values consistently:
- stage="staging",
+ stage=ETLStage.STAGING,
391-401: Use enum values consistently instead of string literals.
Similar to the previous comment, use the enum value instead of a string literal.
Apply this diff:
- stage="staging",
+ stage=ETLStage.STAGING,
432-448: Use enum value for extract stage and fix parameter usage.
The implementation correctly adds extract-stage QA checks and includes a "gather" operator, but should use enum values consistently and utilise the dedicated extract QA check paths parameter.
Apply this diff:
- additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+ additional_qa_check_paths=additional_extract_qa_check_paths, stage=ETLStage.EXTRACT
🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)
78-79: Remove debug print statements.
These debug print statements should be removed before merging to production.
Apply this diff to remove the debug statements:
- print(default_path)
- print(list(default_path.walk()))
And:
- print(templates)
Also applies to: 99-99
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
flowetl/flowetl/flowetl/util.py (9 hunks)
flowetl/tests/unit/test_qa_check_discovery.py (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name DAG
(F821)
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py
[error] 38-38: undefined name 'DAG'
(F821)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (5)
flowetl/flowetl/flowetl/util.py (5)
26-34: Excellent addition of ETL stage enumeration.
The `ETLStage` enum is well-designed and provides clear stage categorisation for the QA check system. The inheritance from both `str` and `Enum` allows for easy string comparisons whilst maintaining type safety.
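The `str` plus `Enum` inheritance described above can be illustrated with a short sketch. The member names follow those mentioned in this review (`STAGING`, `EXTRACT`, `FINAL`); the string values are assumed to match the stage directory names and may differ from the actual definition in `util.py`.

```python
from enum import Enum


class ETLStage(str, Enum):
    """Sketch of a stage enum; values are assumed, not copied from util.py."""

    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


# Members are also str instances, so they compare equal to plain strings.
matches_plain_string = ETLStage.STAGING == "staging"

# .value yields the plain string, for example when naming a subdirectory.
stage_dir = ETLStage.FINAL.value
```

Using `.value` explicitly when building paths or identifiers avoids relying on how a mixed-in enum member is rendered by string formatting.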
40-40: Good addition of stage parameter with appropriate default.
The stage parameter with `ETLStage.FINAL` as default maintains backward compatibility whilst enabling stage-specific QA checks.
73-91: Excellent implementation of stage-aware path construction.
The logic correctly converts the stage parameter to an enum value and constructs appropriate paths for both default and additional QA check locations. The path construction ensures stage-specific QA checks are properly discovered.
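As a rough picture of stage-aware discovery, the sketch below appends the stage name to each base search path and collects the `*.sql` files found there. The function name `stage_sql_templates` and the temporary directory layout are assumptions made for illustration; the real discovery code also considers DAG-relative and CDR-type-specific locations.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def stage_sql_templates(base_paths: Iterable[Path], stage: str) -> List[Path]:
    """Collect *.sql files from the `stage` subfolder of each base path."""
    templates: List[Path] = []
    # De-duplicate base paths and visit them in a stable order.
    for base in sorted(set(base_paths)):
        stage_dir = base / stage
        templates.extend(sorted(stage_dir.glob("*.sql")))
    return templates


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "staging").mkdir()
    (root / "final").mkdir()
    (root / "staging" / "count_staging_rows.sql").write_text("SELECT 1")
    (root / "final" / "count_added_rows.sql").write_text("SELECT 1")
    # The same base path passed twice is only searched once.
    found = [p.name for p in stage_sql_templates([root, root], "staging")]
```

Only the check stored under the requested stage is returned, which is the behaviour that lets staging, extract and final checks live side by side.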
189-190: Appropriate addition of stage-specific QA check path parameters.
The new parameters `additional_staging_qa_check_paths` and `additional_extract_qa_check_paths` provide flexibility for specifying additional QA checks for each stage.
275-275: Good addition of EmptyOperator for task coordination.
The import and usage of `EmptyOperator` as a "gather" task provides proper coordination between extract-stage QA checks and subsequent constraint/index operations.
Also applies to: 439-439
Actionable comments posted: 0
🔭 Outside diff range comments (1)
flowetl/flowetl/flowetl/util.py (1)
432-448: Good workflow integration with staging separation, but use enum values.
The addition of extract-stage QA checks with the "gather" EmptyOperator provides clear separation between different stages of the workflow. However, the string literal should be replaced with the enum value.
Apply this diff to use the enum value:
- additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+ additional_qa_check_paths=additional_extract_qa_check_paths, stage=ETLStage.EXTRACT
Note: I've also corrected the parameter name to use `additional_extract_qa_check_paths` instead of `additional_qa_check_paths` to match the function signature.
♻️ Duplicate comments (3)
flowetl/flowetl/flowetl/util.py (3)
38-38: Fix undefined type annotation.
The `DAG` type is used in the type hint but not imported, causing a static analysis error.
Apply this diff to fix the type annotation:
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
Add this import block near the top of the file after the existing imports.
372-382: Use enum values consistently instead of string literals.
The stage parameter is passed as a string literal rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.
Apply this diff to use enum values consistently:
- stage="staging",
+ stage=ETLStage.STAGING,
391-401: Use enum values consistently instead of string literals.
Same issue as the previous segment - use enum values for type safety.
Apply this diff:
- stage="staging",
+ stage=ETLStage.STAGING,
🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)
78-79: Remove debug print statements.
Debug print statements should be removed before merging to production.
Apply this diff to remove the debug statements:
- print(default_path)
- print(list(default_path.glob("*.sql")))
- print(templates)
Also applies to: 99-99
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
flowetl/flowetl/flowetl/util.py (9 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name DAG
(F821)
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py
[error] 38-38: undefined name 'DAG'
(F821)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (7)
flowetl/flowetl/flowetl/util.py (7)
26-34: LGTM: Well-structured enum definition.
The ETLStage enum provides clear typing for the different stages of the ETL process and follows Python enum best practices by inheriting from both str and Enum.
40-40: Good addition of stage parameter with sensible default.
The stage parameter addition allows for stage-specific QA checks whilst maintaining backwards compatibility with the default FINAL stage.
56-59: Clear documentation of the new stage parameter.
The docstring update properly documents the stage parameter's purpose and default behaviour.
73-91: Solid implementation of stage-specific path resolution.
The logic correctly converts the stage to a string value and appends it to both default and additional search paths, enabling proper stage-specific QA check discovery.
189-190: Well-designed parameter additions for stage-specific paths.
The new parameters follow the existing naming convention and provide flexibility for different ETL stages.
264-265: Clear documentation for the new parameters.
The consolidated documentation for all three QA check path parameters improves readability.
275-275: Appropriate import addition for EmptyOperator.
The EmptyOperator import is correctly added to support the new "gather" task in the DAG workflow.
Actionable comments posted: 0
♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)
38-38: Fix undefined type annotation.
The `DAG` type is used in the type hint but not imported, causing a static analysis error. This issue was previously flagged but remains unresolved.
391-401: Use enum values consistently instead of string literals.
The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies. This issue was previously flagged but remains unresolved.
410-420: Use enum values consistently instead of string literals.
The stage parameter is passed as string literals rather than using the defined enum values. This is the same issue identified in the FILE flux sensor logic above.
455-455: Use enum value for extract stage.
Consistent with the staging stage issue, use the enum value instead of a string literal. This was previously flagged but remains unresolved.
🧹 Nitpick comments (2)
flowetl/flowetl/flowetl/util.py (2)
111-114: Remove debug print statements.
The debug print statements should be removed from production code or replaced with proper logging.
- for pth in set(search_paths):
-     print(f"Search path {pth}")
-     sql_files = pth.glob("*.sql")
-     print(list(pth.glob("*.sql")))
-     templates = [*templates, *sql_files]
+ for pth in set(search_paths):
+     sql_files = pth.glob("*.sql")
+     templates = [*templates, *sql_files]
121-127: Remove debug print statements and improve task ID construction.
The debug print statements should be removed, but the task ID construction logic is well-designed to include stage and CDR type information.
- print(f"Parsing template {tmpl}")
task_id = (
f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
if "cdr_type" in dag.params
else f"{tmpl.stem}.{stage}"
)
- print(f"Task id is {task_id}")
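The task ID rule quoted in the diff above can be exercised in isolation. The sketch below wraps it in a hypothetical helper, `build_task_id`, and takes the stage as a plain string and the DAG params as an ordinary mapping; both are simplifications for illustration rather than the signature used in `util.py`.

```python
from pathlib import Path
from typing import Mapping


def build_task_id(template: Path, params: Mapping[str, str], stage: str) -> str:
    """Combine the template stem, the optional CDR type, and the stage."""
    stem = template.stem
    if "cdr_type" in params:
        return f"{stem}.{params['cdr_type']}.{stage}"
    return f"{stem}.{stage}"


with_cdr = build_task_id(Path("qa_checks/count_duped_ids.sql"), {"cdr_type": "calls"}, "staging")
without_cdr = build_task_id(Path("qa_checks/count_duped_ids.sql"), {}, "staging")
```

Including the stage in the ID keeps two checks with the same file name, one per stage, from colliding inside a single DAG.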
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
flowetl/flowetl/flowetl/util.py (8 hunks)
flowetl/tests/unit/test_qa_check_discovery.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
38-38: Undefined name DAG
(F821)
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py
[error] 38-38: undefined name 'DAG'
(F821)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (6)
flowetl/flowetl/flowetl/util.py (6)
26-34: Well-designed enum implementation.
The `ETLStage` enum properly inherits from both `str` and `Enum`, allowing seamless usage as string values whilst maintaining type safety. The stage values are clearly defined and documented.
40-40: Good backward-compatible parameter addition.
The new `stage` parameter with a sensible default maintains backward compatibility whilst enabling the new stage-specific functionality.
73-73: Defensive type conversion is well-implemented.
The explicit conversion to `ETLStage` ensures type safety even if a string value is passed, preventing potential issues downstream.
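A small sketch of that defensive conversion: calling the enum class on the incoming value accepts either a member or its string value and raises `ValueError` for anything else. The helper name `normalise_stage` and the member values are assumptions for illustration.

```python
from enum import Enum
from typing import Union


class ETLStage(str, Enum):
    """Stand-in enum with assumed values."""

    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def normalise_stage(stage: Union[str, ETLStage]) -> ETLStage:
    """Hypothetical helper: coerce a string or member to an ETLStage member."""
    return ETLStage(stage)


from_string = normalise_stage("staging")
from_member = normalise_stage(ETLStage.FINAL)

try:
    normalise_stage("not_a_stage")
    rejected = False
except ValueError:
    # Unknown stage names fail early instead of producing a bad search path.
    rejected = True
```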
75-107: Comprehensive and well-structured search path logic.
The implementation systematically constructs stage-specific search paths from multiple sources whilst properly handling edge cases like null template search paths and CDR type-specific subdirectories.
208-209: Consistent and well-named parameter additions.
The new parameters follow the established naming convention and maintain backward compatibility with sensible default values.
458-458: Good use of coordination operator.
The `EmptyOperator` serves as an effective coordination point to gather multiple QA check tasks before proceeding to the next stage of the pipeline.
Actionable comments posted: 1
🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)
37-76: Consider refactoring for improved readability.
The function correctly handles path disambiguation but has high complexity (16 local variables, 6 nested blocks). While the logic appears sound, consider breaking this into smaller helper functions to improve maintainability.
For example, separate the suffix generation logic from the uniqueness checking:
+def _generate_suffix_candidate(path_parts, depth):
+    """Generate a suffix candidate of given depth."""
+    suffix_parts = path_parts[:depth]
+    return "/".join(reversed(suffix_parts))
+
+def _is_suffix_unique(candidate_suffix, target_index, all_paths, path_parts_list, depth):
+    """Check if a suffix candidate is unique among all paths."""
+    for j, other_path in enumerate(all_paths):
+        if target_index != j:
+            other_parts = path_parts_list[j]
+            if len(other_parts) >= depth:
+                other_suffix = _generate_suffix_candidate(other_parts, depth)
+                if candidate_suffix == other_suffix:
+                    return False
+    return True
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
flowetl/flowetl/flowetl/util.py (9 hunks)
flowetl/tests/unit/test_qa_check_discovery.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
48-48: Loop control variable `path` not used within loop body
Rename unused `path` to `_path`
(B007)
58-58: Loop control variable `other_path` not used within loop body
Rename unused `other_path` to `_other_path`
(B007)
117-117: Undefined name DAG
(F821)
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py
[error] 117-117: undefined name 'DAG'
(F821)
🪛 Pylint (3.3.7)
flowetl/flowetl/flowetl/util.py
[refactor] 37-37: Too many local variables (16/15)
(R0914)
[refactor] 48-73: Too many nested blocks (6/5)
(R1702)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (8)
flowetl/flowetl/flowetl/util.py (8)
4-4: Import addition looks good.
The `defaultdict` import is properly placed and used in the new disambiguation functionality.
27-35: Well-designed enum implementation.
The `ETLStage` enum is properly structured with clear naming and appropriate inheritance from both `str` and `Enum` for serialisation compatibility.
78-113: Clean path disambiguation implementation.
The function effectively groups paths by filename and applies disambiguation only where needed. The logic is clear and handles edge cases appropriately.
119-119: Well-designed parameter addition.
The `stage` parameter is properly typed with a sensible default that maintains backward compatibility.
286-287: Consistent parameter additions.
The new `additional_staging_qa_check_paths` and `additional_extract_qa_check_paths` parameters follow the established pattern and provide necessary stage-specific functionality.
372-372: Appropriate import addition.
The `EmptyOperator` import is correctly placed and needed for the new gather functionality in the DAG construction.
469-479: Well-structured QA check integration.
The staging QA checks are properly integrated into both flux sensor paths, maintaining consistency in the DAG structure. The pattern correctly places these checks after data validation but before extraction.
Also applies to: 488-498
531-536: Effective synchronisation pattern.
The extract QA checks integration with the gather operator provides a clean synchronisation point, ensuring all extract-stage checks complete before proceeding to constraint and index creation.
for pathgroup in sorted(disambiguated_paths.values()):
    for tmpl in pathgroup:
        tmpl = Path(tmpl)
        print(f"Parsing template {tmpl}")
Remove debug print statements.
These print statements appear to be debug code and should not be included in production code as they will clutter the logs.
Apply this diff to remove the debug statements:
- print(f"Parsing template {tmpl}")
task_id = (
f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
if "cdr_type" in dag.params
else f"{tmpl.stem}.{stage}"
)
- print(f"Task id is {task_id}")
Also applies to: 205-205
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/util.py at lines 199 and 205, remove the debug print
statements that output parsing template information. These print statements are
for debugging purposes and should be deleted to avoid cluttering production
logs.
Actionable comments posted: 1
♻️ Duplicate comments (1)
flowetl/flowetl/flowetl/util.py (1)
188-213: Template processing logic is well-structured.
The disambiguation of templates and task ID generation incorporating stage and CDR type follows a logical pattern, though debug prints should be removed as noted in past reviews.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
flowetl/flowetl/flowetl/util.py (10 hunks)
flowetl/tests/integration/conftest.py (2 hunks)
flowetl/tests/unit/test_qa_check_discovery.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- flowetl/tests/integration/conftest.py
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
48-48: Loop control variable `path` not used within loop body
Rename unused `path` to `_path`
(B007)
58-58: Loop control variable `other_path` not used within loop body
Rename unused `other_path` to `_other_path`
(B007)
117-117: Undefined name DAG
(F821)
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py
[error] 117-117: undefined name 'DAG'
(F821)
🪛 Pylint (3.3.7)
flowetl/flowetl/flowetl/util.py
[refactor] 37-37: Too many local variables (16/15)
(R0914)
[refactor] 48-73: Too many nested blocks (6/5)
(R1702)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (8)
flowetl/flowetl/flowetl/util.py (8)
27-35: Well-implemented ETL stage enum.
The ETLStage enum is properly defined with string values and clear documentation, providing type safety for stage operations.
78-112: Path disambiguation logic is sound.
The implementation correctly groups paths by filename and applies minimal distinguishing suffixes to resolve conflicts.
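The grouping step mentioned above can be sketched with `defaultdict`. This is an illustrative reduction, not the function from `util.py`: the name `group_by_filename` and the example paths are assumptions, and the step that then computes distinguishing suffixes is left out.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_filename(paths: Iterable[PurePosixPath]) -> Dict[str, List[PurePosixPath]]:
    """Group paths that share a file name so clashes can be found."""
    groups: Dict[str, List[PurePosixPath]] = defaultdict(list)
    for path in paths:
        groups[path.name].append(path)
    return dict(groups)


example_paths = [
    PurePosixPath("/qa_checks/staging/count_rows.sql"),
    PurePosixPath("/dags/qa_checks/staging/count_rows.sql"),
    PurePosixPath("/qa_checks/staging/garbage_lat.sql"),
]
groups = group_by_filename(example_paths)
# Only names that occur more than once need disambiguating.
clashes = {name: grp for name, grp in groups.items() if len(grp) > 1}
```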
152-153: Good practice using enum validation.
Converting the stage parameter to ETLStage enum ensures type safety and validates the input value.
286-287: Additional QA check path parameters enhance flexibility.
The addition of stage-specific QA check path parameters provides good extensibility for different ETL stages.
460-484: Proper integration of staging QA checks with flux sensors.
The staging QA checks are correctly integrated into the DAG flow, executing after flux checks but before extraction, with appropriate gathering operators.
494-512: Consistent staging check integration across flux sensor types.
The staging QA check integration is properly implemented for both table flux sensors and no-flux scenarios, maintaining consistency.
535-571: Extract QA checks properly positioned in DAG workflow.
The extract QA checks are correctly placed after extraction/clustering and before constraints/indexes, following logical ETL progression.
154-186: QA Checks Path Logic Verified
- Default QA checks directories for all stages (`extract`, `staging`, `final`) exist under `flowetl/flowetl/flowetl/qa_checks/`.
- A DAG-specific override directory is present only for `final` under `flowetl/mounts/dags/qa_checks/`.
- The code’s search order (defaults → DAG file location → template paths → CDR type → additional paths) will correctly locate checks and fall back to defaults when override directories are absent.
No changes required.
def find_minimal_distinguishing_suffix(paths):
    """Find the minimal suffix needed to distinguish between paths."""
    if len(paths) == 1:
        return [(paths[0].name, 1)]  # Return (suffix, depth) tuple

    # Convert paths to lists of parts for easier manipulation (excluding root '/')
    path_parts = [list(reversed(path.parts[1:])) for path in paths]  # Skip the root '/'
    min_parts = min(len(parts) for parts in path_parts)

    distinguishing_suffixes = []

    for i, path in enumerate(paths):
        parts = path_parts[i]

        # Start with just the filename
        for depth in range(1, min_parts + 1):
            suffix_parts = parts[:depth]
            candidate_suffix = "/".join(reversed(suffix_parts))

            # Check if this suffix is unique among all paths
            is_unique = True
            for j, other_path in enumerate(paths):
                if i != j:
                    other_parts = path_parts[j]
                    if len(other_parts) >= depth:
                        other_suffix = "/".join(reversed(other_parts[:depth]))
                        if candidate_suffix == other_suffix:
                            is_unique = False
                            break

            if is_unique:
                distinguishing_suffixes.append((candidate_suffix, depth))
                break
        else:
            # If we can't find a unique suffix, use relative path from root
            full_suffix = "/".join(reversed(parts))
            distinguishing_suffixes.append((full_suffix, len(parts)))

    return distinguishing_suffixes
🛠️ Refactor suggestion
Reduce complexity and fix unused loop variables.
The function has high complexity with nested loops and could be simplified. Additionally, static analysis correctly identifies unused loop variables.
Apply this diff to fix the unused variables:
- for i, path in enumerate(paths):
+ for i, _path in enumerate(paths):
parts = path_parts[i]
# Start with just the filename
for depth in range(1, min_parts + 1):
suffix_parts = parts[:depth]
candidate_suffix = "/".join(reversed(suffix_parts))
# Check if this suffix is unique among all paths
is_unique = True
- for j, other_path in enumerate(paths):
+ for j, _other_path in enumerate(paths):
if i != j:
Consider refactoring this function to reduce nesting and improve readability, perhaps by extracting the uniqueness check into a separate helper function.
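One possible shape for that refactor is sketched below. It is an illustration under stated assumptions, not the reviewer's or the author's code: the helper names `_suffix` and `_is_unique` are hypothetical, and the sketch aims to keep the behaviour of the function under review (shortest unique trailing suffix per path, falling back to the full path when none is found within the shortest path's depth).

```python
from pathlib import PurePath, PurePosixPath
from typing import List, Sequence, Tuple


def _suffix(parts: Sequence[str], depth: int) -> str:
    """Join the last `depth` components; `parts` is stored filename-first."""
    return "/".join(reversed(parts[:depth]))


def _is_unique(index: int, depth: int, all_parts: Sequence[Sequence[str]]) -> bool:
    """True if no other path shares this path's suffix at the given depth."""
    candidate = _suffix(all_parts[index], depth)
    return all(
        _suffix(other, depth) != candidate
        for j, other in enumerate(all_parts)
        if j != index and len(other) >= depth
    )


def find_minimal_distinguishing_suffix(paths: Sequence[PurePath]) -> List[Tuple[str, int]]:
    """Return a (suffix, depth) tuple for each path."""
    if len(paths) == 1:
        return [(paths[0].name, 1)]
    # Drop the root component and reverse so index 0 is the file name.
    all_parts = [list(reversed(p.parts[1:])) for p in paths]
    min_parts = min(len(parts) for parts in all_parts)
    result: List[Tuple[str, int]] = []
    for i, parts in enumerate(all_parts):
        depth = next(
            (d for d in range(1, min_parts + 1) if _is_unique(i, d, all_parts)),
            None,
        )
        if depth is None:
            # No unique suffix found: fall back to the whole path below root.
            depth = len(parts)
        result.append((_suffix(parts, depth), depth))
    return result


suffixes = find_minimal_distinguishing_suffix(
    [
        PurePosixPath("/qa_checks/staging/count_rows.sql"),
        PurePosixPath("/qa_checks/final/count_rows.sql"),
    ]
)
```

Iterating over `all_parts` directly also removes the two unused loop variables that the linters report.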
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def find_minimal_distinguishing_suffix(paths):
    """Find the minimal suffix needed to distinguish between paths."""
    if len(paths) == 1:
        return [(paths[0].name, 1)]  # Return (suffix, depth) tuple
    # Convert paths to lists of parts for easier manipulation (excluding root '/')
    path_parts = [list(reversed(path.parts[1:])) for path in paths]  # Skip the root '/'
    min_parts = min(len(parts) for parts in path_parts)
    distinguishing_suffixes = []
    for i, _path in enumerate(paths):
        parts = path_parts[i]
        # Start with just the filename
        for depth in range(1, min_parts + 1):
            suffix_parts = parts[:depth]
            candidate_suffix = "/".join(reversed(suffix_parts))
            # Check if this suffix is unique among all paths
            is_unique = True
            for j, _other_path in enumerate(paths):
                if i != j:
                    other_parts = path_parts[j]
                    if len(other_parts) >= depth:
                        other_suffix = "/".join(reversed(other_parts[:depth]))
                        if candidate_suffix == other_suffix:
                            is_unique = False
                            break
            if is_unique:
                distinguishing_suffixes.append((candidate_suffix, depth))
                break
        else:
            # If we can't find a unique suffix, use relative path from root
            full_suffix = "/".join(reversed(parts))
            distinguishing_suffixes.append((full_suffix, len(parts)))
    return distinguishing_suffixes
🧰 Tools
🪛 Ruff (0.11.9)
48-48: Loop control variable `path` not used within loop body
Rename unused `path` to `_path`
(B007)
58-58: Loop control variable `other_path` not used within loop body
Rename unused `other_path` to `_other_path`
(B007)
🪛 Pylint (3.3.7)
[refactor] 37-37: Too many local variables (16/15)
(R0914)
[refactor] 48-73: Too many nested blocks (6/5)
(R1702)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/util.py between lines 37 and 76, the function
find_minimal_distinguishing_suffix has nested loops causing high complexity and
contains unused loop variables. To fix this, extract the uniqueness check into a
separate helper function to reduce nesting and improve readability. Also, remove
or rename any unused loop variables to satisfy static analysis. This refactoring
will simplify the logic and make the code easier to maintain.
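The refactoring described above could look roughly like the sketch below. It is one possible shape, not the change that was committed: the helper names `_suffix` and `_is_unique_at_depth` are hypothetical, and the function is assumed to receive absolute `pathlib` paths as in the suggestion.

```python
from pathlib import PurePath
from typing import List, Sequence, Tuple


def _suffix(parts: Sequence[str], depth: int) -> str:
    """Join the last `depth` path components (parts are stored filename-first)."""
    return "/".join(reversed(parts[:depth]))


def _is_unique_at_depth(
    index: int, depth: int, path_parts: Sequence[Sequence[str]]
) -> bool:
    """True if no other path shares this path's suffix of the given depth."""
    candidate = _suffix(path_parts[index], depth)
    return not any(
        len(other) >= depth and _suffix(other, depth) == candidate
        for j, other in enumerate(path_parts)
        if j != index
    )


def find_minimal_distinguishing_suffix(
    paths: Sequence[PurePath],
) -> List[Tuple[str, int]]:
    """Return a (suffix, depth) tuple for each path, in input order."""
    if len(paths) == 1:
        return [(paths[0].name, 1)]
    # Reverse the components so index 0 is the filename; skip the root '/'.
    path_parts = [list(reversed(path.parts[1:])) for path in paths]
    min_parts = min(len(parts) for parts in path_parts)
    result = []
    for i, parts in enumerate(path_parts):
        # First depth at which this path's suffix is unique, if there is one.
        depth = next(
            (
                d
                for d in range(1, min_parts + 1)
                if _is_unique_at_depth(i, d, path_parts)
            ),
            None,
        )
        if depth is None:
            # No unique suffix found: fall back to the full path below the root.
            depth = len(parts)
        result.append((_suffix(parts, depth), depth))
    return result
```

With `/qa/final/calls/count_imeis.sql` and `/qa/final/sms/count_imeis.sql` as input, the filenames collide at depth 1, so the sketch yields `calls/count_imeis.sql` and `sms/count_imeis.sql`, each at depth 2. The loop index is now used directly, so the B007 warnings no longer apply, and the deepest nesting sits inside the helper.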
Actionable comments posted: 0
🧹 Nitpick comments (1)
flowetl/tests/integration/test_qa.py (1)
16-16: Remove unnecessary f-string prefix.
The string has no placeholders, so the `f` prefix is unnecessary.
- check_sql = jinja_env_staging_qa.get_template(f"count_added_rows.sql").render(
+ check_sql = jinja_env_staging_qa.get_template("count_added_rows.sql").render(
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- flowetl/tests/integration/conftest.py (1 hunks)
- flowetl/tests/integration/test_dags_present.py (2 hunks)
- flowetl/tests/integration/test_full_pipeline.py (1 hunks)
- flowetl/tests/integration/test_qa.py (19 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- flowetl/tests/integration/test_full_pipeline.py
- flowetl/tests/integration/conftest.py
- flowetl/tests/integration/test_dags_present.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
flowetl/tests/integration/test_qa.py (2)
- flowetl/tests/integration/conftest.py (3)
  - flowdb_transaction (563-566)
  - jinja_env_staging_qa (553-559)
  - jinja_env_final_qa (543-549)
- flowmachine/flowmachine/core/query_state.py (1)
  - execute (301-312)
🪛 Ruff (0.11.9)
flowetl/tests/integration/test_qa.py
16-16: f-string without any placeholders. Remove extraneous `f` prefix (F541)
🪛 Flake8 (7.2.0)
flowetl/tests/integration/test_qa.py
[error] 16-16: f-string is missing placeholders (F541)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (2)
flowetl/tests/integration/test_qa.py (2)
9-9: Correct fixture usage for staging QA checks.
The function correctly uses the `jinja_env_staging_qa` fixture, which aligns with the staging-specific QA check reorganisation.
25-25: Systematic migration to stage-specific QA fixtures completed correctly.
All final QA tests have been properly updated to use the `jinja_env_final_qa` fixture and the new subdirectory-based template paths (`{cdr_type}/template_name.sql`). This aligns perfectly with the QA check reorganisation by ETL stage.
Also applies to: 30-32, 42-42, 49-51, 61-61, 68-68, 78-78, 86-88, 96-96, 103-103, 113-113, 121-123, 131-131, 138-140, 148-148, 156-158, 166-166, 184-186, 196-196, 214-216, 226-226, 233-233, 242-244, 251-251, 260-260, 267-269, 276-276, 283-285, 292-292, 300-302, 310-310, 318-320, 328-328, 336-336, 346-346, 354-354, 364-364, 371-373, 380-380, 392-392, 401-401, 413-413, 422-422, 430-430, 439-439, 447-449, 456-456, 464-466
Actionable comments posted: 0
🧹 Nitpick comments (1)
flowetl/tests/integration/test_qa.py (1)
9-21: Fix unnecessary f-string and verify template path consistency.
The f-string on line 16 has no placeholders and should be a regular string. Additionally, confirm that the `count_added_rows.sql` template is intentionally not organised under a CDR type subdirectory, unlike the other templates in this file.
- check_sql = jinja_env_final_qa.get_template(f"count_added_rows.sql").render(
+ check_sql = jinja_env_final_qa.get_template("count_added_rows.sql").render(
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- flowetl/tests/integration/test_qa.py (19 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
flowetl/tests/integration/test_qa.py (1)
- flowetl/tests/integration/conftest.py (2)
  - flowdb_transaction (563-566)
  - jinja_env_final_qa (543-549)
🪛 Ruff (0.11.9)
flowetl/tests/integration/test_qa.py
16-16: f-string without any placeholders. Remove extraneous `f` prefix (F541)
🪛 Flake8 (7.2.0)
flowetl/tests/integration/test_qa.py
[error] 16-16: f-string is missing placeholders (F541)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (1)
flowetl/tests/integration/test_qa.py (1)
25-471: LGTM! Systematic updates align with QA check reorganisation.
The fixture name changes and template path updates are consistent throughout all test functions, correctly adapting to the new hierarchical QA check directory structure with CDR type-specific subdirectories.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files

| Coverage Diff | master | #7067 | +/- |
|---------------|--------|--------|---------|
| Coverage | 76.31% | 88.30% | +11.98% |
| Files | 277 | 231 | -46 |
| Lines | 10778 | 9181 | -1597 |
| Branches | 697 | 563 | -134 |
| Hits | 8225 | 8107 | -118 |
| Misses | 2289 | 942 | -1347 |
| Partials | 264 | 132 | -132 |
4a18987 to 5974859
Actionable comments posted: 1
♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)
460-484: Use enum values consistently instead of string literals.
The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.
Apply this diff to use enum values consistently:
 staging_checks = get_qa_checks(
     additional_qa_check_paths=additional_qa_check_paths,
-    stage="staging",
+    stage=ETLStage.STAGING,
 )
37-76: Reduce complexity and fix unused loop variables.
The function has high complexity with nested loops and could be simplified. Additionally, static analysis correctly identifies unused loop variables.
Apply this diff to fix the unused variables:
-    for i, path in enumerate(paths):
+    for i, _path in enumerate(paths):
         parts = path_parts[i]
         # Start with just the filename
         for depth in range(1, min_parts + 1):
             suffix_parts = parts[:depth]
             candidate_suffix = "/".join(reversed(suffix_parts))
             # Check if this suffix is unique among all paths
             is_unique = True
-            for j, other_path in enumerate(paths):
+            for j, _other_path in enumerate(paths):
                 if i != j:
Consider refactoring this function to reduce nesting and improve readability, perhaps by extracting the uniqueness check into a separate helper function.
117-117: Fix undefined type annotation.
The `DAG` type is used in the type hint but not imported, causing a static analysis error.
Apply this diff to fix the type annotation:
 def get_qa_checks(
     *,
-    dag: Optional["DAG"] = None,
+    dag: Optional["airflow.DAG"] = None,
     additional_qa_check_paths: Optional[List[str]] = None,
     stage: Optional[ETLStage] = ETLStage.FINAL,
 ) -> List["QACheckOperator"]:
Alternatively, add the import at the top of the file:
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
535-537: Use enum value for extract stage.
Consistent with the staging stage issue, use the enum value instead of a string literal.
Apply this diff:
 extract_checks = get_qa_checks(
-    additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+    additional_qa_check_paths=additional_qa_check_paths, stage=ETLStage.EXTRACT
 )
🧹 Nitpick comments (1)
flowetl/tests/integration/test_qa.py (1)
16-16: Remove unnecessary f-string prefix.
The f-string doesn't contain any placeholders, so it should be a regular string.
- check_sql = jinja_env_final_qa.get_template(f"count_added_rows.sql").render(
+ check_sql = jinja_env_final_qa.get_template("count_added_rows.sql").render(
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (79)
- CHANGELOG.md (1 hunks)
- flowdb/testdata/bin/run_qa_checks.py (1 hunks)
- flowdb_synthetic_data.Dockerfile (1 hunks)
- flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
- flowdb_testdata.Dockerfile (1 hunks)
- flowdb_testdata.Dockerfile.dockerignore (1 hunks)
- flowetl/flowetl/MANIFEST.in (1 hunks)
- flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
- flowetl/flowetl/flowetl/util.py (10 hunks)
- flowetl/tests/integration/conftest.py (1 hunks)
- flowetl/tests/integration/test_dags_present.py (2 hunks)
- flowetl/tests/integration/test_full_pipeline.py (1 hunks)
- flowetl/tests/integration/test_qa.py (19 hunks)
- flowetl/tests/unit/test_qa_check_discovery.py (7 hunks)
💤 Files with no reviewable changes (2)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (1)
- flowdb_testdata.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (73)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
- flowdb_testdata.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
- flowetl/flowetl/MANIFEST.in
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
- flowdb/testdata/bin/run_qa_checks.py
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
- flowdb_synthetic_data.Dockerfile
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
- flowetl/tests/integration/test_full_pipeline.py
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
- flowdb_synthetic_data.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
- flowetl/flowetl/flowetl/operators/qa_check_operator.py
- flowetl/tests/integration/conftest.py
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
- flowetl/tests/integration/test_dags_present.py
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py
[error] 117-117: undefined name 'DAG' (F821)
flowetl/tests/integration/test_qa.py
[error] 16-16: f-string is missing placeholders (F541)
🪛 Pylint (3.3.7)
flowetl/flowetl/flowetl/util.py
[refactor] 37-37: Too many local variables (16/15) (R0914)
[refactor] 48-73: Too many nested blocks (6/5) (R1702)
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
48-48: Loop control variable `path` not used within loop body. Rename unused `path` to `_path` (B007)
58-58: Loop control variable `other_path` not used within loop body. Rename unused `other_path` to `_other_path` (B007)
117-117: Undefined name `DAG` (F821)
flowetl/tests/integration/test_qa.py
16-16: f-string without any placeholders. Remove extraneous `f` prefix (F541)
🪛 LanguageTool
CHANGELOG.md
[style] ~22-~22: Would you like to use the Oxford spelling “organized”? The spelling ‘organised’ is also correct. (OXFORD_SPELLING_Z_NOT_S)
Context: ...rovided alongside dag files must now be organised into the correct subfolder for the ETL ...
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md
11-11: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
12-12: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
13-13: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
14-14: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
15-15: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (6)
CHANGELOG.md (1)
10-22: LGTM - Well-documented feature additions.
The changelog entries clearly document the new cell info QA checks and ETL enhancements. The formatting suggestions from static analysis are minor style preferences and don't affect functionality.
flowetl/flowetl/flowetl/util.py (4)
27-34: LGTM - Well-defined enum for ETL stages.
The ETLStage enum provides clear type safety for representing different ETL stages and will help prevent string literal errors throughout the codebase.
152-185: LGTM - Comprehensive stage-specific path construction.
The logic properly constructs search paths for different ETL stages, handles CDR type-specific checks, and integrates additional paths. The stage validation ensures type safety.
286-287: LGTM - Enhanced function signature for stage-specific QA checks.
The addition of the `additional_staging_qa_check_paths` and `additional_extract_qa_check_paths` parameters properly extends the DAG creation functionality to support multi-stage QA checks.
460-575: LGTM - Well-structured multi-stage QA check integration.
The DAG construction logic properly integrates staging and extract QA checks with appropriate EmptyOperator chaining. The conditional logic ensures clean execution paths whether or not QA checks are present for each stage.
flowetl/tests/integration/test_qa.py (1)
9-470: Well-executed systematic refactoring of QA test integration.
The systematic changes to use the `jinja_env_final_qa` fixture and updated template paths with CDR type subdirectories correctly reflect the reorganization of QA check templates into stage- and type-specific subfolders. The changes maintain all existing test logic whilst adapting to the new template structure.
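To make the stage- and type-specific layout discussed in these comments concrete, here is a minimal sketch of how templates under `qa_checks/<stage>/` and `qa_checks/<stage>/<type>/` might be collected. It is not the logic in `util.py`: `discover_checks` and `demo` are hypothetical names, the `ETLStage` values are assumed to match the directory names, and treating `cell_info` as the type subdirectory is an assumption based on the file list.

```python
import tempfile
from enum import Enum
from pathlib import Path
from typing import List, Optional


class ETLStage(str, Enum):
    # Assumed values, chosen to match the qa_checks/<stage>/ directory names.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def discover_checks(
    root: Path, stage: ETLStage, cdr_type: Optional[str] = None
) -> List[Path]:
    """Hypothetical helper: stage-wide templates first, then type-specific ones."""
    stage_dir = root / ETLStage(stage).value
    search_dirs = [stage_dir]
    if cdr_type is not None:
        search_dirs.append(stage_dir / cdr_type)
    found: List[Path] = []
    for directory in search_dirs:
        if directory.is_dir():
            found.extend(sorted(directory.glob("*.sql")))
    return found


def demo() -> List[str]:
    """Build a throwaway tree mirroring the PR layout and list the staging checks."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "qa_checks"
        for rel in (
            "staging/count_staging_rows.sql",
            "staging/cell_info/garbage_lat.sql",
            "staging/cell_info/new_cell_ids.sql",
            "final/calls/count_imeis.sql",
        ):
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("SELECT 1;")
        checks = discover_checks(root, ETLStage.STAGING, "cell_info")
        return [p.relative_to(root).as_posix() for p in checks]
```

Calling `demo()` lists `staging/count_staging_rows.sql` followed by the two `staging/cell_info/` templates, and leaves the `final/` template out because it belongs to a different stage.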
    disambiguated_paths = disambiguate_paths(templates)
    dag.template_searchpath = [*dag.template_searchpath, *disambiguated_paths.keys()]
    for pathgroup in sorted(disambiguated_paths.values()):
        for tmpl in pathgroup:
            tmpl = Path(tmpl)
            print(f"Parsing template {tmpl}")
            task_id = (
                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
                if "cdr_type" in dag.params
                else f"{tmpl.stem}.{stage}"
            )
            print(f"Task id is {task_id}")
            ops.append(
                QACheckOperator(
                    task_id=task_id,
                    sql=str(tmpl),
                    dag=dag,
                )
            )
    return ops
Remove debug print statements and fix string literal usage.
Multiple issues need to be addressed in this code segment:
- Debug print statements should be removed from production code
- String literals should be replaced with enum values for consistency
Apply this diff to fix these issues:
     for pathgroup in sorted(disambiguated_paths.values()):
         for tmpl in pathgroup:
             tmpl = Path(tmpl)
-            print(f"Parsing template {tmpl}")
             task_id = (
-                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
+                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage.value}"
                 if "cdr_type" in dag.params
-                else f"{tmpl.stem}.{stage}"
+                else f"{tmpl.stem}.{stage.value}"
             )
-            print(f"Task id is {task_id}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    disambiguated_paths = disambiguate_paths(templates)
    dag.template_searchpath = [*dag.template_searchpath, *disambiguated_paths.keys()]
    for pathgroup in sorted(disambiguated_paths.values()):
        for tmpl in pathgroup:
            tmpl = Path(tmpl)
            task_id = (
                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage.value}"
                if "cdr_type" in dag.params
                else f"{tmpl.stem}.{stage.value}"
            )
            ops.append(
                QACheckOperator(
                    task_id=task_id,
                    sql=str(tmpl),
                    dag=dag,
                )
            )
    return ops
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/util.py between lines 194 and 213, remove the debug
print statements used for logging template parsing and task id generation.
Replace any string literals used for keys like 'cdr_type' and 'stage' with their
corresponding enum values to ensure consistency and avoid hardcoding strings.
This will clean up the code and align it with the project's standards.
Actionable comments posted: 0
♻️ Duplicate comments (3)
flowetl/flowetl/flowetl/util.py (3)
37-76: Address previously identified issues.
The unused loop variables and high complexity issues in this function have already been flagged in previous reviews.
199-199: Remove debug print statements.
These debug print statements have been previously flagged and should be removed from production code.
Also applies to: 205-205
201-203: Use enum value instead of string literal.
The stage parameter should use the enum's value property for string representation to maintain consistency with the type system.
Apply this diff to use the enum value:
 task_id = (
-    f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
+    f"{tmpl.stem}.{dag.params['cdr_type']}.{stage.value}"
     if "cdr_type" in dag.params
-    else f"{tmpl.stem}.{stage}"
+    else f"{tmpl.stem}.{stage.value}"
 )
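One reason to prefer `.value` here: how a `str`-mixin enum member renders inside an f-string has changed between Python releases (from 3.11 onwards, `format()` follows `str()` for such enums), so formatting the member directly can produce different task ids on different interpreters. The sketch below is a standalone illustration, with `build_task_id` as a hypothetical helper and the `ETLStage` values assumed.

```python
from enum import Enum
from pathlib import Path
from typing import Mapping, Union


class ETLStage(str, Enum):
    # Assumed member values for illustration.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def build_task_id(
    template: Path, params: Mapping[str, str], stage: Union[ETLStage, str]
) -> str:
    """Hypothetical helper mirroring the task_id expression in the diff."""
    # Normalise first, so "staging" and ETLStage.STAGING behave identically
    # and an unknown stage name raises ValueError early.
    stage = ETLStage(stage)
    if "cdr_type" in params:
        return f"{template.stem}.{params['cdr_type']}.{stage.value}"
    return f"{template.stem}.{stage.value}"
```

For example, `build_task_id(Path("count_staging_rows.sql"), {}, "staging")` gives `count_staging_rows.staging` regardless of the interpreter version, because only the plain string in `.value` is ever formatted.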
🧹 Nitpick comments (2)
flowetl/flowetl/flowetl/util.py (2)
372-372: Remove unused import.
The EmptyOperator import is not used anywhere in the code and should be removed.
Apply this diff:
-from airflow.operators.empty import EmptyOperator
567-567: Remove unused variable assignment.
The final_checks variable is assigned but never used, creating unnecessary code.
Apply this diff to remove the unused assignment:
-    final_checks = (
-        get_qa_checks(additional_qa_check_paths=additional_qa_check_paths),
-    )
+    get_qa_checks(additional_qa_check_paths=additional_qa_check_paths)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- flowetl/flowetl/flowetl/util.py (10 hunks)
- flowetl/tests/integration/test_dags_present.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- flowetl/tests/integration/test_dags_present.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
48-48: Loop control variable `path` not used within loop body. Rename unused `path` to `_path` (B007)
58-58: Loop control variable `other_path` not used within loop body. Rename unused `other_path` to `_other_path` (B007)
117-117: Undefined name `DAG` (F821)
372-372: `airflow.operators.empty.EmptyOperator` imported but unused. Remove unused import: `airflow.operators.empty.EmptyOperator` (F401)
567-567: Local variable `final_checks` is assigned to but never used. Remove assignment to unused variable `final_checks` (F841)
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py
[error] 117-117: undefined name 'DAG' (F821)
[error] 372-372: 'airflow.operators.empty.EmptyOperator' imported but unused (F401)
[error] 567-567: local variable 'final_checks' is assigned to but never used (F841)
🪛 Pylint (3.3.7)
flowetl/flowetl/flowetl/util.py
[refactor] 37-37: Too many local variables (16/15) (R0914)
[refactor] 48-73: Too many nested blocks (6/5) (R1702)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (5)
flowetl/flowetl/flowetl/util.py (5)
4-4: Well-implemented enum and import addition.
The defaultdict import is properly used in the new disambiguation logic, and the ETLStage enum follows best practices by inheriting from both str and Enum for type safety and string conversion.
Also applies to: 27-35
78-113: Good implementation of path disambiguation logic.
The function effectively groups and disambiguates paths using a sound algorithmic approach. The use of defaultdict is appropriate for the grouping operations.
119-119: Good addition of stage parameter.
The stage parameter with a sensible default of ETLStage.FINAL enhances the function's flexibility for different ETL stages.
286-287: Well-designed parameter additions.
The new parameters for stage-specific QA check paths follow the established pattern and provide good flexibility for different ETL stages.
Also applies to: 361-362
461-467: Excellent integration of stage-specific QA checks.
The implementation correctly incorporates staging and extract QA checks at the appropriate points in the ETL pipeline flow. The use of TaskGroup provides good organization and the conditional logic properly handles cases where no checks are present.
Also applies to: 478-501, 524-530, 538-562
63bad7b to 1befbaf
1befbaf to 46a66da
Actionable comments posted: 2
♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)
117-117: Fix undefined type annotation.
The `DAG` type is used in the type hint but not imported, causing a static analysis error.
Apply this diff to fix the type annotation:
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
Or import it directly if it's always available in the runtime context.
37-76: Refactor complex function to improve maintainability.
This function has high complexity with excessive nesting and unused loop variables, making it difficult to maintain and understand.
Apply this diff to fix the unused variables:
-    for i, path in enumerate(paths):
+    for i, _path in enumerate(paths):
         parts = path_parts[i]
         # Start with just the filename
         for depth in range(1, min_parts + 1):
             suffix_parts = parts[:depth]
             candidate_suffix = "/".join(reversed(suffix_parts))
             # Check if this suffix is unique among all paths
             is_unique = True
-            for j, other_path in enumerate(paths):
+            for j, _other_path in enumerate(paths):
                 if i != j:
Consider extracting the uniqueness check logic into a separate helper function to reduce nesting.
199-205: Remove debug print statements.
These debug print statements should not be included in production code as they will clutter the logs.
Apply this diff to remove the debug statements:
-    print(f"Parsing template {tmpl}")
     task_id = (
         f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
         if "cdr_type" in dag.params
         else f"{tmpl.stem}.{stage}"
     )
-    print(f"Task id is {task_id}")
201-203: Use enum values consistently in string formatting.
The stage parameter should use `.value` when converted to string to ensure consistent behaviour with enum types.
Apply this diff to use enum values properly:
 task_id = (
-    f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
+    f"{tmpl.stem}.{dag.params['cdr_type']}.{stage.value}"
     if "cdr_type" in dag.params
-    else f"{tmpl.stem}.{stage}"
+    else f"{tmpl.stem}.{stage.value}"
 )
🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)
372-372: Remove unused import.
The `EmptyOperator` import is not used anywhere in the code.
Apply this diff to remove the unused import:
- from airflow.operators.empty import EmptyOperator
  from airflow.utils.task_group import TaskGroup
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (79)
- CHANGELOG.md (1 hunks)
- flowdb/testdata/bin/run_qa_checks.py (1 hunks)
- flowdb_synthetic_data.Dockerfile (1 hunks)
- flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
- flowdb_testdata.Dockerfile (1 hunks)
- flowdb_testdata.Dockerfile.dockerignore (1 hunks)
- flowetl/flowetl/MANIFEST.in (1 hunks)
- flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
- flowetl/flowetl/flowetl/util.py (10 hunks)
- flowetl/tests/integration/conftest.py (1 hunks)
- flowetl/tests/integration/test_dags_present.py (2 hunks)
- flowetl/tests/integration/test_full_pipeline.py (1 hunks)
- flowetl/tests/integration/test_qa.py (19 hunks)
- flowetl/tests/unit/test_qa_check_discovery.py (7 hunks)
💤 Files with no reviewable changes (2)
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
✅ Files skipped from review due to trivial changes (3)
- flowdb_testdata.Dockerfile
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
🚧 Files skipped from review as they are similar to previous changes (71)
- flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
- flowetl/tests/integration/test_full_pipeline.py
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
- flowetl/flowetl/MANIFEST.in
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
- flowdb_synthetic_data.Dockerfile
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
- flowdb_testdata.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
- flowdb_synthetic_data.Dockerfile.dockerignore
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
- flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
- flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
- flowetl/flowetl/flowetl/operators/qa_check_operator.py
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
- flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
- flowdb/testdata/bin/run_qa_checks.py
- flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
- flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
- flowetl/tests/integration/conftest.py
- flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
- flowetl/tests/integration/test_dags_present.py
- flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Pylint (3.3.7)
flowetl/flowetl/flowetl/util.py
[refactor] 37-37: Too many local variables (16/15)
(R0914)
[refactor] 48-73: Too many nested blocks (6/5)
(R1702)
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py
[error] 117-117: undefined name 'DAG'
(F821)
[error] 372-372: 'airflow.operators.empty.EmptyOperator' imported but unused
(F401)
[error] 567-567: local variable 'final_checks' is assigned to but never used
(F841)
flowetl/tests/integration/test_qa.py
[error] 16-16: f-string is missing placeholders
(F541)
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py
48-48: Loop control variable `path` not used within loop body. Rename unused `path` to `_path` (B007)
58-58: Loop control variable `other_path` not used within loop body. Rename unused `other_path` to `_other_path` (B007)
117-117: Undefined name `DAG` (F821)
372-372: `airflow.operators.empty.EmptyOperator` imported but unused. Remove unused import: `airflow.operators.empty.EmptyOperator` (F401)
567-567: Local variable `final_checks` is assigned to but never used. Remove assignment to unused variable `final_checks` (F841)
flowetl/tests/integration/test_qa.py
16-16: f-string without any placeholders. Remove extraneous `f` prefix (F541)
🪛 LanguageTool
CHANGELOG.md
[style] ~22-~22: Would you like to use the Oxford spelling “organized”? The spelling ‘organised’ is also correct.
Context: ...rovided alongside dag files must now be organised into the correct subfolder for the ETL ...
(OXFORD_SPELLING_Z_NOT_S)
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md
11-11: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
12-12: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
13-13: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (4)
CHANGELOG.md (1)
10-22: Well-documented feature additions.
The changelog entries comprehensively document the new QA check functionality and associated changes. The structure clearly separates additions from changes, making it easy for users to understand the impact.
flowetl/flowetl/flowetl/util.py (1)
461-501: Well-structured conditional task chaining.
The logic for handling different flux sensor types while integrating staging QA checks is well-implemented. The conditional chaining ensures proper task dependencies are maintained across different scenarios.
flowetl/tests/integration/test_qa.py (2)
9-9: LGTM! Fixture parameter updates are consistent.
The systematic update from `jinja_env` to `jinja_env_final_qa` aligns with the reorganization of QA checks into stage-specific subdirectories.
Also applies to: 25-25, 42-42, 61-61, 78-78, 96-96, 113-113, 131-131, 148-148, 166-166, 196-196, 226-226, 242-244, 260-260, 276-276, 292-292, 310-310, 328-328, 346-346, 364-364, 380-380, 401-401, 422-422, 439-439, 456-456
30-32: LGTM! Template path updates follow the new subdirectory pattern.
The systematic change to use `f"{cdr_type}/template_name.sql"` paths aligns with the reorganization of QA checks into CDR type-specific subdirectories.
Also applies to: 49-51, 68-68, 86-88, 103-103, 121-123, 138-140, 156-158, 184-186, 214-216, 233-233, 251-251, 267-269, 283-285, 300-302, 318-320, 336-338, 354-356, 371-373, 392-394, 413-415, 430-430, 447-449, 464-466
final_checks = (
    get_qa_checks(additional_qa_check_paths=additional_qa_check_paths),
)
Fix unused variable assignment.
The `final_checks` variable is assigned but never used, and there's an unnecessary trailing comma in the tuple assignment.
Apply this diff to fix the assignment:
- with TaskGroup(
-     group_id="final_qa_checks", prefix_group_id=False
- ) as final_qa_checks_group:
-     final_checks = (
-         get_qa_checks(additional_qa_check_paths=additional_qa_check_paths),
-     )
+ with TaskGroup(
+     group_id="final_qa_checks", prefix_group_id=False
+ ) as final_qa_checks_group:
+     get_qa_checks(additional_qa_check_paths=additional_qa_check_paths)
🧰 Tools
🪛 Flake8 (7.2.0)
[error] 567-567: local variable 'final_checks' is assigned to but never used
(F841)
🪛 Ruff (0.11.9)
567-567: Local variable `final_checks` is assigned to but never used. Remove assignment to unused variable `final_checks` (F841)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/util.py around lines 567 to 569, the variable
final_checks is assigned a tuple with a single element but is never used, and
the trailing comma makes it a tuple unnecessarily. Remove the assignment to
final_checks entirely or replace it with a direct call to get_qa_checks without
assigning it to a variable, eliminating the trailing comma to fix the unused
variable and tuple issue.
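The tuple issue mentioned above can be seen in isolation in the following sketch. `get_checks` is a hypothetical stand-in; what `get_qa_checks` actually returns is not shown in this review.

```python
def get_checks() -> list:
    """Hypothetical stand-in for a function that returns a list of checks."""
    return ["count_imeis", "count_imsis"]


wrapped = (
    get_checks(),  # trailing comma: this builds a one-element tuple
)
plain = (
    get_checks()  # no comma: the parentheses only group the expression
)
```

Here `wrapped` is a tuple containing the list, while `plain` is the list itself, so any later code iterating over `wrapped` would see a single item.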
  flowdb_transaction.execute(create_sql)
  flowdb_transaction.execute(insert_sql)
- check_sql = jinja_env.get_template("count_added_rows.sql").render(
+ check_sql = jinja_env_final_qa.get_template(f"count_added_rows.sql").render(
Fix inconsistent template path pattern.
This template path doesn't follow the `{cdr_type}/` subdirectory pattern used by all other tests, which will likely cause template resolution failures.
Apply this diff to fix the inconsistency:
- check_sql = jinja_env_final_qa.get_template(f"count_added_rows.sql").render(
+ check_sql = jinja_env_final_qa.get_template(f"{cdr_type}/count_added_rows.sql").render(
🧰 Tools
🪛 Ruff (0.11.9)
16-16: f-string without any placeholders. Remove extraneous `f` prefix (F541)
🪛 Flake8 (7.2.0)
[error] 16-16: f-string is missing placeholders
(F541)
🤖 Prompt for AI Agents
In flowetl/tests/integration/test_qa.py at line 16, the template path used in
get_template does not include the required {cdr_type}/ subdirectory prefix,
causing inconsistency and potential template resolution failures. Update the
template path to include the {cdr_type}/ prefix before "count_added_rows.sql" to
match the pattern used in other tests.
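A small sketch of the two points raised here, using a plain dict as a stand-in for the template loader. The real test uses a Jinja environment; the dict contents and the `lookup` helper are hypothetical, and whether `count_added_rows.sql` actually lives in a per-type subdirectory is not confirmed by this review.

```python
# Hypothetical stand-in for templates stored in per-CDR-type subdirectories.
templates = {
    "calls/count_added_rows.sql": "SELECT 1",
}


def lookup(name: str) -> str:
    """Return the template source, or raise KeyError if the name is unknown."""
    return templates[name]


cdr_type = "calls"
bare_name = f"count_added_rows.sql"  # F541: the f prefix has no effect here
prefixed_name = f"{cdr_type}/count_added_rows.sql"

found = prefixed_name in templates
missing = bare_name not in templates
```

With this layout only the prefixed name resolves; the bare name is identical to the plain string literal and would not be found.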
Closes #5868, closes #5869
Description
Adds some of our cell info QA checks and slightly reorganises the existing checks to allow that (doesn't address #6497) by moving CDR type checks to a subdirectory and symlinking to them from the type-specific subdirectories.