@greenape (Member) commented Jun 20, 2025

Closes #5868, closes #5869

I have:

  • Formatted any Python files with black
  • Brought the branch up to date with master
  • Added any relevant GitHub labels
  • Added tests for any new additions
  • Added or updated any relevant documentation
  • Added an Architectural Decision Record (ADR), if appropriate
  • Added an MPLv2 License Header if appropriate
  • Updated the Changelog

Description

Adds some of our cell info QA checks and slightly reorganises the existing checks to allow that (this doesn't address #6497). The CDR type checks move to a subdirectory and are symlinked from the type-specific subdirectories.


Summary by CodeRabbit

  • New Features

    • Added QA checks for missing or invalid latitude/longitude, duplicate and new cell IDs in staging.
    • Introduced a general QA check to count rows in staging tables.
    • Added duplicate record detection and MSISDN counting for calls, SMS, MDS, and topups datasets.
    • Enabled specifying additional QA checks for staging and extract stages in the pipeline.
  • Refactor

    • Organised QA checks by ETL stage (extract, staging, final) with updated task IDs including data type and stage suffixes.
    • Simplified topups duplicate counting queries by removing conditional logic.
  • Tests

    • Parameterised tests to cover multiple QA check stages and updated integration tests to use stage-specific templates.
    • Added tests for path disambiguation logic in QA check discovery.
  • Chores

    • Expanded packaging and Docker build contexts to include all QA check files.
    • Updated Dockerfiles to reference the final QA checks directory.

@greenape requested a review from jc-harrison June 20, 2025 11:36
@greenape added the enhancement and FlowETL labels Jun 20, 2025
coderabbitai bot (Contributor) commented Jun 20, 2025

## Walkthrough

This update introduces new quality assurance (QA) checks for the FlowETL pipeline, specifically targeting cell information in the staging phase. It adds checks for missing or invalid latitude and longitude, new and duplicate cell IDs, and a general count of staging table rows. The QA check discovery and DAG creation logic are extended to support stage-specific QA checks.
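As a rough illustration of what a "missing or invalid latitude/longitude" check might count, here is a minimal Python sketch that runs a query against an in-memory SQLite table. Everything in it is an assumption made for illustration: the table name `cell_info_staging`, the column names `lat` and `lon`, and the validity bounds are hypothetical, and the actual checks in this PR are SQL files whose definition of "garbage" may differ.

```python
import sqlite3

# Hypothetical query: the table name, column names and bounds are assumptions
# for illustration, not the contents of the SQL files added in this PR.
GARBAGE_LAT_OR_LON = """
SELECT COUNT(*)
FROM cell_info_staging
WHERE lat IS NULL
   OR lon IS NULL
   OR lat NOT BETWEEN -90 AND 90
   OR lon NOT BETWEEN -180 AND 180
"""


def count_garbage_lat_or_lon(conn: sqlite3.Connection) -> int:
    """Count rows where at least one coordinate is missing or out of range."""
    return conn.execute(GARBAGE_LAT_OR_LON).fetchone()[0]


def make_demo_connection() -> sqlite3.Connection:
    """Build a small in-memory staging table with a mix of good and bad rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cell_info_staging (cell_id TEXT, lat REAL, lon REAL)"
    )
    conn.executemany(
        "INSERT INTO cell_info_staging VALUES (?, ?, ?)",
        [
            ("a", 27.7, 85.3),   # both coordinates usable
            ("b", None, 85.3),   # missing latitude
            ("c", 27.7, 181.0),  # longitude out of range
            ("d", 95.0, None),   # both unusable, still counted once
        ],
    )
    return conn


if __name__ == "__main__":
    print(count_garbage_lat_or_lon(make_demo_connection()))
```

A check of this shape returns a single number per staging table, which lets the result be stored and compared across runs in the same way as a plain row count.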

## Changes

| File(s)                                                                                           | Change Summary                                                                                                                           |
|-------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql,<br>garbage_lon.sql,<br>garbage_lat_or_lon.sql | Added SQL QA checks for missing/invalid latitude, longitude, or either coordinate in cell info staging tables.                           |
| flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql,<br>count_duped_ids.sql      | Added SQL QA checks for new cell IDs and duplicate cell IDs in cell info staging tables.                                                 |
| flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql                                  | Added SQL QA check to count all rows in a staging table.                                                                                 |
| flowetl/flowetl/flowetl/util.py                                                                   | Introduced ETLStage enum, updated QA check discovery and DAG creation to support stage-specific QA checks and new parameters.            |
| flowetl/tests/unit/test_qa_check_discovery.py                                                     | Updated and parameterised tests to support and validate multi-stage QA check discovery.                                                  |
| flowetl/flowetl/MANIFEST.in                                                                       | Updated to include deeper directory levels for packaging QA check SQL files.                                                             |
| flowetl/flowetl/flowetl/qa_checks/final/*,<br>flowetl/flowetl/flowetl/qa_checks/staging/*         | Added or refactored SQL QA checks for various CDR types and stages, including new checks for duplicates, counts, and references.         |
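To make the stage-specific discovery idea concrete, the following is a minimal Python sketch of grouping check files by stage directory. It is not the code in `util.py`: the enum members are assumed from the stage names mentioned in this thread (extract, staging, final), and `discover_checks_by_stage` and `demo` are hypothetical helpers written only for this example.

```python
import tempfile
from enum import Enum
from pathlib import Path
from typing import Dict, List


class ETLStage(str, Enum):
    # Member names and values are assumptions based on the directory names
    # in the table above; the real enum in util.py may differ.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def discover_checks_by_stage(root: Path) -> Dict[ETLStage, List[str]]:
    """Hypothetical helper: map each stage to the .sql files under root/<stage>."""
    found: Dict[ETLStage, List[str]] = {}
    for stage in ETLStage:
        stage_dir = root / stage.value
        if stage_dir.is_dir():
            # Paths are kept relative to the stage directory, so a check in a
            # type subdirectory keeps that subdirectory as a prefix.
            found[stage] = sorted(
                p.relative_to(stage_dir).as_posix()
                for p in stage_dir.rglob("*.sql")
            )
        else:
            found[stage] = []
    return found


def demo() -> Dict[ETLStage, List[str]]:
    """Build a throwaway directory tree and run discovery over it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "staging" / "cell_info").mkdir(parents=True)
        (root / "staging" / "cell_info" / "garbage_lat.sql").write_text("SELECT 1")
        (root / "staging" / "count_staging_rows.sql").write_text("SELECT 1")
        return discover_checks_by_stage(root)


if __name__ == "__main__":
    for stage, checks in demo().items():
        print(stage.value, checks)
```

Keeping the stage as the top-level directory means a general check such as `count_staging_rows.sql` and a type-specific check such as `cell_info/garbage_lat.sql` can live side by side and still be told apart by their relative paths.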

## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant DAG
    participant Sensor
    participant QA_Staging
    participant Extract
    participant QA_Extract
    participant Gather
    participant Attach
    participant QA_Final

    Sensor->>QA_Staging: Run staging QA checks
    QA_Staging->>Extract: Proceed if passed
    Extract->>QA_Extract: Run extract QA checks
    QA_Extract->>Gather: Proceed if passed
    Gather->>Attach: Continue DAG
    Attach->>QA_Final: Run final QA checks
```

## Assessment against linked issues

| Objective | Addressed | Explanation |
|-----------|-----------|-------------|
| Add a QA check for cell info rows with garbage lat or lon (at least one unusable coordinate) (#5868) | | |
| Add a QA check to count the number of rows in the cell info staging table (#5869) | | |

## Assessment against linked issues: Out-of-scope changes

| Code Change | Explanation |
|-------------|-------------|
| Addition of duplicate and count QA checks for various CDR types in final stage (flowetl/flowetl/flowetl/qa_checks/final/*) | These changes extend QA coverage for CDR types beyond cell info, which is not specified in the linked issues. |
| Refactoring of DAG and QA check discovery logic for stage-specific checks (flowetl/flowetl/flowetl/util.py) | These structural changes are broader than the specific objectives of the linked issues. |
| Updates to packaging manifest (flowetl/flowetl/MANIFEST.in) | Packaging changes are not directly related to the cell info QA check objectives. |
| Test refactoring for multi-stage QA (flowetl/tests/unit/test_qa_check_discovery.py) | Test updates for multi-stage support are outside the scope of the cell info QA check objectives. |

## Poem

🐇
In the warren of data, we hop with delight,
Counting cell rows by day and by night.
No more lost lats or lons in the field,
With QA so thorough, no secrets concealed!
Hop, skip, and check—our ETL’s bright!
🥕✨


<!-- walkthrough_end -->
<!-- internal state start -->


<!-- DwQgtGAEAqAWCWBnSTIEMB26CuAXA9mAOYCmGJATmriQCaQDG+Ats2bgFyQAOFk+AIwBWJBrngA3EsgEBPRvlqU0AgfFwA6NPEgQAfACgjoCEYDEZyAAUASpETZWaCrKNwSPbABsvkCiQBHbGlcSHFcLzpIACIAQVp6Znx/RhIfFAwAM3xGWFEAa0QAGmj7bAFmdRp6OUhsREpIIn8yNG4PNGRbSAxHAUaAdgAGADYBovR4ZmQCSDJYTAYPXDzIILQvdXlOhyoMJcgACgBFWIBKXILkfy9qKNml9Pgs5OZqeHwsAHd1BCwVjwAMS8+C+AGl1DwKPgRGINDAEMhsNxaHdGCCGsgkA5pJAzABWAAcI0J6Aw9AJxIAnPD3LlMKQsfsvNglOgEs8iOzaOoPhgNqkni9IKdLgxCmT6IhNkRYBF5P5kkRMPAAF6csKrEgADyQ4gwXIYeXFMxy/VwND4aAYTFYijRAIa3N5n0Q8IAyu0GPBMvAGBsvLIJgC/CQlSrVe9PhkJPgvFJkEkJBqAMIByAAERIuG0vhsomS9EOKYzNguuFk7TFEueszQkCUPP91TKAh5/jEyW25MY/neBvssmYAjjfsgmwwEtmjo8RqukEy0OYmo8iC9PrHFarDjb8A7BAo8GktMRocQuAo2DE2EPA9o+FxGHwoTQCX8iCxH+CeJGABYqQMJ6zp8F7wAIeDJJACzINkFBvBaURWLIKzRr6kRIogGoAEK3OKEzIqiLYhgIexGpAPwrCu453OekBvOelATG03CbFENwkBImChAA4uoAAS5TUf0XjFJKdQonc9AhrwMKiKERoMmk+BEPCfGghxjFhCEon3gwjjsFGWAETREyxBQRrqHJN4Clm3qYdG+ZMBQtDIIcsSlmcTE9psSwYE6eSvpQyBfJQHhPi+CRRM8mqoLYQGeOk/hBCEjCYJA/ShsmJAhfQ0VKDm8C+NQUEWtwiAcAA9BVRC/OUGi2hVwKgpU5KUI1ILguoFXcN4XgVcMYwaEY+jGOAUBkPQ+CZDgBDEGQygtrabAYJwUL8MIcmSLitRMEoVCqOoWg6CNJhQHAqCoGlaAQXN5BUItLDLatVBfGUTguOl8i7coB2aNouhgIYo2mAYKZ8bEABy3EAKIADIAPLcRozC0BwBjRBjBgWJAsQAJK3QtUQOO98hTfSBrSG4p7GTQ3LIA0UhUL45CvY8vjPLB8F8iKsTVqaVFNV80PQLDPDwO0E4kBMzwMCyPIDnOJoLpBlQfhqtzhKyUt0di6ufDVuBaxMquYQOJC/I0GvqFr/B8CCBrW0oExMNgK0aizgq+LjGaiZg9Au27d7Ij5aLe26kC4y+Ik5LW0K0FeuL1h7pB3QKoqK/kmrFQHuAzKsvTDo0ZPQl8TJUeeaA1QOOYCJE8UAAYMH2NAAPqokQ9cLq7Yjc9FgvC6L0HpSQZBzNqNCtfQFGwGEXw5Pg3DiJ8ArOEQ+kraJ9evjyS/8l4LcV1XRAtwEaAtxnLfcNQsCIJ3fuQFvHK7xsLc6he1q4CfZ8X1fKy30xPhQQajXKIDczZuZkzflQHmfMwg5EvFgSu2g/KhBDIfDU98oEfzCCodCtJOiZzDsrPgKdKBjhLHYdOxoJQLCkMPUeSQeS+nuDHZkNsQwULCJWDoPY0E5lICuBUrs6itT4IcEgGgVITAAOQ5xblMc2iBpHD1tLiRu+BXafwUUgDQwAGC0AoC3LcJA9C6MPiY+uZx4QpmSO+bgnx5ZEEDMGVY9djEtymifYILgPGGIzp3ZhXhcpYEFhmLCPQcj21IHwGWctcQcNLFwqsnR0BlEyL6bU8J4g7z5AGIMMCM7IBksmRs6AomYTZBmWI3EFyFVxMweooQnyvQyuGDAaooorRyCGJy+5WzZCCY0Xp0h7Hkg1NOVYA97D8OWHkbY3AWLyAIMGSsfo8kP19HveuQ0jBY0sLELwlpDL8xDEoWWzhjn8GmjqexFAWyQR6rXMc7BeSUwM
FAbJ9w578EXrkoqFA15PROasfuIsH5NxIHcNulcAnd13lwR+OTl773QQaL+59qGX2vrfMSiKXR71fuPKgYh0U/2xds9GmN3kg0yB1bMfVaXqQiBVRlIVmV4EKhobgsg0YY2iHsnG+NSH3SJo4N4H0yYKQpogIwsQsD1wHu6GZncyCOHIik2OigE5T1+HREghcKDIAVQADWgDYWIKZoD1wmPXd00Bqm4yhta3FgJHWxFhp3WY/heDSHYJAKZ5iYKQSoVcWk3yPaZDhXyTemzaAtxalMF+PJzycmwEgP4x8HDpPgNqO+PZ67JrQMOeAa8oW/xvs6kKKRt4sIUMwHqtME1vF8K7eAyU0kZNxLBWpkQeDYrgQ2JARa1Cltpu6Y4osaD1o1h4NCrQ2DBTyFgBphzxa9rnfTBYKQ0FFrCrunZdJ66kE/qfDFVxYX7F3uqpEEkiI5GtEsReqSPYL2fr4eu5jO5XyoGwS0RwlCZGuqu6uOR66uohu6yx8JI4RNZq6C8V5c7TMriQMAIDvS+gYPYSF5kZ7lqDXwEN4pIDjsnfqliNFProAWRNcZqxzHoGQPWHc7Y5JdgHQBoDuAJhVJqXO8c+BwGfB49UrS060QNGcGRfDXl6CYG2E/P5BTqF9r/tB6aIZeNQRSfWRuBijHcK/Rc39mljFofXJh1srHOyHkTtW6O1H2iT3ipGy93MmA+Dkkx9I2zEABC8AEupMElwridDJwdiBh0luwJRgEe5VM3wHWgWM8ApIEIjhmBQPgkDRvwre+JqxePSJmORmd2GpN4evugG0hZOTOLEv4Q2FA/KpM2LRMm9dTgpmofDdo91kid2eBXfYuJp44MQIQn2GQnK3P7FyPhKGxKvr+YGKienDHGIpQYXG/xXEQqhe3C9PdhOzxyC+35yK+0/uzEFcioVuR0ARdvfFL9UXH1PWSv+eb6B4rfYS9+JKPuYvwwEyC6GfSyAwYpy7RHM74eIeXfhGCexYLEMhxkZ4vTiCkIGeKulzy3i5F8FJNNa3+FpXJULHhv27stG6IwAB1X40VNOiaYCgy8x2sAghqgwZ2AZ+ZHuzKS6hOKxshg/cqq7tPGgNFQaBu1DqnV3ZSENygLY0CZD/SGWl2BtTYb8pBQpYl+iwVmR4euqPcCeoIRML4VAaPBNSfXaABDuLQmRJ3fkbAfvRDeyLq40RtkR2mm92BOo9QuJINsFIClnhRHNCFUeOuWT64aIbq0+arc24m4gAA3PwAEFAfgNCjwuVPBvEBG8+OQMQyBrPygHZbolH8tvuimIVZw9X/Q+GCrqyXKHO5y6b9DU15rLWdyrR4NXdyoia+1647PtsflvvBSyBiFAc/5Dtw79oTvdOu4m+7jR3Ave7t91bgPJog/qbHgD+Soux6R8EegWPCx481BIGb50Ct4NUFrExj2M8EoNqNIAXs+HkMXkgNrCnqCFCIJiPC5IOvuLjlTLOs8GnLzMbvPo0BLtQDmEaFvqGG8NFPfM4B4PbsxJ0s7ofvkMfp7j0Ofg/NEJsi/EDoHs6vUHRh4Jxt4KEGBm6h6ujhIgiL6rUnvLAuQQ9nJvXqCBzpCsuDmLnugFro0DTBqB2IWIAfQCCEJlgGHDsvDIzAGGXlKhjm0h0spvOLUAGjMmJDLMkLNrTACMuLWN0oVqJjqKIByqhB1NVk5I4vVmNoWsWqOhAtNLDmJhRrTPDnWClmlhNhlllm1tGjstjAckcrvMCjwaILcPdNGlcmPLcvcnwI8j5HMG7OIG8h8pFPQB7KqsuOch+A/IqjMocAThMNDAXGcJPrqmwAauVC0WPhalajaortxI6txM6gIRBkIdFPXKyvSiynSsyoseyuIF4FyrIBSjUaUhGlGq6BssAfGs8ImvvMmvqGvOmpyAfNgNmtqIcPhj0biiESOjFq3Pho8dilwG1rgAANpWDXwAC6FwgMg6YgAJ18EwvxfxBOQJQJg28qaxDKKxKJTKfUHKmx3KOxkAAAqvlvQK
5lzg/MelfoUIiRsqicseidSWyhiRsVsZ6vegkM+tlCvkpjTiZhQAiuYlwL1m+n8a0ShkCZAAALz+oixKooYaDgaQZiRmFRRIY86bg5AOALLJChDmLmagKWaRHw73yvHRaGSFFTrRHU7YpDRQD4mERRBElXqNzNwkDQodwZCUk0nIm0lLGYmMlJYsm4DhpsnLaXack3aGpPbQ4Er+7sHihYp/x8kXZ7x/Gwx6iwkXjwlimQAQyfAkDfYPzPZ/aX7Rn5Cxk3zxkCnJnnipkUDpnilZnkA4nWmSRdxubRgOmQqtyHaukLFUkekenenYkDqxI2w94iRN6klFli66ohgewD6kBGbXZ/rD6zC2r2oTHK73wmpmojF3wvgLLQi8DwBoj2IAGums41JKGZxKBOZKD7BHiiSUGO5dm0H0Gn7IDe5RD1x+5I5ooTlB64rRCFnfyi5B6ybSE0Fu4e6n6ME+7MGsH7y/mg58BwXCFWEmhDR8rDQ0pUk0DniIAVStq4AVQ4UnpAUFBtxIBMCMyyBbG8qYzpFCrzQipShirOCkzTQKkyoGB0jEVlCWQI6RHJqUWUDyDE7IBvwTS1pqnFGagtARb6ho6RGBpcBfmVycjRATAAUt5iDqViQsEYFeDRAniyUNA060xNjPy2bBT3aarxxLBSQ5BvDcCoWZykY9rzrxLuHmylHfEPxRmkUmi5kLH6Vkk4pSHkAMQ1DyCcLGJiRUH5oaBvadyQTbJwVD53GdonLpZhx24IC9rN734hWDZMbl7FTmW5IfTTwaKanpU5rHiiG2wlr6X2B1YW4TmdxlXLwVUpL+BJh0AHquLEVtxf5cYhUtzZCuy0CeopSiUKAYCMwthuGpIhmWgdJSQpSWT5EDj4CMxU4yUjwwKBoTBXxqzVyuKfqv6Ap+o3I+TqCrbLnjn+XkmxUfjq50bFQhiNY3jkD0CRELwLTJCJjUBkQhg3JyRRAdX8gfTdqQpkTmLwiAiQQhiaX36lDmJl6g1iDg1+gWXyCoDkYVg7IHK+AQG4HaRZyhA+rJgaLShfSOn0DxysDyCREbrL47Dq5RA5xMbKhDaoKrBxg/ZtXgk40q4eCk72V1BOhI1wWo3lAN7sb3y2jfodDc0oJHEEptX4KJHeyMYNB3IFHNJFGU4S4aCpUdo5rxRFYlbia0ySa4YJY62vWHFT7iQ2ni3XV+i3W41sJshS36Uy27j7hdiE0Rl5JMQ9BsnLVkKqj3ApRuIhAlnkWRahHvF8iT46a1EDqMwQ5UT9C0IfB8BkwS6GmjpOkg51AbFbDNlc7BghAajqDIDNAn4ahkz8bw47SPTRhK0rTIFsaWXkS6oY0tjF3vFRBVUNr04GBZnkwY7O1vA+2eUKApDKlYaF2rDEUyDR4OLshtCXEriKE5ALDki9orriAsQeCKUzKiTqFmzahY5EzZhMY1bOR1b5JkEcinVhQR1VbD3GnEU7LmD7KHILQFETI5HnL5GHGQK30amj2lHlDlEvJVGcVnQpR2ncyYRED8hNai0EkIoAZaTnhDWAZ8GjXjXkiHDPGABJhA/AAALcp/3ir5BcrGbZiHjR2HDRCnpo2D10A6V/EcPS0TBwUhV/GGXRAgkTAcNW46UADeAAvp5EcCpUfDpX5WeiaKIxoOI2cCCQ2F/gQ5/LwYcqQxouQ1wzMhMDw7QM8fMR6evfhe0oRYNUWYnUJS4Iye8jAKgwcYgrUXg/o4NeWonVFmEZ8F8X/JY7fWDdYxSd2TSfYwRURfHS44JdtcJR4xhdSmAEYL2VSQALKQy4yAjQx2oaDPC0X8r0UExMVvTipsXT1vJ0hvDtKZCoN1LXou1NmzBDlshXQh2+C/yWhYDdpXziiqUDgboegWZrI+D5KQgG09NdpBYCbNi2kaJ8By0fSRA4714jxOUTW4GuIemC2bP5LiVjIf1QgcQfD1DTbr7cyDOUAtYrDZxpP+D0DIgDorCyXbNpC7MkDcAnioAKnpTQiBTPP0ZMBV
gt3LOLNO7STWj5CLYXGHjgS7xpGAOZEgML1nJ5GXJQPFGwOeBPJYaIN3lGB1kkCUr8pZM5M9n0s0kTksr6UVQjl4VyLxwsRrLVAJX+YVMCp4zVNNnEx1OFEcWyrh2vSuXJRdVMYZ0PCmO83LDPgCgFz9AF3TScshwtiaHORlzMbTO2lNU1x1z1UyvyCN3IjXCggyDyAn1rrAQsjMAtaHDTBIC0AYATA2niBsDSxut+vmzBjWjQmCaGTyK0DGyYSIAevnyKuUDfrcb8B4BED4CchevWS7wTDkD+nJDb4/L/V+KKDaw5w+LfQXD3ypYvLMIN2QXIB50DhJDbqH38DkD8A2g3j+AjbQZKtYCc1QRwHNPyAOBkRavctRCWtlRP7nhAtFLQglKJzNUGi9qIA97OBnh8Ghg+rp6709KKuFGjurP0C6v0ATsyCdCj27ariGv+xxiOB+T/0CtAMQPPM4u5EXJZGFE3IwOTRwMksVHhDkueNT1fuz4/vEsIOVF3ki1gUqyKAbh0AF49KH0CLs5RtIZV6RCrZkxJxsnSveLyBzroVUoQDZMGC5PulUlMtwWssC6suKttzBxjtuh+ZeD8tVPCrCssUSrsXIdvK8wex4fCVuUoBMgXhap2Xk2pReB6Qzp5zKs5jMx9BFyauMerOhgBH6v2DXviECgmsiF0jmtNC1t+A2tUb2tn1ZZ3uuRuvRuesNh3A+vaxTCYQBvwBBt84rNhupaRvusYCxuaLxvOCJtVUptpsNgZt8hZvZhzwUB5t/X9aFtOwKCaKltFvls9h9uwADvyZhAKJiXWgzytrtq2hqBYPcxUHODhwI0kLGdjaNuzJpTZltt6QUCdtLDSzyQsANoFZhRKcavhdcurNMZyfJfd0tS3PZnxRVWNCGfExydOgHtoh9vWjQjNHpinsTCfXNYajR3QgoDTRPgDfau4gR7Tu7KPtYuHGgN6PgP4vXLQOgfL5lHPKQfVGZm4cTprD4fCc1qozgr0eLc4W8sBZdnkd0menMpUcstssVT1zUuYWkdg9LHHOPV4XUcw9yLaLMd8vw/seMWcckxiu8ecX8dsn8bvk/aY9sA6IsedxDz9Cjy/ciJ7RUTlpung9I+Q+o/Mt7w0e96w8ubtPs4FQtbMYtXjjx6hitNtcYKhhWx0Js91jhSQEkafdzoIoaAaCssGJ0eaLyLU/Y8BZASq5vnMlIoSGuXfTL7L13NazwungRp1IYs4xPuXLXe4vvsFEEvftPfwMvcAdvfAcPclHgf++vJWXVoZ2wdMJHi0BEc0skd0sUfJ/g9Q+88Y/0fOc0848YV4+pwtgiusVE+KQk+Suq+izk9MHqJ69Z+G908pIM++OlIHN8AIuURxMc8Mup/c/o+0fB50j8bC/ILFWmxOIeCSxS+hS3kDj1g3DvAK9VZK8k18CuX8a6FNnFTbJa/6IUC68rT69RvA/9+QHT6m/Oir6W9FvW/KRjg2Xaqnmnj8b9CyBb0rCoDk5T9LDO8ZHANXevu3cftvej3B5H71JavdkGmZHICBxD7PcwBAfCPmAzyJEtGE8He3pdAjLx8EeSfTvinyWJp8NgfPESHv0/jr89Or8KQBvGB5sd9kDFfPqKkJ6SpieErR3r2gp7/c9epA3BE6Q0iUDae2mDeozwzot9WeVWDvsjy754Ce+0PPvoL17RD8hsqSUfr2gn4f8ZeM/OXvP3NKURlymvbXrvzkScDa43AigbnCP7G9T+ESc/kpkv5shIINvOFg/1QAsCJEADF3pdxfZUQPez7ZAEAJgGgD/24fYaJAKKI+8QBf7MlriC8HIC4OzCWgOAQwCrYcOr0fjGNjF5Ltx+kvVQWQG9Az9lejQVfu0xmq/dMBtLMjhINWKUcpB6fWjgYNDZkDSBfIcNob2oGCohWBfLjvU3FYGBSeUrNXu0zYHV99+hgyIC3AaGfAmhR/fgfQib5RBhBbfGeGIIqG4Cue6jQoDzwIEw9j+6BOQSBGH6KDxe
Kgr/J/1l5z9scWgmeDoO3469ahzYLgaMLqGNDUsdfcwSJ0sH5lrBn3K3nYJv5YYHBLOB3mTyd6uCf+3gr5qsC8F3cQhwA39hB3gFBCg+hLMDrAICFINoOv3TbvqjSYRtl8KA2ISUMT5lDlhaJbvqsLR7SD+eNwrzi5CoG48aBbQ+gaK0YEl9mBgI1gVX0pG7wJhfA+niPBmEfNREIg9vpz2JGSDSR6wvqJsNkHAQVoewtIWPwl6tssh0/LkLPzSCaD7alwvQcQPuF6EuR/mLYSbzeH9Ny+CgWwXbB+G29Sk/wpwayJcEXdf+Hg05G+1BG+CiWyIiIRAIRGhCYRYfVEVELA64jY+8QxIWXxSG6o5RvaNnkqIOBFDai+IrCkSIh59R8BEomofR16A+B9eiiGkbnzpEcd2hDAnjsyO6Ghj+h7I9Mb1CzHZ8QePIwQc3wFHzD2e4gxMSmMIF4UthwneQaL0XbyjDh0vbIScLVFnCNR96PISvz6GRANeVw/QRWMzFY8zBohQ0Yd3eGXYbBHgb4bzktHUE3+b5W0d/1d4ft3ezoyEdALdH+CPR8IqAcHzPHhDwB0Hf0TiJiGx9HBcrc3vGMR7lCRRKwwpOKLbHaiMx+8WvjmLop5j8eBYxkUWOlQsjkhZYmCoMM/gASD+1Y+vgIL5HM9Dm5wpsUsJwEkifxvffnh2MH67CFBEYjIYqKOFqCVRGg4cYr1Ay6Cd+/4ysUBNp4vCnBZvF7L4DXHX9Nxfw3bDaJgl1xgRB47Fp4OPGAD7uiI33reLhFAcrxkksIbCPD7aY6EjfG7kgIDFPjtxp4FcRgHfHYDmxOE0UXhPJFEC5EiEsYX5yeHATKmoEugcxULENNS+AnCcXujglmTKxFkvUTWIb68iwKcw1YGz0WFfjgpFUVsZKPqpESZRJEnscoMyEUSBx6g04VtBHEPx6J1w2cfvE8lWSWJi4iwcuONFcSNxt/b2lpP4luV9x7g7ImpM96QMJJ3o0PnAMCGySoRfg6SUpOdpM9o+qAvSYSMMnYSjJouX8TDxwysRCGjnCuPWmskCtaBhMeyRBMcnQThOAwkaXeU/jjScwk07kT5LrGzCGxAU0QcKKTGhSqhGwmQRFKF7ETuxSgsiR4GjGDj5emEpfkXhNHq9Up04iqCtPjrrSi0gLHKe4CXHsSL+nwq/kVKwxbs/U1o3cQJLtHpFhJf/USQAK951ToRDUlEYBygBeiUZ7ou8Q+K6mxCH+n9L4IkIzr8YgpR0lHmKPwlEDPpY0vLhtN+n6ieph0imcZOqH89ZOa0umT9Kml59ZptTIvkyKgkljnJFfWCR+Q5lGIuZm0/UVMNUlM9/JmEsmYdLClnSB+F0qKVdIOFxT+xyo1JElIX7aC6J70iWd9OllG9ghy/WahPFCCgzXh4M7upDNLGCT7RoIo8YjNqktSbxikpBpeM9lIjzxd4jqbUQxG9VsReM2PkzM/FHSVZ/PN4NqHjRRsPWiAS+JQCrE8zbJfMwvtxwWnCzbR0FD8nHITm+dk5fWNOVtNQl+S9pis5mZUMpkmT2xUoq2bKJik3TJ+lEvWUOOSm0SyQlsgoZOLelajC5NnJOSnMMTaIFx/0vKYDI+Giwvh5oniSVLQFQzypQkyqWCMQE1SWsro/2W1J9mT05J9U7GfAPvHOjohMfUqa+PxSRzExLMwaVTLwpDzE5fkUeUhPTmtD8xDIgWZBMZCLTK+cEx+cXJflZ9JhtYtCQrJ7o2Z5ASsqOTHKIGET1ZIvEflrPIk6ychVE/WY9KNmDy0A8c4ec/NLnAK/pJ/V4flPN4ChCp884qbLDt4vinZMMzFg6KqkQjxJfsqSd7PRnBDTxO89haNnuy4y+AgYi+VYM+DXy+pIU1sSjHZYA9VOkkN+YKw/lzSv5OcnoSaMM6gKwKLzTrpohG5qtlOR3MdkewLB6tXSBrHUs+OQp6d4ohnU9iZ1Lhmc+CDrSzs6yZDULHEdEJ+V6wc4KI
/WLnFANT3c4htdR3ncLs+wmCxgnWTpAgApzCW3s2ALcYOPgFfAxKIlbcOQiCCSX5sEusbJLgrUVapclA3bFADeXEDVsjOJ+Otslg1D1cs4WAJroJha5tcPAhwQHtIHS5SQl0oYJrBC3k4Cgc4hRYdjPBaUnta2FUxhevOqkujkZrUnhZ6OzKIdTwILbeMVRFlfchOYYlCHgDogxDIcuQ+QFwrYW+i7yoigyScu/F3yWWki7US0uaG0j35YEz+dnK6EqLBOH0C6CtDjj38FWmiKiFEtVZ9d92Mi2mMe007g58ZFirgVYu+42KS4trOiA4os4eYrORwPBZ4poCOcfFbnPxYGxwQecsp2I+OKEsgDhL9IRiFVl4GSUkqElGS4lXEvvBfAEhiSsOX1moDJBslxbPJbIDZVtKilVbKDjYrq7JAGutS8iQEXhqQQYaM8AZWUuRAdcpOMnWLPnH+VkxrljGKiH0uLhGKkC43HweQCm54AZu33ObrtRaWjckMK3fAGt3SAbdOlX1HbpQBjgHccgKq07poARCoBiklbJBekPsCrs+AnEFkLdIBbvgXklzX5YpwNQArBuaIdgH3QEDntJol7LTmYvHaQUNQTcDaoeQfawy15bsvFiwv2UKTDlb3T5HJjL59yLcHLQFceD4FdjZehnL5d3RVX/D0CEhPTulHjUtsk1GGZ8YiucXHL+pA65MSdL6iXKqe2Ylji0PkX3LFFjypgbnOhn5zKemfA3iAu2lgKq59taBS2OHUVRLl8CnYRrK9W9jtZxwxKZ3INkXDRxvclyVOK1FjrkJrEs/jpPIXAyzRAmTcdPEdnODRlrs//nmqRmsLC1jUveRjLzm/cEU96uvqDyjm3yrgv4vdU3NOLqDrpCowNago1Bb871y68dYzPh6lCa5jLHdaOuXWH8J1tyqdXZP5mzrixKiv+R+Sp6kaZZ6i+WRusCkEbcJ5y3nnusbldij1sUlBaevQXnrMFPc56eWtvUMSGND63KSQunmrjX164yhb8MXm0Lv1q8sZbms3k+CplXsotbMsA0+jgNUHZ2vwq2Xnyl5wi3SXhoJHsaBpcG6jsRo4F1C7hPA0wWRtzF3LKNWczoXOto1iyl1Tm24UYPIHsAoNzGoQaxoOkwba5rMggdxvOkHrEF+w71X2ME0dyHpKUzANetFn9zMNkm+jsMOMGhaJ5xCtiZZpfWzyQZSmq5tu1U17j1Nv6hGf+o9kFrDNaMwPgfKxkBzj5Qc0pOHIQ4tsQxzgtuQlJVFjiXphQjVO8vE59VrNCYsRdHKI0uRtRhWnUVSJuUeaKNmcjocXyFnPKXJi69gUMOc3Ba8VYWtdZXJZ6Nit1C2iRctv3XSiktpE1DSNt1mqiMt3cvLelMC3UA7hZ2krdZSfUFSFN3EqhXEgs1qaXZbvP9Vps/bXjuFem32a1tRkeiJlZ81AQXmG1M9UhLcl7XdNyHZaV5mTGzdFsI11yuNy2jkY8OpHuaQJnm7bQ5KeV0LDt8EtbZyOylMaLtLGq7ftKFGk6ON9mi5fdp42XS+Nrc/HUJo+2iC0pM4n7ezpp2MyZNZW59ZxJB22zeJMUZeYRwa3Q6mtsO7eQcqM0daDNKOnGafI0nmbgx2wMtQdol2iaVe5a+2rGMbD9qQpsGk0PBsp30c8FG2unVtpqbebdtP8+daou+5vKxOtle4AsC0UbwflZKnoEqs1Z6gZYoQPJu6FxjugMwEMYLCwEUHadwVRgyFUJwaCRA68kANPRnqz057FCqwAvZEC9Z0Aq1cnZcFwXUHhUh6yey9NhlL2ZsxIAIXtoqxG7vgHFA4Irt+Ar2Z7s9M7IlUFG5j3hHwz4LcWSC+gOIOJ76scJBFbTRgxsOcQLncnL3p7J9Z7BoAmsgDRVuEP63XU6PdlbydNCOo3Z6I+6ixDO4Go7Z/B92TDbG/OuzR7oc33byICAEdl/njzFV1VSelNF3on1V6FwIWesHXooK/Aqqs1MhezDYT
2Ral00PBf5xWh77NAc2j8TfJi2ca4tXuvXohPnG06bJ9OgPTtsFnB6/NbItyRlKrHnaK53OjCZuts1nLBdFOhuQlse3NyUNqW9ue9vVHdystYmg7aQLnz8EZdjEucSuqIWA6jRKBk0XPPX3KbXFQiyHdmo00w7JlJuo+U1Ixmdbplemk+eA3R34zHZOk13eTKIM8GSDUisg0xJs5yKZpNBxnb5uZ0DD3JChxjd5LYMRaed1c7/dwd/1C6+DasxLYIeQVoa0tohmiYvyvWSGctJACTd9v37kG3DShgGeVtV2Va319glTV+vq1Q7Dx+hk8fDsN3tb9NyOow6iJM3m7Hx5m2hbYfwP6TB1i28nU4fkOZSHh4wjnax3I0eGCe80pncNt8PMH/t5c6YZdo4NsawjQ6noyOuF38HENmslLSepEPUSu5yRgeflpcOZiZjiuyebJvyNqGqtGhrcRDrKO6HGt1+5rbfsMPdbjDnC6o0BtqNo6LdGOwbdbuG046hD2x0bfbvyEHbnds24nfNtOXiKlteFGmZzIXT0z3D9ImdT5po0+Gq+CJyWUie5mzG5ZwRhY1FsINk7YtqxqI7XoQWxGtjAmnYxgsy3jbxNBxnXtidNkMzzZZx5XcDsKOKbrjGuncXQsv0VG9dBh+o68ZA3vH5JbW1HaZsEVxC/jrJBdYCbiOva0FoJ8cWkY6bFCOjvUmE90bJO7rltJsqWeycnWjHwJSiiY3nIGHGncTZslCXMfYOt9edCwrg8sYNPxbojAh6KUCdpMgnEjexw2eqYm25a5DtpkIHidOOlagdqhihXyZKN8StdQI8oyJMeP6679NRi8c1LFO7yoOspzSfKax0Z1lTNJ+IyIcZMuS7DysuE7upwVFzbOJc1OePMoPTTUTVG9E3tsxP/y6zPuoBdT1XVBH6xIRzg0seOkrHDTFJ7Yd6c2PHq/Tb23YxeqSwVnNT0huTLIfekAKGzfZ82ADryMq7LjRRi0fyYBHQyhTqZ8EWJIA05mZlSOj49KcDl8Lmj/W+U58CG15ySz8o6Ma3IhNx8dTbpscx6eW2bmR5BCt1iiYUXtmg9fHLswXJ7NPzGzY8sC/id8lOnPBe4XulAv/N3bJznY0XcltnNln/TC5kTRIYd1SG6hMh5k7v2Av4KmzSFqM8odIVr64zxRrQ7cdPM67hTaZ0U3edN0ySTDLx3M7wpSD5nWjNhjAX+dHOtjEA0wK5VWrj4tneZnh8Y94ZWWGcfzA6PpdOUT36LD26nLQgTLVovw9OncONSfs7X1gJ2zdTA/Cp2FOs/ICKPBdMW9YKJpiWfNy9T2mIEFpieK6Ylgd30UAE20xELqmwNDTFs2MXfIMFeZUHg2VzqDciW05XfQOxzwO6Kso+g2LTkv2qjDOCcWi8ewaEOnNKsnb1suQ1Sl5sKsDUBF/0TeqxPVWm58BDOmXbLhgCHZXhBlVa4q2JV1DngMRXS2Xihr6X+rvw3qYNYHHmxr149ui/rkMq6uGXLFnF88xvJ4tSm+LbxzGWYYf1oio+Aigs/MtnC8dZqaHemHGDSD1Mkhoe4vemqfTa6oTBBhbe7rWHUcZLzh/ftcvAvTrILdB6C2pbD2icPlEnBtaEBNXAqTFXa1ATp18CWKzWUK4zjCvsWroEVsSlrKcNrTug09eqHMIRHrDtF2rqqlFfZzRXeK/Fvi7RAEs87y6fOtnbA5aATYTAQrYXCK7m3psxXWV30UCola5Viq+AEq4q7Kp7zyqcKVEaa5GuO7Dc1Ve7YuKZ21UtttYaLBU92hsWlWtlTbRroqNBB6q/0s3RwN5iht5AFunVzmn1dtUDhduDqywc6p6uurzos7bap6tSRhrTVotsdv+1jUdrowaCbTlZYViHgVqaALNQwoeMXmb92mwSzeezO8WGjxm+7J1N2uW61Vh11DnqBOuYdzrNu5/Yauut4G7rnRt3Q4YiO88Xr2oigznz93mmHlHZ+gzBYC378S7gRx04Se
dOhGSTAuguwQKLsPaNjYuvHfFPnP0nxDy5tyhkdl213FDDFvc9ydNG8mbe9s0IKUY4spn4Z3Fqo6tajvG7rz5hpo5YZ+PWHE1LMYmaUlJlYWd1RdyDWYMkst2f9T1llqfZI3VizTbZwPd9acnWnyxNfHI5zsHO7Thzixy++EevuF3ZLx/VAJFKe247hD/pwe2vwotrmqLxdj+xyejMqHmLau6rcebKna7F7jo4O08dDsb2H9FLUw7pq2tb31JLRiG47IVJVmYFJ92SxyJc0mDfdVB/3WMctOqXX7TBuXQw+K3IWdp/In+8SYev52AH7doByLsPX4X+NhFvu8JoZOE7XpX2keyQJO0jDXNzwpXTGZQc8nQdmh8HXVoXv3Gr9OD9M2HcR37zTHJDx89vfIe73Ndlm6h9uvHO32uHp2gY5ZIV3DHNt5dtE1BZfsLqpjLjkYScfrsEmhzRJvnX/fdPEG+oHd8R2A99PSO1TAZxc5qMOPHagtQTtx15INFTyLjLFo8wmbsc6HA7Rj5ayvcPnimOFG14h18d61WGI5F9oR6SeicVRnH6T9bR9a820Hv5P1jh/RoK1ZOhjDp0J9/fCeumpLtDnC6A+pMEXVT90sQ/sdItgm0jw9lbYM48c5Pzj+5/JwvLYv6OV5WDphZeZa2R3Knbyap/ftqcx2drZm7qY071OPWyRgD16+/vgudOGdKljE79aE7h6AbUe15torQsQG0cItsmGPo8BQHs9yKwQO0x3ArtUWjQStm7GYSGoLgi4XPaYu7VGtW1EKmG2sr3C0QS9XmA/ZXqhcN5Vs6LmvS2t064uYMGwKvFuOQCQvq9a2fy3G0CtBdO4va74KVrnum8mkvUZ2CwFK7cEnQOO3EhDFxjwwIYOMWGKLHi6GR4oJXD/Ou0ORMYm2o8RsPJaoxEue4d4TvWjmZfDxv8OcbgnAgU4J6I1YL9pO2mZdF7Xl9MBDNeHeb91KI7e0VAIBlZQdZgZAXYDwQNcx7d6bzKiFvu+qE3/b53Qx1xeMcrWKnQlx/b0IztCdX9rOj/XwK/2ROALLTju5JwAwgHUkYBuSinpJdH6YDGLyGzgiMGWiNQRr++E+AwBgAAJ7i3zjTdwNEqNgwQenDnd1NdHHnv4tpwhNcM4bPHZdx+90+UXV239LcbI8O+GcoXG7go8Z5m+ktiP1jvGyR+Lt7tJPiLcj1I0Pbgd+HAJY9xB4xbk0W9UH8ZvZ/PYOdRulrEy8p11vje3nV7ZzhAd8ZsfPjxLb4+572+EdPPRHLz6d64YCMP2ILT9np346Wlv2sjwH5CbLPndhOm7I55d5M4F5ru8Lz2iBzI6l1BmlnGp/d4o76NIT1HnJzR0DO0fq7CnAp4p24L0MimH3m12o4Q91U/u87zTxwzE7ofTH1nTD1s2B/HdWnE3UHzhzB+OM8eBzDdxD4u6wlsfW7Ijzj1M6pM+mVTdu5JyJsI+Hu2d1Okj0g6Yvkep7Ojm4/s8we3ul7MbhjzU6zMCX8HVzkS0+bjsUPEz9j1j/YfY9t2FPH0rvKtJxMRmzZoHz6+B4neTGsTXnr6SaYk8jP+HYzmT657k//uPPnd9d5h+BPYeFnuHqB+01XPoB1zWo1k+F9yO5PtnF71i3o+vcmeSn0bsp/mtOdPuI7L7+NxYbIfPnDL+9hTIfaBHdvj7Tj2S+GYmmmmRjY7rw1876c/ZevyJ3h+uoEcROmncX/t6u69Nd2N3Pd9DWepw+XrgzTJwj2N8jMnuJ7sZ4rwU6vdOeaPII0p/e+q/1fw71nmrzMsa/kEd7n7ve/Vs68TPuvD8uC4AtAs7nFLGc5S2w+G/+Oq+NFhC2XM/uSfRnSH3+zN6vvxfWn83ykzEeU+lm5nq3tL+t7w8hnx+MD7L3A+B/bn4Au5wr5PfUMleaFZX5M6Z+wdVerzN3sx9d8u+b2rHTXhz7Y4FPtGXvKHt77WdwXwW+zmEd539+o2dngv3Znn597ov8+Jv8xpD6
cxi/VmufiXjD+A5S/bv+7+xwj3j9AuS/x7RP/bxR7QdUeTzN7ir3e+YU0+GfBDur3G9u+kP7vH7oRez+I7Qnf3bn+T0RQXhWs5LUanlj9+oOsOhfweukOzlUMkRFgGaSTrq473pIp+nXey8fovZrYd+BmdoJ3GGtNLG4AuaYi9b8suRnUyVAgD1DKiWJ4Po8HqliPtfyADaGwYnLIHphpBiXcB2qlKGzBitkbgxeuNoi8vWgfL6zry4ZhtQdgt0pAFuEWkVbTFtAetOJa0xzI2ocw8cv2GNRHjTEfUIW/fnGtuAjZl/Fqz+K5pbjr/FgM/h+Aq9ivJXQK8E/JTmWn1pAO8WDROOkG5d5xioztSmjcxprIGOJq2Bwd2ktnrZk/OZdtwGqiQmENOgQ4ZrvWo5A2QHpAp2Z1p2q7scfnLwcQXEOMpuIHvkX5JIdootZme1Pic4W+THvvIseHPtD7/2sPgX6e+lat77Vqpdsw7eOX1hB5oE7/qvg28mikZ5MIOsrH5IqzaiRhww0MJajIYU6H6hmWCfhLhJ+m2O26HgXAiX5YAZflIBx81gP4BU09QPVghgs3PX6Y09NAG7YYSGJKht+COBn694WftMA5+/8LiikBaAcYjhwWZF8D02CQksirAD/ggGcQ3dNdwoBhfjigxUhwG4j9+D8IP6rwTpKP6aI4/nuCOci/of5uIdZgv7T+y/v4Cr+n8Pv6b+NqPYiEMu/nEFLAxflIQOC3cKvpvouOC7y+A9gaCoqkheCry/+ogWn70weNikj1wbLgFwcudyMFbJsoVh3A2oBKoZDhW0XLmzTENKpEpkqnQbEpOkVKpNRn+XQakr0q6SpNTKSHgKpLSBtpCFjKB+HAHa0eQdtgHPGNnlmaUsDjrdo7qpgYB512/nl05DewviN5TuddnO58O6EpD6CODzn+6/i2wWh4LeSXsr5zmqvrI4D28jnUirOZ9gV5bOxPlcYz2Y1nPbHedxib5YB53ub7W+dPpKbghW1qJa/Gr5v8bvm4Yo8GJOyOG8G9oP5hsF6mrYrcFSavHkpb++ldr06A+InlogIOpwZN7ReN2piFbBqAYp6I+M5lI4o+kumj5LmqIVj56Wm/HIY4hhPt8F6+BnpR5HeRTkCGLBZ3mb44BUIXgH0+4oajq2+Lrs15fuV8i57y+BptiEDOGTkVq8CVAXx4BeAnuw5Eh/ToE7qhbmmD6Re5wdJ6UhXRliE0hdwQj7Tm3dlh7PBa3k3ichqob9rBaajtyFcmvIST6HepXoCEGOwIVT6ghYoY+5XekISGGM+kfH1os+QYgqYXW6yjPDJeTwSiF7uKQlNoR62qBiEWh1Ia4FrOaoVp6DGHjnsEfO/3ocFCedGjXbKOeYcE5kh0vmaFdeyoVaGK+EjomHIhqPkkZBmGni6H1C4njlJT0lssLx+o6urOy+oDsn6HG+woZV5BhKwbT6W+koeGHQh9nrc6s+RvrdZO+91lcGu+JAVaFU6BYbiG/e+Ib46/y/mlO7VhkgbWHXa9YS063BTYfE4qeW7vM7th63p2Fy62nh6FkeM8nyEG+AodR5Chp3pOGih04bgFWeYYYx4ymi4XKZW6ipsJwfmkYlVhfmWpnGKKhNDuOYqhevGm6aheIRaYB+hIZdYOuGQOmGA20eqaojcjtqC7gG8lKnqH60BpS556yaoSTGsuLgZyGqqgUhhGuNETrgMRRgpEz349eBoFlBoFP3rERceha7guJblXrwguJAkLwA+QNTjyBr/hoYuIqAIZzz6/Lkvo5cwfmvo28m+u0jb6fRE/KtutQewF9qmAYGEAReDjOEShOMBnQXWLygRztMqbm858CwguaGwmKEY2GSc0AsVRNWirBqAXExbmxGwGFbm2qVUmyhpGr4Q5OgaFE1QTgaGRmYa5ENhOYZp67BA3vx4HBVdiL76honke6zuZ4ahYXhr3glGe+N4TM4Mhqnju6vBKYe8EHuzBicGPqyDvp7ehuzr
6GCh/oROGm+xzoBFSh4Asx4uChARuGze1HKhFZRxHgL4Hhz9keGMGmUYO7+GcHuFpSe+UZz6FRZUNaFTmi3i2GMh6WsyGpOmRtNFHuARps6ehWjp+GXuzUT+GtRf4e1Eh2cOkBHdR5jqsEPmkYfU59U9VCCxJ2uFPYCnWWHNNDPYbek/rQRiIQk4bRGPuWpxR+pleHbh3HkEqFhKUdqFpROEY7oBOw0aeFzREPnWEFR4MTmHFRSPrM5lRavul6sh1UUcb9GUMTp6nueTgd5NRZPmOHlebUSCFmR10V1H8WIEZZ5m61jnKFOejvgnzO+snjD43BVoXl52m/Xl46DenzqWHCeH5ALG+e7JjWF5RLpnL7IRS0bSG2hS3vaEPhgZk+FyGksX15vh9UR+GNRYOlTEtR44RdF0xHUeZE3RTMRc6ZmrMcz5LhMYUWalIMEZu4reY2qyGgxfboNFWh23n54wx+waLHpRf0eWEPw3sdLG5RC7gtFEBUThx7u+mMXE4lRzsQkblRizhl6hmxsjRCImUsTrF6eesb8E+hhsWdHGxcMqZFmxDMfOGWRVsZ8bShTPnb7sxdjpzFYCPbjzHEBfMTmGa+dFt94YR+4VhEEhkHkHFVBH3luZfeBPlL6yxzdpHFZu0cdeFxx2MaVH3hbYerEshlUcoLY+HIRuYDxIFu3HDxOvjyFHR+sbo75xK4RT4BhRzldEG6lcbdFzhoEQ9EdANznKbux1wZ7Gtx68bRaIW2viO7UBIsSWEBxeoT9htxr8VvEhOCHqjGnI6FpAryxjjorErRuFs2FIhQMUnEdhchn/GvyXwYdENRucZTFWi1MUfG0xxcafEZm58ZbFEOlzmBFsx0YQNpwhUEfGEqxKvsmFkWmpuiFIRECYNJvYrLGkCASLwF750AXkkWGC+PcYtK2RCEaUhA2wkX8pWuKnBQGewGWCCrfk82IxGrA1inDamctQI3BsJ4bLmQC23gAqq9cYibpaUY0NAVzFWqqruzfKUtnYoy2k3BHChAhVrdj8qxCtpY6JltksB0AOqjAQdKRqrlYmqy3E3AWqutmUBkQp7PFDD6hyK8IPke+K64LCKYPDDuoJTCmDQwTJDarNYkAGbb7cFtk3pTseBoc7jK9MWfH3mhCRY62eN8VGF2xA2khyKQR1snYfRqdoUQ2Rn3IZyrhXMeuEu+s3iwlsw8iBwnKgFAHGrD+GsKNHdxh4SHoCJpMh0ldJTpD0l8CGluApQ+/UbzEtJqiRzD4AUCd5GAu9ieqyFEoNqQTg2+MmHjBRdiXtiqJGsFy7I2Uws8D+qqWNRCawbIPxGbJMzKthfAxCionHG1ACqjrAo5GAAAAjMvioAEMLiRyuFfhLQLsppDOhpYVbu+QPwMjDIzo4NxG2pyMcjPEkBgcBLQCyA3uGOBwumsFeir0ywL4GoIEgcVAIIjnGeamxeCfknARFcbkntS1zkUkQRsYenbpWdkWiFphfzr+Z9RTSTMkyJrCXOLtJzgCMmjC1AL4j3Cukr74sOfSeNEDJtSd9waWzAX2wrJeiusmIIkKdXASBdyfdghgUSI7DT27wIbCXJHbnZgeA5sM9LvJy+N8m/JbqsgCGc4ehJTi0SLiUoKgpnGNgnJGwGckqwusAOCkAykA7iAGCgLVi3+4blRgZwZrvYEPJ/RmFa4orSQckI4uVmFGpEJkSfG4OpcVfF5J90T1oUpT0S+Y2BUEYMmTa/1jNpMpa4bnaxerKeMxEA7KewnZA1UFykoY/Kb0kV2/SftpJuUNPZHDJFafbAReQCVF4XB03tMnNxsyRymlpwDqanfcUqYqo6JsqbRFYuUoDImVuaIbskVq+yfrCHJ9lnMAWwfAEPD1gafoUQGpkEF8k/JsMH8n1AAKaVhNknEOIFVu9cOCnypx8NCmwpHGEilFoNyUOwWYtgZikAoLfm2q4pmiAogEpuCbGk5Ja1hKakpv6dHaPRD3uQlppNSXWl0p05gNbi87o
NwG8BFcPwHd0M1Hfx2U98ZuGtObKa0nzJFUCzDnwcydDHCxqUf7HwxYqUJwox7SgC6x60qf1wjpmLhDbbJiqYAaDKZ2Ivq0IHQIvR2Iq+gOCMwkUTGpe0VEBzBUABOIhg3gEiGzAzAciRdCP0CABpCRUPAJQCcwsvLDDQwgINACQA8MLiTQA0MHYAAAUvDCOow8P6S+SC2EfCTpPCMEiLgnQE67YM9UGwk6qVENECpYpQNy5R4wzIVArUEzJBDw2SqRq4xA3GXyClAgSLlDIARqTul4uH0Ftz0hwEN8prpo1iOG70WXK9CDs56fpbGKuEJnD1g8EBZBcZs+tGCcBYmaZmfpMaSY6Jp61k1yHA3zh9DoilgiVl+i4EQWZnAjCZsFiiLSfRz+4MKlWk+OIqbWnQR6dKUi1qyGuLz1qREctyJQpnBskFBRMBOnQ2TEUJx7pxVICmHpzgIeQnpZ6f7iXp8SYinIpI5AqBbu5uDgjPp2KYXoaOusZdjL0QjG5lkIBoPTY82rXJBCH0tABOBcgjsuayFZWSSXE/pa9gm6Y+0hAiiXhcGi1l68bWTayTC/WVRIoahnLaiwZqmZEkSu0AIcAAAVBcCAgNgPDB5MkAKtkyJRiBIEwpcPIQG0AAgAfBIpAIOIAMA0KDmAaAGYIJgyRFAOry+xxYdhGcUdIJTnig8me0yBiehI5Llw6zAcCy+3aFCw7KXIKKALZQtotS7sGsoi5vAAiDREuRYMXBqC0y5H9nuefaWUhFWdyX6CSqstKAkHg9TLDifgRns5EVQukNTlgAvGceQrQYAKcS4A+ORoC0AE8QFSGWXYpQDwgfYc9IcUA6GdlyZcELrCug12VbLQgvgKyiv4HgHPT0KOCUVmxuZcWsGTcjTvjmE5K0HkAk5ZOf7bM51OYRxG5ZCJgyCqPCWNF0BXFHYGHWgiiNzbI6eYeCZ5/gIFi9otQCRCMqZAGa6uwJudqDUKkUY8yJJ3aBLgnMWuexjOR/5sHhWAckdTT1YFCdOQhAHrrL71wTLILQzU7tN6CN4cLBMDy2BtCDRuwKQOPnc87VJ3l4RcLNPqvgQgI0hPQRRH7AjcK7H9SFELNPfAsYG+cZrM48qEyyeoREVIRJ2e0ODQ3gZrinmNA4EIVA3sOBuPCvZmmhHnxpTUpSwx5BOcRTY2FOVTms5k4nTm8J/SUzkQFNOWzlPiHOW7loI3Of66B0UNIjSuIkSVYAAAmoclOAvCEREUYSwDBCwsXSA/CG58BSbnvKlYKFa4AFuY4zW5tuRrT1UVeDeAHAbPKgBi0ZbsuAy5fbvLmgYiufJ7B4LuSrwoF88PSpBQCANwCWMc1HuCfA++UenLZ6EL7n7mS0IflB5dEIFALBJsV+nFZFkVHkEBuab1Kx5oBb9rgFLOQgV1wJeSWhPg/gNnnCpueUH4F5mkkXk258BXYWCqFeZMF7Kt9H7BmuEUQ8z4ETzOGmuIHeRgXyA3eaOa95/eYoH5IQ+fnAj5zFAHQYWD8BPmo8adN1aDcntDcY5UoUCkiL5WoMvmtUa+RAra5sVE8BIYA4bHrpBi8tPrV54LEflQs64tNBn5PYBfkYFUHM5Hy59+bHiuglbC64Eqr+fAXpQaaEEhWyb8L/mVGF3ozGAF2ZI1l6mx/gDR25xZMsUUANFNAU55Oci4VlJheVRCuUM+FejwZ+qH6ifqiap1gWoPWKzab4ejJsj4oVGBsDuZE1h4BrpRxZ0jOB7iJ4jmsfKf4h5W8IH3nXMA+fkhL54gCvkXkGgBeRqJ0yITgdM7xbIGxAqhEh4Kk1gXdT0Y5QATjWWD8BCVQlqWKZZf4gqlRC+ghqBTRkIigEcDbIxfm8q601QExAWQMmTq5csFoGa6YlW1PKgaAuZDmD5AZrsSW0Q9fk9DT6kNPDYhg8JQAHfgdYFcwZIJpHzSl5DEVrQZguhUXHh5FnsQkXxwQhIXlwdhdQAiZy+ABGFElxd1gFAvWAWz1wkES
GCuUrJSqIAo68BTRs04tPXChJXCSxweINxVkXFJOaQ0nYC+OUkzng2NhVClcFUAgghUmxQRmwxRGYzn55ZScAFcspSmLlCZxJBikPwyaBRicq75DuSgQaLNPgXFxwGgDQAB6TQCIkw2KQVUY0gf6lZBSmOzirMZXNGCJlwuVOagpDEMuAVsSGF3TA2G+UOxTo2GKZQJ41uveCuqiJQvgHWikM0FIAKZdO67oRVGUg1+dfvLiQAu+bRAcRkQA2WdltZfmXnCDcHWWBUu/qIH4EqLPqrXA+qMggiICpNiJjYluCYJ/+Q+BsR5B5ZciirYw+IIGn6CLJ2xtlkRfnKKlOajMVghkebdHrBLnvYy1gJAM0CGQdHFkB/03KI4XVpIqXSAuqGoBkjYMD8EIDPAQgGfBkAEgG6WdsTBMuRIVGAChWvwc1GNTBUp8LmT10kANpnIVaAAABMURGVhjBsuDhjSYVWDNQnlvBWticMqPH+SdFaRc4F6Ue8JxWa5b5ecVrYyJBoCC0ozEiwCI5aFkj9M9WHGE5oCFfXA4VeFWhUHwmOcRWCJdADlTq58BPOzFU5FbhVUVFRMmDQgGAPvk46dZQJiBQpRPQXjIC9J+RvY/FakVgJQ2JWzlwu6H2hjMkldfDwgWEBAS1I48CJnIAkOGkClq+lShXUVaFQoWmVEMo4yHkmwNHQ6o2gnkDxYdiJtB0IdFYagfldHsvazFP5UzGAg8lTqWYVPuP4zTQSlahVzUFDJ3DUMBaPozlV+FRICEV6tGgBVV0GvEzaQFUIBXAVu8KBWqE07NiSeMBVQFVAZpVXOUUVDVapWFpX8K1UZu4PABU4G3VXyC9V4FdsT/lHVV1XPsPpSRQhlo7oRlfxfHOkA8UaDIcSzVSxPNU0Ai1Z8BbVX8D6TO0PBV6gAsuEHtmKV41WhWBYw1Qgbt89VSpXCMxFdYi2IIyJxlOIr9Pf4C4TeN9VzUkiMLh1lhwJrwaAzxM7QsVswHumIVr1QRW/V/tqSQw1cNT0TSVKBkoFr0a5cJzw4s9AWZcA3aEkALl2kGXgWVrdExUpILFTRFwGGsG5Sgp4iJIgaANqNEByIv3C3DtZLHEHjlg96FZjtlMJcJkpAMkbIA9l4KPpiiB7NVIgbI0QDIwiB3CHIxe+x3Apb+YgtfFCW8zcJFwZANJfrVeEekL3pkEL1HrTRgxRj8pk0PVEeXdwvHLIHuAToG7kdSCyKxBf5aHOwCJC3iZaoDMLDKBAJVBjARhiB8kefrtA2hAYypQmIFlVLBU4ebFzFEpoCA+MzVJgzalKQGLSjVg1LzUwqhwCrXtAQjB1BmFewJFjV0Y1QZUNVVDEmUBM8dNnU2sudbLXcIBdaCBF1mACXW96ENY1UY1NjEiTYU61QtWbVzjP7YDVUAEnUtkWABgxYMOpRnVV1GmPHTkBdAPXUbYjdeXjN1IBcXUfw+tR3WV1tVbPWEM89bQCL1f/k3VfALdX5Ab1J2B3VNVbBGgDd1MXudVAVA9ckxD1q1SPXJ1E9WnU4MrtJnVz10iuHWH1xiMfWn1bdZvVo1EgNvX4Mg1PPWIA/9cvWsoQDefV2cl9V3WxMdjH3UXVj9YQynwHjK/Vj1KdZPXp1uDDPUGMNNiwYwN+dSvUn1a9a3UINEwFvXVVRDZA3YaSAGQ3awcDVQ1n1pdUg1EVN9Sg291uFJ1X91IFYPXYNkAKPXEk79QhXT1O9cQ1JRBvCw2AN7DcA0X1oDeA3V1e9TVFyNedaw2F1ijTQ1l1ylejXcNt9WTL31l1RgDXVWDcPWiNb9VqWSNhDdI2MN79lGzyNFDfA2cNKjfQ0ONP9U43MNWjQo1GI69e43l1P1UY28N7Vfw0bVQjU/UiNYjccW2NU9fY0QN3jcNG18LjWw0BN1DUE0GNYDZ41JN6jUTHEeaTTo0ZNHDe3WgNV9fBQ8NbVXNVoND9VE2YNz9XDw4N4jfE0ENX9Qw3JNlYetpFNq9SU1KNiD
R42QANVXk2fwO4e47QNfja426NWTRVWd1oTdU1nVtTWY0WNjTYNU2NqdXY3tNXjfk1IxPHj02UNfTXo10NQzR007Nu0fmHjN+zW41lNwTYY3NVxjag0RNgjT1XCNVjbE3oMrTZ/WSQ39Wc1s6f2ns2TN6Te/ClNIDbc05NJzds2jNXYf80kxVzdM03N2TRU1fwDzXw25wAjeg31N21W83rN+DV83VAPzVC0GhMQRM0N15DUC2BNCLbM2qNu9YS3tOZAmo5wthzTM0TVyDQs3MopjRg1YtL9dY24NEjQk1bNIzSQ0+6jLcC39NtDYM3DNajbS2vOxciK0UtoLYi2stp1ey1LNnLTdXYtvLZ82dM+Lac3St9ZiPKLMycgFYJsJLUvVktxTaK1HNErbq1CtvPoa0GRJrXK2ZNlLSy3zNyrX1ActmLeq3ct7zdGB8tbTd802tWsfTJOtILco1gt1LcQ3BtP0qG1it+jbM1Itp8Ci3hNaLZE0vN0TRq0tNGzfy2BtkLTymDUbJrG1WtEbbk1St+bWF6CxRbcy0hN9zWE01NTzRi3ptDTTE04tH9dq2PYNrUgnaIVbS61oVkbYNRdt1PD20KtCbUq091KbXhRptS1a80+trbZs25tgrYO2YQw7eG3ZN/bfHRLt8ACu0DNYLYm1VN7rVtWTtzzdO0Zts7Zq3ZtAbTq15tPNbUR81NrB4gNBnINu3itJbRC2CtN7Y2B3tpcA+24AoXAaDPt8ba621tbLR62qtXrZY1ntWbbi3ttf3Ne3MGxrUFymtR9VM1MtvbXNTrtvzYhIIddyEh0ANKHZa3VtdzdfXJt9bam3HtV1TO1NNPLVB1ttUje+30cNeMLjCtgLRa3ytq7VS2ltNLSQ2MdMrQ2YAdXDcB0HtnrU21ctVHb63j1WrXR1ltciDx36tz8gzb/tLHb00EdaHeC2StXHTJ0YA2bHJ3JyCnUQD8d5TWO131YHSJ3etYnXO05tV7fR168snb2YOELUPp1KdBzSp0jtFdZx0yNDHVp1MddrfsAsAT7U53XNrnTW3EddbYs0NtdTaZ0Qd5nee3QdUnVx2FyLQbvAGdr7ep3ENCXRFxhMAXfC1BdRHZU0kdYXWR2NtJ7c22ZtcTRe14tHbXm0+4h5H5yJdfIMl1rt7nQO10ANXQxzPsDXaO1ut47aR1HtRXRR2nt0XTR3ztVndJ3MGdXa6AddbnW+2jdBTeN1+Qk3cF15doXSq3hdyzZR1rVq3Wq2RomYtwDiw1/rqoQVWxU4U7FhNbRBx0hDHOiXwe3ZLBul91awhaGZIBcZWMzlOgHMEciEDmlwf5JwG/EhRIkEMFUyF664gWdPjJIIPNJ4Kuh57HjUf+IJeiX3EqUmywYVmIjIGsVDMMoC+AkRDFSWlB5UCkDoenbYqiQTbth25w6bN4LWB2bM24NmoFHN18ByegwBABOYOSDOAPIOqAf0ToO+R94Gykqzw9mfmbSZIFss9JaRfAMLz+5FDR0x3xkbsfFvZRKTVmAcf5cymgdm3V63twJcv8E7VH8XtUM59ATxT+gmIDPqHgmQALlUQz3ReRSJrpOZy9osQHuCB5vGK5DRAZfq3BkC7cDpR6U6ELX5TozpDpT3w0QIPUe9FwKxlzGd3VLw96BbodYs46WMim/4c1C8iugUzGYrbZACFDbh9u6Nj1dMOQLrnykg+hMFzGcpoOQlS5cDVT642yGywm0+lIWU0Ar4NKWyRQJbczs9ABtpV6pKvAsB1EqpAX3L4Tfdz36BvPXVTZI0PWHQewJvZ+TvdmOe1nF9DlTLHysC9Mb3pYcuMHUS4dvZiIO9dwk72BULvdIBu9+qB72dwNvQ3De98dMv2G0ageNgSgttdFD21ikI7WndoQDbzn5VmUVWHlnAUnTawiRbqlRMh/Sb1Wp8HIahZ9qkoH2ZZFwkkWvQEfVyArseQGwCxUxoheQT00aVL3fp+CWSl/pOQBqU1Z8gGZFmlKUNjZHA
U/YkS/EiABcBCVz5TLhsMMdLRDHV8qIw2tcckFjm54k+Pdji9UJkDCnQFRJNDfRN0BBZLQ7AFwAvQkFu3R7QKgGoB/Qx0IDAGAjA0tDqATQnzXXM2UE6U5g++iNDCDY0JAC/gJAPiD+glFb+CZAIwPiBDAaAAMD4gmQPiBvJRIL+C/gAAMyGDlFSMBvJcalSC0AmQLQADADAG8m2DDgwDCGAjAwMADATg8SCUV+gyMBUgQwAICEgmQG8lDAv4P6D4gAgAMBtgDACYO/g+IAMC0Af4G8kDArTAIBDA00CdAKDaQwMAkAVILEO0A+IBYOZA6g0MADAPg7YMxDAgJYMmDhQyMACAbySQC/gIwDaAqAv4NRWZDEAJABlDAwL+DWghILQCUVOg5kD9AAgCYOEgFg7oO/gQwPYN6DVIJRUmD6QwIC/gbyVSCqAAwPWAdDUAG0MkASw8sNpDaAPiD4gtACYO0A/Q0UNxqtAFSCZAVIDsNzDQwCQAjAmQAMAkggQ+0NCDIg3+B1DVICEPEgHg6IB6DhIAwBjAJg/0ClDlFVYM5DJg5RU7DOwyYNzDpIJsOKDAwISDEg+IMsOZADAISABDAwyQBDAeg7+CEgbyVRXgjwIycMxD6gwwDDDVIBsPvDCg/+DaDwwwwACADAIcNUgbI3MPPDhIL+A5DDg+sMAjpw08OODJAISDIjIwK4PyDnQ9oMCARwwkAmDmI40Nf4wwAMOEgJAG8kNDWg1MMmDmQJRUTDlFQMALDCwwIDijjA8YOGDAEG8kMAQwABDSjAgE8PNDTgyMA6jCQDEMNDKo8MOEgkQ3UNpDxowoOJDAQyoAmDIIw0MmDuQ7QCqDIwMcPxDlg60wLDWg2gBUgKI0MCUVYY2KOIjmg7+CXDVw0MBmDcwyQAJDQwKMCqAbI3YNND+ICMBf4hIPiBoA6gyYN/gbIz6OdDZg0CO0AbyScOUVjg0oOUV+IDYOPDjowIDzD8w3EOrDDgxGNsjbyTqM0jbgwoMMA6w/oMaDcQ04PGDoQ46NxDlFcyNpDJgwMDfDpg/MN+DFo34NxqDY1ADSj1w0MAzjeo7aNoA5Y5RXfDr4ISAmDaANWNVjyY5oNUgpgw0OZAII3MNHjXQ9UOBjTI28mVj+Y5RWlD5gwBhqDNg8yNoAJw7OMejaox4CIjaAI0PxjKwwWNHDW42qOhDKo3MMqDxg2yMhjkKMEMpDYY8MMITtI50Nf4wQ7iPdjvQ8iP9A/4HoNoAHo+sNyjywyyMjAQwOMMhjZQxmO/gP4yiOUjaI0yOqjKw7iORDUwz4Plj0E6IBpDvQ0sBpDMQzsN/gP4/6AfjsQzsNDATg5iObjPg9MNljcanYNXDmQEmNdj6w3mPtj+ID+O9DgEwmMOD9gyQCbjGIwSMzjRwzUNxqKo6UOZj9g5oOa4Cwz+PdjPQ5WNUgYwKiBxqUI3oMRDy4/UPVjxIOkPQjmk/qOYjVgz+MjAII6+BRDeQ6EM5DOg0ENcjIQzYPQjTg0SOxj4w6cOPjmk1ZMjAV4xGMHDogKUN1DBw4UNojDQxxMzjhE+MODDkKCcM3DJg+KMSjUAKINaILkBINZQOUPhX0A+gEAA== -->

<!-- internal state end -->

@greenape greenape force-pushed the add-garbage-cell-info-check branch from d4a4b3c to 4e5ca24 Compare June 20, 2025 11:36
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🔭 Outside diff range comments (11)
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imeis.sql (1)

1-7: Use a symbolic link instead of duplicating SQL
This file duplicates the CDR-level count_null_imeis.sql logic rather than linking to it. Per the PR objectives, it should be a Git symlink to ../cdr/count_null_imeis.sql to keep the checks DRY and simplify maintenance.
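As a rough illustration of the suggestion above, the sketch below creates a relative symlink from a type-specific directory to a shared `cdr` script and confirms that both paths read the same SQL. The directory layout is rebuilt in a temporary directory and the query text is a placeholder, so neither should be read as the repository's actual contents; `link_shared_check` is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def link_shared_check(qa_root: Path, cdr_type: str, check_name: str) -> Path:
    """Create <cdr_type>/<check_name> as a relative symlink to cdr/<check_name>."""
    link = qa_root / cdr_type / check_name
    link.parent.mkdir(parents=True, exist_ok=True)
    # A relative target keeps the link valid wherever the repo is checked out.
    os.symlink(Path("..") / "cdr" / check_name, link)
    return link


# Hypothetical layout, built in a temp dir so the sketch is self-contained.
qa_root = Path(tempfile.mkdtemp())
shared = qa_root / "cdr" / "count_null_imeis.sql"
shared.parent.mkdir(parents=True)
# Placeholder query text, not the project's real check.
shared.write_text("SELECT count(*) FROM {{ final_table }} WHERE imei IS NULL\n")

link = link_shared_check(qa_root, "topups", "count_null_imeis.sql")
```

Because the link target is relative, the same link works in a checkout, a package build, or a Docker context, provided the `cdr` directory is shipped alongside the type-specific directories.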

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imsis.sql (1)

1-7: Remove duplicated SQL and rely solely on the delegated script.
This file currently contains both the include reference and inline SQL that duplicates ../cdr/count_null_imsis.sql. For consistency, strip out lines 2–7 so the stub only references the shared script, and add the standard licence header at the top.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imei.sql (1)

1-19: Remove inline logic and defer entirely to the shared script.
The wrapper duplicates the core query in lines 2–19. Replace this stub with only the reference to ../cdr/max_msisdns_per_imei.sql and prepend the project’s licence header.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_location_ids.sql (1)

1-5: Consolidate to a single include stub.
Inline SQL in lines 2–5 duplicates ../cdr/count_location_ids.sql and risks divergence. Remove those lines so the file only contains the include, and add the FlowKit licence header.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1)

2-2: Missing SELECT before aggregation
The line max(msisdn_count) needs a leading SELECT. Without it, the SQL will error.

Suggested diff:

-    max(msisdn_count)
+SELECT max(msisdn_count)

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_location_ids.sql (2)

2-2: Missing SELECT before count(*)
The aggregation count(*) must be prefixed by SELECT. Otherwise, the SQL will fail.

Suggested diff:

-    count(*)
+SELECT count(*)

12-12: Use proper NOT NULL syntax
Replace cl.geom_point NOTNULL with cl.geom_point IS NOT NULL to adhere to SQL standards.

Suggested diff:

-        AND cl.geom_point NOTNULL
+        AND cl.geom_point IS NOT NULL

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imsis.sql (1)

2-2: Missing SELECT before count(*)
The line count(*) needs to be preceded by SELECT for the query to execute.

Suggested diff:

-    count(*)
+SELECT count(*)

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1)

1-9: Insert project licence header
All new SQL files must include the organisation’s licence header at the top to ensure proper attribution and compliance.
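A header requirement like this can be checked mechanically. The sketch below assumes the header in question is the standard MPL 2.0 notice (the PR checklist mentions an MPLv2 header) and that it appears in a SQL comment within the first few lines; `has_mpl_header` and the ten-line window are illustrative choices, not an existing project tool.

```python
MPL_MARKER = "Mozilla Public License, v. 2.0"


def has_mpl_header(sql_text: str, max_lines: int = 10) -> bool:
    """Return True if the MPL 2.0 notice appears near the top of the file."""
    words = []
    for line in sql_text.splitlines()[:max_lines]:
        # Drop SQL comment markers so a notice wrapped across lines still matches.
        cleaned = line.replace("/*", " ").replace("*/", " ").replace("--", " ")
        words.extend(cleaned.split())
    return MPL_MARKER in " ".join(words)


with_header = """/*
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
*/
SELECT 1;
"""
without_header = "SELECT 1;\n"
```

Joining the words before matching matters because the notice is usually wrapped between "Public" and "License".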

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1)

1-14: Inconsistent wrapper: duplicate query logic

This file both references the shared script (../cdr/count_locatable_location_ids.sql) and re-implements its SQL inline. To keep the code DRY and maintain consistency with other CDR-type directories, it should be a pure wrapper (or symlink) that includes only the relative path to the cdr script.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imei.sql (1)

1-19: Inconsistent wrapper: duplicate query logic

This wrapper references ../cdr/max_msisdns_per_imei.sql but also contains the full query inline. Wrapper files should exclusively include (or symlink to) the shared CDR script to avoid duplication and simplify maintenance.

♻️ Duplicate comments (13)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imsis.sql (1)

1-7: Duplicate check: wrap CDR logic via symlink
As with the Topups version, this SMS-specific check should be a symlink to ../cdr/count_null_imsis.sql instead of embedding or referencing the SQL directly.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imeis.sql (1)

1-11: Duplicate check: convert to symlink
The IMEI-count logic is currently inlined here. It ought to be replaced by a symlink pointing to ../cdr/count_imeis.sql to centralise the query and avoid fragmentation.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imsis.sql (1)

1-7: Duplicate check: wrap CDR logic via symlink
Consistent with other CDR-type checks, this file should link to ../cdr/count_null_imsis.sql rather than maintaining its own copy of the SQL.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imeis.sql (1)

1-11: Duplicate check: convert to symlink
The MDS-specific IMEI-count is implemented inline here. Replace this with a Git symlink to ../cdr/count_imeis.sql to unify the logic.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_events.sql (2)

1-9: Duplicate: Explicitly include or symlink CDR-level SQL
Same as in the SMS version—please confirm the include mechanism or switch to a proper symlink.


1-9: Duplicate: Insert project licence header
New SQL files require the standard licence header.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_location_ids.sql (2)

1-14: Duplicate: Explicitly include or symlink CDR-level SQL
Ensure this wrapper either uses a proper include directive or is a true symlink to avoid code duplication.


1-14: Duplicate: Insert project licence header
Please add the project’s licence header to this new file.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_events.sql (2)

1-9: Duplicate: Explicitly include or symlink CDR-level SQL
As above, confirm the include strategy or convert to a real filesystem symlink.


1-9: Duplicate: Insert project licence header
Add the standard licence header to comply with project guidelines.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_location_ids.sql (2)

1-7: Duplicate: Explicitly include or symlink CDR-level SQL
Verify that the relative path is correctly included at execution time or replace with a proper symlink.


1-7: Duplicate: Insert project licence header
Don't forget to prepend the organisation’s licence header to this newly added SQL file.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imei.sql (1)

1-19: Duplicate of calls wrapper issue

This SMS wrapper exhibits the same pattern: it includes the shared CDR script and duplicates its logic inline. Please convert to a pure wrapper or symlink to avoid code duplication.

🧹 Nitpick comments (31)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql (1)

1-5: Add standard license header.
New QA SQL files should include the project’s license header and a brief file-level comment for context, consistent with existing checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imeis.sql (1)

1-11: Add standard license header.
Include the FlowKit license header and a file-level comment to maintain consistency across QA checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/latest_timestamp.sql (1)

1-1: Add standard license header.
New wrapper files must include the project’s license header and a short description of the check.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1)

1-7: Add standard license header.
Please add the FlowKit project license header and a brief file description.

flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql (1)

1-1: Add standard license header.
Include the project’s license header and a short description to align with existing QA checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_location_ids.sql (1)

1-1: Add a descriptive header and licence boilerplate.

Wrapper files should start with a brief comment describing the QA check (e.g. “-- QA check: count rows with null location_id for SMS events”) and include the project’s standard licence header.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imsi.sql (1)

1-1: Add a descriptive header and licence boilerplate.

Please prepend a short comment summarising the purpose of this check (e.g. “-- QA check: maximum distinct MSISDNs per IMSI for SMS staging”) along with the standard project licence header.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imeis.sql (1)

1-1: Add a descriptive header and licence boilerplate.

As with other QA‐wrapper files, add a one-line comment describing the check (e.g. “-- QA check: count rows with null IMEI for calls”) and include the project’s licence header at the top.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_events.sql (2)

1-1: Add a descriptive header and licence boilerplate.

Include a high-level comment like “-- QA check: count locatable call events joined to active cells” and the standard licence header before referencing the shared script.


9-9: Use standard IS NOT NULL syntax for clarity.

PostgreSQL idiom prefers cl.geom_point IS NOT NULL over NOTNULL. This improves readability and consistency.
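To show that the change is purely cosmetic, the sketch below runs both spellings against a small in-memory table. SQLite is used only as a stand-in because it also accepts the `NOTNULL` shorthand; the table and values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the cell table joined by the QA check.
conn.execute("CREATE TABLE cells (id TEXT, geom_point TEXT)")
conn.executemany(
    "INSERT INTO cells VALUES (?, ?)",
    [("a", "POINT(0 0)"), ("b", None), ("c", "POINT(1 1)")],
)

# Standard SQL spelling.
standard = conn.execute(
    "SELECT count(*) FROM cells WHERE geom_point IS NOT NULL"
).fetchone()[0]

# Non-standard shorthand, accepted by SQLite and PostgreSQL.
shorthand = conn.execute(
    "SELECT count(*) FROM cells WHERE geom_point NOTNULL"
).fetchone()[0]
```

Both queries count the same rows, so switching to `IS NOT NULL` should not change any check result.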

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_location_ids.sql (1)

1-1: Add a descriptive header and licence boilerplate.

Please add a comment indicating “-- QA check: distinct location_id count for SMS staging” plus the required project licence header at the top of this wrapper.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/latest_timestamp.sql (1)

1-1: Verify reference and add license header.
Ensure that the referenced file flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/latest_timestamp.sql exists at the correct path. Also include the project’s license header at the top of this new SQL file to maintain consistency with existing QA checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imeis.sql (1)

1-1: Verify reference and add license header.
Confirm that flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/count_null_imeis.sql is present and correctly referenced. Add the standard license header to this new file for consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/earliest_timestamp.sql (1)

1-1: Verify reference and add license header.
Ensure that flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/earliest_timestamp.sql exists and is referenced correctly. Add the project license header to this new file to align with other QA scripts.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/earliest_timestamp.sql (1)

1-1: Verify reference and add license header.
Confirm the existence of flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/earliest_timestamp.sql and correct the include path if needed. Include the project’s license header at the top for consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imsi.sql (1)

1-1: Verify reference and add license header.
Check that flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/max_msisdns_per_imsi.sql exists at the expected location. Add the standard license header to this new file to match other QA checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1)

13-14: Use a descriptive alias

The subquery alias _ is non-descriptive. Please rename it to something like locatable_ids to improve readability and self-documentation.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicates.sql (1)

1-17: Clarify duplicate-count query naming

This query sums all extra duplicate records (sum(n_dupes)). With count_duplicated.sql counting duplicate groups, file names can be easily confused. Consider renaming this to something like sum_duplicate_records.sql for clearer intent.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicated.sql (1)

1-17: Clarify duplicate-group count naming

This script counts duplicate groups rather than individual extra records. To distinguish it from count_duplicates.sql, consider renaming it to count_duplicate_groups.sql.
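The difference between the two checks is easier to see with numbers. The sketch below uses an in-memory SQLite table with two columns in place of the full event schema, and assumes `n_dupes` is computed as `count(*) - 1` per group; both the table and that definition are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (msisdn TEXT, datetime TEXT)")
# Record A appears three times, B twice, C once.
rows = [("A", "t1")] * 3 + [("B", "t2")] * 2 + [("C", "t3")]
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# One row per group of identical records that occurs more than once.
DUPLICATE_GROUPS = """
    SELECT count(*) - 1 AS n_dupes
    FROM events
    GROUP BY msisdn, datetime
    HAVING count(*) > 1
"""

# "count_duplicated" style: how many distinct records are duplicated.
duplicate_groups = conn.execute(
    f"SELECT count(*) FROM ({DUPLICATE_GROUPS}) AS g"
).fetchone()[0]

# "count_duplicates" style: how many surplus rows exist in total.
extra_records = conn.execute(
    f"SELECT coalesce(sum(n_dupes), 0) FROM ({DUPLICATE_GROUPS}) AS g"
).fetchone()[0]
```

Here two records are duplicated (A and B) while three rows are surplus (two extra copies of A and one of B), which is the distinction the proposed file names are meant to make explicit.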

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_duplicated.sql (1)

1-17: Consider simplifying subquery layers
You can remove the extra nesting by directly grouping and then counting, for example:

WITH dupes AS (
  SELECT msisdn, datetime, imsi, imei, tac, location_id,
         msisdn_counterpart, outgoing, duration, network,
         operator_code, country_code
  FROM {{ final_table }}
  GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12
  HAVING COUNT(*) > 1
)
SELECT COUNT(*) FROM dupes;

This flattens the query and may improve readability.

CHANGELOG.md (1)

10-16: Fix markdown list indentation
Nested bullet items under the “Added” section are indented four spaces; markdownlint's MD007 rule (ul-indent) expects two. Please adjust the indentation to two spaces for consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_msisdns.sql (2)

1-10: Consider using COUNT(DISTINCT) for clarity
You could streamline the logic to a single pass over the data:

SELECT COUNT(DISTINCT msisdn)
FROM (
  SELECT msisdn FROM {{ final_table }}
  {% if cdr_type == "calls" or cdr_type == "sms" %}
  UNION ALL
  SELECT msisdn_counterpart FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL
  {% endif %}
) AS all_msisdns;

This reduces nesting and may perform better on large tables.
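As a sanity check on the suggestion, the sketch below compares the proposed `COUNT(DISTINCT)` form with a nested `DISTINCT`/`UNION` form on a tiny in-memory SQLite table. The nested query is only a guess at the shape of the existing check, the Jinja branch is shown already rendered for the calls case, and the table name `calls` replaces the `{{ final_table }}` placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (msisdn TEXT, msisdn_counterpart TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("A", None), ("D", "A")],
)

# Suggested form: one aggregate over the combined column.
suggested = conn.execute("""
    SELECT COUNT(DISTINCT msisdn)
    FROM (
      SELECT msisdn FROM calls
      UNION ALL
      SELECT msisdn_counterpart FROM calls WHERE msisdn_counterpart IS NOT NULL
    ) AS all_msisdns
""").fetchone()[0]

# Assumed shape of a nested version, with descriptive aliases.
nested = conn.execute("""
    SELECT count(*)
    FROM (
      SELECT DISTINCT msisdn FROM (
        SELECT msisdn FROM calls
        UNION
        SELECT msisdn_counterpart AS msisdn FROM calls
        WHERE msisdn_counterpart IS NOT NULL
      ) AS raw_msisdns
    ) AS unique_msisdns
""").fetchone()[0]
```

Both forms count A, B, C and D once each. Whether the flatter query is actually faster on large tables would need to be measured in PostgreSQL.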


1-10: Use descriptive subquery aliases
Aliasing both inner and outer queries as _ can be confusing. Consider clearer names like raw_msisdns and unique_msisdns.

flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/new_cell_ids.sql (1)

1-5: Optional: Switch to NOT EXISTS for new IDs
For potentially better performance and semantic clarity, try:

SELECT COUNT(*) 
FROM {{ extract_table }} e
WHERE NOT EXISTS (
  SELECT 1 FROM infrastructure.cells c 
  WHERE c.id = e.id AND c.version IS NOT NULL
);
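
One concrete reason to prefer `NOT EXISTS` is its behaviour around NULLs. The sketch below contrasts it with a `NOT IN` formulation on in-memory SQLite tables; the `NOT IN` query is shown only for contrast and is not claimed to be what the current check does, and `staged_cells` and `cells` are stand-ins for `{{ extract_table }}` and `infrastructure.cells`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staged_cells (id TEXT)")
conn.execute("CREATE TABLE cells (id TEXT, version INTEGER)")
conn.executemany("INSERT INTO staged_cells VALUES (?)", [("c1",), ("c2",), ("c3",)])
# One known cell, plus a row whose id is NULL.
conn.executemany("INSERT INTO cells VALUES (?, ?)", [("c1", 1), (None, 1)])

# NOT EXISTS: a NULL id in cells simply never matches.
new_ids_not_exists = conn.execute("""
    SELECT COUNT(*)
    FROM staged_cells e
    WHERE NOT EXISTS (
      SELECT 1 FROM cells c
      WHERE c.id = e.id AND c.version IS NOT NULL
    )
""").fetchone()[0]

# NOT IN: a NULL in the subquery makes every non-matching comparison
# evaluate to NULL, so no staged row passes the filter.
new_ids_not_in = conn.execute("""
    SELECT COUNT(*)
    FROM staged_cells
    WHERE id NOT IN (SELECT id FROM cells WHERE version IS NOT NULL)
""").fetchone()[0]
```

With a NULL id present, `NOT EXISTS` still reports c2 and c3 as new, while `NOT IN` reports none.
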
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_msisdns.sql (2)

1-10: Simplify MSISDN count with COUNT(DISTINCT)
Rather than nested DISTINCTs, consider:

SELECT COUNT(DISTINCT msisdn)
FROM (
  SELECT msisdn FROM {{ final_table }}
  {% if cdr_type in ["calls", "sms"] %}
  UNION ALL
  SELECT msisdn_counterpart FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL
  {% endif %}
) AS all_msisdns;

This is more concise and may yield performance gains.


5-8: Refine Jinja conditional syntax
For readability, replace cdr_type == "calls" or cdr_type == "sms" with cdr_type in ["calls", "sms"].

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql (1)

1-10: Add licence header, documentation and improve aliases.
This new SQL file lacks the project’s standard licence header and a descriptive comment at the top. Also, using _ for both inner and outer subquery aliases reduces readability. Consider renaming aliases (e.g. inner_msisdn, distinct_msisdn) and adding a brief explanation of the check.

Would you like me to draft the header and summary?

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_duplicates.sql (2)

11-32: Remove redundant CDR‐type branches.
Since this file is in the sms directory, cdr_type will always be 'sms'. The calls, mds and topups branches are unreachable and add noise. Simplify the GROUP BY to only the SMS‐relevant columns:

GROUP BY
    msisdn, datetime, imsi, imei, tac, location_id,
    msisdn_counterpart, outgoing, network,
    operator_code, country_code

1-4: Add licence header and file description.
Please include the project’s licence header and a top‐of‐file comment summarising the purpose of this duplicate‐count check.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicated.sql (1)

1-3: Add licence header and comment.
This file also needs the standard licence header and a brief description at the top explaining that it counts the number of top-up groups with duplicates.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicates.sql (1)

1-35: Simplify for topups only and add header.
The cdr_type conditionals for other types are redundant in this topups-specific file. Restrict the GROUP BY to the topups fields only. Additionally, include the project licence header and a brief summary at the top.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fd7515 and 4e5ca24.

📒 Files selected for processing (65)
  • CHANGELOG.md (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (0 hunks)
💤 Files with no reviewable changes (1)
  • flowetl/flowetl/flowetl/util.py
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


12-12: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (27)
flowetl/flowetl/flowetl/qa_checks/qa_checks/count_staging_rows.sql (1)

1-1: Simple row count QA check is correctly implemented
This basic SELECT COUNT(*) check is fine and consistent with existing QA patterns.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/latest_timestamp.sql (1)

1-1: Approve wrapper for shared timestamp QA logic
This lightweight include of ../cdr/latest_timestamp.sql aligns with the modular QA checks pattern.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/latest_timestamp.sql (1)

1-1: Approve wrapper for shared timestamp QA logic
Consistent reference to the core timestamp check under cdr makes maintenance easier.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql (1)

1-1: Verify include syntax and subquery pattern.
The line ../cdr/count_location_ids.sql presumes the QA runner will inline this file. Confirm that this matches the established include mechanism (e.g. Jinja {% include %}) and that the inner subquery is correctly consumed by the generic count logic.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imeis.sql (1)

1-11: Ensure consistent wrapper structure.
Wrapper files normally define only the subquery (e.g. SELECT imei … GROUP BY imei), relying on the generic CDR macro to supply SELECT count(*) FROM. Here, the count(*) and FROM clauses are in the wrapper itself, which may lead to double counting. Please confirm the intended include behaviour and remove redundant clauses if necessary.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/latest_timestamp.sql (1)

1-1: Confirm include mechanism.
This file contains only ../cdr/latest_timestamp.sql. Ensure that this matches the loader’s include syntax and that the generic timestamp check is applied correctly.
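Git stores a symlink as a blob whose content is the link target, so a diff view can show a one-line "file" such as ../cdr/latest_timestamp.sql even though the loader reads the shared SQL through the link. Since the PR description says the type-specific checks are symlinks, that may be what is happening here. The sketch below shows how to tell the two cases apart on disk; the paths and query text are placeholders and `classify_check` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple


def classify_check(path: Path) -> Tuple[str, str]:
    """Return ("symlink", target) for a link, else ("file", first line)."""
    if path.is_symlink():
        return "symlink", os.readlink(path)
    lines = path.read_text().splitlines()
    return "file", lines[0] if lines else ""


# Hypothetical layout in a temp dir: one shared script, one link, one real file.
root = Path(tempfile.mkdtemp())
(root / "cdr").mkdir()
(root / "calls").mkdir()
shared = root / "cdr" / "latest_timestamp.sql"
shared.write_text("SELECT max(datetime) FROM {{ final_table }}\n")

wrapper = root / "calls" / "latest_timestamp.sql"
os.symlink("../cdr/latest_timestamp.sql", wrapper)

regular = root / "calls" / "count_duplicated.sql"
regular.write_text("SELECT count(*) FROM {{ final_table }}\n")
```

If a wrapper is classified as a symlink, no include directive is needed: opening the wrapper path returns the shared script's contents.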

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1)

1-7: Clarify wrapper responsibilities.
This wrapper explicitly includes count(*) FROM {{ final_table }} where other wrappers only provide the subquery. Confirm whether the generic null-check macro already wraps with count(*), and remove the redundant clauses if so.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imsis.sql (1)

1-1: Include path reference is correct
The relative include to ../cdr/count_imsis.sql aligns with the existing QA-check structure and will resolve properly during execution.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imeis.sql (1)

1-1: Include directive follows established pattern
Referencing ../cdr/count_null_imeis.sql here is consistent with other CDR-type QA checks and maintains modularity.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imsi.sql (1)

1-1: Consistent inclusion of generic SQL logic
The include of ../cdr/max_msisdns_per_imsi.sql matches the pattern used across CDR types, ensuring reuse and consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imsis.sql (1)

1-1: Modular reference is correct
Using ../cdr/count_imsis.sql here mirrors the approach in other subdirectories and promotes shared logic.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imsi.sql (1)

1-1: File inclusion aligns with QA framework conventions
The path ../cdr/max_msisdns_per_imsi.sql is accurate and consistent with the overall QA-check design.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imeis.sql (1)

2-7: Remove duplicate SQL logic. This wrapper file should only include the single-line reference to the shared CDR QA query. The inline query block (lines 2–7) duplicates the logic in cdr/count_null_imeis.sql and should be removed.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imsi.sql (1)

2-19: Remove duplicate SQL logic. According to the PR’s design, this wrapper should only contain the reference to the shared CDR query. The inline subquery block (lines 2–19) duplicates the logic in cdr/max_msisdns_per_imsi.sql and must be removed.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_location_ids.sql (1)

1-5: Inconsistent AI summary vs code: query logic present
The AI summary states this file only references ../cdr/count_location_ids.sql, but it includes an additional subquery with SELECT DISTINCT location_id. Please reconcile the summary with the actual content or adjust the file to remove unintended logic.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1)

1-1: Inconsistent AI summary vs code: unexpected SQL block
The AI summary claims this file only references ../cdr/max_msisdns_per_imei.sql, yet there’s an inline aggregation query below. Please update the summary or remove unintended code.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_location_ids.sql (1)

1-1: Inconsistent AI summary vs code: file contains more than an include
The summary indicates this file only references ../cdr/count_locatable_location_ids.sql, but it defines the query logic as well. Please align the summary with the file content or remove unintended logic.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imsis.sql (1)

1-1: Inconsistent AI summary vs code: extra query content
The AI summary suggests only a reference to ../cdr/count_imsis.sql, but this file also contains inline SQL. Please update the summary or trim the file to the intended include.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_location_ids.sql (1)

1-14: Consistent reuse of shared CDR logic
This thin wrapper correctly delegates the locatable location ID count to the shared ../cdr/count_locatable_location_ids.sql script, promoting modularity and avoiding duplication.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imsis.sql (1)

1-11: Thin wrapper delegating IMSI count to shared script
The wrapper correctly references ../cdr/count_imsis.sql and scopes it to the SMS final table, maintaining consistency with the project’s modular QA design.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_location_ids.sql (1)

1-7: Wrapper correctly targets null location IDs
This new file cleanly delegates null location ID counting to the shared ../cdr/count_null_location_ids.sql script, fitting the QA framework’s DRY approach.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imsis.sql (1)

1-7: Null IMSI count wrapper is consistent
Good reuse of ../cdr/count_null_imsis.sql, applying it to the calls CDR table and aligning with the existing QA check pattern.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imeis.sql (1)

1-11: IMEI count wrapper follows established pattern
This wrapper cleanly delegates the IMEI count to the central ../cdr/count_imeis.sql script, preserving modularity and consistency across QA checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1)

1-9: Explicitly include or symlink CDR-level SQL
The file begins with a bare relative path but no explicit include directive (e.g. \i, -- include) or actual filesystem symlink. Please verify that this is intended or adjust to use a proper include mechanism so the shared CDR logic is correctly pulled in at runtime.
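
The PR description states that the type-specific checks are symlinks to the shared cdr files, which would explain why each one shows up as a bare relative path. A minimal Python sketch of that layout follows; the temporary directory and the query body are illustrative assumptions, not the repository's actual contents:

```python
import os
import tempfile
from pathlib import Path


def build_symlinked_check(root: Path) -> Path:
    """Create cdr/latest_timestamp.sql and a calls/ symlink pointing at it."""
    shared = root / "cdr" / "latest_timestamp.sql"
    shared.parent.mkdir(parents=True)
    # Hypothetical query body, for illustration only.
    shared.write_text("SELECT max(datetime) FROM {{ final_table }}")

    link = root / "calls" / "latest_timestamp.sql"
    link.parent.mkdir(parents=True)
    # Relative target, resolved against the directory that holds the link.
    os.symlink("../cdr/latest_timestamp.sql", link)
    return link


with tempfile.TemporaryDirectory() as tmp:
    link = build_symlinked_check(Path(tmp))
    is_link = link.is_symlink()
    link_target = os.readlink(link)  # what a diff viewer displays for a symlink
    link_content = link.read_text()  # what a loader following the link reads
```

If the files are symlinks, a loader that opens them through the filesystem reads the shared query directly, and no separate include directive is needed.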

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_duplicated.sql (1)

1-17: Verify duplication semantics
The query currently returns the number of groups with more than one occurrence. If the QA requirement is to count total duplicated rows (i.e. all extra occurrences beyond the first), please confirm whether this aligns with expectations.
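
The two readings of "duplicates" can give different numbers on the same data. A small sketch, using an in-memory SQLite table as a stand-in for the final table (the `events` table and its two columns are assumptions made for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (msisdn TEXT, datetime TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("a", "t1"), ("a", "t1"), ("a", "t1"),  # one group, two surplus rows
        ("b", "t2"), ("b", "t2"),               # one group, one surplus row
        ("c", "t3"),                            # unique record
    ],
)

# Reading 1: number of distinct records that occur more than once.
duplicated_groups = conn.execute(
    """
    SELECT count(*) FROM (
        SELECT count(*) AS n FROM events
        GROUP BY msisdn, datetime
        HAVING count(*) > 1
    ) AS grouped
    """
).fetchone()[0]

# Reading 2: number of surplus rows beyond the first occurrence of each record.
extra_rows = conn.execute(
    """
    SELECT COALESCE(sum(n - 1), 0) FROM (
        SELECT count(*) AS n FROM events
        GROUP BY msisdn, datetime
        HAVING count(*) > 1
    ) AS grouped
    """
).fetchone()[0]

conn.close()
```

Here the first reading yields 2 and the second yields 3, so the choice affects what the QA check reports.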

CHANGELOG.md (1)

19-19: Verify CDR task id naming consistency
Ensure all generic CDR checks have been updated to include the <cdr_type>. prefix (e.g. calls.count_imeis) as described here.
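
As a rough illustration of the naming scheme, the sketch below derives an identifier from a check's path relative to the QA checks directory. The `check_id` helper is hypothetical and is not FlowETL's discovery code; it covers only the `<cdr_type>.` prefix and ignores any stage suffix:

```python
from pathlib import PurePosixPath


def check_id(relative_path: str) -> str:
    """Join the parent directories and the file stem with dots (hypothetical)."""
    path = PurePosixPath(relative_path)
    # A top-level file has parent ".", whose parts tuple is empty.
    parts = list(path.parent.parts) + [path.stem]
    return ".".join(parts)


calls_id = check_id("calls/count_imeis.sql")
top_level_id = check_id("count_staging_rows.sql")
```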

flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/new_cell_ids.sql (1)

3-5: Verify version IS NULL semantics
Confirm that a null version truly indicates a new cell ID and that records with any non-null version should be excluded.

@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat ISNULL

⚠️ Potential issue

Use the standard NULL check syntax
The condition cell_lat ISNULL relies on PostgreSQL's non-standard ISNULL operator; prefer the SQL-standard IS NULL (and consider adding a semicolon at the end).

-SELECT count(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat ISNULL
+SELECT COUNT(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat IS NULL;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat.sql at line
1, the SQL condition uses the non-standard syntax `cell_lat ISNULL` to check for
NULL values. Replace `ISNULL` with the standard `IS NULL` syntax and add a
semicolon at the end of the SQL statement to terminate it.

Comment on lines 1 to 6
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 as n_dupes
FROM {{ staging_table }}
GROUP BY
cell_id
HAVING count(*) - 1 > 1) tableWithCount

⚠️ Potential issue

Correct HAVING clause to capture all duplicates
The current HAVING count(*) - 1 > 1 filter only flags cell IDs appearing more than twice. To detect any duplicates (i.e., count > 1), adjust to either HAVING count(*) > 1 or HAVING count(*) - 1 > 0. For clarity:

-HAVING count(*) - 1 > 1
+HAVING count(*) > 1
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/count_duped_ids.sql
lines 1 to 6, the HAVING clause currently filters for cell IDs with more than
two occurrences by using "HAVING count(*) - 1 > 1". To correctly capture all
duplicates, change this condition to "HAVING count(*) > 1" or "HAVING count(*) -
1 > 0" so that any cell ID appearing more than once is included.
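
The effect of the two HAVING filters can be checked on a small table. The sketch below uses an in-memory SQLite table named `cells` as a stand-in for `{{ staging_table }}`; the table name and sample rows are assumptions:

```python
import sqlite3

QUERY = """
SELECT COALESCE(sum(n_dupes), 0) FROM (
    SELECT count(*) - 1 AS n_dupes
    FROM cells
    GROUP BY cell_id
    HAVING {having}
) AS table_with_count
"""


def count_dupes(cell_ids, having):
    """Run the duplicate count with the given HAVING condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cells (cell_id TEXT)")
    conn.executemany("INSERT INTO cells VALUES (?)", [(c,) for c in cell_ids])
    result = conn.execute(QUERY.format(having=having)).fetchone()[0]
    conn.close()
    return result


# "A" appears twice, "B" three times, "C" once.
cell_ids = ["A", "A", "B", "B", "B", "C"]
original = count_dupes(cell_ids, "count(*) - 1 > 1")  # skips the pair of "A" rows
corrected = count_dupes(cell_ids, "count(*) > 1")     # counts every surplus row
```

With this data the original filter reports 2 surplus rows and the corrected filter reports 3, because the pair of "A" rows is only picked up by the second.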

@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL

⚠️ Potential issue

SQL syntax fix: use IS NULL.
Replace the non-standard ISNULL check with standard SQL IS NULL for ANSI compliance.

Apply this diff:

-SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL
+SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long IS NULL
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql at line
1, replace the non-standard SQL syntax `ISNULL` with the standard `IS NULL` to
ensure ANSI SQL compliance. Update the WHERE clause condition from `cell_long
ISNULL` to `cell_long IS NULL`.

@@ -0,0 +1 @@
../cdr/earliest_timestamp.sql

🛠️ Refactor suggestion

Add licence header and consider symlink for wrapper file.
This stub correctly delegates to ../cdr/earliest_timestamp.sql but is missing the project’s standard licence header. You could also replace it with a symbolic link to avoid managing duplicated stubs.

🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/earliest_timestamp.sql at
line 1, add the project's standard licence header at the top of the file to
comply with licensing requirements. Additionally, instead of keeping this stub
SQL file with duplicated content, replace it with a symbolic link pointing to
../cdr/earliest_timestamp.sql to avoid duplication and simplify maintenance.

@@ -0,0 +1 @@
../cdr/earliest_timestamp.sql

🛠️ Refactor suggestion

Add licence header and consider symlink for wrapper file.
As above, include the standard FlowKit licence header and optionally use a symbolic link to ../cdr/earliest_timestamp.sql rather than a separate stub.

🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/earliest_timestamp.sql at
line 1, add the standard FlowKit licence header at the top of the file.
Additionally, consider replacing this file with a symbolic link to
../cdr/earliest_timestamp.sql to avoid duplication and maintain consistency.

@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL OR cell_lat = -1 OR cell_lat ISNULL

⚠️ Potential issue

Use standard NULL check syntax
PostgreSQL accepts ISNULL only as a non-standard alias; the SQL-standard form is IS NULL. Change both cell_long ISNULL and cell_lat ISNULL accordingly and consider adding a trailing semicolon for consistency.

Suggested diff:

-SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL OR cell_lat = -1 OR cell_lat ISNULL
+SELECT count(*) 
+FROM {{ staging_table }} 
+WHERE cell_long = -1 
+  OR cell_long IS NULL 
+  OR cell_lat = -1 
+  OR cell_lat IS NULL;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat_or_lon.sql
at line 1, replace the non-standard ISNULL checks with the SQL-standard
IS NULL for both cell_long and cell_lat. Also, add a trailing
semicolon at the end of the SQL statement for consistency.
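
To show what the combined check counts, here is a short sketch that runs the suggested query against sample rows. SQLite stands in for PostgreSQL, and the `staging` table and its rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (cell_id TEXT, cell_lat REAL, cell_long REAL)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [
        ("c1", 27.7, 85.3),  # valid coordinates
        ("c2", -1, 85.3),    # sentinel latitude
        ("c3", 27.7, None),  # missing longitude
        ("c4", None, -1),    # both bad, still one row
    ],
)

# Same predicate as the suggested change, with the template variable
# replaced by the stand-in table name.
garbage_rows = conn.execute(
    """
    SELECT count(*)
    FROM staging
    WHERE cell_long = -1
       OR cell_long IS NULL
       OR cell_lat = -1
       OR cell_lat IS NULL
    """
).fetchone()[0]

conn.close()
```

A row with both coordinates bad is counted once, so the result is a count of affected rows rather than of bad values.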

{{ final_table }}
{% if cdr_type == "calls" or cdr_type == "sms" %}
UNION ALL
(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)

⚠️ Potential issue

Correct SQL null‐check syntax.
The clause WHERE msisdn_counterpart NOTNULL is not valid in standard SQL; it should read IS NOT NULL.

Apply this diff:

- (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
+ (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql at line 7,
replace the invalid SQL null-check syntax `WHERE msisdn_counterpart NOTNULL`
with the correct standard SQL syntax `WHERE msisdn_counterpart IS NOT NULL`.

Comment on lines 1 to 35
SELECT count(*) FROM
(SELECT count(*) as n_dupes
FROM {{ final_table }}
GROUP BY
msisdn,
datetime,
imsi,
imei,
tac,
location_id,
{% if cdr_type == 'calls' %}
msisdn_counterpart,
outgoing,
duration,
network,
{% elif cdr_type == 'sms' %}
msisdn_counterpart,
outgoing,
network,
{% elif cdr_type == 'mds' %}
duration,
volume_total,
volume_upload,
volume_download,
{% elif cdr_type == 'topups' %}
type,
recharge_amount,
airtime_fee,
tax_and_fee,
pre_event_balance,
post_event_balance,
{% endif %}
operator_code,
country_code
HAVING count(*) > 1) tableWithCount

⚠️ Potential issue

Unify duplicate‐count filename.
The filename count_duplicated.sql deviates from the count_duplicates.sql naming used in other CDR‐type directories. This discrepancy may break automated discovery of QA checks. Please rename to count_duplicates.sql (and update any references).

🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/topups/count_duplicated.sql lines 1 to 35,
the filename count_duplicated.sql does not match the naming convention
count_duplicates.sql used in other CDR-type directories. Rename the file to
count_duplicates.sql and update any references to this file in the codebase or
configuration to ensure automated QA check discovery works correctly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🔭 Outside diff range comments (20)
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_location_ids.sql (1)

1-5: Invalid SQL inclusion and incomplete query structure
Simply placing a relative file path (../cdr/count_location_ids.sql) is not valid SQL. You need to use the templating engine’s include or import directive, and wrap the SELECT DISTINCT block inside a proper SELECT count(*) if you intend to count unique location_ids. Please clarify the intended templating mechanism or replace it with an explicit query.

Could you confirm how shared CDR-level SQL files are included here? For example, with Jinja you might use:

{% include "../cdr/count_location_ids.sql" %}

Or rewrite explicitly:

SELECT COUNT(*) AS distinct_location_ids
  FROM (
    SELECT DISTINCT location_id
      FROM {{ final_table }}
  ) AS locations;
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imsi.sql (1)

1-19: Avoid duplicating SQL logic
This file should solely reference the shared ../cdr/max_msisdns_per_imsi.sql definition. The embedded SQL block duplicates the CDR-level query and may drift out of sync. Remove the inline SQL and rely on the include only.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_location_ids.sql (1)

1-14: Remove inline SQL duplication
This type-specific file should solely reference ../cdr/count_locatable_location_ids.sql. The inline SELECT block duplicates shared logic and risks inconsistency. Remove the embedded SQL and rely on the single include.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_location_ids.sql (1)

1-6: Remove inline SQL duplication
The inline SELECT block duplicates the shared definition in ../cdr/count_null_location_ids.sql. Keep only the include line to ensure a single source of truth.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imsi.sql (1)

1-19: Remove duplicate SQL logic: use only the shared CDR reference

This type‐level QA file should only include the ../cdr/max_msisdns_per_imsi.sql reference. The inline SQL (lines 2–19) duplicates the shared query and breaks the include pattern. Please strip out lines 2–19 so that the file only contains:

../cdr/max_msisdns_per_imsi.sql

Apply this diff:

- ../cdr/max_msisdns_per_imsi.sql
-     max(msisdn_count)
- FROM (
-     SELECT
-         imsi,
-         count(*) AS msisdn_count
-     FROM (
-         SELECT
-             imsi,
-             msisdn
-         FROM
-             {{ final_table }}
-         WHERE imsi IS NOT NULL
-         GROUP BY
-             imsi,
-             msisdn) _
-     GROUP BY
-         imsi) _
+ ../cdr/max_msisdns_per_imsi.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imsis.sql (1)

1-7: Remove inline SQL after the CDR include

Following the established pattern, this file should only reference the shared CDR query. Remove lines 2–7 so that only the include remains:

../cdr/count_null_imsis.sql
- ../cdr/count_null_imsis.sql
-     count(*)
- FROM
-     {{ final_table }}
- WHERE
-     imsi IS NULL
+ ../cdr/count_null_imsis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imsis.sql (1)

1-7: Remove inline SQL after the CDR include

To match other QA files, strip out lines 2–7 so that the file only contains:

../cdr/count_null_imsis.sql
- ../cdr/count_null_imsis.sql
-     count(*)
- FROM
-     {{ final_table }}
- WHERE
-     imsi IS NULL
+ ../cdr/count_null_imsis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imsis.sql (1)

1-11: Remove inline SQL after the CDR include

This SMS-type QA file should only include the shared reference. Remove lines 2–11 so that only the include remains:

../cdr/count_imsis.sql
- ../cdr/count_imsis.sql
-     count(*)
- FROM (
-     SELECT
-         imsi
-     FROM
-         {{ final_table }}
-     WHERE imsi IS NOT NULL
-     GROUP BY
-         imsi) _
+ ../cdr/count_imsis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imeis.sql (1)

1-11: Remove duplicated SQL logic from wrapper file
This wrapper file should only reference the shared cdr/count_imeis.sql. Embedding the full query duplicates logic and risks maintenance drift. Please remove lines 2–11 and retain only the reference line.

@@ -1,11 +1,1
-../cdr/count_imeis.sql
-    count(*)
-FROM (
-    SELECT
-        imei
-    FROM
-        {{ final_table }}
-    WHERE imei IS NOT NULL
-    GROUP BY
-        imei) _
+../cdr/count_imeis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imsis.sql (1)

1-11: Remove duplicated SQL logic from wrapper file
This wrapper file should only reference the shared cdr/count_imsis.sql. Embedding the full query duplicates logic and risks maintenance drift. Please remove lines 2–11 and retain only the reference line.

@@ -1,11 +1,1
-../cdr/count_imsis.sql
-    count(*)
-FROM (
-    SELECT
-        imsi
-    FROM
-        {{ final_table }}
-    WHERE imsi IS NOT NULL
-    GROUP BY
-        imsi) _
+../cdr/count_imsis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imsi.sql (1)

1-19: Remove duplicated SQL logic from wrapper file
This wrapper file should only reference the shared cdr/max_msisdns_per_imsi.sql. Embedding the full query duplicates logic and risks maintenance drift. Please remove lines 2–19 and retain only the reference line.

@@ -1,19 +1,1
-../cdr/max_msisdns_per_imsi.sql
-    max(msisdn_count)
-FROM (
-    SELECT
-        imsi,
-        count(*) AS msisdn_count
-    FROM (
-        SELECT
-            imsi,
-            msisdn
-        FROM
-            {{ final_table }}
-        WHERE imsi IS NOT NULL
-        GROUP BY
-            imsi,
-            msisdn) _
-    GROUP BY
-        imsi) _
+../cdr/max_msisdns_per_imsi.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imsi.sql (1)

1-19: Remove duplicated SQL logic from wrapper file
This wrapper file should only reference the shared cdr/max_msisdns_per_imsi.sql. Embedding the full query duplicates logic and risks maintenance drift. Please remove lines 2–19 and retain only the reference line.

@@ -1,19 +1,1
-../cdr/max_msisdns_per_imsi.sql
-    max(msisdn_count)
-FROM (
-    SELECT
-        imsi,
-        count(*) AS msisdn_count
-    FROM (
-        SELECT
-            imsi,
-            msisdn
-        FROM
-            {{ final_table }}
-        WHERE imsi IS NOT NULL
-        GROUP BY
-            imsi,
-            msisdn) _
-    GROUP BY
-        imsi) _
+../cdr/max_msisdns_per_imsi.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imsis.sql (1)

1-11: Remove duplicated SQL logic from wrapper file
This wrapper file should only reference the shared cdr/count_imsis.sql. Embedding the full query duplicates logic and risks maintenance drift. Please remove lines 2–11 and retain only the reference line.

@@ -1,11 +1,1
-../cdr/count_imsis.sql
-    count(*)
-FROM (
-    SELECT
-        imsi
-    FROM
-        {{ final_table }}
-    WHERE imsi IS NOT NULL
-    GROUP BY
-        imsi) _
+../cdr/count_imsis.sql
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imsis.sql (1)

1-11: Wrapper file should only reference shared CDR SQL.
The count_imsis.sql under topups currently inlines the full query (lines 2–11) instead of delegating entirely to ../cdr/count_imsis.sql. This duplicates code and breaks the established pattern. Please remove the embedded SQL and leave only the relative-path reference.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imei.sql (1)

1-19: Remove inlined SQL logic from wrapper.
This MDS wrapper embeds the entire max_msisdns_per_imei query rather than deferring to ../cdr/max_msisdns_per_imei.sql. To maintain consistency and avoid duplication, strip out lines 2–19 so the file only contains the path reference.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_location_ids.sql (1)

1-7: Remove inlined SQL logic from wrapper.
The SMS wrapper currently includes the full count_null_location_ids query (lines 2–7) instead of solely pointing to ../cdr/count_null_location_ids.sql. Please remove the extra SQL so only the relative-path reference remains.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_events.sql (1)

7-9: Non-standard SQL syntax for null check.

NOTNULL is a PostgreSQL-specific alias; the SQL-standard form is IS NOT NULL.

Apply this patch:

-    AND cl.geom_point NOTNULL
+    AND cl.geom_point IS NOT NULL
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1)

7-9: Use standard null check syntax.

Replace cl.geom_point NOTNULL with cl.geom_point IS NOT NULL to conform with standard SQL.

Apply this patch:

-    AND cl.geom_point NOTNULL
+    AND cl.geom_point IS NOT NULL
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_events.sql (1)

9-9: Non-standard null check syntax
PostgreSQL accepts NOTNULL, but the SQL-standard form in WHERE clauses is IS NOT NULL.
Apply this diff:

-    AND cl.geom_point NOTNULL
+    AND cl.geom_point IS NOT NULL
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_location_ids.sql (1)

12-12: Non-standard null check syntax
Replace the non-standard NOTNULL with IS NOT NULL in the WHERE clause.

-        AND cl.geom_point NOTNULL
+        AND cl.geom_point IS NOT NULL
🧹 Nitpick comments (26)
flowetl/flowetl/flowetl/qa_checks/qa_checks/count_staging_rows.sql (1)

1-1: Add a column alias for clarity
Returning an unnamed COUNT(*) can make downstream parsing ambiguous. Consider aliasing the result to total_rows and formatting the SQL for readability.

- SELECT COUNT(*) FROM {{ staging_table }}
+ SELECT COUNT(*) AS total_rows
+   FROM {{ staging_table }}
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_location_ids.sql (1)

1-5: Add a descriptive header and licence notice.

All SQL QA scripts should begin with a brief description of their purpose and include the project’s licence header for consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/earliest_timestamp.sql (1)

1-1: Include licence header and description.

Please add the project licence header and a concise comment summarising the QA check’s intent for uniformity across all SQL scripts.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1)

1-19: Add header and improve alias naming.

Include the project’s licence header and a brief description. Also, replace the generic alias _ with a more descriptive identifier for better readability.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1)

1-7: Add documentation and test coverage
These newly added QA checks lack accompanying documentation, ADR entries, and test cases for validation. Please update the QA docs and include tests to ensure maintainability and coverage.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/latest_timestamp.sql (1)

1-1: Add standard license header
All new SQL files should include the project’s standard license header for consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_location_ids.sql (1)

1-5: Include standard license header
Please add the repository’s SQL license header to this new file.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imeis.sql (1)

1-10: Add project license header
Ensure the standard license statement appears at the top of this new SQL file.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1)

1-14: Include standard license header
Add the repository’s SQL license header at the top to maintain compliance.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imeis.sql (1)

1-6: Add standard license header
Please prepend the project’s license header to this new SQL file for consistency.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_events.sql (1)

1-1: Missing license header
Please add the standard Flowminder license header at the top of this SQL file to comply with project conventions.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imeis.sql (1)

1-1: Missing license header
Add the standard Flowminder license header at the top of this SQL file to align with project guidelines.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_location_ids.sql (1)

1-1: Missing license header
Please include the standard Flowminder license header at the top of this SQL file.

flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql (1)

1-1: Missing license header
Add the standard Flowminder license header at the top of this SQL file to ensure compliance.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_duplicated.sql (2)

1-1: Missing license header
Please add the standard Flowminder license header at the top of this SQL file.


17-17: Use snake_case for table alias
The alias tableWithCount uses camelCase; prefer snake_case (table_with_count) for consistency.

-    ) tableWithCount
+    ) table_with_count
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicated.sql (1)

1-17: Consider factoring out repeated grouping keys

The GROUP BY clause here duplicates the same set of columns used across several QA checks. Extract these common grouping columns into a shared Jinja include to reduce repetition and simplify future updates.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicated.sql (1)

1-35: Extract common grouping logic into an include file

This query’s GROUP BY block mirrors those in other CDR-type checks with only minor variations. Moving the shared column list into a Jinja include (and then extending it per type) will DRY up the code and make maintenance easier.
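
One way to picture the shared-plus-specific split is to build the column list in code. The comment proposes a Jinja include; the sketch below uses plain Python instead, only to show the structure. The column names come from the query under review, while the constant and function names are hypothetical:

```python
# Columns shared by every CDR type, in the order used by the reviewed query.
COMMON_LEADING = ["msisdn", "datetime", "imsi", "imei", "tac", "location_id"]
COMMON_TRAILING = ["operator_code", "country_code"]

# Columns that only apply to one CDR type.
TYPE_SPECIFIC = {
    "calls": ["msisdn_counterpart", "outgoing", "duration", "network"],
    "sms": ["msisdn_counterpart", "outgoing", "network"],
    "mds": ["duration", "volume_total", "volume_upload", "volume_download"],
    "topups": [
        "type",
        "recharge_amount",
        "airtime_fee",
        "tax_and_fee",
        "pre_event_balance",
        "post_event_balance",
    ],
}


def grouping_columns(cdr_type: str) -> list:
    """Return the full grouping column list for one CDR type."""
    if cdr_type not in TYPE_SPECIFIC:
        raise ValueError(f"Unknown CDR type: {cdr_type}")
    return COMMON_LEADING + TYPE_SPECIFIC[cdr_type] + COMMON_TRAILING


def group_by_clause(cdr_type: str) -> str:
    """Render the GROUP BY clause used by the duplicate check."""
    return "GROUP BY " + ", ".join(grouping_columns(cdr_type))


sms_clause = group_by_clause("sms")
```

Keeping the shared columns in one place means that a schema change touches a single list rather than one template per CDR type.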

CHANGELOG.md (2)

10-15: Fix unordered list indentation

The sub-items under - Added new cell info qa checks to flowetl: are indented by 4 spaces, but markdownlint expects 2-space indentation for nested lists. Please adjust the indent to 2 spaces for proper formatting.


19-20: Remove trailing comma

The entry 'count_imeis' will now be identified as '<cdr_type>.count_imeis', ends with an extraneous comma. Drop the comma to match the style elsewhere in the changelog.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_msisdns.sql (1)

2-8: Correct NULL predicate and standardise SQL style

Use IS NOT NULL instead of the non-standard NOTNULL, switch to single quotes in the Jinja conditional to match other checks, and uppercase the DISTINCT keyword. For example:

- SELECT count(*) FROM (SELECT distinct msisdn FROM
+ SELECT count(*) FROM (
+   SELECT DISTINCT msisdn FROM
         {{ final_table }}
-        {% if cdr_type == "calls" or cdr_type == "sms" %}
-        UNION ALL
-        (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
+   {% if cdr_type == 'calls' or cdr_type == 'sms' %}
+   UNION ALL
+   SELECT msisdn_counterpart AS msisdn
+   FROM {{ final_table }}
+   WHERE msisdn_counterpart IS NOT NULL
   {% endif %}
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_msisdns.sql (1)

1-10: Use descriptive aliases instead of _
Aliasing both subqueries as _ can be confusing and may lead to conflicts. Consider more meaningful names, for example:

-SELECT count(*) FROM (SELECT distinct msisdn FROM
-      ( … ) _
-) _
+SELECT count(*) FROM (
+    SELECT DISTINCT msisdn
+    FROM (
+
+    ) AS msisdn_union
+) AS distinct_msisdns
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql (1)

1-10: Use descriptive aliases instead of _
Both nested subqueries use _ as an alias, which reduces readability. A clearer pattern is:

-SELECT count(*) FROM (SELECT distinct msisdn FROM
-      ( … ) _
-) _
+SELECT count(*) FROM (
+    SELECT DISTINCT msisdn
+    FROM (
+
+    ) AS msisdn_union
+) AS distinct_msisdns
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicates.sql (2)

11-20: Consolidate duplicate grouping logic for calls and sms
Both the cdr_type == 'calls' and cdr_type == 'sms' branches group by the same columns (msisdn_counterpart, outgoing, network). Merge these into a single condition:

-{% if cdr_type == 'calls' %}
-    msisdn_counterpart,
-    outgoing,
-    network,
-{% elif cdr_type == 'sms' %}
-    msisdn_counterpart,
-    outgoing,
-    network,
-{% endif %}
+{% if cdr_type in ['calls', 'sms'] %}
+    msisdn_counterpart,
+    outgoing,
+    network,
+{% endif %}

1-35: Standardise Jinja string quoting
This template uses single quotes ('calls'), whereas other QA SQL templates (e.g., count_msisdns.sql) use double quotes ("calls"). For consistency and to reduce confusion, choose and apply one quoting style across all QA checks.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_duplicates.sql (1)

11-20: Consolidate duplicate grouping logic for calls and sms
Identical grouping is used for both cdr_type == 'calls' and cdr_type == 'sms'. Simplify by merging into a single branch:

-{% if cdr_type == 'calls' %}
-    msisdn_counterpart,
-    outgoing,
-    network,
-{% elif cdr_type == 'sms' %}
-    msisdn_counterpart,
-    outgoing,
-    network,
-{% endif %}
+{% if cdr_type in ['calls', 'sms'] %}
+    msisdn_counterpart,
+    outgoing,
+    network,
+{% endif %}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fd7515 and 4e5ca24.

📒 Files selected for processing (65)
  • CHANGELOG.md (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (0 hunks)
💤 Files with no reviewable changes (1)
  • flowetl/flowetl/flowetl/util.py
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


12-12: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (30)
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql (1)

1-1: ```shell
#!/bin/bash
set -e

# 1. List all occurrences of count_location_ids.sql in the repo
echo "=== All count_location_ids.sql files ==="
find . -type f -name count_location_ids.sql

# 2. Show the first 10 lines of the calls version to inspect header and include
CALLS_FILE="flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_location_ids.sql"
echo "=== First 10 lines of ${CALLS_FILE} ==="
sed -n '1,10p' "${CALLS_FILE}"

# 3. Check for a relative include of the CDR version
echo "=== Include directives in calls version ==="
grep -n "../cdr/count_location_ids.sql" "${CALLS_FILE}" || echo "No '../cdr/count_location_ids.sql' include found"

# 4. Check for usage of alias '_' in the calls version
echo "=== Alias '_' usage in calls version ==="
grep -n " as _" "${CALLS_FILE}" || echo "No alias ' as _' found"

# 5. Verify that the CDR version exists at the expected path
CDR_FILE="flowetl/flowetl/flowetl/qa_checks/qa_checks/cdr/count_location_ids.sql"
if [ -f "${CDR_FILE}" ]; then
  echo "✅ CDR file exists at ${CDR_FILE}"
else
  echo "❌ Expected CDR file missing at ${CDR_FILE}"
fi
```


</details>
<details>
<summary>flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/earliest_timestamp.sql (1)</summary>

`1-1`: ```shell
#!/usr/bin/env bash
set -euo pipefail

echo "Finding all earliest_timestamp.sql files:"
fd earliest_timestamp.sql || true

echo
echo "Displaying first 50 lines of the SMS script:"
sms_file=$(fd earliest_timestamp.sql | grep '/sms/' || true)
if [[ -n "$sms_file" ]]; then
  sed -n '1,50p' "$sms_file"
else
  echo "⛔ SMS earliest_timestamp.sql not found"
fi

echo
echo "Displaying first 50 lines of the CDR script:"
cdr_file=$(fd earliest_timestamp.sql | grep '/cdr/' || true)
if [[ -n "$cdr_file" ]]; then
  sed -n '1,50p' "$cdr_file"
else
  echo "⛔ CDR earliest_timestamp.sql not found"
fi

echo
echo "Checking for a license header in the SMS script:"
if [[ -n "$sms_file" ]]; then
  head -n5 "$sms_file" | rg -i 'license|copyright' || echo "No license header detected in first 5 lines"
else
  echo "Skipping license check; SMS file missing"
fi
```
flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/latest_timestamp.sql (1)

1-1: Correct include-only pattern

This file correctly references the shared CDR query without additional SQL logic. LGTM.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/latest_timestamp.sql (1)

1-1: Wrapper correctly references shared CDR SQL.
This file follows the established pattern by containing only ../cdr/latest_timestamp.sql. No further changes needed.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/earliest_timestamp.sql (1)

1-1: Wrapper correctly references shared CDR SQL.
This file correctly delegates to ../cdr/earliest_timestamp.sql without additional logic. No updates required.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_location_ids.sql (1)

1-1: ```shell
#!/usr/bin/env bash

# Verify the symlink points to the expected shared CDR QA check and resolves correctly
target="flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_locatable_location_ids.sql"
expected="../cdr/count_locatable_location_ids.sql"

actual=$(readlink "$target")
echo "Symlink target: $actual"

if [ "$actual" != "$expected" ]; then
  echo "Unexpected symlink target: $actual (expected $expected)"
  exit 1
fi

if [ ! -e "$(dirname "$target")/$actual" ]; then
  echo "Broken target path: $(dirname "$target")/$actual"
  exit 1
fi
```


</details>
<details>
<summary>flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imeis.sql (1)</summary>

`1-1`: ```shell
#!/usr/bin/env bash
# Detailed validation of the count_null_imeis.sql symlink
symlink="flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_imeis.sql"

echo "1. Symlink details:"
ls -l "$symlink" || { echo "Symlink not found"; exit 1; }

echo
echo "2. Linked target (readlink):"
linked=$(readlink "$symlink")
echo "  -> $linked"

echo
echo "3. Resolved reference path and existence check:"
ref_path="$(dirname "$symlink")/$linked"
echo "  Resolved to: $ref_path"
if [ -e "$ref_path" ]; then
  echo "  ✅ Target file exists"
else
  echo "  ❌ Target file is missing"
fi

echo
echo "4. All occurrences of count_null_imeis.sql in the repo:"
rg -l "count_null_imeis\.sql" || echo "No other instances found"
```
flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imsis.sql (1)

1-1: Further verify symlink resolution and locate the shared SQL file

#!/usr/bin/env bash
# Inspect the symlink target and ensure the shared SQL exists
target="flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imsis.sql"
echo "Symlink target:"
readlink -f "$target"

echo -e "\nAll instances of count_null_imsis.sql in the repo:"
rg 'count_null_imsis.sql' -g '*.sql'
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_events.sql (1)

1-1: Let's verify that the symlink points to the expected shared QA check:

#!/usr/bin/env bash
# Confirm the symlink destination for count_locatable_events.sql
target="flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_locatable_events.sql"
expected="../cdr/count_locatable_events.sql"
actual=$(readlink "$target")

if [ "$actual" != "$expected" ]; then
  echo "Symlink '$target' points to '$actual', expected '$expected'"
  exit 1
fi
flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/earliest_timestamp.sql (1)

1-1: Verify symlink for earliest_timestamp.sql
Confirm that ../cdr/earliest_timestamp.sql resolves correctly and is not broken in CI or the packaged distribution.

#!/usr/bin/env bash
# Validate earliest timestamp QA check symlink
target="flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/earliest_timestamp.sql"
if [ ! -L "$target" ] || [ ! -e "$target" ]; then
  echo "Broken or missing symlink: $target"
  exit 1
fi
flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/max_msisdns_per_imei.sql (1)

1-19: Invalid file content; expected symlink
This file contains both the relative-path pointer and the full SQL query, which will lead to SQL parsing errors. According to the PR’s restructuring, this should be a pure symbolic link to ../cdr/max_msisdns_per_imei.sql.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_null_location_ids.sql (1)

1-7: Unexpected mixed pointer and SQL; revert to symlink
The file mixes the symlink indicator (../cdr/count_null_location_ids.sql) with actual query content. It should be a standalone symbolic link or contain only the SQL logic—not both.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/max_msisdns_per_imei.sql (1)

1-19: Invalid file content; expected symlink
As with the SMS version, this file erroneously includes both the path pointer and the SQL body. Please replace it with a pure symbolic link to ../cdr/max_msisdns_per_imei.sql.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/latest_timestamp.sql (1)

1-1: Symlink pattern is correct
This file correctly points to the shared CDR QA check at ../cdr/latest_timestamp.sql, aligning with the new modular structure.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_location_ids.sql (1)

1-1: Inconsistent AI summary regarding file contents.

The AI-generated summary states this file contains only a symlink reference, but it also includes the SELECT DISTINCT location_id logic. Please update the summary or refactor the file to match project conventions.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/earliest_timestamp.sql (1)

1-1: Verify symlink usage.

This file appears to act as a placeholder to the CDR-level earliest_timestamp.sql. Confirm that it’s implemented as a filesystem symlink (mode 120000) so that SQL runners follow it correctly, rather than duplicating the content.
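
One way to check this locally is sketched below in Python. It builds a throwaway directory tree that mimics the `cdr/` and `calls/` layout (the file contents and the helper names `is_valid_symlink` and `build_demo_tree` are hypothetical, not part of FlowETL) and confirms that the per-type file is a real symlink with the expected relative target that resolves to an existing file.

```python
import os
import tempfile


def is_valid_symlink(path: str, expected_target: str) -> bool:
    """Return True if `path` is a symlink whose stored target equals
    `expected_target` and which resolves to an existing file."""
    return (
        os.path.islink(path)
        and os.readlink(path) == expected_target
        and os.path.exists(path)  # follows the link; False if it is broken
    )


def build_demo_tree(root: str) -> str:
    """Create a hypothetical cdr/ + calls/ layout and return the link path."""
    os.makedirs(os.path.join(root, "cdr"))
    os.makedirs(os.path.join(root, "calls"))
    with open(os.path.join(root, "cdr", "earliest_timestamp.sql"), "w") as f:
        f.write("SELECT 1\n")  # placeholder content, not the real check
    link = os.path.join(root, "calls", "earliest_timestamp.sql")
    os.symlink("../cdr/earliest_timestamp.sql", link)
    return link


with tempfile.TemporaryDirectory() as root:
    link = build_demo_tree(root)
    print(is_valid_symlink(link, "../cdr/earliest_timestamp.sql"))
```

Pointing `is_valid_symlink` at the real repository paths instead of the temporary tree would give the same check as the shell scripts in this review, assuming a platform where symlinks are preserved on checkout.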

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_events.sql (1)

1-1: Inconsistent AI summary: additional SQL logic present.

The summary indicates only a reference, but this file includes a JOIN with infrastructure.cells and a filter on geom_point. Update the summary or split the logic appropriately.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/max_msisdns_per_imei.sql (1)

1-1: Inconsistent AI summary: full query included.

The AI summary describes only a symlink, but the file contains the full subquery logic. Please update the summary or adjust the file to match how generic CDR checks are handled.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_locatable_events.sql (1)

1-1: Inconsistent AI summary: logic beyond a reference.

The summary suggests just a relative include, but the file includes JOIN logic and a geom_point filter. Please correct the summary or reorganise this file to align with project structure.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imeis.sql (1)

1-7: Consistent wrapper for null IMEIs
This thin wrapper correctly references the shared CDR-level query (count_null_imeis.sql) and uses the {{ final_table }} placeholder as intended.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_null_imsis.sql (1)

1-7: Consistent wrapper for null IMSIs
This wrapper accurately delegates to the shared CDR-level query (count_null_imsis.sql), and the relative path and templating usage are correct.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_imeis.sql (1)

1-7: Consistent wrapper for calls null IMEIs
Reuses the shared CDR-level query for null IMEIs in the calls namespace. The path and {{ final_table }} placeholder are correctly applied.

flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_imeis.sql (1)

1-11: Distinct IMEIs count is correctly implemented
The subquery for distinct non-null IMEIs is appropriately included via count_imeis.sql, and the outer count(*) is applied correctly against {{ final_table }}.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_null_location_ids.sql (1)

1-7: Consistent wrapper for null location IDs
This wrapper delegates to the shared CDR-level query (count_null_location_ids.sql) as expected, with correct templating.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/latest_timestamp.sql (1)

1-1: Wrapper file referencing shared QA logic
This file correctly delegates to the common cdr/latest_timestamp.sql check.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_location_ids.sql (1)

1-5: Inconsistent with AI summary: wrapper vs inlined logic
The AI summary indicates this file should only reference ../cdr/count_location_ids.sql, but it also contains an inlined subquery. Please confirm the intended pattern—either drop the duplicated SQL block here or remove the in-file logic and rely solely on the shared CDR script.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_imeis.sql (1)

1-10: Inconsistent with AI summary: wrapper vs inlined logic
The summary states this should be a pure reference to ../cdr/count_imeis.sql, yet the file includes the query body. Confirm whether the inline SQL belongs here or should reside exclusively in the shared CDR folder.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/calls/count_locatable_location_ids.sql (1)

1-14: Inconsistent with AI summary: wrapper vs inlined logic
According to the summary, this file should only include ../cdr/count_locatable_location_ids.sql, but it also carries the full query. Please clarify and either remove the inline SQL or shift it entirely to the shared CDR script.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/topups/count_null_imeis.sql (1)

1-6: Inconsistent with AI summary: wrapper vs inlined logic
The file both references ../cdr/count_null_imeis.sql and provides its own query. Confirm whether the inline SQL is intended here or if it should be consolidated in the shared CDR directory.

Likely an incorrect or invalid review comment.

flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_duplicates.sql (1)

1-17: LGTM

The query correctly computes the total number of duplicate rows by summing (count(*) - 1) per group and defaulting to zero. It aligns with existing QA checks.

@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat ISNULL
No newline at end of file

🛠️ Refactor suggestion

Non-standard NULL check syntax
ISNULL is not standard SQL. PostgreSQL accepts it as a non-standard equivalent of IS NULL, so the query runs, but IS NULL is the portable form. Additionally, consider adding an AS alias to the count for consistency with other checks.

- SELECT count(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat ISNULL
+ SELECT count(*) AS invalid_lat_count
+   FROM {{ staging_table }}
+  WHERE cell_lat = -1 OR cell_lat IS NULL
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT count(*) FROM {{ staging_table }} WHERE cell_lat = -1 OR cell_lat ISNULL
SELECT count(*) AS invalid_lat_count
FROM {{ staging_table }}
WHERE cell_lat = -1 OR cell_lat IS NULL
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat.sql at line
1, replace the non-standard `ISNULL` with the standard SQL syntax `IS NULL`
for checking null values. Also, add an `AS` alias to the count result for
consistency, for example `count(*) AS invalid_lat_count`.
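
As a small illustration of the NULL-check point, the sketch below runs both spellings against an in-memory table. It uses SQLite from Python's standard library as a stand-in for PostgreSQL (an assumption made only so the example is self-contained); both engines accept `ISNULL` as a non-standard equivalent of `IS NULL`. The table name `staging` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical staging rows: (cell_id, cell_lat). SQLite stands in for PostgreSQL.
ROWS = [(1, 10.5), (2, -1.0), (3, None), (4, 0.0)]


def count_garbage_lat(null_test: str) -> int:
    """Count rows where cell_lat is -1 or NULL, using the given NULL test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (cell_id INTEGER, cell_lat REAL)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", ROWS)
    query = (
        "SELECT count(*) AS invalid_lat_count FROM staging "
        f"WHERE cell_lat = -1 OR cell_lat {null_test}"
    )
    (n,) = conn.execute(query).fetchone()
    conn.close()
    return n


# Both spellings select the same rows (the -1 row and the NULL row).
print(count_garbage_lat("ISNULL"), count_garbage_lat("IS NULL"))
```

The change is therefore about portability and consistency rather than correctness on PostgreSQL.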

Comment on lines 1 to 6
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 as n_dupes
FROM {{ staging_table }}
GROUP BY
cell_id
HAVING count(*) - 1 > 1) tableWithCount
No newline at end of file

⚠️ Potential issue

HAVING clause logic excludes single duplicates
The condition HAVING count(*) - 1 > 1 only flags IDs with more than two occurrences, omitting valid single-duplicate cases (2 occurrences yields 1 duplicate). Update the filter to > 0 so any duplicate is captured, and alias the outer result.

- SELECT COALESCE(sum(n_dupes), 0) FROM
-           (SELECT count(*) - 1 as n_dupes
-             FROM {{ staging_table }}
-             GROUP BY
-                 cell_id
-             HAVING count(*) - 1 > 1) tableWithCount
+ SELECT COALESCE(SUM(n_dupes), 0) AS duplicate_count
+   FROM (
+     SELECT COUNT(*) - 1 AS n_dupes
+       FROM {{ staging_table }}
+      GROUP BY cell_id
+     HAVING COUNT(*) - 1 > 0
+   ) AS table_with_count;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 as n_dupes
FROM {{ staging_table }}
GROUP BY
cell_id
HAVING count(*) - 1 > 1) tableWithCount
SELECT COALESCE(SUM(n_dupes), 0) AS duplicate_count
FROM (
SELECT COUNT(*) - 1 AS n_dupes
FROM {{ staging_table }}
GROUP BY cell_id
HAVING COUNT(*) - 1 > 0
) AS table_with_count;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/count_duped_ids.sql
lines 1 to 6, the HAVING clause uses `count(*) - 1 > 1`, which excludes IDs with
exactly two occurrences (single duplicates). Change the condition to `count(*) -
1 > 0` to include all duplicates, and add an alias to the outer SELECT result
for clarity.
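
The off-by-one described above can be reproduced with a short sketch. It runs the check with both thresholds against an in-memory SQLite table standing in for the staging table (SQLite and the table name `staging` are assumptions for the sake of a self-contained example; the aggregate logic is the same as in the query under review).

```python
import sqlite3

# The check from the comment, with the HAVING threshold left as a parameter.
TEMPLATE = """
SELECT COALESCE(sum(n_dupes), 0) FROM (
    SELECT count(*) - 1 AS n_dupes
    FROM staging
    GROUP BY cell_id
    HAVING count(*) - 1 > {threshold}
) AS table_with_count
"""


def count_dupes(cell_ids, threshold):
    """Run the duplicate count over the given cell IDs with the given threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (cell_id TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?)", [(c,) for c in cell_ids])
    (n,) = conn.execute(TEMPLATE.format(threshold=threshold)).fetchone()
    conn.close()
    return n


ids = ["a", "b", "b", "c", "c", "c"]  # "b" has 1 extra row, "c" has 2
print(count_dupes(ids, 1))  # original condition: only "c" is counted
print(count_dupes(ids, 0))  # suggested condition: "b" and "c" are counted
```

With the original `> 1` condition the single duplicate of `b` is missed; with `> 0` every extra row is counted, and an empty table still yields 0 through `COALESCE`.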

Comment on lines 1 to 5
SELECT
count(*)
FROM {{ extract_table }} LEFT OUTER JOIN infrastructure.cells
USING (id)
WHERE version ISNULL
No newline at end of file

⚠️ Potential issue

Non-standard NULL check syntax and missing alias
ISNULL is non-standard (PostgreSQL accepts it); prefer IS NULL. Also alias the count and align the join formatting.

- SELECT
-     count(*)
- FROM {{ extract_table }} LEFT OUTER JOIN infrastructure.cells
-     USING (id)
- WHERE version ISNULL
+ SELECT COUNT(*) AS new_cell_id_count
+   FROM {{ extract_table }}
+   LEFT JOIN infrastructure.cells USING (id)
+  WHERE version IS NULL;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT
count(*)
FROM {{ extract_table }} LEFT OUTER JOIN infrastructure.cells
USING (id)
WHERE version ISNULL
SELECT COUNT(*) AS new_cell_id_count
FROM {{ extract_table }}
LEFT JOIN infrastructure.cells USING (id)
WHERE version IS NULL;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/new_cell_ids.sql at
lines 1 to 5, replace the non-standard `ISNULL` with the standard SQL syntax `IS
NULL` for the null check. Add an alias to the count result for clarity, such as
`AS new_cell_id_count`. Also, adjust the JOIN clause formatting to improve
readability by placing the FROM table and the JOIN on separate, aligned lines.
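
For readers unfamiliar with the pattern, this check is an anti-join: rows from the extract that find no match in `infrastructure.cells` come back with a NULL `version` and are counted as new. A minimal sketch follows, using in-memory SQLite as a stand-in for PostgreSQL and the hypothetical table names `extract` and `cells` (SQLite has no `infrastructure` schema without attaching a database).

```python
import sqlite3

QUERY = """
SELECT count(*) AS new_cell_id_count
FROM extract
LEFT JOIN cells USING (id)
WHERE version IS NULL
"""


def count_new_cell_ids(extract_ids, known_cells):
    """Count extract rows whose id has no match in the known cells table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE extract (id TEXT)")
    conn.execute("CREATE TABLE cells (id TEXT, version INTEGER)")
    conn.executemany("INSERT INTO extract VALUES (?)", [(i,) for i in extract_ids])
    conn.executemany("INSERT INTO cells VALUES (?, ?)", known_cells)
    (n,) = conn.execute(QUERY).fetchone()
    conn.close()
    return n


# "c3" is absent from cells, so its joined version is NULL.
print(count_new_cell_ids(["c1", "c2", "c3"], [("c1", 0), ("c2", 1)]))
```

Note that the count is per extract row, so an unknown ID that appears twice in the extract contributes two to the total in this sketch.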

@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL OR cell_lat = -1 OR cell_lat ISNULL
No newline at end of file

⚠️ Potential issue

Fix SQL syntax for NULL checks
The use of ISNULL is invalid in standard SQL. Replace with IS NULL and terminate the statement with a semicolon. For example:

SELECT count(*) FROM {{ staging_table }}
WHERE cell_long = -1
  OR cell_long IS NULL
  OR cell_lat = -1
  OR cell_lat IS NULL;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lat_or_lon.sql
at line 1, the SQL uses the non-standard syntax ISNULL for null checks. Replace all
occurrences of ISNULL with the standard SQL syntax IS NULL and add a
semicolon at the end of the statement to properly terminate it.

@@ -0,0 +1 @@
SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL
No newline at end of file

⚠️ Potential issue

Invalid null check syntax
Use IS NULL instead of ISNULL in the WHERE clause.

-SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long ISNULL
+SELECT count(*) FROM {{ staging_table }} WHERE cell_long = -1 OR cell_long IS NULL
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/cell_info/garbage_lon.sql at line
1, the SQL query uses the non-standard syntax `ISNULL` to check for null values.
Replace `cell_long ISNULL` with the standard syntax `cell_long IS NULL` to
check for nulls in the WHERE clause.

Comment on lines 5 to 7
{% if cdr_type == "calls" or cdr_type == "sms" %}
UNION ALL
(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)

⚠️ Potential issue

Use standard null-check syntax
The clause WHERE msisdn_counterpart NOTNULL is not standard SQL. PostgreSQL accepts NOTNULL as a non-standard equivalent, but IS NOT NULL is the portable form and matches the other checks.
Apply this diff:

-(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
+(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_msisdns.sql around
lines 5 to 7, replace the non-standard null check syntax `WHERE msisdn_counterpart
NOTNULL` with the standard SQL syntax `WHERE msisdn_counterpart IS NOT NULL` to
filter out null values.

Comment on lines 5 to 7
{% if cdr_type == "calls" or cdr_type == "sms" %}
UNION ALL
(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)

⚠️ Potential issue

Use standard null-check syntax
The clause WHERE msisdn_counterpart NOTNULL is non-standard. It runs on PostgreSQL, which accepts NOTNULL as an equivalent form, but IS NOT NULL is the portable spelling:

-(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
+(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/mds/count_msisdns.sql around
lines 5 to 7, the null check syntax `WHERE msisdn_counterpart NOTNULL` is
non-standard (PostgreSQL accepts it, other engines may not). Replace `NOTNULL`
with the standard SQL syntax `IS NOT NULL` to filter non-null values.

Comment on lines 1 to 35
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 as n_dupes
FROM {{ final_table }}
GROUP BY
msisdn,
datetime,
imsi,
imei,
tac,
location_id,
{% if cdr_type == 'calls' %}
msisdn_counterpart,
outgoing,
duration,
network,
{% elif cdr_type == 'sms' %}
msisdn_counterpart,
outgoing,
network,
{% elif cdr_type == 'mds' %}
duration,
volume_total,
volume_upload,
volume_download,
{% elif cdr_type == 'topups' %}
type,
recharge_amount,
airtime_fee,
tax_and_fee,
pre_event_balance,
post_event_balance,
{% endif %}
operator_code,
country_code
HAVING count(*) > 1) tableWithCount
No newline at end of file

🛠️ Refactor suggestion

Unify file naming for duplicate counts
The Calls QA check uses count_duplicated.sql while SMS and Topups use count_duplicates.sql. Align filenames and identifiers (e.g. standardise on count_duplicates.sql) to ensure consistency across all CDR types.

🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/sms/count_duplicates.sql lines 1
to 35, the filename and identifiers for counting duplicates differ from those
used in the Calls QA check, which uses count_duplicated.sql. Rename the Calls QA
check file and any related identifiers to count_duplicates.sql to standardize
naming conventions across all CDR types, ensuring consistency in file names and
references.

cypress bot commented Jun 20, 2025

FlowAuth    Run #24945

Run Properties: Passed #24945  •  git commit 46a66da5ec: Split checks into stages
Project: FlowAuth
Branch Review: add-garbage-cell-info-check
Run status: Passed #24945
Run duration: 00m 46s
Commit: 46a66da5ec: Split checks into stages
Committer: Jonathan Gray

Test results
Failures: 0
Flaky: 0
Pending (not run due to a .skip annotation): 0
Skipped (not run due to a failure in a mocha hook): 0
Passing: 4

@greenape greenape force-pushed the add-garbage-cell-info-check branch from 4e5ca24 to 771d862 Compare June 23, 2025 09:05
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (3)
flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql (1)

1-10: Simplify SQL and reduce nested subqueries

Consider replacing the double-nested UNION ALL plus outer DISTINCT with a single-level COUNT(DISTINCT ...) on a UNION of msisdn and msisdn_counterpart. This reduces query complexity and improves readability.

Suggested diff:

-SELECT count(*) FROM (SELECT distinct msisdn FROM
-      (SELECT msisdn
-      FROM
-        {{ final_table }}
-        {% if cdr_type == "calls" or cdr_type == "sms" %}
-        UNION ALL
-        (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
-        {% endif %}
-        ) _
-    ) _
+SELECT COUNT(DISTINCT msisdn) FROM (
+  SELECT msisdn FROM {{ final_table }}
+  {% if cdr_type == "calls" or cdr_type == "sms" %}
+  UNION
+  SELECT msisdn_counterpart FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL
+  {% endif %}
+) sub_msisdn;
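
Before adopting the flattened form, it may be worth checking how the two queries treat NULL values. The sketch below compares them on in-memory SQLite (a stand-in for PostgreSQL, with the Jinja conditional expanded as for calls or sms and a hypothetical table named `final_table`). The two forms agree when `msisdn` is never NULL, but `count(*)` over `SELECT DISTINCT` counts a NULL as one value, while `COUNT(DISTINCT ...)` ignores NULLs.

```python
import sqlite3

# Original shape: count rows of a DISTINCT over a UNION ALL.
NESTED = """
SELECT count(*) FROM (SELECT DISTINCT msisdn FROM (
    SELECT msisdn FROM final_table
    UNION ALL
    SELECT msisdn_counterpart AS msisdn FROM final_table
    WHERE msisdn_counterpart IS NOT NULL
) AS msisdn_union) AS distinct_msisdns
"""

# Suggested shape: COUNT(DISTINCT ...) over a UNION.
FLAT = """
SELECT COUNT(DISTINCT msisdn) FROM (
    SELECT msisdn FROM final_table
    UNION
    SELECT msisdn_counterpart FROM final_table
    WHERE msisdn_counterpart IS NOT NULL
) AS sub_msisdn
"""


def run(query, rows):
    """Load (msisdn, msisdn_counterpart) rows and return the query's count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE final_table (msisdn TEXT, msisdn_counterpart TEXT)")
    conn.executemany("INSERT INTO final_table VALUES (?, ?)", rows)
    (n,) = conn.execute(query).fetchone()
    conn.close()
    return n


clean = [("A", "B"), ("B", None), ("A", "C")]
with_null = [(None, "A"), ("B", None)]  # one row has a NULL msisdn
print(run(NESTED, clean), run(FLAT, clean))
print(run(NESTED, with_null), run(FLAT, with_null))
```

If `msisdn` is guaranteed non-null in the final table, the simplification is behavior-preserving; otherwise the flattened query reports one fewer subscriber whenever a NULL `msisdn` is present.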
flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql (1)

1-10: Simplify SQL by using COUNT(DISTINCT) and UNION

Instead of nested DISTINCT subqueries, use a direct COUNT(DISTINCT ...) on a UNION of msisdn and msisdn_counterpart, and give the subquery a clear alias. This flattens the structure and may improve performance.

Proposed diff:

-SELECT count(*) FROM (SELECT distinct msisdn FROM
-      (SELECT msisdn
-      FROM
-        {{ final_table }}
-        {% if cdr_type == "calls" or cdr_type == "sms" %}
-        UNION ALL
-        (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
-        {% endif %}
-        ) _
-    ) _
+SELECT COUNT(DISTINCT msisdn) FROM (
+  SELECT msisdn FROM {{ final_table }}
+  {% if cdr_type == "calls" or cdr_type == "sms" %}
+  UNION
+  SELECT msisdn_counterpart FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL
+  {% endif %}
+) sub_msisdn;
CHANGELOG.md (1)

10-16: Fix markdown list indentation for consistency.

The unordered list items have inconsistent indentation. According to the markdown linting rules, list items should use 2-space indentation rather than 4-space indentation.

Apply this diff to fix the indentation:

- Added new cell info qa checks to flowetl:
-    - Missing latitude
-    - Missing longitude
-    - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
-    - Number of new cell ids
-    - Number of duplicate ids
+- Added new cell info qa checks to flowetl:
+  - Missing latitude
+  - Missing longitude
+  - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
+  - Number of new cell ids
+  - Number of duplicate ids
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e5ca24 and 771d862.

📒 Files selected for processing (68)
  • CHANGELOG.md (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (7 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (7 hunks)
💤 Files with no reviewable changes (2)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (57)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md

11-15: Unordered list indentation (one finding per line)
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (4)
flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql (1)

1-16: Correctly counts SMS duplicate records

The query calculates per-group duplicate counts (count(*) - 1), sums them, and defaults to zero if none exist. It aligns with the intended metric and is consistent with other CDR types.

flowetl/tests/unit/test_qa_check_discovery.py (1)

10-60: Well-structured test refactoring for multi-stage QA checks.

The refactoring properly separates QA checks by stage and uses parameterised testing to ensure comprehensive coverage across all ETL stages. The code is clean and maintains good test practices.

flowetl/flowetl/flowetl/util.py (2)

26-34: Well-designed enum for ETL stages.

The ETLStage enum provides a clean way to represent the different ETL stages and integrates well with the enhanced QA check functionality.


367-447: Excellent implementation of stage-aware QA checks in DAG creation.

The modification to the create_dag function integrates QA checks at each ETL stage: staging checks run after the flux sensors, and extract and final checks run at the corresponding points in the workflow.

Comment on lines 1 to 6
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 as n_dupes
FROM {{ staging_table }}
GROUP BY
cell_id
HAVING count(*) - 1 > 1) tableWithCount

⚠️ Potential issue

Fix duplicate count logic

The HAVING count(*) - 1 > 1 condition excludes groups with exactly two occurrences, so those duplicates aren’t counted. It should use HAVING count(*) > 1 to capture all duplicates.

Apply this diff:

-            HAVING count(*) - 1 > 1
+            HAVING count(*) > 1
🤖 Prompt for AI Agents
In
flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql
lines 1 to 6, the HAVING clause uses `count(*) - 1 > 1`, which misses groups
with exactly two rows (one duplicate each). Change the condition to
`HAVING count(*) > 1` so that every group containing duplicates is included
and all duplicate counts are captured.
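
The effect of the two HAVING conditions can be seen on a toy table. This is a minimal sketch using Python's sqlite3 module, with a hypothetical `staging` table in place of `{{ staging_table }}` and invented cell IDs; it is not the project's test setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for {{ staging_table }}.
conn.execute("CREATE TABLE staging (cell_id TEXT)")
# c1 appears twice (one duplicate), c2 three times (two duplicates), c3 once.
conn.executemany(
    "INSERT INTO staging VALUES (?)",
    [("c1",), ("c1",), ("c2",), ("c2",), ("c2",), ("c3",)],
)


def count_dupes(having: str) -> int:
    """Sum the surplus rows per cell_id for groups passing the HAVING filter."""
    query = f"""
    SELECT COALESCE(sum(n_dupes), 0) FROM (
      SELECT count(*) - 1 AS n_dupes
      FROM staging
      GROUP BY cell_id
      HAVING {having}
    ) AS tableWithCount
    """
    return conn.execute(query).fetchone()[0]


original = count_dupes("count(*) - 1 > 1")  # keeps only groups with 3+ rows
fixed = count_dupes("count(*) > 1")  # keeps every group with 2+ rows
print(original, fixed)
```

With the original condition only c2 contributes, so the pair of c1 rows goes uncounted; the corrected condition counts the surplus rows of both c1 and c2.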

Comment on lines 1 to 17
SELECT count(*) FROM
(SELECT count(*) as n_dupes
FROM {{ final_table }}
GROUP BY
msisdn,
datetime,
imsi,
imei,
tac,
location_id,
duration,
volume_total,
volume_upload,
volume_download,
operator_code,
country_code
HAVING count(*) > 1) tableWithCount

⚠️ Potential issue

Align duplicate counting logic with other CDR types

This query currently returns the number of duplicate groups instead of the total duplicate records. For consistency with the SMS implementation it should subtract one per group and sum those values, wrapping with COALESCE.

Recommended diff:

-SELECT count(*) FROM
-  (SELECT count(*) as n_dupes
+SELECT COALESCE(sum(n_dupes), 0) FROM (
+  SELECT count(*) - 1 as n_dupes
     FROM {{ final_table }}
     GROUP BY
         msisdn,
         datetime,
         imsi,
         imei,
         tac,
         location_id,
         duration,
         volume_total,
         volume_upload,
         volume_download,
         operator_code,
         country_code
-    HAVING count(*) > 1) tableWithCount
+    HAVING count(*) > 1
+) sub_duplicates;
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql
lines 1 to 17, the query counts the number of duplicate groups rather than the
total duplicate records. To fix this, modify the query to sum (count(*) - 1) for
each group to get the total duplicates, and wrap the sum with COALESCE to handle
nulls, ensuring consistency with the SMS implementation.
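
The difference between counting duplicate groups and counting duplicate records can be illustrated as follows. This is a sketch under stated assumptions: Python's sqlite3 instead of PostgreSQL, a hypothetical `final_table` with only two of the grouping columns, and invented rows. It only shows that the two numbers differ; the file list also contains a separate count_duplicates.sql, so whether the group count is intentional for count_duplicated.sql is not settled by the comment alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced stand-in for {{ final_table }}.
conn.execute("CREATE TABLE final_table (msisdn TEXT, location_id TEXT)")
# One record seen three times, one seen twice, one seen once.
rows = [("a", "x")] * 3 + [("b", "y")] * 2 + [("c", "z")]
conn.executemany("INSERT INTO final_table VALUES (?, ?)", rows)

# Current query shape: number of distinct records that occur more than once.
duplicate_groups = conn.execute(
    """
    SELECT count(*) FROM (
      SELECT count(*) AS n_dupes
      FROM final_table
      GROUP BY msisdn, location_id
      HAVING count(*) > 1
    ) AS tableWithCount
    """
).fetchone()[0]

# Suggested query shape: number of surplus rows across all such groups.
duplicate_records = conn.execute(
    """
    SELECT COALESCE(sum(n_dupes), 0) FROM (
      SELECT count(*) - 1 AS n_dupes
      FROM final_table
      GROUP BY msisdn, location_id
      HAVING count(*) > 1
    ) AS sub_duplicates
    """
).fetchone()[0]

print(duplicate_groups, duplicate_records)
```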

{{ final_table }}
{% if cdr_type == "calls" or cdr_type == "sms" %}
UNION ALL
(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)

🛠️ Refactor suggestion

Use standard SQL syntax for null checking.

The NOTNULL condition should use the standard SQL syntax IS NOT NULL for better compatibility and clarity.

Apply this diff to fix the SQL syntax:

-        (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)
+        (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL)
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql at
line 7, replace the non-standard `NOTNULL` condition with the standard SQL
syntax `IS NOT NULL` to ensure better compatibility and clarity in the null
check.


def get_qa_checks(
*,
dag: Optional["DAG"] = None,

⚠️ Potential issue

Fix undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error.

Apply this diff to fix the type annotation (the qualified name resolves only if `airflow` itself is imported in the module):

def get_qa_checks(
    *,
-    dag: Optional["DAG"] = None,
+    dag: Optional["airflow.DAG"] = None,
    additional_qa_check_paths: Optional[List[str]] = None,
    stage: Optional[ETLStage] = ETLStage.FINAL,
) -> List["QACheckOperator"]:

Alternatively, add the import at the top of the file:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
🧰 Tools
🪛 Ruff (0.11.9)

38-38: Undefined name DAG

(F821)

🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/util.py at line 38, the type annotation uses DAG
which is not imported, causing a static analysis error. Fix this by importing
DAG from its module at the top of the file, ensuring the type annotation is
recognized and the error is resolved.
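
The TYPE_CHECKING variant of the fix could look roughly like this. It is a sketch, not the contents of util.py: the signature is trimmed, the return type is simplified to a list of strings, and the function body is a hypothetical placeholder. Because the import sits under `TYPE_CHECKING`, the snippet runs even where Airflow is not installed.

```python
from typing import TYPE_CHECKING, List, Optional

if TYPE_CHECKING:
    # Seen only by static analysers such as Ruff or mypy; never executed at runtime.
    from airflow import DAG


def get_qa_checks(
    *,
    dag: Optional["DAG"] = None,
    additional_qa_check_paths: Optional[List[str]] = None,
) -> List[str]:
    # Hypothetical placeholder body: the real function builds QACheckOperator
    # instances; here we just echo back any extra paths.
    return list(additional_qa_check_paths or [])


checks = get_qa_checks(additional_qa_check_paths=["extra/check.sql"])
print(checks)
```

The string annotation `"DAG"` now refers to a name the analyser can resolve, while the runtime import cost and any circular-import risk are avoided.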

@greenape greenape force-pushed the add-garbage-cell-info-check branch from 771d862 to b07e934 Compare June 23, 2025 09:13
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
flowetl/flowetl/flowetl/util.py (1)

40-40: Undefined type annotation issue persists.

The DAG type annotation issue flagged in the past review remains unresolved.

flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql (1)

7-7: SQL syntax issue remains unaddressed.

The non-standard NOTNULL condition should use the standard SQL syntax IS NOT NULL as flagged in the previous review.

🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)

8-8: Remove unused import.

The StrEnum import is not used in the code. The ETLStage class uses str, Enum instead.

-from enum import Enum, StrEnum
+from enum import Enum
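
For reference, a `str, Enum` mixin behaves much like `StrEnum` for comparison purposes without needing the Python 3.11+ import. The sketch below assumes member names and values based on the stage names in the PR summary (extract, staging, final); only `ETLStage.FINAL` is confirmed by the diff, and the directory string is purely illustrative.

```python
from enum import Enum


class ETLStage(str, Enum):
    # Member values are assumptions taken from the stage names in the PR summary.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


# Members compare equal to plain strings and can be looked up by value.
stage = ETLStage("staging")
stage_dir = f"qa_checks/{ETLStage.FINAL.value}"
print(stage is ETLStage.STAGING, ETLStage.FINAL == "final", stage_dir)
```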
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 771d862 and b07e934.

📒 Files selected for processing (67)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (7 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (7 hunks)
💤 Files with no reviewable changes (2)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql
🚧 Files skipped from review as they are similar to previous changes (63)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

8-8: enum.StrEnum imported but unused

Remove unused import: enum.StrEnum

(F401)


38-38: Undefined name DAG

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (7)
flowetl/flowetl/flowetl/util.py (6)

26-34: Well-implemented enum for ETL stages.

The ETLStage enum is properly implemented with clear documentation and appropriate inheritance from str and Enum for string-based operations.


57-59: Clear documentation for the new stage parameter.

The docstring properly documents the new stage parameter with appropriate default value and explanation.


73-89: Robust stage-aware path handling.

The implementation correctly:

  • Validates the stage parameter using ETLStage(stage)
  • Constructs stage-specific paths for both default and additional QA check paths
  • Maintains backward compatibility with existing template search paths
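The stage validation and path construction described above can be sketched roughly as follows. This is an illustration only, not the code in util.py: the member values are assumed from the stage names used in this PR (extract, staging, final), and `stage_search_paths` is a hypothetical helper.

```python
from enum import Enum
from pathlib import Path


class ETLStage(str, Enum):
    """Stages at which QA checks may run (member values are assumed)."""

    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def stage_search_paths(default_root, additional_roots=(), stage=ETLStage.FINAL):
    """Return one template directory per root, each narrowed to the given stage.

    Hypothetical helper: ETLStage(stage) accepts either a member or its
    string value and raises ValueError for anything else.
    """
    stage = ETLStage(stage)
    return [Path(root) / stage.value for root in (default_root, *additional_roots)]


paths = stage_search_paths("qa_checks", ["extra_checks"], stage="staging")
```

Because the enum also inherits from str, a caller can pass either `ETLStage.STAGING` or the plain string `"staging"` and get the same directories back.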

260-261: Enhanced documentation for stage-specific paths.

The updated docstring clearly explains the requirement for organising additional QA check paths under stage subdirectories.


367-378: Proper integration of staging QA checks.

The staging QA checks are correctly integrated into both flux sensor workflows, running after flux checks but before extraction. The consistent pattern between FILE and TABLE flux sensor types ensures uniform behaviour.

Also applies to: 386-397


428-433: Appropriate placement of extract stage QA checks.

The extract stage QA checks are properly positioned after extraction (and optional clustering) but before constraints and indexes, which is the logical sequence for data quality validation.

flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql (1)

1-10: Well-structured query logic for MSISDN counting.

The query design correctly:

  • Uses nested subqueries to ensure proper deduplication
  • Conditionally includes counterpart MSISDNs only for calls and SMS types
  • Handles the union and distinct operations in the appropriate order

The overall approach for counting unique MSISDNs across both regular and counterpart fields is sound.

@greenape force-pushed the add-garbage-cell-info-check branch 3 times, most recently from 30eee7b to 42eb441, June 23, 2025 09:49

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
flowetl/flowetl/flowetl/util.py (1)

38-38: Fix undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error.

Apply this diff to fix the type annotation:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG

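Keep the type hint quoted. DAG is imported only for type checking, so an unquoted name would be undefined at runtime unless annotation evaluation is deferred (for example with from __future__ import annotations):

     dag: Optional["DAG"] = None,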
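One caveat with a TYPE_CHECKING-only import is that the name does not exist when the module actually runs, so the hint has to stay a quoted forward reference (or annotation evaluation has to be deferred). A minimal sketch of the pattern, using the stdlib `Fraction` class as a stand-in for airflow's DAG so that it runs without Airflow installed; `describe` is a hypothetical function:

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Seen by static type checkers only; this import never runs at runtime.
    # Fraction is a stand-in for `from airflow import DAG`.
    from fractions import Fraction


def describe(value: Optional["Fraction"] = None) -> str:
    """The quoted hint is kept as a forward reference, so no runtime lookup of Fraction is needed."""
    return "no value" if value is None else f"value={value}"


result = describe()
```

With this layout the linter can resolve the name, and the module still imports cleanly even when the guarded package is unavailable.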
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7274ac8 and 42eb441.

📒 Files selected for processing (67)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (7 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (7 hunks)
💤 Files with no reviewable changes (2)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (1)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/count_duped_ids.sql
🚧 Files skipped from review as they are similar to previous changes (63)
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/cell_info/new_cell_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/mds/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/calls/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/topups/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/staging/count_staging_rows.sql
  • flowetl/flowetl/flowetl/qa_checks/qa_checks/final/sms/count_msisdns.sql
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (6)
flowetl/flowetl/flowetl/util.py (6)

26-34: Well-designed enum for ETL stage management.

The ETLStage enum provides a clean way to represent different ETL phases and will help organise QA checks systematically. The string inheritance allows for easy serialisation and template path construction.


40-40: Good addition of stage parameter with sensible default.

The stage parameter with ETLStage.FINAL as default maintains backward compatibility whilst enabling stage-specific QA checks.


73-73: Proper validation of stage parameter.

The explicit conversion to ETLStage ensures type safety and will raise a clear error if an invalid stage is provided.


75-89: Well-implemented stage-aware path construction.

The modification to include stage subdirectories in both default and additional QA check paths enables proper organisation of checks by ETL stage.


430-435: Verify the purpose of the EmptyOperator 'gather' task.

The EmptyOperator named "gather" is inserted after extract QA checks but its purpose isn't immediately clear from the context. Ensure this is necessary for the DAG structure.

Could you clarify the purpose of the "gather" task? Is it used to synchronise multiple QA check tasks before proceeding to constraints and indexes?
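If the task is indeed a fan-in point (an assumption; the PR does not say), its effect on ordering can be sketched without Airflow using the stdlib graphlib module. All task names below are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical extract-stage QA check tasks that all run after extraction.
extract_qa_checks = ["count_rows.extract", "count_nulls.extract"]

# Map each task to the set of tasks it depends on.
dependencies = {check: {"extract"} for check in extract_qa_checks}
dependencies["gather"] = set(extract_qa_checks)  # fan-in: waits for every check
dependencies["add_constraints"] = {"gather"}  # one upstream task instead of N

# Any valid execution order places every check before "gather".
order = list(TopologicalSorter(dependencies).static_order())
```

Under that reading, downstream tasks only need a dependency on the single no-op task, however many QA checks are discovered.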


448-448: Final QA checks correctly default to FINAL stage.

The final QA checks appropriately use the default stage parameter (FINAL), maintaining the existing behaviour whilst being explicit about the stage.

@greenape force-pushed the add-garbage-cell-info-check branch 2 times, most recently from 7144700 to c646b91, June 23, 2025 10:22

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (5)
flowetl/flowetl/flowetl/util.py (4)

38-38: The DAG type annotation issue remains unresolved.


377-377: Use enum values consistently instead of string literals.


396-396: Use enum values consistently instead of string literals.


435-435: Use enum value for extract stage.

flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1)

1-10: Duplicate: apply the same SQL enhancements as for final/topups/count_msisdns.sql
All critical fixes, optional refactors, and nitpicks around IS NOT NULL, COUNT(DISTINCT ...), alias improvements, and macro extraction apply identically here.

🧹 Nitpick comments (5)
flowetl/flowetl/flowetl/util.py (2)

78-78: Remove debug print statement from production code.

This debug print statement should be removed before merging to production.

-    print(default_path)

92-92: Remove debug print statement from production code.

This debug print statement should be removed before merging to production.

-    print(jinja_env.list_templates())
flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (3)

1-10: Optional refactor: simplify using COUNT(DISTINCT)
The nested subqueries can be replaced with a single COUNT(DISTINCT msisdn) over the unioned set, which is easier to read and may also perform better.

-SELECT count(*) FROM (
-  SELECT distinct msisdn
-  FROM
-    {{ final_table }}
-    {% if cdr_type == "calls" or cdr_type == "sms" %}
-    UNION ALL
-    (SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart IS NOT NULL)
-    {% endif %}
-  ) _
-) _
+SELECT COUNT(DISTINCT msisdn)
+FROM (
+  SELECT msisdn
+  FROM {{ final_table }}
+  {% if cdr_type == "calls" or cdr_type == "sms" %}
+  UNION ALL
+  SELECT msisdn_counterpart
+  FROM {{ final_table }}
+  WHERE msisdn_counterpart IS NOT NULL
+  {% endif %}
+) AS all_msisdns;
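As a rough check that the two query shapes agree, the sketch below runs both against an in-memory SQLite database. The `events` table and its rows are made up for illustration, and SQLite stands in for the real database, so this only demonstrates the SQL semantics, not the rendered templates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (msisdn TEXT, msisdn_counterpart TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", None), ("d", "a")],
)

# All MSISDNs, from both columns, with NULL counterparts filtered out.
UNIONED = """
    SELECT msisdn FROM events
    UNION ALL
    SELECT msisdn_counterpart FROM events WHERE msisdn_counterpart IS NOT NULL
"""

# Nested form: deduplicate in a subquery, then count the rows.
nested = conn.execute(
    f"SELECT count(*) FROM (SELECT DISTINCT msisdn FROM ({UNIONED}) AS u) AS d"
).fetchone()[0]

# Flat form: let the aggregate do the deduplication.
flat = conn.execute(
    f"SELECT COUNT(DISTINCT msisdn) FROM ({UNIONED}) AS all_msisdns"
).fetchone()[0]
```

The two forms only agree when msisdn itself is never NULL: COUNT(DISTINCT ...) ignores NULLs, whereas SELECT DISTINCT followed by count(*) counts NULL as one value.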

9-10: Nitpick: improve alias naming and terminate statement
Using _ as an alias is ambiguous; consider a descriptive alias (e.g. all_msisdns) and ensure the statement ends with a semicolon.


1-10: Advice: consolidate duplicated logic via Jinja macro
This MSISDN-counting logic is repeated across multiple QA checks. Extract it into a shared Jinja template or macro to DRY up the SQL and simplify future maintenance.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42eb441 and c646b91.

📒 Files selected for processing (68)
  • flowetl/flowetl/MANIFEST.in (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (8 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (9 hunks)
💤 Files with no reviewable changes (2)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
✅ Files skipped from review due to trivial changes (58)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
  • flowetl/flowetl/MANIFEST.in
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (3)
flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1)

1-1: LGTM! The query correctly identifies invalid longitude values.

The logic properly checks for both explicit invalid values (-1) and null longitude values.

flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1)

1-5: LGTM! The query correctly identifies new cell IDs.

The LEFT OUTER JOIN with NULL check properly identifies staging records that don't have corresponding entries in the infrastructure.cells table.
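
The anti-join pattern described above can be sketched as follows, using an in-memory SQLite database. The table `cells` stands in for `infrastructure.cells`, and the column names `cell_id` and `id` are assumptions for illustration rather than the schema used by the actual check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cells (id TEXT);              -- stand-in for infrastructure.cells
    CREATE TABLE staging_cells (cell_id TEXT); -- stand-in for the staging table
    INSERT INTO cells VALUES ('a'), ('b');
    INSERT INTO staging_cells VALUES ('a'), ('b'), ('c'), ('d');
    """
)

# Rows from staging with no match on the right side come back with NULLs
# in the right-hand columns, so filtering on c.id IS NULL keeps only new ids.
new_cell_count = conn.execute(
    """
    SELECT count(*)
    FROM staging_cells s
    LEFT OUTER JOIN cells c ON s.cell_id = c.id
    WHERE c.id IS NULL
    """
).fetchone()[0]

print(new_cell_count)  # 2: 'c' and 'd' are not yet known
```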

flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1)

1-10: LGTM! The query correctly counts distinct MSISDNs with proper conditional logic.

The implementation properly:

  • Combines MSISDNs from both primary and counterpart columns for relevant CDR types
  • Uses UNION ALL followed by DISTINCT for efficient deduplication
  • Handles NULL counterpart values appropriately

Comment on lines +1 to +6
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 as n_dupes
FROM {{ staging_table }}
GROUP BY
cell_id
HAVING count(*) - 1 > 1) tableWithCount No newline at end of file

⚠️ Potential issue

Fix the duplicate detection logic.

The HAVING condition count(*) - 1 > 1 misses groups with exactly two rows for the same cell_id, which should count as one duplicate. As written, the condition only includes groups of three or more rows.

Apply this diff to fix the logic:

-            HAVING count(*) - 1 > 1) tableWithCount
+            HAVING count(*) > 1) tableWithCount

This change ensures that groups with 2 or more records (i.e., at least 1 duplicate) are included in the count.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql lines
1 to 6, the HAVING clause uses `count(*) - 1 > 1`, which excludes groups with
exactly 2 records and thus misses some duplicates. Change the HAVING condition
to `count(*) > 1` (equivalent to `count(*) - 1 >= 1`) to include groups with at
least one duplicate, ensuring all duplicate cases are counted correctly.
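
To make the off-by-one concrete, here is a small sketch that runs both HAVING conditions against the same toy data in an in-memory SQLite database. The table name `staging_cells` is a stand-in for the templated `{{ staging_table }}`, and the data is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_cells (cell_id TEXT)")
# 'a' appears once, 'b' twice (1 duplicate), 'c' three times (2 duplicates).
conn.executemany(
    "INSERT INTO staging_cells VALUES (?)",
    [("a",), ("b",), ("b",), ("c",), ("c",), ("c",)],
)

QUERY = """
SELECT COALESCE(sum(n_dupes), 0) FROM
(SELECT count(*) - 1 AS n_dupes
 FROM staging_cells
 GROUP BY cell_id
 HAVING {condition}) tableWithCount
"""


def count_dupes(condition: str) -> int:
    """Run the duplicate count with the given HAVING condition."""
    return conn.execute(QUERY.format(condition=condition)).fetchone()[0]


original = count_dupes("count(*) - 1 > 1")  # skips the two-row group 'b'
fixed = count_dupes("count(*) > 1")         # includes 'b' and 'c'

print(original, fixed)  # 2 3
```

With the original condition the duplicate of `b` goes uncounted, so the check under-reports by one for every cell_id that appears exactly twice.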

{{ final_table }}
{% if cdr_type == "calls" or cdr_type == "sms" %}
UNION ALL
(SELECT msisdn_counterpart AS msisdn FROM {{ final_table }} WHERE msisdn_counterpart NOTNULL)

⚠️ Potential issue

Critical: use proper SQL null check syntax
The condition msisdn_counterpart NOTNULL is not standard SQL and is rejected by most dialects (PostgreSQL accepts it as a non-standard extension). It should read msisdn_counterpart IS NOT NULL.

🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql at line 7,
replace the invalid condition "msisdn_counterpart NOTNULL" with the correct SQL
syntax "msisdn_counterpart IS NOT NULL" to properly check for non-null values.
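
For illustration, the following sketch counts distinct MSISDNs across both columns using the standard `IS NOT NULL` form, run against an in-memory SQLite table. The table name `events` and the sample rows are assumptions; the structure (UNION ALL of the two columns, then DISTINCT) follows the description of the check rather than the exact template.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the templated {{ final_table }}.
conn.execute("CREATE TABLE events (msisdn TEXT, msisdn_counterpart TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("A", "B"), ("A", None), ("C", "A"), ("B", None)],
)

# Stack both columns, dropping NULL counterparts, then count distinct values.
n_msisdns = conn.execute(
    """
    SELECT count(*) FROM (
        SELECT DISTINCT msisdn FROM (
            SELECT msisdn FROM events
            UNION ALL
            SELECT msisdn_counterpart AS msisdn FROM events
            WHERE msisdn_counterpart IS NOT NULL
        ) AS unioned
    ) AS distinct_msisdns
    """
).fetchone()[0]

print(n_msisdns)  # 3: A, B and C
```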

@greenape greenape force-pushed the add-garbage-cell-info-check branch 4 times, most recently from 4788654 to 490afbc Compare June 23, 2025 12:39

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)

38-38: Fix the undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error.

Apply this diff to fix the type annotation:

def get_qa_checks(
    *,
-    dag: Optional["DAG"] = None,
+    dag: Optional["airflow.DAG"] = None,
    additional_qa_check_paths: Optional[List[str]] = None,
    stage: Optional[ETLStage] = ETLStage.FINAL,
) -> List["QACheckOperator"]:

Alternatively, add the import at the top of the file, which resolves the name for the linter without importing Airflow at runtime:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
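
A minimal sketch of the `TYPE_CHECKING` pattern follows. The function `get_qa_checks_sketch` is a hypothetical stand-in that only mirrors the shape of the signature under review; it is not the project's implementation. Because the guarded import never executes at runtime, the snippet runs even where Airflow is not installed.

```python
from typing import TYPE_CHECKING, List, Optional

if TYPE_CHECKING:
    # Evaluated only by static type checkers and linters, never at runtime,
    # so importing this module does not require Airflow to be installed.
    from airflow import DAG


def get_qa_checks_sketch(*, dag: Optional["DAG"] = None) -> List[str]:
    """Hypothetical stand-in: returns a description instead of operators."""
    # The quoted "DAG" annotation is a forward reference, so it is not
    # looked up when the function is defined.
    return [] if dag is None else [f"checks for {dag!r}"]


print(TYPE_CHECKING)           # False at runtime
print(get_qa_checks_sketch())  # []
```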

369-379: Use enum values consistently instead of string literals.

The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.

Apply this diff to use enum values consistently:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

388-398: Use enum values consistently instead of string literals.

The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.

Apply this diff to use enum values consistently:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

431-436: Use enum value for extract stage.

Consistent with the staging stage issue, use the enum value instead of a string literal.

Apply this diff:

-                    additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+                    additional_qa_check_paths=additional_qa_check_paths, stage=ETLStage.EXTRACT
🧹 Nitpick comments (1)
CHANGELOG.md (1)

10-22: Fix markdown list indentation for consistency.

The changelog content is comprehensive and well-structured, but the nested list items are indented four spaces, whereas markdownlint (MD007) expects two.

Apply this diff to fix the indentation:

- Added new cell info qa checks to flowetl:
-    - Missing latitude
-    - Missing longitude
-    - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
-    - Number of new cell ids
-    - Number of duplicate ids
+- Added new cell info qa checks to flowetl:
+  - Missing latitude
+  - Missing longitude
+  - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
+  - Number of new cell ids
+  - Number of duplicate ids
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4788654 and 490afbc.

📒 Files selected for processing (75)
  • CHANGELOG.md (1 hunks)
  • flowdb/testdata/bin/run_qa_checks.py (1 hunks)
  • flowdb_synthetic_data.Dockerfile (1 hunks)
  • flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
  • flowdb_testdata.Dockerfile (1 hunks)
  • flowdb_testdata.Dockerfile.dockerignore (1 hunks)
  • flowetl/flowetl/MANIFEST.in (1 hunks)
  • flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (8 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (9 hunks)
💤 Files with no reviewable changes (2)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (2)
  • flowdb_testdata.Dockerfile
  • flowdb_synthetic_data.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (68)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
  • flowetl/flowetl/MANIFEST.in
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
  • flowdb_synthetic_data.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
  • flowdb_testdata.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
  • flowetl/flowetl/flowetl/operators/qa_check_operator.py
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


12-12: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (5)
flowetl/flowetl/flowetl/util.py (4)

26-34: LGTM: Well-defined enum for ETL stages.

The ETLStage enum provides clear type safety for the different ETL stages and follows good practices by inheriting from both str and Enum.
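
The behaviour that a `str`/`Enum` mix-in gives can be sketched as below. The member names and string values are assumptions inferred from the stage names used in this PR (extract, staging, final); the actual definition in `util.py` is not shown here.

```python
from enum import Enum


class ETLStage(str, Enum):
    # Assumed values, matching the stage directory names used in the PR.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


# A plain string is coerced to the matching member...
assert ETLStage("staging") is ETLStage.STAGING
# ...and members compare equal to their string value because of the str mix-in.
assert ETLStage.STAGING == "staging"

# An unknown stage is rejected rather than silently accepted.
try:
    ETLStage("stagin")
except ValueError:
    rejected = True
else:
    rejected = False

print(rejected)  # True
```

Because members compare equal to plain strings, passing `"staging"` still works, but using `ETLStage.STAGING` lets a typo fail at attribute lookup instead of at call time.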


40-40: LGTM: Good addition of stage parameter.

The addition of the stage parameter with a sensible default enhances the flexibility of the QA check system.


73-89: LGTM: Stage-aware path construction.

The logic correctly constructs paths based on the ETL stage, enabling stage-specific QA checks. Calling ETLStage(stage) coerces a plain string to the enum and raises a ValueError for an unrecognised stage.
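
One way to picture the stage-aware path construction is the sketch below. The helper `stage_check_dirs`, its parameters, and the example directories are hypothetical; only the idea of appending the stage name to each search root comes from the review.

```python
from enum import Enum
from pathlib import Path
from typing import List, Optional, Union


class ETLStage(str, Enum):
    # Assumed values, matching the stage directory names used in the PR.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def stage_check_dirs(
    default_path: str,
    additional_paths: Optional[List[str]] = None,
    stage: Union[ETLStage, str] = ETLStage.FINAL,
) -> List[Path]:
    """Hypothetical helper: append the stage name to every search root."""
    stage = ETLStage(stage)  # raises ValueError for an unknown stage
    roots = [default_path, *(additional_paths or [])]
    return [Path(root) / stage.value for root in roots]


dirs = stage_check_dirs("qa_checks", ["/opt/extra_checks"], stage="staging")
print([d.as_posix() for d in dirs])
# ['qa_checks/staging', '/opt/extra_checks/staging']
```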


186-187: LGTM: Additional QA check path parameters.

The new parameters additional_staging_qa_check_paths and additional_extract_qa_check_paths provide good flexibility for stage-specific QA checks.

flowdb/testdata/bin/run_qa_checks.py (1)

110-110: LGTM: Simplified display name construction.

The change to always use the filename stem for the display name is consistent with the stage-aware QA check reorganisation. This simplification improves clarity whilst maintaining unique identification across the new directory structure.
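
As a small illustration of what "filename stem" means here, the sketch below derives a display name with `pathlib`. The example path is taken from the file list in this review; how `run_qa_checks.py` actually obtains its paths is not shown and is not assumed.

```python
from pathlib import Path

# Example check template path from this PR's file list.
check_path = Path("flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql")

# The stem is the final path component without its suffix.
display_name = check_path.stem

print(display_name)  # count_duplicates
```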

@greenape greenape force-pushed the add-garbage-cell-info-check branch from 490afbc to 0ab5ddd Compare June 23, 2025 13:31

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)

38-38: Fix undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error.

Apply this diff to fix the type annotation:

def get_qa_checks(
    *,
-    dag: Optional["DAG"] = None,
+    dag: Optional["airflow.DAG"] = None,
    additional_qa_check_paths: Optional[List[str]] = None,
    stage: Optional[ETLStage] = ETLStage.FINAL,
) -> List["QACheckOperator"]:

Alternatively, add the import at the top of the file, which resolves the name for the linter without importing Airflow at runtime:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG

369-379: Use enum values consistently instead of string literals.

The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.

Apply this diff to use enum values consistently:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

388-398: Use enum values consistently instead of string literals.

The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.

Apply this diff to use enum values consistently:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

430-437: Use enum value for extract stage and approve the gather operator.

The gather operator provides a clean synchronisation point after extract-stage QA checks. However, maintain consistency by using enum values instead of string literals.

Apply this diff:

-                    additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+                    additional_qa_check_paths=additional_qa_check_paths, stage=ETLStage.EXTRACT
🧹 Nitpick comments (1)
CHANGELOG.md (1)

10-21: Fix list indentation for consistency.

The changelog content accurately documents the new QA check features, but the nested list items are indented four spaces rather than the two that markdownlint's MD007 rule expects.

Apply this diff to fix the indentation:

 ### Added
-    - Missing latitude
-    - Missing longitude
-    - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
-    - Number of new cell ids
-    - Number of duplicate ids
+  - Missing latitude
+  - Missing longitude
+  - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
+  - Number of new cell ids
+  - Number of duplicate ids
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 490afbc and 0ab5ddd.

📒 Files selected for processing (75)
  • CHANGELOG.md (1 hunks)
  • flowdb/testdata/bin/run_qa_checks.py (1 hunks)
  • flowdb_synthetic_data.Dockerfile (1 hunks)
  • flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
  • flowdb_testdata.Dockerfile (1 hunks)
  • flowdb_testdata.Dockerfile.dockerignore (1 hunks)
  • flowetl/flowetl/MANIFEST.in (1 hunks)
  • flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (8 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (9 hunks)
💤 Files with no reviewable changes (2)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (2)
  • flowdb_testdata.Dockerfile
  • flowdb_synthetic_data.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (69)
  • flowetl/flowetl/MANIFEST.in
  • flowdb_synthetic_data.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
  • flowdb_testdata.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
  • flowetl/flowetl/flowetl/operators/qa_check_operator.py
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
  • flowdb/testdata/bin/run_qa_checks.py
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


12-12: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

🔇 Additional comments (4)
flowetl/flowetl/flowetl/util.py (4)

26-34: LGTM! Well-structured enum definition.

The ETLStage enum is properly defined with clear string values representing the different ETL stages.
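
For readers without the diff open, an enum of this shape might look roughly like the sketch below. The member names and values are inferred from the `stage="staging"`, `stage="extract"` and `ETLStage.FINAL` references elsewhere in this review; the actual definition in `util.py` may differ.

```python
from enum import Enum


class ETLStage(str, Enum):
    """Illustrative reconstruction of the stages at which QA checks can run."""

    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


# Subclassing str means members compare equal to their plain string values,
# so existing callers that pass or compare strings keep working.
assert ETLStage.STAGING == "staging"
```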


40-40: Good enhancement for stage-specific QA checks.

The addition of the stage parameter with proper default value and documentation enables stage-specific QA check filtering, which aligns well with the new ETL stage architecture.

Also applies to: 56-59


73-89: Approve the path construction logic.

The modification to construct stage-specific paths by appending the stage value to the default and additional QA check paths is correctly implemented and maintains backward compatibility.
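
As a rough illustration of the path construction described above, the sketch below appends the stage value to each search path. `stage_search_paths` and its parameters are hypothetical names chosen for this example, not the function under review.

```python
from enum import Enum
from pathlib import Path


class ETLStage(str, Enum):
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def stage_search_paths(default_path, additional_paths=(), stage=ETLStage.FINAL):
    """Hypothetical helper: return each search path with the stage appended."""
    stage = ETLStage(stage)  # raises ValueError for an unknown stage name
    return [Path(p) / stage.value for p in (default_path, *additional_paths)]
```

Defaulting to `ETLStage.FINAL` is what keeps callers that never pass a stage pointing at the `final` subdirectory.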


186-187: Well-documented new parameters for stage-specific QA checks.

The addition of additional_staging_qa_check_paths and additional_extract_qa_check_paths parameters enhances the flexibility of the DAG creation function and is properly documented.

Also applies to: 261-262

@greenape greenape force-pushed the add-garbage-cell-info-check branch 4 times, most recently from ce43417 to 4d2b9cb Compare June 23, 2025 14:59
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)

38-38: Fix undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error.

Apply this diff to fix the type annotation:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG

Alternatively, use the fully qualified name (note that this only satisfies the linter if airflow itself is imported at module level):

-    dag: Optional["DAG"] = None,
+    dag: Optional["airflow.DAG"] = None,
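
A minimal, self-contained sketch of the `TYPE_CHECKING` pattern suggested above. The function `has_dag` is a hypothetical stand-in, not the function at line 38 of `util.py`. Because the import sits behind the guard and the annotation is a string, the module can be imported even where Airflow is not installed.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Seen by type checkers and linters only; never executed at runtime.
    from airflow import DAG


def has_dag(dag: Optional["DAG"] = None) -> bool:
    """Hypothetical stand-in: report whether a DAG object was supplied."""
    return dag is not None
```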

369-379: Use enum values consistently instead of string literals.

The staging QA checks integration is well-structured, but the stage parameter uses a string literal rather than the defined enum value, reducing type safety.

Apply this diff to use enum values consistently:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

388-398: Use enum values consistently instead of string literals.

Similar to the file flux sensor integration, this section should use the enum value rather than a string literal for type safety.

Apply this diff:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

429-436: Good task flow design but use enum value for extract stage.

The task flow with the EmptyOperator for gathering is well-designed, but the stage parameter should use the enum value for consistency.

Apply this diff:

-                    additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+                    additional_qa_check_paths=additional_qa_check_paths, stage=ETLStage.EXTRACT
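
To illustrate why the enum member is the safer spelling: a string literal only fails once it is coerced at runtime, whereas a misspelt attribute such as `ETLStage.STAGGING` is reported by linters and type checkers before the DAG is ever parsed. The sketch below assumes an `ETLStage` shaped as discussed in this review; `resolve_stage` and `typo_is_rejected` are hypothetical helpers.

```python
from enum import Enum


class ETLStage(str, Enum):
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def resolve_stage(stage):
    """Hypothetical helper: coerce a string or a member to an ETLStage member."""
    return ETLStage(stage)


def typo_is_rejected(stage):
    """Return True if the given stage name is not a valid ETLStage value."""
    try:
        resolve_stage(stage)
    except ValueError:
        return True
    return False
```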
🧹 Nitpick comments (3)
flowetl/flowetl/flowetl/operators/qa_check_operator.py (1)

10-10: Fix line length violation whilst maintaining correct logic.

The logic for extracting the task ID prefix is correct and aligns with the new task ID format that includes CDR type and stage suffixes. However, the line exceeds the maximum length limit.

Consider breaking the line to improve readability:

-        (date '{{{{ ds }}}}', '{{{{ params.cdr_type }}}}', '{{{{ task.task_id.split(".")[0] }}}}', ({sql}), NOW())
+        (date '{{{{ ds }}}}', '{{{{ params.cdr_type }}}}',
+         '{{{{ task.task_id.split(".")[0] }}}}', ({sql}), NOW())
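
The quadrupled braces are easier to follow with a small standalone sketch. Inside a Python f-string, `{{{{` renders as `{{`, so the Jinja placeholders survive for templating later while `{sql}` is substituted immediately. The query text and the example task ID `count_duplicates.calls.final` are made up for illustration; the review only states that task IDs now carry data type and stage suffixes.

```python
sql = "SELECT count(*) FROM some_table"  # placeholder query, not a real check

# f-string: {{{{ ... }}}} becomes {{ ... }}, which Jinja renders later.
template = f"""
(date '{{{{ ds }}}}', '{{{{ params.cdr_type }}}}',
 '{{{{ task.task_id.split(".")[0] }}}}', ({sql}), NOW())
"""


def base_check_name(task_id):
    """Return the part of a task ID before the first dot."""
    return task_id.split(".")[0]
```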
CHANGELOG.md (1)

10-16: Fix list indentation for consistency.

The changelog entries comprehensively document the new QA checks and related changes. However, the nested list items should use 2-space indentation rather than 4-space indentation for consistency with markdown best practices.

Apply this diff to fix the indentation:

-- Added new cell info qa checks to flowetl:
-    - Missing latitude
-    - Missing longitude
-    - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
-    - Number of new cell ids
-    - Number of duplicate ids
+- Added new cell info qa checks to flowetl:
+  - Missing latitude
+  - Missing longitude
+  - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
+  - Number of new cell ids
+  - Number of duplicate ids
flowetl/flowetl/flowetl/util.py (1)

57-58: Fix typo in docstring.

There's a typo in the docstring - "ETLSTage" should be "ETLStage".

Apply this diff:

-    stage : ETLStage, default ETLSTage.FINAL
+    stage : ETLStage, default ETLStage.FINAL
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ab5ddd and 697d40a.

📒 Files selected for processing (77)
  • CHANGELOG.md (1 hunks)
  • flowdb/testdata/bin/run_qa_checks.py (1 hunks)
  • flowdb_synthetic_data.Dockerfile (1 hunks)
  • flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
  • flowdb_testdata.Dockerfile (1 hunks)
  • flowdb_testdata.Dockerfile.dockerignore (1 hunks)
  • flowetl/flowetl/MANIFEST.in (1 hunks)
  • flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (8 hunks)
  • flowetl/tests/integration/conftest.py (1 hunks)
  • flowetl/tests/integration/test_qa.py (29 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (9 hunks)
💤 Files with no reviewable changes (2)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (2)
  • flowdb_testdata.Dockerfile
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
🚧 Files skipped from review as they are similar to previous changes (68)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
  • flowetl/flowetl/MANIFEST.in
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
  • flowdb_testdata.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
  • flowdb_synthetic_data.Dockerfile
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
  • flowdb/testdata/bin/run_qa_checks.py
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
  • flowdb_synthetic_data.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py

[error] 38-38: undefined name 'DAG'

(F821)


[error] 58-58: line too long (93 > 79 characters)

(E501)


[error] 71-71: line too long (80 > 79 characters)

(E501)


[error] 261-261: line too long (113 > 79 characters)

(E501)


[error] 433-433: line too long (88 > 79 characters)

(E501)

flowetl/tests/integration/test_qa.py

[error] 10-10: line too long (103 > 79 characters)

(E501)


[error] 11-11: line too long (97 > 79 characters)

(E501)


[error] 26-26: line too long (103 > 79 characters)

(E501)


[error] 27-27: line too long (97 > 79 characters)

(E501)


[error] 43-43: line too long (103 > 79 characters)

(E501)


[error] 44-44: line too long (97 > 79 characters)

(E501)


[error] 62-62: line too long (103 > 79 characters)

(E501)


[error] 63-63: line too long (103 > 79 characters)

(E501)


[error] 64-64: line too long (87 > 79 characters)

(E501)


[error] 65-65: line too long (89 > 79 characters)

(E501)


[error] 79-79: line too long (103 > 79 characters)

(E501)


[error] 80-80: line too long (103 > 79 characters)

(E501)


[error] 81-81: line too long (87 > 79 characters)

(E501)


[error] 83-83: line too long (81 > 79 characters)

(E501)


[error] 97-97: line too long (103 > 79 characters)

(E501)


[error] 98-98: line too long (103 > 79 characters)

(E501)


[error] 99-99: line too long (87 > 79 characters)

(E501)


[error] 100-100: line too long (89 > 79 characters)

(E501)


[error] 114-114: line too long (103 > 79 characters)

(E501)


[error] 115-115: line too long (103 > 79 characters)

(E501)


[error] 116-116: line too long (87 > 79 characters)

(E501)


[error] 118-118: line too long (81 > 79 characters)

(E501)


[error] 132-132: line too long (103 > 79 characters)

(E501)


[error] 133-133: line too long (97 > 79 characters)

(E501)


[error] 138-138: line too long (81 > 79 characters)

(E501)


[error] 148-148: line too long (83 > 79 characters)

(E501)


[error] 149-149: line too long (103 > 79 characters)

(E501)


[error] 150-150: line too long (97 > 79 characters)

(E501)


[error] 156-156: line too long (86 > 79 characters)

(E501)


[error] 166-166: line too long (88 > 79 characters)

(E501)


[error] 184-184: line too long (91 > 79 characters)

(E501)


[error] 194-194: line too long (82 > 79 characters)

(E501)


[error] 212-212: line too long (85 > 79 characters)

(E501)


[error] 223-223: line too long (103 > 79 characters)

(E501)


[error] 224-224: line too long (97 > 79 characters)

(E501)


[error] 238-238: line too long (95 > 79 characters)

(E501)


[error] 239-239: line too long (103 > 79 characters)

(E501)


[error] 240-240: line too long (117 > 79 characters)

(E501)


[error] 241-241: line too long (91 > 79 characters)

(E501)


[error] 242-242: line too long (93 > 79 characters)

(E501)


[error] 255-255: line too long (103 > 79 characters)

(E501)


[error] 256-256: line too long (97 > 79 characters)

(E501)


[error] 261-261: line too long (81 > 79 characters)

(E501)


[error] 271-271: line too long (103 > 79 characters)

(E501)


[error] 272-272: line too long (97 > 79 characters)

(E501)


[error] 286-286: line too long (80 > 79 characters)

(E501)


[error] 287-287: line too long (103 > 79 characters)

(E501)


[error] 288-288: line too long (103 > 79 characters)

(E501)


[error] 289-289: line too long (87 > 79 characters)

(E501)


[error] 290-290: line too long (87 > 79 characters)

(E501)


[error] 291-291: line too long (89 > 79 characters)

(E501)


[error] 294-294: line too long (83 > 79 characters)

(E501)


[error] 304-304: line too long (80 > 79 characters)

(E501)


[error] 305-305: line too long (103 > 79 characters)

(E501)


[error] 306-306: line too long (103 > 79 characters)

(E501)


[error] 307-307: line too long (87 > 79 characters)

(E501)


[error] 308-308: line too long (87 > 79 characters)

(E501)


[error] 309-309: line too long (89 > 79 characters)

(E501)


[error] 312-312: line too long (83 > 79 characters)

(E501)


[error] 322-322: line too long (85 > 79 characters)

(E501)


[error] 323-323: line too long (103 > 79 characters)

(E501)


[error] 324-324: line too long (107 > 79 characters)

(E501)


[error] 327-327: line too long (82 > 79 characters)

(E501)


[error] 340-340: line too long (83 > 79 characters)

(E501)


[error] 341-341: line too long (103 > 79 characters)

(E501)


[error] 342-342: line too long (117 > 79 characters)

(E501)


[error] 343-343: line too long (87 > 79 characters)

(E501)


[error] 345-345: line too long (81 > 79 characters)

(E501)


[error] 359-359: line too long (103 > 79 characters)

(E501)


[error] 360-360: line too long (117 > 79 characters)

(E501)


[error] 361-361: line too long (91 > 79 characters)

(E501)


[error] 362-362: line too long (93 > 79 characters)

(E501)


[error] 365-365: line too long (94 > 79 characters)

(E501)


[error] 374-374: line too long (88 > 79 characters)

(E501)


[error] 395-395: line too long (88 > 79 characters)

(E501)


[error] 417-417: line too long (103 > 79 characters)

(E501)


[error] 418-418: line too long (107 > 79 characters)

(E501)


[error] 419-419: line too long (81 > 79 characters)

(E501)


[error] 420-420: line too long (81 > 79 characters)

(E501)


[error] 421-421: line too long (83 > 79 characters)

(E501)


[error] 424-424: line too long (87 > 79 characters)

(E501)


[error] 434-434: line too long (103 > 79 characters)

(E501)


[error] 435-435: line too long (107 > 79 characters)

(E501)


[error] 436-436: line too long (81 > 79 characters)

(E501)


[error] 437-437: line too long (81 > 79 characters)

(E501)


[error] 438-438: line too long (83 > 79 characters)

(E501)


[error] 441-441: line too long (90 > 79 characters)

(E501)


[error] 450-450: line too long (80 > 79 characters)

(E501)


[error] 451-451: line too long (103 > 79 characters)

(E501)


[error] 452-452: line too long (107 > 79 characters)

(E501)


[error] 455-455: line too long (81 > 79 characters)

(E501)


[error] 458-458: line too long (95 > 79 characters)

(E501)

flowetl/flowetl/flowetl/operators/qa_check_operator.py

[error] 10-10: line too long (114 > 79 characters)

(E501)

🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


12-12: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (11)
flowetl/tests/integration/conftest.py (1)

543-549:

```shell
#!/bin/bash

# Re-run search excluding the new fixture name without PCRE look-arounds
echo "Searching for any remaining references to the old fixture name 'jinja_env' (excluding 'jinja_env_final_qa'):"
rg jinja_env --type py | grep -v jinja_env_final_qa

# If PCRE2 support is available, a more precise search:
echo "Searching with PCRE2 look-ahead for any 'jinja_env' not followed by '_final_qa':"
rg --pcre2 'jinja_env(?!_final_qa)' --type py
```

CHANGELOG.md (1)

20-21: Clear documentation of breaking changes.

The entries properly document the breaking changes to task IDs and the type_of_query_or_check field, which will help users understand the impact of these modifications on existing systems.

flowetl/flowetl/flowetl/util.py (8)

26-34: Well-designed enum for ETL stages.

The ETLStage enum implementation is excellent: inheriting from both str and Enum provides good type safety whilst allowing string interoperability. The three stages logically represent the ETL pipeline flow.


40-40: Good addition of stage parameter with sensible default.

The stage parameter enhances the function's flexibility whilst maintaining backwards compatibility by defaulting to ETLStage.FINAL.


73-73: Excellent defensive programming with stage validation.

Using ETLStage(stage) ensures type safety and provides clear error messages for invalid stage values.


75-77: Clean stage-based path organisation.

Using stage.value in the path construction effectively organises QA checks into stage-specific subdirectories, improving maintainability.


79-86: Good refactoring for improved readability.

The restructured template search path construction using generator expressions and unpacking improves code readability whilst maintaining the same functionality.


186-187: Logical extension of QA check path configuration.

The new stage-specific parameters follow the established pattern and provide necessary flexibility for configuring QA checks at different ETL stages.


261-262: Clear and consistent parameter documentation.

The documentation for the new stage-specific parameters follows the established style and clearly explains their purpose.


272-272: Appropriate import for task synchronisation.

The EmptyOperator import is correctly placed and necessary for the task gathering functionality in the DAG.

flowetl/tests/integration/test_qa.py (1)

9-465: Excellent consistency in fixture updates for stage-based QA checks.

All test functions have been consistently updated to use the jinja_env_final_qa fixture, which properly aligns with the new stage-based QA check architecture. The changes maintain test functionality whilst supporting the enhanced pipeline structure.

@greenape greenape force-pushed the add-garbage-cell-info-check branch 3 times, most recently from 3acfcc8 to 4415971 Compare June 23, 2025 15:48
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
flowetl/flowetl/flowetl/util.py (3)

38-38: Fix undefined type annotation (duplicate issue).

The DAG type is still undefined in the type hint, as flagged in previous reviews.


375-375: Use enum values instead of string literals (duplicate issue).

These lines still use string literals "staging" instead of the enum value ETLStage.STAGING, as flagged in previous reviews.

Also applies to: 394-394


433-433: Use enum value for extract stage (duplicate issue).

This line still uses the string literal "extract" instead of ETLStage.EXTRACT, as flagged in previous reviews.

🧹 Nitpick comments (2)
flowetl/flowetl/flowetl/operators/qa_check_operator.py (1)

10-10: LGTM! Correct extraction of base check name from task ID.

The change to use task.task_id.split(".")[0] correctly extracts the base check name, which aligns with the new task ID format that includes CDR type and stage suffixes.

However, please address the line length issue:

-        (date '{{{{ ds }}}}', '{{{{ params.cdr_type }}}}', '{{{{ task.task_id.split(".")[0] }}}}', ({sql}), NOW())
+        (date '{{{{ ds }}}}', '{{{{ params.cdr_type }}}}',
+         '{{{{ task.task_id.split(".")[0] }}}}', ({sql}), NOW())
CHANGELOG.md (1)

10-22: LGTM! Comprehensive documentation of QA check enhancements.

The changelog entries thoroughly document all the new cell info QA checks and related changes, including the new create_dag parameters and organisational structure changes.

Please fix the markdown list indentation to follow the expected 2-space format:

-- Added new cell info qa checks to flowetl:
-    - Missing latitude
-    - Missing longitude
-    - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
-    - Number of new cell ids
-    - Number of duplicate ids
+- Added new cell info qa checks to flowetl:
+  - Missing latitude
+  - Missing longitude
+  - Missing latitude _or_ longitude [#5868](https://github.com/Flowminder/FlowKit/issues/5868)
+  - Number of new cell ids
+  - Number of duplicate ids
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ab5ddd and 4415971.

📒 Files selected for processing (77)
  • CHANGELOG.md (1 hunks)
  • flowdb/testdata/bin/run_qa_checks.py (1 hunks)
  • flowdb_synthetic_data.Dockerfile (1 hunks)
  • flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
  • flowdb_testdata.Dockerfile (1 hunks)
  • flowdb_testdata.Dockerfile.dockerignore (1 hunks)
  • flowetl/flowetl/MANIFEST.in (1 hunks)
  • flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (8 hunks)
  • flowetl/tests/integration/conftest.py (1 hunks)
  • flowetl/tests/integration/test_qa.py (18 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (9 hunks)
💤 Files with no reviewable changes (2)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (2)
  • flowdb_synthetic_data.Dockerfile
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
🚧 Files skipped from review as they are similar to previous changes (68)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
  • flowetl/flowetl/MANIFEST.in
  • flowdb/testdata/bin/run_qa_checks.py
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
  • flowdb_synthetic_data.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
  • flowdb_testdata.Dockerfile
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
  • flowdb_testdata.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Flake8 (7.2.0)
flowetl/tests/integration/test_qa.py

[error] 10-10: line too long (103 > 79 characters)

(E501)


[error] 11-11: line too long (97 > 79 characters)

(E501)


[error] 26-26: line too long (103 > 79 characters)

(E501)


[error] 27-27: line too long (97 > 79 characters)

(E501)


[error] 43-43: line too long (103 > 79 characters)

(E501)


[error] 44-44: line too long (97 > 79 characters)

(E501)


[error] 62-62: line too long (103 > 79 characters)

(E501)


[error] 63-63: line too long (103 > 79 characters)

(E501)


[error] 64-64: line too long (87 > 79 characters)

(E501)


[error] 65-65: line too long (89 > 79 characters)

(E501)


[error] 68-68: line too long (86 > 79 characters)

(E501)


[error] 79-79: line too long (103 > 79 characters)

(E501)


[error] 80-80: line too long (103 > 79 characters)

(E501)


[error] 81-81: line too long (87 > 79 characters)

(E501)


[error] 83-83: line too long (81 > 79 characters)

(E501)


[error] 97-97: line too long (103 > 79 characters)

(E501)


[error] 98-98: line too long (103 > 79 characters)

(E501)


[error] 99-99: line too long (87 > 79 characters)

(E501)


[error] 100-100: line too long (89 > 79 characters)

(E501)


[error] 103-103: line too long (86 > 79 characters)

(E501)


[error] 114-114: line too long (103 > 79 characters)

(E501)


[error] 115-115: line too long (103 > 79 characters)

(E501)


[error] 116-116: line too long (87 > 79 characters)

(E501)


[error] 118-118: line too long (81 > 79 characters)

(E501)


[error] 132-132: line too long (103 > 79 characters)

(E501)


[error] 133-133: line too long (97 > 79 characters)

(E501)


[error] 148-148: line too long (83 > 79 characters)

(E501)


[error] 149-149: line too long (103 > 79 characters)

(E501)


[error] 150-150: line too long (97 > 79 characters)

(E501)


[error] 166-166: line too long (88 > 79 characters)

(E501)


[error] 196-196: line too long (82 > 79 characters)

(E501)


[error] 227-227: line too long (103 > 79 characters)

(E501)


[error] 228-228: line too long (97 > 79 characters)

(E501)


[error] 233-233: line too long (88 > 79 characters)

(E501)


[error] 245-245: line too long (103 > 79 characters)

(E501)


[error] 246-246: line too long (117 > 79 characters)

(E501)


[error] 247-247: line too long (91 > 79 characters)

(E501)


[error] 248-248: line too long (93 > 79 characters)

(E501)


[error] 251-251: line too long (88 > 79 characters)

(E501)


[error] 261-261: line too long (103 > 79 characters)

(E501)


[error] 262-262: line too long (97 > 79 characters)

(E501)


[error] 277-277: line too long (103 > 79 characters)

(E501)


[error] 278-278: line too long (97 > 79 characters)

(E501)


[error] 292-292: line too long (80 > 79 characters)

(E501)


[error] 293-293: line too long (103 > 79 characters)

(E501)


[error] 294-294: line too long (103 > 79 characters)

(E501)


[error] 295-295: line too long (87 > 79 characters)

(E501)


[error] 296-296: line too long (87 > 79 characters)

(E501)


[error] 297-297: line too long (89 > 79 characters)

(E501)


[error] 310-310: line too long (80 > 79 characters)

(E501)


[error] 311-311: line too long (103 > 79 characters)

(E501)


[error] 312-312: line too long (103 > 79 characters)

(E501)


[error] 313-313: line too long (87 > 79 characters)

(E501)


[error] 314-314: line too long (87 > 79 characters)

(E501)


[error] 315-315: line too long (89 > 79 characters)

(E501)


[error] 328-328: line too long (85 > 79 characters)

(E501)


[error] 329-329: line too long (103 > 79 characters)

(E501)


[error] 330-330: line too long (107 > 79 characters)

(E501)


[error] 333-333: line too long (82 > 79 characters)

(E501)


[error] 346-346: line too long (83 > 79 characters)

(E501)


[error] 347-347: line too long (103 > 79 characters)

(E501)


[error] 348-348: line too long (117 > 79 characters)

(E501)


[error] 349-349: line too long (87 > 79 characters)

(E501)


[error] 351-351: line too long (81 > 79 characters)

(E501)


[error] 365-365: line too long (103 > 79 characters)

(E501)


[error] 366-366: line too long (117 > 79 characters)

(E501)


[error] 367-367: line too long (91 > 79 characters)

(E501)


[error] 368-368: line too long (93 > 79 characters)

(E501)


[error] 380-380: line too long (88 > 79 characters)

(E501)


[error] 401-401: line too long (88 > 79 characters)

(E501)


[error] 423-423: line too long (103 > 79 characters)

(E501)


[error] 424-424: line too long (107 > 79 characters)

(E501)


[error] 425-425: line too long (81 > 79 characters)

(E501)


[error] 426-426: line too long (81 > 79 characters)

(E501)


[error] 427-427: line too long (83 > 79 characters)

(E501)


[error] 430-430: line too long (87 > 79 characters)

(E501)


[error] 440-440: line too long (103 > 79 characters)

(E501)


[error] 441-441: line too long (107 > 79 characters)

(E501)


[error] 442-442: line too long (81 > 79 characters)

(E501)


[error] 443-443: line too long (81 > 79 characters)

(E501)


[error] 444-444: line too long (83 > 79 characters)

(E501)


[error] 456-456: line too long (80 > 79 characters)

(E501)


[error] 457-457: line too long (103 > 79 characters)

(E501)


[error] 458-458: line too long (107 > 79 characters)

(E501)


[error] 461-461: line too long (81 > 79 characters)

(E501)

flowetl/flowetl/flowetl/util.py

[error] 38-38: undefined name 'DAG'

(F821)


[error] 58-58: line too long (93 > 79 characters)

(E501)


[error] 71-71: line too long (80 > 79 characters)

(E501)


[error] 261-261: line too long (113 > 79 characters)

(E501)


[error] 433-433: line too long (88 > 79 characters)

(E501)

flowetl/flowetl/flowetl/operators/qa_check_operator.py

[error] 10-10: line too long (114 > 79 characters)

(E501)

🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

🪛 LanguageTool
CHANGELOG.md

[style] ~22-~22: Would you like to use the Oxford spelling “organized”? The spelling ‘organised’ is also correct.
Context: ...rovided alongside dag files must now be organised into the correct subfolder for the ETL ...

(OXFORD_SPELLING_Z_NOT_S)

🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


12-12: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (7)
flowetl/tests/integration/conftest.py (1)

543-549: LGTM! Fixture properly updated for new QA check organisation.

The renaming from jinja_env to jinja_env_final_qa and the path change to the "final" subdirectory correctly aligns with the new stage-specific QA check structure described in the PR objectives.

flowetl/flowetl/flowetl/util.py (4)

26-34: Well-implemented ETL stage enum.

The ETLStage enum provides a clean way to represent the different ETL stages with proper typing support.


40-40: Good implementation of stage-aware QA check discovery.

The addition of the stage parameter with proper enum conversion and path construction correctly supports the stage-specific QA check structure.

Also applies to: 73-89


186-187: Clean parameter additions for stage-specific QA checks.

The new parameters additional_staging_qa_check_paths and additional_extract_qa_check_paths are well-integrated with appropriate documentation.

Also applies to: 261-262


369-379: Excellent integration of stage-specific QA checks into DAG flow.

The implementation correctly integrates QA checks at appropriate points in the ETL pipeline:

  • Staging QA checks after flux sensors
  • Extract QA checks after extraction/clustering
  • Proper use of EmptyOperator for task gathering

This provides comprehensive quality assurance throughout the ETL process.

Also applies to: 388-398, 429-437

flowetl/tests/integration/test_qa.py (2)

9-9: Consistent fixture renaming aligns with stage-based structure.

All test functions have been systematically updated to use jinja_env_final_qa instead of jinja_env, correctly reflecting the new stage-specific QA check organisation.

Also applies to: 25-25, 42-42, 61-61, 78-78, 96-96, 113-113, 131-131, 148-148, 166-166, 196-196, 226-226, 242-244, 260-260, 276-276, 292-292, 310-310, 328-328, 346-346, 364-364, 380-380, 401-401, 422-422, 439-439, 456-456


16-18: Template paths correctly updated for new directory structure.

All template path references have been systematically updated to include the CDR type subdirectory (e.g., f"{cdr_type}/count_added_rows.sql"), properly aligning with the new stage-based QA check organisation introduced in the broader PR.

Also applies to: 30-32, 49-51, 68-70, 86-88, 103-105, 121-123, 138-140, 156-158, 184-188, 214-218, 233-235, 251-253, 267-269, 283-285, 300-302, 318-320, 336-338, 354-356, 371-373, 392-394, 413-415, 430-432, 447-449, 464-466

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)

38-38: Fix undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error.

Apply this diff to fix the type annotation:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG

Or use a qualified string literal, which only resolves if the airflow module itself is imported:

-    dag: Optional["DAG"] = None,
+    dag: Optional["airflow.DAG"] = None,
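The first suggestion above can be sketched as a standalone module. This is a minimal illustration, assuming the annotation is only needed by static checkers; describe_dag is a hypothetical helper and not part of flowetl.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Evaluated by static type checkers only; never executed at runtime,
    # so this module can be imported without Airflow installed.
    from airflow import DAG


def describe_dag(dag: Optional["DAG"] = None) -> str:
    """Hypothetical helper: the quoted annotation is not evaluated at runtime."""
    if dag is None:
        return "no DAG supplied"
    return f"DAG {dag.dag_id}"
```

Because the import sits under TYPE_CHECKING, linters can resolve the name DAG while the runtime import cost and any circular-import risk are avoided.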

370-380: Use enum values consistently instead of string literals.

The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.

Apply this diff to use enum values consistently:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

387-397: Use enum values consistently instead of string literals.

Same issue as the previous comment - use enum values instead of string literals for type safety.

Apply this diff:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

432-437: Use enum value for extract stage; the gather operator addition looks good.

The extract stage should use the enum value instead of a string literal. However, the addition of the "gather" EmptyOperator is a good design pattern for coordinating parallel tasks.

Apply this diff:

-                    additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+                    additional_qa_check_paths=additional_qa_check_paths, stage=ETLStage.EXTRACT
🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)

97-97: Remove debug print statement.

This print statement appears to be leftover debug code and should be removed before merging.

-    print(templates)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d60bab3 and 654d99f.

📒 Files selected for processing (2)
  • flowetl/flowetl/flowetl/util.py (9 hunks)
  • flowetl/tests/integration/test_full_pipeline.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py

[error] 38-38: undefined name 'DAG'

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (3)
flowetl/tests/integration/test_full_pipeline.py (1)

109-109: LGTM! Test update aligns with implementation changes.

The addition of "gather" to the expected QA check types correctly reflects the new EmptyOperator introduced in the DAG construction logic.

flowetl/flowetl/flowetl/util.py (2)

26-34: Well-structured enum addition.

The ETLStage enum provides clear type safety for representing different ETL stages and follows good naming conventions.


187-189: Well-structured parameter additions.

The addition of stage-specific QA check path parameters provides good flexibility for customising QA checks per ETL stage whilst maintaining backwards compatibility.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)

38-38: Fix undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error.

Apply this diff to fix the type annotation:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG

Or alternatively, use a qualified string literal, which only resolves if the airflow module itself is imported:

-    dag: Optional["DAG"] = None,
+    dag: Optional["airflow.DAG"] = None,

372-382: Use enum values consistently instead of string literals.

The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.

Apply this diff to use enum values consistently:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

391-401: Use enum values consistently instead of string literals.

Similar to the previous comment, use the enum value instead of a string literal.

Apply this diff:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

432-448: Use enum value for extract stage and fix parameter usage.

The implementation correctly adds extract-stage QA checks and includes a "gather" operator, but should use enum values consistently and utilise the dedicated extract QA check paths parameter.

Apply this diff:

-                    additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+                    additional_qa_check_paths=additional_extract_qa_check_paths, stage=ETLStage.EXTRACT
🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)

78-79: Remove debug print statements.

These debug print statements should be removed before merging to production.

Apply this diff to remove the debug statements:

-    print(default_path)
-    print(list(default_path.walk()))

And:

-    print(templates)

Also applies to: 99-99

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 654d99f and 31c6d13.

📒 Files selected for processing (2)
  • flowetl/flowetl/flowetl/util.py (9 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py

[error] 38-38: undefined name 'DAG'

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (5)
flowetl/flowetl/flowetl/util.py (5)

26-34: Excellent addition of ETL stage enumeration.

The ETLStage enum is well-designed and provides clear stage categorisation for the QA check system. The inheritance from both str and Enum allows for easy string comparisons whilst maintaining type safety.
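A minimal sketch of the pattern described here, assuming the member names and values match the stage directories named in this review (the real definition in util.py may differ):

```python
from enum import Enum


class ETLStage(str, Enum):
    # Member values are assumptions inferred from the stage names in this review.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


# Because members are also str instances, they compare equal to plain strings.
assert ETLStage.STAGING == "staging"

# .value gives the plain string, e.g. for building a directory name.
stage_dir = f"qa_checks/{ETLStage.FINAL.value}"
```

Using .value when building paths is the safer choice, since how a mixed-in enum member renders inside an f-string has changed between Python versions.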


40-40: Good addition of stage parameter with appropriate default.

The stage parameter with ETLStage.FINAL as default maintains backward compatibility whilst enabling stage-specific QA checks.


73-91: Excellent implementation of stage-aware path construction.

The logic correctly converts the stage parameter to an enum value and constructs appropriate paths for both default and additional QA check locations. The path construction ensures stage-specific QA checks are properly discovered.


189-190: Appropriate addition of stage-specific QA check path parameters.

The new parameters additional_staging_qa_check_paths and additional_extract_qa_check_paths provide flexibility for specifying additional QA checks for each stage.


275-275: Good addition of EmptyOperator for task coordination.

The import and usage of EmptyOperator as a "gather" task provides proper coordination between extract-stage QA checks and subsequent constraint/index operations.

Also applies to: 439-439

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
flowetl/flowetl/flowetl/util.py (1)

432-448: Good workflow integration with stage separation, but use enum values.

The addition of extract-stage QA checks with the "gather" EmptyOperator provides clear separation between different stages of the workflow. However, the string literal should be replaced with the enum value.

Apply this diff to use the enum value:

-                    additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+                    additional_qa_check_paths=additional_extract_qa_check_paths, stage=ETLStage.EXTRACT

Note: I've also corrected the parameter name to use additional_extract_qa_check_paths instead of additional_qa_check_paths to match the function signature.

♻️ Duplicate comments (3)
flowetl/flowetl/flowetl/util.py (3)

38-38: Fix undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error.

Apply this diff to fix the type annotation:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG

Add this import block near the top of the file after the existing imports.


372-382: Use enum values consistently instead of string literals.

The stage parameter is passed as a string literal rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.

Apply this diff to use enum values consistently:

-                        stage="staging",
+                        stage=ETLStage.STAGING,

391-401: Use enum values consistently instead of string literals.

Same issue as the previous segment - use enum values for type safety.

Apply this diff:

-                        stage="staging",
+                        stage=ETLStage.STAGING,
🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)

78-79: Remove debug print statements.

Debug print statements should be removed before merging to production.

Apply this diff to remove the debug statements:

-    print(default_path)
-    print(list(default_path.glob("*.sql")))
-    print(templates)

Also applies to: 99-99

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31c6d13 and c7af5f6.

📒 Files selected for processing (1)
  • flowetl/flowetl/flowetl/util.py (9 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py

[error] 38-38: undefined name 'DAG'

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (7)
flowetl/flowetl/flowetl/util.py (7)

26-34: LGTM: Well-structured enum definition.

The ETLStage enum provides clear typing for the different stages of the ETL process and follows Python enum best practices by inheriting from both str and Enum.


40-40: Good addition of stage parameter with sensible default.

The stage parameter addition allows for stage-specific QA checks whilst maintaining backwards compatibility with the default FINAL stage.


56-59: Clear documentation of the new stage parameter.

The docstring update properly documents the stage parameter's purpose and default behaviour.


73-91: Solid implementation of stage-specific path resolution.

The logic correctly converts the stage to a string value and appends it to both default and additional search paths, enabling proper stage-specific QA check discovery.


189-190: Well-designed parameter additions for stage-specific paths.

The new parameters follow the existing naming convention and provide flexibility for different ETL stages.


264-265: Clear documentation for the new parameters.

The consolidated documentation for all three QA check path parameters improves readability.


275-275: Appropriate import addition for EmptyOperator.

The EmptyOperator import is correctly added to support the new "gather" task in the DAG workflow.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)

38-38: Fix undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error. This issue was previously flagged but remains unresolved.


391-401: Use enum values consistently instead of string literals.

The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies. This issue was previously flagged but remains unresolved.


410-420: Use enum values consistently instead of string literals.

The stage parameter is passed as string literals rather than using the defined enum values. This is the same issue identified in the FILE flux sensor logic above.


455-455: Use enum value for extract stage.

Consistent with the staging stage issue, use the enum value instead of a string literal. This was previously flagged but remains unresolved.

🧹 Nitpick comments (2)
flowetl/flowetl/flowetl/util.py (2)

111-114: Remove debug print statements.

The debug print statements should be removed from production code or replaced with proper logging.

-    for pth in set(search_paths):
-        print(f"Search path {pth}")
-        sql_files = pth.glob("*.sql")
-        print(list(pth.glob("*.sql")))
-        templates = [*templates, *sql_files]
+    for pth in set(search_paths):
+        sql_files = pth.glob("*.sql")
+        templates = [*templates, *sql_files]
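If the diagnostic output is still useful, the logging alternative mentioned above could look roughly like this. It is a sketch only: collect_sql_templates is a hypothetical name, and the sorting is added here purely to make the result deterministic.

```python
import logging
from pathlib import Path
from typing import Iterable, List

logger = logging.getLogger(__name__)


def collect_sql_templates(search_paths: Iterable[Path]) -> List[Path]:
    """Hypothetical helper: gather *.sql files, logging instead of printing."""
    templates: List[Path] = []
    # De-duplicate the search paths, as the original loop does with set().
    for pth in sorted(set(search_paths)):
        sql_files = sorted(pth.glob("*.sql"))
        logger.debug("Search path %s yielded %d templates", pth, len(sql_files))
        templates.extend(sql_files)
    return templates
```

With the default logging configuration the debug messages are suppressed, so nothing is printed unless the log level is lowered explicitly.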

121-127: Remove debug print statements from task ID construction.

The debug print statements should be removed, but the task ID construction logic is well-designed to include stage and CDR type information.

-        print(f"Parsing template {tmpl}")
         task_id = (
             f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
             if "cdr_type" in dag.params
             else f"{tmpl.stem}.{stage}"
         )
-        print(f"Task id is {task_id}")
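The task ID rule in the diff above can be isolated as a small function for illustration. make_task_id is a hypothetical name, and the params mapping stands in for dag.params.

```python
from typing import Any, Mapping


def make_task_id(template_stem: str, stage: str, params: Mapping[str, Any]) -> str:
    """Hypothetical helper mirroring the conditional shown in the diff."""
    if "cdr_type" in params:
        # CDR-type DAGs get the type in the middle: <check>.<cdr_type>.<stage>
        return f"{template_stem}.{params['cdr_type']}.{stage}"
    # DAGs without a cdr_type param fall back to <check>.<stage>
    return f"{template_stem}.{stage}"
```

For example, make_task_id("count_msisdns", "final", {"cdr_type": "calls"}) gives "count_msisdns.calls.final", which keeps the same check distinct across stages and data types.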
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7af5f6 and b5f90c7.

📒 Files selected for processing (2)
  • flowetl/flowetl/flowetl/util.py (8 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

38-38: Undefined name DAG

(F821)

🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py

[error] 38-38: undefined name 'DAG'

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (6)
flowetl/flowetl/flowetl/util.py (6)

26-34: Well-designed enum implementation.

The ETLStage enum properly inherits from both str and Enum, allowing seamless usage as string values whilst maintaining type safety. The stage values are clearly defined and documented.
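
A minimal sketch of the pattern being praised, assuming three members whose values mirror the `extract`, `staging` and `final` directory names; the real definition in `util.py` may differ in docstring and detail, and `coerce_stage` is a hypothetical helper:

```python
from enum import Enum


class ETLStage(str, Enum):
    """Stage of the ETL pipeline at which a QA check runs (assumed members)."""

    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def coerce_stage(stage):
    """Accept either an ETLStage or its string value; reject anything else."""
    return ETLStage(stage)  # raises ValueError for unknown stage names


# Members compare equal to their string values because of the str mixin.
assert ETLStage.FINAL == "final"
assert coerce_stage("staging") is ETLStage.STAGING
```

Calling the enum on an existing member returns that member unchanged, so the same conversion works for both strings and enum values.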


40-40: Good backward-compatible parameter addition.

The new stage parameter with a sensible default maintains backward compatibility whilst enabling the new stage-specific functionality.


73-73: Defensive type conversion is well-implemented.

The explicit conversion to ETLStage ensures type safety even if a string value is passed, preventing potential issues downstream.


75-107: Comprehensive and well-structured search path logic.

The implementation systematically constructs stage-specific search paths from multiple sources whilst properly handling edge cases like null template search paths and CDR type-specific subdirectories.


208-209: Consistent and well-named parameter additions.

The new parameters follow the established naming convention and maintain backward compatibility with sensible default values.


458-458: Good use of coordination operator.

The EmptyOperator serves as an effective coordination point to gather multiple QA check tasks before proceeding to the next stage of the pipeline.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)

37-76: Consider refactoring for improved readability.

The function correctly handles path disambiguation but has high complexity (16 local variables, 6 nested blocks). While the logic appears sound, consider breaking this into smaller helper functions to improve maintainability.

For example, separate the suffix generation logic from the uniqueness checking:

+def _generate_suffix_candidate(path_parts, depth):
+    """Generate a suffix candidate of given depth."""
+    suffix_parts = path_parts[:depth]
+    return "/".join(reversed(suffix_parts))
+
+def _is_suffix_unique(candidate_suffix, target_index, all_paths, path_parts_list, depth):
+    """Check if a suffix candidate is unique among all paths."""
+    for j, other_path in enumerate(all_paths):
+        if target_index != j:
+            other_parts = path_parts_list[j]
+            if len(other_parts) >= depth:
+                other_suffix = _generate_suffix_candidate(other_parts, depth)
+                if candidate_suffix == other_suffix:
+                    return False
+    return True
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 761333c and ae4a919.

📒 Files selected for processing (2)
  • flowetl/flowetl/flowetl/util.py (9 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

48-48: Loop control variable path not used within loop body

Rename unused path to _path

(B007)


58-58: Loop control variable other_path not used within loop body

Rename unused other_path to _other_path

(B007)


117-117: Undefined name DAG

(F821)

🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py

[error] 117-117: undefined name 'DAG'

(F821)

🪛 Pylint (3.3.7)
flowetl/flowetl/flowetl/util.py

[refactor] 37-37: Too many local variables (16/15)

(R0914)


[refactor] 48-73: Too many nested blocks (6/5)

(R1702)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (8)
flowetl/flowetl/flowetl/util.py (8)

4-4: Import addition looks good.

The defaultdict import is properly placed and used in the new disambiguation functionality.


27-35: Well-designed enum implementation.

The ETLStage enum is properly structured with clear naming and appropriate inheritance from both str and Enum for serialisation compatibility.


78-113: Clean path disambiguation implementation.

The function effectively groups paths by filename and applies disambiguation only where needed. The logic is clear and handles edge cases appropriately.


119-119: Well-designed parameter addition.

The stage parameter is properly typed with a sensible default that maintains backward compatibility.


286-287: Consistent parameter additions.

The new additional_staging_qa_check_paths and additional_extract_qa_check_paths parameters follow the established pattern and provide necessary stage-specific functionality.


372-372: Appropriate import addition.

The EmptyOperator import is correctly placed and needed for the new gather functionality in the DAG construction.


469-479: Well-structured QA check integration.

The staging QA checks are properly integrated into both flux sensor paths, maintaining consistency in the DAG structure. The pattern correctly places these checks after data validation but before extraction.

Also applies to: 488-498


531-536: Effective synchronisation pattern.

The extract QA checks integration with the gather operator provides a clean synchronisation point, ensuring all extract-stage checks complete before proceeding to constraint and index creation.

    for pathgroup in sorted(disambiguated_paths.values()):
        for tmpl in pathgroup:
            tmpl = Path(tmpl)
            print(f"Parsing template {tmpl}")

⚠️ Potential issue

Remove debug print statements.

These print statements appear to be debug code and should not be included in production code as they will clutter the logs.

Apply this diff to remove the debug statements:

-            print(f"Parsing template {tmpl}")
             task_id = (
                 f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
                 if "cdr_type" in dag.params
                 else f"{tmpl.stem}.{stage}"
             )
-            print(f"Task id is {task_id}")

Also applies to: 205-205

🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/util.py at lines 199 and 205, remove the debug print
statements that output parsing template information. These print statements are
for debugging purposes and should be deleted to avoid cluttering production
logs.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
flowetl/flowetl/flowetl/util.py (1)

188-213: Template processing logic is well-structured.

The disambiguation of templates and task ID generation incorporating stage and CDR type follows a logical pattern, though debug prints should be removed as noted in past reviews.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between eff1e05 and 88cf54c.

📒 Files selected for processing (3)
  • flowetl/flowetl/flowetl/util.py (10 hunks)
  • flowetl/tests/integration/conftest.py (2 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • flowetl/tests/integration/conftest.py
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

48-48: Loop control variable path not used within loop body

Rename unused path to _path

(B007)


58-58: Loop control variable other_path not used within loop body

Rename unused other_path to _other_path

(B007)


117-117: Undefined name DAG

(F821)

🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py

[error] 117-117: undefined name 'DAG'

(F821)

🪛 Pylint (3.3.7)
flowetl/flowetl/flowetl/util.py

[refactor] 37-37: Too many local variables (16/15)

(R0914)


[refactor] 48-73: Too many nested blocks (6/5)

(R1702)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (8)
flowetl/flowetl/flowetl/util.py (8)

27-35: Well-implemented ETL stage enum.

The ETLStage enum is properly defined with string values and clear documentation, providing type safety for stage operations.


78-112: Path disambiguation logic is sound.

The implementation correctly groups paths by filename and applies minimal distinguishing suffixes to resolve conflicts.


152-153: Good practice using enum validation.

Converting the stage parameter to ETLStage enum ensures type safety and validates the input value.


286-287: Additional QA check path parameters enhance flexibility.

The addition of stage-specific QA check path parameters provides good extensibility for different ETL stages.


460-484: Proper integration of staging QA checks with flux sensors.

The staging QA checks are correctly integrated into the DAG flow, executing after flux checks but before extraction, with appropriate gathering operators.


494-512: Consistent staging check integration across flux sensor types.

The staging QA check integration is properly implemented for both table flux sensors and no-flux scenarios, maintaining consistency.


535-571: Extract QA checks properly positioned in DAG workflow.

The extract QA checks are correctly placed after extraction/clustering and before constraints/indexes, following logical ETL progression.


154-186: QA Checks Path Logic Verified

  • Default QA checks directories for all stages (extract, staging, final) exist under flowetl/flowetl/flowetl/qa_checks/.
  • A DAG-specific override directory is present only for final under flowetl/mounts/dags/qa_checks/.
  • The code’s search order (defaults → DAG file location → template paths → CDR type → additional paths) will correctly locate checks and fall back to defaults when override directories are absent.

No changes required.
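
The search order described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name, its parameters, and the exact directory layout are hypothetical and are not taken from `util.py`; only the ordering follows the bullet above.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Sequence


def build_qa_search_paths(
    *,
    default_root: PurePosixPath,
    dag_folder: PurePosixPath,
    template_searchpath: Optional[Sequence[str]],
    cdr_type: Optional[str],
    additional_paths: Optional[Sequence[str]],
    stage: str,
) -> List[PurePosixPath]:
    """Order candidate directories: defaults, DAG folder, template paths,
    CDR type subdirectories, then additional paths (hypothetical helper)."""
    bases = [default_root / "qa_checks", dag_folder / "qa_checks"]
    # A missing template search path is treated as an empty list.
    bases += [PurePosixPath(p) / "qa_checks" for p in (template_searchpath or [])]
    candidates = [base / stage for base in bases]
    if cdr_type is not None:
        candidates += [base / stage / cdr_type for base in bases]
    candidates += [PurePosixPath(p) for p in (additional_paths or [])]
    # Drop repeats while preserving the order in which paths were added.
    return list(dict.fromkeys(candidates))
```

Directories that do not exist would simply contribute no templates when globbed, which is how the fallback to defaults would work under these assumptions.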

Comment on lines +37 to +76
def find_minimal_distinguishing_suffix(paths):
    """Find the minimal suffix needed to distinguish between paths."""
    if len(paths) == 1:
        return [(paths[0].name, 1)]  # Return (suffix, depth) tuple

    # Convert paths to lists of parts for easier manipulation (excluding root '/')
    path_parts = [list(reversed(path.parts[1:])) for path in paths]  # Skip the root '/'
    min_parts = min(len(parts) for parts in path_parts)

    distinguishing_suffixes = []

    for i, path in enumerate(paths):
        parts = path_parts[i]

        # Start with just the filename
        for depth in range(1, min_parts + 1):
            suffix_parts = parts[:depth]
            candidate_suffix = "/".join(reversed(suffix_parts))

            # Check if this suffix is unique among all paths
            is_unique = True
            for j, other_path in enumerate(paths):
                if i != j:
                    other_parts = path_parts[j]
                    if len(other_parts) >= depth:
                        other_suffix = "/".join(reversed(other_parts[:depth]))
                        if candidate_suffix == other_suffix:
                            is_unique = False
                            break

            if is_unique:
                distinguishing_suffixes.append((candidate_suffix, depth))
                break
        else:
            # If we can't find a unique suffix, use relative path from root
            full_suffix = "/".join(reversed(parts))
            distinguishing_suffixes.append((full_suffix, len(parts)))

    return distinguishing_suffixes

🛠️ Refactor suggestion

Reduce complexity and fix unused loop variables.

The function has high complexity with nested loops and could be simplified. Additionally, static analysis correctly identifies unused loop variables.

Apply this diff to fix the unused variables:

-    for i, path in enumerate(paths):
+    for i, _path in enumerate(paths):
         parts = path_parts[i]
 
         # Start with just the filename
         for depth in range(1, min_parts + 1):
             suffix_parts = parts[:depth]
             candidate_suffix = "/".join(reversed(suffix_parts))
 
             # Check if this suffix is unique among all paths
             is_unique = True
-            for j, other_path in enumerate(paths):
+            for j, _other_path in enumerate(paths):
                 if i != j:

Consider refactoring this function to reduce nesting and improve readability, perhaps by extracting the uniqueness check into a separate helper function.
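
One possible shape for that refactor, offered as a sketch rather than a drop-in replacement: the helper name `_is_unique_at_depth` is hypothetical, and the length guard from the original is omitted because `depth` never exceeds the shortest path. It is intended to return the same `(suffix, depth)` tuples as the function under review.

```python
from pathlib import PurePosixPath


def _is_unique_at_depth(index, depth, path_parts):
    """True if no other path shares the trailing `depth` parts of path `index`."""
    candidate = path_parts[index][:depth]
    return all(
        other[:depth] != candidate
        for j, other in enumerate(path_parts)
        if j != index
    )


def find_minimal_distinguishing_suffix(paths):
    """Find the minimal suffix needed to distinguish between paths."""
    if len(paths) == 1:
        return [(paths[0].name, 1)]
    # Reverse the parts so index 0 is the filename; skip the root '/'.
    path_parts = [list(reversed(path.parts[1:])) for path in paths]
    min_parts = min(len(parts) for parts in path_parts)
    suffixes = []
    for i, parts in enumerate(path_parts):
        for depth in range(1, min_parts + 1):
            if _is_unique_at_depth(i, depth, path_parts):
                suffixes.append(("/".join(reversed(parts[:depth])), depth))
                break
        else:
            # No unique suffix found: fall back to the full relative path.
            suffixes.append(("/".join(reversed(parts)), len(parts)))
    return suffixes


example = [
    PurePosixPath("/qa_checks/final/calls/count_imeis.sql"),
    PurePosixPath("/qa_checks/final/sms/count_imeis.sql"),
]
print(find_minimal_distinguishing_suffix(example))
```

Iterating over `path_parts` directly also removes both unused loop variables, and the nesting drops from six levels to three.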

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def find_minimal_distinguishing_suffix(paths):
    """Find the minimal suffix needed to distinguish between paths."""
    if len(paths) == 1:
        return [(paths[0].name, 1)]  # Return (suffix, depth) tuple
    # Convert paths to lists of parts for easier manipulation (excluding root '/')
    path_parts = [list(reversed(path.parts[1:])) for path in paths]  # Skip the root '/'
    min_parts = min(len(parts) for parts in path_parts)
    distinguishing_suffixes = []
    for i, _path in enumerate(paths):
        parts = path_parts[i]
        # Start with just the filename
        for depth in range(1, min_parts + 1):
            suffix_parts = parts[:depth]
            candidate_suffix = "/".join(reversed(suffix_parts))
            # Check if this suffix is unique among all paths
            is_unique = True
            for j, _other_path in enumerate(paths):
                if i != j:
                    other_parts = path_parts[j]
                    if len(other_parts) >= depth:
                        other_suffix = "/".join(reversed(other_parts[:depth]))
                        if candidate_suffix == other_suffix:
                            is_unique = False
                            break
            if is_unique:
                distinguishing_suffixes.append((candidate_suffix, depth))
                break
        else:
            # If we can't find a unique suffix, use relative path from root
            full_suffix = "/".join(reversed(parts))
            distinguishing_suffixes.append((full_suffix, len(parts)))
    return distinguishing_suffixes
🧰 Tools
🪛 Ruff (0.11.9)

48-48: Loop control variable path not used within loop body

Rename unused path to _path

(B007)


58-58: Loop control variable other_path not used within loop body

Rename unused other_path to _other_path

(B007)

🪛 Pylint (3.3.7)

[refactor] 37-37: Too many local variables (16/15)

(R0914)


[refactor] 48-73: Too many nested blocks (6/5)

(R1702)

🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/util.py between lines 37 and 76, the function
find_minimal_distinguishing_suffix has nested loops causing high complexity and
contains unused loop variables. To fix this, extract the uniqueness check into a
separate helper function to reduce nesting and improve readability. Also, remove
or rename any unused loop variables to satisfy static analysis. This refactoring
will simplify the logic and make the code easier to maintain.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
flowetl/tests/integration/test_qa.py (1)

16-16: Remove unnecessary f-string prefix.

The string has no placeholders, so the f prefix is unnecessary.

-    check_sql = jinja_env_staging_qa.get_template(f"count_added_rows.sql").render(
+    check_sql = jinja_env_staging_qa.get_template("count_added_rows.sql").render(
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 88cf54c and ca1ef34.

📒 Files selected for processing (4)
  • flowetl/tests/integration/conftest.py (1 hunks)
  • flowetl/tests/integration/test_dags_present.py (2 hunks)
  • flowetl/tests/integration/test_full_pipeline.py (1 hunks)
  • flowetl/tests/integration/test_qa.py (19 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • flowetl/tests/integration/test_full_pipeline.py
  • flowetl/tests/integration/conftest.py
  • flowetl/tests/integration/test_dags_present.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
flowetl/tests/integration/test_qa.py (2)
flowetl/tests/integration/conftest.py (3)
  • flowdb_transaction (563-566)
  • jinja_env_staging_qa (553-559)
  • jinja_env_final_qa (543-549)
flowmachine/flowmachine/core/query_state.py (1)
  • execute (301-312)
🪛 Ruff (0.11.9)
flowetl/tests/integration/test_qa.py

16-16: f-string without any placeholders

Remove extraneous f prefix

(F541)

🪛 Flake8 (7.2.0)
flowetl/tests/integration/test_qa.py

[error] 16-16: f-string is missing placeholders

(F541)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (2)
flowetl/tests/integration/test_qa.py (2)

9-9: Correct fixture usage for staging QA checks.

The function correctly uses jinja_env_staging_qa fixture, which aligns with the staging-specific QA check reorganisation.


25-25: Systematic migration to stage-specific QA fixtures completed correctly.

All final QA tests have been properly updated to use the jinja_env_final_qa fixture and the new subdirectory-based template paths ({cdr_type}/template_name.sql). This aligns perfectly with the QA check reorganisation by ETL stage.

Also applies to: 30-32, 42-42, 49-51, 61-61, 68-68, 78-78, 86-88, 96-96, 103-103, 113-113, 121-123, 131-131, 138-140, 148-148, 156-158, 166-166, 184-186, 196-196, 214-216, 226-226, 233-233, 242-244, 251-251, 260-260, 267-269, 276-276, 283-285, 292-292, 300-302, 310-310, 318-320, 328-328, 336-336, 346-346, 354-354, 364-364, 371-373, 380-380, 392-392, 401-401, 413-413, 422-422, 430-430, 439-439, 447-449, 456-456, 464-466

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
flowetl/tests/integration/test_qa.py (1)

9-21: Fix unnecessary f-string and verify template path consistency.

The f-string on line 16 has no placeholders and should be a regular string. Additionally, check whether the count_added_rows.sql template should sit under a CDR type subdirectory, as the other templates in this file do.

-    check_sql = jinja_env_final_qa.get_template(f"count_added_rows.sql").render(
+    check_sql = jinja_env_final_qa.get_template("count_added_rows.sql").render(
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ca1ef34 and 4a18987.

📒 Files selected for processing (1)
  • flowetl/tests/integration/test_qa.py (19 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
flowetl/tests/integration/test_qa.py (1)
flowetl/tests/integration/conftest.py (2)
  • flowdb_transaction (563-566)
  • jinja_env_final_qa (543-549)
🪛 Ruff (0.11.9)
flowetl/tests/integration/test_qa.py

16-16: f-string without any placeholders

Remove extraneous f prefix

(F541)

🪛 Flake8 (7.2.0)
flowetl/tests/integration/test_qa.py

[error] 16-16: f-string is missing placeholders

(F541)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (1)
flowetl/tests/integration/test_qa.py (1)

25-471: LGTM! Systematic updates align with QA check reorganisation.

The fixture name changes and template path updates are consistent throughout all test functions, correctly adapting to the new hierarchical QA check directory structure with CDR type-specific subdirectories.

codecov bot commented Jun 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.30%. Comparing base (560f5a5) to head (46a66da).

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #7067       +/-   ##
===========================================
+ Coverage   76.31%   88.30%   +11.98%     
===========================================
  Files         277      231       -46     
  Lines       10778     9181     -1597     
  Branches      697      563      -134     
===========================================
- Hits         8225     8107      -118     
+ Misses       2289      942     -1347     
+ Partials      264      132      -132     


@greenape greenape force-pushed the add-garbage-cell-info-check branch from 4a18987 to 5974859 Compare June 30, 2025 12:42
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)

460-484: Use enum values consistently instead of string literals.

The stage parameter is passed as string literals rather than using the defined enum values, which reduces type safety and could lead to inconsistencies.

Apply this diff to use enum values consistently:

         staging_checks = get_qa_checks(
             additional_qa_check_paths=additional_qa_check_paths,
-            stage="staging",
+            stage=ETLStage.STAGING,
         )

37-76: Reduce complexity and fix unused loop variables.

The function has high complexity with nested loops and could be simplified. Additionally, static analysis correctly identifies unused loop variables.

Apply this diff to fix the unused variables:

-    for i, path in enumerate(paths):
+    for i, _path in enumerate(paths):
         parts = path_parts[i]
 
         # Start with just the filename
         for depth in range(1, min_parts + 1):
             suffix_parts = parts[:depth]
             candidate_suffix = "/".join(reversed(suffix_parts))
 
             # Check if this suffix is unique among all paths
             is_unique = True
-            for j, other_path in enumerate(paths):
+            for j, _other_path in enumerate(paths):
                 if i != j:

Consider refactoring this function to reduce nesting and improve readability, perhaps by extracting the uniqueness check into a separate helper function.


117-117: Fix undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error.

Apply this diff to fix the type annotation:

def get_qa_checks(
    *,
-    dag: Optional["DAG"] = None,
+    dag: Optional["airflow.DAG"] = None,
    additional_qa_check_paths: Optional[List[str]] = None,
    stage: Optional[ETLStage] = ETLStage.FINAL,
) -> List["QACheckOperator"]:

Alternatively, add the import at the top of the file:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG
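
A self-contained sketch of the `TYPE_CHECKING` variant, using a hypothetical stand-in function rather than the real `get_qa_checks`. The guarded import is skipped at runtime, so the annotation must stay quoted. Note that a quoted `"airflow.DAG"` annotation, as in the first diff, would likely still be reported by pyflakes-based linters unless the name `airflow` is itself importable in the module.

```python
from typing import TYPE_CHECKING, List, Optional

if TYPE_CHECKING:
    # Seen by type checkers and linters only; never executed at runtime,
    # so this module does not need Airflow installed to be imported.
    from airflow import DAG


def describe_dag(*, dag: Optional["DAG"] = None) -> List[str]:
    """Hypothetical stand-in for get_qa_checks, showing only the annotation."""
    if dag is None:
        return []
    return [dag.dag_id]


print(describe_dag())  # no DAG supplied
```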

535-537: Use enum value for extract stage.

Consistent with the staging stage issue, use the enum value instead of a string literal.

Apply this diff:

         extract_checks = get_qa_checks(
-            additional_qa_check_paths=additional_qa_check_paths, stage="extract"
+            additional_qa_check_paths=additional_qa_check_paths, stage=ETLStage.EXTRACT
         )
🧹 Nitpick comments (1)
flowetl/tests/integration/test_qa.py (1)

16-16: Remove unnecessary f-string prefix.

The f-string doesn't contain any placeholders, so it should be a regular string.

-    check_sql = jinja_env_final_qa.get_template(f"count_added_rows.sql").render(
+    check_sql = jinja_env_final_qa.get_template("count_added_rows.sql").render(
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 4a18987 and 5974859.

📒 Files selected for processing (79)
  • CHANGELOG.md (1 hunks)
  • flowdb/testdata/bin/run_qa_checks.py (1 hunks)
  • flowdb_synthetic_data.Dockerfile (1 hunks)
  • flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
  • flowdb_testdata.Dockerfile (1 hunks)
  • flowdb_testdata.Dockerfile.dockerignore (1 hunks)
  • flowetl/flowetl/MANIFEST.in (1 hunks)
  • flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (10 hunks)
  • flowetl/tests/integration/conftest.py (1 hunks)
  • flowetl/tests/integration/test_dags_present.py (2 hunks)
  • flowetl/tests/integration/test_full_pipeline.py (1 hunks)
  • flowetl/tests/integration/test_qa.py (19 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (7 hunks)
💤 Files with no reviewable changes (2)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
✅ Files skipped from review due to trivial changes (1)
  • flowdb_testdata.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (73)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
  • flowdb_testdata.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
  • flowetl/flowetl/MANIFEST.in
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
  • flowdb/testdata/bin/run_qa_checks.py
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
  • flowdb_synthetic_data.Dockerfile
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
  • flowetl/tests/integration/test_full_pipeline.py
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
  • flowdb_synthetic_data.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
  • flowetl/flowetl/flowetl/operators/qa_check_operator.py
  • flowetl/tests/integration/conftest.py
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
  • flowetl/tests/integration/test_dags_present.py
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py

[error] 117-117: undefined name 'DAG'

(F821)

flowetl/tests/integration/test_qa.py

[error] 16-16: f-string is missing placeholders

(F541)

🪛 Pylint (3.3.7)
flowetl/flowetl/flowetl/util.py

[refactor] 37-37: Too many local variables (16/15)

(R0914)


[refactor] 48-73: Too many nested blocks (6/5)

(R1702)

🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

48-48: Loop control variable path not used within loop body

Rename unused path to _path

(B007)


58-58: Loop control variable other_path not used within loop body

Rename unused other_path to _other_path

(B007)


117-117: Undefined name DAG

(F821)

flowetl/tests/integration/test_qa.py

16-16: f-string without any placeholders

Remove extraneous f prefix

(F541)

🪛 LanguageTool
CHANGELOG.md

[style] ~22-~22: Would you like to use the Oxford spelling “organized”? The spelling ‘organised’ is also correct.
Context: ...rovided alongside dag files must now be organised into the correct subfolder for the ETL ...

(OXFORD_SPELLING_Z_NOT_S)

🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


12-12: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (6)
CHANGELOG.md (1)

10-22: LGTM - Well-documented feature additions.

The changelog entries clearly document the new cell info QA checks and ETL enhancements. The formatting suggestions from static analysis are minor style preferences and don't affect functionality.

flowetl/flowetl/flowetl/util.py (4)

27-34: LGTM - Well-defined enum for ETL stages.

The ETLStage enum provides clear type safety for representing different ETL stages and will help prevent string literal errors throughout the codebase.


152-185: LGTM - Comprehensive stage-specific path construction.

The logic properly constructs search paths for different ETL stages, handles CDR type-specific checks, and integrates additional paths. The stage validation ensures type safety.


286-287: LGTM - Enhanced function signature for stage-specific QA checks.

The addition of additional_staging_qa_check_paths and additional_extract_qa_check_paths parameters properly extends the DAG creation functionality to support multi-stage QA checks.


460-575: LGTM - Well-structured multi-stage QA check integration.

The DAG construction logic properly integrates staging and extract QA checks with appropriate EmptyOperator chaining. The conditional logic ensures clean execution paths whether or not QA checks are present for each stage.

flowetl/tests/integration/test_qa.py (1)

9-470: Well-executed systematic refactoring of QA test integration.

The systematic changes to use the jinja_env_final_qa fixture and updated template paths with CDR type subdirectories correctly reflect the reorganization of QA check templates into stage- and type-specific subfolders. The changes maintain all existing test logic whilst adapting to the new template structure.

Comment on lines +194 to +213
    disambiguated_paths = disambiguate_paths(templates)
    dag.template_searchpath = [*dag.template_searchpath, *disambiguated_paths.keys()]
    for pathgroup in sorted(disambiguated_paths.values()):
        for tmpl in pathgroup:
            tmpl = Path(tmpl)
            print(f"Parsing template {tmpl}")
            task_id = (
                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
                if "cdr_type" in dag.params
                else f"{tmpl.stem}.{stage}"
            )
            print(f"Task id is {task_id}")
            ops.append(
                QACheckOperator(
                    task_id=task_id,
                    sql=str(tmpl),
                    dag=dag,
                )
            )
    return ops

⚠️ Potential issue

Remove debug print statements and use the enum value in task IDs.

Two issues need to be addressed in this code segment:

  1. Debug print statements should be removed from production code
  2. The task ID should interpolate stage.value rather than the enum member itself, so the suffix is always the plain stage string

Apply this diff to fix these issues:

     for pathgroup in sorted(disambiguated_paths.values()):
         for tmpl in pathgroup:
             tmpl = Path(tmpl)
-            print(f"Parsing template {tmpl}")
             task_id = (
-                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
+                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage.value}"
                 if "cdr_type" in dag.params
-                else f"{tmpl.stem}.{stage}"
+                else f"{tmpl.stem}.{stage.value}"
             )
-            print(f"Task id is {task_id}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    disambiguated_paths = disambiguate_paths(templates)
    dag.template_searchpath = [*dag.template_searchpath, *disambiguated_paths.keys()]
    for pathgroup in sorted(disambiguated_paths.values()):
        for tmpl in pathgroup:
            tmpl = Path(tmpl)
            task_id = (
                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage.value}"
                if "cdr_type" in dag.params
                else f"{tmpl.stem}.{stage.value}"
            )
            ops.append(
                QACheckOperator(
                    task_id=task_id,
                    sql=str(tmpl),
                    dag=dag,
                )
            )
    return ops
🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/util.py between lines 194 and 213, remove the debug
print statements that log template parsing and task id generation. In the
task_id f-strings, interpolate stage.value instead of stage so that the suffix
is the plain stage string; the 'cdr_type' lookup in dag.params stays as it is.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
flowetl/flowetl/flowetl/util.py (3)

37-76: Address previously identified issues.

The unused loop variables and high complexity issues in this function have already been flagged in previous reviews.


199-199: Remove debug print statements.

These debug print statements have been previously flagged and should be removed from production code.

Also applies to: 205-205


201-203: Use the enum value in the task ID.

The task ID f-strings should interpolate stage.value rather than stage, so that the suffix is the plain stage string regardless of how the enum member itself is rendered.

Apply this diff to use the enum value:

            task_id = (
-                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
+                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage.value}"
                if "cdr_type" in dag.params
-                else f"{tmpl.stem}.{stage}"
+                else f"{tmpl.stem}.{stage.value}"
            )
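As a minimal sketch of why `.value` is the safer choice: the `ETLStage` member names and values below are assumptions based on the stage names in the PR summary, and `make_task_id` is a hypothetical helper, not the function in util.py. Interpolating `.value` yields the plain stage string however the member itself would be formatted.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class ETLStage(str, Enum):
    # Assumed members; the real enum is defined in flowetl/util.py.
    EXTRACT = "extract"
    STAGING = "staging"
    FINAL = "final"


def make_task_id(template: str, stage: ETLStage, cdr_type: Optional[str] = None) -> str:
    """Build '<check>.<cdr_type>.<stage>' or '<check>.<stage>' (hypothetical helper)."""
    stem = Path(template).stem
    # .value is always the underlying string, whereas the text produced by
    # formatting the member directly has differed between Python versions.
    if cdr_type is not None:
        return f"{stem}.{cdr_type}.{stage.value}"
    return f"{stem}.{stage.value}"


print(make_task_id("final/calls/count_msisdns.sql", ETLStage.FINAL, "calls"))
print(make_task_id("staging/count_staging_rows.sql", ETLStage.STAGING))
```

Looking a stage up by its string, as in `ETLStage("staging")`, raises `ValueError` for an unknown name, which is one way stage arguments could be validated.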
🧹 Nitpick comments (2)
flowetl/flowetl/flowetl/util.py (2)

372-372: Remove unused import.

The EmptyOperator import is not used anywhere in the code and should be removed.

Apply this diff:

-from airflow.operators.empty import EmptyOperator

567-567: Remove unused variable assignment.

The final_checks variable is assigned but never used, creating unnecessary code.

Apply this diff to remove the unused assignment:

-        final_checks = (
-            get_qa_checks(additional_qa_check_paths=additional_qa_check_paths),
-        )
+        get_qa_checks(additional_qa_check_paths=additional_qa_check_paths)
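Note that the removed assignment wrapped the call in parentheses with a trailing comma, which builds a one-element tuple rather than holding the returned checks directly. A small illustration, using a hypothetical stand-in for `get_qa_checks`:

```python
def get_checks():
    # Hypothetical stand-in for get_qa_checks; returns a list of check names.
    return ["count_msisdns", "count_imeis"]


# The trailing comma makes this a tuple whose single element is the list.
wrapped = (
    get_checks(),
)
# Without the comma the name is bound to the list itself.
plain = get_checks()

print(type(wrapped).__name__, len(wrapped))
print(type(plain).__name__, len(plain))
```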
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5974859 and 63bad7b.

📒 Files selected for processing (2)
  • flowetl/flowetl/flowetl/util.py (10 hunks)
  • flowetl/tests/integration/test_dags_present.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • flowetl/tests/integration/test_dags_present.py
🧰 Additional context used
🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

48-48: Loop control variable path not used within loop body

Rename unused path to _path

(B007)


58-58: Loop control variable other_path not used within loop body

Rename unused other_path to _other_path

(B007)


117-117: Undefined name DAG

(F821)


372-372: airflow.operators.empty.EmptyOperator imported but unused

Remove unused import: airflow.operators.empty.EmptyOperator

(F401)


567-567: Local variable final_checks is assigned to but never used

Remove assignment to unused variable final_checks

(F841)

🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py

[error] 117-117: undefined name 'DAG'

(F821)


[error] 372-372: 'airflow.operators.empty.EmptyOperator' imported but unused

(F401)


[error] 567-567: local variable 'final_checks' is assigned to but never used

(F841)

🪛 Pylint (3.3.7)
flowetl/flowetl/flowetl/util.py

[refactor] 37-37: Too many local variables (16/15)

(R0914)


[refactor] 48-73: Too many nested blocks (6/5)

(R1702)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (5)
flowetl/flowetl/flowetl/util.py (5)

4-4: Well-implemented enum and import addition.

The defaultdict import is properly used in the new disambiguation logic, and the ETLStage enum follows best practices by inheriting from both str and Enum for type safety and string conversion.

Also applies to: 27-35


78-113: Good implementation of path disambiguation logic.

The function effectively groups and disambiguates paths using a sound algorithmic approach. The use of defaultdict is appropriate for the grouping operations.


119-119: Good addition of stage parameter.

The stage parameter with a sensible default of ETLStage.FINAL enhances the function's flexibility for different ETL stages.


286-287: Well-designed parameter additions.

The new parameters for stage-specific QA check paths follow the established pattern and provide good flexibility for different ETL stages.

Also applies to: 361-362


461-467: Excellent integration of stage-specific QA checks.

The implementation correctly incorporates staging and extract QA checks at the appropriate points in the ETL pipeline flow. The use of TaskGroup provides good organization and the conditional logic properly handles cases where no checks are present.

Also applies to: 478-501, 524-530, 538-562

@greenape greenape force-pushed the add-garbage-cell-info-check branch from 63bad7b to 1befbaf June 30, 2025 14:38
@greenape greenape force-pushed the add-garbage-cell-info-check branch from 1befbaf to 46a66da June 30, 2025 14:40
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (4)
flowetl/flowetl/flowetl/util.py (4)

117-117: Fix undefined type annotation.

The DAG type is used in the type hint but not imported, causing a static analysis error.

Apply this diff to fix the type annotation:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from airflow import DAG

Or import it directly if it's always available in the runtime context.
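A minimal sketch of the guarded import, assuming the annotation is written as a string (or that postponed evaluation of annotations is enabled) so the name is never evaluated at runtime. `describe_dag` is a hypothetical function used only for illustration, not one from util.py.

```python
from types import SimpleNamespace
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers and linters only; this branch never runs,
    # so importing the module does not require airflow to be loaded.
    from airflow import DAG


def describe_dag(dag: "DAG") -> str:
    """Return a short label for a DAG-like object (hypothetical example)."""
    return f"dag:{dag.dag_id}"


# At runtime any object with a dag_id attribute is accepted.
print(describe_dag(SimpleNamespace(dag_id="etl_calls")))
```

The guard resolves the undefined-name warning without adding an import cost at module load; a direct import is simpler if airflow is always importable where util.py is used.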


37-76: Refactor complex function to improve maintainability.

This function has high complexity with excessive nesting and unused loop variables, making it difficult to maintain and understand.

Apply this diff to fix the unused variables:

-    for i, path in enumerate(paths):
+    for i, _path in enumerate(paths):
         parts = path_parts[i]
 
         # Start with just the filename
         for depth in range(1, min_parts + 1):
             suffix_parts = parts[:depth]
             candidate_suffix = "/".join(reversed(suffix_parts))
 
             # Check if this suffix is unique among all paths
             is_unique = True
-            for j, other_path in enumerate(paths):
+            for j, _other_path in enumerate(paths):
                 if i != j:

Consider extracting the uniqueness check logic into a separate helper function to reduce nesting.
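One way the extraction could look is sketched below. This is not the function under review: `shortest_unique_suffixes` and `_suffix_is_unique` are hypothetical names, and the sketch assumes the goal is the shortest trailing run of path components that no other path shares. Moving the comparison into a helper keeps the main loop to two levels of nesting and removes the unused loop variables.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Sequence, Tuple


def _suffix_is_unique(index: int, depth: int, all_parts: Sequence[Tuple[str, ...]]) -> bool:
    """True if the last `depth` components of path `index` match no other path."""
    candidate = all_parts[index][-depth:]
    return all(
        other[-depth:] != candidate
        for j, other in enumerate(all_parts)
        if j != index
    )


def shortest_unique_suffixes(paths: List[str]) -> Dict[str, str]:
    """Map each path to its shortest suffix that distinguishes it from the rest."""
    all_parts = [PurePosixPath(p).parts for p in paths]
    result = {}
    for i, path in enumerate(paths):
        parts = all_parts[i]
        # Start with just the filename and add parent directories as needed;
        # fall back to the whole path if no shorter suffix is unique.
        depth = next(
            (d for d in range(1, len(parts) + 1) if _suffix_is_unique(i, d, all_parts)),
            len(parts),
        )
        result[path] = "/".join(parts[-depth:])
    return result


example = shortest_unique_suffixes(
    [
        "qa_checks/final/calls/count_msisdns.sql",
        "qa_checks/final/sms/count_msisdns.sql",
        "qa_checks/staging/count_staging_rows.sql",
    ]
)
for full_path, suffix in example.items():
    print(full_path, "->", suffix)
```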


199-205: Remove debug print statements.

These debug print statements should not be included in production code as they will clutter the logs.

Apply this diff to remove the debug statements:

-            print(f"Parsing template {tmpl}")
             task_id = (
                 f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
                 if "cdr_type" in dag.params
                 else f"{tmpl.stem}.{stage}"
             )
-            print(f"Task id is {task_id}")

201-203: Use enum values consistently in string formatting.

The stage parameter should use .value when converted to string to ensure consistent behaviour with enum types.

Apply this diff to use enum values properly:

             task_id = (
-                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage}"
+                f"{tmpl.stem}.{dag.params['cdr_type']}.{stage.value}"
                 if "cdr_type" in dag.params
-                else f"{tmpl.stem}.{stage}"
+                else f"{tmpl.stem}.{stage.value}"
             )
🧹 Nitpick comments (1)
flowetl/flowetl/flowetl/util.py (1)

372-372: Remove unused import.

The EmptyOperator import is not used anywhere in the code.

Apply this diff to remove the unused import:

-    from airflow.operators.empty import EmptyOperator
     from airflow.utils.task_group import TaskGroup
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63bad7b and 46a66da.

📒 Files selected for processing (79)
  • CHANGELOG.md (1 hunks)
  • flowdb/testdata/bin/run_qa_checks.py (1 hunks)
  • flowdb_synthetic_data.Dockerfile (1 hunks)
  • flowdb_synthetic_data.Dockerfile.dockerignore (1 hunks)
  • flowdb_testdata.Dockerfile (1 hunks)
  • flowdb_testdata.Dockerfile.dockerignore (1 hunks)
  • flowetl/flowetl/MANIFEST.in (1 hunks)
  • flowetl/flowetl/flowetl/operators/qa_check_operator.py (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql (0 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql (1 hunks)
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql (1 hunks)
  • flowetl/flowetl/flowetl/util.py (10 hunks)
  • flowetl/tests/integration/conftest.py (1 hunks)
  • flowetl/tests/integration/test_dags_present.py (2 hunks)
  • flowetl/tests/integration/test_full_pipeline.py (1 hunks)
  • flowetl/tests/integration/test_qa.py (19 hunks)
  • flowetl/tests/unit/test_qa_check_discovery.py (7 hunks)
💤 Files with no reviewable changes (2)
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_duplicated.sql
✅ Files skipped from review due to trivial changes (3)
  • flowdb_testdata.Dockerfile
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_duplicated.sql
🚧 Files skipped from review as they are similar to previous changes (71)
  • flowetl/flowetl/flowetl/qa_checks/final/calls/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_location_ids.sql
  • flowetl/tests/integration/test_full_pipeline.py
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_location_ids.sql
  • flowetl/flowetl/MANIFEST.in
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat_or_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_imeis.sql
  • flowdb_synthetic_data.Dockerfile
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imei.sql
  • flowdb_testdata.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lon.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_location_ids.sql
  • flowdb_synthetic_data.Dockerfile.dockerignore
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/count_duped_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicated.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/earliest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_duplicates.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_location_ids.sql
  • flowetl/flowetl/flowetl/qa_checks/final/calls/count_null_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/latest_timestamp.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_null_imsis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imsi.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_imeis.sql
  • flowetl/flowetl/flowetl/qa_checks/final/topups/count_msisdns.sql
  • flowetl/flowetl/flowetl/operators/qa_check_operator.py
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/garbage_lat.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/max_msisdns_per_imei.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/mds/count_msisdns.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_locatable_events.sql
  • flowetl/flowetl/flowetl/qa_checks/final/sms/count_duplicates.sql
  • flowdb/testdata/bin/run_qa_checks.py
  • flowetl/flowetl/flowetl/qa_checks/staging/count_staging_rows.sql
  • flowetl/flowetl/flowetl/qa_checks/staging/cell_info/new_cell_ids.sql
  • flowetl/tests/integration/conftest.py
  • flowetl/flowetl/flowetl/qa_checks/final/mds/latest_timestamp.sql
  • flowetl/tests/integration/test_dags_present.py
  • flowetl/tests/unit/test_qa_check_discovery.py
🧰 Additional context used
🪛 Pylint (3.3.7)
flowetl/flowetl/flowetl/util.py

[refactor] 37-37: Too many local variables (16/15)

(R0914)


[refactor] 48-73: Too many nested blocks (6/5)

(R1702)

🪛 Flake8 (7.2.0)
flowetl/flowetl/flowetl/util.py

[error] 117-117: undefined name 'DAG'

(F821)


[error] 372-372: 'airflow.operators.empty.EmptyOperator' imported but unused

(F401)


[error] 567-567: local variable 'final_checks' is assigned to but never used

(F841)

flowetl/tests/integration/test_qa.py

[error] 16-16: f-string is missing placeholders

(F541)

🪛 Ruff (0.11.9)
flowetl/flowetl/flowetl/util.py

48-48: Loop control variable path not used within loop body

Rename unused path to _path

(B007)


58-58: Loop control variable 'other_path' not used within loop body

Rename unused 'other_path' to '_other_path'

(B007)


117-117: Undefined name DAG

(F821)


372-372: airflow.operators.empty.EmptyOperator imported but unused

Remove unused import: airflow.operators.empty.EmptyOperator

(F401)


567-567: Local variable final_checks is assigned to but never used

Remove assignment to unused variable final_checks

(F841)

flowetl/tests/integration/test_qa.py

16-16: f-string without any placeholders

Remove extraneous f prefix

(F541)
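
As a minimal sketch of what the B007 findings above ask for, the example below loops over a hypothetical mapping of check names to paths. The `checks` dictionary and its contents are illustrative assumptions, not the actual structure used in util.py.

```python
# Hypothetical data: check name -> template path (not FlowETL's real structure).
checks = {
    "count_duplicates": "final/sms/count_duplicates.sql",
    "count_msisdns": "final/sms/count_msisdns.sql",
}

# Flagged form: `path` is bound on every iteration but never read.
names_flagged = []
for name, path in checks.items():
    names_flagged.append(name)

# Suggested fix: the underscore prefix marks the variable as intentionally unused.
names_renamed = []
for name, _path in checks.items():
    names_renamed.append(name)

# Alternative: iterate over the keys only, so no unused variable exists at all.
names_keys_only = list(checks)
```

All three forms collect the same names; the rename only documents intent, while iterating over keys removes the unused binding entirely.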

🪛 LanguageTool
CHANGELOG.md

[style] ~22-~22: Would you like to use the Oxford spelling “organized”? The spelling ‘organised’ is also correct.
Context: ...rovided alongside dag files must now be organised into the correct subfolder for the ETL ...

(OXFORD_SPELLING_Z_NOT_S)

🪛 markdownlint-cli2 (0.17.2)
CHANGELOG.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


12-12: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (4)
CHANGELOG.md (1)

10-22: Well-documented feature additions.

The changelog entries comprehensively document the new QA check functionality and associated changes. The structure clearly separates additions from changes, making it easy for users to understand the impact.

flowetl/flowetl/flowetl/util.py (1)

461-501: Well-structured conditional task chaining.

The logic for handling different flux sensor types while integrating staging QA checks is well-implemented. The conditional chaining ensures proper task dependencies are maintained across different scenarios.

flowetl/tests/integration/test_qa.py (2)

9-9: LGTM! Fixture parameter updates are consistent.

The systematic update from jinja_env to jinja_env_final_qa aligns with the reorganization of QA checks into stage-specific subdirectories.

Also applies to: 25-25, 42-42, 61-61, 78-78, 96-96, 113-113, 131-131, 148-148, 166-166, 196-196, 226-226, 242-244, 260-260, 276-276, 292-292, 310-310, 328-328, 346-346, 364-364, 380-380, 401-401, 422-422, 439-439, 456-456


30-32: LGTM! Template path updates follow the new subdirectory pattern.

The systematic change to use f"{cdr_type}/template_name.sql" paths aligns with the reorganization of QA checks into CDR type-specific subdirectories.

Also applies to: 49-51, 68-68, 86-88, 103-103, 121-123, 138-140, 156-158, 184-186, 214-216, 233-233, 251-251, 267-269, 283-285, 300-302, 318-320, 336-338, 354-356, 371-373, 392-394, 413-415, 430-430, 447-449, 464-466
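
To illustrate the stage and data type layout these comments refer to, here is a small sketch that splits a path relative to the qa_checks directory into its parts. `CheckLocation` and `locate_check` are hypothetical helpers written for this example and are not part of FlowETL; the two-level and three-level layouts are inferred from the file list in this review.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class CheckLocation(NamedTuple):
    stage: str                # e.g. "staging" or "final"
    data_type: Optional[str]  # e.g. "sms"; None for stage-wide checks
    name: str                 # file name without the .sql suffix


def locate_check(relative_path: str) -> CheckLocation:
    """Split a path relative to qa_checks/ into stage, data type and check name."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) == 2:
        # Stage-wide check, e.g. staging/count_staging_rows.sql
        stage, filename = parts
        data_type = None
    elif len(parts) == 3:
        # Type-specific check, e.g. final/sms/count_duplicates.sql
        stage, data_type, filename = parts
    else:
        raise ValueError(f"Unexpected QA check path: {relative_path}")
    return CheckLocation(stage, data_type, PurePosixPath(filename).stem)


type_specific = locate_check("final/sms/count_duplicates.sql")
stage_wide = locate_check("staging/count_staging_rows.sql")
```

Under these assumptions, a template name passed to a stage-specific Jinja environment is the part of the path after the stage directory.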

Comment on lines +567 to +569
            final_checks = (
                get_qa_checks(additional_qa_check_paths=additional_qa_check_paths),
            )

⚠️ Potential issue

Fix unused variable assignment.

The final_checks variable is assigned but never used, and the trailing comma inside the parentheses wraps the result of get_qa_checks in a one-element tuple.

Apply this diff to fix the assignment:

-        with TaskGroup(
-            group_id="final_qa_checks", prefix_group_id=False
-        ) as final_qa_checks_group:
-            final_checks = (
-                get_qa_checks(additional_qa_check_paths=additional_qa_check_paths),
-            )
+        with TaskGroup(
+            group_id="final_qa_checks", prefix_group_id=False
+        ) as final_qa_checks_group:
+            get_qa_checks(additional_qa_check_paths=additional_qa_check_paths)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            final_checks = (
-                get_qa_checks(additional_qa_check_paths=additional_qa_check_paths),
-            )
+        with TaskGroup(
+            group_id="final_qa_checks", prefix_group_id=False
+        ) as final_qa_checks_group:
+            get_qa_checks(additional_qa_check_paths=additional_qa_check_paths)
🧰 Tools
🪛 Flake8 (7.2.0)

[error] 567-567: local variable 'final_checks' is assigned to but never used

(F841)

🪛 Ruff (0.11.9)

567-567: Local variable final_checks is assigned to but never used

Remove assignment to unused variable final_checks

(F841)

🤖 Prompt for AI Agents
In flowetl/flowetl/flowetl/util.py around lines 567 to 569, the variable
final_checks is assigned a tuple with a single element but is never used, and
the trailing comma makes it a tuple unnecessarily. Remove the assignment to
final_checks entirely or replace it with a direct call to get_qa_checks without
assigning it to a variable, eliminating the trailing comma to fix the unused
variable and tuple issue.
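
The tuple behaviour mentioned in this comment can be seen in isolation with a short sketch. `get_checks` is a hypothetical stand-in that returns a plain list; the real get_qa_checks and its return type are not shown in this review.

```python
def get_checks():
    """Hypothetical stand-in for get_qa_checks; returns a plain list."""
    return ["check_a", "check_b"]


# A trailing comma inside parentheses builds a one-element tuple
# whose only item is the returned list.
wrapped = (
    get_checks(),
)

# Without the trailing comma the parentheses are just grouping.
unwrapped = (
    get_checks()
)

print(type(wrapped).__name__, len(wrapped))      # tuple 1
print(type(unwrapped).__name__, len(unwrapped))  # list 2
```

If the result were ever used, code iterating over `wrapped` would see a single list rather than the individual checks, which is why the comma matters even apart from the unused-variable warning.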

     flowdb_transaction.execute(create_sql)
     flowdb_transaction.execute(insert_sql)
-    check_sql = jinja_env.get_template("count_added_rows.sql").render(
+    check_sql = jinja_env_final_qa.get_template(f"count_added_rows.sql").render(

⚠️ Potential issue

Fix inconsistent template path pattern.

This template path doesn't follow the {cdr_type}/ subdirectory pattern used by all other tests, which will likely cause template resolution failures.

Apply this diff to fix the inconsistency:

-    check_sql = jinja_env_final_qa.get_template(f"count_added_rows.sql").render(
+    check_sql = jinja_env_final_qa.get_template(f"{cdr_type}/count_added_rows.sql").render(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    check_sql = jinja_env_final_qa.get_template(f"count_added_rows.sql").render(
+    check_sql = jinja_env_final_qa.get_template(f"{cdr_type}/count_added_rows.sql").render(
🧰 Tools
🪛 Ruff (0.11.9)

16-16: f-string without any placeholders

Remove extraneous f prefix

(F541)

🪛 Flake8 (7.2.0)

[error] 16-16: f-string is missing placeholders

(F541)

🤖 Prompt for AI Agents
In flowetl/tests/integration/test_qa.py at line 16, the template path used in
get_template does not include the required {cdr_type}/ subdirectory prefix,
causing inconsistency and potential template resolution failures. Update the
template path to include the {cdr_type}/ prefix before "count_added_rows.sql" to
match the pattern used in other tests.
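
The F541 finding and the suggested prefix can be compared side by side in a short sketch. The `cdr_type` value is a hypothetical example, and whether count_added_rows.sql actually lives in a per-type subdirectory is an assumption of the review comment that this page does not confirm.

```python
cdr_type = "calls"  # hypothetical value; in the tests this would come from a fixture

# An f-string with no placeholders is identical to the plain literal,
# which is all that F541 is pointing out.
plain_literal = "count_added_rows.sql"
needless_f_string = f"count_added_rows.sql"

# With a placeholder, the f prefix does real work and yields a
# path inside the per-type subdirectory.
with_subdirectory = f"{cdr_type}/count_added_rows.sql"

print(plain_literal == needless_f_string)  # True
print(with_subdirectory)                   # calls/count_added_rows.sql
```

Either dropping the `f` prefix or adding the placeholder silences the lint warning; which one is correct depends on where the template file sits in the reorganised directory tree.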

Labels
enhancement New feature or request FlowETL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

  • Count rows of cell info qa check
  • Garbage lat or lon qa check for cell info
1 participant