Littlepay Table Deduplication Adjustments #2993
Thanks for tackling, just a few things:
In about half the tables, the assumed natural key of the table is fully unique. Some retain duplicates on the assumed natural key, which we intend to deal with in downstream models (or in staging models) if we learn more about the conditions causing the duplicates.
I was approaching this as if this round of updates was the investigation where we would try to learn more about the conditions causing the duplicates. For authorisations, for example, where we have a few remaining duplicates, I identified exactly what is causing them, what would be needed to address them, and why that is not appropriate to do in staging (specifically, it would require a join with settlements). I am realizing now that I should write that into a ticket so it is captured outside of Slack conversations.
Can we try to at least write tickets identifying the causes for these tables as well? Like, what columns are differing when the natural key is duplicated? Can we tell if they seem like true duplicates, or whether we should consider adding a new column to the key (even if that goes against LP documentation)? Sorry if I missed this documentation being added elsewhere.
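One way to answer the "what columns are differing" question is sketched below. It is only an illustration in plain Python, not the warehouse's actual SQL: the function name `differing_columns` and the sample column names are hypothetical, and the real investigation would run against the staging tables.

```python
from collections import defaultdict


def differing_columns(rows, key_cols):
    """Map each duplicated natural key to the columns whose values differ."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_cols)].append(row)

    report = {}
    for key, group in groups.items():
        if len(group) < 2:
            continue  # key is unique, nothing to explain
        report[key] = {
            col
            for col in group[0]
            if col not in key_cols and len({r[col] for r in group}) > 1
        }  # an empty set means the rows are full-row duplicates
    return report


# Hypothetical sample rows; column names are illustrative only.
sample = [
    {"funding_source_id": "a", "status": "active", "export_ts": 1},
    {"funding_source_id": "a", "status": "closed", "export_ts": 2},
    {"funding_source_id": "b", "status": "active", "export_ts": 1},
    {"funding_source_id": "c", "status": "active", "export_ts": 3},
    {"funding_source_id": "c", "status": "active", "export_ts": 3},
]
report = differing_columns(sample, ["funding_source_id"])
```

A key that maps to an empty set is a true duplicate. A key whose differing columns are always the same small set is a hint that one of those columns may belong in the key, or that the rows are successive versions of one record.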
A couple of these tables retain QUALIFY statements from their previous form, in addition to the new deduplication logic. It is likely valuable to put a careful set of extra eyes on those statements to ensure we're not introducing any unintended behavior or doing anything redundant.
Did you check whether these statements are still needed with the full-row duplicate dropping? (Does the row count change?) It looks like `device_transactions` at least has a commented explanation, but I am a little interested to see that that qualify is using `littlepay_transaction_id` and the primary key seems to be `device_transaction_id`, did we confirm why that's the case?
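The row-count check asked about here could look roughly like the following. This is a plain-Python sketch under assumptions, not the models' SQL: the helper names and the sample rows are hypothetical, and "latest per key" stands in for a QUALIFY that keeps the row with the greatest `littlepay_export_ts`.

```python
def drop_full_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen, out = set(), []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(row)
    return out


def latest_per_key(rows, key_col, ts_col):
    """Keep one row per key: the one with the greatest timestamp."""
    latest = {}
    for row in rows:
        k = row[key_col]
        if k not in latest or row[ts_col] > latest[k][ts_col]:
            latest[k] = row
    return list(latest.values())


def qualify_effect(rows, key_col, ts_col):
    """Row counts before and after the extra per-key filter."""
    deduped = drop_full_duplicates(rows)
    return len(deduped), len(latest_per_key(deduped, key_col, ts_col))


# Hypothetical rows: one full duplicate, one updated version, and two
# device transactions that share a littlepay_transaction_id.
rows = [
    {"device_transaction_id": "d1", "littlepay_transaction_id": "t1", "littlepay_export_ts": 1},
    {"device_transaction_id": "d1", "littlepay_transaction_id": "t1", "littlepay_export_ts": 1},
    {"device_transaction_id": "d1", "littlepay_transaction_id": "t1", "littlepay_export_ts": 2},
    {"device_transaction_id": "d2", "littlepay_transaction_id": "t2", "littlepay_export_ts": 1},
    {"device_transaction_id": "d3", "littlepay_transaction_id": "t2", "littlepay_export_ts": 1},
]
by_littlepay_id = qualify_effect(rows, "littlepay_transaction_id", "littlepay_export_ts")
by_device_id = qualify_effect(rows, "device_transaction_id", "littlepay_export_ts")
```

If the two counts are equal for a table, the extra filter is redundant after full-row dropping. If they differ depending on which ID is used, as in this made-up data, the choice of partition column is dropping rows that the primary key would keep.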
warehouse/models/staging/payments/littlepay/stg_littlepay__customer_funding_source.sql
warehouse/models/staging/payments/littlepay/stg_littlepay__device_transactions.sql
Thanks for addressing the first round, I think everything mostly looks good but re-reviewing I kinda realized that funding sources and product data seem a bit different in kind from the other data and I am wondering if we should think about treating those as type 2 dimensions rather than just getting the latest version of the record? If so, perhaps they can/should be spun out separately.
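For reference, a type 2 treatment keeps every version of a record with a validity range instead of only the latest one. The sketch below is a minimal plain-Python illustration; `build_type2`, `product_id`, and the sample values are hypothetical, and nothing here reflects how the warehouse would actually model it.

```python
from collections import defaultdict


def build_type2(rows, key_col, ts_col):
    """Turn successive versions of each record into type 2 rows."""
    by_key = defaultdict(list)
    for row in rows:
        by_key[row[key_col]].append(row)

    out = []
    for versions in by_key.values():
        versions.sort(key=lambda r: r[ts_col])
        # Pair each version with the one that supersedes it (None for the last).
        for current, nxt in zip(versions, versions[1:] + [None]):
            out.append({
                **current,
                "valid_from": current[ts_col],
                "valid_to": nxt[ts_col] if nxt else None,
                "is_current": nxt is None,
            })
    return out


# Hypothetical product versions; only 'capping_type' appears in the model comment.
products = [
    {"product_id": "p1", "capping_type": "DAILY_CAP", "littlepay_export_ts": 10},
    {"product_id": "p1", "capping_type": "WEEKLY_CAP", "littlepay_export_ts": 20},
    {"product_id": "p2", "capping_type": "DAILY_CAP", "littlepay_export_ts": 5},
]
type2_rows = build_type2(products, "product_id", "littlepay_export_ts")
```

Getting "the latest version of the record" is then just the `is_current` rows, so a type 2 model would not lose anything the current approach provides.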
_payments_key,
_content_hash,
FROM add_keys_drop_full_dupes
-- Some products change in form over time, e.g. getting different 'capping_type' values or
Similar question to above.... I am wondering whether these need to be handled as type 2 or if we should treat these as corrections. 🤔 Because the question would be if we're looking at older data, should we access the capping conditions that were in effect at the time? Or the capping conditions as they are now?
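The two readings of that question give different answers for the same historical ride. A small sketch, assuming a product's versions are available as `(effective_ts, capping_type)` pairs sorted by timestamp; the function names and values are hypothetical:

```python
from bisect import bisect_right


def as_of(versions, when):
    """Value in effect at `when`; None if no version existed yet."""
    idx = bisect_right([ts for ts, _ in versions], when) - 1
    return versions[idx][1] if idx >= 0 else None


def latest(versions):
    """Value as it is now, treating earlier versions as corrected."""
    return versions[-1][1] if versions else None


# Hypothetical history for one product.
history = [(10, "DAILY_CAP"), (20, "WEEKLY_CAP")]
ride_ts = 15
in_effect_then = as_of(history, ride_ts)
as_it_is_now = latest(history)
```

If changes are corrections, `latest` is the right lookup and keeping one row per product is enough. If they are real changes in the product, older rides need the `as_of` lookup, which requires retaining the earlier versions.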
Discussed offline; we may need to revisit offline whether this data should be treated as a slowly changing dimension.
Description
This represents a partial solution to #2945, encompassing deduplication and testing changes to the customer_funding_source, device_transaction_purchases, device_transactions, micropayment_adjustments, micropayment_device_transactions, and product_data tables from Littlepay. Laurie is taking care of the three remaining Littlepay pipeline tables.
Our intent was to rid ourselves of true duplicates and obvious stale data that is superseded by later, updated data. Every row in these tables has a unique combination of `_key` and `_payments_key`. A couple of these tables contain QUALIFY statements that utilize `littlepay_export_ts` in addition to the new standard deduplication logic, because we've detected that these tables occasionally receive updated versions of previous rows.
Type of change
How has this been tested?
All tables were created in a staging environment and dbt tests were run for each. Downstream models were also run where relevant.
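The uniqueness property described above (a unique combination of `_key` and `_payments_key`) can be pictured as the check below. This is a plain-Python stand-in for a combination-of-columns uniqueness test, not the dbt test configuration used in the PR; the function name and the sample rows are hypothetical.

```python
from collections import Counter


def duplicated_key_combinations(rows, key_cols=("_key", "_payments_key")):
    """Return key combinations that appear more than once, with their counts."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return {combo: n for combo, n in counts.items() if n > 1}


# Hypothetical rows: the same _payments_key may repeat as long as _key differs.
passing = [
    {"_key": "k1", "_payments_key": "p1"},
    {"_key": "k2", "_payments_key": "p1"},
    {"_key": "k3", "_payments_key": "p2"},
]
failing = passing + [{"_key": "k1", "_payments_key": "p1"}]

passing_result = duplicated_key_combinations(passing)
failing_result = duplicated_key_combinations(failing)
```

Like a dbt test, an empty result means the check passes; any returned combination is a failing row group to investigate.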
Post-merge follow-ups
Triage remaining payments test failures that exist after merging these uniqueness changes and Laurie's changes to the micropayments, refunds, and settlements tables