Data quality: normalize channel title (#271)
* wip

* test: add real tests
polomarcus authored Oct 15, 2024
1 parent 7604fb5 commit 6ac72c8
Showing 3 changed files with 6 additions and 2 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -399,3 +399,4 @@ There is a debt regarding the cleanest of the code right now. Let's just not mak

## Thanks
* [Eleven-Strategy](https://www.welcometothejungle.com/fr/companies/eleven-strategy)
* [Kevin Tessier](https://kevintessier.fr)
3 changes: 3 additions & 0 deletions quotaclimat/data_processing/mediatree/api_import.py
@@ -255,6 +255,9 @@ def parse_reponse_subtitle(response_sub, channel = None, channel_program = "", c
inplace=True
)

+logging.debug("setting channel_title")
+new_df['channel_title'] = new_df.apply(lambda x: get_channel_title_for_name(x['channel_name']), axis=1)
+
logging.debug(f"setting program {channel_program}")
# weird error if not using this way: (ValueError) format number 1 of "20h30 le samedi" is not recognized
new_df['channel_program'] = new_df.apply(lambda x: channel_program, axis=1)
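
The added line derives `channel_title` from `channel_name` instead of trusting the title already present in the data. A minimal sketch of that idea follows. The body of `get_channel_title_for_name` and the `CHANNEL_TITLES` table are assumptions made for illustration, since the project's real lookup is not part of this diff. The sketch also shows a column-wise `Series.map` that yields the same values as the row-wise `apply` used in the commit.

```python
import pandas as pd

# Hypothetical lookup table: the real mapping used by the project is not part of this diff.
CHANNEL_TITLES = {"m6": "M6", "tf1": "TF1"}


def get_channel_title_for_name(channel_name: str) -> str:
    """Illustrative stand-in for the helper called in the diff."""
    # Assumption: unknown names fall back to the raw name instead of raising.
    return CHANNEL_TITLES.get(channel_name, channel_name)


# Titles as an upstream source might return them (values mirror the test fixture).
new_df = pd.DataFrame(
    {"channel_name": ["m6", "tf1"], "channel_title": ["fake m6", "fake TF1"]}
)

# Row-wise form, as written in the commit.
row_wise = new_df.apply(lambda x: get_channel_title_for_name(x["channel_name"]), axis=1)

# Column-wise form: same values, computed from the single column that is needed.
column_wise = new_df["channel_name"].map(get_channel_title_for_name)

# Overwrite the incoming titles with the normalized ones.
new_df["channel_title"] = column_wise
```

Whether the column-wise form is preferable here is a judgment call; the commit keeps the row-wise style already used for `channel_program` on the following lines.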
4 changes: 2 additions & 2 deletions test/sitemap/test_mediatree.py
@@ -41,7 +41,7 @@
"text": "france"
}
],
-"channel":{"name":"m6","title":"M6","radio":false},"start":1704798000,
+"channel":{"name":"m6","title":"fake m6","radio":false},"start":1704798000,
"plaintext":"test1"
},
{
@@ -51,7 +51,7 @@
"text": "adaptation"
}
],
-"channel":{"name":"tf1","title":"TF1","radio":false},"start":1704798120,
+"channel":{"name":"tf1","title":"fake TF1","radio":false},"start":1704798120,
"plaintext":"test2"}
],
"elapsed_time_ms":335}
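
Changing the fixture titles to `fake m6` and `fake TF1` presumably lets a test tell apart a title copied from the response and a title recomputed from the channel name. The sketch below illustrates that check under stated assumptions: the lookup table and the helper body are hypothetical stand-ins, and the records are a trimmed copy of the fixture that keeps only the fields shown in this diff.

```python
import json

# Hypothetical stand-in for the project's lookup; the real mapping is not shown in this diff.
CHANNEL_TITLES = {"m6": "M6", "tf1": "TF1"}


def get_channel_title_for_name(channel_name: str) -> str:
    # Assumption: unknown names fall back to the raw name.
    return CHANNEL_TITLES.get(channel_name, channel_name)


# Trimmed copy of the two fixture records (other fields omitted).
records = json.loads("""
[
  {"channel": {"name": "m6", "title": "fake m6", "radio": false},
   "start": 1704798000, "plaintext": "test1"},
  {"channel": {"name": "tf1", "title": "fake TF1", "radio": false},
   "start": 1704798120, "plaintext": "test2"}
]
""")

# Titles as delivered in the response versus titles derived from the name.
raw_titles = [r["channel"]["title"] for r in records]
normalized_titles = [get_channel_title_for_name(r["channel"]["name"]) for r in records]
```

With the old fixture values (`M6`, `TF1`) both lists would be identical, so a parser that simply passed the response title through would still satisfy the assertion; the `fake` values make that case fail.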

1 comment on commit 6ac72c8

@github-actions

Coverage

Coverage Report
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| **postgres** | | | | |
| insert_data.py | 43 | 7 | 84% | 36–38, 56–58, 63 |
| insert_existing_data_example.py | 19 | 3 | 84% | 25–27 |
| **postgres/schemas** | | | | |
| models.py | 157 | 11 | 93% | 126–133, 146, 148–149, 214–215, 229–230 |
| **quotaclimat/data_ingestion** | | | | |
| scrap_sitemap.py | 134 | 17 | 87% | 27–28, 33–34, 66–71, 95–97, 138–140, 202, 223–228 |
| **quotaclimat/data_ingestion/ingest_db** | | | | |
| ingest_sitemap_in_db.py | 55 | 37 | 33% | 21–42, 45–58, 62–73 |
| **quotaclimat/data_ingestion/scrap_html** | | | | |
| scrap_description_article.py | 36 | 3 | 92% | 19–20, 32 |
| **quotaclimat/data_processing/mediatree** | | | | |
| api_import.py | 213 | 133 | 38% | 44–48, 53–74, 78–81, 87, 90–132, 138–153, 158, 171–183, 187–193, 206–218, 221–225, 231, 269–270, 273–304, 307–309 |
| channel_program.py | 162 | 53 | 67% | 21–23, 34–36, 53–54, 57–59, 98–99, 108, 124, 177, 180–216 |
| config.py | 15 | 2 | 87% | 7, 16 |
| detect_keywords.py | 223 | 9 | 96% | 222, 280–287, 323 |
| update_pg_keywords.py | 67 | 49 | 27% | 15–108, 132, 135, 142–157, 180–206, 213 |
| utils.py | 79 | 25 | 68% | 29–53, 56, 65, 86–87, 117–120 |
| **quotaclimat/utils** | | | | |
| healthcheck_config.py | 29 | 14 | 52% | 22–24, 27–38 |
| logger.py | 24 | 11 | 54% | 22–24, 28–37 |
| sentry.py | 11 | 2 | 82% | 22–23 |
| **TOTAL** | 1294 | 376 | 71% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 97 | 0 💤 | 0 ❌ | 0 🔥 | 8m 0s ⏱️ |
