refacto: update batch mode
polomarcus committed Feb 26, 2024
1 parent 85ae98f commit c362681
Showing 3 changed files with 22 additions and 5 deletions.
15 changes: 15 additions & 0 deletions README.md
@@ -287,6 +287,21 @@ Otherwise, default is all channels
When the word detection logic changes, we must re-apply it to all keywords saved in our database.

To do so, set the env variable `UPDATE` to "true", as in docker-compose.yml.
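
As a rough illustration of how such a flag can be read, here is a minimal sketch. The helper name `is_update_enabled` is hypothetical and the project's own parsing of `UPDATE` may differ; only the variable name and the "true" value come from the text above.

```python
import os
from typing import Mapping


def is_update_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when UPDATE is set to "true" (case-insensitive)."""
    # Hypothetical helper: the project may parse the variable differently.
    return environ.get("UPDATE", "false").strip().lower() == "true"
```

Treating a missing variable as "false" keeps the batch update opt-in, which matches the commented-out `UPDATE` entry in docker-compose.yml.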

To see an actual change in the local DB, first run the tests with `docker compose up test`, then run these commands:
```
docker exec -ti quotaclimat-postgres_db-1 bash
psql -h localhost --port 5432 -d barometre -U user
--> enter password : password
UPDATE keywords set number_of_keywords=1000 WHERE id = '71b8126a50c1ed2e5cb1eab00e4481c33587db478472c2c0e74325abb872bef6';
UPDATE keywords set number_of_keywords=1000 WHERE id = '975b41e76d298711cf55113a282e7f11c28157d761233838bb700253d47be262';
```
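
Forcing `number_of_keywords` to 1000 presumably makes the stored value differ from what the detection logic recomputes, so the batch update has something to report. The sketch below shows that idea with an in-memory SQLite table standing in for the Postgres `keywords` table; the id `example-id` and the recomputed value `0` are placeholders, not values taken from the project.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres `keywords` table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keywords (id TEXT PRIMARY KEY, number_of_keywords INTEGER)"
)
conn.execute("INSERT INTO keywords VALUES (?, ?)", ("example-id", 0))

# Same statement shape as the psql commands above: force a stale value.
conn.execute(
    "UPDATE keywords SET number_of_keywords = 1000 WHERE id = ?",
    ("example-id",),
)
conn.commit()

stored = conn.execute(
    "SELECT number_of_keywords FROM keywords WHERE id = ?", ("example-id",)
).fetchone()[0]
recomputed = 0  # placeholder for what the detection logic would return
needs_update = stored != recomputed
```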

After setting the `UPDATE` env variable to "true" in docker-compose.yml and running `docker compose up mediatree`, you should see these logs:
```
update_pg_keywords.py:20 | Difference old 1000 - new_number_of_keywords 0
```

### Fix linting
Before committing, make sure that the lines of code you wrote conform to the PEP8 standard by running:
```bash
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -136,7 +136,7 @@ services:
       POSTGRES_PORT: 5432
       PORT: 5050 # healthcheck
       HEALTHCHECK_SERVER: "0.0.0.0"
-      SENTRY_DSN: prod_only
+      # SENTRY_DSN: prod_only
       # START_DATE: 1704576615 # to test batch import
       # UPDATE: "true" # to batch update PG
       # CHANNEL : fr3-idf # to reimport only one channel
10 changes: 6 additions & 4 deletions quotaclimat/data_processing/mediatree/update_pg_keywords.py
@@ -15,11 +15,13 @@ def update_keywords(session: Session, batch_size: int = 50000) -> list:
     for i in range(0, total_updates, batch_size):
         batch_updates = saved_keywords[i:i+batch_size]
         for keyword_id, plaintext, keywords_with_timestamp, number_of_keywords, start in batch_updates:
-            logging
             new_number_of_keywords = count_keywords_duration_overlap_without_indirect(keywords_with_timestamp, start)
-            logging.debug(f"{keyword_id} new value {new_number_of_keywords}")
-            update_number_of_keywords(session, keyword_id, new_number_of_keywords)
-
+            if(number_of_keywords != new_number_of_keywords):
+                logging.info(f"Difference old {number_of_keywords} - new_number_of_keywords {new_number_of_keywords}")
+                logging.debug(f"{keyword_id} new value {new_number_of_keywords}")
+                update_number_of_keywords(session, keyword_id, new_number_of_keywords)
+            else:
+                logging.debug("No difference")
         logging.info(f"bulk update done {i} out of {total_updates}")
         session.commit()
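
The change in this hunk amounts to a compare-before-write pattern: recompute the count, then write only the rows whose value differs. A self-contained sketch of that pattern follows, under stated assumptions: `update_changed_counts`, `recount` and `write` are hypothetical names, and a plain dict replaces the SQLAlchemy session used in the real code.

```python
import logging
from typing import Callable, Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, Sequence[str], int]  # (keyword_id, payload, stored_count)


def update_changed_counts(
    rows: List[Row],
    recount: Callable[[Sequence[str]], int],
    write: Callable[[Hashable, int], None],
    batch_size: int = 50000,
) -> int:
    """Recompute each count and write it back only when it changed."""
    written = 0
    for i in range(0, len(rows), batch_size):
        for keyword_id, payload, stored_count in rows[i:i + batch_size]:
            new_count = recount(payload)
            if stored_count != new_count:
                logging.info("Difference old %s - new %s", stored_count, new_count)
                write(keyword_id, new_count)
                written += 1
            else:
                logging.debug("No difference")
        logging.info("bulk update done %s out of %s", i, len(rows))
    return written


# Usage: a dict plays the role of the database, len() the role of the recount.
store = {}
rows = [("a", ["x", "y"], 1000), ("b", ["x"], 1)]
changed = update_changed_counts(rows, len, store.__setitem__, batch_size=1)
```

Here only row `"a"` is written, because its stored count (1000) differs from the recomputed one (2); row `"b"` is skipped.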


1 comment on commit c362681

@github-actions
Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| **postgres** | | | | |
| insert_data.py | 46 | 7 | 85% | 38–40, 59–61, 66 |
| insert_existing_data_example.py | 20 | 3 | 85% | 25–27 |
| **postgres/schemas** | | | | |
| models.py | 71 | 15 | 79% | 74–81, 91–92, 101–111 |
| **quotaclimat/data_analytics** | | | | |
| analytics_signataire_charte.py | 29 | 29 | 0% | 1–67 |
| bilan.py | 108 | 108 | 0% | 2–372 |
| data_coverage.py | 34 | 34 | 0% | 1–94 |
| exploration.py | 125 | 125 | 0% | 1–440 |
| sitemap_analytics.py | 118 | 118 | 0% | 1–343 |
| **quotaclimat/data_ingestion** | | | | |
| categorization_program_type.py | 1 | 1 | 0% | 1 |
| config_youtube.py | 1 | 1 | 0% | 1 |
| scaleway_db_backups.py | 34 | 34 | 0% | 1–74 |
| scrap_chartejournalismeecologie_signataires.py | 50 | 50 | 0% | 1–169 |
| scrap_sitemap.py | 134 | 17 | 87% | 27–28, 33–34, 66–71, 95–97, 138–140, 202, 223–228 |
| scrap_tv_program.py | 62 | 62 | 0% | 1–149 |
| scrap_youtube.py | 114 | 114 | 0% | 1–238 |
| **quotaclimat/data_ingestion/ingest_db** | | | | |
| ingest_sitemap_in_db.py | 59 | 41 | 31% | 21–42, 45–65, 69–80 |
| **quotaclimat/data_ingestion/scrap_html** | | | | |
| scrap_description_article.py | 36 | 3 | 92% | 19–20, 32 |
| **quotaclimat/data_processing/mediatree** | | | | |
| api_import.py | 177 | 102 | 42% | 38–42, 47–50, 54–57, 63, 66–93, 99–114, 119–121, 146–153, 157–160, 164–170, 181–192, 195–199, 205, 231–232, 236, 240–259, 263–274 |
| config.py | 15 | 2 | 87% | 7, 16 |
| detect_keywords.py | 110 | 4 | 96% | 131–133, 148 |
| utils.py | 66 | 22 | 67% | 19, 30–54, 57, 76–77 |
| **quotaclimat/data_processing/sitemap** | | | | |
| sitemap_processing.py | 41 | 27 | 34% | 15–19, 23–25, 29–47, 51–58, 66–96, 101–103 |
| **quotaclimat/utils** | | | | |
| channels.py | 6 | 6 | 0% | 1–95 |
| climate_keywords.py | 2 | 2 | 0% | 3–35 |
| healthcheck_config.py | 29 | 14 | 52% | 22–24, 27–38 |
| logger.py | 14 | 3 | 79% | 22–24 |
| plotly_theme.py | 17 | 17 | 0% | 1–56 |
| sentry.py | 10 | 2 | 80% | 21–22 |
| **TOTAL** | 1581 | 963 | 39% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 51 | 0 💤 | 0 ❌ | 0 🔥 | 50.541s ⏱️ |
