prices

README – prices project

The PricesAggregatorApp aggregates the XML files that food-store chains publish on the web under Israel's new “prices-transparency” law. The app runs every hour on the Hebrew University servers. Each run handles a single chain: it stores the XML files published on the chain’s website that day that were not already aggregated by an earlier run on the same day.

Program arguments:

  • aggtype: the aggregation type, either “daily” or “hourly”. Both types run every hour and aggregate all XML files that were published on the current day and have not yet been aggregated. The “daily” type aggregates files whose names start with “PricesFull”, “PromosFull”, or “Stores”, while the “hourly” type aggregates files whose names start with “Prices” or “Promos” (these files should contain the price and promo changes that occurred on the same day).
  • chain: the food-store chain whose XML files are aggregated. Possible values and their status:
    • rami: aggregates Rami-Levi files, given by Cerberus.
    • dosh: aggregates Super-Dosh files, given by Cerberus.
    • tiv: aggregates Tiv-Taam files, given by Cerberus.
    • dor: aggregates Doralon files, given by Cerberus.
      • URL: https://url.retail.publishedprices.co.il/ (Cerberus)
      • Implementation status: complete
      • Provided files status: seems complete, but today, for instance, some of the provided files had names with a “NULL” prefix.
    • coop: aggregates Co-Op files.
    • shuf: aggregates Shufersal files.
      • URL: http://prices.shufersal.co.il/
      • Implementation status: internal server errors and timeout exceptions are often thrown, mainly when choosing a different Category on the page.
      • Provided files status: seems complete, but not usable because of the server exceptions.
    • hazi: aggregates Hazi-Hinam files, given by Cerberus.
    • keshet: aggregates Keshet-Teamim files, given by Cerberus.
    • yohan: aggregates Yohananof files, given by Cerberus.
    • osher: aggregates Osher-Ad files, given by Cerberus.
      • URL: https://url.retail.publishedprices.co.il/ (Cerberus)
      • Implementation status: currently assumes the files table fits on a single page (which is the case for the other Cerberus chains); it needs to proceed to the next page when the table continues across several pages.
      • Provided files status: complete
    • victory: aggregates Victory files, given by Nibit.
      • URL: http://matrixcatalog.co.il/NBCompetitionRegulations.aspx (Nibit)
      • Implementation status: incomplete. Needs fixing: while iterating over the table rows and clicking the download links, only the first link triggers the webWindowContentChanged method of the WebClient; the other links do not trigger it, so their files are not saved. A README is not yet provided.
      • Provided files status: (seems) complete. However, the files should be aligned with the standard file names.
    • lahav: aggregates “Mahsanei-Lahav” files, given by Nibit.
    • hashuk: aggregates “Mahsanei-Hashuk” files, given by Nibit.
    • mega: aggregates Mega files.
      • URL: http://publishprice.mega.co.il/ + the current date
      • Implementation status: complete
      • Provided files status: complete (their system is the most convenient, complete, and stable).
    • bitan: aggregates Yeinot-Bitan files.
    • eden: aggregates Eden-Teva files.
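
The prefix rules in the aggtype description can be sketched as below. This is only an illustration, not the project's code: the class name `FileClassifier`, the `AggType` enum, and the sample file names are hypothetical. It also assumes that a “...Full” file belongs to the daily type only, so the longer prefixes are checked first (“PricesFull” also starts with “Prices”), and that matching is case-sensitive.

```java
/** Hypothetical helper (not part of the project): maps a file name to an aggregation type. */
final class FileClassifier {
    enum AggType { DAILY, HOURLY, NONE }

    // Prefixes as listed in the aggtype description.
    private static final String[] DAILY_PREFIXES = {"PricesFull", "PromosFull", "Stores"};
    private static final String[] HOURLY_PREFIXES = {"Prices", "Promos"};

    static AggType classify(String fileName) {
        // Assumption: check the daily prefixes first, because "PricesFull..."
        // would otherwise also match the shorter hourly prefix "Prices".
        for (String prefix : DAILY_PREFIXES) {
            if (fileName.startsWith(prefix)) {
                return AggType.DAILY;
            }
        }
        for (String prefix : HOURLY_PREFIXES) {
            if (fileName.startsWith(prefix)) {
                return AggType.HOURLY;
            }
        }
        // Anything else (for example a name with a "NULL" prefix) is not aggregated.
        return AggType.NONE;
    }

    public static void main(String[] args) {
        // Made-up file names, for illustration only.
        String[] samples = {"PricesFull-001.xml", "Prices-001.xml", "Stores.xml", "NULLPrices-001.xml"};
        for (String name : samples) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

Checking the longer prefixes first keeps the two types disjoint; if the hourly run is in fact meant to pick up the “Full” files as well, the order of the two loops would not matter.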

Output: given a chain, the application saves the (extracted) XML files to the following paths:

  • If aggtype=daily: saves PricesFull and PromoFull files per branch to “daily/yyyyMMdd/chain/branchId/”, and the Stores, README, and error-log files to “daily/yyyyMMdd/chain/”.
  • If aggtype=hourly: saves Prices and Promo files per branch to “hourly/yyyyMMdd/HH/chain/branchId/”, and the README and error-log files to “hourly/yyyyMMdd/HH/chain/”.
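
As a rough sketch of the directory layout above, the paths could be built as follows. The class `OutputPaths` and its method names are hypothetical and not taken from the project; the sketch assumes the date and hour come from the local time of the run and that paths are relative to some output root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper (not part of the project): builds the output directories described above. */
final class OutputPaths {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("HH");

    /** Chain-level directory: holds the README and error-log files (and Stores for "daily"). */
    static Path chainDir(String aggType, String chain, LocalDateTime runTime) {
        String day = runTime.format(DAY);
        switch (aggType) {
            case "daily":
                return Paths.get("daily", day, chain);
            case "hourly":
                return Paths.get("hourly", day, runTime.format(HOUR), chain);
            default:
                throw new IllegalArgumentException("Unknown aggtype: " + aggType);
        }
    }

    /** Branch-level directory: holds the price and promo files of one branch. */
    static Path branchDir(String aggType, String chain, String branchId, LocalDateTime runTime) {
        return chainDir(aggType, chain, runTime).resolve(branchId);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        // "rami" is one of the chain values listed above; "001" is a made-up branch id.
        System.out.println(branchDir("daily", "rami", "001", now));
        System.out.println(chainDir("hourly", "rami", now));
    }
}
```

Returning `Path` objects rather than concatenated strings leaves the separator to the platform, so the printed form uses “/” on Unix-like systems.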

TODO:

  • Finish the implementation according to the “Implementation status” listed above for each chain. This mainly means fixing the three Nibit chains (victory, hashuk, and lahav) and fixing Shufersal.
  • Error logging: basic error logging is currently provided (per aggtype and chain) and saved under each chain directory. It still requires further XML checking (format and content) and an automated way to summarize aggregation problems and send them to the admin to handle.
  • Talk to the chains that still do not provide the information and get them to start publishing their files.