Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Precision #353

Open
rbroth opened this issue Feb 22, 2023 · 4 comments
Open

Data Precision #353

rbroth opened this issue Feb 22, 2023 · 4 comments
Assignees

Comments

@rbroth
Copy link
Collaborator

rbroth commented Feb 22, 2023

Teams discussion

@Fuhji which fields have you had problems with?

@rbroth rbroth self-assigned this Jul 26, 2023
@rbroth
Copy link
Collaborator Author

rbroth commented Jul 26, 2023

@rbroth
Copy link
Collaborator Author

rbroth commented Jul 26, 2023

@LouiseAnder @kpitten

Importing excel data (the intervention costings data) in the database has introduced false precision due to the way computers store and calculate numbers and the way the libraries that connect our code to excel work. As a results, we have a whole bunch of numbers with a bajillion numbers after the decimal in the database. This is obviously not correct.

To fix it, there's two approaches:

  1. Just round all numbers to two decimal points. Takes 20 minutes to do (I've already got a merge request in for it). Doesn't fully reflect Katie's intention. We can improve it later if it's not good enough.
  2. Dig into the code of the library, figure out how numbers and decimal points are handled, write code to retrieve excel formatting information and convert it to python, etc. Will likely take at least hours of time.

I'm inclined to go with solution 1 for the time being and procrastinate on doing solution 2.

Yay/Nay?

@kpitten
Copy link

kpitten commented Jul 26, 2023 via email

@rbroth
Copy link
Collaborator Author

rbroth commented Jul 26, 2023

Would it be possible to round everything to the nearest whole number and then manually change specific fields to have decimals?

Very possible, though would obviously also be lots of work. And we wouldn't be able to do it until we release, since we're re-importing the data from the spreadsheets continuously.

Beyond decimals, there are other instances of too many significant figures e.g. the total 10-year cost is reported as $3,908,368,096 in the excel sheet (maybe we can get a $96 discount somewhere and make it an even $3,908,368,000 😉).

I figured that, at least for version 1, having a few too many decimals might not be too bad as long as it allows us to get rid of numbers like 3424.3184146701915

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants