
Add row limit to lookup table uploads #35426

Open

wants to merge 5 commits into base: master
Conversation

@gherceg (Contributor) commented Nov 22, 2024

Product Description

Technical Summary

https://dimagi.atlassian.net/browse/SAAS-16223

As discussed in the ticket, no existing lookup table has more than 500k rows, so that seems like a very safe limit to start with. The riskier part of this PR is that it adds an explicit limit to the WorkbookJSONReader class, currently set to 1 million rows. That limit isn't strictly necessary, but in the spirit of enforcing limits, encoding one here seems ideal. On the flip side, it could be viewed as a lazy way of imposing a limit on any code that uses this class under the hood (like scheduling).
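A minimal sketch of what enforcing a cap while iterating rows could look like. This is not the actual WorkbookJSONReader change; `MAX_WORKBOOK_ROWS`, `TooManyRowsError`, and `iter_rows_with_limit` are assumed names for illustration, and `rows` stands in for whatever row iterable the reader wraps (e.g. an openpyxl worksheet's rows):

```python
# Hypothetical sketch, not the PR's implementation.
MAX_WORKBOOK_ROWS = 1_000_000  # assumed limit, mirroring the 1M cap described above


class TooManyRowsError(Exception):
    """Raised when a workbook exceeds the configured row limit."""


def iter_rows_with_limit(rows, max_rows=MAX_WORKBOOK_ROWS):
    """Yield rows from any iterable, failing fast past max_rows.

    Because this wraps iteration itself, every caller of the reader
    (fixtures, scheduling, etc.) inherits the limit automatically.
    """
    for count, row in enumerate(rows, start=1):
        if count > max_rows:
            raise TooManyRowsError(
                f"workbook has more than {max_rows} rows"
            )
        yield row
```

Putting the check inside the iterator is what makes this feel like a "lazy" but broad safeguard: any code path that consumes rows through the class is limited without opting in.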

Feature Flag

Safety Assurance

Safety story

Automated test coverage

QA Plan

No

Rollback instructions

  • This PR can be reverted after deploy with no further considerations

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@dimagimon dimagimon added the Risk: Medium Change affects files that have been flagged as medium risk. label Nov 22, 2024
@gherceg gherceg marked this pull request as ready for review November 22, 2024 22:48
corehq/util/workbook_json/excel.py (review comment, outdated, resolved)
@gherceg (Contributor, Author) commented Dec 3, 2024

I think I actually want to replace the implementation of _max_row here to use dimensions instead.

@gherceg (Contributor, Author) commented Dec 3, 2024

Just kidding. After reading some source code I realized the dimensions weren't going to work the way I had hoped, though there is room for a deeper optimization. My understanding is that dimensions are set unreliably, depending on the application that created the file in the first place. So we could rework our code to check the total size of the workbook/worksheet when first loading it, using the dimensions if they are set and otherwise manually iterating over rows, whereas we currently just iterate over all rows to determine table sizes.
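The fast-path/slow-path idea above could be sketched roughly as follows. This is a hypothetical shape, not openpyxl's API or the existing HQ code: `declared_max_row` stands in for whatever dimension metadata the authoring application may (or may not) have recorded, and `iter_rows()` for the row iterator:

```python
# Hypothetical sketch of "trust dimensions if set, otherwise count".
def worksheet_row_count(worksheet):
    """Return a worksheet's row count as cheaply as possible.

    `worksheet` is assumed to expose `declared_max_row` (the dimension
    recorded by the authoring application, or None when it was never
    written) and `iter_rows()`. Both names are illustrative.
    """
    if worksheet.declared_max_row is not None:
        # Fast path: the file declares its own dimensions.
        return worksheet.declared_max_row
    # Slow path: dimensions are absent or untrustworthy, so count rows.
    return sum(1 for _ in worksheet.iter_rows())
```

The trade-off the comment describes is exactly this branch: files from well-behaved applications get an O(1) size check, while files with missing dimensions fall back to the full iteration we do today.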

Labels
Risk: Medium Change affects files that have been flagged as medium risk.
3 participants