
Add row limit to lookup table uploads #35426

Open

wants to merge 5 commits into base: master
Conversation

@gherceg (Contributor) commented Nov 22, 2024

Product Description

Technical Summary

https://dimagi.atlassian.net/browse/SAAS-16223

As discussed in the ticket, no existing lookup table has more than 500k rows, so that seems like a very safe limit to start with. The riskier part of this PR is that it adds an explicit limit to the WorkbookJSONReader class, currently set to 1 million rows. That limit isn't strictly necessary, but in the spirit of enforcing limits, encoding one here seems ideal. On the flip side, it could be viewed as a lazy way of imposing a limit on any code that uses this class under the hood (like scheduling).
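A minimal sketch of what enforcing a cap while iterating rows could look like. This is not the actual WorkbookJSONReader change; `MAX_WORKBOOK_ROWS`, `TooManyRowsError`, and `iter_rows_with_limit` are assumed names for illustration, and `rows` stands in for whatever row iterable the reader wraps (e.g. an openpyxl worksheet's rows):

```python
# Hypothetical sketch, not the PR's implementation.
MAX_WORKBOOK_ROWS = 1_000_000  # assumed limit, mirroring the 1M cap described above


class TooManyRowsError(Exception):
    """Raised when a workbook exceeds the configured row limit."""


def iter_rows_with_limit(rows, max_rows=MAX_WORKBOOK_ROWS):
    """Yield rows from any iterable, failing fast past max_rows.

    Because this wraps iteration itself, every caller of the reader
    (fixtures, scheduling, etc.) inherits the limit automatically.
    """
    for count, row in enumerate(rows, start=1):
        if count > max_rows:
            raise TooManyRowsError(
                f"workbook has more than {max_rows} rows"
            )
        yield row
```

Putting the check inside the iterator is what makes this feel like a "lazy" but broad safeguard: any code path that consumes rows through the class is limited without opting in.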

Feature Flag

Safety Assurance

Safety story

Automated test coverage

QA Plan

No

Rollback instructions

  • This PR can be reverted after deploy with no further considerations

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@dimagimon dimagimon added the Risk: Medium Change affects files that have been flagged as medium risk. label Nov 22, 2024
@gherceg gherceg marked this pull request as ready for review November 22, 2024 22:48
corehq/util/workbook_json/excel.py (review comment, outdated, resolved)
@gherceg (Contributor, Author) commented Dec 3, 2024

I think I actually want to replace the implementation of _max_row here to use dimensions instead.

@gherceg (Contributor, Author) commented Dec 3, 2024

Just kidding. After reading some source code I realized the dimensions weren't going to work the way I had hoped, though there is room for a deeper optimization. My understanding is that dimensions are set unreliably, depending on the application that created the file in the first place. So we could rework our code to check the total size of the workbook/worksheet when first loading it, using the dimensions if they are set and otherwise manually iterating over rows, whereas we currently just iterate over all rows to determine table sizes.
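The fast-path/slow-path idea above could be sketched roughly as follows. This is a hypothetical shape, not openpyxl's API or the existing HQ code: `declared_max_row` stands in for whatever dimension metadata the authoring application may (or may not) have recorded, and `iter_rows()` for the row iterator:

```python
# Hypothetical sketch of "trust dimensions if set, otherwise count".
def worksheet_row_count(worksheet):
    """Return a worksheet's row count as cheaply as possible.

    `worksheet` is assumed to expose `declared_max_row` (the dimension
    recorded by the authoring application, or None when it was never
    written) and `iter_rows()`. Both names are illustrative.
    """
    if worksheet.declared_max_row is not None:
        # Fast path: the file declares its own dimensions.
        return worksheet.declared_max_row
    # Slow path: dimensions are absent or untrustworthy, so count rows.
    return sum(1 for _ in worksheet.iter_rows())
```

The trade-off the comment describes is exactly this branch: files from well-behaved applications get an O(1) size check, while files with missing dimensions fall back to the full iteration we do today.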

Labels
Risk: Medium Change affects files that have been flagged as medium risk.
3 participants