Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of openpyxl.utils.cell._get_column_letter assumes <= 18278 columns #416

Open
poulson opened this issue Jan 9, 2023 · 0 comments
Open

Comments

@poulson
Copy link

poulson commented Jan 9, 2023

I am unable to unflatten some UK government procurement data because flatten-tool's unflatten routine is internally making use of openpyxl.utils.cell's _get_column_letter routine to convert column indices into strings of the form [A-Z]+.

The issue is that _get_column_letter explicitly enforces the constraint that at most three letters are generated, which corresponds to a constraint of at most 26 + 26 * 26 + 26 * 26 * 26 = 18,278 columns.

Thus, for spreadsheets with more than 18,278 columns, the unflatten routine is failing with a dubious error message about an invalid column index.

Thankfully, the fix is quite simple: the _get_column_letter routine is self-contained and the limitation to three characters is self-imposed and easily removed.

The implementation in its entirety is:

def _get_column_letter(col_idx):
    """Convert a column number into a column letter (3 -> 'C')

    Right shift the column col_idx by 26 to find column letters in reverse
    order.  These numbers are 1-based, and can be converted to ASCII
    ordinals by adding 64.
    """
    # these indicies corrospond to A -> ZZZ and include all allowed
    # columns
    if not 1 <= col_idx <= 18278:
        raise ValueError("Invalid column index {0}".format(col_idx))
    letters = []
    while col_idx > 0:
        col_idx, remainder = divmod(col_idx, 26)
        # check for exact division and borrow if needed
        if remainder == 0:
            remainder = 26
            col_idx -= 1
        letters.append(chr(remainder+64))
    return ''.join(reversed(letters))

The routine functions for arbitrarily large column indices if you simply remove these four lines.

    # these indicies corrospond to A -> ZZZ and include all allowed
    # columns
    if not 1 <= col_idx <= 18278:
        raise ValueError("Invalid column index {0}".format(col_idx))

I recommend avoiding the external dependency on _get_column_letter via an internally implemented function of the form:

def get_column_string(col_idx):
    """Convert a column number into a string of characters between A and Z."""
    letters = []
    while col_idx > 0:
        col_idx, remainder = divmod(col_idx, 26)
        # check for exact division and borrow if needed
        if remainder == 0:
            remainder = 26
            col_idx -= 1
        letters.append(chr(remainder+64))
    return ''.join(reversed(letters))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant