Parsing weird-but-valid sheets (namespaces and r attribute on cells) #64

Open
skipchris opened this issue Jan 9, 2025 · 0 comments

Hi,

In real-world use we’ve encountered a few valid OOXML spreadsheets that simple_xlsx_reader can’t parse correctly. These sheets are annoying, but they are spec-compliant and, as far as I can tell, actually come from Microsoft tools such as Power BI.

First, sheets can contain namespace-prefixed tags, e.g.:

```xml
<?xml version="1.0" encoding="utf-8"?>
<x:worksheet xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:sheetData>
    <x:row>
      <x:c s="2" t="inlineStr">
        <x:is>
          <x:t>Hello</x:t>
        </x:is>
      </x:c>
    </x:row>
  </x:sheetData>
</x:worksheet>
```

As far as I can tell, the pragmatic approach is simply to drop the namespace prefix, i.e. use `name.split(':').last` in `start_element`. This may be 'wrong' from a strict XML-parsing standpoint, but it’s simple, works in my testing, and is essentially the approach that xsv takes to the same problem.
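A minimal sketch of the prefix-stripping idea, assuming a streaming (SAX-style) parser. It uses Ruby's REXML stream listener purely as a dependency-free stand-in; the class name `PrefixAgnosticListener` is hypothetical, and simple_xlsx_reader's actual handler and callback names may differ. Every tag name is reduced to its local part before comparison, so `<x:t>` and `<t>` are treated the same.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects the text of every <t> element,
# whether or not it carries a namespace prefix.
class PrefixAgnosticListener
  include REXML::StreamListener

  attr_reader :texts

  def initialize
    @texts = []
    @in_t = false
  end

  # Drop any "prefix:" part of a qualified name, e.g. "x:c" -> "c".
  def self.local_name(name)
    name.split(':').last
  end

  def tag_start(name, _attrs)
    @in_t = true if self.class.local_name(name) == 't'
  end

  def tag_end(name)
    @in_t = false if self.class.local_name(name) == 't'
  end

  # Only keep text that appears inside a <t> element (ignores whitespace
  # between other tags).
  def text(value)
    @texts << value if @in_t
  end
end

xml = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <x:worksheet xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <x:sheetData>
      <x:row>
        <x:c s="2" t="inlineStr"><x:is><x:t>Hello</x:t></x:is></x:c>
      </x:row>
    </x:sheetData>
  </x:worksheet>
XML

listener = PrefixAgnosticListener.new
REXML::Document.parse_stream(xml, listener)
p listener.texts # => ["Hello"]
```

As noted above, this ignores which namespace a prefix is bound to; a stricter variant would check the namespace URI instead, at the cost of more bookkeeping.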

Second, the `r` attribute on cells is not mandatory; when it is absent, parsers should infer the column from the cell’s position within its row.

This is easy enough to fix with a `@column_counter` variable that is reset to 0 at the start of each row and incremented for each `c` element, then doing something like `@cell_name = attrs['r'] || column_number_to_letter(@column_counter)`.
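A minimal sketch of the counter approach. `column_number_to_letter` is the hypothetical helper named above, implemented here as a bijective base-26 conversion (1 → "A", 26 → "Z", 27 → "AA"). The `CellNamer` class is also hypothetical, and it goes one small step further than the proposal by assuming the row number should be inferred the same way when `<row>` has no `r` attribute (as in the sample sheet above), so the fallback yields a full reference like `"B1"`.

```ruby
# Hypothetical helper: 1-based column number -> spreadsheet column letters.
def column_number_to_letter(number)
  letters = +''
  while number > 0
    # Shift to 0-based so that 26 maps to "Z" rather than carrying over.
    number, remainder = (number - 1).divmod(26)
    letters.prepend(('A'.ord + remainder).chr)
  end
  letters
end

# Hypothetical per-sheet state, driven by the parser's start_element events.
class CellNamer
  def initialize
    @row_number = 0
    @column_counter = 0
  end

  # Call on each <row>: use its r attribute if present, else count rows
  # (assumption: rows without r are consecutive). Resets the column counter.
  def start_row(attrs)
    @row_number = attrs['r'] ? attrs['r'].to_i : @row_number + 1
    @column_counter = 0
  end

  # Call on each <c>: prefer the explicit r attribute, else infer from position.
  def start_cell(attrs)
    @column_counter += 1
    attrs['r'] || "#{column_number_to_letter(@column_counter)}#{@row_number}"
  end
end

namer = CellNamer.new
namer.start_row({})
names = 3.times.map { namer.start_cell({}) }
p names # => ["A1", "B1", "C1"]
```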

Again, this won’t stand up to something truly horrible, like a sheet that mixes present and absent `r` attributes, but in the real-world files I’ve seen, worksheets either always have it or never do.

If you’d be happy to merge a pull request covering both of these cases, I’m happy to submit one!
