In real-world use we’ve encountered a few valid OOXML spreadsheets that simple_xlsx_reader can’t parse correctly. These sheets are annoying, but they are spec-compliant, and as far as I can tell actually come from Microsoft tools like PowerBI.
First of all, sheets can contain namespaced tags, i.e. element names that carry a namespace prefix before a colon instead of the bare tag name.
As far as I can tell, the pragmatic approach here is to just drop the namespacing, i.e. `name.split(':').last` in `start_element`. This may be 'wrong' as far as correct XML parsing goes, but it’s simple, works in my testing, and is also basically the approach that xsv takes to solve the problem.
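To make the idea concrete, here is a minimal sketch of the prefix-stripping step. The `local_name` helper and the `SheetHandler` class are hypothetical stand-ins for illustration, not simple_xlsx_reader's actual code, and the `x:` prefix is only an example value.

```ruby
# Hypothetical helper: drop any namespace prefix from a qualified tag name.
# "x:row" becomes "row"; a name without a prefix is returned unchanged.
def local_name(name)
  name.split(':').last
end

# Hypothetical stand-in for a SAX-style handler. It only records the
# normalized element names it is handed.
class SheetHandler
  attr_reader :seen

  def initialize
    @seen = []
  end

  def start_element(name, _attrs = {})
    @seen << local_name(name)
  end
end

handler = SheetHandler.new
handler.start_element('x:row')
handler.start_element('c')
```

This deliberately ignores which namespace URI the prefix is bound to, so it would also accept elements from an unrelated namespace that happen to share a local name.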
Second, it’s not mandatory for cells to have the `r` attribute, in which case parsers should infer the column by its position in the file.
This is easy enough to fix with a `@column_counter` variable which is set to 0 at the start of each `row` and incremented for each `c`, then doing something like `@cell_name = attrs['r'] || column_number_to_letter(@column_counter)`.
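A rough sketch of that counter, under some assumptions: `column_number_to_letter` is written here as a hypothetical helper taking a 0-based index, `RowTracker` is an illustrative class rather than the gem's real handler, and attributes are assumed to arrive as a Hash.

```ruby
# Hypothetical helper: convert a 0-based column index to spreadsheet
# letters (0 -> "A", 25 -> "Z", 26 -> "AA").
def column_number_to_letter(index)
  letters = +''
  n = index + 1
  while n > 0
    n, rem = (n - 1).divmod(26)
    letters.prepend((65 + rem).chr)
  end
  letters
end

# Illustrative handler: resets the counter on each row and falls back to
# the positional column when a cell has no "r" attribute.
class RowTracker
  attr_reader :cell_name

  def start_element(name, attrs = {})
    case name
    when 'row'
      @column_counter = 0
    when 'c'
      @cell_name = attrs['r'] || column_number_to_letter(@column_counter)
      @column_counter += 1
    end
  end
end

tracker = RowTracker.new
tracker.start_element('row')
names = 3.times.map do
  tracker.start_element('c')
  tracker.cell_name
end
```

Note that an `r` attribute normally holds a full reference including the row number, while this fallback yields only the column letters. That is fine if downstream code only needs the column. Otherwise a row counter would have to be tracked and appended as well.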
Again, this won’t stand up to something truly truly horrible like mixed present/absent `r` attributes, but as far as I’ve seen in real-world use, worksheets either have it, or they don’t.
If you’d be happy to merge a pull request to accommodate both of these cases, I’m happy to submit one!