Downloaded docx plans throw an error #377

Closed
mariapraetzellis opened this issue Sep 19, 2022 · 4 comments

@mariapraetzellis
Collaborator

A few users have reported that they get an error when trying to open DMPs that were downloaded as docx files.

For example, when downloading this plan: https://dmptool.org/plans/82519, the user gets the error below when trying to open the Word file:
[Image: screenshot of the error shown by Word]

@briri
Collaborator

briri commented Sep 26, 2022

This seems to be related to the specific version of the NIH template that was used to create that plan. I just created one from the latest version and it downloads without issue. Probably a weird (non-ASCII) character is causing an issue with the HTML -> DOCX conversion.
Will need to compare the question text between versions to confirm.
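
One way to run that comparison is to scan each version's question text for characters outside plain ASCII. The sketch below is illustrative only: the function names are hypothetical, and nothing in this thread confirms that such a character is the cause. It also flags characters that are not permitted in XML 1.0, since a docx stores its content as XML.

```python
import unicodedata


def is_xml10_char(ch: str) -> bool:
    """True if the character is allowed by the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def suspect_characters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each non-ASCII or XML-invalid character."""
    findings = []
    for index, ch in enumerate(text):
        if ord(ch) > 0x7F or not is_xml10_char(ch):
            # Control characters have no Unicode name, so fall back to a placeholder.
            name = unicodedata.name(ch, "UNNAMED")
            findings.append((index, f"U+{ord(ch):04X}", name))
    return findings
```

Running `suspect_characters` over the question text of the old and the new template version and comparing the two result lists would show whether the older version contains characters the newer one does not.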

@briri
Collaborator

briri commented Sep 27, 2022

I cannot determine why this particular docx fails to open in MS Word.

OSX Preview, TextEdit, and Slack are able to render it. Google Docs is able to import it.

MS Word for Mac v16.65 throws an error:
Screen Shot 2022-09-27 at 9 31 34 AM

When attempting to 'recover' the docx as suggested, the 'repaired' file is binary and unreadable:
Screen Shot 2022-09-27 at 9 32 51 AM

The corrupt docx opens if you export it without the question text. The original suspicion was that the issue lay with the bulleted (unordered) lists on the last few questions. After further research, this does not seem to be the case.

I have created a DMP based off of the current NIH template and it has no issues. I created and published a sample template (as a UCOP admin) that contains bulleted lists. The docx for a DMP created from that sample template opens without issue.

There do not appear to be any readily available docx debugging tools. I found a few possible options (e.g. this VB.Net one), but they all require a Windows environment, and the comments on StackOverflow about these solutions indicate that there's less than a 50% chance they'll tell us what's actually wrong.

A docx file is actually just a zip archive. If you change .docx to .zip and then unzip it you can access all of the XML files that make up the document 🤯
Screen Shot 2022-09-27 at 9 40 38 AM
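
The same inspection can be done without renaming the file, because any zip library can open a docx directly. This is a minimal sketch using Python's standard `zipfile` module; the function names are hypothetical and not part of the DMPTool codebase.

```python
import io
import zipfile


def list_docx_parts(docx_bytes: bytes) -> list[str]:
    """Return the sorted names of the files stored inside a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return sorted(package.namelist())


def read_part(docx_bytes: bytes, name: str = "word/document.xml") -> str:
    """Return one part of the package as text (the main body by default)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.read(name).decode("utf-8")
```

For a file on disk, read it in binary mode first, e.g. `list_docx_parts(open("plan.docx", "rb").read())`, where `plan.docx` stands in for the downloaded file.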

A comparison of each file between the corrupted docx and the valid ones (both the latest NIH template and my sample template) shows NO difference except for the content of document.xml, which is expected since the question text is different. All class/XML element names, attributes, document structure, etc. are the same.
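
A file-by-file comparison like this can be approximated in a few lines. The sketch below is an assumption about how one might automate it, not the method used here: it hashes every part of two packages and reports which parts are missing or differ. It compares raw bytes only, so it shows where the packages differ, not whether the XML structure matches.

```python
import hashlib
import io
import zipfile


def part_digests(docx_bytes: bytes) -> dict[str, str]:
    """Map each part name in the package to the SHA-256 of its contents."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return {
            name: hashlib.sha256(package.read(name)).hexdigest()
            for name in package.namelist()
        }


def diff_packages(left: bytes, right: bytes) -> dict[str, list[str]]:
    """Report parts present in only one package and parts whose bytes differ."""
    a, b = part_digests(left), part_digests(right)
    return {
        "only_in_left": sorted(set(a) - set(b)),
        "only_in_right": sorted(set(b) - set(a)),
        "changed": sorted(n for n in set(a) & set(b) if a[n] != b[n]),
    }
```

If only `word/document.xml` shows up under `changed`, the remaining parts are byte-identical and the search can focus on the document body.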

Not sure what else to do here. In my opinion it seems to be an issue with MS Word, and perhaps just specific versions. 🤷🏻‍♂️

There are plans in the system based off of this template that generate corrupted docx files. Around 60-70% of those appear to simply be tests (based on the 'test' flag or the plan names). I also suspect that few users export to docx.

Possible solutions would be to:

  • Downgrade the htmltoword gem and see if it helps. The issue here is that the current version was released in 2019, so it's unlikely to help unless this problem has been present for several years.
  • Add some text to the 'Download' page that suggests that users import the docx into Google Docs if they have trouble opening it.
  • Ignore it, since the latest version of the template exports docx without any issues.

@briri
Collaborator

briri commented Sep 27, 2022

Another option may be to export in an OpenOffice doc format.

@mariapraetzellis mariapraetzellis changed the title Dowloaded docx plans throw an error Downloaded docx plans throw an error Sep 27, 2022
@briri
Collaborator

briri commented Oct 31, 2022

related to DMPRoadmap#3221

@briri briri closed this as completed Jan 13, 2023