Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SVCS-528] [Blocked] Improve gdoc rendering - WB Part #304

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

cslzchen
Copy link
Contributor

@cslzchen cslzchen commented Dec 5, 2017

Ticket

https://openscience.atlassian.net/browse/SVCS-528
MFR counterpart: CenterForOpenScience/modular-file-renderer#303

Purpose

This PR replaces #290. Credit goes to @AddisonSchiller 🎆 🎆 .

Allow MFR to request .gdoc files in .pdf format while keep .docx format for downloading. This will allow MFR to cut out the expensive unoconv conversion for .gdoc files

We can export .gdoc as a pdf if we know its coming from mfr. Added alt_export functions and properties to Gdrive utils and metadata. Gdrive download now looks for 'mfr' in its kwargs and will request file as pdf if used properly.

Changes

WB V1 provider now looks for a mfr= query param. If mfr=true, the google drive provider will use alt_export_name and alt_export_link() instead of its normal counterparts. This will cause the file being sent to be a .pdf instead of .docx.

Side effects

None

QA Notes

Test that .gdoc rendering works as usual.

Deployment Notes

This PR must be merged/deployed at the same time as its MFR counterpart.
This PR should be blocked by @TomBaxter 's v3 update: #276.

@coveralls
Copy link

coveralls commented Dec 5, 2017

Coverage Status

Coverage increased (+0.03%) to 89.843% when pulling 4be8f0e on cslzchen:feature/svcs-528-improve-gdoc into f6e049a on CenterForOpenScience:develop.

else:

# TODO figure out metadata.raw.get('downloadUrl')
download_url = metadata.raw.get('downloadUrl') or drive_utils.get_export_link(metadata.raw) # type: ignore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that metadata.raw.get('downloadUrl') is always None. Why?

Copy link
Contributor

@Johnetordoff Johnetordoff Dec 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of this line is that downloadUrl is always present unless it's a file that that isn't exportable in it's native type, so a .txt will have a downloadUrl, but a .gdoc won't. Google files have frustratingly nonstandard metadata.

@@ -234,9 +234,18 @@ def can_intra_copy(self,

metadata = await self.metadata(path, revision=revision)

if kwargs.get('mfr', None) and kwargs['mfr'].lower() == 'true':
download_url = drive_utils.get_alt_export_link(metadata.raw) # type: ignore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

download_url = metadata.raw.get('downloadUrl') or drive_utils.get_alt_export_link(metadata.raw) 

@cslzchen cslzchen force-pushed the feature/svcs-528-improve-gdoc branch from 4768410 to 38b4e1a Compare December 6, 2017 15:38
AddisonSchiller and others added 2 commits December 6, 2017 10:45
We can export .gdoc as a pdf if we know its coming from
mfr. Added `alt_export` functions and properties to Gdrive
utils and metadata. Gdrive download now looks for 'mfr' in its
kwargs and will request file as pdf if used properly.
raises key error and update gdrive utils
to use `dict.get()` instead of `dict[]`.
@cslzchen cslzchen force-pushed the feature/svcs-528-improve-gdoc branch from 38b4e1a to a7c70ef Compare December 6, 2017 15:45
@@ -234,9 +234,16 @@ def can_intra_copy(self,

metadata = await self.metadata(path, revision=revision)

if kwargs.get('mfr', None) and kwargs['mfr'].lower() == 'true':
download_url = metadata.raw.get('downloadUrl') or drive_utils.get_alt_export_link(metadata.raw)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, fixed. Need to check metadata.raw.get('downloadUrl') first. There are file types on gdrive (e.g. .txt) that do not have export_links.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about

if kwargs.get('mfr', '').lower() == 'true':

Also I don't get the or here.

@cslzchen cslzchen changed the title [SVCS-528] Improve gdoc rendering (WB Part) [SVCS-528] [Blocked] Improve gdoc rendering (WB Part) Dec 6, 2017
@coveralls
Copy link

coveralls commented Dec 6, 2017

Coverage Status

Coverage increased (+0.03%) to 89.809% when pulling a7c70ef on cslzchen:feature/svcs-528-improve-gdoc into 6249a7a on CenterForOpenScience:develop.

@cslzchen cslzchen changed the title [SVCS-528] [Blocked] Improve gdoc rendering (WB Part) [SVCS-528] [Blocked] Improve gdoc rendering - WB Part Dec 7, 2017
@@ -234,9 +234,16 @@ def can_intra_copy(self,

metadata = await self.metadata(path, revision=revision)

if kwargs.get('mfr', None) and kwargs['mfr'].lower() == 'true':
download_url = metadata.raw.get('downloadUrl') or drive_utils.get_alt_export_link(metadata.raw)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about

if kwargs.get('mfr', '').lower() == 'true':

Also I don't get the or here.

if format['mime_type'] == metadata['mimeType']:
return format
for format_type in DOCS_FORMATS:
if format_type.get('mime_type') == metadata.get('mimeType'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metadata.get('mimeType') should be outside the for loop so it doesn't need to be accessed each iteration. Given that this data structure is tiny so I doubt there will be any perf. difference.

format = get_format(metadata)
return metadata['exportLinks'][format['type']]
format_type = get_format(metadata)
return metadata.get('exportLinks').get(format_type.get('type'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if metadata doesn't have an exportLinks?

@NyanHelsing
Copy link
Contributor

I think some of the stuff here may be better passed as a query param that waterbutler passes through to gdrive transparently, this pr currently puts code that deals with exporting into mfr, I'd like to look at if theres a way to put that logic in mfr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants