Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy from Zotero storage #82

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

g-simmons
Copy link

Copy files from Zotero storage instead of downloading, if possible. Discussed in #66.

Changes

  • import shutil for file copy
  • add zotero_storage arg to ZoteroDB init to specify the Zotero storage location, default is "~/Zotero/storage/"
  • if pdf for a document doesn't exist yet, first check Zotero storage.
  • if one pdf exists in Zotero storage, copy it to the paperqa storage location and return the destination path
  • if there is an error attempting to copy the pdf, fall back to downloading
  • if zero or more than one pdfs exist, fall back to downloading

@@ -44,8 +45,22 @@ def __init__(
library_id: Optional[str] = None,
api_key: Optional[str] = None,
storage: Optional[StrPath] = None,
zotero_storage: Optional[Union[StrPath,bool]] = "~/Zotero/storage/",
Copy link
Collaborator

@MilesCranmer MilesCranmer Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
zotero_storage: Optional[Union[StrPath,bool]] = "~/Zotero/storage/",
local_zotero: Optional[StrPath] = None,
use_local_zotero: bool = True,

And initialize it later on, based on what operating system it is.

Also, should use pathlib.Path.home() / "Zotero" / "storage" for safety

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so we could do:

Optional[Union[StrPath,bool]] = None

StrPath: try to use the user-provided path, raise FileNotFoundError if it doesn't exist
None: try to use pathlib.Path.home() / "Zotero" / "storage", if it doesn't exist, set zotero_storage to None
False: never use zotero_storage

But what about

Union[StrPath,bool] = True

StrPath: try to use the user-provided path, raise FileNotFoundError if it doesn't exist
True: try to use pathlib.Path.home() / "Zotero" / "storage", if it doesn't exist, set zotero_storage to None
False: never use zotero_storage

I think default True makes it more obvious that the functionality is engaged by default. WYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also out of curiosity what's the diff btw pathlib.Path.home() / "Zotero" / "storage" and Path("~/Zotero/storage/").expanduser()?

Copy link
Collaborator

@MilesCranmer MilesCranmer Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pathlib question

It's just a more robust practice: On Windows, the "\" key is used instead of "/". So Pathlib's / operator will do the correct thing on each operating system.

Also sometimes the "~" is not expanded correctly. Better to rely on Pathlib.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being explicit is often best so I would recommend one parameter for specifying the location, and another for specifying whether you should use it or not. When variables can take on a bunch of different types it can make code confusing. (I edited my original comment with this suggestion)

if zotero_storage:
self.zotero_storage = Path(zotero_storage).expanduser()
else:
self.zotero_storage = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is where you can automatically set it, if the OS default exists.

shutil.copy(zotero_pdf_path, pdf_path)
return pdf_path

except Exception as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other exceptions are you thinking of here? Maybe catch them explicitly, and throw the error if something unexpected comes up?

Copy link
Author

@g-simmons g-simmons Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting one or more of the three things below? Or something else?

  1. catch specific error cases and resolve them
  2. catch, warn and fall back to downloading
  3. catch, wrap with a message, and raise

For unexpected edge cases, do you think raising an error would be better or just warning the user and falling back to downloading?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't have anything specific in mind - could imagine permissions errors at the destination dir.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there was no specific error you had in mind I would just remove the try-except. If you think there could be permissions issues for the shutil.copy I would check for the specific permissions error, and warn, then fall back to downloading. If there's an unexpected error then it should be raised.

@jamesbraza
Copy link
Collaborator

Hi @g-simmons thank you for this PR. We just released v5 today which changes basically the whole repo.

If you are still interested, please rebase this PR atop the current main branch.

@g-simmons
Copy link
Author

@jamesbraza thanks for the info. I probably won't be able to rebase this any time soon (other time commitments) :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants