Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy from Zotero storage #82

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 37 additions & 1 deletion paperqa/contrib/zotero.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# This file gets PDF files from the user's Zotero library
import os
import shutil
from typing import Union, Optional
from pathlib import Path
import logging
Expand Down Expand Up @@ -44,8 +45,22 @@ def __init__(
library_id: Optional[str] = None,
api_key: Optional[str] = None,
storage: Optional[StrPath] = None,
zotero_storage: Optional[Union[StrPath,bool]] = "~/Zotero/storage/",
Copy link
Collaborator

@MilesCranmer MilesCranmer Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
zotero_storage: Optional[Union[StrPath,bool]] = "~/Zotero/storage/",
local_zotero: Optional[StrPath] = None,
use_local_zotero: bool = True,

And initialize it later on, based on what operating system it is.

Also, should use pathlib.Path.home() / "Zotero" / "storage" for safety

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so we could do:

Optional[Union[StrPath,bool]] = None

StrPath: try to use the user-provided path, raise FileNotFoundError if it doesn't exist
None: try to use pathlib.Path.home() / "Zotero" / "storage", if it doesn't exist, set zotero_storage to None
False: never use zotero_storage

But what about

Union[StrPath,bool] = True

StrPath: try to use the user-provided path, raise FileNotFoundError if it doesn't exist
True: try to use pathlib.Path.home() / "Zotero" / "storage", if it doesn't exist, set zotero_storage to None
False: never use zotero_storage

I think default True makes it more obvious that the functionality is engaged by default. WYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also out of curiosity what's the diff btw pathlib.Path.home() / "Zotero" / "storage" and Path("~/Zotero/storage/").expanduser()?

Copy link
Collaborator

@MilesCranmer MilesCranmer Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pathlib question

It's just a more robust practice: On Windows, the "\" key is used instead of "/". So Pathlib's / operator will do the correct thing on each operating system.

Also sometimes the "~" is not expanded correctly. Better to rely on Pathlib.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being explicit is often best so I would recommend one parameter for specifying the location, and another for specifying whether you should use it or not. When variables can take on a bunch of different types it can make code confusing. (I edited my original comment with this suggestion)

**kwargs,
):
"""Initialize the ZoteroDB object.

Parameters
----------
storage: str, optional
The path to the directory where PDFs will be stored. Defaults to
`~/.paperqa/zotero`.
zotero_storage: str, optional
The path to storage directory where Zotero stores PDFs. Defaults to
`~/Zotero/storage/`. Set this to use previously-downloaded PDFs. Set to `False` to
disable this feature.
"""

self.logger = logging.getLogger("ZoteroDB")

if library_id is None:
Expand Down Expand Up @@ -76,9 +91,14 @@ def __init__(

if storage is None:
storage = CACHE_PATH.parent / "zotero"

if zotero_storage:
self.zotero_storage = Path(zotero_storage).expanduser()
else:
self.zotero_storage = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is where you can automatically set it, if the OS default exists.


self.logger.info(f"Using cache location: {storage}")
self.storage = storage
self.storage = Path(storage)

super().__init__(
library_type=library_type, library_id=library_id, api_key=api_key, **kwargs
Expand Down Expand Up @@ -107,6 +127,22 @@ def get_pdf(self, item: dict) -> Union[Path, None]:
pdf_path = self.storage / (pdf_key + ".pdf")

if not pdf_path.exists():
if self.zotero_storage:
self.logger.info(f"| Looking for existing PDF for: {_get_citation_key(item)}")
try:
zotero_doc_folder = self.zotero_storage / pdf_key

if zotero_doc_folder.exists():
pdf_files = list(zotero_doc_folder.glob("*.pdf"))
if len(pdf_files) == 1:
self.logger.info(f"| Copying existing PDF for {_get_citation_key(item)} from Zotero storage.")
zotero_pdf_path = zotero_doc_folder / pdf_files[0]
shutil.copy(zotero_pdf_path, pdf_path)
return pdf_path
g-simmons marked this conversation as resolved.
Show resolved Hide resolved

except Exception as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other exceptions are you thinking of here? Maybe catch them explicitly, and throw the error if something unexpected comes up?

Copy link
Author

@g-simmons g-simmons Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting one or more of the three things below? Or something else?

  1. catch specific error cases and resolve them
  2. catch, warn and fall back to downloading
  3. catch, wrap with a message, and raise

For unexpected edge cases, do you think raising an error would be better or just warning the user and falling back to downloading?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't have anything specific in mind - could imagine permissions errors at the destination dir.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there was no specific error you had in mind I would just remove the try-except. If you think there could be permissions issues for the shutil.copy I would check for the specific permissions error, and warn, then fall back to downloading. If there's an unexpected error then it should be raised.

self.logger.warning(f"Could not copy file from Zotero storage, redownloading file. Error: {e}")

pdf_path.parent.mkdir(parents=True, exist_ok=True)
self.logger.info(f"| Downloading PDF for: {_get_citation_key(item)}")
self.dump(pdf_key, pdf_path)
Expand Down