Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore-datadir #238

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

Stew-McD
Copy link
Contributor

@Stew-McD Stew-McD commented Dec 17, 2023

Continuation of last week's pull request, but now with the addition of a function to restore the entire datadir to the default dir, or somewhere else.

This is useful, for example, when working on a remote cluster.

@Stew-McD Stew-McD marked this pull request as ready for review December 17, 2023 13:50
@cmutel
Copy link
Member

cmutel commented Dec 17, 2023

Thanks @Stew-McD.

We definitely can't merge this without some tests - I will try to add them over the break. Ping me if you get impatient.

I will also have time then to think about the question you asked. Not sure yet.

Copy link
Member

@cmutel cmutel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, didn't notice this until recently. Thanks for the idea!

We should integrated this into the tests - create an example, load it to check the data, and compare the hash of a file created in the test against the fixture.

@@ -13,20 +13,23 @@


def backup_data_directory(
timestamp: Optional[bool] = True, dir_backup: Optional[Union[str, Path]] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't love changing argument order, but we haven't released with this yet and the new order is more logical, so OK.

timestamp: Optional[bool] = True, dir_backup: Optional[Union[str, Path]] = None
dir_backup: Optional[Union[str, Path]] = None,
timestamp: Optional[bool] = True,
file_suffix: Optional[str] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right name. The initial understanding of this would be the last part of the name (e.g. .tar.gz, see the pathlib docs). What about 'filename_extra' or similar instead?

):
"""
Backup the Brightway2 data directory to a `.tar.gz` (compressed tar archive) in a specified directory, or in the user's home directory by default.

The file name is of the form "brightway2-data-backup.{timestamp}.tar.gz", unless `timestamp` is False, in which case the file name is "brightway2-data-backup.tar.gz".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of extra filename parts, why not allow users to set the filename itself? This seems more reasonable than "brightway2-data-backup{file_suffix}{timestamp}.tar.gz", which is a mouthful.

dir_backup : str, Path, optional
Directory to backup. If None, use the user's home directory.
timestamp : bool, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give a spec (e.g. ISO 8601) or example of the timestamp

"""

dir_backup = Path(dir_backup or Path.home())

# Check if the backup directory exists and is writable
if not dir_backup.exists():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch


# if new_data_dir, set the environment variable (this will not be persistent, I only know how to do that on Linux or in a venv...)
if new_data_dir:
os.environ["BRIGHTWAY2_DIR"] = str(new_data_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work.

If I understand correctly, the idea is that after importing the data directory, you can use it from bw2data. But bw2io has a dependency on bw2data, which means it has read the environment variables already and chosen the base data directory. Setting the environment variable now won't change that.

Instead, I think you should add an input argument to this function: use_data_dir (bool) (or similar), which would call change_base_directories.


# Confirm that the user wants to overwrite the data directory, if it exists
if data_dir.is_dir():
confirm_overwrite = input(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, no user input allowed. Make it an input argument with a scary name and default False, and throw an error if overwriting is not allowed. We might also consider not allowing this functionality at all.

"""

# check if file exists
fp = Path(fp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are doing this, then I don't see any reason to allow str inputs. I thought the point was to allow for non-filesystem resources, but I would prefer for all filepaths to be Paths. I have taken this approach in bw2data.


# this block is maybe totally the wrong way to do it --> could be done better or removed
if data_dir != Path(projects._base_data_dir):
projects.change_base_directories(data_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -120,8 +242,12 @@ def backup_project_directory(
if not os.access(dir_backup, os.W_OK):
raise PermissionError(f"The directory {dir_backup} is not writable.")

timestamp_str = datetime.datetime.now().strftime("%d-%B-%Y-%I-%M%p") if timestamp else ""
backup_filename = f"brightway2-project-{project}-backup{timestamp_str}.tar.gz"
timestamp_str = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second time - factor this out into a standalone function.

@Stew-McD
Copy link
Contributor Author

Ah yea. Maybe I was getting caught up trying to solve something that wasn't a need for anyone else. It did work for me at the time, but probably I was just procrastinating. :D An external compress, extract, change datadir, do the same thing, I guess.

I've been sick for a while and getting back into it now. I appreciate your specific and constructive comments.
I'll go through and respond as soon as I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants