Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate timestamp when using an "as_of" date to get the VersionID of an S3 object #24

Open
2 tasks done
bsweger opened this issue Sep 27, 2024 · 2 comments · May be fixed by #79
Open
2 tasks done

Incorporate timestamp when using an "as_of" date to get the VersionID of an S3 object #24

bsweger opened this issue Sep 27, 2024 · 2 comments · May be fixed by #79
Assignees

Comments

@bsweger
Copy link
Collaborator

bsweger commented Sep 27, 2024

This is a follow up to #23

That PR added the ability to look up an S3 versionID for a specific date, but as noted in that PR review, we really should make that feature "timestamp aware."

Definition of Done

  • If no as_of date is supplied when downloading files from Nextstrain's S3 bucket, the default date should be UTC datetime.now() (i.e., current timestamp) instead of the current date at midnight.
  • Allow user to specify a timestamp in addition to a date. If user specifies an as_of date without a timestamp, default the time to midnight UTC.

2025-01-3 update: If someone passes YYYY-MM-DD in string format (i.e., 2025-01-01), Cladetime should interpret that as datetime(2024, 1, 1, 11, 59, 59). However, if someone passes a Python datetime object without a time (i.e., datetime(2025, 1, 1, 0, 0, 0)), it's hard to interpret intent--maybe they want to use midnight as a time stamp? Therefore, in the latter case, Cladetime should not override the timestamp to 11:59:59)

@bsweger bsweger added this to Lab Work Sep 27, 2024
@bsweger bsweger converted this from a draft issue Sep 27, 2024
@bsweger bsweger added this to the Variant Nowcast Launch milestone Sep 27, 2024
@bsweger bsweger self-assigned this Oct 10, 2024
@bsweger
Copy link
Collaborator Author

bsweger commented Nov 13, 2024

@elray1 Am reviewing some old tickets, and I'm not sure I stated the first "definition of done" item correctly.

Cladetime now works exclusively in UTC time, but it sounds like if someone specifies a date w/o a time, the most helpful behavior would be to use 23:59 UTC as the timestamp instead of midnight.

Current behavior:

from cladetime import CladeTime
ct = CladeTime(tree_as_of="2024-09-03")
ct
CladeTime(sequence_as_of=2024-11-13 21:10:23+00:00, tree_as_of=2024-09-03 00:00:00+00:00)
  1. Can you confirm that what you want here is tree_as_of set to 2024-09-03 23:59:59+00:00?
  2. For unspecified dates (like sequence_as_of, above), ok to leave the default as the current UTC timestamp?

@bsweger bsweger removed their assignment Dec 10, 2024
@elray1
Copy link
Collaborator

elray1 commented Dec 10, 2024

  1. Yes, I think setting to the end of the specified date (UTC) is a good choice when a time is not specified.
  2. Yes, using the current UTC timestamp is reasonable if nothing is specified.

@bsweger bsweger moved this from Todo to In Progress in Lab Work Jan 3, 2025
@bsweger bsweger self-assigned this Jan 3, 2025
bsweger added a commit that referenced this issue Jan 3, 2025
Resolves #24

When someone instantiates a CladeTime object using string-based
date formats (YYYY-MM-DD) for sequence_as_of or tree_as_of,
set the corresponding timestamp to 11:59:59 to ensure that the
entire day is included when searching for S3 object versions
that match the date.
@bsweger bsweger moved this from In Progress to Ready for Review in Lab Work Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ready for Review
Development

Successfully merging a pull request may close this issue.

2 participants