Change data fetch method #30

Merged
merged 5 commits into from
Jul 12, 2024


pranavanba
Collaborator

Major Changes

  1. Significantly reduce pipeline execution time by connecting directly to the S3 bucket instead of syncing the bucket's objects to a local directory
  2. Read datasets directly from the S3 bucket instead of from local file paths
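
For readers unfamiliar with the pattern, here is a minimal sketch of reading a dataset over an S3 connection rather than from a synced local copy. It is not the code from this PR: the Parquet format, the use of `pyarrow`, and the names `s3_dataset_uri`, `open_dataset`, and all bucket, prefix, and dataset values are assumptions made for illustration.

```python
# Hypothetical sketch: bucket, prefix, and dataset names are placeholders,
# and Parquet via pyarrow is an assumption, not something stated in this PR.


def s3_dataset_uri(bucket: str, prefix: str, dataset_name: str) -> str:
    """Build the s3:// URI of one dataset folder under a bucket prefix."""
    parts = [bucket.strip("/"), prefix.strip("/"), dataset_name.strip("/")]
    # Skip empty components so an empty prefix does not produce "//".
    return "s3://" + "/".join(p for p in parts if p)


def open_dataset(bucket: str, prefix: str, dataset_name: str):
    """Open a dataset over the S3 connection, with no local sync step."""
    # Imported here so the URI helper works without pyarrow installed.
    import pyarrow.dataset as ds

    # No rows are read here; data is fetched when the dataset is scanned.
    return ds.dataset(s3_dataset_uri(bucket, prefix, dataset_name), format="parquet")
```

Under these assumptions, a call such as `open_dataset("my-bucket", "some/prefix", "some_dataset").to_table()` would pull rows only when they are needed, which is where the time saved over a full bucket sync would come from. AWS credentials are expected to be available in the environment.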

Minor Changes

  1. Remove config parameters that are no longer needed as a result of Major Change #1
  2. Improve the formatting of progress print statements

… bucket instead of syncing files from S3 storage to local directory
… they are no longer needed due to connecting directly to S3 bucket instead of syncing files from the bucket to a local directory
…o longer synced to local directory and are instead directly read from S3 bucket connection
@pranavanba pranavanba self-assigned this Jul 12, 2024
@pranavanba pranavanba merged commit 21f3e7a into Sage-Bionetworks:main Jul 12, 2024
2 checks passed