Module for sending datasets to Google cloud #18

Open: josephchimebuka wants to merge 10 commits into base: dev

Conversation

@josephchimebuka josephchimebuka commented Dec 26, 2024

Closes #17

Created a Cloud Storage bucket that is monitored for file uploads or changes.

Screenshot from 2025-01-03 17-28-39

Wrote a function for uploading the data
Screenshot from 2025-01-03 17-03-39
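
The upload function itself is only visible in the screenshot, so the following is a minimal sketch of what such a function could look like, not the code from this PR. The name `upload_dataset` and its parameters are assumptions made for illustration. The client is passed in as an argument and is expected to behave like `google.cloud.storage.Client`, which keeps the function testable without network access.

```python
from pathlib import Path


def upload_dataset(client, bucket_name: str, source_path: str, destination_name: str) -> str:
    """Upload a local file to a Cloud Storage bucket and return its gs:// URI.

    `client` is assumed to behave like google.cloud.storage.Client; the
    function name and signature are hypothetical.
    """
    # Fail early with a clear message instead of a library-specific error.
    if not Path(source_path).is_file():
        raise FileNotFoundError(f"No such dataset file: {source_path}")

    bucket = client.bucket(bucket_name)        # reference to the bucket
    blob = bucket.blob(destination_name)       # reference to the target object
    blob.upload_from_filename(source_path)     # performs the actual upload
    return f"gs://{bucket_name}/{destination_name}"
```

With the real library installed, a call would look like `upload_dataset(storage.Client(), "my-bucket", "data.csv", "datasets/data.csv")`, where the bucket and object names are placeholders.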

Uploaded a file and checked in BigQuery that the dataset is visible
Screenshot from 2025-01-03 17-05-02
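
For reviewers who want to reproduce the BigQuery check from code rather than the console, here is a hedged sketch. It assumes the uploaded file is a CSV and that it is loaded with a load job; the PR does not say how the data reaches BigQuery, so this is one possible approach. The function name `load_csv_into_bigquery` is hypothetical, and `client` is expected to behave like `google.cloud.bigquery.Client`.

```python
def load_csv_into_bigquery(client, gcs_uri: str, table_id: str, job_config=None) -> int:
    """Load a file from Cloud Storage into a BigQuery table.

    Returns the number of rows in the destination table after the load.
    `client` is assumed to behave like google.cloud.bigquery.Client; the
    function name and signature are hypothetical.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {gcs_uri}")

    # Start the load job and block until it finishes; result() raises on failure.
    load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    load_job.result()

    # Fetch the table metadata to confirm that rows actually arrived.
    table = client.get_table(table_id)
    return table.num_rows
```

For a CSV with a header row, the `job_config` would typically be a `bigquery.LoadJobConfig` with `source_format=bigquery.SourceFormat.CSV`, `skip_leading_rows=1`, and `autodetect=True`.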

@josephchimebuka josephchimebuka marked this pull request as ready for review January 1, 2025 03:10
@josephchimebuka josephchimebuka changed the title from "Module for sending retrieving datasets from gogle cloud" to "Module for sending datasets to Google cloud" on Jan 1, 2025
@josephchimebuka
Author

@akiraonstarknet and @ManvithaMolakala, could you please review?

@@ -0,0 +1,86 @@
import logging
Collaborator

Please add comments throughout.

from google.cloud import bigquery
from google.cloud import storage

import yaml
Collaborator

Please update the pyproject.toml file.

@@ -0,0 +1,30 @@
import unittest
Collaborator

In this project, we use pytest. Please write the tests using pytest for consistency.
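
As an illustration of the requested style, a pytest test is a plain function with bare `assert` statements, with no `unittest.TestCase` class or `unittest.main()` call. The sketch below uses a hypothetical `upload_dataset` function as a stand-in for the code under review; the real function in `src/data/datasets.py` may differ. `unittest.mock.Mock` still works under pytest, so the existing mocks can be kept.

```python
from unittest.mock import Mock


def upload_dataset(client, bucket_name, source_path, destination_name):
    """Hypothetical stand-in for the function under review."""
    blob = client.bucket(bucket_name).blob(destination_name)
    blob.upload_from_filename(source_path)
    return f"gs://{bucket_name}/{destination_name}"


def test_upload_dataset_calls_storage_api():
    # A Mock replaces the real storage client, so no network access is needed.
    client = Mock()

    uri = upload_dataset(client, "demo-bucket", "data.csv", "datasets/data.csv")

    # Plain asserts replace self.assertEqual and friends.
    assert uri == "gs://demo-bucket/datasets/data.csv"
    client.bucket.assert_called_once_with("demo-bucket")
    bucket = client.bucket.return_value
    bucket.blob.assert_called_once_with("datasets/data.csv")
    bucket.blob.return_value.upload_from_filename.assert_called_once_with("data.csv")
```

pytest discovers functions whose names start with `test_` in files named `test_*.py` or `*_test.py`, so no `if __name__ == '__main__':` block is required.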

@@ -0,0 +1,30 @@
import unittest
from unittest.mock import Mock, patch
Collaborator

Explicitly mention in the README or comments if the script depends on gcloud auth for credentials or requires a GOOGLE_APPLICATION_CREDENTIALS environment variable.

Collaborator

If additional configurations are needed (e.g., a secrets file or service account JSON), document their usage and location.
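
One way to make the credential requirement explicit in code as well as in the README is a small startup check. This is a sketch under assumptions: the helper name `resolve_credentials_path` is hypothetical and is not part of this PR or of the Google client libraries. It relies only on the documented behavior that the client libraries read `GOOGLE_APPLICATION_CREDENTIALS` first and otherwise fall back to Application Default Credentials, such as those created by `gcloud auth application-default login`.

```python
import os
from pathlib import Path


def resolve_credentials_path(env=None):
    """Return the service account key path, or None if the variable is unset.

    Hypothetical helper: raises FileNotFoundError when
    GOOGLE_APPLICATION_CREDENTIALS is set but does not point to a file.
    """
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        # Unset: the client libraries fall back to Application Default Credentials.
        return None

    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(
            f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}"
        )
    return path
```

Calling this once before creating the storage or BigQuery clients turns a misconfigured path into an immediate, readable error.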

# Assert no exceptions raised (would require more detailed mocks for deeper validation)

if __name__ == '__main__':
    unittest.main()
Collaborator

Please address the above comments before I take a final look.

Successfully merging this pull request may close these issues.

Create module to store and retrieve datasets from google cloud storage
2 participants