
feat: Added script and kubernetes job to export mongodb collections to S3 #1340

Open: wants to merge 1 commit into main


AncientPatata (Contributor)

Motivation

To improve our ability to analyze the behavior of ArmoniK, this PR introduces a script that exports the TaskData collection (and, later on, other MongoDB collections) to an S3 bucket. The exported data can later be imported into another MongoDB setup using mongo-tools (a separate PR provides a script for that). Ideally, though, it can be analyzed directly as JSON files or loaded into a smaller database such as TinyDB.

Description

This PR introduces a shell script and a Kubernetes job that use Sling to export MongoDB data to an S3 bucket.
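The export step can be sketched as a small Bash script. This is a minimal sketch, not the script from this PR. It assumes the connection details come from the `MONGO_USER`, `MONGO_PASS`, `MONGO_HOST` and `MONGO_PORT` variables used in the reviewed snippet, and that a Sling connection named `S3` is already configured. `MONGO_DB`, `BUCKET_NAME`, `FILENAME` and `DRY_RUN` are hypothetical variables standing in for the `{{BUCKET_NAME}}`/`{{FILENAME}}` template placeholders.

```bash
#!/usr/bin/env bash
# Sketch of the MongoDB -> S3 export step (not the PR's actual script).
# Assumes MONGO_USER, MONGO_PASS, MONGO_HOST and MONGO_PORT are set, and that a
# Sling connection named S3 is already configured. MONGO_DB, BUCKET_NAME,
# FILENAME and DRY_RUN are hypothetical variables used for illustration.
set -euo pipefail

# Print the MongoDB URL that Sling will read from the MONGODB variable.
# ${VAR:?msg} aborts with msg if a required variable is unset or empty.
# Note: the password is NOT percent-encoded here.
build_mongo_url() {
  local db="${MONGO_DB:-database}"
  printf 'mongodb://%s:%s@%s:%s/?authSource=%s\n' \
    "${MONGO_USER:?MONGO_USER is not set}" \
    "${MONGO_PASS:?MONGO_PASS is not set}" \
    "${MONGO_HOST:?MONGO_HOST is not set}" \
    "${MONGO_PORT:?MONGO_PORT is not set}" \
    "$db"
}

# Export one collection to s3://<bucket>/<prefix>_<collection>.json.
# With DRY_RUN=1 the Sling command is printed instead of executed.
export_collection() {
  local collection="$1"
  local db="${MONGO_DB:-database}"
  local target="s3://${BUCKET_NAME:?BUCKET_NAME is not set}/${FILENAME:?FILENAME is not set}_${collection}.json"
  local -a cmd

  MONGODB="$(build_mongo_url)"
  export MONGODB

  cmd=(sling run --src-conn MONGODB --src-stream "${db}.${collection}"
    --tgt-conn S3 --tgt-object "$target")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 export_collection TaskData` prints the Sling command instead of running it, which makes it easy to check the target path before a real export. Because the password is inserted verbatim, credentials containing characters such as `@` or `:` would need to be percent-encoded first.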

Testing

Tested on both an AWS deployment and a localhost deployment of ArmoniK.

Impact

Not applicable.

Additional Information

None.

Checklist

  • My code adheres to the coding and style guidelines of the project.
  • I have performed a self-review of my code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • I have thoroughly tested my modifications and added tests when necessary.
  • Tests pass locally and in the CI.
  • I have assessed the performance impact of my modifications.


sonarqubecloud bot commented Dec 5, 2024

export MONGODB="mongodb://$MONGO_USER:$MONGO_PASS@$MONGO_HOST:$MONGO_PORT/?authSource=database"

# Run the Sling command
sling run --src-conn MONGODB --src-stream 'database.TaskData' --tgt-conn S3 --tgt-object "s3://{{BUCKET_NAME}}/{{FILENAME}}_TaskData.json"
Contributor:

Do we only want to export the TaskData collection?

Contributor Author:

For now, yes. Imports have to be done per collection, so the export won't be generalized until I decide how to handle imports.
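Since imports are done per collection, one way to generalize later would be to run one Sling export per collection so that each collection lands in its own S3 object. A minimal sketch, assuming the `MONGODB` and `S3` Sling connections are already configured as in the reviewed snippet. `MONGO_DB`, `BUCKET_NAME`, `FILENAME` and `DRY_RUN` are hypothetical variables, and any collection name other than `TaskData` is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: one Sling export per collection, so each collection ends up in its
# own S3 object and can be re-imported on its own. Assumes the MONGODB and S3
# Sling connections are already configured. MONGO_DB, BUCKET_NAME, FILENAME
# and DRY_RUN are hypothetical variables; collection names are arguments.
set -euo pipefail

export_collections() {
  local db="${MONGO_DB:-database}"
  local bucket="${BUCKET_NAME:?BUCKET_NAME is not set}"
  local prefix="${FILENAME:?FILENAME is not set}"
  local collection
  local -a cmd

  for collection in "$@"; do
    cmd=(sling run --src-conn MONGODB --src-stream "${db}.${collection}"
      --tgt-conn S3 --tgt-object "s3://${bucket}/${prefix}_${collection}.json")
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      # Print instead of running, e.g. to check target paths first.
      printf '%s\n' "${cmd[*]}"
    else
      # Under set -e, a failed export normally aborts the remaining ones.
      "${cmd[@]}"
    fi
  done
}
```

Calling `export_collections TaskData` reproduces the current TaskData-only export; passing more names adds one export, and one S3 object, per collection.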
