
Checkoff 2 #18

Open · sheikhshack opened this issue Nov 24, 2020 · 2 comments
Labels: question (Further information is requested)
Milestone: Checkoff 2

sheikhshack (Owner) commented Nov 24, 2020

Hey guys, you can comment whatever questions you have here to be asked to prof. Thanks!

sheikhshack (Owner, Author) commented Nov 25, 2020

Production

  1. Do we need to use Elastic IPs? Can we use placement groups to increase throughput?
  2. Are we graded on the speed of deployment, or is a deployment script that works sufficient?
  3. For 'take in credentials as input', do we assume the user supplies credentials via ~/.aws/credentials, or must we let the user specify each argument via the CLI? (See the sketch after this list.)
  4. For teardown, is terminating the instances sufficient?
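
A minimal sketch, assuming a Python/boto3 deployment script, of taking credentials either from ~/.aws/credentials or as CLI arguments, plus teardown by terminating instances. The AMI ID, key pair name, and region are hypothetical placeholders, not part of the actual checkoff spec:

```python
import argparse

import boto3


def make_session(args: argparse.Namespace) -> boto3.Session:
    """Build a boto3 session from CLI args, else fall back to ~/.aws/credentials."""
    if args.access_key and args.secret_key:
        return boto3.Session(
            aws_access_key_id=args.access_key,
            aws_secret_access_key=args.secret_key,
            region_name=args.region,
        )
    # No explicit keys given: boto3's default chain reads ~/.aws/credentials.
    return boto3.Session(region_name=args.region)


def main() -> None:
    parser = argparse.ArgumentParser(description="Launch or tear down EC2 instances")
    parser.add_argument("action", choices=["launch", "teardown"])
    parser.add_argument("--access-key")
    parser.add_argument("--secret-key")
    parser.add_argument("--region", default="ap-southeast-1")
    args = parser.parse_args()

    ec2 = make_session(args).resource("ec2")
    if args.action == "launch":
        instances = ec2.create_instances(
            ImageId="ami-0123456789abcdef0",  # hypothetical AMI
            InstanceType="t2.micro",
            KeyName="my-keypair",             # hypothetical key pair
            MinCount=1,
            MaxCount=3,
        )
        print("Launched:", [i.id for i in instances])
    else:
        # Teardown: terminate every running instance in this region.
        running = ec2.instances.filter(
            Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
        )
        for inst in running:
            inst.terminate()
            print("Terminating:", inst.id)


if __name__ == "__main__":
    main()
```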

HDFS

  1. Can we use libraries like Flintrock, or others like those mentioned in this article? (Usage sketch below.)
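
A minimal sketch of driving Flintrock from the same Python deployment script. The cluster name, key pair, and AMI are hypothetical, and the exact flags should be confirmed against `flintrock launch --help` for the installed Flintrock version:

```python
import subprocess

# Launch an HDFS + Spark cluster on EC2 via Flintrock's CLI.
subprocess.run(
    [
        "flintrock", "launch", "checkoff2-cluster",  # hypothetical cluster name
        "--num-slaves", "2",
        "--install-hdfs",
        "--install-spark",
        "--ec2-region", "ap-southeast-1",
        "--ec2-instance-type", "t2.medium",
        "--ec2-key-name", "my-keypair",              # hypothetical key pair
        "--ec2-identity-file", "my-keypair.pem",
        "--ec2-ami", "ami-0123456789abcdef0",        # hypothetical AMI
        "--ec2-user", "ec2-user",
    ],
    check=True,
)

# Teardown is a single command as well.
subprocess.run(["flintrock", "destroy", "checkoff2-cluster", "--assume-yes"], check=True)
```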

Spark

  1. Can we just install Mongo on the NameNode and do a direct import via the mongo CLI?
  2. Or can we import MongoDB data via Spark? (See the connector sketch below.)
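
If we go the Spark route, a minimal sketch using the MongoDB Spark Connector. The connector version is an assumption (3.x here, and it must match the Spark/Scala build), and the host, database, and collection names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("mongo-import")
    # Fetch the connector at submit time; version must match the Spark/Scala build.
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1")
    .config("spark.mongodb.input.uri", "mongodb://namenode-host:27017/checkoff.reviews")
    .getOrCreate()
)

# "mongo" is the connector's registered data source name in the 3.x series.
df = spark.read.format("mongo").load()
df.printSchema()
```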

General

  1. Is there a performance metric or benchmark we should compare against to evaluate how good our script is?
  2. What is meant by code quality?

sheikhshack (Owner, Author) commented Nov 26, 2020

Prof Reply

Checkpoint 2 DB

Creds
  • Can be either

Elastic IP
  • Just continue as we were

Speed
  • Minimum requirement is functional
  • If you can do it fast, more points

Flintrock
  • You are allowed to use it!

Logs
  • Yep, this is what we want

Hadoop and Spark
  • Store the output of TF-IDF in a file or on the CLI
  • TF-IDF output might be very big, so you might want to store it somewhere (see the sketch after this reply)

Timing of performance
  • Setup of infra, and running processes
  • Prof will specify how many nodes he wants

Startup
  • Separate scripts for production and analytics
  • Give prof the option to start up the analytics when production is done
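
On the "store TF-IDF output in a file" point, a minimal sketch of an assumed pyspark.ml pipeline (not the prof's exact spec; all paths are hypothetical) that writes the vectors to HDFS instead of dumping them on the CLI:

```python
from pyspark.ml.feature import HashingTF, IDF, Tokenizer
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tfidf").getOrCreate()

# One document per line; rename the default "value" column for the Tokenizer.
docs = spark.read.text("hdfs:///data/corpus.txt").withColumnRenamed("value", "text")

words = Tokenizer(inputCol="text", outputCol="words").transform(docs)
tf = HashingTF(inputCol="words", outputCol="tf").transform(words)
tfidf = IDF(inputCol="tf", outputCol="tfidf").fit(tf).transform(tf)

# The full TF-IDF matrix can be very large, so persist it on HDFS
# rather than printing it to the terminal.
tfidf.select("tfidf").write.mode("overwrite").parquet("hdfs:///output/tfidf")
```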
