Take-Home-Engineering-Challenge
Getting around large cities can be quite a hassle, and New York City is no exception. One thing that helps is being able to predict how long a trip might take and how much that trip might cost. Luckily, NYC provides public data about transportation which includes all of the metrics we need!
Your assignment is to help us quickly look at transportation fare data for trips between different boroughs in NYC so that when we travel there, it is easier for us to get around.
This is a freeform assignment. You can write a web API that returns a set of trip metrics. You can write a web frontend that visualizes the trips and shows cheapest/fastest options. We also spend a lot of time in the shell, so a CLI that gives us a few options would be great. And don't be constrained by these ideas if you have a better one!
The only requirements for the assignment are:
- We can filter based on yellow cab, green cab, and for-hire vehicle.
- We can provide a start and end borough for our trip.
- We can filter based on datetime.
- The returned data shows some interesting metrics that will help us get around.
- Your code is well-tested.
- Documentation is provided for how to build and run your code.
Feel free to tackle this problem in a way that demonstrates your expertise in an area -- or takes you out of your comfort zone. For example, if you build Web APIs by day and want to build a frontend for the problem, or use a completely different language instead, by all means go for it - learning is a core competency in our group. Let us know this context in your solution's documentation.
New York City transportation data is located here. A copy of the Jan 2018 data is also located here.
- Clone the repo
- Download the Jan 2018 data and extract it into the application folder. The data CSV files need to be in the same folder as the Python file.
- Run the application using Python.exe
I converted one of the taxi zone files to a CSV to make this work.
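As a rough illustration of how such a converted file might be used, the sketch below builds a lookup from zone ID to borough with the standard library. The column names `LocationID` and `Borough` and the inline sample rows are assumptions made for illustration; the converted file in this repo may be laid out differently.

```python
import csv
import io


def load_zone_boroughs(csv_file):
    """Return a dict mapping zone ID (int) to borough name (str).

    Assumes the CSV has "LocationID" and "Borough" columns (hypothetical).
    """
    reader = csv.DictReader(csv_file)
    return {int(row["LocationID"]): row["Borough"] for row in reader}


# Small inline sample standing in for the converted zone CSV.
sample = io.StringIO(
    "LocationID,Borough,Zone\n"
    "4,Manhattan,Alphabet City\n"
    "7,Queens,Astoria\n"
)

zones = load_zone_boroughs(sample)
print(zones[7])
```

With the real file, `open("<zone file>.csv", newline="")` would replace the `io.StringIO` sample.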
I knew that by using Python, the application would not be the most scalable or performant. I just wanted to get a proof of concept working within the time constraint. I would definitely look into making it scalable and efficient. There has to be a way to run this in Azure, perhaps Azure Databricks.
I only made it work with the Jan 2018 data. What about other months? I would create web APIs to get the data.
I was not able to make the data filterable by datetime. I would make that work.
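A minimal sketch of what that filter could look like, using only the standard library. It assumes the pickup time lives in a column named `tpep_pickup_datetime` formatted as `YYYY-MM-DD HH:MM:SS`; other trip files may use a different column name, which is why it is a parameter here. The function name and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout of the trip CSV files.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def filter_by_pickup(rows, start, end, column="tpep_pickup_datetime"):
    """Yield rows whose pickup time falls in the half-open range [start, end)."""
    for row in rows:
        pickup = datetime.strptime(row[column], TIME_FORMAT)
        if start <= pickup < end:
            yield row


# Made-up sample rows standing in for a trip data file.
sample = io.StringIO(
    "tpep_pickup_datetime,PULocationID,DOLocationID,fare_amount\n"
    "2018-01-01 00:21:05,41,24,4.5\n"
    "2018-01-15 08:30:00,239,246,14.0\n"
    "2018-01-31 23:59:59,262,141,6.0\n"
)

jan_15 = list(
    filter_by_pickup(
        csv.DictReader(sample),
        start=datetime(2018, 1, 15),
        end=datetime(2018, 1, 16),
    )
)
print(len(jan_15), jan_15[0]["fare_amount"])
```

Because the function is a generator, it can stream through a large monthly file without loading every row into memory.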
I was going to add some visualization by plotting the fares in a chart using Matplotlib.
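One way this could be approached is sketched below: average the fares per borough pair, then draw a bar chart. The function names, the `(pickup_borough, dropoff_borough, fare)` tuple shape, and the sample values are all hypothetical. Matplotlib is imported inside the plotting function so the aggregation step works even where Matplotlib is not installed.

```python
from collections import defaultdict


def average_fares(trips):
    """Return {(pickup_borough, dropoff_borough): mean fare}.

    `trips` is an iterable of (pickup_borough, dropoff_borough, fare) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [fare sum, trip count]
    for pickup, dropoff, fare in trips:
        entry = totals[(pickup, dropoff)]
        entry[0] += fare
        entry[1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


def plot_average_fares(averages, path="fares.png"):
    """Save a bar chart of average fare per borough pair to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    labels = [f"{pickup} to {dropoff}" for pickup, dropoff in averages]
    fig, ax = plt.subplots()
    ax.bar(labels, list(averages.values()))
    ax.set_ylabel("Average fare (USD)")
    ax.set_title("Average fare by borough pair")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Made-up sample trips for illustration only.
sample_trips = [
    ("Manhattan", "Queens", 30.0),
    ("Manhattan", "Queens", 40.0),
    ("Brooklyn", "Brooklyn", 12.5),
]
averages = average_fares(sample_trips)
print(averages)
# plot_average_fares(averages)  # uncomment to write fares.png
```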
So: a better UI, and scalable, fast data analysis. I think the answer to all my wants exists somewhere in Azure. This might also work as a Power App and a Databricks connector. I wish I had more time!