KyleLJohnson/Take-Home-Engineering-Challenge

Take-Home-Engineering-Challenge

The Problem

Getting around large cities can be quite a hassle, and New York City is no exception. One thing that helps is being able to predict how long a trip might take and how much it might cost. Luckily, NYC provides public transportation data that includes all of the metrics we need!

Your assignment is to help us quickly look at transportation fare data for trips between different boroughs in NYC, so that it is easier for us to get around when we travel there.

This is a freeform assignment. You can write a web API that returns a set of trip metrics. You can write a web frontend that visualizes the trips and shows cheapest/fastest options. We also spend a lot of time in the shell, so a CLI that gives us a few options would be great. And don't be constrained by these ideas if you have a better one!

The only requirements for the assignment are:

  1. We can filter based on yellow cab, green cab, and for-hire vehicle.
  2. We can provide a start and end borough for our trip.
  3. We can filter based on datetime.
  4. The returned data shows some interesting metrics that will help us get around.
  5. Your code is well-tested.
  6. Documentation is provided for how to build and run your code.
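A minimal sketch of how requirements 1 to 3 could be modelled as a single query object in Python. The names `VehicleType`, `TripQuery` and `matches` are hypothetical, and the borough spellings are assumed to match the TLC taxi zone lookup; this is not code from the repository.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class VehicleType(Enum):
    """Requirement 1: the three vehicle categories to filter on."""
    YELLOW = "yellow"
    GREEN = "green"
    FHV = "fhv"


# The five NYC boroughs (spellings assumed to match the TLC zone lookup).
# Any other value is rejected by this sketch.
BOROUGHS = {"Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island"}


@dataclass(frozen=True)
class TripQuery:
    """Hypothetical query object covering requirements 1-3."""
    vehicle: VehicleType
    start_borough: str
    end_borough: str
    after: datetime
    before: datetime

    def __post_init__(self):
        # Validate requirement 2 (boroughs) and requirement 3 (time window).
        for borough in (self.start_borough, self.end_borough):
            if borough not in BOROUGHS:
                raise ValueError(f"unknown borough: {borough!r}")
        if self.after >= self.before:
            raise ValueError("'after' must be earlier than 'before'")

    def matches(self, vehicle, pickup_borough, dropoff_borough, pickup_time):
        """True if a single trip satisfies every filter in the query."""
        return (
            vehicle == self.vehicle
            and pickup_borough == self.start_borough
            and dropoff_borough == self.end_borough
            and self.after <= pickup_time < self.before
        )
```

For example, a query for yellow cabs from Manhattan to Brooklyn on Jan 1, 2018 matches a yellow cab trip picked up at 08:30 that day. The time window is half-open, so a trip starting exactly at `before` is excluded.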

Feel free to tackle this problem in a way that demonstrates your expertise in an area, or that takes you out of your comfort zone. For example, if you build Web APIs by day and want to build a frontend to the problem instead, or use a completely different language, by all means go for it; learning is a core competency in our group. Let us know this context in your solution's documentation.

New York City transportation data is located here. A copy of the Jan 2018 data is also located here.

Instructions:

  1. Clone the repo
  2. Download the Jan 2018 data and extract it into the application folder. The data CSV files need to be in the same folder as the Python file.
  3. Run the application using Python.exe
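Because the CSV files must sit next to the Python file, the script can locate them relative to its own path rather than the current working directory, so it still works when launched from another folder. A sketch using only `pathlib`; the function name `find_data_files` is hypothetical.

```python
from pathlib import Path


def find_data_files(folder=None, pattern="*.csv"):
    """Return the CSV files sitting next to this script, sorted by name.

    The data CSVs must live in the same folder as the Python file;
    `folder` can be overridden, e.g. in tests.
    """
    if folder is None:
        # Resolve the folder containing this script, not the working directory.
        folder = Path(__file__).resolve().parent
    files = sorted(Path(folder).glob(pattern))
    if not files:
        raise FileNotFoundError(
            f"no files matching {pattern!r} in {folder}; "
            "download and extract the Jan 2018 data there first"
        )
    return files
```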

Thinking Out Loud

Rationale Behind Technical Choice

When I first saw the large data files, I immediately thought of Python. Given the time constraint, I knew that with Python and a few modules I could get something working and producing output quickly. I'm not a data scientist, but Python is a great tool for this sort of problem, and I have some knowledge of Pandas (Python Data Analysis Library).

I converted one of the taxi zone files to CSV to make this work.
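A sketch of how a taxi zone CSV could be used to turn the pickup and dropoff zone IDs in each trip into boroughs, using the standard-library `csv` module (a Pandas `merge` would do the same job). The column names `LocationID`, `Borough`, `PULocationID` and `DOLocationID` are assumptions based on the TLC files, and the function names are hypothetical; the for-hire vehicle files may spell the ID columns differently, so check the header first.

```python
import csv


def load_zone_boroughs(csv_file):
    """Map each taxi zone LocationID to its borough.

    Assumes the converted zone CSV has 'LocationID' and 'Borough' columns.
    `csv_file` is any open text file or file-like object.
    """
    return {int(row["LocationID"]): row["Borough"] for row in csv.DictReader(csv_file)}


def trip_boroughs(trip, zones):
    """Return (pickup_borough, dropoff_borough) for one trip row.

    Assumes the yellow/green column names 'PULocationID' and 'DOLocationID'.
    Zone IDs missing from the lookup map to None instead of raising.
    """
    return (
        zones.get(int(trip["PULocationID"])),
        zones.get(int(trip["DOLocationID"])),
    )
```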

Trade-Offs

I knew that, built this way in Python, the application would be neither very scalable nor especially performant. I just wanted to get a proof of concept working within the time constraint.

If I Had More Time

I would definitely look into making this scalable and efficient. There has to be a way to run this in Azure. Azure Databricks, perhaps?

I only made it work with the Jan 2018 data. What about other months? I would create web APIs to get the data.
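Supporting other months is largely a matter of loading more files. A hypothetical helper that builds one file name per month in a range; the `<vehicle>_tripdata_<YYYY-MM>.csv` naming pattern is an assumption and should be checked against the actual downloads.

```python
from datetime import date


def monthly_files(vehicle, start, end):
    """List per-month CSV names from start to end, inclusive of both months.

    The '<vehicle>_tripdata_<YYYY-MM>.csv' pattern is an assumed naming
    convention; adjust it to match the files actually downloaded.
    """
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{vehicle}_tripdata_{year:04d}-{month:02d}.csv")
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names
```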

I was not able to make the data filterable by datetime. I would make that work.
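A minimal sketch of the missing datetime filter. It assumes pickup timestamps are stored as `YYYY-MM-DD HH:MM:SS` strings and that the yellow and green cab columns are named `tpep_pickup_datetime` and `lpep_pickup_datetime` respectively; both are assumptions about the 2018 files, and the function name is hypothetical.

```python
from datetime import datetime

# Assumed timestamp format in the 2018 CSVs, e.g. "2018-01-01 00:21:05".
TLC_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def filter_by_pickup_time(rows, start, end, column="tpep_pickup_datetime"):
    """Yield rows whose pickup time falls in the half-open window [start, end).

    'tpep_pickup_datetime' is the assumed yellow cab column; for green cab
    files pass column='lpep_pickup_datetime' (also an assumption).
    """
    for row in rows:
        picked_up = datetime.strptime(row[column], TLC_TIME_FORMAT)
        if start <= picked_up < end:
            yield row
```

Because it is a generator, it can be chained onto a `csv.DictReader` so that a month of data is filtered row by row without loading the whole file into memory.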

I was going to add some visualization by plotting the fares in a chart using Matplotlib.
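Before plotting, the trips need to be reduced to a handful of numbers per route. A sketch of that aggregation step in plain Python; the input keys (`pickup_borough`, `dropoff_borough`, `fare`, `minutes`) and the function name `fare_summary` are hypothetical and assume the trips have already been joined to boroughs and had their durations computed.

```python
from collections import defaultdict
from statistics import mean


def fare_summary(trips):
    """Summarise trips per (pickup_borough, dropoff_borough) pair.

    Each trip is assumed to be a dict with 'pickup_borough',
    'dropoff_borough', 'fare' (dollars) and 'minutes' keys.
    """
    groups = defaultdict(list)
    for trip in trips:
        groups[(trip["pickup_borough"], trip["dropoff_borough"])].append(trip)
    # One small record per route: trip count, average fare, average duration.
    return {
        pair: {
            "trips": len(ts),
            "avg_fare": round(mean(t["fare"] for t in ts), 2),
            "avg_minutes": round(mean(t["minutes"] for t in ts), 1),
        }
        for pair, ts in groups.items()
    }
```

The resulting dictionary is small enough to pass straight to a Matplotlib bar chart, with one bar per borough pair.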

In short: a better UI, plus scalable and fast data analysis. I think the answer to all of these wants exists somewhere in Azure.

This might also work as a Power App and a Databricks connector. I wish I had more time!
