This repository is derived from the three-part video series for Module 3 - Machine Learning with Snowflake: Snowpark ML Modeling. The original code was provided by Northstar Builder Education as part of the Intro to Snowflake for Devs, Data Scientists, Data Engineers course on Coursera. I have made modifications to create a complete end-to-end workflow.
You are welcome to use this repository as a reference or starting point for your own project.
If you choose to fork this repository, please ensure that you comply with the terms of the Apache License and give proper credit to the original authors.
I am a data engineer with an obsession for the ice cream sandwiches from the Freezing Point food truck, and that obsession led me to take on this project. The challenge is that the truck’s location is unpredictable. According to the food truck company, the driver picks one of eight possible neighborhoods each day, and the truck visits only that neighborhood. The company also provided me with 20 years of historical location data.
My task is to load this historical data into Snowflake, a cloud-based data warehousing platform, and analyze it to identify patterns. The goal is to develop a predictive model that can forecast the truck’s location on any given day. The project covers the following steps:
- Create and upload a dataset to Snowflake
- Clean and transform the data
- Train an XGBoost model on the prepared data
- Evaluate the model's performance
- Register the trained model in the Snowflake Model Registry
For this project, the Freezing Point data was generated based on the truck driver’s strict routine, which she has followed for the past 20 years to decide which neighborhood to visit each day.
Here’s her routine:
- January: she visits neighborhood 1 on the 1st, 8th, 15th, 22nd, and 29th because her mother lives there and she likes to see her weekly. On all other days in January she goes to neighborhood 2.
- February to November: she visits neighborhood 1 on the 1st, neighborhood 2 on the 2nd, neighborhood 3 on the 3rd, and so on, looping back to neighborhood 1 after neighborhood 7 (so the 8th is neighborhood 1 again). The cycle runs to the end of the month and restarts at neighborhood 1 on the 1st of the next month.
- December: she visits neighborhood 8 every day because she loves the holiday decorations there.
Using this description of her neighborhood selection algorithm, one year’s worth of data was generated. This dataframe was then concatenated 20 times and uploaded to Snowflake.
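The routine and the generation step described above can be sketched in plain Python. This is a minimal illustration, not the code used in the course or in this repository: the function names `neighborhood_for` and `generate_year`, the column names `date` and `neighborhood`, and the choice of 2023 as the template year are all assumptions made for the example.

```python
from datetime import date, timedelta


def neighborhood_for(day: date) -> int:
    """Return the neighborhood (1-8) the truck visits on the given date."""
    if day.month == 1:
        # January: neighborhood 1 on the 1st, 8th, 15th, 22nd, and 29th,
        # neighborhood 2 on every other day.
        return 1 if day.day % 7 == 1 else 2
    if day.month == 12:
        # December: always neighborhood 8.
        return 8
    # February to November: cycle through neighborhoods 1..7 by day of month,
    # restarting at 1 on the 1st of each month.
    return (day.day - 1) % 7 + 1


def generate_year(year: int) -> list[dict]:
    """Build one row per calendar day of `year` (hypothetical column names)."""
    rows = []
    current = date(year, 1, 1)
    while current.year == year:
        rows.append(
            {"date": current.isoformat(), "neighborhood": neighborhood_for(current)}
        )
        current += timedelta(days=1)
    return rows


# Assumption: 2023 (a non-leap year) serves as the template year.
one_year = generate_year(2023)

# Repeat the same year of rows 20 times, mirroring the concatenation step.
# Note that this duplicates the rows as-is; it does not assign 20 distinct years.
history = one_year * 20
```

The resulting list of dictionaries could be wrapped in a dataframe before uploading. Because the label depends only on the month and the day of the month, those two values are natural candidate features for the model.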
- Northstar Builder Education © 2024 Snowflake Inc. All rights reserved.