Integrating ModelKits into Jupyter Notebook Workflows: A Practical Example

Introduction

The Kaggle competition, Titanic - Machine Learning from Disaster, challenges participants to create a model that uses Titanic passenger data (name, age, ticket price, etc.) to predict which passengers survived and which did not.

While this Notebook does build a solution to that problem, its primary goal isn't to create the best predictive model. Instead, it demonstrates how to use KitOps ModelKits within a machine learning workflow.

Although the current context is a Jupyter Notebook written in Python, the code provided could be used just as effectively in workflows outside a Notebook environment. Its functionality could also be reproduced in other programming languages.

Before You Begin

If you haven't already done so, sign up for a free account with Jozu.ml.
After you log into Jozu, add a new Repository named "titanic-survivability", which we'll use in this Notebook.
In the same directory as this Notebook--which we'll call the Project directory--create a .env file.
Edit the .env file and add an entry for your JOZU_USERNAME, your JOZU_PASSWORD and your JOZU_NAMESPACE (aka your Personal Organization name). For example:

    JOZU_USERNAME=you@example.com
    JOZU_PASSWORD=my_password
    JOZU_NAMESPACE=brett

Be sure to save the changes to your .env file before continuing.
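
For reference, here is a minimal sketch of how the three entries could be read back in Python using only the standard library. The helper names load_env_file and require_jozu_settings are hypothetical and are not taken from this project's code; the Notebook itself may load the file differently (for example, with a package such as python-dotenv).

```python
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Hypothetical helper: handles only plain KEY=VALUE lines, blank lines,
    and '#' comments, which is enough for the three entries described above.
    """
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_jozu_settings(values):
    """Return the three Jozu settings, raising if any is missing or empty."""
    required = ("JOZU_USERNAME", "JOZU_PASSWORD", "JOZU_NAMESPACE")
    missing = [name for name in required if not values.get(name)]
    if missing:
        raise KeyError("Missing .env entries: " + ", ".join(missing))
    return {name: values[name] for name in required}
```

Calling require_jozu_settings(load_env_file()) from the Project directory fails early with a clear message if an entry was left out, rather than failing later at login time.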

Project Setup

Set Up Your Python Environment

This project was created using Python 3.12, but should work for Python versions >= 3.9.
We recommend using a Python or Conda virtual environment so that this project's dependencies stay isolated from the system-installed Python.
If you name your Python or Conda environment something other than ".venv" or "venv", then be sure to add the name to the .gitignore file. This step assumes you'll be using git for version control of this project.
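
As a quick sanity check, the sketch below tests the two conditions described above: that the interpreter is at least Python 3.9 and that it is running inside an isolated environment. The function names are hypothetical. The Conda check relies on the CONDA_PREFIX environment variable, which Conda normally sets on activation; treat that part as an assumption about your setup.

```python
import os
import sys

MIN_PYTHON = (3, 9)  # minimum version stated for this project


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """True if the (major, minor) version meets the project's minimum."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= minimum


def in_isolated_environment(prefix=None, base_prefix=None, environ=None):
    """True when running inside a venv-style or Conda environment."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ
    # A venv changes sys.prefix but leaves sys.base_prefix pointing at the
    # interpreter it was created from.
    in_venv = prefix != base_prefix
    # Assumption: an activated Conda environment exports CONDA_PREFIX.
    in_conda = bool(environ.get("CONDA_PREFIX"))
    return in_venv or in_conda


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Isolated environment:", in_isolated_environment())
```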

Project Deliverables

As you work through the project's Notebook, you will pack and push three separate ModelKit versions to Jozu Hub:

Immediately after the training and test datasets are loaded, the first ModelKit version is packed and pushed with a tag similar to: collated-data_v1_2024-10-21_13-03-28_UTC.
After the data has been cleaned and processed for model training, but before any model training is done, the second ModelKit version is packed and pushed with a tag similar to: processed-data_v2_2024-10-21_13-19-02_UTC.
Finally, when the model has been trained and validated, the third ModelKit version is packed and pushed with a tag similar to: trained_model_v2_2024-10-21_13-42-49_UTC.

You can view the details for these tagged ModelKit versions by viewing your titanic-survivability repository in Jozu Hub.
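
The tags shown above follow a stage, version, UTC timestamp pattern. A minimal sketch of how such a tag could be generated is below. The function names make_tag and modelkit_reference are hypothetical, and the registry/namespace/repository:tag layout of the reference is an assumption based on common OCI naming, not something confirmed by this project's helper modules.

```python
from datetime import datetime, timezone


def make_tag(stage, version, when=None):
    """Build a tag such as 'collated-data_v1_2024-10-21_13-03-28_UTC'.

    `when` should be a timezone-aware datetime; it defaults to the
    current time and is always rendered in UTC.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    else:
        when = when.astimezone(timezone.utc)
    stamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    return f"{stage}_v{version}_{stamp}_UTC"


def modelkit_reference(namespace, tag,
                       repository="titanic-survivability",
                       registry="jozu.ml"):
    """Assumed reference layout: registry/namespace/repository:tag."""
    return f"{registry}/{namespace}/{repository}:{tag}"


if __name__ == "__main__":
    tag = make_tag("collated-data", 1)
    print(modelkit_reference("brett", tag))
```

Using a UTC timestamp in the tag keeps versions pushed from different machines or time zones in a consistent, sortable order.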
