Final project for DTU 02476 Machine Learning Operations course
By: Kevin Moore, Mario Medoni, Angeliki-Artemis Doumeni and Konstantina Freri
Our project leverages natural language processing (NLP) to identify mental health disorders from Reddit posts across five major mental health subreddits. By building and fine-tuning a machine learning model, we aim to classify Reddit posts into categories that reflect what the author might suffer from or seek help with. This approach showcases the potential of NLP in analyzing user-generated content and offers a step toward developing tools to enhance mental health awareness and support systems.
We will use the Hugging Face Transformers library, a powerful tool for NLP tasks. It offers pre-trained transformer models such as BERT and RoBERTa, known for their strong performance in understanding textual data. These models will be fine-tuned using both the title and selftext (body) of Reddit posts to classify them into five specific mental health disorder categories. This framework enables us to utilize state-of-the-art NLP techniques efficiently while focusing on our specific classification problem.
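As a rough sketch of this setup (the model choice and the example post are placeholders, not final decisions), the title and selftext can be encoded together as a sentence pair:

```python
# Minimal sketch: encode title + selftext as a sentence pair for classification.
# "bert-base-uncased" and the example post are illustrative placeholders.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5
)

title = "Can't sleep and constantly on edge"
selftext = "For the past few months I have been unable to relax..."

# Passing two texts makes the tokenizer build a sentence pair ([CLS] ... [SEP] ...).
inputs = tokenizer(title, selftext, truncation=True, max_length=512, return_tensors="pt")
logits = model(**inputs).logits  # shape: (1, 5), one score per subreddit class
```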
Additionally, we will incorporate PyTorch Lightning to streamline code and reduce boilerplate, enabling faster experimentation and improved project organization.
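A minimal sketch of what the Lightning wrapper might look like (the model name, learning rate, and batch format are assumptions at this stage):

```python
# Sketch of a LightningModule wrapping a Hugging Face classifier.
# Assumes each batch is the tokenizer output dict plus a "labels" tensor.
import pytorch_lightning as pl
import torch
from transformers import AutoModelForSequenceClassification

class RedditClassifier(pl.LightningModule):
    def __init__(self, model_name: str = "bert-base-uncased",
                 num_labels: int = 5, lr: float = 2e-5):
        super().__init__()
        self.save_hyperparameters()
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=num_labels
        )

    def training_step(self, batch, batch_idx):
        # The Hugging Face model computes cross-entropy loss when labels are given.
        outputs = self.model(**batch)
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)
```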
The project will utilize the “Mental Disorders Identification (Reddit)” dataset from Kaggle. The dataset contains approximately 700k rows with the following fields:
- Post title (string)
- Post body (selftext) (string)
- Creation time (timestamp)
- Over 18 tag (boolean)
- Subreddit name (mental disorder label, string)
Note: We will exclude the mentalillness subreddit, as it represents a generalized disorder, reducing the classification task to five subreddits.
Link to the Kaggle Dataset: Mental Disorders Identification (Reddit)
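A sketch of the preprocessing step described above (the CSV path is an assumption, and column names follow the Kaggle dataset description):

```python
# Sketch: drop the generalized subreddit and map the rest to integer labels.
import pandas as pd

df = pd.read_csv("data/raw/mental_disorders_reddit.csv")  # assumed filename

# Exclude the generalized subreddit, leaving five disorder-specific classes.
df = df[df["subreddit"] != "mentalillness"]

# Map each remaining subreddit name to an integer class label.
label_map = {name: i for i, name in enumerate(sorted(df["subreddit"].unique()))}
df["label"] = df["subreddit"].map(label_map)
```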
We will fine-tune transformer models such as BERT and RoBERTa. As a baseline, we will use random guessing, assuming a balanced dataset where each category has a 20% probability of being predicted. This baseline will provide a clear comparison to demonstrate the performance of our models.
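The 20% figure follows directly from uniform guessing over five balanced classes; a quick sanity check with synthetic labels (purely for illustration):

```python
# Random-guessing baseline: with 5 balanced classes, expected accuracy is 1/5.
import numpy as np

rng = np.random.default_rng(seed=0)
y_true = rng.integers(0, 5, size=100_000)  # stand-in for true labels
y_pred = rng.integers(0, 5, size=100_000)  # uniform random guesses
print((y_true == y_pred).mean())           # ~0.20
```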
The directory structure of the project looks like this:
├── .github/                  # Github actions and dependabot
│   ├── dependabot.yaml
│   └── workflows/
│       └── tests.yaml
├── configs/                  # Configuration files
├── data/                     # Data directory
│   ├── processed
│   └── raw
├── dockerfiles/              # Dockerfiles
│   ├── api.Dockerfile
│   └── train.Dockerfile
├── docs/                     # Documentation
│   ├── mkdocs.yml
│   └── source/
│       └── index.md
├── models/                   # Trained models
├── notebooks/                # Jupyter notebooks
├── reports/                  # Reports
│   └── figures/
├── src/                      # Source code
│   └── project_name/
│       ├── __init__.py
│       ├── api.py
│       ├── data.py
│       ├── evaluate.py
│       ├── models.py
│       ├── train.py
│       └── visualize.py
├── tests/                    # Tests
│   ├── __init__.py
│   ├── test_api.py
│   ├── test_data.py
│   └── test_model.py
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── pyproject.toml            # Python project file
├── README.md                 # Project README
├── requirements.txt          # Project requirements
├── requirements_dev.txt      # Development requirements
└── tasks.py                  # Project tasks
Created using mlops_template, a cookiecutter template for getting started with Machine Learning Operations (MLOps)
Run the API locally:

- Build the Docker image:
  docker build -t api-model -f dockerfiles/api.dockerfile .
- Run the container:
  docker run --name mycontainer -p 8000:8000 api-model
- Open the interactive API docs:
  http://localhost:8000/docs
- Profit
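For a quick smoke test of the running container, a hypothetical client call might look like this (the /predict endpoint and its JSON schema are assumptions about our FastAPI app, not a fixed contract):

```python
# Hypothetical client call against the locally running container.
import requests

resp = requests.post(
    "http://localhost:8000/predict",  # assumed endpoint name
    json={"title": "Can't sleep and constantly on edge", "selftext": "..."},
)
print(resp.json())
```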
Deploy the API to Google Cloud Run:

- Build the Docker image (remember to target linux/amd64, i.e. x86_64, if you're on Apple silicon):
  docker buildx build --platform linux/amd64 -t api-model -f dockerfiles/api.dockerfile .
- (optional) Point gcloud at the correct project:
  gcloud config set project dtumlops-448110
- Tag the image for the Container Registry:
  docker tag api-model gcr.io/dtumlops-448110/api-model:latest
- Push the image to gcloud:
  docker push gcr.io/dtumlops-448110/api-model:latest
- Deploy the Cloud Run service:
  gcloud run deploy api-model-service \
    --image gcr.io/dtumlops-448110/api-model:latest \
    --platform managed \
    --region europe-west1 \
    --allow-unauthenticated \
    --port 8000