Merge pull request #1 from bettersg/housekeeping-dev-setup
Housekeeping for improved dev setup
yevkim authored Aug 13, 2024
2 parents a17458b + ec39af2 commit 7bcece2
Showing 2,942 changed files with 752 additions and 240,274 deletions.
Binary file removed .DS_Store
Binary file not shown.
177 changes: 175 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,9 +1,182 @@
# Initially taken from Github's Python gitignore file

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# tests and logs
tests/fixtures/cached_*_text.txt
logs/
lightning_logs/
lang_code_data/

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# vscode
.vs
.vscode

# Pycharm
.idea

# TF code
tensorflow_code

# Models
proc_data

# examples
runs
/runs_old
/wandb
/examples/runs
/examples/**/*.args
/examples/rag/sweep

# data
/data
serialization_dir

# emacs
*.*~
debug.env

# vim
.*.swp

#ctags
tags

# pre-commit
.pre-commit*

# .lock
*.lock

# DS_Store (MacOS)
.DS_Store

# ruff
.ruff_cache


backend/.env
backend/ml_logic/schemesv2-torch-allmpp-model
backend/ml_logic/schemesv2-torch-allmpp-model/config.json
backend/ml_logic/schemesv2-torch-allmpp-model/model.safetensors
backend/ml_logic/schemesv2-torch-allmpp-tokenizer
backend/ml_logic/schemesv2-torch-allmpp-tokenizer/special_tokens_map.json
backend/ml_logic/schemesv2-torch-allmpp-tokenizer/tokenizer_config.json
backend/ml_logic/schemesv2-torch-allmpp-tokenizer/tokenizer.json
backend/ml_logic/schemesv2-torch-allmpp-tokenizer/tokenizer_config.json
backend/ml_logic/schemesv2-torch-allmpp-tokenizer/vocab.txt
backend/ml_logic/schemesv2-your_embeddings.npy
backend/ml_logic/schemesv2-your_index.faiss
backend/ml_logic/schemesv2-your_index.faiss
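
To get a feel for what the simpler patterns above catch, here is a rough sketch that tests file names against a handful of them. It uses Python's `fnmatch` as an approximation only: real gitignore matching has extra rules (leading `/` anchors, `**`, directory-only patterns, negation) that this does not cover, and `is_ignored` is a hypothetical helper, not part of the repository.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few basename-style patterns copied from the .gitignore above.
PATTERNS = ["*.py[cod]", "*.so", "*.lock", ".DS_Store", ".*.swp"]


def is_ignored(path):
    """Approximate check: does the file name match any basename pattern?

    This is NOT a full gitignore implementation; it only compares the
    final path component against simple glob patterns.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in PATTERNS)


if __name__ == "__main__":
    for candidate in ["backend/fast_api/api.pyc", "backend/poetry.lock",
                      "backend/fast_api/api.py"]:
        print(candidate, is_ignored(candidate))
```

Under this approximation, `*.lock` also matches `backend/poetry.lock`, which is worth keeping in mind when a lock file is expected to be committed.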
1 change: 0 additions & 1 deletion Procfile

This file was deleted.

146 changes: 133 additions & 13 deletions README.md
@@ -7,21 +7,141 @@ The technical journey to realize this vision has been nothing short of transform
For data processing, we employed **NLP (Natural Language Processing)** techniques, using tools like **spacy** and **re** for preprocessing and lemmatization. The **sentence-transformers all-mpnet-base-v2** model then helped us generate embeddings that truly captured the nuances of each scheme's purpose. We used **FAISS** to create a powerful indexing system, enabling users to search and retrieve scheme information efficiently.
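
A minimal sketch of the embed-then-index flow described above. To stay dependency-free, it swaps in a toy bag-of-words vector for the **all-mpnet-base-v2** embeddings and a brute-force cosine search for the **FAISS** index. The scheme names, descriptions, and function names are all made up for illustration and do not come from this repository.

```python
import math
import re
from collections import Counter

# Hypothetical scheme descriptions, invented for this illustration only.
SCHEMES = {
    "Food Aid": "Free meals and groceries for low income families",
    "Job Support": "Career coaching and job matching for unemployed adults",
    "Senior Care": "Home care services and medical subsidies for elderly residents",
}


def preprocess(text):
    """Lowercase the text and split it into alphabetic tokens using re."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(preprocess(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# "Index" step: embed every scheme once, up front.
INDEX = {name: embed(description) for name, description in SCHEMES.items()}


def search(query, top_k=2):
    """Return the names of the top_k schemes most similar to the query."""
    query_vec = embed(query)
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in INDEX.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    print(search("groceries for families", top_k=1))
```

The shape is the same as in the real pipeline: preprocess, embed every document once, then embed the query and rank documents by vector similarity. Only the embedding model and the index are simplified here.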


## Getting started
### Prerequisites

### What's here:
Ensure you have the following installed on your machine:

* [Flask](https://docs.streamlit.io/) on the frontend
* [FastAPI](https://fastapi.tiangolo.com/) on the backend
* Backend and frontend can be deployed with Docker
- **Docker**: [Install Docker](https://docs.docker.com/get-docker/)
- **Docker Compose**: Docker Compose is included with Docker Desktop
- **Poetry 1.8.3**: [Install Poetry](https://python-poetry.org/docs/#installation)
- **Python 3.9 or above**: [Install Python](https://www.python.org/downloads/)
- **Download model files**: Download the model files from Google Drive, or build them yourself using `model-creation-transformer-faiss.ipynb`

> From inside the `backend` folder:
Download the model files from Google Drive or build yourself using model-creation-transformer-laiss.ipynb
Run `export KMP_DUPLICATE_LIB_OK=TRUE` if you run into issues
You can serve the API with `uvicorn fast_api.api:app --reload` (default port is `8000`)
### Developing in local environment

> From inside the `root` folder:
<!-- Create .env file and make sure you have BigQuery setup -->
You can serve the frontend with `python app.py`
We use Poetry as the dependency manager because it provides a consistent and straightforward way to manage dependencies and virtual environments across both Windows and Mac systems.

### Docker
Refer to Readme files in backend folder
#### Poetry config set up (Must Run)
```bash
# Ensure Poetry creates each virtual environment inside its project directory (.venv/)
poetry config virtualenvs.in-project true
```

#### Developing in jupyter notebooks

You can launch Jupyter Notebook via Poetry or use Visual Studio Code's native Jupyter extension.

1. Launch Jupyter Notebook via Poetry
```bash
# Change directory to backend or frontend because the pyproject.toml file is in those directories
cd backend

# Deactivate any active virtual environment first (for Anaconda, use `conda deactivate` instead)
deactivate

# Create new python virtual env and install dependencies using poetry
poetry install
# Note: this creates a .venv/ directory inside the current directory

# Initialize python virtual env
poetry shell

# launch jupyter notebook
jupyter notebook
```

2. Launch Visual Studio Code and open the Jupyter notebook (preferred method)

#### Deploy frontend and backend locally

```bash
# Install dependencies for the frontend and run the Flask app (vanilla JS)
cd frontend
deactivate
poetry install
poetry run python app.py

# In a new terminal, install dependencies for backend and deploy locally
cd ../backend
deactivate
poetry install
poetry run uvicorn fast_api.api:app --host 0.0.0.0 --port 8000

# If the virtual environment fails to build due to an invalid poetry.lock file:
## Regenerate poetry.lock file and re-install
poetry lock --no-update
poetry install

# Access the frontend service:
# Open your browser and navigate to http://localhost:9099.
```

### Deployment in local docker

The local Docker environment is designed to closely replicate our production environment. This keeps the application's behavior consistent from development to deployment and minimizes issues caused by differences between individual developer setups.

For example, if you change core logic in the `backend/` directory, you must deploy both the frontend and the backend via Docker and run functional tests manually.

```bash
# Build the image and run the containers
docker compose up --build

# Access the frontend service:
# Open your browser and navigate to http://localhost:9099.
```

### Adding Development Dependencies

To add dependencies specifically for development (e.g., testing frameworks, linters, etc.), you can use Poetry's `add` command with the `--group dev` option. This ensures that these dependencies are only installed in development environments and not in production.

<details>
<summary>Poetry useful commands</summary>

```bash
# Activate the Poetry environment
poetry shell

# Install all dependencies (including development dependencies if needed)
poetry install

# Add a production dependency
poetry add <dependency_name>
# e.g. poetry add pandas
# e.g. poetry add pandas@^2.2.2

# Add a development dependency
poetry add --group dev <dependency_name>

# Remove a dependency
poetry remove <dependency_name>

# Regenerate the poetry.lock file without updating dependencies
poetry lock --no-update

# List all installed dependencies
poetry show --all

# List only production dependencies
poetry show --only main

# List only development dependencies
poetry show --only dev

# Check the status of dependencies (e.g., if they are outdated)
poetry show --outdated

# Run a script or command within the Poetry environment
poetry run <command>
# e.g. poetry run python app.py

# Check the project's environment and configuration
poetry check
```
</details>

---

# Notes
- Ensure your Docker Desktop is running before executing any Docker commands.
- If you encounter any issues, you can message Traci on Slack or WhatsApp.
Binary file removed __pycache__/config.cpython-310.pyc
Binary file not shown.
Binary file removed __pycache__/config.cpython-312.pyc
Binary file not shown.
Binary file removed __pycache__/schemes_model.cpython-310.pyc
Binary file not shown.
Binary file removed __pycache__/schemes_model.cpython-312.pyc
Binary file not shown.
Binary file removed __pycache__/schemes_model.cpython-38.pyc
Binary file not shown.
File renamed without changes.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
42 changes: 33 additions & 9 deletions backend/Dockerfile
@@ -1,17 +1,41 @@
FROM python:3.10.6-buster
# Use a lightweight official Python image as a parent image
FROM python:3.9-slim-buster

# Set environment variables to ensure that Python outputs everything in the container and doesn't buffer stdout/stderr
ENV PYTHONUNBUFFERED=1 \
POETRY_VERSION=1.8.3 \
POETRY_VIRTUALENVS_CREATE=false \
POETRY_NO_INTERACTION=1 \
KMP_DUPLICATE_LIB_OK=TRUE

# Install Poetry and create a non-root user
RUN apt-get update && apt-get install --no-install-recommends -y curl \
&& curl -sSL https://install.python-poetry.org | python3 - \
&& ln -s $HOME/.local/bin/poetry /usr/local/bin/poetry \
&& apt-get purge -y --auto-remove curl \
&& rm -rf /var/lib/apt/lists/* \
&& useradd --create-home appuser

# Set the working directory
WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy only the necessary files to install dependencies
COPY --chown=appuser:appuser pyproject.toml poetry.lock /app/

COPY . .
# Install production dependencies only
RUN poetry install --only main --no-root

# Requires downloading the spaCy en_core_web_sm model. Run:
# Copy the rest of the application code and set ownership to the non-root user
COPY --chown=appuser:appuser . /app

# Download the spaCy model
RUN python -m spacy download en_core_web_sm

# You might face a library version conflict. If so, run:
# export KMP_DUPLICATE_LIB_OK=TRUE
# Switch to non-root user
USER appuser

# Expose the port (optional: you can set the port via an environment variable)
EXPOSE $PORT

# You can add --port $PORT if you need to set PORT as a specific env variable
CMD uvicorn fast_api.api:app --host 0.0.0.0 --port $PORT
# Run the application with Uvicorn using shell form to expand environment variables
CMD uvicorn fast_api.api:app --host 0.0.0.0 --port ${PORT:-8000}
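
The `${PORT:-8000}` expansion in the `CMD` above falls back to `8000` when `PORT` is unset or empty. As a rough illustration of the same rule on the Python side (for example, if the app ever needed to read the port itself), here is a small sketch. `resolve_port` is a hypothetical helper and is not part of this commit.

```python
import os


def resolve_port(environ=None, default="8000"):
    """Mirror shell ${PORT:-8000}: use the default when PORT is unset or empty.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    # `or` treats both a missing key (None) and an empty string as "not set".
    return int(env.get("PORT") or default)


if __name__ == "__main__":
    print(resolve_port())
```

Using `env.get("PORT") or default` rather than `env.get("PORT", default)` matters here: the second form would return an empty string when `PORT` is set but empty, whereas the shell's `:-` form substitutes the default in that case too.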