This model is trained on the Kaggle Medical Transcriptions dataset, which pairs medical transcripts with the medical specialty each one belongs to. We build a classifier that predicts the medical specialty from the transcription text. Although the dataset covers many more specialties, we limit ourselves to a subset of 10.
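To illustrate the "subset of 10" step, here is a minimal sketch that keeps only the records whose specialty is among the most frequent ones. Selecting by frequency is an assumption (this README does not say how the 10 specialties were chosen), and the helper `keep_top_specialties` and the field names `transcription` and `medical_specialty` are hypothetical; see create_model.py for what is actually done.

```python
from collections import Counter


def keep_top_specialties(rows, n=10, label_key="medical_specialty"):
    """Return only the rows whose label is among the n most frequent labels."""
    # Count how many transcripts each specialty has.
    counts = Counter(row[label_key] for row in rows)
    # Collect the n most common specialty names.
    top = {label for label, _ in counts.most_common(n)}
    # Keep the original order of the remaining rows.
    return [row for row in rows if row[label_key] in top]


# Tiny made-up records for illustration only (not taken from the dataset).
rows = [
    {"transcription": "example text a", "medical_specialty": "Surgery"},
    {"transcription": "example text b", "medical_specialty": "Surgery"},
    {"transcription": "example text c", "medical_specialty": "Radiology"},
]

# With n=1, only the most frequent specialty survives.
subset = keep_top_specialties(rows, n=1)
```

With the real data you would call the helper with the default `n=10` before training the classifier.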
Create a virtual environment using your preferred environment management tool and install the requirements. For example:
python3 -m venv env
source env/bin/activate
pip3 install arthurai
pip3 install -r requirements.txt
Note that the requirements.txt file in this directory assumes Python 3.6-3.8, as these are currently the only versions supported by the Arthur SDK.
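If you are unsure whether your interpreter falls in that range, a quick check like the following can help before installing. This is only a sketch: the helper `is_supported_python` and the two constants are hypothetical names, and the bounds simply mirror the 3.6-3.8 range stated above.

```python
import sys

# Version bounds taken from the note above (inclusive on both ends).
SUPPORTED_MIN = (3, 6)
SUPPORTED_MAX = (3, 8)


def is_supported_python(version=None):
    """Return True if the (major, minor) version lies within the supported range."""
    # Fall back to the running interpreter when no version is given.
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not is_supported_python():
    print(
        "Warning: Python %d.%d is outside the 3.6-3.8 range assumed by requirements.txt"
        % sys.version_info[:2]
    )
```

Run it with the same interpreter you used to create the virtual environment, since that is the one pip3 will install into.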
The notebook notebooks/Quickstart.ipynb shows an example of onboarding a model and sending data.
While this repo contains a pre-trained model and everything else you need to get started, the code used to generate the model is included for your reference.
- create_model.py is the code used to create the model
- entrypoint.py is the code used to enable explainability
- The pickle files are the result of running create_model.py