Medical Named Entity Recognition (MedNER)

Overview

Medical Named Entity Recognition (MedNER) is a deep learning-based project designed to extract medical entities from text using a fine-tuned BERT model. This project utilizes the Hugging Face transformers library to identify named entities such as diseases, medications, genes, and other biomedical terms.

Project Outline

Dataset
- The dataset is sourced from parsa-mhmdi/Medical_NER on Hugging Face.
- It consists of tokenized medical text with annotated named entities in the IOB format.
Model
- A fine-tuned bert-base-cased model is used for Named Entity Recognition (NER).
- The model is trained using the Hugging Face Trainer API.
Training Pipeline
- Tokenization using AutoTokenizer from Hugging Face.
- Data alignment to match tokenized input with entity labels.
- Training with evaluation and model selection based on best validation performance.
Deployment
- The trained model is deployed as a Hugging Face Space using Gradio.
- A web-based interactive demo is provided for real-time text analysis.
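
The data alignment step in the training pipeline can be illustrated with a small sketch. BERT's tokenizer splits some words into several sub-tokens, so word-level IOB labels have to be expanded to token level. The function below assumes a list of word indices per token, which Hugging Face fast tokenizers expose through `word_ids()` on the encoded output. Masking continuation sub-tokens with -100 is one common convention and is an assumption here, not necessarily what trainer_code.ipynb does. The label ids in the example are made up.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # label id that PyTorch's cross-entropy loss skips by default


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level label ids to token level.

    word_ids has one entry per token: the index of the source word, or None
    for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first sub-token of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation sub-token
        previous = word_id
    return aligned


# Hypothetical example: three words, the second split into two sub-tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 1, 0]  # made-up ids, e.g. 0 = "O", 1 = "B-DISEASE"
print(align_labels(word_ids, word_labels))  # [-100, 0, 1, -100, 0, -100]
```

Labelling only the first sub-token keeps each word counted once in the loss; copying the label to every sub-token is an equally valid alternative.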

Project Files

This repository contains the following essential files:

.git - Version control folder (not necessary for direct use).
.gradio - Configuration files for Gradio interface settings.
.gitattributes - Defines Git LFS tracking for large files.
app.py - Main script for running the Gradio interface.
config.json - Configuration file for the model, specifying the model architecture's hyperparameters.
README.md - Documentation containing project details and usage instructions.
requirements.txt - Lists all dependencies required to run the project.
tokenizer.json - Tokenizer configuration containing vocabulary and model-specific settings.
tokenizer_config.json - Configuration settings for the tokenizer.
trainer_code.ipynb - Jupyter Notebook containing training scripts and model fine-tuning process.
vocab.txt - Vocabulary file used by the tokenizer.

Installation

To run the project locally, clone the repository and install dependencies:

git clone https://huggingface.co/spaces/parsa-mhmdi/MedNER
cd MedNER
pip install -r requirements.txt

Usage

Run the application using:

python app.py

This will launch a Gradio interface where you can enter medical text to identify named entities.
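
To show entities rather than per-token tags, IOB predictions have to be merged into spans. The sketch below shows one way to do that for word-level tags. It is not taken from app.py, and the tag names (DISEASE, MEDICATION) are hypothetical, since the dataset's actual tag set is not listed here.

```python
from typing import List, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level IOB tags into (entity_text, entity_type) pairs."""
    spans = []
    current_words: List[str] = []
    current_type = None

    def flush():
        # Close the entity currently being built, if there is one.
        nonlocal current_words, current_type
        if current_words:
            spans.append((" ".join(current_words), current_type))
        current_words, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_words, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity
            # (such stray I- tokens are dropped in this sketch).
            flush()
    flush()
    return spans


tokens = ["Patient", "has", "type", "2", "diabetes", "and", "takes", "metformin"]
tags = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O", "O", "B-MEDICATION"]
print(iob_to_spans(tokens, tags))
# [('type 2 diabetes', 'DISEASE'), ('metformin', 'MEDICATION')]
```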

Training the Model

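To fine-tune the model yourself, run the training script. The file list above includes trainer_code.ipynb but no train.py, so this command assumes the notebook has been exported to a script with that name: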

python train.py

This will:

- Load the dataset
- Tokenize and preprocess text
- Train the bert-base-cased model
- Save the best-performing model checkpoint
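
Saving the best-performing checkpoint presupposes a validation metric, which this README does not name. Entity-level F1 is a common choice for NER, so the sketch below computes it under that assumption. Spans are represented as hypothetical (entity_type, start_token, end_token) tuples, and a prediction counts as correct only on an exact match.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int]  # (entity_type, start_token, end_token)


def entity_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1 over exact-match entity spans."""
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# One sentence: the disease span is right, the second entity has the wrong type.
gold = [{("DISEASE", 2, 4), ("MEDICATION", 7, 7)}]
pred = [{("DISEASE", 2, 4), ("DISEASE", 7, 7)}]
print(entity_f1(gold, pred))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

With the Trainer API, a dictionary like this would typically be returned from a `compute_metrics` function, and the `metric_for_best_model` training argument would name the key used to pick the checkpoint.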

Model Compression & Upload

To save storage space, the best model is compressed and uploaded to Hugging Face:

import shutil
shutil.make_archive("./ner_model_compressed", 'zip', "./ner_model")

The compressed model is then uploaded to the repository:

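from huggingface_hub import upload_file
# upload_folder expects a directory; a single archive is uploaded with upload_file.
upload_file(path_or_fileobj="./ner_model_compressed.zip", path_in_repo="ner_model_compressed.zip", repo_id="parsa-mhmdi/MedNER")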

Demo Link

Try the live demo of MedNER on Hugging Face Spaces: https://huggingface.co/spaces/parsa-mhmdi/MedNER

Contribution

We welcome contributions! Feel free to fork the repository and submit a pull request with improvements.

License

This project is open-source and available under the MIT License.
