ConNER

Poster

ConNER is a neural sequence labeling model that identifies domain-specific concepts in text using BIO (Beginning, Inside, Outside) tagging. It's built on top of BERT and fine-tuned for concept extraction tasks.

Features

  • BERT-based token classification architecture
  • BIO tagging scheme for concept boundary detection
  • Support for variable-length sequences
  • Automatic handling of WordPiece tokenization (see the sketch after this list)
  • Configurable maximum sequence length
  • Built-in concept extraction pipeline
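
BERT's WordPiece tokenizer can split a single word into several subtokens, so word-level BIO labels have to be realigned to subtokens before training or inference. Below is a minimal sketch of one common alignment strategy (label only the first subtoken of each word and mask the rest); the word list, labels, and alignment choice are illustrative assumptions, not necessarily ConNER's exact implementation:

from transformers import AutoTokenizer

# Illustrative sketch: align word-level BIO labels with WordPiece subtokens
# by labeling only the first subtoken of each word (a common convention).
tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")

words = ["Understanding", "mental", "health", "requires", "psychology"]
word_labels = ["O", "B-CONCEPT", "I-CONCEPT", "O", "O"]

encoding = tokenizer(words, is_split_into_words=True)

aligned, previous = [], None
for word_id in encoding.word_ids():
    if word_id is None:          # special tokens such as [CLS] and [SEP]
        aligned.append("IGNORE")
    elif word_id != previous:    # first subtoken of a word keeps its label
        aligned.append(word_labels[word_id])
    else:                        # continuation subtokens are masked out
        aligned.append("IGNORE")
    previous = word_id

print(list(zip(encoding.tokens(), aligned)))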

Example

Given a short text passage (no longer than three sentences), ConNER can extract the academic concepts it contains. For example, given the following text:

Understanding mental health and brain chemistry requires studying psychology.

ConNER will output the following concepts:

['mental health', 'brain chemistry']

This output can be reproduced with the following code:

from conner import ConNER

# Load the fine-tuned model from a saved checkpoint
model = ConNER.load_model("saved_models/conner")

# Run the concept extraction pipeline on a short passage
concepts = model.extract_concepts(
  "Understanding mental health and brain chemistry requires studying psychology."
)

print(f"Identified concepts: {', '.join(concepts)}")

Architecture

  • Base: BERT (default: prajjwal1/bert-tiny)
  • Dropout layer (rate=0.1)
  • Dense classification layer (3 classes: O, B-CONCEPT, I-CONCEPT)
  • Attention mask application for variable-length sequences
  • Label scheme:
    • O: Non-concept tokens
    • B-CONCEPT: Beginning of concept
    • I-CONCEPT: Inside/continuation of concept
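
The stack above maps onto a few lines of PyTorch. Below is a minimal sketch assuming the Hugging Face transformers library; the class name ConNERSketch and its internals are an illustrative reconstruction, not the project's actual implementation:

import torch.nn as nn
from transformers import AutoModel

class ConNERSketch(nn.Module):
    # Hypothetical reconstruction of the described stack:
    # BERT encoder -> dropout -> per-token dense classifier.
    def __init__(self, base_model="prajjwal1/bert-tiny", num_labels=3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(base_model)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # The attention mask tells BERT which positions are real tokens
        # versus padding, enabling variable-length batches.
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        hidden = self.dropout(outputs.last_hidden_state)
        return self.classifier(hidden)  # logits over O / B-CONCEPT / I-CONCEPT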
