carbonsilicon-ai/CarsiChemIE

CarsiChemIE

Molecule detection

Results of molecule detection

| Model | IoU@50% Precision | IoU@50% Recall | IoU@75% Precision | IoU@75% Recall | IoU@90% Precision | IoU@90% Recall |
|---|---|---|---|---|---|---|
| DECERMER | 0.852 | 0.907 | 0.374 | 0.578 | 0.004 | 0.007 |
| MolCoref | 0.820 | 0.895 | 0.539 | 0.710 | 0.009 | 0.058 |
| YOLO | 0.987 | 0.992 | 0.678 | 0.720 | 0.004 | 0.029 |
| Ours | 0.920 | 0.950 | 0.898 | 0.933 | 0.828 | 0.891 |
| Ours (+YOLO) | 0.962 | 0.992 | 0.928 | 0.972 | 0.914 | 0.966 |
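
The precision and recall columns are reported at fixed IoU thresholds. The README does not describe the evaluation script, so the following is only a minimal sketch of one common way such numbers are computed: each predicted box is greedily matched to the best unmatched ground-truth box, and a match counts as a true positive when its IoU reaches the threshold. The names `iou` and `precision_recall` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration, not part of this repository.

```python
from typing import List, Tuple

# Assumed box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], gts: List[Box],
                     threshold: float) -> Tuple[float, float]:
    """Greedy one-to-one matching; a pair is a hit if IoU >= threshold."""
    matched = set()  # indices of ground-truth boxes already used
    true_positives = 0
    for pred in preds:
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            value = iou(pred, gt)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= threshold:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(gts) if gts else 0.0
    return precision, recall


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0, 0, 10, 10), (21, 21, 31, 31), (50, 50, 60, 60)]
    for t in (0.5, 0.75, 0.9):
        print(t, precision_recall(preds, gts, t))
```

A slightly shifted box passes the 50% threshold but fails at 75% and 90%, which is why the stricter columns separate models that localize tightly from those that only localize roughly.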

Accuracy of matching each molecule to its index (identifier)

| Model | IoU@50% Precision | IoU@50% Recall | IoU@75% Precision | IoU@75% Recall | IoU@90% Precision | IoU@90% Recall |
|---|---|---|---|---|---|---|
| MolCoref | 0.991 | 0.752 | 0.993 | 0.669 | 1.000 | 0.042 |
| Ours | 1.000 | 0.727 | 1.000 | 0.583 | 1.000 | 0.020 |
| Ours (+YOLO) | 1.000 | 0.935 | 1.000 | 0.899 | 1.000 | 0.939 |

Visualization

Visualization 1 Visualization 2

The light boxes mark molecule identifiers, while the ordinary boxes mark text or molecules.

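Molecule Structure Recognition

Results of molecule structure recognition

Benchmarks are grouped as Synthetic (Indigo, ChemDraw), Realistic (USPTO, Staker, CLEF, UOB, ACS), and Perturbed (CLEF_p, UOB_p, USPTO_p, Staker_p).

| Model | Indigo | ChemDraw | USPTO | Staker | CLEF | UOB | ACS | CLEF_p | UOB_p | USPTO_p | Staker_p |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OSRA* | 95.00 | 87.30 | 87.40 | 0.00 | 84.60 | 78.50 | 55.30 | 11.50 | 68.30 | 4.00 | 0.00 |
| MolScribe | 97.50 | 93.80 | 92.60 | 86.90 | 88.90 | 87.90 | 71.90 | 90.40 | 86.70 | 92.50 | 65.00 |
| Ours | 98.59 | 97.02 | 94.78 | 88.21 | 93.70 | 90.26 | 81.91 | 90.04 | 89.09 | 94.67 | 69.81 |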

Visualization

Visualization 1 Visualization 2

The red boxes mark errors in molecule recognition, especially in the recognition of chiral bonds.

Results of table detection

| Model | IoU@50% Precision | IoU@50% Recall | IoU@75% Precision | IoU@75% Recall | IoU@90% Precision | IoU@90% Recall |
|---|---|---|---|---|---|---|
| DETR | 0.807 | 0.818 | 0.665 | 0.712 | 0.112 | 0.246 |
| Ours | 0.961 | 0.996 | 0.926 | 0.975 | 0.762 | 0.881 |

Results of table structure recognition (cell level)

| Model | IoU@50% Precision | IoU@50% Recall | IoU@75% Precision | IoU@75% Recall | IoU@90% Precision | IoU@90% Recall |
|---|---|---|---|---|---|---|
| DETR | 0.455 | 0.560 | 0.118 | 0.163 | 0.009 | 0.062 |
| Ours | 0.772 | 0.777 | 0.759 | 0.775 | 0.731 | 0.759 |

Visualization

| Original Image | Table Transformer | Ours |
|---|---|---|
| Original Image 1 | Table Transformer Prediction 1 | Our Prediction 1 |
| Original Image 2 | Table Transformer Prediction 2 | Our Prediction 2 |
| Original Image 3 | Table Transformer Prediction 3 | Our Prediction 3 |
| Original Image 4 | Table Transformer Prediction 4 | Our Prediction 4 |
| Original Image 5 | Table Transformer Prediction 5 | Our Prediction 5 |
| Original Image 6 | Table Transformer Prediction 6 | Our Prediction 6 |

OCR results

| Model | CER |
|---|---|
| easyocr | 0.2259 |
| docTR | 0.3410 |
| tesseract | 0.2435 |
| LatexOCR (ours) | 0.0384 |
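
CER (character error rate) is conventionally the character-level edit distance between the recognized text and the reference, divided by the reference length. The README does not say how the scores above were aggregated, so the sketch below assumes a corpus-level ratio (total edits over total reference characters). The function names `edit_distance` and `cer` are illustrative and not taken from this repository.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(hyp) + 1))  # distances for the empty ref prefix
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # delete r from ref
                curr[j - 1] + 1,         # insert h into ref
                prev[j - 1] + (r != h),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER (assumed aggregation): total edits / total ref chars."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars if total_chars else 0.0


if __name__ == "__main__":
    # Reading the letter "O" as the digit "0" is one substitution in three characters.
    print(cer(["H2O"], ["H20"]))
```

Under this definition, a CER of 0.0384 corresponds to roughly four character errors per hundred reference characters.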

Our OCR model is based on the LatexOCR model from Texteller, which reads LaTeX code from an image. We chose this approach because chemical texts contain varied formatting, such as superscripts, subscripts, and unconventional characters. We fine-tuned the model and added postprocessing steps, such as latex2text, to convert the LaTeX output to plain text.
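
To illustrate what a LaTeX-to-text postprocessing step does, here is a deliberately small sketch. It does not call latex2text or reproduce the repository's postprocessing; it is a hand-written regex stand-in that unwraps a few font commands and maps simple subscripts and superscripts to Unicode. The function name `latex_to_plain` and the choice of Unicode output are assumptions for illustration only.

```python
import re

# Translation tables for digits and signs; other characters pass through unchanged.
SUB = str.maketrans("0123456789+-", "₀₁₂₃₄₅₆₇₈₉₊₋")
SUP = str.maketrans("0123456789+-", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻")


def _script(table):
    """Build a regex callback that translates the captured script content."""
    def replace(match):
        braced, single = match.group(1), match.group(2)
        content = braced if braced is not None else single
        return content.translate(table)
    return replace


def latex_to_plain(latex: str) -> str:
    """Convert a small subset of LaTeX (font wrappers, _ and ^) to plain text."""
    text = latex.strip().strip("$")
    # Unwrap common font commands such as \mathrm{H} -> H.
    text = re.sub(r"\\(?:mathrm|mathbf|text)\{([^{}]*)\}", r"\1", text)
    # Subscripts: braced form _{...} or a single character _x.
    text = re.sub(r"_\{([^{}]*)\}|_(\w)", _script(SUB), text)
    # Superscripts: braced form ^{...} or a single character ^x.
    text = re.sub(r"\^\{([^{}]*)\}|\^(\S)", _script(SUP), text)
    # Collapse any runs of whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(latex_to_plain(r"$\mathrm{H}_{2}\mathrm{O}$"))
    print(latex_to_plain(r"\mathrm{Ca}^{2+}"))
```

A real converter handles far more of LaTeX (nested groups, macros, symbols), which is the reason to rely on an existing tool rather than regexes in practice.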

visualization
