Models evaluated in "ChemTables: A dataset for semantic classificationon tables in chemical patents"
Implementation of Table-BERT
Implementation of TabNet and TBResNet
Implementation of SVM and Naive Bayes baselines
Name your downloaded patents in XML format with their patent IDs. Then wrap each individual patent with a folder named by its patent ID. Then run the following script
python extract_tables_from_xmls.py [xml_root] [output_path]