In progress: fix ntp references #3

Draft: 80 commits to be merged into base: master
3353ddf
Added preannotated training file
m-makulec Nov 15, 2022
4d8e05f
Properly annotated train file
m-makulec Nov 15, 2022
0e2a30e
TOREVISE: added NTP, NIH and NIEHS to Lexicon
m-makulec Nov 15, 2022
b4fe82d
1st alternative citation model
m-makulec Nov 16, 2022
c5d07a5
Created different versions of TEI annotation for paper in question
m-makulec Nov 16, 2022
922bc97
removed v1 rm5 file
m-makulec Nov 16, 2022
b3e6eb6
Added model v2
m-makulec Nov 17, 2022
a81aa96
Removed "National Toxicology Program" as surname from Lexicon
m-makulec Nov 17, 2022
62bdb6d
New model, properly parsing NTP references
m-makulec Nov 17, 2022
00bac56
Added old models
m-makulec Nov 19, 2022
bc3482a
Added new models
m-makulec Nov 19, 2022
e895b41
Merge branch 'fix-NTP-references' of https://github.com/evidenceprime…
m-makulec Nov 19, 2022
5ae7c0a
Cleaned up old XML files, removed all but 1 citation model
m-makulec Nov 24, 2022
7abb7cc
Added preannotated files
m-makulec Dec 1, 2022
71d3a09
Annotated Abdulrahman 2012 for segmentation
m-makulec Dec 1, 2022
ac9c792
Annotated Al Digheari 2018 for reference model
m-makulec Dec 6, 2022
c428a77
Copied Al Digheari 2018 PDF to output dir
m-makulec Dec 7, 2022
8610bcc
Brought dots in Al Digheari back (GROBID guidelines compliance)
m-makulec Dec 7, 2022
2a59aee
Replaced invalid <header> with <footnote> in Abdulrahman 2012
m-makulec Dec 7, 2022
671f19e
Told GROBID to ignore Ingenta notes
m-makulec Dec 7, 2022
b1fbbba
Annotated Blaiss 2007 for segmentation model
m-makulec Dec 7, 2022
68a8062
Annotated Blaiss 2007 for fulltext model
m-makulec Dec 7, 2022
457462a
Removed redundant Blaiss 2007 files, copied PDF to output
m-makulec Dec 7, 2022
3bc2b55
Removed Blauvelt 2021 from dataset
m-makulec Dec 7, 2022
e001ed7
Annotated Blome 2020 for segmentation model
m-makulec Dec 8, 2022
a261171
Annotated Blome 2020 for fulltext model
m-makulec Dec 8, 2022
cc5b6f3
Removed other Blome 2020 files
m-makulec Dec 8, 2022
9ce1883
Replaced Bousquet with appropriate version
m-makulec Dec 8, 2022
3135d11
Added preannotated files for Bousquet 2017
m-makulec Dec 8, 2022
25468a6
Annotated Bousquet 2017 for segmentation model
m-makulec Dec 8, 2022
df706f3
Removed other Bousquet 2017 files
m-makulec Dec 8, 2022
81ab516
Copied Blome and Bousquet PDFs to output
m-makulec Dec 8, 2022
4fb23df
Annotated Ellis 2020 for segmentation
m-makulec Dec 8, 2022
38c6b7e
Removed old Ellis 2020 files
m-makulec Dec 8, 2022
7d341be
Annotated Grewar 1998 for segmentation
m-makulec Dec 12, 2022
af39b6b
Removed other Grewar 1998 files - fulltext is yet to be annotated (id…
m-makulec Dec 12, 2022
6b56af8
Annotated Katelaris 2011 for segmentation
m-makulec Dec 12, 2022
6cffa2d
Annotated Katelaris 2011 for fulltext model
m-makulec Dec 12, 2022
de04390
Removed other Katelaris 2011 files
m-makulec Dec 12, 2022
4999cba
Removed Canonica 2008 from dataset
m-makulec Dec 12, 2022
d53bdc1
Removed Kumanomidou 2022 from dataset
m-makulec Dec 12, 2022
da896dc
Removed Lo 2006 from dataset
m-makulec Dec 12, 2022
d4c0b1a
Annotated Matsuno for references (just a bit)
m-makulec Dec 12, 2022
b7f080f
Annotated Matsuno 2022 for segmentation
m-makulec Dec 12, 2022
0dc2f6d
Annotated Matsuno 2022 for fulltext
m-makulec Dec 12, 2022
c5754ad
Removed other files for Matsuno 2022
m-makulec Dec 12, 2022
542c68e
Reverted token files deletion
m-makulec Dec 16, 2022
31d2faf
Small fix for Katelaris 2011
m-makulec Dec 16, 2022
aae8128
Annotated Maurer 2018 for segmentation
m-makulec Dec 19, 2022
12ac5a9
Annotated Maurer 2018 for reference segmenter
m-makulec Dec 19, 2022
f14cc87
Removed old Maurer 2018 files
m-makulec Dec 19, 2022
b490c47
Annotated Meltzer 2009 for segmentation model
m-makulec Dec 21, 2022
66ec20b
Removed other Meltzer 2009 files
m-makulec Dec 21, 2022
85739b9
Annotated Neffen 2010 for segmentation
m-makulec Dec 21, 2022
8414dec
Annotated Neffen 2010 for fulltext model
m-makulec Dec 21, 2022
70c43be
Removed other Neffen 2010 files
m-makulec Dec 21, 2022
168223b
Removed Retzler 2018 from dataset
m-makulec Dec 21, 2022
b27de8a
Removed Ricard 1999 from dataset
m-makulec Dec 21, 2022
7300fe9
Annotated Saito 2022 for segmentation
m-makulec Dec 21, 2022
292364f
Removed other Saito 2022 files
m-makulec Dec 21, 2022
80c5253
Annotated Sato 2023 for segmentation
m-makulec Dec 21, 2022
4ae1650
Annotated Sato 2023 for reference segmenter
m-makulec Dec 21, 2022
2400df3
Annotated Sato 2023 for fulltext
m-makulec Dec 21, 2022
13ae952
Removed other Sato 2023 files
m-makulec Dec 21, 2022
35e50e7
Reverted Ricard 1999 deletion from dataset, annotated for reference s…
m-makulec Dec 21, 2022
973346e
Annotated Ricard 1999 for reference segmenter
m-makulec Dec 21, 2022
a58b09e
Copied Ricard 1999 to output dir
m-makulec Dec 21, 2022
ace27fa
Removed Shedden 2005 from dataset
m-makulec Dec 21, 2022
f156d5e
Annotated Suh 2014 for references model
m-makulec Dec 27, 2022
b2f2737
Annotated Suh 2014 for fulltext model
m-makulec Dec 28, 2022
7e21a3b
Removed other Suh 2014 files
m-makulec Dec 28, 2022
9f3b450
Annotated Tamayama 2009 for fulltext and references model
m-makulec Dec 28, 2022
9cedfb9
Annotated Tsuji 2023 for reference segmenter, references and segmenta…
m-makulec Dec 28, 2022
6404393
Annotated Valovirta 2008 for reference segmenter, references, segmen…
m-makulec Dec 31, 2022
4eab0f8
One more file removed
m-makulec Dec 31, 2022
eab5dba
Removed Grewar 1998 fulltext file
m-makulec Dec 31, 2022
84c402b
Moved training files to corpus directory, deleted redundant token files
m-makulec Dec 31, 2022
4ad51cd
Merge branch 'fix-NTP-references' of https://github.com/evidenceprime…
m-makulec Dec 31, 2022
f94d0d6
Removed PDF copies from output
m-makulec Dec 31, 2022
2c70cd7
Added new models
m-makulec Jan 5, 2023
