By default, pyamihtml creates an HTML file with heuristic sections, often based on decimal section numbers. docanalysis, by contrast, takes XML sections based on PMC/JATS sectioning. These sections are listed in docanalysis/glob_trail.py, e.g.:
# define constants
ABS = ['*abstract.xml']
ACK = ['*ack.xml']
AFF = ['*aff.xml']
AUT = ['*contrib-group.xml']
CON = ['*conclusion*/*.xml']
DIS = ['*discussion*/**/*_title.xml', '*discussion*/**/*_p.xml'] # might bring unwanted sections like tables, fig. captions etc. Maybe get only title and paragraphs?
ETH = ['*ethic*/*.xml']
FIG = ['*fig*.xml']
INT = ['*introduction*/*.xml', '*background*/*.xml']
KEY = ['*kwd-group.xml']
MET = ['*method*/*.xml', '*material*/*.xml'] # also gets us supplementary material. Not sure how to exclude them
RES = ['*result*/*/*_title.xml', '*result*/*/*_p.xml'] # not sure if we should use recursive globbing or not.
TAB = ['*table*.xml']
TIL = ['*article-meta/*title-group.xml']
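To make the effect of these patterns concrete, here is a minimal sketch of how one of them selects files from a sectioned directory tree. It assumes the patterns behave like standard `pathlib` globs relative to a sections directory. The helper `match_section`, the function `demo`, and the file names in the demo tree are hypothetical and are not taken from docanalysis.

```python
import tempfile
from pathlib import Path

# Copied from the listing above.
DIS = ['*discussion*/**/*_title.xml', '*discussion*/**/*_p.xml']


def match_section(sections_dir, patterns):
    """Return sorted relative paths under sections_dir matching any pattern."""
    root = Path(sections_dir)
    hits = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            hits.add(path.relative_to(root).as_posix())
    return sorted(hits)


def demo():
    """Build a small, made-up sections tree and match DIS against it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in [
            "2_discussion/0_title.xml",
            "2_discussion/1_p.xml",
            "2_discussion/3_sub/1_p.xml",  # nested subsection, reached via **
            "2_discussion/2_fig.xml",      # figure file, not selected by DIS
            "1_results/1_p.xml",           # different section, not selected
        ]:
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("<p/>", encoding="utf-8")
        return match_section(root, DIS)


if __name__ == "__main__":
    for hit in demo():
        print(hit)
```

In this sketch the `**` component matches zero or more directory levels, so paragraphs in nested subsections are picked up while the figure file is skipped.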
proposal 1
Define these sections in a JSON file read in by docanalysis, so that:
the glob patterns can be edited without touching the code
new keywords can be added
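One way proposal 1 could look is sketched below. This is an illustration under assumptions, not existing docanalysis code. The function `load_sections`, the `DEFAULT_SECTIONS` dict, and the JSON layout (section keyword mapped to a list of glob patterns) are all hypothetical.

```python
import json

# Hypothetical built-in defaults; only two entries from the listing, for brevity.
DEFAULT_SECTIONS = {
    "ABS": ["*abstract.xml"],
    "ACK": ["*ack.xml"],
}


def load_sections(json_text, defaults=None):
    """Merge section globs from a JSON string over the defaults.

    Assumed JSON layout: {"KEYWORD": ["glob", ...]}; a bare string is
    accepted as shorthand for a one-element list.
    """
    base = DEFAULT_SECTIONS if defaults is None else defaults
    loaded = json.loads(json_text)
    if not isinstance(loaded, dict):
        raise ValueError("top-level JSON value must be an object")

    # Copy the defaults so the caller's dict is never mutated.
    sections = {key: list(patterns) for key, patterns in base.items()}
    for key, patterns in loaded.items():
        if isinstance(patterns, str):
            patterns = [patterns]
        if not isinstance(patterns, list) or not all(
            isinstance(p, str) for p in patterns
        ):
            raise ValueError(f"patterns for {key!r} must be a list of strings")
        sections[key.upper()] = patterns  # override a default or add a keyword
    return sections


if __name__ == "__main__":
    user_json = '{"ABS": ["*abstract*.xml"], "sup": "*supplementary*/*.xml"}'
    for name, globs in load_sections(user_json).items():
        print(name, globs)
```

Keeping the defaults in code and letting the JSON file override or extend them would mean existing users see no change unless they supply a file.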
proposal 2
Make pyamihtml output a directory structure compatible with JATS/docanalysis.
This requires discrete sections, arranged in a hierarchy, stored in files labelled *.xml.
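A rough sketch of what proposal 2 might involve follows. It assumes sections are already available as nested dictionaries, and it picks file names (`0_title.xml`, `1_p.xml`, ...) so that they match patterns such as `*discussion*/**/*_p.xml`. The input structure and the functions `slug` and `write_section` are hypothetical; pyamihtml's real internal representation may differ.

```python
import re
from pathlib import Path
from xml.sax.saxutils import escape


def slug(text):
    """Lowercase a title and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_") or "section"


def write_section(section, parent, index):
    """Write one section, and its children recursively, under parent.

    section is assumed to be a dict with a "title" and optional
    "paragraphs" (list of str) and "children" (list of sections).
    Returns the list of files written.
    """
    sec_dir = Path(parent) / f"{index}_{slug(section['title'])}"
    sec_dir.mkdir(parents=True, exist_ok=True)

    written = []
    title_file = sec_dir / "0_title.xml"
    title_file.write_text(
        f"<title>{escape(section['title'])}</title>", encoding="utf-8"
    )
    written.append(title_file)

    position = 1  # running index shared by paragraphs and subsections
    for paragraph in section.get("paragraphs", []):
        para_file = sec_dir / f"{position}_p.xml"
        para_file.write_text(f"<p>{escape(paragraph)}</p>", encoding="utf-8")
        written.append(para_file)
        position += 1
    for child in section.get("children", []):
        written.extend(write_section(child, sec_dir, position))
        position += 1
    return written
```

Each section becomes a directory whose name contains the section title, so title-based globs can locate it, and each title or paragraph becomes its own small XML file.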