Commit

added correct preannotated files
anneferger committed Jan 31, 2024
1 parent c6a685d commit a90f119
Showing 35 changed files with 1,904 additions and 1,446 deletions.
217 changes: 122 additions & 95 deletions data/JTEI/10_2016-19/jtei-10-burghart-source.xml

Large diffs are not rendered by default.

19 changes: 11 additions & 8 deletions data/JTEI/10_2016-19/jtei-10-haaf-source.xml
@@ -1,5 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?><?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_jtei.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?><?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_jtei.rng" type="application/xml"
schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="https://github.com/DH-RSE/software-citation/raw/main/schema/tei_jtei_annotated.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://github.com/DH-RSE/software-citation/raw/main/schema/tei_jtei_annotated.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" rend="jTEI">
<teiHeader>
<fileDesc>
@@ -396,10 +397,11 @@
>Jupiter</hi> noch 18<lb/>[…] </egXML>
<p>Another example for similar inline phenomena in manuscripts and printed texts is the
underlining of important phrases or keywords, represented in the DTABf as <tag>hi
rendition="#u"</tag> for printed texts and manuscripts alike. Furthermore, though this
feature is far more frequent in prints, manuscripts may also contain catchwords or
signature marks at the bottom of the page, which we tag as <gi>fw</gi> with
<att>type</att>=<val>catch</val> or <att>type</att>=<val>sig</val>, respectively.</p>
rendition=&quot;#u&quot;</tag> for printed texts and manuscripts alike. Furthermore,
though this feature is far more frequent in prints, manuscripts may also contain
catchwords or signature marks at the bottom of the page, which we tag as <gi>fw</gi>
with <att>type</att>=<val>catch</val> or <att>type</att>=<val>sig</val>,
respectively.</p>
<figure xml:id="figure6" n="1.6">
<graphic url="images/image6.png" width="624px" height="119px"/>
<head type="legend">The last two lines of running text, followed by a signature mark and
@@ -681,7 +683,7 @@
</figure>
<egXML xmlns="http://www.tei-c.org/ns/Examples"> Als der <hi rendition="#aq"
>Enke</hi>ſche Comet in <hi rendition="#u" hand="#pencil">Paramala</hi> wieder
er-<lb/> ſchien will <choice><abbr>H&#xFFFC;</abbr><expan>Herr</expan></choice>
er-<lb/> ſchien will <choice><abbr>H</abbr><expan>Herr</expan></choice>
<hi rendition="#aq"><hi rendition="#u" hand="#pencil">Dummler</hi></hi> eine Rotation
des Schweifes von<lb/>[…] </egXML>
<p>In the following example (<ptr target="#figure18" type="crossref"/>) it is obvious
@@ -1207,7 +1209,8 @@
<p>The DTA project is an example of the application of the TEI Guidelines to large-scale
corpora. Our primary goal is to be as inclusive as possible, allowing for other projects
to benefit from our resources (i.e., our comprehensive guidelines and documentation as
well as the technical infrastructure that includes Schemas, ODDs, and XSLT scripts) and
well as the technical infrastructure that includes Schemas, ODDs, and <ptr type="software"
xml:id="XSLT" target="#XSLT"/><rs type="soft.name" ref="#XSLT">XSLT</rs> scripts) and
contribute to our corpora. We also want to ensure interoperability of all data within the
DTA corpora. The underlying TEI format has to be continuously maintained and adapted to
new necessities with these two premises in mind.</p>
83 changes: 50 additions & 33 deletions data/JTEI/10_2016-19/jtei-10-romary-source.xml
@@ -1,5 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?><?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_jtei.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?><?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_jtei.rng" type="application/xml"
schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="https://github.com/DH-RSE/software-citation/raw/main/schema/tei_jtei_annotated.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://github.com/DH-RSE/software-citation/raw/main/schema/tei_jtei_annotated.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" rend="jTEI">
<teiHeader>
<fileDesc>
@@ -644,13 +645,15 @@
available at <ptr target="https://github.com/TEIC/TEI/issues/1512"/>. In our proposal,
the <gi>etym</gi> element has to be made recursive in order to allow the fine-grained
representations we propose here. The corresponding ODD customization, together with
reference examples, is available on GitHub.</note> and the fact that a change occurred
within the contemporary lexicon (as opposed to its parent language) is indicated by
means of <att>xml:lang</att> on the source form.<note>There may also be cases in which
it is unknown whether a given etymological process occurred within the contemporary
language or parent system; in such cases the encoder can just use the main language
tag for both the diachronic and synchronic portions of the entry as a default (see,
for instance, <ptr target="#example11" type="crossref"/>).</note></p>
reference examples, is available on <ptr type="software" xml:id="GitHub"
target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>.</note> and the
fact that a change occurred within the contemporary lexicon (as opposed to its parent
language) is indicated by means of <att>xml:lang</att> on the source form.<note>There
may also be cases in which it is unknown whether a given etymological process occurred
within the contemporary language or parent system; in such cases the encoder can just
use the main language tag for both the diachronic and synchronic portions of the entry
as a default (see, for instance, <ptr target="#example11" type="crossref"
/>).</note></p>
<p>In the TEI encoding, the former two can be respectively labeled as: <egXML
xmlns="http://www.tei-c.org/ns/Examples"><etym type="borrowing">…</etym></egXML> and
<egXML xmlns="http://www.tei-c.org/ns/Examples"><etym type="inheritance"
@@ -765,12 +768,14 @@
text.<note>The interested reader may ponder here the possibility to also encode
scripts by means of the <att>notation</att> attribute instead of using a cluttering of
language subtags on <att>xml:lang</att>. For more on this issue, see the proposal in
the TEI GitHub (<ptr target="https://github.com/TEIC/TEI/issues/1510"/>).</note> This
is why we have extended the <att>notation</att> attribute to <gi>orth</gi> in order to
allow for better representation of both language identification and the orthographic
content. With this double mechanism, we intend to describe content expressed in the same
language by means of the same language tag, thus allowing more reliable management,
access, and search procedures over our lexical content.</p>
the TEI <ptr type="software" xml:id="GitHub" target="#GitHub"/><rs type="soft.name"
ref="#GitHub">GitHub</rs> (<ptr target="https://github.com/TEIC/TEI/issues/1510"
/>).</note> This is why we have extended the <att>notation</att> attribute to
<gi>orth</gi> in order to allow for better representation of both language
identification and the orthographic content. With this double mechanism, we intend to
describe content expressed in the same language by means of the same language tag, thus
allowing more reliable management, access, and search procedures over our lexical
content.</p>
<p>We are aware that we open a can of worms here, since such an editorial practice could
be easily extended to all text elements in the TEI Guidelines. We have actually
identified several cases in the sole context of lexical representations (e.g.,
@@ -915,39 +920,48 @@
<etym type="inheritance">
<cit type="etymon" xml:id="kápŭ" next="#kábu">
<pRef notation="private" xml:lang="la">kápŭ</pRef>
</cit> <cit type="etymon" xml:id="kábu" prev="#kápŭ">
</cit>
<cit type="etymon" xml:id="kábu" prev="#kápŭ">
<!-- intervocalic voicing -->
<date notBefore="0350" notAfter="0399"/>
<pRef notation="private" xml:lang="la">kábu</pRef>
<!-- gallo-latin or (VL-Gaul) -->
</cit> <cit type="etymon" xml:id="k̯áβo̥" prev="#kábu" next="#t̯áβo̥">
</cit>
<cit type="etymon" xml:id="k̯áβo̥" prev="#kábu" next="#t̯áβo̥">
<date notBefore="0400" notAfter="0499"/>
<pRef notation="private">k̯áβo̥</pRef>
<!-- late gallo-latin ?-->
</cit> <cit type="etymon" xml:id="t̯ávo̥" prev="#k̯áβo̥" next="t͡sávo̥">
</cit>
<cit type="etymon" xml:id="t̯ávo̥" prev="#k̯áβo̥" next="t͡sávo̥">
<date notBefore="0400" notAfter="0499"/>
<pRef notation="private">t̯áβo</pRef>
<!-- late gallo-latin ?-->
</cit> <cit type="etymon" xml:id="t͡sávo̥" prev="#t̯ávo̥" next="#t͡šíe̥vo̥">
</cit>
<cit type="etymon" xml:id="t͡sávo̥" prev="#t̯ávo̥" next="#t͡šíe̥vo̥">
<date notBefore="0400" notAfter="0499"/>
<pRef notation="private">t͡sávo̥</pRef>
<!-- late gallo-latin ?-->
</cit> <cit type="etymon" xml:id="t͡šíe̥vo̥" prev="#t͡šávo̥" next="#tšíe̥f">
</cit>
<cit type="etymon" xml:id="t͡šíe̥vo̥" prev="#t͡šávo̥" next="#tšíe̥f">
<date notBefore="0450" notAfter="0550"/>
<pRef notation="private">t͡šíe̥vo̥</pRef>
<!-- late gallo-latin ?/early gallo-romance-->
</cit> <cit type="etymon" xml:id="tšíe̥f" prev="#t͡šíe̥vo" next="#šye̥f">
</cit>
<cit type="etymon" xml:id="tšíe̥f" prev="#t͡šíe̥vo" next="#šye̥f">
<date notBefore="0600" notAfter="0699"/>
<pRef notation="private">tšíe̥f</pRef>
<!-- early gallo-romance-->
</cit> <cit type="etymon" xml:id="šyé̥f" prev="#tšíe̥f" next="#šé̥f">
</cit>
<cit type="etymon" xml:id="šyé̥f" prev="#tšíe̥f" next="#šé̥f">
<date notBefore="0700" notAfter="0799"/>
<pRef notation="private">šyé̥f</pRef>
<!-- early/Proto Old French (?) -->
</cit> <cit type="etymon" xml:id="šé̥f" prev="#šyé̥f" next="#šę́f">
</cit>
<cit type="etymon" xml:id="šé̥f" prev="#šyé̥f" next="#šę́f">
<date notBefore="1500" notAfter="1650"/>
<pRef notation="private" xml:lang="frm">šé̥f</pRef>
</cit> <cit type="etymon" xml:id="šę́f" prev="#šé̥f">
</cit>
<cit type="etymon" xml:id="šę́f" prev="#šé̥f">
<date notBefore="1500" notAfter="1650"/>
<pRef notation="private" xml:lang="frm">šę́f</pRef>
</cit>
@@ -972,9 +986,10 @@
respectively.</p>
<p>The <gi>date</gi><note>The element <gi>date</gi> as a child of <gi>cit</gi> is another
example which does not adhere to the current TEI standards. We have allowed this
within our ODD document. A feature request proposal will be made on the GitHub page
and this feature may or may not appear in future versions of the TEI
Guidelines.</note> element is listed within each etymon block; the values of
within our ODD document. A feature request proposal will be made on the <ptr
type="software" xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub"
>GitHub</rs> page and this feature may or may not appear in future versions of the
TEI Guidelines.</note> element is listed within each etymon block; the values of
attributes <att>notBefore</att> and <att>notAfter</att> specify the range of time
corresponding to the period of time that the given form was in use according to the
authors.<note>In the (French language) source of this example (<ref
@@ -1749,7 +1764,8 @@
<pos>cardinalNumber</pos>
</gramGrp>
<gloss>ten</gloss>
</cit> <cit type="etymon">
</cit>
<cit type="etymon">
<oRef corresp="#num-3">uni</oRef>
<gramGrp>
<pos>cardinalNumber</pos>
@@ -2469,11 +2485,12 @@
<head>Problematic and Unresolved Issues</head>
<p>For the issues regarded as the most fundamentally important to creating a dynamic and
sustainable model for both etymology and general lexicographic markup in TEI, we have
submitted formal requests for changes to the TEI GitHub, and will continue to submit
change requests as needed. While this work represents a large step in the right direction
for those looking for means of representing etymological information, there are still a
number of unresolved issues that will need to be addressed. These remaining issues pertain
to: <list rend="inline simple">
submitted formal requests for changes to the TEI <ptr type="software" xml:id="GitHub"
target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>, and will continue to
submit change requests as needed. While this work represents a large step in the right
direction for those looking for means of representing etymological information, there are
still a number of unresolved issues that will need to be addressed. These remaining issues
pertain to: <list rend="inline simple">
<item>(i) expanding the types of etymological information and refining the
representation of the processes and features which are covered; and</item>
<item>(ii) the need for continued progress in a number of issues within the body of