Turn CTS TEI corpora like CHS and OGL's First1KGreek into CEX collection files.
- Download the latest release and unpack the binaries.zip.
- Copy the binary for your system into the unpacked data folder of e.g. First1KGreek.
- Open a terminal in that folder and type:
./TEItoCEX-OSX 1kGreek.cex
(you might have to chmod +x the executable before you can use it)
- Enjoy your new CEX collection file!
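The chmod step can be folded into a small wrapper. The following is only a sketch: the helper name `ensure_executable` and the `BIN` and `OUT` variables are illustrative and not part of TEItoCEX. The binary name and output file are the ones from the example; substitute the binary that matches your system.

```shell
# Sketch: make the downloaded binary executable (if needed) and run it.
# BIN and OUT default to the names used in the example; both are assumptions
# you can override from the environment.
BIN="${BIN:-./TEItoCEX-OSX}"
OUT="${OUT:-1kGreek.cex}"

# Hypothetical helper: add the executable bit only when it is missing.
ensure_executable() {
    [ -x "$1" ] || chmod +x "$1"
}

if [ -f "$BIN" ]; then
    ensure_executable "$BIN"
    "$BIN" "$OUT"
else
    echo "binary not found: $BIN" >&2
fi
```

Run it from inside the unpacked data folder, as in the steps above.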
To produce a CSV file instead:
- Copy the binary for your system into the unpacked data folder of e.g. First1KGreek.
- Open a terminal in that folder and type:
./TEItoCEX-OSX 1kGreek.csv -CSV
- Enjoy your new CSV collection file!
The numbers and letters show the scheme used in the original XML files:
KKGGGKGG58GKGGGGGGGGGGGGGGGGKGGKGKGGGGGGG7IGGGGGGGKKKKKGGGGGKKGGGKGGGGGGGGGKGGGGKGGGGGGKKGGGGGGGGGGKKGKKKKGKKKKGGGGGGKGLKKKGGGGGKGKLKGGKGKKKGGGGGKGGGGGGGGGGKGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDKSJJKGKKGGGGKGM5555555LLKKGGGGGGGGKGGKKKGGGGGGLGKKLKGGGKGGKGGGGGKKGGGRGKGGGGGGGKKGGGKGGGGGGKKKKKKKKKKKKKKLKKKGKGGGGKGGGGGKLGKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKGGGKGKKKKKGKKKKKKKGGGKGGKGGKKKKKGGGKGGKGGGGGKKGKKGGGKKKGKGLKGKKGLKGGGEEGKKLGGGKGKLLGKGGGGGGLGGGGGKGGGKGGGGGGGGGGGGGGGKGGGKGGGGGGGGGGGGGGGGGKGGGQGGKKGKGGGKKLKKKKKKGGGGGGKKLKGGGGGGGGGKKLK4GKKLKGGGLLKKKKKKKKKKKKKGGGGKKKGGGGKKLGGGGGKGGGGGGGGKGLGGGGGKGLGGGGKLGKLLKGGGLKKLLK9GGGGGGKKGKKGKGGGGKKKKKGGGGGGGGGKKGGKKGGGGGGG3LKGKKKKKGGGGGGGKKKKKKKKKKKKKKLKLGKKLKKGGGGKGKGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGKGGKGGGGGGGGGGGGGGGGGKKGGGGGGGGKGGGGGKGGKGKKGGKGGGKGGPKKKKKKLKKKKLKKKGGGKKKGGKKGGLLLGGGGGGGKGGGGKLGGGGGGGGGGGKKGGKGGGGGGGGGKKGLKLGLGGGGGLGGLLGGLGGGGGGGGGGGGGGGGGGGKGGKGGGGGGGKGGGKGGKKKK88KKKKKGGLG
Read 974 of 974 files.
Write nodes to file now:
Writing CSV-File
Wrote 227668 nodes.
23340077 words written in the Greek alphabet.
4331600 words written in the Latin alphabet.
5996 words written in the Arabic alphabet.
The following schemes were used:
K 310
8 3
M 1
R 1
Q 1
L 47
D 1
S 1
J 2
5 8
7 1
I 1
4 1
P 1
G 591
E 2
9 1
3 1
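The per-scheme totals add up to 974, the number of files read, which suggests that the long string holds one character per file. Assuming that, the tally can be reproduced from the string with standard tools. This is a sketch, not how TEItoCEX computes it: `tally_schemes` is a hypothetical helper, and it sorts by character rather than in the order shown in the log.

```shell
# Hypothetical helper: count how often each scheme character occurs.
# $1: scheme string (ASCII), assumed to hold one character per file.
tally_schemes() {
    printf '%s' "$1" |
        fold -w 1 |          # one character per line
        LC_ALL=C sort |      # group identical characters
        uniq -c |            # prefix each character with its count
        awk '{ print $2, $1 }'
}

# First eleven characters of the string in the log.
tally_schemes "KKGGGKGG58G"
# 5 1
# 8 1
# G 6
# K 3
```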
CTSExtract.go is written in Go and can easily be compiled for your system. Flick me a message if you are interested.
CTSExtract can be used to extract metadata fields from TEI-XML annotated input. Currently, export to CSV, JSON and XML (and SQL) is possible. The XML format complies with the OAI-DC format (DataCite). Please see OAI-PMH.md for information on OAI-PMH compliant hosting.
./TEItoCEX-OSX catalog.json -Cat
The resulting catalog can then replace the catalog.json in the gh-pages branch of the First1KGreek repo.
TEItoCEX can now also produce Markdown files from the Open Greek and Latin XML versions:
./TEItoCEX-OSX x -Markdown
Those files can then be edited and used to produce PDFs and EPUBs with pandoc.
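A possible pandoc invocation is sketched below. It assumes that the Markdown files carry an .md extension and sit in the current directory, which this README does not confirm; `out_name` is a hypothetical helper. PDF output additionally requires a LaTeX engine. For Greek text, a Unicode-aware engine such as xelatex (selected here with pandoc's `--pdf-engine` option) and a font with Greek coverage may be needed.

```shell
# Hypothetical helper: derive the output file name from a Markdown file name.
# $1: Markdown file, $2: target extension (pdf or epub)
out_name() {
    printf '%s.%s\n' "${1%.md}" "$2"
}

# Convert every Markdown file in the current directory (assumed location).
if command -v pandoc >/dev/null 2>&1; then
    for md in ./*.md; do
        [ -e "$md" ] || continue   # skip when no .md files are present
        pandoc "$md" -o "$(out_name "$md" epub)"
        # PDF output depends on an installed LaTeX engine; xelatex is an assumption.
        pandoc "$md" --pdf-engine=xelatex -o "$(out_name "$md" pdf)" ||
            echo "PDF conversion failed for $md" >&2
    done
fi
```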