omnilingo-ipfs

Steps

There are three main steps in adding your data to OmniLingo: importing the data into IPFS, indexing it, and publishing it.

Import

Import data into your local IPFS node and generate an index:

$ importer.py dataset_dir index_path

e.g.

$ importer.py ./cv-corpus-7.0-2021-07-21/tr/ tr.json

where dataset_dir is a directory in Common Voice format.
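Before running the importer, it can help to confirm that the directory has the usual Common Voice layout. The check below is a minimal sketch and not part of this repository: `missing_entries` and `looks_like_common_voice` are hypothetical helpers, and they assume a release layout with a `clips/` directory next to `validated.tsv`. Which files `importer.py` actually reads is not stated here, so treat the list as an assumption and adjust it as needed.

```python
from pathlib import Path

# Assumed layout of a Common Voice release for one locale. The exact files
# importer.py needs are not documented here, so adjust this list as required.
EXPECTED_ENTRIES = ("clips", "validated.tsv")


def missing_entries(dataset_dir):
    """Return the expected entries that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


def looks_like_common_voice(dataset_dir):
    """True if dataset_dir contains every entry in EXPECTED_ENTRIES."""
    return not missing_entries(dataset_dir)
```

For example, `missing_entries("./cv-corpus-7.0-2021-07-21/tr/")` returns an empty list when both entries are present.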

Index

Index the data, extracting a balanced subset of clips by a complexity metric:

$ indexer.py locale index_path

e.g.

$ indexer.py tr tr.json

This will return a CID such as QmXpgcavH2shpBbfnFoymPxEw2zpr4MdAgi1aaoZT4Yeho, which you pass to the publish step.
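If you script these steps, a quick shape check on the CID can catch copy and paste mistakes before publishing. The sketch below is an illustration only: `is_cidv0` is a hypothetical helper, and it assumes the indexer returns a CIDv0 string like the one shown above (46 base58 characters starting with `Qm`). It checks the shape of the string and does not decode or verify the hash, and it would reject a CIDv1 string.

```python
import re

# CIDv0 strings are base58btc-encoded sha2-256 multihashes: 46 characters,
# starting with "Qm", drawn from the base58 alphabet (no 0, O, I or l).
_CIDV0_RE = re.compile(r"Qm[1-9A-HJ-NP-Za-km-z]{44}")


def is_cidv0(text):
    """Return True if text has the shape of a CIDv0 string."""
    return _CIDV0_RE.fullmatch(text.strip()) is not None
```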

Publish

Publish data to the global index in OmniLingo on IPFS:

$ publisher.py locale cid

e.g.

$ publisher.py tr QmXpgcavH2shpBbfnFoymPxEw2zpr4MdAgi1aaoZT4Yeho

Publish to an IPNS name using the local node ID:

$ ipfs name publish cid

e.g.

$ ipfs name publish QmXpgcavH2shpBbfnFoymPxEw2zpr4MdAgi1aaoZT4Yeho
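The import, index and publish steps can be chained from a small driver script. The following is a minimal sketch under stated assumptions, not a tool shipped with this repository: the function names are hypothetical, the scripts are assumed to be in the current directory and run with `python3`, and the CID is assumed to be the last whitespace-separated token that `indexer.py` writes to standard output. Check the indexer's real output before relying on that last assumption.

```python
import subprocess


def pipeline_commands(dataset_dir, locale, index_path):
    """Commands for the import and index steps, in the order they run."""
    return [
        ["python3", "importer.py", dataset_dir, index_path],
        ["python3", "indexer.py", locale, index_path],
    ]


def publish_command(locale, cid):
    """Command for the publish step, given the CID from the indexer."""
    return ["python3", "publisher.py", locale, cid]


def run_pipeline(dataset_dir, locale, index_path):
    """Run all three steps in order, stopping at the first failure."""
    import_cmd, index_cmd = pipeline_commands(dataset_dir, locale, index_path)
    subprocess.run(import_cmd, check=True)
    result = subprocess.run(
        index_cmd, check=True, capture_output=True, text=True
    )
    # Assumption: the CID is the last token printed by indexer.py.
    tokens = result.stdout.split()
    if not tokens:
        raise RuntimeError("indexer.py produced no output to read a CID from")
    cid = tokens[-1]
    subprocess.run(publish_command(locale, cid), check=True)
    return cid
```

Calling `run_pipeline("./cv-corpus-7.0-2021-07-21/tr/", "tr", "tr.json")` would run the three commands from the examples above in sequence.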

Publishing models

To publish model files (e.g. for pronunciation assistance), you need a directory containing two files:

models/LOCALE.tflite: The binary for the ASR model
models/LOCALE.json: Metadata for the model

The metadata file, e.g. pt.json for Portuguese, should look like:

{"format": "coqui", "type": "acoustic", "licence": "AGPL-3.0", "src": "https://itml.cl.indiana.edu/models/"}
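The metadata file can also be generated rather than written by hand. The sketch below is an illustration with hypothetical helper names: it writes `LOCALE.json` with the four keys from the example above and checks that both model files are in place. Whether `publisher.py` requires all four keys is an assumption.

```python
import json
from pathlib import Path

# Keys seen in the example metadata; treating all of them as required
# is an assumption, not something publisher.py is known to enforce.
REQUIRED_KEYS = ("format", "type", "licence", "src")


def write_model_metadata(models_dir, locale, metadata):
    """Write models_dir/LOCALE.json after checking the expected keys."""
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        raise ValueError(f"missing metadata keys: {missing}")
    path = Path(models_dir) / f"{locale}.json"
    path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return path


def model_files_present(models_dir, locale):
    """True if both LOCALE.tflite and LOCALE.json exist in models_dir."""
    root = Path(models_dir)
    return all(
        (root / f"{locale}.{ext}").is_file() for ext in ("tflite", "json")
    )
```

For Portuguese, this would be called as `write_model_metadata("models", "pt", {...})` with the values from the example, followed by `model_files_present("models", "pt")` before publishing.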

You can publish using:

$ python3 publisher.py --merge QmXMp1Dv1Sf7ZHXcH6puqbudBhDNkqngopadzcy8Qikuqt --with-model models/pt.tflite pt QmbWXcHWVdRFh3ZmXEbf4tXTk6nqp8zkaNa4aAxaeQ9VTQ
