omnilingo-ipfs

Matrix: #omnilingo:matrix.org

Steps

Adding your data to OmniLingo takes three main steps: importing the data into IPFS, indexing it, and publishing it.

Import

Import data into your local IPFS node and generate an index:

$ importer.py dataset_dir index_path

e.g.

$ importer.py ./cv-corpus-7.0-2021-07-21/tr/ tr.json

where dataset_dir is a directory in Common Voice format.
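Before importing, it can help to check that the directory looks like a Common Voice release. The sketch below is not part of this repository: the helper name check_common_voice_dir is hypothetical, and the choice of validated.tsv as the file to look for is an assumption, since this README does not say which TSV file importer.py reads. It relies only on the fact that a Common Voice locale directory ships a clips/ folder alongside TSV files such as validated.tsv.

```python
from pathlib import Path


def check_common_voice_dir(dataset_dir):
    """Return a list of layout problems; an empty list means the layout looks plausible.

    Hypothetical helper: checks for a clips/ directory and a validated.tsv file.
    Which TSV importer.py actually reads is an assumption here.
    """
    root = Path(dataset_dir)
    problems = []
    if not (root / "clips").is_dir():
        problems.append("missing clips/ directory")
    if not (root / "validated.tsv").is_file():
        problems.append("missing validated.tsv")
    return problems
```

For example, check_common_voice_dir("./cv-corpus-7.0-2021-07-21/tr/") returns an empty list when both entries are present.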

Index

Index the data, extracting a balanced subset of clips by a complexity metric:

$ indexer.py locale index_path

e.g.

$ indexer.py tr tr.json

This will return a CID that looks like QmXpgcavH2shpBbfnFoymPxEw2zpr4MdAgi1aaoZT4Yeho
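The example CID above is a version 0 CID: 46 base58 characters starting with Qm. A minimal sketch for sanity-checking a value before passing it to the publish step follows. The function name looks_like_cidv0 is hypothetical, and the check is purely syntactic: it does not confirm that the content exists on any node, and it rejects CIDv1 strings by design.

```python
import re

# Base58 (Bitcoin alphabet) omits 0, O, I and l.
# A CIDv0 is "Qm" followed by 44 more base58 characters (46 in total).
_CIDV0_PATTERN = re.compile(r"Qm[1-9A-HJ-NP-Za-km-z]{44}")


def looks_like_cidv0(value):
    """Return True if value has the shape of a CIDv0 string (syntax check only)."""
    return _CIDV0_PATTERN.fullmatch(value) is not None
```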

Publish

Publish data to the global index in OmniLingo on IPFS:

$ publisher.py locale cid

e.g.

$ publisher.py tr QmXpgcavH2shpBbfnFoymPxEw2zpr4MdAgi1aaoZT4Yeho
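The three steps can be chained from a small driver script. The following is a sketch under stated assumptions, not part of this repository: it assumes the three scripts sit in the current working directory, that they can be run with the current Python interpreter, and that the CID is the last whitespace-separated token that indexer.py writes to standard output. All function names are hypothetical.

```python
import subprocess
import sys


def import_command(dataset_dir, index_path):
    """Argument list for the import step (script location is an assumption)."""
    return [sys.executable, "importer.py", dataset_dir, index_path]


def index_command(locale, index_path):
    """Argument list for the index step."""
    return [sys.executable, "indexer.py", locale, index_path]


def publish_command(locale, cid):
    """Argument list for the publish step."""
    return [sys.executable, "publisher.py", locale, cid]


def last_token(stdout):
    """Assumption: the CID is the last whitespace-separated token of the output."""
    tokens = stdout.split()
    if not tokens:
        raise ValueError("indexer produced no output")
    return tokens[-1]


def run_pipeline(dataset_dir, locale, index_path):
    """Run import, index and publish in order; stop at the first failing step."""
    subprocess.run(import_command(dataset_dir, index_path), check=True)
    result = subprocess.run(
        index_command(locale, index_path),
        check=True,
        capture_output=True,
        text=True,
    )
    cid = last_token(result.stdout)
    subprocess.run(publish_command(locale, cid), check=True)
    return cid
```

With those assumptions, run_pipeline("./cv-corpus-7.0-2021-07-21/tr/", "tr", "tr.json") would mirror the three example commands above.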

Publish the CID under an IPNS name, using the local node ID:

$ ipfs name publish cid

e.g.

$ ipfs name publish QmXpgcavH2shpBbfnFoymPxEw2zpr4MdAgi1aaoZT4Yeho

Publishing models

To publish model files (e.g. for pronunciation assistance), you need a directory containing two files:

  • models/LOCALE.tflite: The binary for the ASR model
  • models/LOCALE.json: Metadata for the model

The metadata file, e.g. pt.json for Portuguese, should look like:

{"format": "coqui", "type": "acoustic", "licence":"AGPL-3.0", "src":"https://itml.cl.indiana.edu/models/"}
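The metadata file can also be generated rather than written by hand. This is a minimal sketch: the helper name write_model_metadata is hypothetical, and treating all four keys from the example as required is an assumption, since this README does not say which fields publisher.py actually checks.

```python
import json
from pathlib import Path

# Keys taken from the example metadata above; treating them all as
# required is an assumption, not something the README states.
EXPECTED_KEYS = ("format", "type", "licence", "src")


def write_model_metadata(models_dir, locale, metadata):
    """Write models_dir/LOCALE.json and return its path."""
    missing = [key for key in EXPECTED_KEYS if key not in metadata]
    if missing:
        raise ValueError("metadata is missing keys: " + ", ".join(missing))
    path = Path(models_dir) / f"{locale}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metadata) + "\n", encoding="utf-8")
    return path
```

For the Portuguese example, calling write_model_metadata("models", "pt", {...}) with the four fields shown above would produce models/pt.json.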

You can publish using:

$ python3 publisher.py --merge QmXMp1Dv1Sf7ZHXcH6puqbudBhDNkqngopadzcy8Qikuqt --with-model models/pt.tflite pt QmbWXcHWVdRFh3ZmXEbf4tXTk6nqp8zkaNa4aAxaeQ9VTQ
