-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathvoices-and-languages.html
134 lines (95 loc) · 5.57 KB
/
voices-and-languages.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>meSpeak – Voices & Languages</title>
<style type="text/css">
li { margin-bottom: 0.5em; }
h2,h3 { margin-top: 2em; }
</style>
</head>
<body>
<h1>meSpeak – Voices & Languages</h1>
<p>A short guide to the set-up of languages and voices for meSpeak.<br />
Please mind that meSpeak is based on an Emscripten-port of <a href="http://espeak.sourceforge.net/" target="_blank">eSpeak</a>, so all of the eSpeak grammar applies also to meSpeak.</p>
<h2>Standard Language Files</h2>
<p>meSpeak's language-files provide eSpeak's language- and voice-files in a single package.<br />(Since a voice usually refers to a language and its dictionary, it seems suitable to bundle them together in a single file.)<br />The language-files are of the following structure (JSON):</p>
<xmp>{
"voice_id": "<filename>",
"dict_id": "<filename>",
"dict": "<base64-encoded octet stream>",
"voice": "<base64-encoded octet stream>"
}
</xmp>
<p>The values of <em>voice_id</em> and <em>dict_id</em> are actually UNIX-filenames, <code>dict_id</code> relative to the path of eSpeak's data-directory "<code>espeak-data/</code>", <em>voice_id</em> relative to "<code>espeak-data/voices/</code>".</p>
<p>If we were to embed the files for the langage "<code>en-en</code>", these would be:</p>
<ul>
<li>"<code>en/en-en</code>" for the voice and</li>
<li>"<code>en_dict</code>" for the dictionary used by "en-en"</li>
</ul>
<p>For a standard language-file, you would add a base64-representation as the string value of <em>dict</em> and <em>voice</em> of the respective eSpeak-files.</p>
<h2>Customizing</h2>
<p>There is an alternate layout for meSpeak's language-files, which is espacially usefull for the purpose of customizing and testing:</p>
<xmp>{
"voice_id": "<filename>",
"dict_id": "<filename>",
"dict": "<base64-encoded octet stream>",
"voice": "<text-string>",
"voice_encoding": "text"
}
</xmp>
<p>Since eSpeak's voice-files are actually plain-text files, you may use a simple string for these, if you provide an additional property <code>"voice_encoding": "text"</code> at the same time.</p>
<p><em>For dictionaries, which are a binary files with eSpeak, see the note at the end of the page.</em></p>
<h3>Example</h3>
<p>For an example we will configure a basic female voice for "en-us", which will be named "en-us-f".</p>
<ol>
<li>Make a copy of a meSpeak-language file (json), which you want to modify (in this case "<code>voices/en/en-us.json</code>).</li>
<li>Rename the file (e.g.: "<code>en-us-f.json</code>") and open it in editor.</li>
<li>Download the source of <a href="http://espeak.sourceforge.net/" target="_blank">eSpeak</a> and go to the "<code>espeak-data/</code>" directory.</li>
<li>The eSpeak-file "<code>espeak-data/voices/en-us</code>" looks like this:
<xmp>// moving towards US English
name english-us
language en-us 2
language en-r
language en 3
gender male
// and more, skipped here
</xmp></li>
<li>Rename the "<code>name</code>" parameter to make it unique (e.g.: "<code>name english-us-f</code>").</li>
<li>Change any paramaters as you whish, in this case change "<code>gender male</code>" to "<code>gender female</code>" for a female voice.</li>
<li>You should have arrived at something like this (first line removed, since it is just a comment):
<xmp>name english-us-f
language en-us 2
language en-r
language en 3
gender female
</xmp></li>
<li>Replace any line-breaks by "<code>\n</code>" in order to get a valid JSON-string:
<xmp>"name english-us-f\nlanguage en-us 2\nlanguage en-r\nlanguage en 3\ngender female"</xmp>
And use this as a value for the "<code>voice</code>"-property of the JSON-file.</li>
<li>Add the line <code>"voice_encoding": "text"</code> to the JSON to indicate that the voice is plain-text.<br />Your voice file should now look like this:
<xmp>Content of file: "en-us-f.json":
{
"voice_id": "en-us-f",
"dict_id": "en_dict",
"dict": "<base64-encoded octet stream>",
"voice": "name english-us-f\nlanguage en-us 2\nlanguage en-r\nlanguage en 3\ngender female",
"voice_encoding": "text"
}
</xmp></li>
<li>Save it and load it into meSpeak.</li>
</ol>
<p><em>Please note that eSpeak is not very graceful with syntax errors in a voice-definition and will just throw an error, which will — in the case of meSpeak — show up in the console-log.</em></p>
<p>For further details on voice-parameters and fine-tuning, please refer to the eSpeak-documentation: <a href="http://espeak.sourceforge.net/voices.html" target="_blank">http://espeak.sourceforge.net/voices.html</a>.</p>
<h2>Custom Dictionaries</h2>
<p>eSpeak's dictonaries are binary files, which must be compiled with eSpeak first.<br />
You would have to install eSpeak and compile a file following the <a href="http://espeak.sourceforge.net/docindex.html" target="_blank">eSpeak documentation</a>.</br />
Further, you would insert a base64-encoded string of the resulting object-file's content as the value of the <em>dict</em> property of a meSpeak-language-file.<br />
Finally, you would set a suiting and unique value for the property <em>dict_id</em> (UNIX file path).</p>
<p>There is no shortcut to this. Sorry.</p>
<p>Please see also the section on the <em>extended voice format</em> at the <a href="./">main-page</em>.</p>
<p> </p>
<p>Norbert Landsteiner<br />
Vienna, July 2013</p>
</body>
</html>