kviksilver/TLSphinx

TLSphinx

TLSphinx is a Swift wrapper around Pocketsphinx, a portable library based on CMU Sphinx, that allows an application to perform speech recognition without the audio leaving the device.

This repository has two main parts. The first is a synthesized version of the pocketsphinx and sphinx base repositories with a module map to access the library as a Clang module. This module is accessed under the name Sphinx and has two submodules, Pocket and Base, named after pocketsphinx and sphinx base.

The second part is TLSphinx, a Swift framework that uses the Sphinx Clang module and exposes a Swift-like API that talks to pocketsphinx.

Note: I wrote a blog post about TLSphinx on the Tryolabs Blog. Check it out for a short history of why I wrote this.

Usage

The framework provides three types:

  • Config describes the configuration needed to recognize the speech.
  • Decoder is the main class and has the API to perform the decode.
  • Hypotesis is the result of a decode attempt. It has text and score properties.

Config

Represents the cmd_ln_t opaque structure in Sphinx. The default constructor takes an array of tuples of the form (param name, param value), where "param name" is the name of one of the parameters recognized by Sphinx. In the examples below we pass the acoustic model, the language model and the dictionary. For a complete list of recognized parameters, check the Sphinx docs.

The class has a public property to turn the debug info printed by Sphinx on or off:

public var showDebugInfo: Bool

Decoder

Represents the ps_decoder_t opaque struct in Sphinx. The default constructor takes a Config object as a parameter.

It has the functions to decode speech from a file or from the mic. The result is returned as an optional Hypotesis object, following the naming convention of the Pocketsphinx API. The functions are:

To decode speech from a file:

public func decodeSpeechAtPath (filePath: String, complete: (Hypotesis?) -> ())

The audio pointed to by filePath must have the following characteristics:

  • single-channel (monaural)
  • little-endian
  • unheadered
  • 16-bit signed
  • PCM
  • sampled at 16000 Hz
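
The byte layout these requirements describe can be sketched in Swift. This is a minimal sketch and not part of TLSphinx: rawPCM16LE and sineTone are hypothetical helper names, the samples are assumed to be finite values in [-1, 1], and the code uses current Swift syntax rather than the older syntax of the examples in this README. The sine tone is only there to produce data in the right format; it is not speech.

```swift
import Foundation

/// Hypothetical helper: converts floating-point samples in [-1, 1] to
/// headerless 16-bit signed, little-endian PCM bytes.
func rawPCM16LE(from samples: [Float]) -> Data {
    var data = Data(capacity: samples.count * 2)
    for sample in samples {
        // Clamp so the scaled value always fits in an Int16.
        let clamped = max(-1.0, min(1.0, sample))
        let value = Int16((clamped * Float(Int16.max)).rounded())
        let bits = UInt16(bitPattern: value)
        data.append(UInt8(bits & 0xFF))   // low byte first (little-endian)
        data.append(UInt8(bits >> 8))     // high byte second
    }
    return data
}

/// Hypothetical helper: generates a single-channel sine tone.
func sineTone(frequency: Float, seconds: Float, sampleRate: Float = 16_000) -> [Float] {
    let count = Int(seconds * sampleRate)
    return (0..<count).map { i in
        0.5 * sin(2 * Float.pi * frequency * Float(i) / sampleRate)
    }
}

// One second of mono audio at 16000 Hz is 16000 samples of 2 bytes each.
let pcm = rawPCM16LE(from: sineTone(frequency: 440, seconds: 1))
```

The resulting Data can be saved with Data.write(to:) to get a file in the expected format.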

To control the size of the buffer used to read the file, the Decoder class has a public property:

public var bufferSize: Int

To decode a live audio stream from the mic:

public func startDecodingSpeech (utteranceComplete: (Hypotesis?) -> ())
public func stopDecodingSpeech ()

You can use the same Decoder instance many times.

Hypotesis

This struct represents the result of a decode attempt. It has a text property with the best-scored text and a score property with the score value. This struct implements Printable, so you can print it with println(hypotesis_value).
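
As an illustration of that shape, here is a minimal stand-in written in current Swift, where the Printable protocol is named CustomStringConvertible. SampleHypothesis and summary(of:) are hypothetical names, the Int type for score is an assumption, and none of this is the framework's source.

```swift
// Hypothetical stand-in for illustration only; not the framework's source.
// The Int type for `score` is an assumption.
struct SampleHypothesis: CustomStringConvertible {
    let text: String
    let score: Int

    // CustomStringConvertible is the current name of the Printable protocol.
    var description: String {
        return "Text: \(text) - Score: \(score)"
    }
}

// Handles an optional result the way a completion closure might.
func summary(of result: SampleHypothesis?) -> String {
    guard let hyp = result else { return "No hypothesis" }
    return String(describing: hyp)
}

print(summary(of: SampleHypothesis(text: "hello world", score: -3500)))
print(summary(of: nil))
```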

Examples

Process an Audio File

As an example, let's see how to decode the speech in an audio file. To do so, you first need to create a Config object and pass it to the Decoder constructor. With the decoder you can perform automatic speech recognition from an audio file like this:

import TLSphinx

let hmm = ...   // Path to the acoustic model
let lm = ...    // Path to the language model
let dict = ...  // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
  if let decoder = Decoder(config:config) {
      
      let audioFile = ... // Path to an audio file
      
      decoder.decodeSpeechAtPath(audioFile) {
          
          if let hyp: Hypotesis = $0 {
              // Print the decoded text and score
              println("Text: \(hyp.text) - Score: \(hyp.score)")
          } else {
              // Could not decode any speech because of an error
          }
      }
  } else {
      // Handle Decoder() fail
  }
} else {
  // Handle Config() fail  
}

The decodeSpeechAtPath function performs the decode in the background. Once the process finishes, the complete closure is called on the main thread.
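
The general pattern of doing work in the background and delivering an optional result on a chosen queue can be sketched as follows. This is not TLSphinx's implementation: runInBackground is a hypothetical helper, the "decoded text" string stands in for real decoding work, and the code uses current Swift and Grand Central Dispatch syntax.

```swift
import Foundation

// Hypothetical helper illustrating the threading pattern only;
// this is not TLSphinx's implementation.
func runInBackground<T>(on callbackQueue: DispatchQueue,
                        work: @escaping () -> T?,
                        complete: @escaping (T?) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let result = work()          // slow work, off the caller's thread
        callbackQueue.async {
            complete(result)         // hand the optional result back
        }
    }
}

// Usage: an app would pass DispatchQueue.main as the callback queue.
// A serial queue is used here so the sketch also runs as a plain script
// without a run loop.
let callbackQueue = DispatchQueue(label: "example.callback")
let finished = DispatchSemaphore(value: 0)
var received: String? = nil

runInBackground(on: callbackQueue, work: { () -> String? in "decoded text" }) { result in
    received = result
    finished.signal()
}
finished.wait()
print(received ?? "no result")
```

Because the completion runs on the main thread in TLSphinx, it is a natural place to update the UI with the result.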

Speech from the Mic

import TLSphinx

let hmm = ...   // Path to the acoustic model
let lm = ...    // Path to the language model
let dict = ...  // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
  if let decoder = Decoder(config:config) {
      
      decoder.startDecodingSpeech {
          
          if let hyp: Hypotesis = $0 {
              println(hyp)
          } else {
              // Could not decode any speech because of an error
          }
      }
  } else {
      // Handle Decoder() fail
  }
} else {
  // Handle Config() fail  
}

// At some point in the future, stop listening to the mic.
// This needs a reference to the decoder that outlives the `if let` scope above.
decoder.stopDecodingSpeech()

Installation

The cleanest way to integrate TLSphinx is to use Carthage or a similar method to get the framework bundle. This lets you integrate the framework and the Sphinx module without magic.

Carthage

In your Cartfile, add a reference to the latest version of TLSphinx:

github "Tryolabs/TLSphinx" ~> tag_pointing_to_the_last_version

Then run carthage update; this should fetch and build the latest version of TLSphinx. Once it's done, drag the TLSphinx.framework bundle to the Xcode Linked Frameworks and Libraries. You must also tell Xcode where to find the Sphinx module, which is located in the Carthage checkout. To do so:

  • add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include to Header Search Paths and mark it as recursive
  • add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/lib to Library Search Paths and mark it as recursive
  • in Swift Compiler - Search Paths, add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include to Import Paths

Manual

Download the project from this repository and drag the TLSphinx project to your Xcode project. If you hit errors about missing headers and/or libraries for Sphinx, add Sphinx/include to your header search path and Sphinx/lib to the library search path, and mark them as recursive.

Author

BrunoBerisso, [email protected]

License

TLSphinx is available under the MIT license. See the LICENSE file for more info.
