This repository contains the results of three benchmarks comparing natural language understanding (NLU) services:
- built-in intents (Apple’s SiriKit, Amazon’s Alexa, Microsoft’s Luis, Google’s API.ai, and Snips.ai) on a selection of various intents. This benchmark was performed in December 2016. Its results are described at length in the following post.
- custom intent engines (Google's API.ai, Facebook's Wit, Microsoft's Luis, Amazon's Alexa, and Snips' NLU) for seven chosen intents. This benchmark was performed in June 2017. Its results are described in a paper and a blog post.
- extension of Braun et al., 2017 (Google's API.AI, Microsoft's Luis, IBM's Watson, Rasa). This experiment replicates the analysis by Braun et al., 2017, published as Evaluating Natural Language Understanding Services for Conversational Question Answering Systems in the SIGDIAL 2017 proceedings, with Snips and Rasa added. Details are available in a paper and a blog post.
The data is provided for each benchmark, and more details about the methods are available in the README file in each folder.
Any publication based on these datasets must include a full citation to the following paper, in which the results were published by the Snips team1:
accepted for a spotlight presentation at the Privacy in Machine Learning and Artificial Intelligence workshop co-located with ICML 2018.
1 The Snips team joined Sonos in November 2019. These open datasets remain available, and their access is now managed by the Sonos Voice Experience Team. Please email [email protected] with any questions.