BergamotAPI

Ulrich Germann edited this page Jul 3, 2020 · 3 revisions

Bergamot API

The Bergamot API is provided at http://<host>:<port>/api/bergamot/v1/ and supports the POST method only. Request and response bodies both use the MIME type application/json.

Requests (property names in square brackets indicate optional elements; all others are mandatory)

{ "[options]": { "[inputFormat]": "sentence|paragraph|wrappedText",
                 "[nBest]": <integer>,
                 "[returnWordAlignment]": true|false,
                 "[returnSentenceScore]": true|false,
                 "[returnSoftAlignment]": true|false,
                 "[returnQualityEstimate]": true|false,
                 "[returnWordScores]": true|false,
                 "[returnTokenization]": true|false,
                 "[returnOriginal]": true|false },
  "text": <string>|<array>|<object>
}
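
As an illustration of the request format above, the following Python sketch builds a request body and posts it. The helper names `build_request` and `translate`, the validation rules, and the `base_url` argument are assumptions made for this example; only the endpoint path and the option names come from the schema above.

```python
import json
import urllib.request

# Option names taken from the request schema above.
BOOLEAN_OPTIONS = {
    "returnWordAlignment", "returnSentenceScore", "returnSoftAlignment",
    "returnQualityEstimate", "returnWordScores", "returnTokenization",
    "returnOriginal",
}
INPUT_FORMATS = {"sentence", "paragraph", "wrappedText"}


def build_request(text, **options):
    """Return the JSON request body (as a str) for the given text and options.

    Hypothetical helper: the checks below mirror the schema, not the server code.
    """
    for name, value in options.items():
        if name == "inputFormat":
            if value not in INPUT_FORMATS:
                raise ValueError(f"unknown inputFormat: {value!r}")
        elif name == "nBest":
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError("nBest must be an integer")
        elif name in BOOLEAN_OPTIONS:
            if not isinstance(value, bool):
                raise ValueError(f"{name} must be true or false")
        else:
            raise ValueError(f"unknown option: {name}")
    body = {"text": text}
    if options:  # "options" is optional, so it is omitted when empty
        body["options"] = options
    return json.dumps(body, ensure_ascii=False)


def translate(base_url, text, **options):
    """POST a request to a running server and return the decoded JSON response.

    base_url stands for http://<host>:<port> of your own deployment.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/bergamot/v1/",
        data=build_request(text, **options).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `translate(base_url, "Hello world.", inputFormat="sentence", nBest=2)` would then send the options and the text in a single POST.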

Response format

The response replicates the input structure but replaces the value of "text" as follows:

  • if "text" is a string, it replaces the value with the Translation Output (see below)
  • if "text" is an array, it replaces each element in the array with the respective Translation Output
  • if "text" is an object, it replaces the value of the "text" property in that object with the respective Translation Output. A local "options" property in that object overrides the settings of the parent; this works, but is not recommended.
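
The three cases above can be modelled with a small recursive function. This is only a sketch of the described behaviour, not the server's implementation: `replace_text` and `translate_fn` are hypothetical names, and applying the rules recursively to nested arrays and objects is an assumption.

```python
def replace_text(value, translate_fn, options=None):
    """Return a copy of value with every "text" payload replaced by its translation.

    translate_fn(text, options) is a stand-in for the translation service.
    """
    options = dict(options or {})
    if isinstance(value, str):
        # A string is replaced with its Translation Output.
        return translate_fn(value, options)
    if isinstance(value, list):
        # Each array element is replaced with its respective Translation Output.
        return [replace_text(item, translate_fn, options) for item in value]
    if isinstance(value, dict):
        # A local "options" property overrides the settings inherited from the parent.
        local_options = {**options, **value.get("options", {})}
        result = dict(value)  # shallow copy, so the request itself is not modified
        if "text" in result:
            result["text"] = replace_text(result["text"], translate_fn, local_options)
        return result
    raise TypeError(f"unsupported text value: {type(value).__name__}")
```

Because a request is itself an object with a "text" property and an optional "options" property, calling `replace_text(request, translate_fn)` on the whole request yields a structure of the same shape as the response described above.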

Translation Output

  • if nBest is omitted (it defaults to 1) or is 1, and none of the returnX properties is true, the translation output is a string with:

    • one sentence per line if inputFormat was sentence
    • one paragraph per line if inputFormat was paragraph
    • empty-line separated paragraphs with one sentence per line each if inputFormat was wrappedText (default)
  • otherwise, the response is a list of Paragraphs, where each Paragraph is a list of Sentence Translations, and each Sentence Translation is structured as shown below. Note that the properties "original" and "originalTokenized" are lists, because Marian could in principle accommodate multi-input translation, with multiple input strings for a single translation item. Currently, the server accommodates only single-input translation; lists are used here so that the API will not need major changes if multi-input translation is supported in the future.

    { "original": [<original input sentence>, ...],
      "originalTokenized": [[<original input sentence as tokenized>, ...]],
      "nBest": [<list of n-best translations>],
      "errors": [<list of non-fatal error messages from the translation server, if any>],
      "warnings": [<list of warnings from the translation server>]
    }
    

    Each element of the "nBest" property above is structured as follows (elements are present only if requested). Note that the "wordAlignment" and "softAlignment" properties are lists of alignments, each mapping to the respective element in the originalTokenized list. Currently, Marian supports only alignment to the first input element, and, in the case of ensemble decoders, only the alignment reported by the first model in the ensemble.

    { "sentenceScore": <float>,
      "sentenceQualityScore": <integer>,
      "translation": <string>,
      "translationTokenized": [<list of tokens <string>>],
      "wordScores": [<list of floats with word score for each output token>],
      "wordAlignment": [[<list of mappings from translation tokens to original tokens>]],
      "softAlignment": [<list of soft alignment matrices (lists of lists) with the alignment distribution from each target position to the respective source positions>],
      "errors": [<list of non-fatal error messages from the translation server, if any>],
      "warnings": [<list of warnings from the translation server>]
    }
    

    Each word alignment is a vector of integers mapping from target token positions to source token positions; unaligned tokens have the value -1.
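
To make the two output forms and the alignment convention concrete, here is a minimal Python sketch of how a client might read them. The function names are hypothetical, the plain-string branch assumes the default wrappedText layout, and taking the first "nBest" entry as the best translation is an assumption that the description above does not state.

```python
def aligned_pairs(source_tokens, target_tokens, word_alignment):
    """Pair each target token with its aligned source token (None if unaligned).

    word_alignment is one alignment vector: position i holds the index of the
    source token aligned to target token i, or -1 for an unaligned token.
    """
    if len(word_alignment) != len(target_tokens):
        raise ValueError("alignment must have one entry per target token")
    pairs = []
    for target, source_index in zip(target_tokens, word_alignment):
        source = None if source_index == -1 else source_tokens[source_index]
        pairs.append((target, source))
    return pairs


def first_translations(output):
    """Return one translation per sentence, grouped by paragraph."""
    if isinstance(output, str):
        # Plain form (wrappedText): empty-line separated paragraphs,
        # one sentence per line within each paragraph.
        paragraphs = output.strip("\n").split("\n\n")
        return [p.splitlines() for p in paragraphs if p]
    # Structured form: list of Paragraphs, each a list of Sentence Translations.
    # Assumption: the first nBest entry is the preferred translation.
    return [[sentence["nBest"][0]["translation"] for sentence in paragraph]
            for paragraph in output]
```

With a structured Sentence Translation `s`, the tokens of the first (currently only) input are `s["originalTokenized"][0]`, and the matching alignment vector for an n-best entry `n` is the first element of its word alignment list, so `aligned_pairs(s["originalTokenized"][0], n["translationTokenized"], alignment_vector)` lists which source token each output token maps to.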