adding metadata of whisper-wrapper.v11

clamsproject · Oct 1, 2024 · commit 80948f3
1 parent 4a2181d

Showing 5 changed files with 279 additions and 2 deletions.
diff --git a/docs/_apps/whisper-wrapper/v11/index.md b/docs/_apps/whisper-wrapper/v11/index.md
@@ -0,0 +1,137 @@
+---
+layout: posts
+classes: wide
+title: "Whisper Wrapper (v11)"
+date: 2024-10-01T19:15:40+00:00
+---
+## About this version
+
+- Submitter: [keighrim](https://github.com/keighrim)
+- Submission Time: 2024-10-01T19:15:40+00:00
+- Prebuilt Container Image: [ghcr.io/clamsproject/app-whisper-wrapper:v11](https://github.com/clamsproject/app-whisper-wrapper/pkgs/container/app-whisper-wrapper/v11)
+- Release Notes
+
+    > - Now based on whisper release 20240930, with support for the `turbo` model  
+    > - Beam search size for the decoder is set to 5, following the default of the `whisper` command  
+    > - (Temporarily) disabled the multiprocessing web app served via gunicorn and fell back to Flask's built-in server to work around a CUDA memory issue
+
+## About this app (See raw [metadata.json](metadata.json))
+
+**A CLAMS wrapper for Whisper-based ASR software originally developed by OpenAI.**
+
+- App ID: [http://apps.clams.ai/whisper-wrapper/v11](http://apps.clams.ai/whisper-wrapper/v11)
+- App License: Apache 2.0
+- Source Repository: [https://github.com/clamsproject/app-whisper-wrapper](https://github.com/clamsproject/app-whisper-wrapper) ([source tree of the submitted version](https://github.com/clamsproject/app-whisper-wrapper/tree/v11))
+- Analyzer Version: 20240930
+- Analyzer License: MIT
+
+
+#### Inputs
+(**Note**: "*" as a property value means that the property is required but can be any value.)
+
+One of the following is required: [
+- [http://mmif.clams.ai/vocabulary/AudioDocument/v1](http://mmif.clams.ai/vocabulary/AudioDocument/v1) (required)
+(of any properties)
+
+- [http://mmif.clams.ai/vocabulary/VideoDocument/v1](http://mmif.clams.ai/vocabulary/VideoDocument/v1) (required)
+(of any properties)
+
+
+
+]
+
+
+#### Configurable Parameters
+(**Note**: _Multivalued_ means the parameter can have one or more values.)
+
+- `modelSize`: optional, defaults to `tiny`
+
+    - Type: string
+    - Multivalued: False
+    - Choices: **_`tiny`_**, `t`, `base`, `b`, `small`, `s`, `medium`, `m`, `large`, `l`, `large-v2`, `l2`, `large-v3`, `l3`, `turbo`, `tu`
+
+
+    > The size of the model to use. When `modelLang=en` is given, English-only models are used instead of multilingual models for non-`large` sizes, for speed and accuracy. (English-only models are not available for `large` models.) Aliases are also accepted: tiny=t, base=b, small=s, medium=m, large=l, large-v2=l2, large-v3=l3, turbo=tu.
+- `modelLang`: optional, defaults to `""`
+
+    - Type: string
+    - Multivalued: False
+
+
+    > Language of the model to use. Accepts two- or three-letter ISO 639 language codes; however, Whisper supports only a subset of languages, and an error is raised if the given language is not supported. For the full list of supported languages, see https://github.com/openai/whisper/blob/20240930/whisper/tokenizer.py . In addition to the language code, a two-letter region code can be appended, e.g. "en-US" for US English. Note that the region code is only for compatibility and record-keeping: Whisper neither detects regional dialects nor uses the given region for transcription. When no language code is given, Whisper runs in language detection mode and uses the first few seconds of the audio to detect the language.
+- `task`: optional, defaults to `transcribe`
+
+    - Type: string
+    - Multivalued: False
+    - Choices: **_`transcribe`_**, `translate`
+
+
+    > (from whisper CLI) whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate')
+- `initialPrompt`: optional, defaults to `""`
+
+    - Type: string
+    - Multivalued: False
+
+
+    > (from whisper CLI) optional text to provide as a prompt for the first window.
+- `conditionOnPreviousText`: optional, defaults to `true`
+
+    - Type: boolean
+    - Multivalued: False
+    - Choices: `false`, **_`true`_**
+
+
+    > (from whisper CLI) if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop
+- `noSpeechThreshold`: optional, defaults to `0.6`
+
+    - Type: number
+    - Multivalued: False
+
+
+    > (from whisper CLI) if the probability of the `<|nospeech|>` token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence
+- `pretty`: optional, defaults to `false`
+
+    - Type: boolean
+    - Multivalued: False
+    - Choices: **_`false`_**, `true`
+
+
+    > The JSON body of the HTTP response will be re-formatted with 2-space indentation
+- `runningTime`: optional, defaults to `false`
+
+    - Type: boolean
+    - Multivalued: False
+    - Choices: **_`false`_**, `true`
+
+
+    > The running time of the app will be recorded in the view metadata
+- `hwFetch`: optional, defaults to `false`
+
+    - Type: boolean
+    - Multivalued: False
+    - Choices: **_`false`_**, `true`
+
+
+    > The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata
+
+
+#### Outputs
+(**Note**: "*" as a property value means that the property is required but can be any value.)
+
+(**Note**: Not all output annotations are always generated.)
+
+- [http://mmif.clams.ai/vocabulary/TextDocument/v1](http://mmif.clams.ai/vocabulary/TextDocument/v1)
+(of any properties)
+
+- [http://mmif.clams.ai/vocabulary/TimeFrame/v5](http://mmif.clams.ai/vocabulary/TimeFrame/v5)
+    - _timeUnit_ = "milliseconds"
+
+- [http://mmif.clams.ai/vocabulary/Alignment/v1](http://mmif.clams.ai/vocabulary/Alignment/v1)
+(of any properties)
+
+- [http://vocab.lappsgrid.org/Token](http://vocab.lappsgrid.org/Token)
+(of any properties)
+
+- [http://vocab.lappsgrid.org/Sentence](http://vocab.lappsgrid.org/Sentence)
+(of any properties)
+
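The `modelSize` and `modelLang` descriptions above imply a mapping from user input to a Whisper checkpoint name. The following is a minimal sketch of that mapping, not the wrapper's actual code: the function name `resolve_model_name` is hypothetical, and it assumes that only `tiny`, `base`, `small`, and `medium` have English-only (`.en`) checkpoints, which matches the models published by openai-whisper (so `turbo` stays multilingual even with `modelLang=en`).

```python
# Hypothetical sketch of how modelSize/modelLang could resolve to a Whisper
# checkpoint name. Assumption: the alias table mirrors the parameter
# description, and only tiny/base/small/medium have ".en" variants.
ALIASES = {
    "t": "tiny", "b": "base", "s": "small", "m": "medium",
    "l": "large", "l2": "large-v2", "l3": "large-v3", "tu": "turbo",
}
ENGLISH_ONLY_AVAILABLE = {"tiny", "base", "small", "medium"}


def resolve_model_name(model_size: str = "tiny", model_lang: str = "") -> str:
    """Return a checkpoint name such as 'tiny.en' or 'large-v3'."""
    # Expand aliases; full names pass through unchanged.
    size = ALIASES.get(model_size, model_size)
    # Keep only the language part of a tag like "en-US"; the region is ignored.
    lang = model_lang.split("-")[0].lower() if model_lang else ""
    if lang == "en" and size in ENGLISH_ONLY_AVAILABLE:
        return f"{size}.en"
    return size


if __name__ == "__main__":
    print(resolve_model_name("s", "en-US"))  # small.en
    print(resolve_model_name("l3", "en"))    # large-v3
    print(resolve_model_name("tu"))          # turbo
```

The region suffix is dropped before the English check because, per the `modelLang` description, Whisper does not use region codes for transcription.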
diff --git a/docs/_apps/whisper-wrapper/v11/metadata.json b/docs/_apps/whisper-wrapper/v11/metadata.json
@@ -0,0 +1,130 @@
+{
+  "name": "Whisper Wrapper",
+  "description": "A CLAMS wrapper for Whisper-based ASR software originally developed by OpenAI.",
+  "app_version": "v11",
+  "mmif_version": "1.0.5",
+  "analyzer_version": "20240930",
+  "app_license": "Apache 2.0",
+  "analyzer_license": "MIT",
+  "identifier": "http://apps.clams.ai/whisper-wrapper/v11",
+  "url": "https://github.com/clamsproject/app-whisper-wrapper",
+  "input": [
+    [
+      {
+        "@type": "http://mmif.clams.ai/vocabulary/AudioDocument/v1",
+        "required": true
+      },
+      {
+        "@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
+        "required": true
+      }
+    ]
+  ],
+  "output": [
+    {
+      "@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1"
+    },
+    {
+      "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v5",
+      "properties": {
+        "timeUnit": "milliseconds"
+      }
+    },
+    {
+      "@type": "http://mmif.clams.ai/vocabulary/Alignment/v1"
+    },
+    {
+      "@type": "http://vocab.lappsgrid.org/Token"
+    },
+    {
+      "@type": "http://vocab.lappsgrid.org/Sentence"
+    }
+  ],
+  "parameters": [
+    {
+      "name": "modelSize",
+      "description": "The size of the model to use. When `modelLang=en` is given, English-only models are used instead of multilingual models for non-`large` sizes, for speed and accuracy. (English-only models are not available for `large` models.) Aliases are also accepted: tiny=t, base=b, small=s, medium=m, large=l, large-v2=l2, large-v3=l3, turbo=tu.",
+      "type": "string",
+      "choices": [
+        "tiny",
+        "t",
+        "base",
+        "b",
+        "small",
+        "s",
+        "medium",
+        "m",
+        "large",
+        "l",
+        "large-v2",
+        "l2",
+        "large-v3",
+        "l3",
+        "turbo",
+        "tu"
+      ],
+      "default": "tiny",
+      "multivalued": false
+    },
+    {
+      "name": "modelLang",
+      "description": "Language of the model to use. Accepts two- or three-letter ISO 639 language codes; however, Whisper supports only a subset of languages, and an error is raised if the given language is not supported. For the full list of supported languages, see https://github.com/openai/whisper/blob/20240930/whisper/tokenizer.py . In addition to the language code, a two-letter region code can be appended, e.g. \"en-US\" for US English. Note that the region code is only for compatibility and record-keeping: Whisper neither detects regional dialects nor uses the given region for transcription. When no language code is given, Whisper runs in language detection mode and uses the first few seconds of the audio to detect the language.",
+      "type": "string",
+      "default": "",
+      "multivalued": false
+    },
+    {
+      "name": "task",
+      "description": "(from whisper CLI) whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate')",
+      "type": "string",
+      "choices": [
+        "transcribe",
+        "translate"
+      ],
+      "default": "transcribe",
+      "multivalued": false
+    },
+    {
+      "name": "initialPrompt",
+      "description": "(from whisper CLI) optional text to provide as a prompt for the first window.",
+      "type": "string",
+      "default": "",
+      "multivalued": false
+    },
+    {
+      "name": "conditionOnPreviousText",
+      "description": "(from whisper CLI) if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop",
+      "type": "boolean",
+      "default": true,
+      "multivalued": false
+    },
+    {
+      "name": "noSpeechThreshold",
+      "description": "(from whisper CLI) if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence",
+      "type": "number",
+      "default": 0.6,
+      "multivalued": false
+    },
+    {
+      "name": "pretty",
+      "description": "The JSON body of the HTTP response will be re-formatted with 2-space indentation",
+      "type": "boolean",
+      "default": false,
+      "multivalued": false
+    },
+    {
+      "name": "runningTime",
+      "description": "The running time of the app will be recorded in the view metadata",
+      "type": "boolean",
+      "default": false,
+      "multivalued": false
+    },
+    {
+      "name": "hwFetch",
+      "description": "The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata",
+      "type": "boolean",
+      "default": false,
+      "multivalued": false
+    }
+  ]
+}
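Because the `parameters` section of metadata.json is machine-readable, a client could check values before calling the app. The sketch below is a hypothetical helper (`build_query` is not part of clams-python) that validates values against `choices` and URL-encodes them. It assumes, as is usual for CLAMS apps served over HTTP, that runtime parameters travel as query-string arguments on the POST request carrying the MMIF input; the embedded metadata is a trimmed excerpt, not the full file.

```python
import json
from urllib.parse import urlencode

# Trimmed excerpt of the "parameters" section of metadata.json (choices shortened).
METADATA = json.loads("""
{
  "parameters": [
    {"name": "modelSize", "type": "string",
     "choices": ["tiny", "base", "small", "turbo", "tu"], "default": "tiny", "multivalued": false},
    {"name": "pretty", "type": "boolean", "default": false, "multivalued": false},
    {"name": "noSpeechThreshold", "type": "number", "default": 0.6, "multivalued": false}
  ]
}
""")


def build_query(metadata: dict, **params) -> str:
    """Validate runtime parameters against app metadata and URL-encode them.

    Hypothetical client-side helper for illustration only.
    """
    specs = {p["name"]: p for p in metadata["parameters"]}
    encoded = {}
    for name, value in params.items():
        spec = specs.get(name)
        if spec is None:
            raise ValueError(f"unknown parameter: {name}")
        choices = spec.get("choices")
        if choices is not None and value not in choices:
            raise ValueError(f"{name}={value!r} is not one of {choices}")
        if spec["type"] == "boolean":
            # Send booleans in their JSON spelling ("true"/"false").
            value = "true" if value else "false"
        encoded[name] = value
    return urlencode(encoded)


print(build_query(METADATA, modelSize="tu", pretty=True))  # modelSize=tu&pretty=true
```

The resulting string would be appended to the app's URL, e.g. `http://localhost:5000/?modelSize=tu&pretty=true`, where the host and port are assumptions about a local deployment of the container image.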
diff --git a/docs/_apps/whisper-wrapper/v11/submission.json b/docs/_apps/whisper-wrapper/v11/submission.json
@@ -0,0 +1,6 @@
+{
+  "time": "2024-10-01T19:15:40+00:00",
+  "submitter": "keighrim",
+  "image": "ghcr.io/clamsproject/app-whisper-wrapper:v11",
+  "releasenotes": "- Now based on whisper release 20240930, with support for the `turbo` model\n- Beam search size for the decoder is set to 5, following the default of the `whisper` command\n- (Temporarily) disabled the multiprocessing web app served via gunicorn and fell back to Flask's built-in server to work around a CUDA memory issue\n\n"
+}
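The `releasenotes` string in submission.json reappears in index.md as an indented blockquote under "Release Notes", with two trailing spaces forcing a line break on every line but the last. The sketch below reproduces that transformation as observed in this diff; `render_release_notes` is a hypothetical name, and this is a reconstruction, not the site's actual generator.

```python
def render_release_notes(notes: str, indent: str = "    ") -> str:
    """Render a submission.json "releasenotes" string as the indented
    blockquote used under "Release Notes" in index.md.

    Hypothetical reconstruction based on the files in this commit.
    """
    # Drop blank lines (the string ends with "\n\n") and trailing whitespace.
    lines = [line.rstrip() for line in notes.splitlines() if line.strip()]
    # Joining with "  \n" leaves two trailing spaces (a Markdown hard line
    # break) on every line except the last.
    return "  \n".join(f"{indent}> {line}" for line in lines)


print(render_release_notes("- first item\n- second item\n\n"))
```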
diff --git a/docs/_data/app-index.json b/docs/_data/app-index.json
@@ -1,8 +1,12 @@
 {
   "http://apps.clams.ai/whisper-wrapper": {
     "description": "A CLAMS wrapper for Whisper-based ASR software originally developed by OpenAI.",
-    "latest_update": "2024-08-29T22:13:54+00:00",
+    "latest_update": "2024-10-01T19:15:40+00:00",
     "versions": [
+      [
+        "v11",
+        "keighrim"
+      ],
       [
         "v10",
         "keighrim"

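The app-index.json hunk shows two changes for a new release: `latest_update` is set to the submission time and the new `[version, submitter]` pair is prepended to `versions`. A minimal sketch of that update follows; `register_version` is a hypothetical helper, and the sample index is trimmed to the single v10 entry visible in the hunk.

```python
import json


def register_version(index: dict, app_id: str, version: str, submitter: str,
                     time: str, description: str) -> dict:
    """Record a new app version in an app-index.json-style mapping.

    Hypothetical helper mirroring the diff: the newest version goes first
    and "latest_update" is set to the submission time.
    """
    entry = index.setdefault(app_id, {"description": description,
                                      "latest_update": time,
                                      "versions": []})
    entry["latest_update"] = time
    entry["versions"].insert(0, [version, submitter])
    return index


DESCRIPTION = "A CLAMS wrapper for Whisper-based ASR software originally developed by OpenAI."
index = {
    "http://apps.clams.ai/whisper-wrapper": {
        "description": DESCRIPTION,
        "latest_update": "2024-08-29T22:13:54+00:00",
        "versions": [["v10", "keighrim"]],  # trimmed: older versions omitted
    }
}
register_version(index, "http://apps.clams.ai/whisper-wrapper", "v11", "keighrim",
                 "2024-10-01T19:15:40+00:00", DESCRIPTION)
print(json.dumps(index, indent=2))
```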
diff --git a/docs/_data/apps.json b/docs/_data/apps.json