On-device Whisper inference on Android mobile using whisper.tflite (quantized 40MB model) #506
-
I released a Whisper Android app, based on the ~40MB quantized whisper.tflite model, to the Google Play Store for testing; if anyone is interested, please let me know. Feel free to download the openai/whisper-tiny TFLite-based Android Whisper ASR app from the Google Play Store, or the openai/whisper-tiny TFLite-based Apple Whisper ASR app from the Apple App Store.
-
This is only a proof-of-concept project to create an Android app based on Whisper TFLite, which leverages the stock Android UI to show off its features.
-
Please feel free to download the openai/whisper-tiny TFLite-based Android app from the Google Play Store.
-
Hey @nyadla-sys, this seems amazing, a lightweight model to run on mobile phones! However, I am curious: how did you convert the torch model to TFLite? If you can share some references or resources on this, it would be very helpful! Also, how accurate is the conversion from torch to TFLite? Thanks!
-
Please see the notebook below, which converts the model to TFLite.
-
Hugging Face has already converted the PyTorch model to a TF model, which I then converted into a TFLite model.
-
It would be good to improve the automatic detection of the Spanish language in the mobile application.
-
Is it possible to deploy a bigger Whisper model on the phone? I assume you're using the English-only tiny.en model; can the app be extended to the multilingual model?
-
I am facing an issue on older Android versions. Can you fix it, please?
-
@nyadla-sys How do I get this working with timestamps enabled?
-
Nice work.
-
This is awesome! Thank you so much! Could you please upload it to F-Droid too, instead of only Google Play, so that your app can still be downloaded in the future (Google regularly changes the rules and kicks out older apps)?
-
Also, could you maybe add a button to input an audio file instead of only mic recording? That would make this PoC very usable in combination with other FLOSS apps such as https://github.com/Dimowner/AudioRecorder
-
Refer to the GitHub repo below for an Android example, which uses the whisper-tiny.en.tflite model.
-
Hi @nyadla-sys, which TF version did you use? I tried to run the steps in the notebook you mentioned above with TF 2.14 (the latest from pip install) and got errors with the StridedSlice op: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2178] TFLite interpreter needs to link Flex delegate in order to run the model since it contains the following Select TFop(s):
-
Yes, I have that line, which is already in your original Colab. What is the output of your convert() call? My guess is that you get the same output as I do, but since I use benchmark_tool to evaluate, that might be the difference.
My convert() output is:
2023-10-16 20:54:34.116612: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2023-10-16 20:54:34.116650: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
2023-10-16 20:54:34.117319: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: tf_whisper_tiny.tfsaved
2023-10-16 20:54:34.163104: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2023-10-16 20:54:34.163136: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: tf_whisper_tiny.tfsaved
2023-10-16 20:54:34.273076: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
2023-10-16 20:54:34.297966: I tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2023-10-16 20:54:34.811451: I tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: tf_whisper_tiny.tfsaved
2023-10-16 20:54:35.102822: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 985506 microseconds.
2023-10-16 20:54:35.436697: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-10-16 20:54:41.041211: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2178] TFLite interpreter needs to link Flex delegate in order to run the model since it contains the following Select TFop(s):
Flex ops: FlexStridedSlice
Details:
tf.StridedSlice(tensor<?x?xf32>, tensor<4xi32>, tensor<4xi32>, tensor<4xi32>) -> (tensor<1x1x?x?xf32>) : {begin_mask = 12 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 12 : i64, new_axis_mask = 3 : i64, shrink_axis_mask = 0 : i64}
tf.StridedSlice(tensor<?x?xf32>, tensor<4xi32>, tensor<4xi32>, tensor<4xi32>) -> (tensor<?x1x1x?xf32>) : {begin_mask = 9 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 9 : i64, new_axis_mask = 6 : i64, shrink_axis_mask = 0 : i64}
See instructions: https://www.tensorflow.org/lite/guide/ops_select
BTW the code is:
import tensorflow as tf
from transformers import TFWhisperModel, WhisperFeatureExtractor
from datasets import load_dataset

# Load the TF port of Whisper tiny and its feature extractor.
model = TFWhisperModel.from_pretrained("openai/whisper-tiny")
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-tiny")

# Run one forward pass on a dummy sample so the model is built before saving.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = feature_extractor(
    ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="tf"
)
input_features = inputs.input_features
decoder_input_ids = tf.convert_to_tensor([[1, 1]]) * model.config.decoder_start_token_id
last_hidden_state = model(input_features, decoder_input_ids=decoder_input_ids).last_hidden_state

# Export a SavedModel, then convert it to TFLite.
model.save('tf_whisper_tiny.tfsaved')
converter = tf.lite.TFLiteConverter.from_saved_model('tf_whisper_tiny.tfsaved')
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,  # enable TensorFlow ops.
]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
tf.__version__ shows 2.14.0
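The converter warning quoted above names the ops that fall back to the Flex delegate on its `Flex ops:` line. When comparing conversion runs across TF versions, it may help to pull that list out of a captured log. This is a minimal standard-library sketch: the helper name `flex_ops_from_log` is hypothetical, and it assumes the log keeps the `Flex ops: A, B` layout seen in this thread, with multiple names comma-separated on one line.

```python
import re

def flex_ops_from_log(log_text: str) -> list[str]:
    """Return the op names listed after 'Flex ops:' in a converter log."""
    ops = []
    for match in re.finditer(r"Flex ops:\s*(.+)", log_text):
        # Assumption: op names are comma-separated on a single line.
        for name in match.group(1).split(","):
            name = name.strip()
            if name and name not in ops:
                ops.append(name)
    return ops

# Excerpt shaped like the warning in this thread.
sample_log = """\
following Select TFop(s):
Flex ops: FlexStridedSlice
Details:
"""
print(flex_ops_from_log(sample_log))
```

An empty result would suggest the model converted with builtin TFLite ops only, although that still depends on the log format matching the assumption above.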
-
Hi, I tried to run the quantized model (https://github.com/usefulsensors/openai-whisper/blob/main/models/whisper-int8.tflite), but it contains Flex ops (such as FlexConv); I guess that may be because it was converted from ONNX. I tried to quantize directly in TF and got a segmentation fault. I think it is related to the calibration dataset? Not sure if you have tried. The code is here (mostly taken from your notebook):
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
class GenerateModel(tf.Module):
@tf.function(
def representative_dataset():
saved_model_dir = 'tf_whisper_saved'
# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# Save the model
tflite_model_path = 'whisper.tiny.tflite'
-
The int8 Whisper TFLite model isn't behaving as expected, because some of its layers, such as "mul", struggle to handle their inputs in int8. It may be worth converting with int16 activations and int8 weights instead.
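To get a feel for why 16-bit activations might help a layer such as "mul", the sketch below quantizes the same values symmetrically to 8 and 16 bits and compares the worst-case round-trip error. It is a pure-Python illustration, not the TFLite quantizer; `quantize_roundtrip`, `max_abs_error`, and the sample activations are hypothetical.

```python
def quantize_roundtrip(values, bits):
    """Symmetric per-tensor quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 32767 for int16
    scale = max(abs(v) for v in values) / qmax
    restored = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # clamp to the integer range
        restored.append(q * scale)
    return restored

def max_abs_error(values, bits):
    """Largest absolute difference between original and round-tripped values."""
    restored = quantize_roundtrip(values, bits)
    return max(abs(a - b) for a, b in zip(values, restored))

# Hypothetical activations: one large outlier stretches the quantization scale,
# so small values lose most of their precision at 8 bits.
activations = [0.003, -0.01, 0.25, -1.7, 40.0]

err8 = max_abs_error(activations, 8)
err16 = max_abs_error(activations, 16)
print(f"int8 max error:  {err8:.6f}")
print(f"int16 max error: {err16:.6f}")
```

In TensorFlow Lite, the int16-activation, int8-weight mode is requested by putting `tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8` in `converter.target_spec.supported_ops`; whether that resolves the Whisper issue described here has not been verified in this thread.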
-
I don't know whether anyone wants to use Whisper from a browser on any platform (Android, iOS, PC) via a server, but here it is: https://gitlab.com/krafi/whisperweb
-
Hi, does anyone have a solution for renaming the input and output nodes of a Whisper TFLite model, as shown in Netron?
-
Hi! Love the work done on Whisper in TFLite. Just wondering if it's possible to run the model on audio files longer than 30s? Or will it have to be done separately? Thanks.
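Whisper's encoder consumes fixed 30-second windows of 16 kHz audio, so one common approach for longer files is to split the samples into 30-second chunks, pad the last one, and transcribe each chunk in turn. The sketch below covers only the splitting step; `split_into_windows` is a hypothetical helper, and non-overlapping windows are an assumption (a word that straddles a boundary may be cut).

```python
SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS  # 480,000 samples per window

def split_into_windows(samples, window=WINDOW_SAMPLES):
    """Split samples into fixed-size windows, zero-padding the last one."""
    windows = []
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        chunk.extend([0.0] * (window - len(chunk)))  # pad the final short chunk
        windows.append(chunk)
    return windows

# 70 seconds of silence stands in for real audio.
audio = [0.0] * (70 * SAMPLE_RATE)
chunks = split_into_windows(audio)
print(len(chunks), [len(c) for c in chunks])
```

Each window would then go through feature extraction and the TFLite interpreter separately, with the resulting texts concatenated in order.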
-
Hi @nyadla-sys, can we use the GPU delegate for the int8 model?
-
I developed an Android app based on the tiny whisper.tflite (quantized ~40MB TFLite model).
Inference takes ~2 seconds for a 30-second audio clip on a Pixel 7 phone.
https://github.com/usefulsensors/openai-whisper/blob/main/android_app/release/app-release.apk
Use the command below to install it on an Android phone:
adb install -r -t ~/android_app/release/app-release.apk
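As a rough check on the timing quoted above, processing time divided by audio duration gives the real-time factor (RTF). The numbers below are the approximate figures from this post, not new measurements, and `real_time_factor` is a hypothetical helper.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF below 1.0 means the model runs faster than real time."""
    return processing_seconds / audio_seconds

# ~2 s of inference for a 30 s clip, as reported for the Pixel 7.
rtf = real_time_factor(2.0, 30.0)
print(f"RTF ~ {rtf:.3f}, about {1 / rtf:.0f}x faster than real time")
```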