On-device Whisper inference on Android mobile using whisper.tflite(quantized 40MB model) #506

nyadla-sys · 2022-11-10T02:47:07Z

nyadla-sys
Nov 10, 2022

I developed an Android app based on the tiny whisper.tflite model (quantized, ~40 MB).
Inference takes ~2 seconds for a 30-second audio clip on a Pixel 7 phone.
https://github.com/usefulsensors/openai-whisper/blob/main/android_app/release/app-release.apk
Use the command below to install the APK on an Android phone:
adb install -r -t ~/android_app/release/app-release.apk

nyadla-sys · 2022-11-11T20:15:25Z

nyadla-sys
Nov 11, 2022
Author

I released a Whisper Android app, based on the ~40 MB quantized whisper.tflite model, to the Google Play Store for testing; if anyone is interested, please let me know.

Feel free to download the openai/whisper-tiny tflite-based Android Whisper ASR app from the Google Play Store.

Feel free to download the openai/whisper-tiny tflite-based Apple Whisper ASR app from the Apple App Store.

3 replies

nyadla-sys Sep 11, 2023
Author

Yes, you can run inference on 10 seconds of audio by filling the remaining 20 seconds of the 30-second input with zeros.

nyadla-sys Sep 11, 2023
Author

https://github.com/nyadla-sys/whisper.tflite

nyadla-sys Sep 11, 2023
Author

If we pad with zeros, inference time drops a lot compared with real 30 s audio; however, it requires fine-tuning the tflite model for 10-second processing.
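A minimal sketch of the zero-padding idea from the replies above, assuming 16 kHz mono float samples and a fixed 30-second model window. `pad_to_window` is a hypothetical helper name, not something from the app or the repo, and plain Python lists are used only to keep the example dependency-free.

```python
SAMPLE_RATE = 16_000        # assumption: 16 kHz mono input, as Whisper expects
WINDOW_SECONDS = 30         # the tflite model takes a fixed 30 s window
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS  # 480,000 samples


def pad_to_window(samples, window=WINDOW_SAMPLES):
    """Return a copy of `samples` that is exactly `window` samples long.

    Shorter clips are extended with trailing zeros (silence);
    longer clips are truncated to the first `window` samples.
    """
    samples = list(samples)[:window]
    return samples + [0.0] * (window - len(samples))


# Hypothetical 10-second clip filled with a constant non-zero value.
clip = [0.1] * (10 * SAMPLE_RATE)
padded = pad_to_window(clip)
print(len(clip), "->", len(padded))
```

The padded buffer would then go through the usual log-mel feature extraction before being fed to the model; that step is not shown here.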

nyadla-sys · 2022-11-12T00:54:09Z

nyadla-sys
Nov 12, 2022
Author

This is only a proof-of-concept project to create an Android app based on Whisper TFLite, which leverages the stock Android UI to show off its features.
Whisper-TFLIte-Android-Example

0 replies

nyadla-sys · 2022-11-29T18:48:17Z

nyadla-sys
Nov 29, 2022
Author

Please feel free to download the openai/whisper-tiny tflite-based Android app from the Google Play Store.
https://play.google.com/store/apps/details?id=com.whisper.android.tflitecpp

2 replies

edanweis Dec 11, 2022

Amazing! Multilingual and different sized models would be useful for comparison in the app. Great work

shubham0204 Jun 9, 2024

@nyadla-sys the provided Google Play URL is broken

rushi-the-neural-arch · 2022-12-08T15:26:03Z

rushi-the-neural-arch
Dec 8, 2022

Hey @nyadla-sys, this seems amazing: a lightweight model that runs on mobile phones! I'm curious, though: how did you convert the torch model to tflite? If you can share some references or resources on this, it would be very helpful. Also, how accurate is the conversion from torch to tflite? Thanks!

0 replies

nyadla-sys · 2022-12-08T17:00:05Z

nyadla-sys
Dec 8, 2022
Author

Please see the notebook below, which converts the model to TFLite:
https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/generate_tflite_from_whisper.ipynb
https://github.com/usefulsensors/openai-whisper/tree/main/notebooks

1 reply

zzy981019 Jun 13, 2024

Hi @nyadla-sys , really great work! I ran this colab today without any changes to the code, but the transcription came out a little weird. Could you please share where you think the problem lies? Thanks a lot!

nyadla-sys · 2022-12-08T18:13:23Z

nyadla-sys
Dec 8, 2022
Author

Hugging Face has already converted the PyTorch model to a TF model, which I then converted into a TFLite model.
Please see the model card below for additional information, including its word error rate.
https://huggingface.co/openai/whisper-tiny.en

4 replies

rushi-the-neural-arch Dec 9, 2022

This is pretty helpful, Thank you very much!🙂

JKeddo95 Dec 11, 2022

Thank you!

NullByte08 Aug 2, 2023

Can you please share the tflite model? Thanks

nyadla-sys Aug 2, 2023
Author

Please use git lfs install to get the uncompressed file:
https://github.com/usefulsensors/openai-whisper/blob/main/models/whisper-tiny.en.tflite
Refer to the repo for more details:
https://github.com/usefulsensors/openai-whisper/tree/main
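A small sketch for checking what was actually downloaded, on the assumption that a clone without Git LFS leaves a short text pointer in place of the model. It relies on two file-format facts: LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`, and TFLite flatbuffers carry the identifier `TFL3` at bytes 4 to 7. `classify_model_file` is a hypothetical helper, and the demo files are stand-in bytes, not real models.

```python
import os
import tempfile

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
TFLITE_FILE_IDENTIFIER = b"TFL3"  # FlatBuffers file identifier, stored at bytes 4..7


def classify_model_file(path):
    """Return 'lfs-pointer', 'tflite', or 'unknown' based on the first bytes."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"
    if head[4:8] == TFLITE_FILE_IDENTIFIER:
        return "tflite"
    return "unknown"


# Demo on two tiny stand-in files (placeholder bytes, not real models).
with tempfile.TemporaryDirectory() as tmp:
    pointer_path = os.path.join(tmp, "pointer.tflite")
    with open(pointer_path, "wb") as f:
        f.write(LFS_POINTER_PREFIX + b"\n")
    model_path = os.path.join(tmp, "model.tflite")
    with open(model_path, "wb") as f:
        f.write(b"\x1c\x00\x00\x00" + TFLITE_FILE_IDENTIFIER + b"\x00" * 8)
    pointer_kind = classify_model_file(pointer_path)
    model_kind = classify_model_file(model_path)

print(pointer_kind, model_kind)
```

If the real whisper-tiny.en.tflite is reported as a pointer, fetching it through Git LFS should replace it with the full model file.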

Rdolink · 2023-01-26T01:32:08Z

Rdolink
Jan 26, 2023

It would be good to improve the automatic detection of the Spanish language in the mobile application.

1 reply

Immrspy Jun 20, 2023

The app uses the tiny.en language model, and therefore does not support languages other than English

antran89 · 2023-01-26T08:29:51Z

antran89
Jan 26, 2023

Is it possible to deploy a bigger Whisper model on the phone? I assume you're using the English-only tiny.en model; can the app be extended to the multilingual model?

2 replies

nyadla-sys Jan 26, 2023
Author

It is possible to deploy bigger Whisper models on the phone, and the app can also support the multilingual model.
However, I am busy with other work, and this is not a high priority for me at the moment.

Usama9999 Apr 3, 2023

hey! Can you please help me run this mobile application code.

appsoft124 · 2023-05-12T04:38:57Z

appsoft124
May 12, 2023

I developed Android APP based on tiny whisper.tflite (quantized ~40MB tflite model) Ran inference in ~2 seconds for 30 seconds audio clip on Pixel-7 mobile phone https://github.com/usefulsensors/openai-whisper/blob/main/android_app/release/app-release.apk Use the below command to push on to android phone adb install -r -t ~/android_app/release/app-release.apk

I am facing an issue on older Android versions; can you fix it, please?

0 replies

haydenkaizeta · 2023-06-12T03:04:30Z

haydenkaizeta
Jun 12, 2023

@nyadla-sys How do I get this working with timestamps enabled?

0 replies

hyunki85 · 2023-06-16T05:33:09Z

hyunki85
Jun 16, 2023

nice work

0 replies

lrq3000 · 2023-08-10T11:38:29Z

lrq3000
Aug 10, 2023

This is awesome! Thank you so much! Could you please upload it to F-Droid too, instead of only Google Play, so that your app can still be downloaded in the future (Google regularly changes the rules and kicks off older apps)?

0 replies

lrq3000 · 2023-08-10T11:40:48Z

lrq3000
Aug 10, 2023

Also could you maybe add a button to input an audio file instead of only mic recording? Then it would make this poc very much usable in combination with other FLOSS apps such as https://github.com/Dimowner/AudioRecorder

0 replies

nyadla-sys · 2023-08-30T22:17:22Z

nyadla-sys
Aug 30, 2023
Author

Refer to the GitHub repo below for an Android example that uses the whisper-tiny.en.tflite model:
https://github.com/nyadla-sys/whisper.tflite

0 replies

tensorbuffer · 2023-10-17T00:12:30Z

tensorbuffer
Oct 17, 2023

Hi @nyadla-sys, which TF version did you use? I tried to run the steps in the notebook you mentioned above with TF 2.14 (the latest from pip install) and got errors with the StridedSlice op:

W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2178] TFLite interpreter needs to link Flex delegate in order to run the model since it contains the following Select TFop(s):
Flex ops: FlexStridedSlice
Details:
tf.StridedSlice(tensor<?x?xf32>, tensor<4xi32>, tensor<4xi32>, tensor<4xi32>) -> (tensor<1x1x?x?xf32>) : {begin_mask = 12 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 12 : i64, new_axis_mask = 3 : i64, shrink_axis_mask = 0 : i64}
tf.StridedSlice(tensor<?x?xf32>, tensor<4xi32>, tensor<4xi32>, tensor<4xi32>) -> (tensor<?x1x1x?xf32>) : {begin_mask = 9 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 9 : i64, new_axis_mask = 6 : i64, shrink_axis_mask = 0 : i64}
See instructions: https://www.tensorflow.org/lite/guide/ops_select
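The masks in the two `tf.StridedSlice` entries above can be decoded by hand. The sketch below is my reading of them, assuming the standard `tf.strided_slice` semantics: bit i of `new_axis_mask` inserts a size-1 axis at position i, and bit i of `begin_mask` / `end_mask` means the full range is taken at position i. `mask_bits` and `describe` are hypothetical helper names.

```python
def mask_bits(mask, rank=4):
    """Return the 0-based positions whose bit is set in a StridedSlice mask."""
    return [i for i in range(rank) if mask & (1 << i)]


def describe(new_axis_mask, begin_mask, end_mask, rank=4):
    """Render the slice spec as Python-style indexing on a tensor x."""
    new_axes = set(mask_bits(new_axis_mask, rank))
    full = set(mask_bits(begin_mask, rank)) & set(mask_bits(end_mask, rank))
    parts = []
    for i in range(rank):
        if i in new_axes:
            parts.append("None")   # a new axis of size 1
        elif i in full:
            parts.append(":")      # begin and end both ignored: full range
        else:
            parts.append("?")      # explicit bounds, not visible in the log
    return "x[" + ", ".join(parts) + "]"


first = describe(new_axis_mask=3, begin_mask=12, end_mask=12)
second = describe(new_axis_mask=6, begin_mask=9, end_mask=9)
print(first)
print(second)
```

Under that reading the two ops correspond to `x[None, None, :, :]` and `x[:, None, None, :]`, which matches the logged output shapes `1x1x?x?` and `?x1x1x?`: both are broadcasting-style reshapes of a 2-D tensor.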

4 replies

nyadla-sys Oct 17, 2023
Author

Can you share your colab for this?

tensorbuffer Oct 17, 2023

I don't have a colab, but I used exactly what you referred to: https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/generate_tflite_from_whisper.ipynb
I used a new conda environment with only tensorflow 2.14 installed (plus a few packages needed to run the scripts, like soundfile and librosa). My code is exactly the same as the code in the link above.

nyadla-sys Oct 17, 2023
Author

Please make sure you have the code below:

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,  # enable TensorFlow ops.
]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

nyadla-sys Oct 17, 2023
Author

I just added the code below at the beginning of my colab, and the rest of the colab worked as expected:
!pip3 install tensorflow==2.14
import tensorflow as tf
print(tf.__version__)

Refer to the colab below:
https://colab.research.google.com/drive/1hF9HA9Q7te4mqQqbhfM7LWRZ_stTIKaJ?usp=sharing

tensorbuffer · 2023-10-17T04:00:25Z

tensorbuffer
Oct 17, 2023

Yes I have that line, which is already in your original colab. What's your output of the convert() function? My guess is that you have the same output as mine, but since I use benchmark_tool to evaluate this might be the difference. My convert() output is:

2023-10-16 20:54:34.116612: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2023-10-16 20:54:34.116650: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
2023-10-16 20:54:34.117319: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: tf_whisper_tiny.tfsaved
2023-10-16 20:54:34.163104: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2023-10-16 20:54:34.163136: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: tf_whisper_tiny.tfsaved
2023-10-16 20:54:34.273076: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
2023-10-16 20:54:34.297966: I tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2023-10-16 20:54:34.811451: I tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: tf_whisper_tiny.tfsaved
2023-10-16 20:54:35.102822: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 985506 microseconds.
2023-10-16 20:54:35.436697: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-10-16 20:54:41.041211: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2178] TFLite interpreter needs to link Flex delegate in order to run the model since it contains the following Select TFop(s):
Flex ops: FlexStridedSlice
Details:
tf.StridedSlice(tensor<?x?xf32>, tensor<4xi32>, tensor<4xi32>, tensor<4xi32>) -> (tensor<1x1x?x?xf32>) : {begin_mask = 12 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 12 : i64, new_axis_mask = 3 : i64, shrink_axis_mask = 0 : i64}
tf.StridedSlice(tensor<?x?xf32>, tensor<4xi32>, tensor<4xi32>, tensor<4xi32>) -> (tensor<?x1x1x?xf32>) : {begin_mask = 9 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 9 : i64, new_axis_mask = 6 : i64, shrink_axis_mask = 0 : i64}
See instructions: https://www.tensorflow.org/lite/guide/ops_select

BTW the code is:

import tensorflow as tf
from transformers import TFWhisperModel, WhisperFeatureExtractor
from datasets import load_dataset

model = TFWhisperModel.from_pretrained("openai/whisper-tiny")
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-tiny")
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = feature_extractor(
    ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="tf"
)
input_features = inputs.input_features
decoder_input_ids = tf.convert_to_tensor([[1, 1]]) * model.config.decoder_start_token_id
last_hidden_state = model(input_features, decoder_input_ids=decoder_input_ids).last_hidden_state
model.save('tf_whisper_tiny.tfsaved')

converter = tf.lite.TFLiteConverter.from_saved_model('tf_whisper_tiny.tfsaved')
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,  # enable TensorFlow ops.
]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

tf.__version__ shows 2.14.0

6 replies

tensorbuffer Oct 17, 2023

Thanks for looking into the code! I see you have two conversions:
Convert saved model to TFLite model
Create generation-enabled TF Lite model
I only tried the first one. The colab doesn't show the log of the first conversion, so I don't know whether you hit the same issue as I did.
Anyway, you use the tflite model from the second conversion for evaluation, right? I'm not clear what you mean by "Generation is much more complex that a model forward pass": does 'generation' here mean generating the tflite model? It looks like I need to continue with your steps and try the second conversion.

nyadla-sys Oct 17, 2023
Author

Try to follow exactly the steps in the colab and I am pretty sure it will work as expected.
I could simplify the colab code so that it only generates the tflite model, but I am a bit too lazy to rework it.

tensorbuffer Oct 17, 2023

Yes, I used the latter part of the colab and it works. The speed is kind of slow on the Android system, but we can try to improve it. Thanks!

nyadla-sys Oct 17, 2023
Author

For 30 s of audio on the latest Pixel 7 phone, inference takes around 700 ms.

nyadla-sys Oct 17, 2023
Author

It is more than 30x real-time speed
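The arithmetic behind that figure, as a quick sketch. It uses only the two latencies quoted in this thread (~2 s in the first post, ~700 ms here) for a 30 s clip; `real_time_factor` is a hypothetical helper name.

```python
def real_time_factor(audio_seconds, inference_seconds):
    """Seconds of audio processed per second of compute."""
    return audio_seconds / inference_seconds


# Latencies quoted in this thread for a 30 s clip on a Pixel 7.
for label, latency in [("~2 s (first post)", 2.0), ("~700 ms (this reply)", 0.7)]:
    print(f"{label}: {real_time_factor(30.0, latency):.1f}x real time")
```

So ~700 ms works out to roughly 43x real time, and the earlier ~2 s figure to 15x.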

tensorbuffer · 2023-10-31T19:00:28Z

tensorbuffer
Oct 31, 2023

Hi, I tried to run the quantized model (https://github.com/usefulsensors/openai-whisper/blob/main/models/whisper-int8.tflite), but it has Flex ops (like FlexConv); I guess that might be because it was converted from ONNX. I tried to quantize directly in TF and got a segmentation fault. I think it may be related to the calibration dataset? Not sure if you have tried this. The code is here (mostly taken from your notebook):
import tensorflow as tf
from transformers import WhisperProcessor, TFWhisperForConditionalGeneration
from datasets import load_dataset

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = TFWhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

class GenerateModel(tf.Module):
    def __init__(self, model):
        super(GenerateModel, self).__init__()
        self.model = model

    @tf.function(
        # shouldn't need static batch size, but throws exception without it (needs to be fixed)
        input_signature=[
            tf.TensorSpec((1, 80, 3000), tf.float32, name="input_features"),
        ],
    )
    def serving(self, input_features):
        outputs = self.model.generate(
            input_features,
            max_new_tokens=223,  # change as needed
            return_dict_in_generate=True,
        )
        return {"sequences": outputs["sequences"]}

def representative_dataset():
    for x in range(20):
        inputs = processor(ds[x]["audio"]["array"], sampling_rate=16000, return_tensors="tf")
        input_features = inputs.input_features
        yield [input_features]

saved_model_dir = 'tf_whisper_saved'
generate_model = GenerateModel(model=model)
tf.saved_model.save(generate_model, saved_model_dir, signatures={"serving_default": generate_model.serving})

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS,  # enable TensorFlow ops.
]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_model = converter.convert()

# Save the model
tflite_model_path = 'whisper.tiny.tflite'
with open(tflite_model_path, 'wb') as f:
    f.write(tflite_model)

0 replies

nyadla-sys · 2023-11-01T13:51:48Z

nyadla-sys
Nov 1, 2023
Author

The Int8 Whisper TFLite model isn't behaving as anticipated because some of its layers, such as "mul", struggle to handle Int8 input. It may be worth converting to Int16 activations with Int8 weights instead.
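A sketch of what the Int16-activation / Int8-weight suggestion could look like with the TFLite converter's experimental 16x8 op set. This is untested on Whisper and is not the author's code: `configure_int16x8` is a hypothetical helper, and it assumes a converter created with `tf.lite.TFLiteConverter.from_saved_model(...)` and a representative dataset like the ones shown earlier in the thread.

```python
def configure_int16x8(converter, representative_dataset):
    """Configure a tf.lite.TFLiteConverter for int16 activations, int8 weights.

    Assumption: `converter` came from tf.lite.TFLiteConverter.from_saved_model,
    and `representative_dataset` is a generator function yielding input lists.
    """
    import tensorflow as tf  # imported lazily so the helper can be defined without TF

    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [
        # 16-bit activations with 8-bit weights (experimental op set).
        tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8,
        # Ops that lack a 16x8 kernel are left in float.
        tf.lite.OpsSet.TFLITE_BUILTINS,
    ]
    return converter
```

Whether the generation-enabled Whisper graph converts and runs in this mode is an open question here; it may still need `tf.lite.OpsSet.SELECT_TF_OPS` for the Flex ops discussed above.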

2 replies

tensorbuffer Nov 1, 2023

At this stage we are just trying to see whether it can be quantized to run on our accelerator; we are not too concerned with accuracy yet.
I noticed that you have multiple tiny tflite models: whisper-tiny.tflite, whisper-encoder-tiny.tflite, and whisper-decoder-tiny.tflite. Is the first one a combination of the other two? The first one has GELU, which is not in the other two, and the other two have a Flex op that is not in the first one. I guess that's due to the conversion process (e.g. whether ONNX was used as an intermediate)?

nyadla-sys Nov 2, 2023
Author

I have tried many options for generating a fully quantized tflite model, such as converting PyTorch to ONNX to TF to TFLite, and using post-training quantization from the Hugging Face Whisper TF model to TFLite. However, I couldn't successfully run any of these fully quantized models on any device.

View-my-Git-Lab-krafi · 2024-06-13T15:13:40Z

View-my-Git-Lab-krafi
Jun 13, 2024

I don't know whether anyone wants to use Whisper from a browser (on Android, iOS, or PC) backed by a server, but here it is: https://gitlab.com/krafi/whisperweb

0 replies

zzy981019 · 2024-08-23T03:24:40Z

zzy981019
Aug 23, 2024

Hi, does anyone know how to rename the input and output nodes of a Whisper tflite model, as shown in Netron?
Thanks!

0 replies

DarioPTWR · 2024-09-01T05:54:43Z

DarioPTWR
Sep 1, 2024

Hi! Love the work done on Whisper in TFLite. Just wondering if it's possible to run the model on audio files longer than 30s? Or will it have to be done separately?

Thanks.

1 reply

nyadla-sys Sep 6, 2024
Author

@DarioPTWR it is possible: you just need to split the audio file into 30 s chunks and feed each chunk to the model.
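A minimal sketch of that chunking step, assuming 16 kHz mono samples already loaded into memory. `split_into_chunks` is a hypothetical helper, and the last chunk is zero-padded so that every chunk matches the model's fixed 30 s window.

```python
SAMPLE_RATE = 16_000                  # assumption: 16 kHz mono input
CHUNK_SAMPLES = 30 * SAMPLE_RATE      # one 30 s model window


def split_into_chunks(samples, chunk=CHUNK_SAMPLES):
    """Yield consecutive fixed-size chunks; the last one is zero-padded."""
    samples = list(samples)
    for start in range(0, len(samples), chunk):
        piece = samples[start:start + chunk]
        yield piece + [0.0] * (chunk - len(piece))


# Hypothetical 75-second recording: two full chunks plus one half-filled chunk.
recording = [0.1] * (75 * SAMPLE_RATE)
chunks = list(split_into_chunks(recording))
print(len(chunks), [len(c) for c in chunks])
```

Each chunk would be transcribed separately and the texts concatenated. A fixed split can cut a word in half at a boundary, so overlapping windows or splitting on silence may give better results; neither is described in this thread.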

mattewpsys · 2024-11-22T12:10:34Z

mattewpsys
Nov 22, 2024

Hi @nyadla-sys, can we use the GPU delegate for the int8 model?

0 replies

Replies: 22 comments · 26 replies
