
Select indirect BGEMM kernels - Benchmarking grouped binary convolutions #711

Closed
simonmaurer opened this issue Feb 9, 2022 · 3 comments

@simonmaurer
Contributor

simonmaurer commented Feb 9, 2022

Given the commits #549, #550, and #551, LCE supports grouped binary convolutions. This is great work, as standard TFLite still does not support the groups argument for inference: tensorflow/tensorflow#40044
I've successfully created models with appropriate channel dimensions, in which the grouped binary convolutions are correctly identified by the LCE Converter.

How can I benchmark this with the lce_benchmark_model binary? In other words, how can we select the indirect_bgemm kernels, given that the regular bgemm kernels don't support grouped convolutions?

Additionally, there is a use_reference_bconv flag in the LCE Interpreter, but I do not know what it actually does. My assumption was that if it is set to True, the binary bgemm kernels from https://github.com/larq/compute-engine/tree/main/larq_compute_engine/core/bgemm are selected, and otherwise the indirect_bgemm kernels from https://github.com/larq/compute-engine/tree/main/larq_compute_engine/core/indirect_bgemm.

Update: that assumption is not correct, as use_reference_bconv is False by default, so the flag must mean something else.

@Tombana
Collaborator

Tombana commented Feb 9, 2022

We currently don't have a CLI flag in lce_benchmark_model to choose between these. For internal benchmarks we simply replaced the registration on the following line:

resolver->AddCustom("LceBconv2d",
                    compute_engine::tflite::Register_BCONV_2D());

with Register_BCONV_2D_OPT_INDIRECT_BGEMM.
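
That is, the registration becomes the following (assuming the indirect-BGEMM variant is registered with the same call shape as above):

resolver->AddCustom(
    "LceBconv2d",
    compute_engine::tflite::Register_BCONV_2D_OPT_INDIRECT_BGEMM());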

I'd welcome a PR to make this into a command-line flag; my suggestion would be:

  • Add a bool use_indirect_bgemm (default false) argument to RegisterLCECustomOps, with another if-branch next to use_reference_bconv in lce_ops_register.h (see the sketch after this list).

  • To add it as a command-line flag, I'd say the simplest approach (without modifying the TFLite BenchmarkTfLiteModel code) is to parse the command-line flags in lce_benchmark_main.cc and store the result as a global bool in that file, which can then be passed to RegisterLCECustomOps on line 26.
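
A minimal sketch of what that extra if-branch could look like in lce_ops_register.h. The exact signature of RegisterLCECustomOps and the name Register_BCONV_2D_REF for the reference kernel are assumptions; only the identifiers mentioned in this thread are taken from the codebase:

// Sketch only: the signature and Register_BCONV_2D_REF are assumptions;
// Register_BCONV_2D and Register_BCONV_2D_OPT_INDIRECT_BGEMM are the
// registration functions discussed above.
inline void RegisterLCECustomOps(
    ::tflite::ops::builtin::BuiltinOpResolver* resolver,
    const bool use_reference_bconv = false,
    const bool use_indirect_bgemm = false) {  // new flag, default false
  if (use_reference_bconv) {
    // Reference kernel: supports zero-padding, one-padding, and groups.
    resolver->AddCustom("LceBconv2d",
                        compute_engine::tflite::Register_BCONV_2D_REF());
  } else if (use_indirect_bgemm) {
    // New branch: select the indirect BGEMM kernels.
    resolver->AddCustom(
        "LceBconv2d",
        compute_engine::tflite::Register_BCONV_2D_OPT_INDIRECT_BGEMM());
  } else {
    // Default: the optimized BGEMM kernels.
    resolver->AddCustom("LceBconv2d",
                        compute_engine::tflite::Register_BCONV_2D());
  }
}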

Note that use_reference_bconv uses core/bconv2d/reference.h, which supports 'everything', including zero-padding, one-padding, and groups. The optimized implementations, however, don't support all of those.

@simonmaurer
Contributor Author

@Tombana thanks a lot for pointing me in the right direction.
I can do a PR that includes filtering of the arguments, so we can parse the flag (as you suggested) and remove it from argv before passing it to BenchmarkTfLiteModel, since I assume (still need to verify though) that it would otherwise throw an unrecognized-argument error.
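
Roughly what I have in mind for the filtering, as a sketch only: the flag spelling --use_indirect_bgemm and the RunLceBenchmark entry point are placeholders, not the actual lce_benchmark_main.cc code:

#include <cstring>
#include <vector>

// Global that RegisterLCECustomOps can read when registering the ops.
static bool use_indirect_bgemm = false;

// Hypothetical runner standing in for the existing TFLite benchmark entry.
int RunLceBenchmark(int argc, char** argv);

int main(int argc, char** argv) {
  std::vector<char*> filtered;
  for (int i = 0; i < argc; ++i) {
    if (std::strcmp(argv[i], "--use_indirect_bgemm=true") == 0) {
      use_indirect_bgemm = true;  // consume our own flag here
    } else {
      filtered.push_back(argv[i]);  // forward everything else unchanged
    }
  }
  // BenchmarkTfLiteModel never sees the unknown flag, so it won't error.
  return RunLceBenchmark(static_cast<int>(filtered.size()), filtered.data());
}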

@simonmaurer
Contributor Author

Closing this issue, as it has been solved by #717.
