
new implementation for positional encoding #103

Closed
keighrim opened this issue Jun 17, 2024 · 1 comment · Fixed by #109
Labels
✨N New feature or request

Comments

@keighrim
Member

New Feature Summary

At the moment we have three positional encoding schemes implemented in the trainer code, but the way they are implemented for training, as well as shipped in the app package (as checkpoint files), is very confusing for both devs and users. I'd like to address this problem by re-implementing positional encoding in a simpler, more robust way.

Current status (#26) and problems

  1. we have three positional encoding schemes (fractional, sinusoidal-concat, sinusoidal-add), but each behaves differently, which adds
    1. confusion for model developers in terms of the different dimensions of the vector representation:
      • fractional adds 1 extra dimension to the backbone CNN vectors
      • sinusoidal-concat adds an arbitrary number of dimensions (some power of 2, usually 256 or 512) to the backbone CNN vectors
      • sinusoidal-add adds no extra dimensions
    2. confusion between the concepts of "absolute" and "relative" encoding. Sinusoidal encodings are inspired by the transformer paper, where the original notion of "position" was token position in natural language sentences. That made sense there, since the main purpose of positional encoding in the transformer architecture is to provide the relative difference "between tokens" and to proxy a sequential representation. However, the current CNN-based classification SWT model is not sequential at all, so the absoluteness of the sinusoidal encoding does not make much sense here.
    3. difficulties in grid search and result probing, simply from adding too much variation to the equation.
    4. an unnecessary limitation on the input data format (fix hardcoded 94 min limit on positional encoding #100)
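To make the dimensionality differences concrete, here is a minimal NumPy sketch of the three schemes. The function names, the 10000 sinusoid base, and the example dims are illustrative only, not the trainer's actual code; even feature dims are assumed.

```python
import numpy as np

def sinusoidal(idx, dim):
    # classic transformer-style sinusoid for a single position (dim assumed even)
    i = np.arange(dim // 2)
    angles = idx / np.power(10000.0, 2 * i / dim)
    enc = np.empty(dim)
    enc[0::2] = np.sin(angles)
    enc[1::2] = np.cos(angles)
    return enc

def fractional_enc(feats, idx, total):
    # appends ONE extra dim: the normalized position in [0, 1]
    pos = np.full((feats.shape[0], 1), idx / total)
    return np.concatenate([feats, pos], axis=1)

def sinusoidal_concat(feats, idx, enc_dim=256):
    # widens vectors by enc_dim (e.g., 256 or 512)
    enc = np.tile(sinusoidal(idx, enc_dim), (feats.shape[0], 1))
    return np.concatenate([feats, enc], axis=1)

def sinusoidal_add(feats, idx):
    # adds the encoding element-wise: output keeps the backbone dimensionality
    return feats + sinusoidal(idx, feats.shape[1])
```

With 512-dim backbone vectors, the three schemes produce 513-, 768- (with `enc_dim=256`), and 512-dim outputs respectively, which is exactly the inconsistency described in point 1 above.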

Proposal

I'm proposing that

  1. we always use "sinusoidal-add": this does not change the dimension of the feature vectors, so we keep the CNN-backbone dims regardless of whether positional encoding is used
  2. we always use "relative" position: the absolute difference between time 00:00:07 and 00:00:08 is not relevant information for the current isolated image-based classifier
  3. we expose an "on-off" switch for positional encoding as a runtime parameter: this implies that we always train two models for a given set of hyperparams - one with sinusoidal-add and one without - and delegate the evaluation to a separate evaluation module, possibly as part of evaluating the new SWT app aapb-evaluations#43
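A minimal sketch of what proposals 1 and 2 could look like together, assuming "relative" position means the frame's position as a fraction of the video length mapped onto a fixed number of buckets. The `max_pos` bucket count and the function name are hypothetical, not from the actual trainer:

```python
import numpy as np

def relative_sinusoidal_add(feats, frame_idx, total_frames, max_pos=1000):
    # map the absolute frame index to a relative position in [0, max_pos)
    rel = int((frame_idx / total_frames) * max_pos)
    dim = feats.shape[-1]  # dim assumed even
    i = np.arange(dim // 2)
    angles = rel / np.power(10000.0, 2 * i / dim)
    enc = np.empty(dim)
    enc[0::2] = np.sin(angles)
    enc[1::2] = np.cos(angles)
    # additive, so the backbone dimensionality is unchanged
    return feats + enc
```

Under this scheme, the frame halfway through a 2-minute video and the frame halfway through a 2-hour video get the same encoding, which is the point of dropping absolute position.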

Related

No response

Alternatives

No response

Additional context

No response

@keighrim keighrim added the ✨N New feature or request label Jun 17, 2024
@keighrim
Member Author

keighrim commented Jul 5, 2024

As we can approximate pos_enc_name="none" by using pos_enc_coeff=0, we decided to completely drop the pos_enc_name config key.
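This works if the coefficient simply scales the additive encoding before it is applied, as in this sketch (the helper name is hypothetical):

```python
import numpy as np

def apply_pos_enc(feats, enc, pos_enc_coeff=1.0):
    # pos_enc_coeff=0 zeroes out the encoding entirely,
    # so the result equals the raw backbone features ("none")
    return feats + pos_enc_coeff * enc
```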
