
How does the recovery of locality work? #1

Open
liyucheng09 opened this issue Oct 25, 2024 · 3 comments

Comments

@liyucheng09

Congrats on the great paper!

[image: figure from the paper]

I have a simple question about the recovery of locality.

By adding W, it literally shifts the bottom-left corner back to the right, which is just undoing the first step of shifting leftwards. Am I misunderstanding this operation, or are there more tricks or findings here?

@ChenxinAn-fdu
Contributor

ChenxinAn-fdu commented Oct 26, 2024

Hi! Thank you for the great question.

I think you can understand this operation that way. Let me explain using the shifted position matrix of Llama 3.1 128K as an example:
[image: shifted position matrix of Llama 3.1 128K]

If we do not set local_value = 128, the 42K-th slash line (diagonal) of the matrix will be set to 0 instead of 128. However, as we all know, LLMs rely heavily on their neighboring N tokens to maintain fluent generation. By setting local_value = 128, every token is guaranteed that its 128 neighboring tokens have the closest distances.
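Here is a minimal toy sketch of how I think about this step (illustrative numbers only, not the exact Llama 3.1 128K settings, and the variable names are hypothetical; the actual code in this repo may differ):

```python
import numpy as np

def shifted_positions(seq_len: int, shift: int, local_value: int) -> np.ndarray:
    """Toy STRING-style relative-position matrix (illustrative only)."""
    q = np.arange(seq_len)[:, None]   # query indices i
    k = np.arange(seq_len)[None, :]   # key indices j
    rel = q - k                       # standard relative distances i - j
    # Distances >= shift are moved leftwards by `shift`, then pushed back
    # to the right by `local_value` so the nearest tokens keep the
    # smallest (best-trained) positions for themselves.
    shifted = np.where(rel >= shift, rel - shift + local_value, rel)
    return np.tril(shifted)           # causal: only i >= j is used

# Toy setting: sequence length 16, shift 8, local window 2
mat = shifted_positions(seq_len=16, shift=8, local_value=2)
print(mat[-1])  # [9 8 7 6 5 4 3 2 7 6 5 4 3 2 1 0]
```

With local_value = 0 the shifted far tokens would collide with the nearest tokens at distances 0 and 1; with local_value = 2 only the two nearest tokens get the two smallest distances, which is the locality being recovered.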

If my answer does not fully address your question, please feel free to ask for further clarification.

@liyucheng09
Author

Thanks for the response! I assume there will be some overlap in the position ids, i.e., multiple tokens will be associated with position 128? Can the model work in this setting without further training?

@ChenxinAn-fdu
Contributor

ChenxinAn-fdu commented Oct 27, 2024

In fact, positions 128-42K are used twice in STRING. I think there may be some negative effects from this duplication. However, based on the experimental results we obtained, the side effects of the repetition appear to be significantly outweighed by the performance improvements gained from using well-trained positional embeddings.
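As a quick illustration of that duplication with the same toy numbers as above (again hypothetical, not the real Llama 3.1 128K configuration):

```python
from collections import Counter

def last_row_positions(seq_len: int, shift: int, local_value: int) -> list[int]:
    """Position ids the last query token sees under the toy shifted scheme."""
    return [rel - shift + local_value if rel >= shift else rel
            for rel in range(seq_len - 1, -1, -1)]

row = last_row_positions(seq_len=16, shift=8, local_value=2)
dupes = {pos for pos, n in Counter(row).items() if n > 1}
print(row)            # [9, 8, 7, 6, 5, 4, 3, 2, 7, 6, 5, 4, 3, 2, 1, 0]
print(sorted(dupes))  # [2, 3, 4, 5, 6, 7]  <- the band used twice
```

In the real 128K setup this duplicated band is the 128-42K range mentioned above.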
