Difference between CLS hidden state and pooled_output #28

intrandom5 · 2022-10-30T14:15:25Z

intrandom5
Oct 30, 2022
Maintainer

This is about the pooler output question that Jaehwan asked earlier.
It also follows up on why the model I put together myself last time did not perform well.

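The pooler output comes from a "trained" linear layer. That is why it performs better than the pooling I applied on my own (its weights are already trained).
The pooler output is the result of applying a linear layer and a tanh activation to the output hidden state of the CLS token.
The pooler is reportedly trained through NSP (next sentence prediction). Apparently whoever implemented the transformers code based it on the original BERT code written in TensorFlow.
It is also said that applying mean pooling over all output hidden states works better than using the CLS token as the BERT output, but in practice using the CLS token output for classification reportedly performs well too.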
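To make the three options above concrete, here is a minimal PyTorch sketch. It is not the transformers source: a random tensor stands in for `last_hidden_state`, the sizes are toy values, and `pooler_dense` is a freshly initialized layer, whereas in a pretrained checkpoint those weights are already trained.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy sizes for illustration only (assumption, not BERT's real dimensions).
batch_size, seq_len, hidden_size = 2, 5, 8

# Stand-in for the encoder's last-layer hidden states (last_hidden_state).
last_hidden_state = torch.randn(batch_size, seq_len, hidden_size)
# 1 = real token, 0 = padding; the second sequence has two padded positions.
attention_mask = torch.tensor([[1, 1, 1, 1, 1],
                               [1, 1, 1, 0, 0]])

# (1) Raw CLS hidden state: the first position of the last layer.
cls_hidden = last_hidden_state[:, 0]

# (2) Pooler output: linear layer + tanh applied to the CLS hidden state.
# Randomly initialized here; a pretrained model ships trained weights.
pooler_dense = nn.Linear(hidden_size, hidden_size)
pooled_output = torch.tanh(pooler_dense(cls_hidden))

# (3) Mean pooling over all non-padding token hidden states.
mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
mean_pooled = (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

print(cls_hidden.shape, pooled_output.shape, mean_pooled.shape)
```

All three give one vector of size `hidden_size` per sequence. Only (2) adds extra parameters on top of the encoder, which is why a hand-made, untrained pooling layer starts at a disadvantage.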

BertForSequenceClassification is a model built by adding a linear layer, sized to the number of classes, on top of this pooler output.
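A rough sketch of that kind of head, assuming a pooled vector is already available. `PooledClassifierHead` is a hypothetical name used for illustration, not a transformers class, and the dropout before the linear layer is included here as an assumption about the usual setup.

```python
import torch
from torch import nn


class PooledClassifierHead(nn.Module):
    """Hypothetical head: dropout, then a linear layer mapping to num_labels."""

    def __init__(self, hidden_size: int, num_labels: int, dropout_prob: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout_prob)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, pooled_output: torch.Tensor) -> torch.Tensor:
        # pooled_output: (batch_size, hidden_size) -> logits: (batch_size, num_labels)
        return self.classifier(self.dropout(pooled_output))


torch.manual_seed(0)
head = PooledClassifierHead(hidden_size=8, num_labels=3)
head.eval()  # disable dropout for a deterministic forward pass

# Stand-in for a pooler output; tanh keeps values in [-1, 1] like the real one.
pooled_output = torch.tanh(torch.randn(2, 8))
logits = head(pooled_output)
print(logits.shape)
```

Swapping in the raw CLS hidden state or a mean-pooled vector only changes what is passed to `forward`; the head itself stays the same.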

intrandom5 · 2022-11-04T01:32:55Z

intrandom5
Nov 4, 2022
Maintainer Author

Mentor, which of these do you actually use most in practice...? Or does it depend on the situation?

