Adding "SignBLEU: Automatic Evaluation of Multi-channel Sign Language Translation" and some eval metrics discussion #77

Merged

23 commits

bbedee1
CDL: reference for kim-etal-2024-signbleu-automatic
cleong110 Jun 12, 2024
fa8a9b8
CDL: reference for NIASL2021 Dataset
cleong110 Jun 12, 2024
d627cd9
CDL: rebiber on citations
cleong110 Jun 12, 2024
943c223
CDL: adding linearization reference for SignBLEU
cleong110 Jun 12, 2024
6849121
CDL: rough draft of SignBLEU
cleong110 Jun 13, 2024
fcbadc5
CDL: trying to cite NCSLGR properly
cleong110 Jun 14, 2024
b615317
CDL: updating {} in references.md
cleong110 Jun 14, 2024
3903d9a
CDL: first draft of SignBLEU
cleong110 Jun 14, 2024
71ca972
CDL: draft 2, edited and cleaned up.
cleong110 Jun 14, 2024
f760469
CDL: rewriting SignBLEU: removing MCSLT terminology, various style ch…
cleong110 Jun 17, 2024
c4e778c
Adding a quick sentence about Ham2Pose to evaluation metrics
cleong110 Jun 17, 2024
f4ad5d7
Merge branch 'master' into paper/kim-etal-2024-signbleu-automatic
cleong110 Jun 17, 2024
c3546d5
CDL: various suggested changes for eval metrics. Rearrangement, etc
cleong110 Jun 18, 2024
d88d7b9
CDL: adding updated description for ham2pose DTW-MJE
cleong110 Jun 18, 2024
0697405
Language2Pose citation
cleong110 Jun 19, 2024
4b720de
CDL: adding in more material about 2Sign metrics
cleong110 Jun 19, 2024
12488f6
CDL: slight rewrite of pose output metrics section
cleong110 Jun 19, 2024
b2d65b8
Merge branch 'master' into paper/kim-etal-2024-signbleu-automatic
cleong110 Jun 19, 2024
eff22e7
CDL: some more APE citations
cleong110 Jun 19, 2024
6c63e47
Merge branch 'master' into paper/kim-etal-2024-signbleu-automatic
cleong110 Jun 19, 2024
50e3daa
CDL: making some requested changes for SignBLEU
cleong110 Jun 20, 2024
35f746d
Merge branch 'master' into paper/kim-etal-2024-signbleu-automatic
cleong110 Jun 20, 2024
bf2059e
CDL: adding a note to the README about 11k vs 11,000
cleong110 Jun 20, 2024

README.md: 2 changes (1 addition, 1 deletion)

@@ -49,7 +49,7 @@ For attribution in academic contexts, please cite this work as:
- **Hyphenation**: Use hyphens (-) for compound adjectives (e.g., video-to-pose).
- **Lists**: Use "-" for list items, followed by a space.
- **Code**: Use backticks (`) for inline code, and triple backticks (```) for code blocks.
- **Numbers**: Spell out numbers less than 10, and use numerals for 10 and greater.
- **Numbers**: Spell out numbers less than 10, and use numerals for 10 and greater. For large numbers, separate with commas and do not abbreviate with suffixes such as "k" (11,000, not 11k).
- **Contractions**: Avoid contractions (e.g., use "do not" instead of "don't").
- **Compound Words**: Use a forward slash (/) to separate alternative compound words (e.g., 2D / 3D).
- **Phrasing**: Prefer active voice over passive voice (e.g., "The authors used..." instead of "The work was used by the authors...").

src/index.md: 50 changes (49 additions, 1 deletion)

@@ -837,6 +837,7 @@ They apply several low-resource machine translation techniques used to improve s
Their findings validate the use of an intermediate text representation for signed language translation, and pave the way for including sign language translation in natural language processing research.

#### Text-to-Notation

@jiang2022machine also explore the reverse translation direction, i.e., text to SignWriting translation.
They conduct experiments under the same conditions as their multilingual SignWriting-to-text (four language pairs) experiment, and again propose a neural factored machine translation approach to decode the graphemes and their positions separately.
They borrow BLEU from spoken language translation to evaluate the predicted graphemes and mean absolute error to evaluate the positional numbers.
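
As a toy illustration of this two-part evaluation (the grapheme strings and coordinates below are invented, not taken from the paper):

```python
# A toy sketch of the two-part evaluation described above: BLEU over the
# predicted grapheme sequences (via sacrebleu) and mean absolute error (MAE)
# over the positional numbers. All strings and coordinates are invented.
import numpy as np
import sacrebleu

pred_graphemes = ["S14c20 S27106 S20500 S2ff00"]   # hypothesis grapheme sequence
ref_graphemes = [["S14c20 S27102 S20500 S2ff00"]]  # a list of reference streams
bleu = sacrebleu.corpus_bleu(pred_graphemes, ref_graphemes)

pred_positions = np.array([[482, 483], [500, 489], [476, 520], [510, 496]])
ref_positions = np.array([[480, 485], [498, 490], [475, 521], [512, 495]])
mae = np.abs(pred_positions - ref_positions).mean()

print(f"grapheme BLEU = {bleu.score:.1f}, positional MAE = {mae:.2f}")
```
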
@@ -846,14 +847,61 @@ They borrow BLEU from spoken language translation to evaluate the predicted grap
---

#### Notation-to-Pose

@shalev2022ham2pose proposed Ham2Pose, a model to animate HamNoSys into a sequence of poses.
They first encode the HamNoSys into a meaningful "context" representation using a transformer encoder,
and use it to predict the length of the pose sequence to be generated.
Then, starting from a still frame, they use an iterative non-autoregressive decoder to gradually refine the sign over $T$ steps.
In each time step $t$ from $T$ to $1$, the model predicts the required change from step $t$ to step $t-1$. After $T$ steps, the pose generator outputs the final pose sequence.
Their model outperformed previous methods like @saunders2020progressive, animating HamNoSys into more realistic sign language sequences.
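
A minimal sketch of that refinement loop, with hypothetical module names (`encoder`, `length_predictor`, `step_predictor`) standing in for the actual Ham2Pose components:

```python
# A minimal sketch of the iterative refinement loop described above.
# `encoder`, `length_predictor`, and `step_predictor` are hypothetical
# stand-ins; the actual Ham2Pose architecture differs in its details.
import torch

NUM_KEYPOINTS = 137  # illustrative keypoint count

def generate_poses(hamnosys_tokens, encoder, length_predictor, step_predictor, T=10):
    context = encoder(hamnosys_tokens)            # "context" representation
    length = int(length_predictor(context))       # predicted sequence length
    pose = torch.zeros(length, NUM_KEYPOINTS, 2)  # start from a still frame
    for t in range(T, 0, -1):
        # predict the required change from step t to step t-1 and apply it
        pose = pose + step_predictor(pose, context, t)
    return pose
```
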

#### Evaluation Metrics

Methods for the automatic evaluation of sign language processing typically depend only on the system output, independent of the input.

##### Text output

For tasks that output spoken language text, standard machine translation metrics such as BLEU, chrF, or COMET are commonly used.
<!-- TODO: examples -->
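
For example, BLEU and chrF can be computed with sacreBLEU [@post-2018-call-sacrebleu]; a minimal sketch with invented sentences:

```python
# A minimal sketch of scoring spoken-language text output with sacreBLEU;
# the sentences are invented. Assumes `pip install sacrebleu`.
import sacrebleu

hypotheses = ["the weather is nice today"]    # system outputs
references = [["the weather today is nice"]]  # a list of reference streams

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}, chrF = {chrf.score:.1f}")
```
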

##### Gloss Output

Gloss outputs can be scored automatically as well, though not without issues.
In particular, @muller-etal-2023-considerations analyse these issues and provide a series of recommendations (see the section on "Glosses", above).

##### Pose Output

For translation from spoken languages to signed languages, automatic evaluation metrics are an open line of research, though some metrics involving back-translation have been developed (see Text-to-Pose and Notation-to-Pose, above).
<!-- TODO: "Progressive Transformers for End-to-End Sign Language Production" is the one cited in Towards Fast and High-Quality Sign Language Production as a "widely-used setting" for backtranslation. -->
<!-- TODO: Towards Fast and High-Quality Sign Language Production uses back-translation. Discuss results and issues. -->

<!-- These three papers are cited in @shalev2022ham2pose as previous work using APE -->
Naively, works in this domain have used metrics such as Mean Squared Error (MSE) or Average Position Error (APE) for pose outputs [@ahuja2019Language2PoseNaturalLanguage; @ghosh2021SynthesisCompositionalAnimations; @petrovich2022TEMOSGeneratingDiverse].
However, these metrics have significant limitations for sign language production.

For example, MSE and APE do not account for variations in sequence length.
In practice, the same sign will not always take exactly the same amount of time to produce, even by the same signer.
To address time variation, @huang2021towards introduced DTW-MJE (Dynamic Time Warping - Mean Joint Error), a metric that measures the distance between generated and reference pose sequences at the joint level using dynamic time warping.
However, this metric did not clearly address how to handle missing keypoints.
@shalev2022ham2pose experimented with multiple evaluation methods, and proposed adding a distance function that accounts for these missing keypoints.
They applied this function with normalization of keypoints, naming their metric nDTW-MJE.
<!-- They don't explicitly explain that the lowercase n is for "normalized keypoints" but that's my guess. -Colin -->
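
A sketch of a DTW-MJE-style computation, assuming poses as `(frames, joints, dims)` NumPy arrays; the exact cost function and normalization conventions in @huang2021towards and @shalev2022ham2pose may differ:

```python
# A sketch of a DTW-MJE-style distance between two pose sequences: the
# per-frame cost is the mean Euclidean joint error, and frames are aligned
# with dynamic time warping. The final normalization is one convention.
import numpy as np

def dtw_mje(hyp: np.ndarray, ref: np.ndarray) -> float:
    """hyp: (T, J, D) pose sequence; ref: (S, J, D) reference sequence."""
    T, S = len(hyp), len(ref)
    cost = np.full((T + 1, S + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, S + 1):
            # mean joint error for this frame pair
            d = np.linalg.norm(hyp[i - 1] - ref[j - 1], axis=-1).mean()
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[T, S] / max(T, S)  # normalize by the longer sequence length

hyp = np.random.rand(30, 137, 2)  # e.g., 30 frames of 137 2D keypoints
ref = np.random.rand(25, 137, 2)
print(dtw_mje(hyp, ref))
```
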

##### Multi-Channel Block output

As an alternative to gloss sequences, @kim-etal-2024-signbleu-automatic proposed a multi-channel output representation for sign languages and introduced SignBLEU, a BLEU-like scoring method for these outputs.
Instead of a single linear sequence of glosses, the representation segments sign language output into multiple linear channels, each containing discrete "blocks".
These blocks represent both manual and non-manual signals, for example, one for each hand and others for various non-manual signals like eyebrow movements.
The blocks are then converted to n-grams: temporal grams capture sequences within a channel, and channel grams capture co-occurrences across channels.
The SignBLEU score is then calculated for these n-grams of varying orders.
They evaluated SignBLEU on the DGS Corpus v3.0 [@dataset:Konrad_2020_dgscorpus_3; @dataset:prillwitz2008dgs], NIASL2021 [@dataset:huerta-enochian-etal-2022-kosign], and NCSLGR [@dataset:Neidle_2020_NCSLGR_ISLRN; @Vogler2012ASLLRP_data_access_interface] datasets, comparing it with single-channel (gloss) metrics such as BLEU, TER, chrF, and METEOR, as well as human evaluations by native signers.
The authors found that SignBLEU consistently correlated better with human evaluation than these alternatives.
However, one limitation of this approach is the lack of suitable datasets.
The authors reviewed a number of sign language corpora, noting the relative scarcity of multi-channel annotations.
The [source code for SignBLEU](https://github.com/eq4all-projects/SignBLEU) is available.
As with SacreBLEU [@post-2018-call-sacrebleu], the code can generate "version signature" strings summarizing key parameters, to enhance reproducibility.

<!-- (and SignBLEU can be installed and run! https://colab.research.google.com/drive/1mRCSBQSvjkoSOz5MFiOko1CgtamuCVYO?usp=sharing) -->
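
To illustrate the two gram types, here is a conceptual sketch (not the official SignBLEU implementation; it simplifies block timing to aligned positions, which SignBLEU handles via block timestamps):

```python
# A conceptual sketch of temporal grams (within a channel) and channel
# grams (across channels). The block format is a simplifying assumption.
from collections import Counter

def temporal_grams(channels: dict[str, list[str]], n: int) -> Counter:
    """n-grams of consecutive blocks within each channel."""
    grams = Counter()
    for name, blocks in channels.items():
        for i in range(len(blocks) - n + 1):
            grams[(name, tuple(blocks[i:i + n]))] += 1
    return grams

def channel_grams(channels: dict[str, list[str]]) -> Counter:
    """Block pairs that co-occur across two channels at the same position."""
    grams = Counter()
    names = sorted(channels)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for x, y in zip(channels[a], channels[b]):
                grams[(x, y)] += 1
    return grams

hyp = {"right_hand": ["HOUSE", "BUY"], "brows": ["raised", "neutral"]}
print(temporal_grams(hyp, 2))
print(channel_grams(hyp))
```
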

```{=ignore}
#### Pose-to-Notation

src/references.bib: 126 changes (126 additions, 0 deletions)

@@ -3457,3 +3457,129 @@ @inproceedings{dataset:dal2022lsa
url = {https://doi.org/10.1007/978-3-031-22419-5_25},
year = {2023}
}


@inproceedings{kim-etal-2024-signbleu-automatic,
address = {Torino, Italia},
author = {Kim, Jung-Ho and
Huerta-Enochian, Mathew John and
Ko, Changyong and
Lee, Du Hui},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
pages = {14796--14811},
publisher = {ELRA and ICCL},
title = {{S}ign{BLEU}: Automatic Evaluation of Multi-channel Sign Language Translation},
url = {https://aclanthology.org/2024.lrec-main.1289},
year = {2024}
}

@inproceedings{dataset:huerta-enochian-etal-2022-kosign,
address = {Marseille, France},
author = {Huerta-Enochian, Mathew and
Lee, Du Hui and
Myung, Hye Jin and
Byun, Kang Suk and
Lee, Jun Woo},
booktitle = {Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives},
pages = {59--66},
publisher = {European Language Resources Association},
title = {{K}o{S}ign Sign Language Translation Project: Introducing The {NIASL}2021 Dataset},
url = {https://aclanthology.org/2022.sltat-1.9},
year = {2022}
}

@article{Bevilacqua_Blloshmi_Navigli_2021,
author = {Bevilacqua, Michele and Blloshmi, Rexhina and Navigli, Roberto},
doi = {10.1609/aaai.v35i14.17489},
journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
number = {14},
pages = {12564-12573},
title = {One {SPRING} to Rule Them Both: Symmetric {AMR} Semantic Parsing and Generation without a Complex Pipeline},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/17489},
volume = {35},
year = {2021}
}

@misc{dataset:Konrad_2020_dgscorpus_3,
author = {Konrad, Reiner and Hanke, Thomas and Langer, Gabriele and Blanck, Dolly and Bleicken, Julian and Hofmann, Ilona and Jeziorski, Olga and K{\"o}nig, Lutz and K{\"o}nig, Susanne and Nishio, Rie and Regen, Anja and Salden, Uta and Wagner, Sven and Worseck, Satu and B{\"o}se, Oliver and Jahn, Elena and Schulder, Marc},
doi = {10.25592/dgs.corpus-3.0},
publisher = {Universit{\"a}t Hamburg},
title = {{{MEINE DGS}} -- Annotiert. {{{\"O}ffentliches}} Korpus Der Deutschen Geb{\"a}rdensprache, 3. {{Release}} / {{MY DGS}} -- Annotated. {{Public}} Corpus of German Sign Language, 3rd Release},
type = {Languageresource},
url = {https://doi.org/10.25592/dgs.corpus-3.0},
version = {3.0},
year = {2020}
}

@misc{dataset:Neidle_2020_NCSLGR_ISLRN,
author = {Carol Neidle and Stan Sclaroff},
publisher = {Boston University},
title = {National Center for Sign Language and Gesture Resources (NCSLGR) corpus. {ISLRN} 833-505-711-564-4},
type = {Languageresource},
url = {https://www.islrn.org/resources/833-505-711-564-4/},
year = {2012}
}

@inproceedings{Vogler2012ASLLRP_data_access_interface,
author = {Christian Vogler and C. Neidle},
title = {A new web interface to facilitate access to corpora: development of the {ASLLRP} data access interface},
url = {https://api.semanticscholar.org/CorpusID:58305327},
year = {2012}
}

@inproceedings{huangFastHighQualitySign2021,
title = {Towards Fast and {High-Quality} Sign Language Production},
booktitle = {Proceedings of the 29th {{ACM International Conference}} on {{Multimedia}}},
author = {Huang, Wencan and Pan, Wenwen and Zhao, Zhou and Tian, Qi},
year = {2021},
month = oct,
series = {{{MM}} '21},
pages = {3172--3181},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi = {10.1145/3474085.3475463},
url = {https://doi.org/10.1145/3474085.3475463},
urldate = {2024-06-19},
isbn = {978-1-4503-8651-7}
}

@inproceedings{ahuja2019Language2PoseNaturalLanguage,
author = {Ahuja, Chaitanya and Morency, Louis-Philippe},
booktitle = {2019 {{International Conference}} on {{3D Vision}} ({{3DV}})},
doi = {10.1109/3DV.2019.00084},
issn = {2475-7888},
pages = {719--728},
shorttitle = {{{Language2Pose}}},
title = {{Language2Pose}: Natural Language Grounded Pose Forecasting},
url = {https://ieeexplore.ieee.org/document/8885540},
urldate = {2024-06-19},
year = {2019}
}

@inproceedings{ghosh2021SynthesisCompositionalAnimations,
author = {Ghosh, Anindita and Cheema, Noshaba and Oguz, Cennet and Theobalt, Christian and Slusallek, Philipp},
booktitle = {2021 {{IEEE}}/{{CVF International Conference}} on {{Computer Vision}} ({{ICCV}})},
doi = {10.1109/ICCV48922.2021.00143},
issn = {2380-7504},
pages = {1376--1386},
title = {Synthesis of Compositional Animations from Textual Descriptions},
url = {https://ieeexplore.ieee.org/document/9710802},
urldate = {2024-06-19},
year = {2021}
}

@inproceedings{petrovich2022TEMOSGeneratingDiverse,
address = {Cham},
author = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l},
booktitle = {Computer {{Vision}} -- {{ECCV}} 2022},
doi = {10.1007/978-3-031-20047-2_28},
editor = {Avidan, Shai and Brostow, Gabriel and Ciss{\'e}, Moustapha and Farinella, Giovanni Maria and Hassner, Tal},
isbn = {978-3-031-20047-2},
langid = {english},
pages = {480--497},
publisher = {Springer Nature Switzerland},
title = {{{TEMOS}}: Generating Diverse Human Motions from Textual Descriptions},
year = {2022}
}