9/4/2023

ASL Gloss Translator

In this paper, we tackle the problem of sign language translation (SLT) without gloss annotations. Although intermediate representations like gloss have proven effective, gloss annotations are hard to acquire, especially in large quantities. This limits the domain coverage of translation datasets and handicaps real-world applications. To mitigate this problem, we design the Gloss-Free End-to-end sign language translation framework (GloFE). Our method improves the performance of SLT in the gloss-free setting by exploiting the shared underlying semantics of signs and the corresponding spoken translation. Common concepts are extracted from the text and used as a weak form of intermediate representation. The global embedding of these concepts is used as a query for cross-attention to find the corresponding information within the learned visual features. In a contrastive manner, we encourage the similarity of query results between samples containing such concepts and decrease it for those that do not. We obtained state-of-the-art results on large-scale datasets, including OpenASL and How2Sign.

Most sign language translation (SLT) methods to date require gloss annotations to provide additional supervision; however, gloss is not easy to acquire. To solve this problem, we first perform an analysis of existing models to confirm how gloss annotations make SLT easier. We find that gloss provides two kinds of information for the model: 1) it helps the model implicitly learn the location of semantic boundaries in continuous sign language videos, and 2) it helps the model understand the sign language video globally. Furthermore, our approach also achieves competitive results on the PHOENIX14T dataset when compared with most of the gloss-based methods.
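The concept-query mechanism described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimensions, the single-head attention, and the cosine-similarity helper are all illustrative assumptions, showing only the shape of the idea (a global concept embedding attends over per-frame visual features, and a contrastive objective would pull together query results from clips that share the concept).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def concept_query_attention(concept_emb, visual_feats):
    """Use a global concept embedding (1, d) as the query in
    cross-attention over learned visual features (T, d)."""
    d = concept_emb.shape[-1]
    scores = concept_emb @ visual_feats.T / np.sqrt(d)  # (1, T)
    weights = softmax(scores, axis=-1)                  # attention over frames
    return weights @ visual_feats                       # (1, d) attended summary

def cosine_sim(a, b):
    # similarity between two attended query results
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float((a * b).sum())

# Hypothetical toy data: 6-frame clips with 8-dim features.
rng = np.random.default_rng(0)
visual_pos = rng.standard_normal((6, 8))  # clip containing the concept
visual_neg = rng.standard_normal((6, 8))  # clip without the concept
concept = rng.standard_normal((1, 8))     # global concept embedding

q_pos = concept_query_attention(concept, visual_pos)
q_neg = concept_query_attention(concept, visual_neg)
# A contrastive loss would raise cosine_sim between query results of
# concept-sharing samples and lower it for non-sharing ones.
```

The query result is a convex combination of frame features, so the contrastive signal on it indirectly supervises where in the video the concept appears.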
Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, involving the translation of a visual-gestural language into text. Many previous methods employ an intermediate representation, i.e., gloss sequences, to facilitate SLT, thus transforming it into a two-stage task of sign language recognition (SLR) followed by sign language translation (SLT). However, the scarcity of gloss-annotated sign language data, combined with the information bottleneck of the mid-level gloss representation, has hindered further development of the SLT task. To address this challenge, we propose a novel Gloss-Free SLT method based on Visual-Language Pretraining (GFSLT-VLP), which improves SLT by inheriting language-oriented prior knowledge from pre-trained models, without any gloss annotation assistance. Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training (CLIP) with masked self-supervised learning to create pretext tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage. The seamless combination of these novel designs forms a robust sign language representation and significantly improves gloss-free sign language translation. In particular, we achieve unprecedented improvements in BLEU-4 score on the PHOENIX14T dataset (>+5) and the CSL-Daily dataset (>+3) compared to state-of-the-art gloss-free SLT methods.
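The CLIP-style alignment in stage (i) can be sketched as a symmetric contrastive loss over a batch of paired video and text embeddings. This is a generic illustration under assumed shapes and a made-up temperature, not the GFSLT-VLP training code: matched pairs sit on the diagonal of a similarity matrix and are pulled together, while mismatched pairs are pushed apart.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(video_emb, text_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE: the i-th video and i-th text
    form the only positive pair in row i and column i."""
    v = l2_normalize(video_emb)
    t = l2_normalize(text_emb)
    logits = v @ t.T / temperature            # (N, N) similarity matrix
    idx = np.arange(logits.shape[0])
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # video -> text
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> video
    return (loss_v2t + loss_t2v) / 2

# Toy batch of 4 pairs with 16-dim embeddings (illustrative sizes).
rng = np.random.default_rng(0)
video = rng.standard_normal((4, 16))
aligned_loss = symmetric_contrastive_loss(video, video)  # perfectly paired
mismatched_loss = symmetric_contrastive_loss(video, video[[1, 0, 3, 2]])
```

Perfectly aligned pairs yield a much lower loss than shuffled ones, which is the gradient signal that bridges the visual and textual representation spaces before stage (ii) reuses the encoder and decoder.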