Deep Learning Models for NLP Tasks: A Systematic Literature Review

  • Eduard Pangestu Wonohardjo, Universitas Bina Nusantara
Keywords: deep learning, machine translation, neural machine translation, seq2seq models, transformer models

Abstract

As deep learning models have advanced, machine translation, one of the core tasks in natural language processing, has also made significant progress. Neural machine translation systems based on deep neural networks have surpassed traditional statistical methods, particularly since the emergence of the Transformer architecture and attention mechanisms. This systematic literature review examines recent developments in deep-learning-based machine translation over the past five years, focusing on multilingual pretrained models, Transformer models, BERT, GPT, and Seq2Seq models. Handling low-resource languages, training efficiency, and cross-domain translation quality are identified as the main challenges in neural machine translation, and this review discusses these shortcomings. It also highlights how multilingual and unsupervised models have improved machine translation performance.
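
To make the attention mechanism mentioned in the abstract concrete, the following is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer architecture (Vaswani et al., 2017; see References). It is an illustrative sketch only: the function name, toy dimensions, and random inputs are assumptions, not code from any of the systems surveyed in the article.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    # Q, K have shape (seq_len, d_k); V has shape (seq_len, d_v).
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # to keep the softmax in a numerically well-behaved range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V, weights

# Toy self-attention over 4 "tokens" with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape, attn.shape)  # (4, 8) (4, 4)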

References

D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” CoRR, vol. abs/1409.0473, 2014, [Online]. Available: https://api.semanticscholar.org/CorpusID:11212020

A. Vaswani et al., “Attention Is All You Need,” in Neural Information Processing Systems, 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:13756489

I. Sutskever, O. Vinyals, and Q. Le, “Sequence to Sequence Learning with Neural Networks,” Adv. Neural Inf. Process. Syst., vol. 27, 2014.

S. Tobiyama, Y. Yamaguchi, H. Hasegawa, H. Shimada, M. Akiyama, and T. Yagi, “A method for estimating process maliciousness with Seq2Seq model,” in 2018 International Conference on Information Networking (ICOIN), 2018, pp. 255–260. doi: 10.1109/ICOIN.2018.8343120.

Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. Le, and R. Salakhutdinov, “Transformer-XL: Attentive Language Models beyond a Fixed-Length Context,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019, pp. 2978–2988. doi: 10.18653/v1/P19-1285.

H.-I. Liu and W.-L. Chen, “X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism,” Appl. Sci., vol. 12, no. 9, 2022, doi: 10.3390/app12094502.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” Oct. 2018. doi: 10.48550/arXiv.1810.04805.

A. Radford et al., “Language models are unsupervised multitask learners,” OpenAI blog, vol. 1, no. 8, p. 9, 2019.

T. B. Brown et al., “Language Models are Few-Shot Learners,” ArXiv, vol. abs/2005.1, 2020, [Online]. Available: https://api.semanticscholar.org/CorpusID:218971783

OpenAI, “Pioneering Research on The Path to AGI.” Accessed: Oct. 29, 2024. [Online]. Available: https://openai.com/research/

C. Raffel et al., “Exploring the limits of transfer learning with a unified text-to-text transformer,” J. Mach. Learn. Res., vol. 21, 2020, [Online]. Available: https://arxiv.org/abs/1910.10683

L. Xue et al., “mT5: A massively multilingual pre-trained text-to-text transformer,” 2021. [Online]. Available: https://arxiv.org/abs/2010.11934

A. Conneau, A. Baevski, R. Collobert, A. Mohamed, and M. Auli, “Unsupervised Cross-lingual Representation Learning for Speech Recognition,” Oct. 2020. doi: 10.48550/arXiv.2006.13979.

G. Lample, M. Ott, A. Conneau, L. Denoyer, and M. Ranzato, “Phrase-Based & Neural Unsupervised Machine Translation,” Oct. 2018. doi: 10.48550/arXiv.1804.07755.

R. Sennrich, B. Haddow, and A. Birch, “Improving Neural Machine Translation Models with Monolingual Data,” 2015. doi: 10.48550/arXiv.1511.06709.

Published
2025-04-12