Integration of large language models (LLMs) and speech recognition models
Converting natural speech into text remains a significant challenge when the requirements include speaker separation, punctuation, named-entity tagging, recognition of foreign-language expressions, and high accuracy even in noisy environments. While classical and neural language models are fundamental in this domain, applying (very) large language models (LLMs such as GPT-4, ChatGPT, LLaMA, or Bard) is far from straightforward. The student's task is to explore both the direct use of LLMs to support speech-to-text conversion and their use in post-processing (e.g., error correction). The topic can be extended into a thesis project and beyond.
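One common way a language model can support a recognizer is N-best rescoring: the recognizer proposes several candidate transcripts, each with an acoustic-model score, and a language model re-ranks them by log-linear interpolation. The sketch below is a minimal illustration under stated assumptions, not a method prescribed by this topic: `toy_bigram_lm` is a hypothetical add-one-smoothed bigram model standing in for an LLM (in practice one would sum the token log-probabilities an LLM assigns to the transcript), the acoustic scores are made up, and `lm_weight` and `word_bonus` are illustrative values, not tuned ones.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Hypothesis:
    text: str
    am_logprob: float  # acoustic-model log-probability (assumed to come from the recognizer)


def toy_bigram_lm(counts: Dict[Tuple[str, str], int], vocab_size: int) -> Callable[[str], float]:
    """Hypothetical add-one-smoothed bigram LM returning log P(sentence).

    Stand-in for an LLM; a real system would score the transcript with the LLM instead.
    """
    # Total count of each word when it appears as the bigram history.
    history: Dict[str, int] = {}
    for (prev, _), c in counts.items():
        history[prev] = history.get(prev, 0) + c

    def score(sentence: str) -> float:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(words, words[1:]):
            num = counts.get((prev, cur), 0) + 1        # add-one smoothing
            den = history.get(prev, 0) + vocab_size
            total += math.log(num / den)
        return total

    return score


def rescore(nbest: List[Hypothesis], lm: Callable[[str], float],
            lm_weight: float = 0.5, word_bonus: float = 0.0) -> List[Tuple[float, str]]:
    """Rank hypotheses by AM score + lm_weight * LM score + word_bonus * word count."""
    scored = [
        (h.am_logprob + lm_weight * lm(h.text) + word_bonus * len(h.text.split()), h.text)
        for h in nbest
    ]
    return sorted(scored, reverse=True)  # best (highest) combined score first


if __name__ == "__main__":
    lm = toy_bigram_lm(
        {("<s>", "recognize"): 5, ("recognize", "speech"): 5, ("speech", "</s>"): 5},
        vocab_size=10,
    )
    nbest = [Hypothesis("wreck a nice beach", -4.0), Hypothesis("recognize speech", -4.5)]
    print(rescore(nbest, lm)[0][1])  # prints "recognize speech"
```

With `lm_weight=0.0` the acoustically preferred "wreck a nice beach" wins; adding the language-model term flips the ranking. The same interface would let an LLM-based scorer replace the toy model without changing the rescoring logic.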
Keywords: LLM, deep learning, speech-to-text conversion
Department of Telecommunications and Artificial Intelligence (TMIT), Budapest University of Technology and Economics (BME), H-1117 Budapest, Magyar tudósok krt. 2, HUNGARY; tel: +36 (1) 463-2448; fax: +36 (1) 463-3107; email: titkarsag@tmit.bme.hu