Recognition of speech in different languages from lip movements
In recent years, deep learning techniques have made remarkable advances in many language and image processing tasks. These advances include visual speech recognition (VSR) – a task that requires determining the content of speech simply by analyzing lip movements.


Photo: analyticsindiamag.com

Although some deep learning algorithms for this VSR task have achieved very promising results, they are trained almost exclusively to recognize English speech, since most current training datasets do not contain speech in other languages. This limits the algorithms' potential users to people who live or work in an English-speaking environment.

Recently, researchers at Imperial College London developed a model that can perform VSR tasks in several different languages. In a new paper published in the journal Nature Machine Intelligence, the authors show that the new model outperforms some previous models, even though those models were trained on much larger datasets.

“During my doctoral research, I worked on a number of topics, such as how to combine visual information with audio for speech recognition, and how to recognize speech from images in a way that is independent of the speaker's head pose. I realized that most of the existing literature only deals with spoken English,” said Pingchuan Ma, who received his doctorate from Imperial College and is the lead author of the paper.

The study by Ma and his colleagues therefore aimed to train a deep learning model that can recognize speech in various languages from a speaker's lip movements, and then to compare its performance with that of models trained to recognize English speech. According to the authors, the new model is similar to ones introduced earlier by other research groups; the difference is that it uses optimized hyperparameters, augments the training data (by adding slightly modified copies of the data), and uses additional loss functions.
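The data augmentation and extra-loss ideas above can be made concrete with a minimal sketch. The PyTorch snippet below is purely illustrative: the function names, the flip-and-time-mask augmentation, the CTC main loss, the auxiliary cross-entropy head, and the weight alpha are all assumptions, not the recipe from the paper.

import torch
import torch.nn as nn

def augment_clip(frames: torch.Tensor) -> torch.Tensor:
    # Produce a slightly modified copy of a batch of mouth-region clips
    # shaped (batch, time, height, width): random horizontal flip plus
    # zeroing out a short random span of frames.
    if torch.rand(1).item() < 0.5:
        frames = frames.flip(-1)  # mirror left/right
    frames = frames.clone()
    t = frames.size(1)
    start = torch.randint(0, max(t - 10, 1), (1,)).item()
    frames[:, start:start + 10] = 0.0  # mask a short time span
    return frames

def training_step(model, frames, targets, aux_targets,
                  input_lengths, target_lengths, alpha=0.1):
    # One training step: main CTC loss on per-frame log-probabilities plus a
    # weighted auxiliary cross-entropy loss on a second output head. The model
    # interface (two outputs) is an assumption made for illustration.
    log_probs, aux_logits = model(augment_clip(frames))
    main_loss = nn.CTCLoss()(log_probs, targets, input_lengths, target_lengths)
    aux_loss = nn.CrossEntropyLoss()(aux_logits, aux_targets)
    return main_loss + alpha * aux_loss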

“Our study has shown that similar models can be used to train VSR models for other languages,” explains Ma. “Our model takes raw images as input without extracting any features, and then automatically learns which useful features to extract from these images in order to complete VSR tasks.”
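To illustrate what "raw images in, learned features out" can look like in practice, here is a minimal end-to-end sketch: a small 3D-convolutional front-end learns features directly from pixel values, a recurrent layer models them over time, and a linear head outputs per-frame character probabilities. The class name SimpleVSRNet and every layer size are illustrative assumptions, not the architecture described in the paper.

import torch
import torch.nn as nn

class SimpleVSRNet(nn.Module):
    # End-to-end VSR sketch: raw grayscale mouth-region frames go in and the
    # network learns its own features, with no hand-crafted extraction step.
    def __init__(self, vocab_size: int = 40):
        super().__init__()
        # Spatio-temporal front-end operating directly on pixels.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # collapse space, keep the time axis
        )
        # Temporal model over the learned per-frame features.
        self.temporal = nn.GRU(64, 256, num_layers=2, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(512, vocab_size)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 1, time, height, width) raw grayscale clips
        x = self.frontend(frames)                      # (batch, 64, time, 1, 1)
        x = x.squeeze(-1).squeeze(-1).transpose(1, 2)  # (batch, time, 64)
        x, _ = self.temporal(x)                        # (batch, time, 512)
        return self.classifier(x).log_softmax(-1)      # per-frame log-probabilities

# Example: SimpleVSRNet()(torch.randn(2, 1, 75, 88, 88)) has shape (2, 75, 40)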

In initial evaluations, the team's new model performed very well, outperforming other VSR models trained on much larger datasets. However, as expected, it performed less well than English speech recognition models, mainly because the training data available in other languages is much smaller than the English data.

Ma and his colleagues have shown that carefully designing deep learning models can be more effective for VSR tasks than simply using larger models or collecting more training data. This may shift the direction of research on improving future VSR models.

“One of the main areas of research that interests me is how to combine VSR models with today’s (audio-only) speech recognition,” added Ma. “I am particularly interested in how models can learn which modality to rely on depending on noise conditions. In other words, in noisy environments an audiovisual model should rely more on visual information; conversely, when the speaker's mouth area is obscured, the model needs to depend more on the audio. However, current models are essentially 'frozen' after training and cannot adapt to such changes in the environment.”
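The behaviour Ma describes, leaning on lip images when the audio is noisy and on audio when the mouth is hidden, is often approached with some form of gated fusion of the two feature streams. The snippet below is a purely illustrative sketch of that idea; the class name GatedAVFusion and the single learned gate are assumptions, not the authors' method.

import torch
import torch.nn as nn

class GatedAVFusion(nn.Module):
    # Fuse audio and visual features with a learned per-frame gate, so the
    # combined representation can lean on whichever stream is more reliable.
    def __init__(self, dim: int = 256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, audio_feat: torch.Tensor, visual_feat: torch.Tensor) -> torch.Tensor:
        # audio_feat, visual_feat: (batch, time, dim) per-frame features
        g = self.gate(torch.cat([audio_feat, visual_feat], dim=-1))  # per-feature weight
        return g * audio_feat + (1 - g) * visual_feat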

ntptuong
Source: www.khoahocphattrien.vn