Natural Language Processing Research Themes
This page presents research projects undertaken in the Data Science and AI research group that relate to natural language processing.
We also have a selection of projects under other themes that are relevant to natural language processing.
Language Identification Using Computer Lip-Reading
Participants
Summary
Visual Language Identification (VLID) is the task of determining which language is being spoken from the appearance and movement of the mouth. VLID has applications where conventional audio-based approaches are ineffective due to acoustic noise, or where an audio signal is unavailable, such as in remote surveillance. The main challenge in VLID is the speaker-dependency of image-based visual recognition features, which bear little meaningful correspondence between speakers.
In this work, we examine VLID using video of individuals reciting the Universal Declaration of Human Rights in their native languages. We use state-of-the-art neural network object detection algorithms to track the mouth through time, and we employ 3D convolutional and recurrent neural networks for the classification task. In our latest work, we obtain a classification accuracy of 84.39%, demonstrating that the system can distinguish languages to a good degree from just 10 seconds of visual speech.
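The pipeline described above — pooled 3D-convolutional features over a mouth-region clip, aggregated through time by a recurrent network into a language posterior — can be sketched in miniature. This is an illustrative toy with random weights and toy dimensions, not the project's actual architecture; all layer sizes and function names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3d_valid(video, kernel):
    """Naive valid 3D convolution over a (T, H, W) mouth-region clip."""
    t, h, w = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i+kt, j:j+kh, k:k+kw] * kernel)
    return out

def rnn_aggregate(features, W_h, W_x):
    """Simple Elman-style recurrence summarising per-frame features."""
    h = np.zeros(W_h.shape[0])
    for x in features:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy dimensions: a 25-frame, 32x32 cropped mouth clip, 3 candidate languages.
clip = rng.standard_normal((25, 32, 32))
kernel = rng.standard_normal((3, 3, 3))
fmap = np.maximum(conv3d_valid(clip, kernel), 0.0)  # ReLU feature map
frame_feats = fmap.mean(axis=(1, 2))[:, None]       # one pooled feature per frame
W_h, W_x = rng.standard_normal((8, 8)), rng.standard_normal((8, 1))
h_final = rnn_aggregate(frame_feats, W_h, W_x)      # temporal summary of the clip
W_out = rng.standard_normal((3, 8))
probs = softmax(W_out @ h_final)                    # posterior over languages
print(probs.shape, float(probs.sum()))
```

In the real system the mouth region would first be located per frame by an object detector, and the convolutional and recurrent weights would be learned rather than random.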
Funding
Publications
[1] Newman, Jacob L.; Cox, Stephen J. Language identification using visual features. IEEE Transactions on Audio, Speech and Language Processing, 2012, Vol. 20, No. 7, pp. 1936-1947.
[2] Newman, Jacob; Theobald, Barry; Cox, Stephen. Limitations of Visual Speech Recognition. Paper presented at the International Conference on Auditory-Visual Speech Processing, Hakone, Kanagawa, Japan.
[3] Cox, Stephen; Harvey, Richard; Lan, Yuxuan; Newman, Jacob; Theobald, Barry-John. The Challenge of Multispeaker Lip-Reading. Paper presented at the International Conference on Auditory-Visual Speech Processing, Queensland, Australia. 6 p.
Speech Enhancement
Participants
Summary
Real-world speech processing applications are susceptible to background noise, which reduces the quality and intelligibility of the underlying speech signal and is problematic for speech-based systems such as telephony, hearing aids and robust speech recognition. This project focuses on integrating advanced signal processing techniques with deep learning methods to improve perceptual quality and intelligibility. Speech enhancement techniques commonly use the Fourier Transform within an analysis-synthesis framework, converting signals from the time domain to a complex time-frequency representation. This transformation provides a clear distinction between the magnitude and phase spectra, with several methods effective at estimating clean magnitude from a noisy speech signal through approaches such as time-frequency masking. Recovering the phase spectra has proven more challenging due to the lack of structure in the phase component, and only in very recent years have methods been proposed to address this issue. This work analyses the perceptual effect of phase distortions in speech and proposes advanced strategies for directly and indirectly estimating phase components to further improve speech perception.
Publications
[1] Milner, Ben; Sfeclis, Georgiana-Elena; Websdale, Danny. Investigating Imaginary Mask Estimation in Complex Masking for Speech Enhancement. Paper presented at the 31st European Signal Processing Conference (EUSIPCO).
Reading Between the Lines: Using Natural Language Processing to Understand Social Media Conversations
Participants
Summary
This project investigates weaponised victimhood (WV) in political discourse: a rhetorical strategy in which speakers frame themselves or their group as morally righteous victims under threat. WV frequently relies on vague or emotionally charged references, such as “they,” “our people,” or “the American way,” that invite audiences to infer meaning without explicit naming. While rhetorically effective, this ambiguity presents a significant challenge for machine learning models.
To address this, the project adopts a layered, linguistically informed approach. We first identify WV instances in political speeches to construct a training dataset. We then decompose WV into its core components, focusing on how group identity is constructed and positioned within text. A custom NER model is developed to extend beyond standard named entities, capturing abstract concepts and collective identities (e.g. “the American dream”) that serve as targets or tools of rhetorical alignment. A novel positioning layer enables the model to infer group roles—for example, in “They want to take away your rights,” “they” is positioned as an oppositional outgroup.
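One way to picture the annotation scheme above — extended entity types for abstract and collective identities, each paired with a positioning role — is as span annotations rendered into combined BIO tags. The label names (GROUP, ABSTRACT_CONCEPT, INGROUP, OUTGROUP) and the tag format are hypothetical illustrations, not the project's actual tagset.

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str
    start: int        # token index (inclusive)
    end: int          # token index (exclusive)
    entity_type: str  # hypothetical types: GROUP, ABSTRACT_CONCEPT, ...
    position: str     # hypothetical roles: INGROUP, OUTGROUP, NEUTRAL

tokens = "They want to take away your rights".split()
annotations = [
    Span("They", 0, 1, "GROUP", "OUTGROUP"),
    Span("your rights", 5, 7, "ABSTRACT_CONCEPT", "INGROUP"),
]

def to_bio(tokens, spans):
    """Render span annotations as BIO tags pairing entity type with position."""
    tags = ["O"] * len(tokens)
    for sp in spans:
        tags[sp.start] = f"B-{sp.entity_type}|{sp.position}"
        for i in range(sp.start + 1, sp.end):
            tags[i] = f"I-{sp.entity_type}|{sp.position}"
    return tags

for tok, tag in zip(tokens, to_bio(tokens, annotations)):
    print(f"{tok}\t{tag}")
```

A sequence-labelling model (such as the BERT variants mentioned below) could then be trained on token sequences paired with these combined tags, either jointly or as separate entity-type and positioning tasks.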
We are currently experimenting with different BERT models across both single-task and multi-task learning architectures, with the aim of applying the model to Reddit discourse via transfer learning.