| Name | Description | Size | Format |
|---|---|---|---|
| | | 963.54 KB | Adobe PDF |
Advisor(s)
Abstract(s)
The detection of highlights in broadcast streams is essential for enhancing User Experience (UX) through automated summaries and efficient content retrieval. This is particularly relevant for live streaming environments common in sports and eSports, where audiences demand near real-time analysis. This paper presents a benchmark of models for highlight detection in broadcast audio, validated on the SoccerNet dataset but applicable to general competitive gaming streams. We propose a novel multi-modal architecture combining high-level semantic audio features (YAMNet) with Natural Language Processing (NLP) of transcribed commentary (analogous to eSports shoutcasting). Results show that fusing audio event detection with semantic text analysis significantly outperforms uni-modal baselines. The proposed framework offers a computationally efficient solution for AI-based broadcasting technologies, enabling scalable automation for content creators and improved viewer experiences.
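As a rough illustration of the fusion idea described in the abstract, the sketch below combines clip-level YAMNet audio embeddings with a text embedding of transcribed commentary and scores each window as highlight or non-highlight. YAMNet is loaded from TensorFlow Hub as in its public documentation; everything else is an assumption for illustration: `embed_text` is a hypothetical stand-in for the paper's commentary NLP branch, and the concatenation-plus-dense fusion head is one plausible design, not necessarily the authors' architecture.

```python
# Minimal sketch of multi-modal highlight scoring, assuming:
# - YAMNet from TensorFlow Hub for high-level audio embeddings (real model),
# - a hypothetical embed_text() standing in for the commentary NLP branch,
# - a simple concatenation-based fusion head (illustrative, untrained).
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# YAMNet expects mono float32 audio at 16 kHz; it returns per-frame
# class scores, 1024-dim embeddings, and a log-mel spectrogram.
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

def embed_audio(waveform_16k: np.ndarray) -> tf.Tensor:
    """Mean-pool YAMNet frame embeddings into one clip-level vector."""
    _, embeddings, _ = yamnet(waveform_16k)
    return tf.reduce_mean(embeddings, axis=0)  # shape: (1024,)

def embed_text(transcript: str) -> tf.Tensor:
    """Hypothetical commentary branch: in practice, a sentence
    encoder over ASR output of the shoutcasting would go here."""
    return tf.zeros(512)

# Fusion head: concatenate the two modality vectors and emit a
# highlight probability for the window. Weights are untrained here.
fusion_head = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

def highlight_score(waveform_16k: np.ndarray, transcript: str) -> float:
    fused = tf.concat([embed_audio(waveform_16k),
                       embed_text(transcript)], axis=0)
    return float(fusion_head(fused[tf.newaxis, :])[0, 0])

# Example call: one second of silence with an empty transcript.
print(highlight_score(np.zeros(16000, dtype=np.float32), ""))
```

Mean-pooling the frame embeddings is one cheap way to get a fixed-size clip vector, which fits the abstract's emphasis on computational efficiency for near real-time use; attention pooling or temporal models are common alternatives.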
Description
Keywords
AI-based sports technologies; Audio event detection; Broadcast stream automation; Machine learning for real-time analysis; Multi-modal deep learning
Educational Context
Citation
Publisher
Science and Technology Publications, Lda
