Emotion in TV Shows

Sep 2025 – Oct 2025
Oct 31, 2025 min read

Project Overview

In this project, created for the Content Intelligence Agency, our team built a complete NLP pipeline that detects and classifies emotions expressed in television content. The goal was to design a local, affordable, and transparent alternative to expensive cloud-based large language model solutions, while ensuring balanced dataset representation, interpretability, and cultural adaptability.

Technical Implementation

Speech-to-Text Pipeline

The system processes raw video files through several stages. It first converts speech to text using both Whisper and AssemblyAI, which were evaluated using Word Error Rate (WER) metrics. AssemblyAI was selected for its superior transcription quality and lower operational cost. The resulting transcripts were translated into English and cleaned to ensure balanced emotional representation.

Model Architecture

For modeling, we compared several approaches including Logistic Regression, Support Vector Machines, Naive Bayes, LSTMs, and Transformer models such as BERT and RoBERTa. The Transformer architecture achieved the best performance, with the final model reaching an F1 score of 0.75.

Explainable AI Integration

We incorporated Explainable AI tools to visualize the model’s attention and better understand how it interprets emotional context in text. This step helped identify weaknesses, such as loss of context or misclassification of short sentences, and guided future improvements.

Results & Impact

The final pipeline takes a video as input and outputs structured data with timestamps, transcribed text, and emotion labels across six core emotions: happiness, sadness, anger, surprise, fear, and disgust. The project demonstrates how modern NLP techniques can bring affordable and interpretable emotion analysis to creative and production environments.

Technologies Used

  • NLP: Transformers (BERT, RoBERTa), Transfer Learning
  • Speech-to-Text: Whisper, AssemblyAI
  • Machine Learning: PyTorch, Scikit-learn
  • Data Processing: Pandas, NumPy
  • Explainability: Attention visualization, Grad-CAM

Team

This was a collaborative project with 2 other contributors from Breda University of Applied Sciences.