Emotion Classification in TV Content

Built an end-to-end NLP pipeline that turns raw video into timestamped labels for six emotions: a local, transparent alternative to expensive cloud LLM services. The final Transformer model reached an F1 of 0.75.

Client

Content Intelligence Agency

Year

2025

Domain

NLP

Stack

PyTorch · Transformers · BERT · RoBERTa · Whisper · AssemblyAI · scikit-learn

Context

The Content Intelligence Agency wanted to understand the emotional arc of television content without leaning on expensive, opaque cloud LLM services. The brief: something local, affordable and interpretable, with balanced emotional representation and room for cultural adaptation. I worked on this in a team of three from BUas, owning the modelling and pipeline engineering.

Approach

The pipeline processes raw video end to end:

  • Speech-to-text: Whisper and AssemblyAI were both evaluated on Word Error Rate (WER); AssemblyAI won on transcription quality and cost. Transcripts were then translated to English and cleaned, with an eye to balanced emotional representation.
  • Modelling: I benchmarked Logistic Regression, SVMs, Naive Bayes and LSTMs against Transformer models (BERT, RoBERTa). The Transformer approach won, reaching an F1 of 0.75.
  • Explainability: attention visualisation showed how the model read emotional context. It exposed weaknesses, such as lost context and misclassified short sentences, which then guided improvements.
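
To make the transcription comparison concrete, here is a minimal sketch of how Word Error Rate can be computed: the word-level edit distance between a reference transcript and an engine's output, divided by the number of reference words. This is an illustration in plain Python, not the project's actual evaluation code; the function name and the example sentences are hypothetical, and the normalisation is deliberately simple (lowercasing and whitespace splitting only).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word was dropped
                curr[j - 1] + 1,                  # extra word was inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the engine dropped one of five reference words.
print(word_error_rate("i am so happy today", "i am happy today"))  # 0.2
```

Running the same function over the same reference transcripts for each engine gives one comparable number per engine, which is the kind of evidence the Whisper versus AssemblyAI decision rested on. A lower score means fewer word-level mistakes.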

Outcome

The finished system takes a video and returns structured output: timestamps, transcribed text and emotion labels across happiness, sadness, anger, surprise, fear and disgust. It demonstrates that affordable, interpretable emotion analysis is achievable on local infrastructure rather than paid cloud APIs.
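
As a rough sketch of what that structured output could look like, the snippet below models one labelled segment and serialises a list of them to JSON. The field names (`start_s`, `end_s`, `text`, `emotion`), the class name and the JSON layout are assumptions made for illustration; only the six emotion labels come from the project itself.

```python
import json
from dataclasses import asdict, dataclass

# The six emotion labels the pipeline predicts.
EMOTIONS = ("happiness", "sadness", "anger", "surprise", "fear", "disgust")


@dataclass
class EmotionSegment:
    """One transcribed stretch of speech with its predicted emotion (hypothetical schema)."""

    start_s: float  # segment start, in seconds from the beginning of the video
    end_s: float    # segment end, in seconds
    text: str       # transcribed (and translated) sentence
    emotion: str    # one of EMOTIONS

    def __post_init__(self) -> None:
        # Reject labels outside the six-emotion scheme and inverted timestamps.
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion label: {self.emotion!r}")
        if self.end_s < self.start_s:
            raise ValueError("segment ends before it starts")


def segments_to_json(segments: list[EmotionSegment]) -> str:
    """Serialise segments as a JSON array, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    return json.dumps([asdict(s) for s in ordered], indent=2)


# Hypothetical example data, not real pipeline output.
example = [
    EmotionSegment(12.4, 15.0, "I can't believe you did that.", "surprise"),
    EmotionSegment(3.0, 6.5, "We finally made it!", "happiness"),
]
print(segments_to_json(example))
```

Validating the label at construction time keeps any downstream analysis of the emotional arc limited to the six agreed categories.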

This work later fed into a peer-reviewed publication on the structural limits of text-only emotion classification.

What I took from it

Owning the modelling and pipeline across a three-person team taught me to make defensible model choices on evidence (WER for transcription, a shared evaluation for classifiers) rather than reaching for the biggest model by default.