Name: Joint Neural Translation & Classification of Videos for Audio Processing
Start: 2026-05-28T13:30:00+0200
End: 2026-05-28T14:00:00+0200

Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time

Thursday May 28, 2026 1:30pm - 2:00pm CEST

Aud 43

A low-parameter-count machine-learning model for
classifying streaming video can enable content-aware
audio/video processing on consumer edge devices with
latency, computational, and battery constraints. In this
paper, we propose a low-compute classification technique
that uses only text metadata from the streaming file
header, enabling near-instantaneous inference without
decoding and analyzing audio or video signals, as is
traditionally done. In particular, to support multilingual
platforms such as YouTube, we first apply neural machine
translation as a pre-processing step for the text metadata
and then optimize a lightweight neural classifier for a
three-class audio-centric classification taxonomy (movie,
music, dialog/other). Experiments on a mixed-language
YouTube dataset achieve approximately 90% classification
accuracy on a test set using a combined translation and
classification model (with only about 22K parameters),
demonstrating a globally scalable approach for robust
classification on the edge.
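The abstract describes a metadata-only pipeline: translate the header text, then classify it into movie, music, or dialog/other. The sketch below illustrates that flow under stated assumptions; it is not the authors' model. `translate_to_english` is a hypothetical identity placeholder standing in for the neural machine translation step. The features are a hashed bag of words (the "hashing trick"). The classifier is a single softmax-regression layer swapped in for the unspecified "lightweight neural classifier". The toy titles are invented for illustration and are not from the paper's dataset.

```python
import math
import re
import zlib

# Three-class audio-centric taxonomy named in the abstract.
CLASSES = ("movie", "music", "dialog/other")


def translate_to_english(text: str) -> str:
    # Hypothetical placeholder for the NMT pre-processing step.
    # A real system would run a translation model here; this returns text unchanged.
    return text


def hashed_bag_of_words(text: str, num_buckets: int) -> list[float]:
    # Lowercase, split into word tokens, and count each token in a fixed-size
    # vector indexed by crc32(token) mod num_buckets (deterministic across runs).
    vec = [0.0] * num_buckets
    for token in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % num_buckets] += 1.0
    return vec


def featurize(title: str, description: str, num_buckets: int) -> list[float]:
    # Hypothetical metadata fields: translate first, then hash into features.
    return hashed_bag_of_words(translate_to_english(f"{title} {description}"), num_buckets)


class TinyLinearClassifier:
    """Softmax regression over hashed metadata features (illustrative only)."""

    def __init__(self, num_buckets: int, num_classes: int = len(CLASSES)):
        self.num_buckets = num_buckets
        self.weights = [[0.0] * num_buckets for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def num_parameters(self) -> int:
        # One weight per (class, bucket) pair plus one bias per class.
        return len(self.weights) * self.num_buckets + len(self.bias)

    def probs(self, x: list[float]) -> list[float]:
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        m = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def train_step(self, x: list[float], label: int, lr: float = 0.1) -> None:
        # One SGD step on cross-entropy: d(loss)/d(logit_k) = p_k - [k == label].
        p = self.probs(x)
        for k in range(len(self.bias)):
            g = p[k] - (1.0 if k == label else 0.0)
            self.bias[k] -= lr * g
            row = self.weights[k]
            for i, xi in enumerate(x):
                if xi:
                    row[i] -= lr * g * xi

    def predict(self, x: list[float]) -> str:
        p = self.probs(x)
        return CLASSES[max(range(len(p)), key=p.__getitem__)]


if __name__ == "__main__":
    NUM_BUCKETS = 1024
    model = TinyLinearClassifier(NUM_BUCKETS)
    toy = [  # invented examples: (title, description, class index)
        ("Official Trailer", "full movie scene film", 0),
        ("Official Music Video", "song album live concert", 1),
        ("Podcast Episode 12", "interview talk discussion", 2),
    ]
    data = [(featurize(t, d, NUM_BUCKETS), y) for t, d, y in toy]
    for _ in range(50):
        for x, y in data:
            model.train_step(x, y)
    print(model.num_parameters())  # 3 * 1024 + 3 = 3075
    print(model.predict(featurize("Live Concert", "new song", NUM_BUCKETS)))
```

The parameter count of a single linear layer is C·B + C for C classes and B hash buckets. For C = 3, a budget of about 22K parameters would allow roughly 7,300 buckets. This is only arithmetic on the sketch's hypothetical design, not a description of the authors' architecture.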
