A low-parameter-count machine-learning model for classifying streaming video can enable content-aware audio/video processing on consumer edge devices with latency, computational, and battery constraints. In this paper, we propose a low-compute classification technique that uses only text metadata from the streaming file header, enabling near-instantaneous inference without decoding or analyzing audio or video signals, as is traditionally done. In particular, to support multilingual platforms such as YouTube, we first apply neural machine translation as a pre-processing step for the text metadata and then optimize a lightweight neural classifier for a three-class audio-centric classification taxonomy (movie, music, dialog/other). Experiments on a mixed-language YouTube dataset achieve $\approx$90\% classification accuracy on a test set using translation combined with a classification model (with only $\sim$22K parameters), demonstrating a globally scalable approach for robust classification on the edge.