Communicating Sound Through Natural Language

Emanuele Rossi¹ · Emanuele Rodolà¹,²

¹ Sapienza University of Rome · ² Paradigma

Abstract

Natural language is widely used to describe, prompt, and control audio systems, but rarely serves as the representation carrying audio itself. We introduce lexical acoustic coding (LAC), a framework in which pre-trained LLM sender and receiver agents transmit sound through natural language. Under fixed system prompts, the agents write their own analysis and synthesis code, communicating only through a lexical sentence, a shared vocabulary, and an optional symbolic music structure. The sender analyzes an input waveform into interpretable, non-learned acoustic descriptors, quantizes each with a feature-specific interval vocabulary, and verbalizes the lexical code as English. The receiver parses the sentence back into lexical-acoustic constraints and renders a waveform through closed-loop refinement. The transmitted text serves both as a rich caption and as the transport representation itself. We frame LAC as a finite-rate lossy quantizer, exposing trade-offs between vocabulary size, rate, and fidelity. Experiments on short sounds and symbolic music transfer show that plain text preserves measurable acoustic structure while remaining interpretable, editable, and native to LLM-mediated communication.
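To make the quantize, verbalize, and parse steps concrete, here is a minimal toy sketch. It is not the agents' actual code: in LAC the sender and receiver write their own analysis and synthesis code. The feature names, interval edges, words, and sentence template below are all hypothetical assumptions chosen for illustration. The sketch also computes the rate of this toy code as the sum of log2 of each vocabulary size, which is one simple way to read the "finite-rate lossy quantizer" framing.

```python
import math
from bisect import bisect_right

# Hypothetical interval vocabularies (illustrative only, not the paper's).
# Each feature has sorted interval edges and one word per interval.
VOCAB = {
    "pitch": {"edges": [200.0, 800.0], "words": ["low", "mid", "high"]},            # Hz
    "loudness": {"edges": [-30.0, -12.0], "words": ["quiet", "moderate", "loud"]},  # dBFS
    "duration": {"edges": [0.2, 1.0], "words": ["short", "medium", "long"]},        # seconds
}
PREFIX = "A sound with "  # assumed sentence template


def quantize(feature, value):
    """Map a scalar descriptor to the word of the interval containing it."""
    spec = VOCAB[feature]
    return spec["words"][bisect_right(spec["edges"], value)]


def verbalize(descriptors):
    """Sender side: turn a dict of descriptors into an English sentence."""
    parts = [f"{quantize(name, descriptors[name])} {name}" for name in VOCAB]
    return PREFIX + ", ".join(parts) + "."


def parse(sentence):
    """Receiver side: recover one (low, high) interval constraint per feature."""
    body = sentence[len(PREFIX):].rstrip(".")
    constraints = {}
    for part in body.split(", "):
        word, name = part.split(" ")
        edges = VOCAB[name]["edges"]
        i = VOCAB[name]["words"].index(word)
        low = edges[i - 1] if i > 0 else -math.inf
        high = edges[i] if i < len(edges) else math.inf
        constraints[name] = (low, high)
    return constraints


def rate_bits():
    """Bits per sentence if every word in each vocabulary is equally likely."""
    return sum(math.log2(len(spec["words"])) for spec in VOCAB.values())


sentence = verbalize({"pitch": 440.0, "loudness": -20.0, "duration": 0.5})
constraints = parse(sentence)
```

The receiver only learns the interval each descriptor fell into, not its exact value, which is where the loss comes from. Enlarging a vocabulary narrows the intervals at the cost of more bits per sentence.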

This demo

This demo presents side-by-side listening examples of entire songs and individual samples: each item is transmitted with LAC, reconstructed at the receiver, and paired with its original source audio.

Audio provenance

Respect for original creators is a core part of this demo and of demoscene culture. Every third-party track is credited per item by artist and title, and linked directly to the original AMP or scene.org archive source to preserve clear attribution and provenance.