Data Sonification Case Study: Turning Econometric Research into Music with AI and Reaper

I took a 44-page academic paper about how weather affects movie ticket sales and turned it into a 2-minute 40-second musical composition where you can literally hear the data. Four musical layers, 8 automation parameters, 48 bars, all generated from a Python pipeline and mixed in Reaper. This is Momentum Cascade, a data sonification project built by Mulhacen Labs.

What Is Data Sonification?

Data sonification maps numbers to sound. Instead of reading a chart, you hear the data. A rising stock price becomes a rising pitch. Accelerating growth becomes an accelerating rhythm. It is not new (sonar is sonification), but AI and modern audio tools make it practical for research, accessibility, and creative applications.

Aspect	Visualization	Sonification
Sense	Sight	Hearing
Dimensions	2-3 (x, y, color)	6+ (pitch, volume, rhythm, timbre, stereo, density)
Time perception	Static or animated	Naturally temporal
Accessibility	Requires sight	Works for visually impaired
Engagement	Analytical	Emotional + analytical

The Source Data

The source paper is "Something to Talk About: Social Spillovers in Movie Consumption" by Gilchrist and Sands (2016, Journal of Political Economy). The key finding: a positive weather shock on a movie's opening weekend creates a compounding wave of word-of-mouth. By week 6, a $1 shock to opening revenue generates $2.14 in total revenue.

Week	Coefficient	Cumulative Multiplier
1	1.000	1.000
2	0.474	1.474
3	0.269	1.743
4	0.188	1.931
5	0.112	2.043
6	0.096	2.139

All coefficients are statistically significant at the 1% level. That cascading pattern is what I wanted to make audible.
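The cumulative multiplier column is just a running sum of the weekly coefficients. A minimal sketch that reproduces it from the table values (the variable names are illustrative, not taken from the project code):

```python
from itertools import accumulate

# Weekly momentum coefficients from the table above (weeks 1 to 6).
coefficients = [1.000, 0.474, 0.269, 0.188, 0.112, 0.096]

# Week n's cumulative multiplier is the sum of coefficients 1..n.
cumulative = [round(total, 3) for total in accumulate(coefficients)]

for week, (coef, mult) in enumerate(zip(coefficients, cumulative), start=1):
    print(f"week {week}: coefficient {coef:.3f}, cumulative {mult:.3f}")
```

The final value, 2.139, is the "$2.14 per $1 shock" figure quoted above.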

The Pipeline

I built a five-step pipeline: data extraction (Python + pdfplumber), parameter mapping, MIDI generation, automation control, and mixing/mastering in Reaper.

Step 1: Extract. Python scripts pull the tables and figures from the PDF into structured CSVs (momentum coefficients, viewership decay curves, quality splits).

Step 2: Map. Each data dimension gets assigned to a musical parameter, grounded in The Sonification Handbook (Hermann, Hunt & Neuhoff, 2011). None of the choices are arbitrary: every mapping follows established perceptual principles.

Step 3: Generate. A custom Python script (raw_midi_generator.py, stdlib only, no dependencies) generates a Type 1 multi-track MIDI file at 480 PPQN. The file contains six tracks: four musical tracks, a conductor track, and an automation track.
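To show what "stdlib only" MIDI generation involves, here is a minimal sketch of a Type 1 file writer. It is not the project's raw_midi_generator.py; the function names, the single demo note, and the choice of MIDI note 36 for C2 (taking C4 = 60) are all assumptions for illustration.

```python
import struct

PPQN = 480  # ticks per quarter note, as stated above


def vlq(value: int) -> bytes:
    """Encode a non-negative int as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))


def track_chunk(events: bytes) -> bytes:
    """Wrap raw event bytes in an MTrk chunk and append End of Track."""
    body = events + b"\x00\xFF\x2F\x00"
    return b"MTrk" + struct.pack(">I", len(body)) + body


def conductor_track(bpm: int) -> bytes:
    """Track holding one Set Tempo meta event (microseconds per quarter note)."""
    usec = round(60_000_000 / bpm)
    return track_chunk(b"\x00\xFF\x51\x03" + usec.to_bytes(3, "big"))


def note_track(notes, channel: int = 0) -> bytes:
    """notes: iterable of (start_tick, duration_ticks, pitch, velocity)."""
    timeline = []
    for start, dur, pitch, vel in notes:
        timeline.append((start, 1, bytes([0x90 | channel, pitch, vel])))
        timeline.append((start + dur, 0, bytes([0x80 | channel, pitch, 0])))
    # Sort by tick; at equal ticks, note-offs (0) come before note-ons (1).
    timeline.sort(key=lambda e: (e[0], e[1]))
    data, last_tick = b"", 0
    for tick, _, msg in timeline:
        data += vlq(tick - last_tick) + msg  # delta time, then the message
        last_tick = tick
    return track_chunk(data)


def midi_file(tracks) -> bytes:
    """Header chunk (format 1, track count, PPQN) followed by the tracks."""
    header = b"MThd" + struct.pack(">IHHH", 6, 1, len(tracks), PPQN)
    return header + b"".join(tracks)


# Demo: a 72 BPM conductor track plus one whole note on C2 (assumed note 36).
demo = midi_file([conductor_track(72), note_track([(0, 4 * PPQN, 36, 100)])])
```

Writing `demo` to disk with `open("demo.mid", "wb")` should give a file that Reaper or any other DAW can import.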

Step 4: Automate. 8 continuous MIDI CC parameters control effects in real time, including filter cutoff, reverb, delay feedback, EQ, and stereo width. They are sent via the macOS IAC Driver at 32nd-note resolution.
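At 480 PPQN, a 32nd note is 60 ticks, so each automation lane gets one CC value every 60 ticks. The sketch below builds such a lane as a list of events; it only computes them and does not send anything to the IAC Driver. The helper names are hypothetical, 4/4 time is assumed, and CC 74 (conventionally "brightness", often used for filter cutoff) is an assumed assignment rather than the project's actual one.

```python
PPQN = 480
TICKS_PER_32ND = PPQN // 8  # 60 ticks per 32nd note


def scale_to_cc(x: float, lo: float, hi: float) -> int:
    """Linearly map x in [lo, hi] to a 7-bit CC value (0..127), clamped."""
    frac = (x - lo) / (hi - lo)
    return max(0, min(127, round(frac * 127)))


def cc_ramp(cc_number, start_val, end_val, start_tick, n_steps, channel=0):
    """Return (tick, status_byte, cc_number, value) tuples, one per 32nd note."""
    events = []
    for step in range(n_steps):
        frac = step / (n_steps - 1) if n_steps > 1 else 0.0
        value = round(start_val + frac * (end_val - start_val))
        tick = start_tick + step * TICKS_PER_32ND
        events.append((tick, 0xB0 | channel, cc_number, value))
    return events


# One 8-bar section in 4/4 is 8 * 32 = 256 thirty-second notes.
steps_per_section = 8 * 32

# Ramp the (assumed) cutoff CC from the week 1 to the week 2 cumulative multiplier.
week1 = scale_to_cc(1.000, 1.0, 2.139)
week2 = scale_to_cc(1.474, 1.0, 2.139)
section_events = cc_ramp(74, week1, week2, start_tick=0, n_steps=steps_per_section)
```

Each tuple converts directly into a three-byte Control Change message (status, controller, value) preceded by its delta time.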

Step 5: Mix. I loaded the MIDI file into Reaper, assigned instruments and effects per track, then mixed and mastered to a final stereo file.

The Four Musical Layers

Layer	Data Dimension	Musical Parameter	Effect
Pad	Cumulative multiplier (1.0 to 2.14)	Chord density (1 to 5 voices) + velocity	Texture thickens as the cascade grows
Bass	Week coefficient (1.0 to 0.096)	Pitch (C2 to C3, lower = stronger)	Initial shock hits deep, echoes rise
Echo	Number of active echoes per week	Fragment count (1 to 6 ascending motifs)	Space fills up as word-of-mouth spreads
Pulse	Week progression	Note interval (whole to eighth notes)	Heartbeat accelerates with tension

The piece is in C natural minor at 72 BPM. It has six sections of 8 bars each, one per weekend. It builds from sparse and quiet (week 1) to dense and urgent (week 6), mirroring the compounding cascade in the data.
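The Pad and Bass rows of the table can be sketched as simple linear mappings. This is an illustration under stated assumptions, not the project's mapping code: linear interpolation, rounding to whole voices and semitones, snapping downward to the C natural minor scale, and MIDI notes 36 and 48 for C2 and C3 (taking C4 = 60) are all my choices here.

```python
# Pitch classes of C natural minor: C, D, Eb, F, G, Ab, Bb.
C_MINOR = {0, 2, 3, 5, 7, 8, 10}

coefficients = [1.000, 0.474, 0.269, 0.188, 0.112, 0.096]
multipliers = [1.000, 1.474, 1.743, 1.931, 2.043, 2.139]


def lerp(x, x0, x1, y0, y1):
    """Map x from the range [x0, x1] onto [y0, y1] linearly."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def pad_voices(multiplier: float) -> int:
    """Cumulative multiplier (1.0 to 2.139) -> chord density (1 to 5 voices)."""
    return round(lerp(multiplier, 1.0, 2.139, 1, 5))


def snap_to_scale(pitch: int) -> int:
    """Step down by semitones until the pitch belongs to C natural minor."""
    while pitch % 12 not in C_MINOR:
        pitch -= 1
    return pitch


def bass_pitch(coefficient: float) -> int:
    """Week coefficient (1.0 to 0.096) -> MIDI pitch (36 to 48); stronger = lower."""
    return snap_to_scale(round(lerp(coefficient, 1.0, 0.096, 36, 48)))


voices_per_week = [pad_voices(m) for m in multipliers]
bass_per_week = [bass_pitch(c) for c in coefficients]
```

Because most of the decay happens between weeks 1 and 2, a linear mapping makes the bass jump early and then flatten out; a logarithmic mapping would spread the steps more evenly, at the cost of a less literal reading of the data.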

Technical Specs

Property	Value
Duration	48 bars, ~2 min 40 sec
Tempo	72 BPM
Key	C natural minor (Aeolian)
MIDI resolution	480 PPQN
Automation	8 CC parameters, 32nd-note resolution
Tools	Python, Reaper, IAC Driver
Dependencies	Zero external Python libraries for MIDI generation
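
The duration and tick counts follow directly from these specs. A quick check, assuming 4/4 time (the time signature is not stated above):

```python
BARS = 48
BEATS_PER_BAR = 4  # assumption: 4/4 time
BPM = 72
PPQN = 480

total_beats = BARS * BEATS_PER_BAR            # quarter notes in the piece
duration_s = total_beats * 60 / BPM           # seconds at the given tempo
minutes, seconds = divmod(round(duration_s), 60)

total_ticks = total_beats * PPQN              # length of the piece in MIDI ticks
cc_steps_per_lane = total_ticks // (PPQN // 8)  # one CC value per 32nd note

print(f"{minutes} min {seconds} sec, {total_ticks} ticks, "
      f"{cc_steps_per_lane} CC steps per parameter")
```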

Why This Matters

Data sonification is a growing field with applications in scientific research, accessibility (making data available to visually impaired users), financial monitoring, and creative arts. The techniques I used here (parameter mapping, multi-layer composition, automation from data) apply to any dataset.

If you have data that tells a story (and most data does), sonification can make that story felt, not just understood.

About Mulhacen Labs

I'm Barry Faassen, founder of Mulhacen Labs. I build at the intersection of software engineering, AI, and audio, with 25+ years of experience across scientific computing (Deltares), geotechnical software (Fugro), and audio plugin development (C++/JUCE). I'm based in Granada, Spain.

Have a dataset you want to hear? Book a call.