Real-valued Swin VAE · ×64 compression

High-fidelity music, compressed.

SAGE is a real-valued, Swin-based variational autoencoder (VAE) that reconstructs music at a ×64 compression ratio.

A/B comparison

Hear the difference

Same clip, every model. Switch instantly.

Benchmarks

Objective benchmarks

SAGE compared against state-of-the-art audio VAEs.

Model & efficiency

Capacity, training data and inference cost.

Reconstruction & perceptual metrics

Reconstruction fidelity and perceptual quality, on two datasets.

Semantic probing metrics

How well the frozen latent space supports downstream semantic tasks, on two datasets.

Subjective evaluation

Take the listening test

A short MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor) listening test.