Real-valued Swin VAE · ×64 compression
SAGE is a Swin-based variational autoencoder that reconstructs music at a ×64 compression rate.
A/B comparison
Same clip, every model.
Benchmarks
SAGE against state-of-the-art audio VAEs.
Capacity, training data and inference cost.
Reconstruction fidelity and perceptual quality, on two datasets.
How well the frozen latent space supports downstream semantic tasks, on two datasets.
Subjective evaluation