Real-valued Swin VAE · ×64 compression
SAGE is a Swin-based variational autoencoder that reconstructs music at a ×64 compression rate.
A/B comparison
Same clip, every model. Switch instantly.
Benchmarks
Reconstruction fidelity and perceptual quality against state-of-the-art audio VAEs. Final numbers land with the last training run.
How well the frozen latent space supports downstream semantic tasks. Final numbers land with the last training run.
Subjective evaluation