ICML 2025 review response

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Dataset Bias

Table A. Reconstruction quality (MSE and R²) on new datasets
Fig A. Zero-shot concept visualizations on the DTD dataset (USAE trained on ImageNet)
Fig B. Zero-shot concept visualizations on the CelebA dataset (USAE trained on ImageNet)
Fig C. Firing metrics on the DTD and CelebA datasets
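For reference, the two reconstruction metrics in Table A are conventionally defined as below. This is a minimal sketch, assuming activations are given as lists of rows (samples by feature dimensions) and that R² is 1 minus the residual sum of squares over the total variance around each dimension's mean. The function names are hypothetical, and the exact averaging used for Table A may differ.

```python
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = feature dimensions


def reconstruction_mse(x: Matrix, x_hat: Matrix) -> float:
    """Mean squared error averaged over all samples and all feature dimensions."""
    n, d = len(x), len(x[0])
    total = sum((x[i][j] - x_hat[i][j]) ** 2 for i in range(n) for j in range(d))
    return total / (n * d)


def reconstruction_r2(x: Matrix, x_hat: Matrix) -> float:
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the per-dimension mean of x."""
    n, d = len(x), len(x[0])
    means = [sum(x[i][j] for i in range(n)) / n for j in range(d)]
    ss_res = sum((x[i][j] - x_hat[i][j]) ** 2 for i in range(n) for j in range(d))
    ss_tot = sum((x[i][j] - means[j]) ** 2 for i in range(n) for j in range(d))
    return 1.0 - ss_res / ss_tot


x = [[1.0, 2.0], [3.0, 4.0]]
x_hat = [[1.0, 2.0], [3.0, 5.0]]
print(reconstruction_mse(x, x_hat), reconstruction_r2(x, x_hat))
```

A perfect reconstruction gives MSE 0 and R² 1. Unlike MSE, R² is normalized by the variance of the inputs, which makes it easier to compare across datasets whose activation scales differ.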

Dictionary Size

Fig D. Firing metrics across dictionary sizes
Table B. Cosine similarity between USAE dictionaries of varying sizes
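One common way to compare dictionaries of different sizes is to match each atom of the smaller dictionary to its most similar atom in the larger one and average those best-match cosine similarities. The sketch below illustrates that procedure. It is an assumption about how such a comparison can be done, not necessarily the procedure behind Table B, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Dictionary = List[List[float]]  # each inner list is one dictionary atom (concept direction)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match_similarities(small: Dictionary, large: Dictionary) -> List[float]:
    """For each atom in `small`, the highest cosine similarity to any atom in `large`."""
    return [max(cosine(a, b) for b in large) for a in small]


def mean_best_match(small: Dictionary, large: Dictionary) -> float:
    """Average best-match similarity; 1.0 means every small atom has an exact direction match."""
    scores = best_match_similarities(small, large)
    return sum(scores) / len(scores)


small = [[1.0, 0.0], [0.0, 1.0]]
large = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
print(mean_best_match(small, large))
```

Matching is many-to-one here, so two small-dictionary atoms may share the same best match. A one-to-one assignment (e.g. Hungarian matching) is a stricter alternative when the dictionaries are the same size.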

Model Subset USAEs

Fig E. Firing metrics for 2-model USAEs

Dictionary Stability

Fig F. Concept stability across 3-model USAE training runs
Fig G. Concept stability vs. concept importance across a pair of training runs

Failure Cases

Fig H. Failure cases: entangled or noisy activations