GitHub repository: https://github.com/torchdr/torchdr/
Documentation: https://torchdr.github.io/dev/
TorchDR is an open-source dimensionality reduction (DR) library built on PyTorch. Its goal is to accelerate the development of new DR methods by providing a common, simplified framework.
DR aims to construct a low-dimensional representation (or embedding) of an input dataset that best preserves its geometry, encoded via a pairwise affinity matrix. To this end, DR methods optimize the embedding such that its associated pairwise affinity matches the input affinity. TorchDR provides a general framework for solving problems of this form: defining a DR algorithm simply requires choosing or implementing an Affinity object for both the input and the embedding, as well as an objective function.
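For illustration, here is a minimal sketch of this template. It assumes the AffinityMatcher class with affinity_in / affinity_out parameters and the affinity class names described in the documentation; exact names may differ between versions.

```python
# Hedged sketch: AffinityMatcher, affinity_in / affinity_out, and the
# affinity class names are assumptions taken from the TorchDR docs;
# check the API reference for your installed version.
import torch
from torchdr import AffinityMatcher, NormalizedGaussianAffinity, StudentAffinity

x = torch.randn(500, 20)

# A t-SNE-like method: Gaussian affinity on the input, Student affinity
# on the embedding, matched through the framework's objective.
model = AffinityMatcher(
    n_components=2,
    affinity_in=NormalizedGaussianAffinity(),
    affinity_out=StudentAffinity(),
)
z = model.fit_transform(x)  # (500, 2) embedding
```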
Benefits of TorchDR include:
| Benefit | Description |
| --- | --- |
| Modularity | All of it is written in Python in a highly modular way, making it easy to create or transform components. |
| Speed | Supports GPU acceleration, as well as sparsity and batching strategies from contrastive learning techniques. |
| Memory efficiency | Relies on KeOps [19] symbolic tensors to avoid memory overflows. |
| Compatibility | Implemented methods are fully compatible with the scikit-learn [21] API and the torch [20] ecosystem. |
TorchDR offers a user-friendly API similar to scikit-learn. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, ensuring that the output matches the type and backend of the input.
```python
from sklearn.datasets import fetch_openml
from torchdr import PCA, TSNE

# Load MNIST as a float32 NumPy array (as_frame=False returns an
# ndarray rather than a pandas DataFrame, matching the NumPy/PyTorch
# inputs described above).
x = fetch_openml("mnist_784", as_frame=False).data.astype("float32")

# PCA to 50 dimensions, then a 2D t-SNE embedding.
x_ = PCA(n_components=50).fit_transform(x)
z = TSNE(perplexity=30).fit_transform(x_)
```
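Since the input here is a NumPy array, the embedding z is returned as a NumPy array. Passing a PyTorch tensor instead yields a tensor on the same backend; a minimal sketch:

```python
import torch
from torchdr import TSNE

x_t = torch.randn(1000, 50)  # PyTorch tensor input
z_t = TSNE(perplexity=30).fit_transform(x_t)
assert isinstance(z_t, torch.Tensor)  # output type matches the input type
```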
TorchDR enables GPU acceleration without memory limitations thanks to the KeOps library. This can be enabled as follows:

```python
z_gpu = TSNE(perplexity=30, device="cuda", keops=True).fit_transform(x_)
```
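Under the hood, keops=True runs the pairwise computations on KeOps symbolic tensors, so the full pairwise affinity matrix is never materialized in GPU memory [19].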
For additional examples, visit the examples directory.
TorchDR features a wide range of affinity matrices that can be used as building blocks for DR algorithms (see the sketch after the list below). It includes:
- Usual affinities such as scalar product, Gaussian, and Student kernels.
- Self-tuning affinities [22].
- Doubly stochastic affinities with entropic [5] [6] [7] [16] and quadratic [10] projections.
- Adaptive affinities with entropy control [1] [4] and their symmetric version [3].
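These affinities follow the same scikit-learn-style interface as the estimators. A minimal sketch, assuming the EntropicAffinity class and a fit_transform method as suggested by the documentation:

```python
# Hedged sketch: the class name EntropicAffinity and its fit_transform
# method are assumptions based on the TorchDR documentation; check the
# API reference for your installed version.
import torch
from torchdr import EntropicAffinity

x = torch.randn(100, 10)
affinity = EntropicAffinity(perplexity=30)  # entropy-controlled affinity [1][4]
P = affinity.fit_transform(x)  # pairwise affinity matrix of x
```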
Spectral. TorchDR provides spectral embeddings calculated via eigenvalue decomposition of the affinities or their Laplacian.
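The principle can be illustrated in a few lines of generic PyTorch (a sketch of the idea, not TorchDR's implementation):

```python
import torch

def laplacian_embedding(A: torch.Tensor, n_components: int = 2) -> torch.Tensor:
    """Embed points via the bottom eigenvectors of the graph Laplacian of a
    symmetric affinity matrix A."""
    d = A.sum(dim=1)
    L = torch.diag(d) - A              # unnormalized graph Laplacian
    _, eigvecs = torch.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the constant eigenvector (eigenvalue 0) and keep the next ones.
    return eigvecs[:, 1 : n_components + 1]
```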
Neighbor Embedding. TorchDR includes various neighbor embedding methods such as SNE [1], t-SNE [2], SNEkhorn / t-SNEkhorn [3], UMAP [8], LargeVis [13], and InfoTSNE [15].
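These estimators share the fit_transform interface shown above; for instance (the UMAP class name mirrors the method list above, and the n_neighbors parameter is assumed to follow the usual UMAP parameterization):

```python
import numpy as np
from torchdr import UMAP

x = np.random.randn(1000, 50).astype("float32")
z = UMAP(n_neighbors=15).fit_transform(x)  # 2D embedding by default (assumed)
```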
If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.
If you use TorchDR in your research, please cite the following reference:
Van Assel H., Courty N., Flamary R., Garivier A., Massias M., Vayer T., Vincent-Cuaz C. (2024). TorchDR. URL: https://torchdr.github.io/
or in BibTeX format:
```bibtex
@misc{vanassel2024torchdr,
  author = {Van Assel, Hugues and Courty, Nicolas and Flamary, Rémi and Garivier, Aurélien and Massias, Mathurin and Vayer, Titouan and Vincent-Cuaz, Cédric},
  title  = {TorchDR},
  url    = {https://torchdr.github.io/},
  year   = {2024}
}
```
[1] Geoffrey Hinton, Sam Roweis (2002). Stochastic Neighbor Embedding. Advances in Neural Information Processing Systems 15 (NeurIPS).
[2] Laurens van der Maaten, Geoffrey Hinton (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9 (JMLR).
[3] Hugues Van Assel, Titouan Vayer, Rémi Flamary, Nicolas Courty (2023). SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities. Advances in Neural Information Processing Systems 36 (NeurIPS).
[4] Max Vladymyrov, Miguel A. Carreira-Perpinan (2013). Entropic Affinities: Properties and Efficient Numerical Computation. International Conference on Machine Learning (ICML).
[5] Richard Sinkhorn, Paul Knopp (1967). Concerning Nonnegative Matrices and Doubly Stochastic Matrices. Pacific Journal of Mathematics, 21(2), 343-348.
[6] Marco Cuturi (2013). Sinkhorn Distances: Lightspeed Computation of Optimal Transport. Advances in Neural Information Processing Systems 26 (NeurIPS).
[7] Jean Feydy, Thibault Séjourné, François-Xavier Vialard, Shun-ichi Amari, Alain Trouvé, Gabriel Peyré (2019). Interpolating between Optimal Transport and MMD using Sinkhorn Divergences. International Conference on Artificial Intelligence and Statistics (AISTATS).
[8] Leland McInnes, John Healy, James Melville (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.
[9] Yao Lu, Jukka Corander, Zhirong Yang (2019). Doubly Stochastic Neighbor Embedding on Spheres. Pattern Recognition Letters, 128, 100-106.
[10] Stephen Zhang, Gilles Mordant, Tetsuya Matsumoto, Geoffrey Schiebinger (2023). Manifold Learning with Sparse Regularised Optimal Transport. arXiv preprint.
[11] Jihun Ham, Daniel D. Lee, Sebastian Mika, Bernhard Schölkopf (2004). A Kernel View of the Dimensionality Reduction of Manifolds. International Conference on Machine Learning (ICML).
[12] Sebastian Damrich, Fred Hamprecht (2021). On UMAP's True Loss Function. Advances in Neural Information Processing Systems 34 (NeurIPS).
[13] Jian Tang, Jingzhou Liu, Ming Zhang, Qiaozhu Mei (2016). Visualizing Large-Scale and High-Dimensional Data. International Conference on World Wide Web (WWW).
[14] Aleksandr Artemenkov, Maxim Panov (2020). NCVis: Noise Contrastive Approach for Scalable Visualization. The Web Conference (WWW).
[15] Sebastian Damrich, Jan Niklas Böhm, Fred Hamprecht, Dmitry Kobak (2023). From t-SNE to UMAP with Contrastive Learning. International Conference on Learning Representations (ICLR).
[16] Boris Landa, Ronald R. Coifman, Yuval Kluger (2021). Doubly Stochastic Normalization of the Gaussian Kernel Is Robust to Heteroskedastic Noise. SIAM Journal on Mathematics of Data Science, 3(1), 388-413.
[17] Hugues Van Assel, Thibault Espinasse, Julien Chiquet, Franck Picard (2022). A Probabilistic Graph Coupling View of Dimension Reduction. Advances in Neural Information Processing Systems 35 (NeurIPS).
[18] Jan Niklas Böhm, Philipp Berens, Dmitry Kobak (2022). Attraction-Repulsion Spectrum in Neighbor Embeddings. Journal of Machine Learning Research, 23 (JMLR).
[19] Benjamin Charlier, Jean Feydy, Joan Alexis Glaunès, François-David Collin, Ghislain Durif (2021). Kernel Operations on the GPU, with Autodiff, without Memory Overflows. Journal of Machine Learning Research, 22 (JMLR).
[20] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems 32 (NeurIPS).
[21] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12 (JMLR).
[22] Lihi Zelnik-Manor, Pietro Perona (2004). Self-Tuning Spectral Clustering. Advances in Neural Information Processing Systems 17 (NeurIPS).