ASTRA-sim is a distributed machine learning system simulator developed by Intel, Meta, and Georgia Tech. It enables the systematic study of challenges in modern deep learning systems, allowing for the exploration of bottlenecks and the development of efficient methodologies for large DNN models across diverse future platforms.
The previous version, ASTRA-sim 1.0, is available in the ASTRA-sim-1.0 branch.
Here is a concise visual summary of our simulator:
For a comprehensive understanding of the tool, and to gain insights into its capabilities, please visit our website.
For information on how to use ASTRA-sim, please visit our Wiki.
We are constantly working to improve ASTRA-sim and expand its capabilities. Here are some of the features that are currently under active development:
- Network Backends
  - Garnet (for chiplet fabrics)
- Detailed Statistics Report (Network Utilization)
These features are still in progress and their completion timelines may vary, so check back regularly for updates. ASTRA-sim is an open-source project, and we welcome pull requests from the community for features you have added.
We appreciate your interest and support in ASTRA-sim!
For any questions about using ASTRA-sim, you can email the ASTRA-sim User Mailing List: [email protected]
To join the mailing list, please fill out the following form: https://forms.gle/18KVS99SG3k9CGXm6