aws-neuron · aws-trsharma · Sep 14, 2024
@@ -42,6 +42,7 @@ New in this release
 Bug Fixes
 ^^^^^^^^^
 * Fixed compatibility issues for the Linux 6.3 kernel
+* Resolved an issue where the device reset handling code did not properly check the failure metric
 
 
 Neuron Driver release [2.16.7.0]

@@ -31,6 +31,19 @@ NEFF Version Runtime Version Range Notes
 2.0          >= 1.6.5.0            Starting support for 2.0 NEFFs 
 ============ ===================== ===================================
 
+Neuron Runtime Library [2.22.14.0]
+---------------------------------
+Date: 09/16/2024
+
+New in this release
+^^^^^^^^^^^^^^^^^^^
+* Improved the inter-node mesh algorithm to scale better for larger numbers of nodes and larger allreduce problem sizes
+
+Bug fixes
+^^^^^^^^^
+* Implemented a fix that differentiates between out-of-memory (OOM) conditions occurring on the host system and those occurring on the device
+* Resolved a performance issue with transpose operations, which was caused by an uneven distribution of work across DMA engines
+
 Neuron Runtime Library [2.21.41.0]
 ---------------------------------
 Date: 07/03/2024

@@ -161,7 +161,18 @@ Description for Each Field
 
   * ``device_mem/``: The amount of memory that Neuron Runtime uses for weights, instructions and DMA rings.
 
-    * This device memory per NeuronCore is further categorized into five types: ``constants/``, ``model_code/``, ``model_shared_scratchpad/``, ``runtime_memory/``, and ``tensors/``. Definitions for these categories can be found in the :ref:`Device Used Memory <neuron_top_device_mem_usage>` section.  Each of these categories has total, present, and peak.
+    * This device memory per NeuronCore is further categorized into eleven types: ``collectives/``, ``constants/``, ``dma_rings/``, ``driver_memory/``, ``model_code/``, ``model_shared_scratchpad/``, ``nonshared_scratchpad/``, ``notifications/``, ``runtime_memory/``, ``tensors/``, and ``uncategorized/``. Each of these categories has total, present, and peak.
+        * ``collectives`` - amount of device memory used for collective communication between workers
+        * ``constants`` - amount of device memory used for constants (for applications running training) or weights (for applications running inference)
+        * ``dma_rings`` - amount of device memory used for storing model executable code used for data movement
+        * ``driver_memory`` - amount of device memory used by the Neuron Driver
+        * ``model_code`` - amount of device memory used for storing model executable code
+        * ``model_shared_scratchpad`` - amount of device memory used for the shared model scratchpad, a buffer shared between models on the same NeuronCore and used for internal model variables and other auxiliary buffers
+        * ``nonshared_scratchpad`` - amount of device memory used for the non-shared model scratchpad, a buffer used by a single model for internal model variables and other auxiliary buffers
+        * ``notifications`` - amount of device memory used to store instruction-level trace information used to profile workloads run on the device
+        * ``runtime_memory`` - amount of device memory used by the Neuron Runtime (outside of the previous categories)
+        * ``tensors`` - amount of device memory used for tensors
+        * ``uncategorized`` - amount of device memory that does not belong in any other category in this list
 
   * ``host_mem/``: The amount of memory that Neuron Runtime uses for input and output tensors.
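
As a rough illustration of how the ``device_mem/`` categories listed above might be consumed, the following Python sketch walks a ``device_mem/`` directory and collects the total, present, and peak values for each category. It is not part of the documented tooling. It assumes that each category directory contains plain-text files named ``total``, ``present``, and ``peak``, each holding a single integer; that file layout, the ``read_device_mem`` helper name, and the directory path passed to it are all hypothetical and should be checked against the actual sysfs tree on the instance.

```python
from pathlib import Path

# Category names taken from the list in the documentation above.
CATEGORIES = (
    "collectives", "constants", "dma_rings", "driver_memory",
    "model_code", "model_shared_scratchpad", "nonshared_scratchpad",
    "notifications", "runtime_memory", "tensors", "uncategorized",
)

# Assumption: each counter is exposed as a file holding one integer.
COUNTERS = ("total", "present", "peak")


def read_device_mem(device_mem_dir):
    """Return {category: {counter: int or None}} for one device_mem/ directory.

    Categories whose directory is missing are skipped. Counters that are
    missing or not parseable as integers are reported as None.
    """
    root = Path(device_mem_dir)
    usage = {}
    for category in CATEGORIES:
        category_dir = root / category
        if not category_dir.is_dir():
            continue
        values = {}
        for counter in COUNTERS:
            try:
                values[counter] = int((category_dir / counter).read_text().strip())
            except (OSError, ValueError):
                values[counter] = None
        usage[category] = values
    return usage
```

Returning ``None`` for an unreadable counter, rather than raising, keeps one missing file from hiding the remaining categories. A caller could sum the ``present`` values across categories to approximate the current device memory footprint of a NeuronCore.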