ROCm 6.3.1 Release
ROCm 6.3.1 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.
Release highlights
The following are notable new features and improvements in ROCm 6.3.1. For changes to individual components, see
Detailed component changes.
Per queue resiliency for Instinct MI300 accelerators
The AMDGPU driver now includes enhanced resiliency for misbehaving applications on AMD Instinct MI300 accelerators. This helps isolate the impact of misbehaving applications, ensuring other workloads running on the same accelerator are unaffected.
ROCm Runfile Installer
ROCm 6.3.1 introduces the ROCm Runfile Installer, with initial support for Ubuntu 22.04. The ROCm Runfile Installer facilitates ROCm installation without using a native Linux package management system, with or without network or internet access. For more information, see the ROCm Runfile Installer documentation.
ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.
- Added documentation on training a model with ROCm Megatron-LM. AMD offers a Docker image for MI300X accelerators containing essential components to get started, including ROCm libraries, PyTorch, and Megatron-LM utilities. See Training a model using ROCm Megatron-LM to get started. The new ROCm Megatron-LM training Docker accompanies the ROCm vLLM inference Docker as a set of ready-to-use containerized solutions for getting started with ROCm for AI.
- Updated the Instinct MI300X workload tuning guide with more current optimization strategies. The updated sections include guidance on vLLM optimization, PyTorch TunableOp, and hipBLASLt tuning.
- Added a topic that shows whether each ROCm library is graph-safe. HIP graph-safe libraries operate safely in HIP execution graphs. HIP graphs are an alternative way of executing tasks on a GPU that can provide performance benefits over launching kernels using the standard method via streams.
- Updated the Device memory topic in the HIP memory management section.
- Expanded the HIP documentation with new resources for developers.
Operating system and hardware support changes
ROCm 6.3.1 adds support for Debian 12 (kernel: 6.1). Debian is supported only on AMD Instinct accelerators. See the installation instructions at Debian native installation.
ROCm 6.3.1 enables support for AMD Instinct MI325X accelerator. For more information, see AMD Instinct™ MI325X Accelerators.
See the Compatibility matrix for more information about operating system and hardware compatibility.
ROCm components
The following table lists the versions of ROCm components for ROCm 6.3.1, including any version
changes from 6.3.0 to 6.3.1. Click the component's updated version to go to a list of its changes.
Click the GitHub icon to go to the component's source code on GitHub.
| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.11.0 |
| | | MIOpen | 3.3.0 |
| | | MIVisionX | 3.1.0 ⇒ 3.1.0 |
| | | rocAL | 2.1.0 |
| | | rocDecode | 0.8.0 |
| | | rocJPEG | 0.6.0 |
| | | rocPyDecode | 0.2.0 |
| | | RPP | 1.9.1 |
| | Communication | RCCL | 2.21.5 ⇒ 2.21.5 |
| | Math | hipBLAS | 2.3.0 |
| | | hipBLASLt | 0.10.0 |
| | | hipFFT | 1.0.17 |
ROCm 6.3.0 Release
ROCm 6.3.0 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon
GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.
Release highlights
The following are notable new features and improvements in ROCm 6.3.0. For changes to individual components, see
Detailed component changes.
rocJPEG added
ROCm 6.3.0 introduces the rocJPEG library to the ROCm software stack. rocJPEG is a high performance
JPEG decode SDK for AMD GPUs. For more information, see the rocJPEG
documentation.
ROCm Compute Profiler and ROCm Systems Profiler
These ROCm components have been renamed to reflect their new direction as part of the ROCm software
stack.
- ROCm Compute Profiler, formerly Omniperf. For more information, see the ROCm Compute Profiler documentation and /~https://github.com/ROCm/rocprofiler-compute on GitHub.
- ROCm Systems Profiler, formerly Omnitrace. For more information, see the ROCm Systems Profiler documentation and /~https://github.com/ROCm/rocprofiler-systems on GitHub.

For future compatibility, the Omnitrace project is available at /~https://github.com/ROCm/omnitrace. See the Omnitrace documentation.

Update any references to the old binary names `omniperf` and `omnitrace` to the new `rocprof-compute` and `rocprof-sys-*` binaries. This might include updating environment variables, commands, and paths as needed to avoid disruptions to your profiling or tracing workflows. See [ROCm Compute Profiler 3.0.0](#rocm-compute-profiler-3-0-0) and [ROCm Systems Profiler 0.1.0](#rocm-systems-profiler-0-1-0).
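As a transition aid, a script can probe for the new binary names and fall back to the old ones. This is a sketch, not an official tool; it assumes the renamed binaries keep the legacy command lines, and `rocprof-sys-run` is the assumed name of the renamed `omnitrace` launcher:

```shell
# Prefer the renamed binaries; fall back to the legacy names when only an
# older ROCm is installed. Prints which tool (if any) was found on PATH.
PROFILER=$(command -v rocprof-compute || command -v omniperf || true)
TRACER=$(command -v rocprof-sys-run || command -v omnitrace || true)
echo "profiler: ${PROFILER:-not found}"
echo "tracer: ${TRACER:-not found}"
```

On a machine without either generation installed, both lines report `not found`, which makes the script safe to drop into CI before the real profiling step.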
SHARK AI toolkit for high-speed inferencing and serving introduced
SHARK is an open-source toolkit for high-performance serving of popular generative AI and large
language models. In its initial release, SHARK contains the Shortfin high-performance serving
engine, which is the SHARK inferencing
library that includes example server applications for popular models.
This initial release includes support for serving the Stable Diffusion XL model on AMD Instinct™
MI300 devices using ROCm. See the SHARK release
page on GitHub to get started.
PyTorch 2.4 support added
ROCm 6.3.0 adds support for PyTorch 2.4. See the Compatibility
matrix
for the complete list of PyTorch versions tested for compatibility with ROCm.
Flash Attention kernels in Triton and Composable Kernel (CK) added to Transformer Engine
Composable Kernel-based and Triton-based Flash Attention kernels have been integrated into
Transformer Engine via the ROCm Composable Kernel and AOTriton libraries. The
Transformer Engine can now optionally select a flexible and optimized Attention
solution for AMD GPUs. For more information, see Fused Attention Backends on
ROCm
on GitHub.
HIP compatibility
HIP now includes the `hipStreamLegacy` API. It's equivalent to NVIDIA `cudaStreamLegacy`. For more information, see Global enum and defines in the HIP runtime API documentation.
Unload active amdgpu-dkms module without a system reboot
On Instinct MI200 and MI300 systems, you can now unload the active `amdgpu-dkms` modules, then reinstall and reload newer modules without a system reboot. If the new `dkms` package includes newer firmware components, the driver first resets the device and then loads the newer firmware.
ROCm Offline Installer Creator updates
The ROCm Offline Installer Creator 6.3 introduces a new feature to uninstall the previous version of
ROCm on the non-connected target system before installing a new version. This feature is only supported
on the Ubuntu distribution. See the ROCm Offline Installer
Creator
documentation for more information.
OpenCL ICD loader separated from ROCm
The OpenCL ICD loader is no longer delivered as part of ROCm, and must be installed separately as part of the ROCm installation process. For Ubuntu and RHEL installations, the required package is installed as part of the setup described in Prerequisites. In other supported Linux distributions like SUSE, the required package must be installed in separate steps, which are included in the installation instructions.
Because the OpenCL path is now separate from the ROCm installation for versioned and multi-version installations, you must manually define `LD_LIBRARY_PATH` to point to the ROCm installation library as described in the Post-installation instructions.
If `LD_LIBRARY_PATH` is not set as needed for versioned or multi-version installations, OpenCL applications like `clinfo` will fail to run and return an error.
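For a versioned install, the post-installation step amounts to prepending the versioned ROCm library directory to `LD_LIBRARY_PATH`. A sketch, assuming ROCm 6.3.0 installed at `/opt/rocm-6.3.0`:

```shell
# Prepend the versioned ROCm library path (adjust to your installed version):
ROCM_LIB=/opt/rocm-6.3.0/lib
export LD_LIBRARY_PATH="${ROCM_LIB}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
# clinfo should now find the ROCm OpenCL ICD instead of returning an error:
command -v clinfo >/dev/null 2>&1 && clinfo -l || true
```

The `${LD_LIBRARY_PATH:+:...}` expansion avoids a trailing colon when the variable was previously unset, which would otherwise add the current directory to the search path.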
ROCT Thunk Interface integrated into ROCr runtime
The ROCT Thunk Interface package is now integrated into the ROCr runtime. As a result, the ROCT package
is no longer included as a separate package in the ROCm software stack.
ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a
wider variety of user needs and use cases.
- Documentation for Tensile is now available. Tensile is a library that creates benchmark-driven backend implementations for GEMMs, serving primarily as a backend component of rocBLAS. See the Tensile documentation.
- New documentation has been added to explain the advantages of enabling the IOMMU in passthrough mode for Instinct accelerators and Radeon GPUs. See Input-Output Memory Management Unit.
- The HIP documentation has been updated with several new topics.
- Several existing HIP documentation topics have been updated.
- Several HIP documentation topics have been reorganized to improve usability.
Operating system and hardware support changes
ROCm 6.3.0 adds support for the following operating system and kernel versions:
- Ubuntu 24.04.2 (kernel: 6.8 [GA], 6.11 [HWE])
- Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE])
- RHEL 9.5 (kernel: 5.14.0)
- Oracle Linux 8.10 (kernel: 5.15.0)
See installation instructions at ROCm installation for
Linux.
ROCm 6.3.0 marks the end of support (EoS) for:
- Ubuntu 24.04.1
- Ubuntu 22.04.4
- RHEL 9.3
- RHEL 8.9
- Oracle Linux 8.9
Hardware support r...
ROCm 6.2.4 Release
ROCm 6.2.4 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon
GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.
Release highlights
The following are notable new features and improvements in ROCm 6.2.4. For changes to individual components, see
Detailed component changes.
ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for
a wider variety of user needs and use cases.
- Added a new GPU cluster networking guide. See Cluster network performance validation for AMD Instinct accelerators. This documentation provides guidelines on validating network configurations in single-node and multi-node environments to attain optimal speed and bandwidth in AMD Instinct-powered clusters.
- Updated the HIP runtime documentation:
  - Added a new section on how to use HIP graphs.
  - Added a new section about the Stream ordered memory allocator (SOMA).
  - Updated the Porting CUDA driver API section.
- Updated the Post-installation instructions with guidance on using the `update-alternatives` utility and environment modules to help you manage multiple ROCm versions and streamline PATH configuration.
- Updated the LLM inference performance validation on AMD Instinct MI300X documentation with more detailed guidance, new models, and the `float8` data type.
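The `update-alternatives` workflow for managing multiple ROCm versions can be sketched as follows; the alternative name `rocm` is an assumption here, so check the Post-installation instructions for the exact names registered on your system:

```shell
# List the ROCm versions registered with update-alternatives (if any):
update-alternatives --list rocm 2>/dev/null || echo "no 'rocm' alternative registered"
# Interactively choose the default ROCm version (requires root):
# sudo update-alternatives --config rocm
```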
Operating system and hardware support changes
ROCm 6.2.4 adds support for the AMD Radeon PRO V710 GPU for compute workloads. See
Supported GPUs
for more information.
This release maintains the same operating system support as 6.2.2.
ROCm components
The following table lists the versions of ROCm components for ROCm 6.2.4, including any version changes from 6.2.2 to 6.2.4.
Click the component's updated version to go to a detailed list of its changes. Click the GitHub icon to go to the component's source code on GitHub.
| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.10 |
| | | MIOpen | 3.2.0 |
| | | MIVisionX | 3.0.0 |
| | | rocAL | 2.0.0 |
| | | rocDecode | 0.6.0 |
| | | rocPyDecode | 0.1.0 |
| | | RPP | 1.8.0 |
| | Communication | RCCL | 2.20.5 |
| | Math | hipBLAS | 2.2.0 |
| | | hipBLASLt | 0.8.0 |
| | | hipFFT | 1.0.15 ⇒ 1.0.16 |
| | | hipfort | 0.4.0 |
| | | hipRAND | 2.11.0 ⇒ 2.11.1 |
| | | hipSOLVER | 2.2.0 |
| | | hipSPARSE | 3.1.1 |
| | | hipSPARSELt | 0.2.1 |
ROCm 6.2.2 Release
ROCm 6.2.2 release notes
These release notes provide a summary of notable changes since the previous ROCm release.
As ROCm 6.2.2 was released shortly after 6.2.1, the changes between these versions
are minimal. For a comprehensive overview of recent updates,
refer to the ROCm 6.2.1 release notes.
The Compatibility matrix
provides the full list of supported hardware, operating systems, ecosystems, third-party components, and ROCm components
for each ROCm release.
Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the ROCm documentation release history.
Release highlights
The following is a significant fix introduced in ROCm 6.2.2.
Fixed Instinct MI300X error recovery failure
Improved the reliability of AMD Instinct MI300X accelerators in scenarios involving
uncorrectable errors. Previously, error recovery did not occur as expected,
potentially leaving the system in an undefined state. This fix ensures that error
recovery functions as expected, maintaining system stability.
See the original issue noted in the ROCm 6.2.1 release notes.
ROCm 6.2.1 Release
ROCm 6.2.1 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
The Compatibility matrix
provides the full list of supported hardware, operating systems, ecosystems, third-party components, and ROCm components for each ROCm release.
Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the ROCm documentation release history.
Release highlights
The following are notable new features and improvements in ROCm 6.2.1. For changes to individual components, see Detailed component changes.
rocAL major version change
The new version of rocAL introduces many new features, but does not modify any of the existing public API functions. However, the version number was incremented from 1.3 to 2.0.
Applications linked to version 1.3 must be recompiled to link against version 2.0.
See the rocAL detailed changes for more information.
New support for FBGEMM (Facebook General Matrix Multiplication)
As of ROCm 6.2.1, ROCm supports Facebook General Matrix Multiplication (FBGEMM) and the related FBGEMM_GPU library.
FBGEMM is a low-precision, high-performance CPU kernel library for convolution and matrix multiplication. It is used for server-side inference and as a back end for PyTorch quantized operators. FBGEMM_GPU includes a collection of PyTorch GPU operator libraries for training and inference. For more information, see the ROCm Model acceleration libraries guide
and PyTorch's FBGEMM GitHub repository.
ROCm Offline Installer Creator changes
The ROCm Offline Installer Creator 6.2.1 introduces several new features and improvements including:
- Logging support for create and install logs
- More stringent checks for Linux versions and distributions
- Updated prerequisite repositories
- Fixed CTest issues
ROCm documentation changes
- The Programming Model Reference and Understanding the Programming Model topics in HIP have been consolidated into one topic, HIP programming model (conceptual).
- The HIP virtual memory management and HIP virtual memory management API topics have been added.
The ROCm documentation, like all ROCm projects, is open source and available on GitHub. To contribute to ROCm documentation, see the [ROCm documentation contribution guidelines](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).
Operating system and hardware support changes
There are no changes to supported hardware or operating systems from ROCm 6.2.0 to ROCm 6.2.1.
See the Compatibility matrix for the full list of supported operating systems and hardware architectures.
ROCm components
The following table lists the versions of ROCm components for ROCm 6.2.1, including any version changes from 6.2.0 to 6.2.1.
Click the component's updated version to go to a detailed list of its changes. Click the GitHub icon to go to the component's source code on GitHub.
| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.10 |
| | | MIOpen | 3.2.0 |
| | | MIVisionX | 3.0.0 |
| | | rocAL | 1.0.0 ⇒ 2.0.0 |
| | | rocDecode | 0.6.0 |
| | | rocPyDecode | 0.1.0 |
| | | RPP | 1.8.0 |
| | Communication | RCCL | 2.20.5 ⇒ 2.20.5 |
| | Math | hipBLAS | 2.2.0 |
| | | hipBLASLt | 0.8.0 |
| | | hipFFT | 1.0.15 |
| | | hipfort | 0.4.0 |
| | | hipRAND | 2.11.0 |
| | | hipSOLVER | 2.2.0 |
| | | hipSPARSE | 3.1.1 |
| | | hipSPARSELt | 0.2.1 |
| | | rocALUTION | 3.2.0 |
| | | rocBLAS | 4.1.2 ⇒ 4.2.1 |
| | | rocFFT | 1.0.28 ⇒ 1.0.29 |
| | | rocRAND | 3.1.0 |
| | | rocSOLVER | 3.26.0 |
| | | rocSPARSE | 3.2.0 |
| | | rocWMMA | 1.5.0 |
| | | Tensile | 4.41.0 |
ROCm 6.2.0 Release
ROCm 6.2.0 release notes
The release notes provide a comprehensive summary of changes since the previous ROCm release.
- Release highlights
- Operating system and hardware support changes
- ROCm components versioning
- Detailed component changes
- ROCm known issues
- ROCm upcoming changes
The Compatibility matrix
provides an overview of operating system, hardware, ecosystem, and ROCm component support across ROCm releases.
Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the ROCm documentation release history.
Release highlights
This section introduces notable new features and improvements in ROCm 6.2. See the
Detailed component changes for individual component changes.
New components
ROCm 6.2.0 introduces the following new components to the ROCm software stack.
- Omniperf -- A kernel-level profiling tool for machine learning and high-performance computing (HPC) workloads running on AMD Instinct accelerators. Omniperf offers comprehensive profiling and advanced analysis via command line or a GUI dashboard. For more information, see Omniperf.
- Omnitrace -- A multi-purpose analysis tool for profiling and tracing applications running on the CPU or the CPU and GPU. It supports dynamic binary instrumentation, call-stack sampling, causal profiling, and other features for determining which function and line number are executing. For more information, see Omnitrace.
- rocPyDecode -- A tool to access rocDecode APIs in Python. It connects Python and C/C++ libraries, enabling function calling and data passing between the two languages. The `rocpydecode.so` library, a wrapper, uses rocDecode APIs written primarily in C/C++ within Python. For more information, see rocPyDecode.
- ROCprofiler-SDK -- A profiling and tracing library for HIP and ROCm applications on AMD ROCm software, used to identify application performance bottlenecks and optimize performance. The new APIs add restrictions for more efficient implementations and improved thread safety. A new window restriction specifies the services the tool can use. ROCprofiler-SDK also provides a tool library to help you write your own tool implementations. `rocprofv3` uses this tool library to profile and trace applications for performance bottlenecks. Examples include API tracing and kernel tracing. For more information, see ROCprofiler-SDK. ROCprofiler-SDK for ROCm 6.2.0 is a beta release and subject to change.
ROCm Offline Installer Creator introduced
The new ROCm Offline Installer Creator creates an installation package for a preconfigured setup of ROCm, the AMDGPU
driver, or a combination of the two on a target system without network access. This new tool customizes
multiple unique configurations for use when installing ROCm on a target. Other notable features include:
- A lightweight, easy-to-use user interface for configuring the creation of the installer
- Support for multiple Linux distributions
- Installer support for different ROCm releases and specific ROCm components
- Optional driver or driver-only installer creation
- Optional post-install preferences
- Lightweight installer packages, which are unique to the preconfigured ROCm setup
- Resolution and inclusion of dependency packages for offline installation
For more information, see
ROCm Offline Installer Creator.
Math libraries default to Clang instead of HIPCC
The default compiler used to build the math libraries on Linux changes from `hipcc` to `amdclang++`. Appropriate compiler flags are added to ensure these compilations build correctly. This change only applies when building the libraries. Applications using the libraries can continue to be compiled using `hipcc` or `amdclang++` as described in ROCm compiler reference.
The math libraries can also be built with `hipcc` using any of the previously available methods (for example, the `CXX` environment variable or the `CMAKE_CXX_COMPILER` CMake variable). This change shouldn't affect performance or functionality.
Framework and library changes
This section highlights updates to supported deep learning frameworks and notable third-party library optimizations.
Additional PyTorch and TensorFlow support
ROCm 6.2.0 supports PyTorch versions 2.2 and 2.3 and TensorFlow version 2.16.
See Installing PyTorch for ROCm
and Installing TensorFlow for ROCm
for installation instructions.
Refer to the
Third-party support matrix
for a comprehensive list of third-party frameworks and libraries supported by ROCm.
Optimized framework support for OpenXLA
PyTorch for ROCm and TensorFlow for ROCm now provide native support for OpenXLA. OpenXLA is an open-source ML compiler
ecosystem that enables developers to compile and optimize models from all leading ML frameworks. For more information, see
Installing PyTorch for ROCm
and Installing TensorFlow for ROCm.
PyTorch support for Autocast (automatic mixed precision)
PyTorch now supports Autocast for recurrent neural networks (RNNs) on ROCm. This can help to reduce computational workloads and improve performance. Based on information about the magnitude of values, Autocast can substitute the original `float32` linear layers and convolutions with their `float16` or `bfloat16` variants. For more information, see Automatic mixed precision.
Memory savings for bitsandbytes model quantization
The ROCm-aware bitsandbytes library is a lightweight Python wrapper around HIP
custom functions, in particular 8-bit optimizer, matrix multiplication, and 8-bit and 4-bit quantization functions.
ROCm 6.2.0 introduces the following bitsandbytes changes:
- `Int8` matrix multiplication is enabled, and it includes the following functions:
  - `extract-outliers` – extracts rows and columns that have outliers in the inputs. They're later used for matrix multiplication without quantization.
  - `transform` – row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after matmul computation.
  - `igemmlt` – a new function for GEMM computation A*B^T. It uses `hipblasLtMatMul` and performs 8-bit GEMM operations.
  - `dequant_mm` – dequantizes the output matrix to the original data type using scaling factors from vector-wise quantization.
- Blockwise quantization – input tensors are quantized for a fixed block size.
- 4-bit quantization and dequantization functions – normalized `Float4` quantization, quantile estimation, and quantile quantization functions are enabled.
- 8-bit and 32-bit optimizers are enabled.
These functions are included in bitsandbytes. They are not part of ROCm. However, ROCm 6.2.0 has enabled the fixes and
features to run them.
For more information, see Model quantization techniques.
Improved vLLM support
ROCm 6.2.0 enhances vLLM support for inference on AMD Instinct accelerators, adding capabilities for `FP16`/`BF16` precision for LLMs, and `FP8` support for Llama.
ROCm 6.2.0 adds support for the following vLLM features:
- MP: Multi-GPU execution. Choose between MP and Ray using a flag. To set it to MP, use `--distributed-executor-backend=mp`. The default depends on the commit and is in flux.
- FP8 KV cache: Enhances computational efficiency and performance by significantly reducing memory usage and bandwidth requirements. The QUARK quantizer currently only supports Llama.
- Triton Flash Attention: ROCm supports both Triton and Composable Kernel Flash Attention 2 in vLLM. The default is Triton, but you can change this setting using the `VLLM_USE_FLASH_ATTN_TRITON=False` environment variable.
- PyTorch TunableOp: Improved optimization and tuning of GEMMs. Requires Docker with PyTorch 2.3 or later.

For more information about enabling these features, see vLLM inference.
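A minimal sketch of enabling these options; the server entry point and model name below are illustrative assumptions, while the flag and environment variable come from the list above:

```shell
# Switch vLLM's Flash Attention backend from Triton (the default)
# to Composable Kernel:
export VLLM_USE_FLASH_ATTN_TRITON=False
echo "VLLM_USE_FLASH_ATTN_TRITON=$VLLM_USE_FLASH_ATTN_TRITON"
# Select multiprocessing (MP) instead of Ray for multi-GPU execution
# (example invocation; requires vLLM and an AMD GPU):
# python -m vllm.entrypoints.openai.api_server \
#   --model meta-llama/Llama-2-7b-hf \
#   --distributed-executor-backend mp
```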
ROCm has a vLLM branch for experimental features. This includes performance improvements, accuracy, and correctness testing.
These features include:
- FP8 GEMMs: To improve the performance of FP8 quantization, work is underway on tuning the GEMM using the shapes used
in the model's execution. It only supp...
ROCm 6.1.2 Release
ROCm 6.1.2 release notes
ROCm 6.1.2 includes enhancements to SMI tools and improvements to some libraries.
OS support
ROCm 6.1.2 has been tested against a pre-release version of Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE]).
AMD SMI
AMD SMI for ROCm 6.1.2
Additions
- Added process isolation and clean shader APIs and CLI commands:
  - `amdsmi_get_gpu_process_isolation()`
  - `amdsmi_set_gpu_process_isolation()`
  - `amdsmi_set_gpu_clear_sram_data()`
- Added the `MIN_POWER` metric to the output provided by `amd-smi static --limit`.
Optimizations
- Updated the `amd-smi monitor --pcie` output to prevent delays with the `monitor` command.
Changes
- Updated `amdsmi_get_power_cap_info` to return values in uW instead of W.
- Updated Python library return types for `amdsmi_get_gpu_memory_reserved_pages` and `amdsmi_get_gpu_bad_page_info`.
- Updated the output of `amd-smi metric --ecc-blocks` to show counters available from blocks.
Fixes
- `amdsmi_get_gpu_board_info()` no longer returns junk character strings.
- `amd-smi metric --power` now correctly details power output for RDNA3, RDNA2, and MI1x devices.
- Fixed the `amdsmitstReadWrite.TestPowerCapReadWrite` test for RDNA3, RDNA2, and MI100 devices.
- Fixed an issue with the `amdsmi_get_gpu_memory_reserved_pages` and `amdsmi_get_gpu_bad_page_info` Python interface calls.
Removals
- Removed the `amdsmi_get_gpu_process_info` API from the Python library. It was removed from the C library in an earlier release.
See the AMD SMI detailed changelog with code samples for more information.
ROCm SMI
ROCm SMI for ROCm 6.1.2
Additions
- Added the ring hang event to the `amdsmi_evt_notification_type_t` enum.
Fixes
- Fixed an issue causing ROCm SMI to incorrectly report GPU utilization for RDNA3 GPUs. See the issue on GitHub.
- Fixed the parsing of `pp_od_clk_voltage` in `get_od_clk_volt_info` to work better with MI-series hardware.
RCCL
RCCL 2.18.6 for ROCm 6.1.2
Changes
- Reduced `NCCL_TOPO_MAX_NODES` to limit stack usage and avoid stack overflow.
rocBLAS
rocBLAS 4.1.2 for ROCm 6.1.2
Optimizations
- Tuned BBS TN and TT operations on the CDNA3 architecture.
Fixes
- Fixed an issue related to obtaining solutions for BF16 TT operations.
rocDecode
rocDecode 0.6.0 for ROCm 6.1.2
Additions
- Added support for FFmpeg v5.x.
Optimizations
- Updated error checking in the `rocDecode-setup.py` script.
Changes
- Updated core dependencies.
- Updated to support the use of public LibVA headers.
Fixes
- Fixed some package dependencies.
Upcoming changes
- A future release will enable the use of the HIPCC compiled binaries `hipcc.bin` and `hipconfig.bin` by default. No action is needed by users; you may continue calling the high-level Perl scripts `hipcc` and `hipconfig`. `hipcc.bin` and `hipconfig.bin` will be invoked by the high-level Perl scripts. To revert to the previous behavior and invoke `hipcc.pl` and `hipconfig.pl`, set the `HIP_USE_PERL_SCRIPTS` environment variable to `1`.
- A subsequent release will remove the high-level HIPCC Perl scripts `hipcc` and `hipconfig`. That release will remove the `HIP_USE_PERL_SCRIPTS` environment variable and rename `hipcc.bin` and `hipconfig.bin` to `hipcc` and `hipconfig` respectively. No action is needed by users. To revert to the previous behavior, invoke `hipcc.pl` and `hipconfig.pl` explicitly.
- A subsequent release will remove `hipcc.pl` and `hipconfig.pl`.
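Once the compiled binaries become the default, opting back into the Perl driver scripts is a single environment variable, as described above:

```shell
# Revert to the Perl scripts after hipcc.bin/hipconfig.bin become
# the default dispatch targets:
export HIP_USE_PERL_SCRIPTS=1
echo "HIP_USE_PERL_SCRIPTS=$HIP_USE_PERL_SCRIPTS"
# hipcc and hipconfig will then dispatch to hipcc.pl and hipconfig.pl.
```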
ROCm 6.1.1 Release
ROCm 6.1.1 release notes
ROCm™ 6.1.1 introduces minor fixes and improvements to some tools and libraries.
OS support
ROCm 6.1.1 has been tested against a pre-release version of Ubuntu 22.04.5 (kernel 6.8).
AMD SMI
AMD SMI for ROCm 6.1.1
Additions
- Added deferred error correctable counts to `amd-smi metric --ecc --ecc-blocks`.
Changes
- Updated the output of `amd-smi metric --ecc-blocks` to show counters available from blocks.
- Updated the output of `amd-smi metric --clock` to reflect each engine.
- Updated the output of `amd-smi topology --json` to align with output reported by host and guest systems.
Fixes
- Fixed `amd-smi metric --clock`'s clock lock and deep sleep status.
- Fixed an issue that would cause an error when resetting non-AMD GPUs.
- Fixed `amd-smi metric --pcie` and `amdsmi_get_pcie_info()` when using RDNA3 (Navi 32 and Navi 31) hardware to prevent "UNKNOWN" reports.
- Fixed the output results of `amd-smi process` when getting processes running on a device.
Removals
- Removed the `amdsmi_get_gpu_process_info` API from the Python library. It was removed from the C library in an earlier release.
Known issues
- `amd-smi bad-pages` can result in a `ValueError: Null pointer access` error when using some PMU firmware versions.
See the [detailed changelog](/~https://github.com/ROCm/amdsmi/blob/docs/6.1.1/CHANGELOG.md) with code samples for more information.
HIPCC
HIPCC for ROCm 6.1.1
Changes
- Upcoming: a future release will enable use of the compiled binaries `hipcc.bin` and `hipconfig.bin` by default. No action is needed by users; you can continue calling the high-level Perl scripts `hipcc` and `hipconfig`. `hipcc.bin` and `hipconfig.bin` will be invoked by the high-level Perl scripts. To revert to the previous behavior and invoke `hipcc.pl` and `hipconfig.pl`, set the `HIP_USE_PERL_SCRIPTS` environment variable to `1`.
- Upcoming: a subsequent release will remove the high-level Perl scripts `hipcc` and `hipconfig`. That release will remove the `HIP_USE_PERL_SCRIPTS` environment variable and rename `hipcc.bin` and `hipconfig.bin` to `hipcc` and `hipconfig` respectively. No action is needed by users. To revert to the previous behavior, invoke `hipcc.pl` and `hipconfig.pl` explicitly.
- Upcoming: a subsequent release will remove `hipcc.pl` and `hipconfig.pl`.
HIPIFY
HIPIFY for ROCm 6.1.1
Additions
- Added support for LLVM 18.1.2.
- Added support for cuDNN 9.0.0.
- Added a new option, `--clang-resource-directory`, to specify the clang resource path (the path to the parent folder of the `include` folder that contains `__clang_cuda_runtime_wrapper.h` and other header files used during the hipification process).
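Assumed usage of the new option; the input file and resource path below are examples, not defaults:

```shell
# Point hipify-clang at a specific clang resource directory when the CUDA
# wrapper headers are not found on the default path (paths are examples):
# hipify-clang vector_add.cu --clang-resource-directory=/usr/lib/llvm-18/lib/clang/18
```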
ROCm SMI
ROCm SMI for ROCm 6.1.1
Known issues
- ROCm SMI reports GPU utilization incorrectly for RDNA3 GPUs in some situations.
hipBLASLt
hipBLASLt 0.7.0 for ROCm 6.1.1
Additions
- Added the `hipblasltExtSoftmax` extension API.
- Added the `hipblasltExtLayerNorm` extension API.
- Added the `hipblasltExtAMax` extension API.
- Added the `GemmTuning` extension parameter to let users set split-k.
- Added support for the mixed-precision datatype fp16/fp8 in with fp16 out.
Deprecations
- Upcoming: the `algoGetHeuristic()` ext API for GroupGemm will be deprecated in a future release of hipBLASLt.
hipSOLVER
hipSOLVER 2.1.1 for ROCm 6.1.1
Changes
- By default, `BUILD_WITH_SPARSE` is now set to `OFF` on Microsoft Windows.
Fixes
- Fixed the benchmark client build when `BUILD_WITH_SPARSE` is `OFF`.
rocFFT
rocFFT 1.0.27 for ROCm 6.1.1
Additions
- Enabled multi-GPU testing on systems without direct GPU interconnects.
Fixes
- Fixed a kernel launch failure when executing very large odd-length real-complex transforms.
ROCm 6.1.0 Release
ROCm 6.1 release highlights
The ROCm™ 6.1 release consists of new features and fixes to improve the stability and
performance of AMD Instinct™ MI300 GPU applications. Notably, we've added:
- Full support for Ubuntu 22.04.4.
- rocDecode, a new ROCm component that provides high-performance video decode support for AMD GPUs. With rocDecode, you can decode compressed video streams while keeping the resulting YUV frames in video memory. With decoded frames in video memory, you can run video post-processing using ROCm HIP, avoiding unnecessary data copies via the PCIe bus. To learn more, refer to the rocDecode documentation.
OS and GPU support changes
ROCm 6.1 adds the following operating system support:
- MI300A: Ubuntu 22.04.4 and RHEL 9.3
- MI300X: Ubuntu 22.04.4
Future releases will add additional operating systems to match the general offering. For older
generations of supported AMD Instinct products, we’ve added Ubuntu 22.04.4 support.
To view the complete list of supported GPUs and operating systems, refer to the system requirements
page for
[Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html)
and
[Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html).
Installation packages
This release includes a new set of packages for every module (all libraries and binaries default to `DT_RPATH`). Package names have the suffix `rpath`; for example, the `rpath` variant of `rocminfo` is `rocminfo-rpath`.
The new `rpath` packages will conflict with the default packages; they are meant to be used only in
environments where legacy `DT_RPATH` is the preferred form of linking (instead of `DT_RUNPATH`). We
do **not** recommend installing both sets of packages.
ROCm components
The following sections highlight select component-specific changes. For additional details, refer to the
Changelog.
AMD System Management Interface (SMI) Tool
- New monitor command for GPU metrics. Use the monitor command to customize, capture, collect, and observe GPU metrics on target devices.
- Integration with E-SMI. The EPYC™ System Management Interface In-band Library is a Linux C library that provides in-band user-space software APIs to monitor and control your CPU's power, energy, performance, and other system management functionality. This integration enables access to CPU metrics and telemetry through the AMD SMI API and CLI tools.
Composable Kernel (CK)
- New architecture support. CK now supports the following architectures to enable efficient image denoising on these AMD GPUs: gfx1030, gfx1100, gfx1031, gfx1101, gfx1032, gfx1102, gfx1034, gfx1103, gfx1035, gfx1036.
- FP8 rounding logic is replaced with stochastic rounding. Stochastic rounding mimics more realistic data behavior and improves model convergence.
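The difference from round-to-nearest can be illustrated with a small sketch (this is not CK's FP8 implementation; the coarse grid and `stochastic_round` helper below are hypothetical): stochastic rounding picks the upper or lower representable neighbor with probability proportional to proximity, so the quantization error is unbiased on average.

```python
import random

def stochastic_round(x, step):
    # Round x onto a grid of multiples of `step`, choosing the upper
    # neighbor with probability equal to x's fractional distance to it.
    lower = (x // step) * step
    frac = (x - lower) / step  # in [0, 1)
    return lower + step if random.random() < frac else lower

# Round-to-nearest would map 0.3 to 0.0 every time; stochastic
# rounding maps it to 1.0 about 30% of the time, so the mean of many
# rounded samples stays close to the true value.
random.seed(0)
samples = [stochastic_round(0.3, 1.0) for _ in range(10_000)]
mean = sum(samples) / len(samples)
```

This unbiasedness is why accumulated rounding error does not drift in one direction during training, which helps model convergence.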
HIP
- New environment variable to enable kernel run serialization. The default `HIP_LAUNCH_BLOCKING` value is `0` (disabled), which causes kernels to run as defined in the queue. When set to `1` (enabled), the HIP runtime serializes the kernel queue, which behaves the same as `AMD_SERIALIZE_KERNEL`.
hipBLASLt
- New GemmTuning extension parameter. GemmTuning lets you set a split-k value for each solution, giving finer control for performance tuning.
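Conceptually, split-k partitions the reduction (K) dimension of a GEMM into chunks whose partial products can be computed concurrently and then summed, which improves GPU occupancy when M and N are small. A minimal pure-Python sketch of the idea (the `matmul_splitk` helper is hypothetical, not the hipBLASLt API):

```python
def matmul(A, B):
    # Plain M x K by K x N matrix multiply on nested lists.
    K, N = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(len(A))]

def matmul_splitk(A, B, split_k):
    # Split the K dimension into `split_k` chunks, compute one partial
    # product per chunk (in hardware these run in parallel), then sum.
    K = len(B)
    chunk = (K + split_k - 1) // split_k
    partials = []
    for s in range(0, K, chunk):
        A_part = [row[s:s + chunk] for row in A]
        B_part = B[s:s + chunk]
        partials.append(matmul(A_part, B_part))
    # Reduce the partial products element-wise.
    M, N = len(A), len(B[0])
    return [[sum(p[i][j] for p in partials) for j in range(N)]
            for i in range(M)]

A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [0, 1], [1, 1], [2, 2]]
assert matmul_splitk(A, B, split_k=2) == matmul(A, B)
```

Exposing split-k as a tuning parameter lets you trade the cost of the final reduction against the extra parallelism it unlocks.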
hipFFT
- New multi-GPU support for single-process transforms. Multiple GPUs can be used to perform a transform in a single process. Note that this initial implementation is a functional preview.
HIPIFY
- Skipped code blocks: Code blocks that are skipped by the preprocessor are no longer hipified under the `--default-preprocessor` option. To hipify everything regardless of conditional preprocessor directives (`#if`, `#ifdef`, `#ifndef`, `#elif`, or `#else`), don't use the `--default-preprocessor` or `--amap` options.
hipSPARSELt
- Structured sparsity matrix support extensions. Structured sparsity matrices help speed up deep-learning workloads. We now support `B` as the sparse matrix and `A` as the dense matrix in Sparse Matrix-Matrix Multiplication (SPMM); prior to this release, we only supported sparse (matrix A) x dense (matrix B) matrix multiplication.
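The idea behind structured-sparsity SPMM can be sketched in plain Python (illustrative only; hipSPARSELt's actual API and compressed storage format differ). In a 2:4 structured-sparse matrix, each group of four consecutive values holds at most two nonzeros, so B can be stored as nonzero values plus position metadata, and each output element touches only B's nonzeros:

```python
def compress_2to4(M):
    # Compress a 2:4 structured-sparse matrix row-wise: keep only the
    # (at most) 2 nonzeros per group of 4, plus their column indices.
    values, indices = [], []
    for row in M:
        v_row, i_row = [], []
        for g in range(0, len(row), 4):
            nz = [(g + k, x) for k, x in enumerate(row[g:g + 4]) if x != 0]
            assert len(nz) <= 2, "row violates the 2:4 pattern"
            for idx, x in nz:
                i_row.append(idx)
                v_row.append(x)
        values.append(v_row)
        indices.append(i_row)
    return values, indices

def spmm_dense_a_sparse_b(A, B_values, B_indices, n_cols):
    # C = A @ B with B stored compressed: only B's nonzeros contribute,
    # so each output element needs at most half the multiply-adds.
    C = [[0.0] * n_cols for _ in A]
    for i, a_row in enumerate(A):
        for k, a in enumerate(a_row):
            for j, b in zip(B_indices[k], B_values[k]):
                C[i][j] += a * b
    return C

B = [[1, 0, 0, 2],
     [0, 3, 4, 0]]
A = [[1, 1]]
vals, idx = compress_2to4(B)
C = spmm_dense_a_sparse_b(A, vals, idx, n_cols=4)
```

On hardware, the metadata steers the sparse operands into the matrix units, which is where the speedup comes from; the sketch only shows the bookkeeping.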
hipTensor
- 4D tensor permutation and contraction support.
You can now perform tensor permutation on 4D tensors and 4D contractions for F16, BF16, and
Complex F32/F64 datatypes.
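A 4D permutation simply reorders the tensor's axes. The semantics can be sketched in pure Python (hipTensor performs this on the GPU through its own API; the `permute4d` helper below is hypothetical):

```python
from itertools import product

def permute4d(T, perm):
    # Return a new 4D nested-list tensor whose axis i is input axis
    # perm[i] (NumPy transpose semantics).
    shape = (len(T), len(T[0]), len(T[0][0]), len(T[0][0][0]))
    out_shape = tuple(shape[p] for p in perm)
    out = [[[[None] * out_shape[3] for _ in range(out_shape[2])]
            for _ in range(out_shape[1])] for _ in range(out_shape[0])]
    for idx in product(*(range(s) for s in shape)):
        o = tuple(idx[p] for p in perm)
        out[o[0]][o[1]][o[2]][o[3]] = T[idx[0]][idx[1]][idx[2]][idx[3]]
    return out

# A (1, 2, 3, 4) tensor whose values encode their own flat offset.
T = [[[[i * 24 + j * 12 + k * 4 + l for l in range(4)]
       for k in range(3)] for j in range(2)] for i in range(1)]
# Move the last axis to the front: shape becomes (4, 1, 2, 3).
P = permute4d(T, (3, 0, 1, 2))
```

The actual work in a GPU permutation kernel is choosing memory access patterns so that either the reads or the writes stay coalesced; the index mapping itself is exactly what the sketch shows.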
MIGraphX
- Improved performance for transformer-based models. We added support for FlashAttention, which benefits models like BERT, GPT, and Stable Diffusion.
- New Torch-MIGraphX driver. This driver calls MIGraphX directly from PyTorch. It provides an `mgx_module` object that you can invoke like any other Torch module, but which utilizes the MIGraphX inference engine internally. Torch-MIGraphX supports FP32, FP16, and INT8 datatypes.
- FP8 support. We now offer functional support for inference in the FP8E4M3FNUZ datatype. You can load an ONNX model in FP8E4M3FNUZ using the C++ or Python APIs, or `migraphx-driver`. You can quantize a floating-point model to FP8 format by using the `--fp8` flag with `migraphx-driver`. To accelerate inference, MIGraphX uses hardware acceleration on MI300 for FP8 by leveraging FP8 support in various backend kernel libraries.
MIOpen
- Improved performance for inference and convolutions. Inference support is now provided for Find 2.0 fusion plans. Additionally, we've enhanced the NHWC (Number of samples, Height, Width, Channels) convolution kernels for heuristics. NHWC stores each sample's data with the height and width dimensions first, followed by channels.
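The layout difference is easiest to see in the flat-buffer index arithmetic. A small sketch (illustrative, not MIOpen code): in NCHW, the values of one channel plane are contiguous, while in NHWC all channels of a single pixel sit next to each other, which is what the NHWC convolution kernels exploit.

```python
def flat_index_nchw(n, c, h, w, C, H, W):
    # Offset of element (n, c, h, w) in a flat NCHW buffer:
    # channel is the slowest-varying dimension after batch.
    return ((n * C + c) * H + h) * W + w

def flat_index_nhwc(n, c, h, w, C, H, W):
    # Offset of the same element in a flat NHWC buffer:
    # channels are contiguous for each (h, w) pixel.
    return ((n * H + h) * W + w) * C + c

# For a 2x2 image with 3 channels, adjacent channels of one pixel are
# H * W = 4 elements apart in NCHW but only 1 element apart in NHWC.
C, H, W = 3, 2, 2
stride_nchw = (flat_index_nchw(0, 1, 0, 0, C, H, W)
               - flat_index_nchw(0, 0, 0, 0, C, H, W))
stride_nhwc = (flat_index_nhwc(0, 1, 0, 0, C, H, W)
               - flat_index_nhwc(0, 0, 0, 0, C, H, W))
```

Keeping a pixel's channels adjacent lets a kernel load all input channels for one output position with a single contiguous read, which is why NHWC is often faster for convolutions on wide-SIMD hardware.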
OpenMP
- Implicit zero-copy is triggered automatically in XNACK-enabled MI300A systems. Implicit zero-copy behavior in non-`unified_shared_memory` programs is triggered automatically in XNACK-enabled MI300A systems (for example, when using the `HSA_XNACK=1` environment variable). OpenMP supports the `requires unified_shared_memory` directive for programs that don't want to copy data explicitly between the CPU and GPU; however, this requires adding the directive to every translation unit of the program.
- New MI300 FP atomics. Application performance can now improve by leveraging fast floating-point atomics on MI300 (gfx942).
RCCL
- NCCL 2.18.6 compatibility. RCCL is now compatible with NCCL 2.18.6, which includes increasing the maximum number of IB network interfaces to 32 and fixing network device ordering when creating communicators with only one GPU per node.
- Doubled simultaneous communication channels. We improved MI300X performance by increasing the maximum number of simultaneous communication channels from 32 to 64.
rocALUTION
- New multiple node and GPU support.
Unsmoothed and smoothed aggregations and Ruge-Stueben AMG now work with multiple nodes
and GPUs. For more information, refer to the
API documentation.
rocDecode
- New ROCm component.
rocDecode is ROCm's newest component, providing high-performance video decode support for AMD GPUs. To learn more, refer to the documentation.
ROCm Compiler
- Combined projects. ROCm Device-Libs, ROCm Compiler Support, and hipCC are now located in the `llvm-project/amd` subdirectory of AMD's fork of the LLVM project. Previously, these projects were maintained in separate repositories. Note that the projects themselves will continue to be packaged separately.
- Split the `rocm-llvm` package. This package has been split into a required and an optional package:
  - `rocm-llvm` (required): A package containing the essential binaries needed for compilation.
  - `rocm-llvm-dev` (optional): A package containing binaries for compiler and application developers.
ROCm Data Center Tool (RDC)
- C++ upgrades.
RDC was upgraded from C++11 to C++17 to enable a more modern C++ standard when writing RDC plugins.
ROCm Performance Primitives (RPP)
- New backend support. Audio processing support was added for the `HOST` backend, and 3D Voxel kernel support was added for the `HOST` and `HIP` backends.
ROCm Validation Suite
- New datatype support.
Added BF16 and FP8 datatypes, based on General Matrix Multiply (GEMM) operations, to the GPU Stress Test (GST) module. This provides additional performance benchmarking and stress testing for the newly supported datatypes.
rocSOLVER
- New EigenSolver routine.
Based on the Jacobi algorithm, a new EigenSolver routine was added to the library. This routine computes the eigenvalues and eigenvectors of a matrix with improved performance.
ROCTracer
- New versioning and callback enhancements.
Improved to match versioning changes in HIP Runtime and supports runtime API callbacks and activity record logging. The APIs of different runtimes at different levels are considered different API domains with assigned domain IDs.
Upcoming changes
- ROCm SMI will be deprecated in a future release. We advise migrating to AMD SMI now to prevent future workflow disruptions.
- hipCC supports, by default, the following compiler invocation flags:
  - `-mllvm -amdgpu-early-inline-all=true`
  - `-mllvm -amdgpu-function-calls=false`
  - ...
ROCm 6.0.2 Release
ROCm 6.0.2 is a point release with minor bug fixes to improve the stability of MI300 GPU applications, including fixes in the rocSPARSE library. Several new driver features are introduced for system qualification on our partner server offerings.
hipFFT
Changes
- Removed the Git submodule for shared files between rocFFT and hipFFT; instead, the shared files are copied over (this should help simplify downstream builds and packaging).