ROCm 6.3.1 Release
ROCm 6.3.1 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.
Release highlights
The following are notable new features and improvements in ROCm 6.3.1. For changes to individual components, see
Detailed component changes.
Per queue resiliency for Instinct MI300 accelerators
The AMDGPU driver now includes enhanced resiliency for misbehaving applications on AMD Instinct MI300 accelerators. This helps isolate the impact of misbehaving applications, ensuring other workloads running on the same accelerator are unaffected.
ROCm Runfile Installer
ROCm 6.3.1 introduces the ROCm Runfile Installer, with initial support for Ubuntu 22.04. The ROCm Runfile Installer facilitates ROCm installation without using a native Linux package management system, with or without network or internet access. For more information, see the ROCm Runfile Installer documentation.
ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.
- Added documentation on training a model with ROCm Megatron-LM. AMD offers a Docker image for MI300X accelerators containing essential components to get started, including ROCm libraries, PyTorch, and Megatron-LM utilities. See Training a model using ROCm Megatron-LM to get started. The new ROCm Megatron-LM training Docker accompanies the ROCm vLLM inference Docker as a set of ready-to-use containerized solutions for getting started with ROCm for AI.
- Updated the Instinct MI300X workload tuning guide with more current optimization strategies. The updated sections include guidance on vLLM optimization, PyTorch TunableOp, and hipBLASLt tuning.
- Added a topic that shows whether each ROCm library is graph-safe. HIP graph-safe libraries operate safely in HIP execution graphs. HIP graphs are an alternative way of executing tasks on a GPU that can provide performance benefits over launching kernels using the standard method via streams.
- Updated the Device memory topic in the HIP memory management section.
- Expanded the HIP documentation with new resources for developers.
Operating system and hardware support changes
ROCm 6.3.1 adds support for Debian 12 (kernel: 6.1). Debian is supported only on AMD Instinct accelerators. See the installation instructions at Debian native installation.
ROCm 6.3.1 enables support for AMD Instinct MI325X accelerator. For more information, see AMD Instinct™ MI325X Accelerators.
See the Compatibility matrix for more information about operating system and hardware compatibility.
ROCm components
The following table lists the versions of ROCm components for ROCm 6.3.1, including any version
changes from 6.3.0 to 6.3.1. Click the component's updated version to go to a list of its changes.
Click the GitHub icon to go to the component's source code on GitHub.
| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.11.0 |
| | | MIOpen | 3.3.0 |
| | | MIVisionX | 3.1.0 ⇒ 3.1.0 |
| | | rocAL | 2.1.0 |
| | | rocDecode | 0.8.0 |
| | | rocJPEG | 0.6.0 |
| | | rocPyDecode | 0.2.0 |
| | | RPP | 1.9.1 |
| | Communication | RCCL | 2.21.5 ⇒ 2.21.5 |
| | Math | hipBLAS | 2.3.0 |
| | | hipBLASLt | 0.10.0 |
| | | hipFFT | 1.0.17 |
ROCm 6.3.0 Release
ROCm 6.3.0 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon
GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.
Release highlights
The following are notable new features and improvements in ROCm 6.3.0. For changes to individual components, see
Detailed component changes.
rocJPEG added
ROCm 6.3.0 introduces the rocJPEG library to the ROCm software stack. rocJPEG is a high performance
JPEG decode SDK for AMD GPUs. For more information, see the rocJPEG
documentation.
ROCm Compute Profiler and ROCm Systems Profiler
These ROCm components have been renamed to reflect their new direction as part of the ROCm software
stack.
- ROCm Compute Profiler, formerly Omniperf. For more information, see the ROCm Compute Profiler documentation and /~https://github.com/ROCm/rocprofiler-compute on GitHub.
- ROCm Systems Profiler, formerly Omnitrace. For more information, see the ROCm Systems Profiler documentation and /~https://github.com/ROCm/rocprofiler-systems on GitHub.

For future compatibility, the Omnitrace project is available at /~https://github.com/ROCm/omnitrace. See the Omnitrace documentation.

Update any references to the old binary names `omniperf` and `omnitrace` to the new `rocprof-compute` and `rocprof-sys-*` binaries. This might include updating environment variables, commands, and paths as needed to avoid disruptions to your profiling or tracing workflows. See [ROCm Compute Profiler 3.0.0](#rocm-compute-profiler-3-0-0) and [ROCm Systems Profiler 0.1.0](#rocm-systems-profiler-0-1-0).
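As a transition aid, a script can probe for the new binary names and fall back to the old ones. This is a sketch, not an official tool; it assumes the renamed binaries keep the legacy command lines, and `rocprof-sys-run` is the assumed name of the renamed `omnitrace` launcher:

```shell
# Prefer the renamed binaries; fall back to the legacy names when only an
# older ROCm is installed. Prints which tool (if any) was found on PATH.
PROFILER=$(command -v rocprof-compute || command -v omniperf || true)
TRACER=$(command -v rocprof-sys-run || command -v omnitrace || true)
echo "profiler: ${PROFILER:-not found}"
echo "tracer: ${TRACER:-not found}"
```

On a machine without either generation installed, both lines report `not found`, which makes the script safe to drop into CI before the real profiling step.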
SHARK AI toolkit for high-speed inferencing and serving introduced
SHARK is an open-source toolkit for high-performance serving of popular generative AI and large
language models. In its initial release, SHARK contains the Shortfin high-performance serving
engine, which is the SHARK inferencing
library that includes example server applications for popular models.
This initial release includes support for serving the Stable Diffusion XL model on AMD Instinct™
MI300 devices using ROCm. See the SHARK release
page on GitHub to get started.
PyTorch 2.4 support added
ROCm 6.3.0 adds support for PyTorch 2.4. See the Compatibility
matrix
for the complete list of PyTorch versions tested for compatibility with ROCm.
Flash Attention kernels in Triton and Composable Kernel (CK) added to Transformer Engine
Composable Kernel-based and Triton-based Flash Attention kernels have been integrated into
Transformer Engine via the ROCm Composable Kernel and AOTriton libraries. The
Transformer Engine can now optionally select a flexible and optimized Attention
solution for AMD GPUs. For more information, see Fused Attention Backends on
ROCm
on GitHub.
HIP compatibility
HIP now includes the `hipStreamLegacy` API. It's equivalent to NVIDIA `cudaStreamLegacy`. For more information, see Global enum and defines in the HIP runtime API documentation.
Unload active amdgpu-dkms module without a system reboot
On Instinct MI200 and MI300 systems, you can now unload the active `amdgpu-dkms` modules, then reinstall and reload newer modules without a system reboot. If the new `dkms` package includes newer firmware components, the driver first resets the device and then loads the newer firmware.
ROCm Offline Installer Creator updates
The ROCm Offline Installer Creator 6.3 introduces a new feature to uninstall the previous version of
ROCm on the non-connected target system before installing a new version. This feature is only supported
on the Ubuntu distribution. See the ROCm Offline Installer
Creator
documentation for more information.
OpenCL ICD loader separated from ROCm
The OpenCL ICD loader is no longer delivered as part of ROCm, and must be installed separately as part of the ROCm installation process. For Ubuntu and RHEL installations, the required package is installed as part of the setup described in Prerequisites. In other supported Linux distributions like SUSE, the required package must be installed in separate steps, which are included in the installation instructions.
Because the OpenCL path is now separate from the ROCm installation for versioned and multi-version installations, you must manually define `LD_LIBRARY_PATH` to point to the ROCm installation library as described in the Post-installation instructions.
If `LD_LIBRARY_PATH` is not set as needed for versioned or multi-version installations, OpenCL applications like `clinfo` will fail to run and return an error.
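For a versioned install, the post-installation step amounts to prepending the versioned ROCm library directory to `LD_LIBRARY_PATH`. A sketch, assuming ROCm 6.3.0 installed at `/opt/rocm-6.3.0`:

```shell
# Prepend the versioned ROCm library path (adjust to your installed version):
ROCM_LIB=/opt/rocm-6.3.0/lib
export LD_LIBRARY_PATH="${ROCM_LIB}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
# clinfo should now find the ROCm OpenCL ICD instead of returning an error:
command -v clinfo >/dev/null 2>&1 && clinfo -l || true
```

The `${LD_LIBRARY_PATH:+:...}` expansion avoids a trailing colon when the variable was previously unset, which would otherwise add the current directory to the search path.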
ROCT Thunk Interface integrated into ROCr runtime
The ROCT Thunk Interface package is now integrated into the ROCr runtime. As a result, the ROCT package
is no longer included as a separate package in the ROCm software stack.
ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a
wider variety of user needs and use cases.
- Documentation for Tensile is now available. Tensile is a library that creates benchmark-driven backend implementations for GEMMs, serving primarily as a backend component of rocBLAS. See the Tensile documentation.
- New documentation has been added to explain the advantages of enabling the IOMMU in passthrough mode for Instinct accelerators and Radeon GPUs. See Input-Output Memory Management Unit.
- The HIP documentation has been updated with several new topics.
- Several existing HIP documentation topics have been updated.
- Several HIP documentation topics have been reorganized to improve usability.
Operating system and hardware support changes
ROCm 6.3.0 adds support for the following operating system and kernel versions:
- Ubuntu 24.04.2 (kernel: 6.8 [GA], 6.11 [HWE])
- Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE])
- RHEL 9.5 (kernel: 5.14.0)
- Oracle Linux 8.10 (kernel: 5.15.0)
See installation instructions at ROCm installation for
Linux.
ROCm 6.3.0 marks the end of support (EoS) for:
- Ubuntu 24.04.1
- Ubuntu 22.04.4
- RHEL 9.3
- RHEL 8.9
- Oracle Linux 8.9
Hardware support r...
ROCm 6.2.4 Release
ROCm 6.2.4 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon
GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.
Release highlights
The following are notable new features and improvements in ROCm 6.2.4. For changes to individual components, see
Detailed component changes.
ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for
a wider variety of user needs and use cases.
- Added a new GPU cluster networking guide. See Cluster network performance validation for AMD Instinct accelerators. This documentation provides guidelines on validating network configurations in single-node and multi-node environments to attain optimal speed and bandwidth in AMD Instinct-powered clusters.
- Updated the HIP runtime documentation:
  - Added a new section on how to use HIP graphs.
  - Added a new section about the Stream ordered memory allocator (SOMA).
  - Updated the Porting CUDA driver API section.
- Updated the Post-installation instructions with guidance on using the `update-alternatives` utility and environment modules to help you manage multiple ROCm versions and streamline PATH configuration.
- Updated the LLM inference performance validation on AMD Instinct MI300X documentation with more detailed guidance, new models, and the `float8` data type.
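The `update-alternatives` workflow for managing multiple ROCm versions can be sketched as follows; the alternative name `rocm` is an assumption here, so check the Post-installation instructions for the exact names registered on your system:

```shell
# List the ROCm versions registered with update-alternatives (if any):
update-alternatives --list rocm 2>/dev/null || echo "no 'rocm' alternative registered"
# Interactively choose the default ROCm version (requires root):
# sudo update-alternatives --config rocm
```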
Operating system and hardware support changes
ROCm 6.2.4 adds support for the AMD Radeon PRO V710 GPU for compute workloads. See
Supported GPUs
for more information.
This release maintains the same operating system support as 6.2.2.
ROCm components
The following table lists the versions of ROCm components for ROCm 6.2.4, including any version changes from 6.2.2 to 6.2.4.
Click the component's updated version to go to a detailed list of its changes. Click the GitHub icon to go to the component's source code on GitHub.
| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.10 |
| | | MIOpen | 3.2.0 |
| | | MIVisionX | 3.0.0 |
| | | rocAL | 2.0.0 |
| | | rocDecode | 0.6.0 |
| | | rocPyDecode | 0.1.0 |
| | | RPP | 1.8.0 |
| | Communication | RCCL | 2.20.5 |
| | Math | hipBLAS | 2.2.0 |
| | | hipBLASLt | 0.8.0 |
| | | hipFFT | 1.0.15 ⇒ 1.0.16 |
| | | hipfort | 0.4.0 |
| | | hipRAND | 2.11.0 ⇒ 2.11.1 |
| | | hipSOLVER | 2.2.0 |
| | | hipSPARSE | 3.1.1 |
| | | hipSPARSELt | 0.2.1 |
ROCm 6.2.2 Release
ROCm 6.2.2 release notes
These release notes provide a summary of notable changes since the previous ROCm release.
As ROCm 6.2.2 was released shortly after 6.2.1, the changes between these versions
are minimal. For a comprehensive overview of recent updates,
refer to the ROCm 6.2.1 release notes.
The Compatibility matrix
provides the full list of supported hardware, operating systems, ecosystems, third-party components, and ROCm components
for each ROCm release.
Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the ROCm documentation release history.
Release highlights
The following is a significant fix introduced in ROCm 6.2.2.
Fixed Instinct MI300X error recovery failure
Improved the reliability of AMD Instinct MI300X accelerators in scenarios involving
uncorrectable errors. Previously, error recovery did not occur as expected,
potentially leaving the system in an undefined state. This fix ensures that error
recovery functions as expected, maintaining system stability.
See the original issue noted in the ROCm 6.2.1 release notes.
ROCm 6.2.1 Release
ROCm 6.2.1 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
The Compatibility matrix
provides the full list of supported hardware, operating systems, ecosystems, third-party components, and ROCm components for each ROCm release.
Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the ROCm documentation release history.
Release highlights
The following are notable new features and improvements in ROCm 6.2.1. For changes to individual components, see Detailed component changes.
rocAL major version change
The new version of rocAL introduces many new features, but does not modify any of the existing public API functions. However, the version number was incremented from 1.3 to 2.0.
Applications linked to version 1.3 must be recompiled to link against version 2.0.
See the rocAL detailed changes for more information.
New support for FBGEMM (Facebook General Matrix Multiplication)
As of ROCm 6.2.1, ROCm supports Facebook General Matrix Multiplication (FBGEMM) and the related FBGEMM_GPU library.
FBGEMM is a low-precision, high-performance CPU kernel library for convolution and matrix multiplication. It is used for server-side inference and as a back end for PyTorch quantized operators. FBGEMM_GPU includes a collection of PyTorch GPU operator libraries for training and inference. For more information, see the ROCm Model acceleration libraries guide
and PyTorch's FBGEMM GitHub repository.
ROCm Offline Installer Creator changes
The ROCm Offline Installer Creator 6.2.1 introduces several new features and improvements including:
- Logging support for create and install logs
- More stringent checks for Linux versions and distributions
- Updated prerequisite repositories
- Fixed CTest issues
ROCm documentation changes
- The Programming Model Reference and Understanding the Programming Model topics in HIP have been consolidated into one topic, HIP programming model (conceptual).
- The HIP virtual memory management and HIP virtual memory management API topics have been added.
The ROCm documentation, like all ROCm projects, is open source and available on GitHub. To contribute to ROCm documentation, see the [ROCm documentation contribution guidelines](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).
Operating system and hardware support changes
There are no changes to supported hardware or operating systems from ROCm 6.2.0 to ROCm 6.2.1.
See the Compatibility matrix for the full list of supported operating systems and hardware architectures.
ROCm components
The following table lists the versions of ROCm components for ROCm 6.2.1, including any version changes from 6.2.0 to 6.2.1.
Click the component's updated version to go to a detailed list of its changes. Click the GitHub icon to go to the component's source code on GitHub.
| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.10 |
| | | MIOpen | 3.2.0 |
| | | MIVisionX | 3.0.0 |
| | | rocAL | 1.0.0 ⇒ 2.0.0 |
| | | rocDecode | 0.6.0 |
| | | rocPyDecode | 0.1.0 |
| | | RPP | 1.8.0 |
| | Communication | RCCL | 2.20.5 ⇒ 2.20.5 |
| | Math | hipBLAS | 2.2.0 |
| | | hipBLASLt | 0.8.0 |
| | | hipFFT | 1.0.15 |
| | | hipfort | 0.4.0 |
| | | hipRAND | 2.11.0 |
| | | hipSOLVER | 2.2.0 |
| | | hipSPARSE | 3.1.1 |
| | | hipSPARSELt | 0.2.1 |
| | | rocALUTION | 3.2.0 |
| | | rocBLAS | 4.1.2 ⇒ 4.2.1 |
| | | rocFFT | 1.0.28 ⇒ 1.0.29 |
| | | rocRAND | 3.1.0 |
| | | rocSOLVER | 3.26.0 |
| | | rocSPARSE | 3.2.0 |
| | | rocWMMA | 1.5.0 |
| | | Tensile | 4.41.0 |
ROCm 6.2.0 Release
ROCm 6.2.0 release notes
The release notes provide a comprehensive summary of changes since the previous ROCm release.
- Release highlights
- Operating system and hardware support changes
- ROCm components versioning
- Detailed component changes
- ROCm known issues
- ROCm upcoming changes
The Compatibility matrix
provides an overview of operating system, hardware, ecosystem, and ROCm component support across ROCm releases.
Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the ROCm documentation release history.
Release highlights
This section introduces notable new features and improvements in ROCm 6.2. See the
Detailed component changes for individual component changes.
New components
ROCm 6.2.0 introduces the following new components to the ROCm software stack.
- Omniperf -- A kernel-level profiling tool for machine learning and high-performance computing (HPC) workloads running on AMD Instinct accelerators. Omniperf offers comprehensive profiling and advanced analysis via command line or a GUI dashboard. For more information, see Omniperf.
- Omnitrace -- A multi-purpose analysis tool for profiling and tracing applications running on the CPU or the CPU and GPU. It supports dynamic binary instrumentation, call-stack sampling, causal profiling, and other features for determining which function and line number are executing. For more information, see Omnitrace.
- rocPyDecode -- A tool to access rocDecode APIs in Python. It connects Python and C/C++ libraries, enabling function calling and data passing between the two languages. The `rocpydecode.so` library, a wrapper, uses rocDecode APIs written primarily in C/C++ within Python. For more information, see rocPyDecode.
- ROCprofiler-SDK -- A profiling and tracing library for HIP and ROCm applications on AMD ROCm software, used to identify application performance bottlenecks and optimize performance. The new APIs add restrictions for more efficient implementations and improved thread safety. A new window restriction specifies the services the tool can use. ROCprofiler-SDK also provides a tool library to help you write your own tool implementations. `rocprofv3` uses this tool library to profile and trace applications for performance bottlenecks. Examples include API tracing and kernel tracing. For more information, see ROCprofiler-SDK. ROCprofiler-SDK for ROCm 6.2.0 is a beta release and subject to change.
ROCm Offline Installer Creator introduced
The new ROCm Offline Installer Creator creates an installation package for a preconfigured setup of ROCm, the AMDGPU
driver, or a combination of the two on a target system without network access. This new tool customizes
multiple unique configurations for use when installing ROCm on a target. Other notable features include:
- A lightweight, easy-to-use user interface for configuring the creation of the installer
- Support for multiple Linux distributions
- Installer support for different ROCm releases and specific ROCm components
- Optional driver or driver-only installer creation
- Optional post-install preferences
- Lightweight installer packages, which are unique to the preconfigured ROCm setup
- Resolution and inclusion of dependency packages for offline installation
For more information, see
ROCm Offline Installer Creator.
Math libraries default to Clang instead of HIPCC
The default compiler used to build the math libraries on Linux changes from `hipcc` to `amdclang++`. Appropriate compiler flags are added to ensure these compilations build correctly. This change only applies when building the libraries. Applications using the libraries can continue to be compiled using `hipcc` or `amdclang++` as described in ROCm compiler reference.
The math libraries can also be built with `hipcc` using any of the previously available methods (for example, the `CXX` environment variable or the `CMAKE_CXX_COMPILER` CMake variable). This change shouldn't affect performance or functionality.
Framework and library changes
This section highlights updates to supported deep learning frameworks and notable third-party library optimizations.
Additional PyTorch and TensorFlow support
ROCm 6.2.0 supports PyTorch versions 2.2 and 2.3 and TensorFlow version 2.16.
See Installing PyTorch for ROCm
and Installing TensorFlow for ROCm
for installation instructions.
Refer to the
Third-party support matrix
for a comprehensive list of third-party frameworks and libraries supported by ROCm.
Optimized framework support for OpenXLA
PyTorch for ROCm and TensorFlow for ROCm now provide native support for OpenXLA. OpenXLA is an open-source ML compiler
ecosystem that enables developers to compile and optimize models from all leading ML frameworks. For more information, see
Installing PyTorch for ROCm
and Installing TensorFlow for ROCm.
PyTorch support for Autocast (automatic mixed precision)
PyTorch now supports Autocast for recurrent neural networks (RNNs) on ROCm. This can help to reduce computational workloads and improve performance. Based on information about the magnitude of values, Autocast can substitute the original `float32` linear layers and convolutions with their `float16` or `bfloat16` variants. For more information, see Automatic mixed precision.
Memory savings for bitsandbytes model quantization
The ROCm-aware bitsandbytes library is a lightweight Python wrapper around HIP
custom functions, in particular 8-bit optimizer, matrix multiplication, and 8-bit and 4-bit quantization functions.
ROCm 6.2.0 introduces the following bitsandbytes changes:
- `Int8` matrix multiplication is enabled, and it includes the following functions:
  - `extract-outliers` – extracts rows and columns that have outliers in the inputs. They're later used for matrix multiplication without quantization.
  - `transform` – row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after matmul computation.
  - `igemmlt` – a new function for GEMM computation A*B^T. It uses `hipblasLtMatMul` and performs 8-bit GEMM operations.
  - `dequant_mm` – dequantizes the output matrix to the original data type using scaling factors from vector-wise quantization.
- Blockwise quantization – input tensors are quantized for a fixed block size.
- 4-bit quantization and dequantization functions – normalized `Float4` quantization, quantile estimation, and quantile quantization functions are enabled.
- 8-bit and 32-bit optimizers are enabled.
These functions are included in bitsandbytes. They are not part of ROCm. However, ROCm 6.2.0 has enabled the fixes and
features to run them.
For more information, see Model quantization techniques.
Improved vLLM support
ROCm 6.2.0 enhances vLLM support for inference on AMD Instinct accelerators, adding capabilities for `FP16`/`BF16` precision for LLMs, and `FP8` support for Llama.
ROCm 6.2.0 adds support for the following vLLM features:
- MP: Multi-GPU execution. Choose between MP and Ray using a flag. To set it to MP, use `--distributed-executor-backend=mp`. The default depends on the commit and is in flux.
- FP8 KV cache: Enhances computational efficiency and performance by significantly reducing memory usage and bandwidth requirements. The QUARK quantizer currently only supports Llama.
- Triton Flash Attention: ROCm supports both Triton and Composable Kernel Flash Attention 2 in vLLM. The default is Triton, but you can change this setting using the `VLLM_USE_FLASH_ATTN_TRITON=False` environment variable.
- PyTorch TunableOp: Improved optimization and tuning of GEMMs. Requires Docker with PyTorch 2.3 or later.

For more information about enabling these features, see vLLM inference.
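A minimal sketch of enabling these options; the server entry point and model name below are illustrative assumptions, while the flag and environment variable come from the list above:

```shell
# Switch vLLM's Flash Attention backend from Triton (the default)
# to Composable Kernel:
export VLLM_USE_FLASH_ATTN_TRITON=False
echo "VLLM_USE_FLASH_ATTN_TRITON=$VLLM_USE_FLASH_ATTN_TRITON"
# Select multiprocessing (MP) instead of Ray for multi-GPU execution
# (example invocation; requires vLLM and an AMD GPU):
# python -m vllm.entrypoints.openai.api_server \
#   --model meta-llama/Llama-2-7b-hf \
#   --distributed-executor-backend mp
```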
ROCm has a vLLM branch for experimental features. This includes performance improvements, accuracy, and correctness testing.
These features include:
- FP8 GEMMs: To improve the performance of FP8 quantization, work is underway on tuning the GEMM using the shapes used
in the model's execution. It only supp...
ROCm 6.1.2 Release
ROCm 6.1.2 release notes
ROCm 6.1.2 includes enhancements to SMI tools and improvements to some libraries.
OS support
ROCm 6.1.2 has been tested against a pre-release version of Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE]).
AMD SMI
AMD SMI for ROCm 6.1.2
Additions
- Added process isolation and clean shader APIs and CLI commands:
  - `amdsmi_get_gpu_process_isolation()`
  - `amdsmi_set_gpu_process_isolation()`
  - `amdsmi_set_gpu_clear_sram_data()`
- Added the `MIN_POWER` metric to the output provided by `amd-smi static --limit`.
Optimizations
- Updated the `amd-smi monitor --pcie` output to prevent delays with the `monitor` command.
Changes
- Updated `amdsmi_get_power_cap_info` to return values in uW instead of W.
- Updated Python library return types for `amdsmi_get_gpu_memory_reserved_pages` and `amdsmi_get_gpu_bad_page_info`.
- Updated the output of `amd-smi metric --ecc-blocks` to show counters available from blocks.
Fixes
- `amdsmi_get_gpu_board_info()` no longer returns junk character strings.
- `amd-smi metric --power` now correctly details power output for RDNA3, RDNA2, and MI1x devices.
- Fixed the `amdsmitstReadWrite.TestPowerCapReadWrite` test for RDNA3, RDNA2, and MI100 devices.
- Fixed an issue with the `amdsmi_get_gpu_memory_reserved_pages` and `amdsmi_get_gpu_bad_page_info` Python interface calls.
Removals
- Removed the `amdsmi_get_gpu_process_info` API from the Python library. It was removed from the C library in an earlier release.
See the AMD SMI detailed changelog with code samples for more information.
ROCm SMI
ROCm SMI for ROCm 6.1.2
Additions
- Added the ring hang event to the `amdsmi_evt_notification_type_t` enum.
Fixes
- Fixed an issue causing ROCm SMI to incorrectly report GPU utilization for RDNA3 GPUs. See the issue on GitHub.
- Fixed the parsing of `pp_od_clk_voltage` in `get_od_clk_volt_info` to work better with MI-series hardware.
RCCL
RCCL 2.18.6 for ROCm 6.1.2
Changes
- Reduced `NCCL_TOPO_MAX_NODES` to limit stack usage and avoid stack overflow.
rocBLAS
rocBLAS 4.1.2 for ROCm 6.1.2
Optimizations
- Tuned BBS TN and TT operations on the CDNA3 architecture.
Fixes
- Fixed an issue related to obtaining solutions for BF16 TT operations.
rocDecode
rocDecode 0.6.0 for ROCm 6.1.2
Additions
- Added support for FFmpeg v5.x.
Optimizations
- Updated error checking in the `rocDecode-setup.py` script.
Changes
- Updated core dependencies.
- Updated to support the use of public LibVA headers.
Fixes
- Fixed some package dependencies.
Upcoming changes
- A future release will enable the use of the HIPCC compiled binaries `hipcc.bin` and `hipconfig.bin` by default. No action is needed by users; you may continue calling the high-level Perl scripts `hipcc` and `hipconfig`. `hipcc.bin` and `hipconfig.bin` will be invoked by the high-level Perl scripts. To revert to the previous behavior and invoke `hipcc.pl` and `hipconfig.pl`, set the `HIP_USE_PERL_SCRIPTS` environment variable to `1`.
- A subsequent release will remove the high-level HIPCC Perl scripts `hipcc` and `hipconfig`. That release will remove the `HIP_USE_PERL_SCRIPTS` environment variable and rename `hipcc.bin` and `hipconfig.bin` to `hipcc` and `hipconfig` respectively. No action is needed by users. To revert to the previous behavior, invoke `hipcc.pl` and `hipconfig.pl` explicitly.
- A subsequent release will remove `hipcc.pl` and `hipconfig.pl`.
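Once the compiled binaries become the default, opting back into the Perl driver scripts is a single environment variable, as described above:

```shell
# Revert to the Perl scripts after hipcc.bin/hipconfig.bin become
# the default dispatch targets:
export HIP_USE_PERL_SCRIPTS=1
echo "HIP_USE_PERL_SCRIPTS=$HIP_USE_PERL_SCRIPTS"
# hipcc and hipconfig will then dispatch to hipcc.pl and hipconfig.pl.
```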
ROCm 6.1.1 Release
ROCm 6.1.1 release notes
ROCm™ 6.1.1 introduces minor fixes and improvements to some tools and libraries.
OS support
ROCm 6.1.1 has been tested against a pre-release version of Ubuntu 22.04.5 (kernel 6.8).
AMD SMI
AMD SMI for ROCm 6.1.1
Additions
- Added deferred error correctable counts to `amd-smi metric --ecc --ecc-blocks`.
Changes
- Updated the output of `amd-smi metric --ecc-blocks` to show counters available from blocks.
- Updated the output of `amd-smi metric --clock` to reflect each engine.
- Updated the output of `amd-smi topology --json` to align with output reported by host and guest systems.
Fixes
- Fixed `amd-smi metric --clock`'s clock lock and deep sleep status.
- Fixed an issue that would cause an error when resetting non-AMD GPUs.
- Fixed `amd-smi metric --pcie` and `amdsmi_get_pcie_info()` when using RDNA3 (Navi 32 and Navi 31) hardware to prevent "UNKNOWN" reports.
- Fixed the output results of `amd-smi process` when getting processes running on a device.
Removals
- Removed the `amdsmi_get_gpu_process_info` API from the Python library. It was removed from the C library in an earlier release.
Known issues
- `amd-smi bad-pages` can result in a `ValueError: Null pointer access` error when using some PMU firmware versions.
See the [detailed changelog](/~https://github.com/ROCm/amdsmi/blob/docs/6.1.1/CHANGELOG.md) with code samples for more information.
HIPCC
HIPCC for ROCm 6.1.1
Changes
- Upcoming: a future release will enable use of the compiled binaries `hipcc.bin` and `hipconfig.bin` by default. No action is needed by users; you can continue calling the high-level Perl scripts `hipcc` and `hipconfig`. `hipcc.bin` and `hipconfig.bin` will be invoked by the high-level Perl scripts. To revert to the previous behavior and invoke `hipcc.pl` and `hipconfig.pl`, set the `HIP_USE_PERL_SCRIPTS` environment variable to `1`.
- Upcoming: a subsequent release will remove the high-level Perl scripts `hipcc` and `hipconfig`. That release will remove the `HIP_USE_PERL_SCRIPTS` environment variable and rename `hipcc.bin` and `hipconfig.bin` to `hipcc` and `hipconfig` respectively. No action is needed by users. To revert to the previous behavior, invoke `hipcc.pl` and `hipconfig.pl` explicitly.
- Upcoming: a subsequent release will remove `hipcc.pl` and `hipconfig.pl`.
HIPIFY
HIPIFY for ROCm 6.1.1
Additions
- Added support for LLVM 18.1.2.
- Added support for cuDNN 9.0.0.
- Added a new option, `--clang-resource-directory`, to specify the clang resource path (the path to the parent folder of the `include` folder that contains `__clang_cuda_runtime_wrapper.h` and other header files used during the hipification process).
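Assumed usage of the new option; the input file and resource path below are examples, not defaults:

```shell
# Point hipify-clang at a specific clang resource directory when the CUDA
# wrapper headers are not found on the default path (paths are examples):
# hipify-clang vector_add.cu --clang-resource-directory=/usr/lib/llvm-18/lib/clang/18
```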
ROCm SMI
ROCm SMI for ROCm 6.1.1
Known issues
- ROCm SMI reports GPU utilization incorrectly for RDNA3 GPUs in some situations.
hipBLASLt
hipBLASLt 0.7.0 for ROCm 6.1.1
Additions
- Added the `hipblasltExtSoftmax` extension API.
- Added the `hipblasltExtLayerNorm` extension API.
- Added the `hipblasltExtAMax` extension API.
- Added the `GemmTuning` extension parameter to let users set split-k.
- Added support for the mixed-precision datatype fp16/fp8 in with fp16 out.
Deprecations
- Upcoming: the `algoGetHeuristic()` ext API for GroupGemm will be deprecated in a future release of hipBLASLt.
hipSOLVER
hipSOLVER 2.1.1 for ROCm 6.1.1
Changes
- By default, `BUILD_WITH_SPARSE` is now set to `OFF` on Microsoft Windows.
Fixes
- Fixed the benchmark client build when `BUILD_WITH_SPARSE` is `OFF`.
rocFFT
rocFFT 1.0.27 for ROCm 6.1.1
Additions
- Enabled multi-GPU testing on systems without direct GPU interconnects.
Fixes
- Fixed a kernel launch failure when executing very large odd-length real-complex transforms.
ROCm 6.1.0 Release
ROCm 6.1 release highlights
The ROCm™ 6.1 release consists of new features and fixes to improve the stability and
performance of AMD Instinct™ MI300 GPU applications. Notably, we've added:
- Full support for Ubuntu 22.04.4.
- rocDecode, a new ROCm component that provides high-performance video decode support for AMD GPUs. With rocDecode, you can decode compressed video streams while keeping the resulting YUV frames in video memory. With decoded frames in video memory, you can run video post-processing using ROCm HIP, avoiding unnecessary data copies via the PCIe bus. To learn more, refer to the rocDecode documentation.
OS and GPU support changes
ROCm 6.1 adds the following operating system support:
- MI300A: Ubuntu 22.04.4 and RHEL 9.3
- MI300X: Ubuntu 22.04.4
Future releases will add additional operating systems to match the general offering. For older
generations of supported AMD Instinct products, we’ve added Ubuntu 22.04.4 support.
To view the complete list of supported GPUs and operating systems, refer to the system requirements
page for
[Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html)
and
[Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html).
Installation packages
This release includes a new set of packages for every module (all libraries and binaries default to `DT_RPATH`). Package names have the suffix `rpath`; for example, the `rpath` variant of `rocminfo` is `rocminfo-rpath`.
The new `rpath` packages will conflict with the default packages; they are meant to be used only in
environments where legacy `DT_RPATH` is the preferred form of linking (instead of `DT_RUNPATH`). We
do **not** recommend installing both sets of packages.
ROCm components
The following sections highlight select component-specific changes. For additional details, refer to the
Changelog.
AMD System Management Interface (SMI) Tool
- New monitor command for GPU metrics. Use the monitor command to customize, capture, collect, and observe GPU metrics on target devices.
- Integration with E-SMI. The EPYC™ System Management Interface In-band Library is a Linux C library that provides in-band user-space software APIs to monitor and control your CPU's power, energy, performance, and other system management functionality. This integration enables access to CPU metrics and telemetry through the AMD SMI API and CLI tools.
Composable Kernel (CK)
- New architecture support. CK now supports the following architectures to enable efficient image denoising on these AMD GPUs: gfx1030, gfx1100, gfx1031, gfx1101, gfx1032, gfx1102, gfx1034, gfx1103, gfx1035, gfx1036.
- FP8 rounding logic is replaced with stochastic rounding. Stochastic rounding mimics more realistic data behavior and improves model convergence.
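The difference from round-to-nearest can be illustrated with a small sketch (this is not CK's FP8 implementation; the coarse grid and `stochastic_round` helper below are hypothetical): stochastic rounding picks the upper or lower representable neighbor with probability proportional to proximity, so the quantization error is unbiased on average.

```python
import random

def stochastic_round(x, step):
    # Round x onto a grid of multiples of `step`, choosing the upper
    # neighbor with probability equal to x's fractional distance to it.
    lower = (x // step) * step
    frac = (x - lower) / step  # in [0, 1)
    return lower + step if random.random() < frac else lower

# Round-to-nearest would map 0.3 to 0.0 every time; stochastic
# rounding maps it to 1.0 about 30% of the time, so the mean of many
# rounded samples stays close to the true value.
random.seed(0)
samples = [stochastic_round(0.3, 1.0) for _ in range(10_000)]
mean = sum(samples) / len(samples)
```

This unbiasedness is why accumulated rounding error does not drift in one direction during training, which helps model convergence.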
HIP
- New environment variable to enable kernel run serialization. The default `HIP_LAUNCH_BLOCKING` value is `0` (disabled), which causes kernels to run as defined in the queue. When set to `1` (enabled), the HIP runtime serializes the kernel queue, which behaves the same as `AMD_SERIALIZE_KERNEL`.
hipBLASLt
- New GemmTuning extension parameter. GemmTuning lets you set a split-k value for each solution, giving finer control for performance tuning.
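Conceptually, split-k partitions the reduction (K) dimension of a GEMM into chunks whose partial products can be computed concurrently and then summed, which improves GPU occupancy when M and N are small. A minimal pure-Python sketch of the idea (the `matmul_splitk` helper is hypothetical, not the hipBLASLt API):

```python
def matmul(A, B):
    # Plain M x K by K x N matrix multiply on nested lists.
    K, N = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(len(A))]

def matmul_splitk(A, B, split_k):
    # Split the K dimension into `split_k` chunks, compute one partial
    # product per chunk (in hardware these run in parallel), then sum.
    K = len(B)
    chunk = (K + split_k - 1) // split_k
    partials = []
    for s in range(0, K, chunk):
        A_part = [row[s:s + chunk] for row in A]
        B_part = B[s:s + chunk]
        partials.append(matmul(A_part, B_part))
    # Reduce the partial products element-wise.
    M, N = len(A), len(B[0])
    return [[sum(p[i][j] for p in partials) for j in range(N)]
            for i in range(M)]

A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [0, 1], [1, 1], [2, 2]]
assert matmul_splitk(A, B, split_k=2) == matmul(A, B)
```

Exposing split-k as a tuning parameter lets you trade the cost of the final reduction against the extra parallelism it unlocks.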
hipFFT
- New multi-GPU support for single-process transforms. Multiple GPUs can be used to perform a transform in a single process. Note that this initial implementation is a functional preview.
HIPIFY
- Skipped code blocks: Code blocks that are skipped by the preprocessor are no longer hipified under the `--default-preprocessor` option. To hipify everything regardless of conditional preprocessor directives (`#if`, `#ifdef`, `#ifndef`, `#elif`, or `#else`), don't use the `--default-preprocessor` or `--amap` options.
hipSPARSELt
- Structured sparsity matrix support extensions. Structured sparsity matrices help speed up deep-learning workloads. We now support `B` as the sparse matrix and `A` as the dense matrix in Sparse Matrix-Matrix Multiplication (SPMM); prior to this release, we only supported sparse (matrix A) x dense (matrix B) matrix multiplication.
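The idea behind structured-sparsity SPMM can be sketched in plain Python (illustrative only; hipSPARSELt's actual API and compressed storage format differ). In a 2:4 structured-sparse matrix, each group of four consecutive values holds at most two nonzeros, so B can be stored as nonzero values plus position metadata, and each output element touches only B's nonzeros:

```python
def compress_2to4(M):
    # Compress a 2:4 structured-sparse matrix row-wise: keep only the
    # (at most) 2 nonzeros per group of 4, plus their column indices.
    values, indices = [], []
    for row in M:
        v_row, i_row = [], []
        for g in range(0, len(row), 4):
            nz = [(g + k, x) for k, x in enumerate(row[g:g + 4]) if x != 0]
            assert len(nz) <= 2, "row violates the 2:4 pattern"
            for idx, x in nz:
                i_row.append(idx)
                v_row.append(x)
        values.append(v_row)
        indices.append(i_row)
    return values, indices

def spmm_dense_a_sparse_b(A, B_values, B_indices, n_cols):
    # C = A @ B with B stored compressed: only B's nonzeros contribute,
    # so each output element needs at most half the multiply-adds.
    C = [[0.0] * n_cols for _ in A]
    for i, a_row in enumerate(A):
        for k, a in enumerate(a_row):
            for j, b in zip(B_indices[k], B_values[k]):
                C[i][j] += a * b
    return C

B = [[1, 0, 0, 2],
     [0, 3, 4, 0]]
A = [[1, 1]]
vals, idx = compress_2to4(B)
C = spmm_dense_a_sparse_b(A, vals, idx, n_cols=4)
```

On hardware, the metadata steers the sparse operands into the matrix units, which is where the speedup comes from; the sketch only shows the bookkeeping.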
hipTensor
- 4D tensor permutation and contraction support.
You can now perform tensor permutation on 4D tensors and 4D contractions for F16, BF16, and
Complex F32/F64 datatypes.
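A 4D permutation simply reorders the tensor's axes. The semantics can be sketched in pure Python (hipTensor performs this on the GPU through its own API; the `permute4d` helper below is hypothetical):

```python
from itertools import product

def permute4d(T, perm):
    # Return a new 4D nested-list tensor whose axis i is input axis
    # perm[i] (NumPy transpose semantics).
    shape = (len(T), len(T[0]), len(T[0][0]), len(T[0][0][0]))
    out_shape = tuple(shape[p] for p in perm)
    out = [[[[None] * out_shape[3] for _ in range(out_shape[2])]
            for _ in range(out_shape[1])] for _ in range(out_shape[0])]
    for idx in product(*(range(s) for s in shape)):
        o = tuple(idx[p] for p in perm)
        out[o[0]][o[1]][o[2]][o[3]] = T[idx[0]][idx[1]][idx[2]][idx[3]]
    return out

# A (1, 2, 3, 4) tensor whose values encode their own flat offset.
T = [[[[i * 24 + j * 12 + k * 4 + l for l in range(4)]
       for k in range(3)] for j in range(2)] for i in range(1)]
# Move the last axis to the front: shape becomes (4, 1, 2, 3).
P = permute4d(T, (3, 0, 1, 2))
```

The actual work in a GPU permutation kernel is choosing memory access patterns so that either the reads or the writes stay coalesced; the index mapping itself is exactly what the sketch shows.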
MIGraphX
- Improved performance for transformer-based models. We added support for FlashAttention, which benefits models like BERT, GPT, and Stable Diffusion.
- New Torch-MIGraphX driver. This driver calls MIGraphX directly from PyTorch. It provides an `mgx_module` object that you can invoke like any other Torch module, but which utilizes the MIGraphX inference engine internally. Torch-MIGraphX supports FP32, FP16, and INT8 datatypes.
- FP8 support. We now offer functional support for inference in the FP8E4M3FNUZ datatype. You can load an ONNX model in FP8E4M3FNUZ using the C++ or Python APIs, or `migraphx-driver`. You can quantize a floating-point model to FP8 format by using the `--fp8` flag with `migraphx-driver`. To accelerate inference, MIGraphX uses hardware acceleration on MI300 for FP8 by leveraging FP8 support in various backend kernel libraries.
MIOpen
- Improved performance for inference and convolutions. Inference support is now provided for Find 2.0 fusion plans. Additionally, we've enhanced the NHWC (Number of samples, Height, Width, Channels) convolution kernels for heuristics. NHWC stores each sample's data with the height and width dimensions first, followed by channels.
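The layout difference is easiest to see in the flat-buffer index arithmetic. A small sketch (illustrative, not MIOpen code): in NCHW, the values of one channel plane are contiguous, while in NHWC all channels of a single pixel sit next to each other, which is what the NHWC convolution kernels exploit.

```python
def flat_index_nchw(n, c, h, w, C, H, W):
    # Offset of element (n, c, h, w) in a flat NCHW buffer:
    # channel is the slowest-varying dimension after batch.
    return ((n * C + c) * H + h) * W + w

def flat_index_nhwc(n, c, h, w, C, H, W):
    # Offset of the same element in a flat NHWC buffer:
    # channels are contiguous for each (h, w) pixel.
    return ((n * H + h) * W + w) * C + c

# For a 2x2 image with 3 channels, adjacent channels of one pixel are
# H * W = 4 elements apart in NCHW but only 1 element apart in NHWC.
C, H, W = 3, 2, 2
stride_nchw = (flat_index_nchw(0, 1, 0, 0, C, H, W)
               - flat_index_nchw(0, 0, 0, 0, C, H, W))
stride_nhwc = (flat_index_nhwc(0, 1, 0, 0, C, H, W)
               - flat_index_nhwc(0, 0, 0, 0, C, H, W))
```

Keeping a pixel's channels adjacent lets a kernel load all input channels for one output position with a single contiguous read, which is why NHWC is often faster for convolutions on wide-SIMD hardware.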
OpenMP
- Implicit zero-copy is triggered automatically in XNACK-enabled MI300A systems. Implicit zero-copy behavior in non-`unified_shared_memory` programs is triggered automatically in XNACK-enabled MI300A systems (for example, when using the `HSA_XNACK=1` environment variable). OpenMP supports the `requires unified_shared_memory` directive for programs that don't want to copy data explicitly between the CPU and GPU; however, this requires adding the directive to every translation unit of the program.
- New MI300 FP atomics. Application performance can now improve by leveraging fast floating-point atomics on MI300 (gfx942).
RCCL
- NCCL 2.18.6 compatibility. RCCL is now compatible with NCCL 2.18.6, which includes increasing the maximum number of IB network interfaces to 32 and fixing network device ordering when creating communicators with only one GPU per node.
- Doubled simultaneous communication channels. We improved MI300X performance by increasing the maximum number of simultaneous communication channels from 32 to 64.
rocALUTION
- New multiple node and GPU support.
Unsmoothed and smoothed aggregations and Ruge-Stueben AMG now work with multiple nodes
and GPUs. For more information, refer to the
API documentation.
rocDecode
- New ROCm component.
rocDecode is ROCm's newest component, providing high-performance video decode support for AMD GPUs. To learn more, refer to the documentation.
ROCm Compiler
- Combined projects. ROCm Device-Libs, ROCm Compiler Support, and hipCC are now located in the `llvm-project/amd` subdirectory of AMD's fork of the LLVM project. Previously, these projects were maintained in separate repositories. Note that the projects themselves will continue to be packaged separately.
- Split the `rocm-llvm` package. This package has been split into a required and an optional package:
  - `rocm-llvm` (required): A package containing the essential binaries needed for compilation.
  - `rocm-llvm-dev` (optional): A package containing binaries for compiler and application developers.
ROCm Data Center Tool (RDC)
- C++ upgrades.
RDC was upgraded from C++11 to C++17 to enable a more modern C++ standard when writing RDC plugins.
ROCm Performance Primitives (RPP)
- New backend support. Audio processing support was added for the `HOST` backend, and 3D Voxel kernel support was added for the `HOST` and `HIP` backends.
ROCm Validation Suite
- New datatype support.
Added BF16 and FP8 datatypes, based on General Matrix Multiply (GEMM) operations, to the GPU Stress Test (GST) module. This provides additional performance benchmarking and stress testing for the newly supported datatypes.
rocSOLVER
- New EigenSolver routine.
Based on the Jacobi algorithm, a new EigenSolver routine was added to the library. This routine computes the eigenvalues and eigenvectors of a matrix with improved performance.
ROCTracer
- New versioning and callback enhancements.
Improved to match versioning changes in HIP Runtime and supports runtime API callbacks and activity record logging. The APIs of different runtimes at different levels are considered different API domains with assigned domain IDs.
Upcoming changes
- ROCm SMI will be deprecated in a future release. We advise migrating to AMD SMI now to prevent future workflow disruptions.
- hipCC supports, by default, the following compiler invocation flags:
  - `-mllvm -amdgpu-early-inline-all=true`
  - `-mllvm -amdgpu-function-calls=false`
  - ...
ROCm 6.0.2 Release
ROCm 6.0.2 is a point release with minor bug fixes to improve the stability of MI300 GPU applications, including fixes in the rocSPARSE library. Several new driver features are introduced for system qualification on our partner server offerings.
hipFFT
Changes
- Removed the Git submodule for shared files between rocFFT and hipFFT; instead, the shared files are copied over (this should help simplify downstream builds and packaging).