Releases: ROCm/ROCm

ROCm 6.3.1 Release

20 Dec 23:59

ROCm 6.3.1 release notes

The release notes provide a summary of notable changes since the previous ROCm release.

If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.

Release highlights

The following are notable new features and improvements in ROCm 6.3.1. For changes to individual components, see
Detailed component changes.

Per-queue resiliency for Instinct MI300 accelerators

The AMDGPU driver now includes enhanced resiliency for misbehaving applications on AMD Instinct MI300 accelerators. This helps isolate the impact of misbehaving applications, ensuring other workloads running on the same accelerator are unaffected.

ROCm Runfile Installer

ROCm 6.3.1 introduces the ROCm Runfile Installer, with initial support for Ubuntu 22.04. The ROCm Runfile Installer facilitates ROCm installation without using a native Linux package management system, with or without network or internet access. For more information, see the ROCm Runfile Installer documentation.

ROCm documentation updates

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.

  • Added documentation on training a model with ROCm Megatron-LM. AMD offers a Docker image for MI300X accelerators
    containing essential components to get started, including ROCm libraries, PyTorch, and Megatron-LM utilities. See
    Training a model using ROCm Megatron-LM to get started.

    The new ROCm Megatron-LM training Docker accompanies the ROCm vLLM inference Docker as a set of
    ready-to-use containerized solutions to get started with using ROCm for AI.

  • Updated the Instinct MI300X workload tuning guide with more current optimization strategies. The updated
    sections include guidance on vLLM optimization, PyTorch TunableOp, and hipBLASLt tuning.

  • HIP graph-safe libraries operate safely in HIP execution graphs. HIP graphs are an alternative way of executing tasks on a GPU that can provide performance benefits over launching kernels individually via streams. A new topic indicates whether each ROCm library is graph-safe (a short sketch follows this list).

  • The Device memory topic in the HIP memory management section has been updated.

  • The HIP documentation has expanded with new resources for developers.
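
To make the HIP graph item above concrete, here is a minimal, illustrative sketch (not taken from the ROCm documentation) that captures two kernel launches from a stream into a graph and replays them; error checking is omitted for brevity.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    const dim3 grid((n + 255) / 256), block(256);
    float* d = nullptr;
    hipMalloc(&d, n * sizeof(float));

    hipStream_t stream;
    hipStreamCreate(&stream);

    // Capture a short sequence of work into a graph instead of launching it
    // kernel by kernel.
    hipGraph_t graph;
    hipStreamBeginCapture(stream, hipStreamCaptureModeGlobal);
    hipLaunchKernelGGL(scale, grid, block, 0, stream, d, 2.0f, n);
    hipLaunchKernelGGL(scale, grid, block, 0, stream, d, 0.5f, n);
    hipStreamEndCapture(stream, &graph);

    // Instantiate once, then replay the whole captured sequence with a single
    // launch call per iteration.
    hipGraphExec_t graphExec;
    hipGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);
    for (int iter = 0; iter < 100; ++iter) {
        hipGraphLaunch(graphExec, stream);
    }
    hipStreamSynchronize(stream);

    hipGraphExecDestroy(graphExec);
    hipGraphDestroy(graph);
    hipStreamDestroy(stream);
    hipFree(d);
    printf("graph replay complete\n");
    return 0;
}
```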

Operating system and hardware support changes

ROCm 6.3.1 adds support for Debian 12 (kernel: 6.1). Debian is supported only on AMD Instinct accelerators. See the installation instructions at Debian native installation.

ROCm 6.3.1 enables support for the AMD Instinct MI325X accelerator. For more information, see AMD Instinct™ MI325X Accelerators.

See the Compatibility matrix for more information about operating system and hardware compatibility.

ROCm components

The following table lists the versions of ROCm components for ROCm 6.3.1, including any version
changes from 6.3.0 to 6.3.1. Click the component's updated version to go to a list of its changes.
Click the GitHub icon to go to the component's source code on GitHub.

| Category | Group | Name | Version |
|----------|-------|------|---------|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.11.0 |
| | | MIOpen | 3.3.0 |
| | | MIVisionX | 3.1.0 ⇒ 3.1.0 |
| | | rocAL | 2.1.0 |
| | | rocDecode | 0.8.0 |
| | | rocJPEG | 0.6.0 |
| | | rocPyDecode | 0.2.0 |
| | | RPP | 1.9.1 |
| | Communication | RCCL | 2.21.5 ⇒ 2.21.5 |
| | Math | hipBLAS | 2.3.0 |
| | | hipBLASLt | 0.10.0 |
| | | hipFFT | 1.0.17 |

ROCm 6.3.0 Release

05 Dec 01:49

ROCm 6.3.0 release notes

The release notes provide a summary of notable changes since the previous ROCm release.

If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon
GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.

Release highlights

The following are notable new features and improvements in ROCm 6.3.0. For changes to individual components, see
Detailed component changes.

rocJPEG added

ROCm 6.3.0 introduces the rocJPEG library to the ROCm software stack. rocJPEG is a high-performance
JPEG decode SDK for AMD GPUs. For more information, see the rocJPEG documentation.

ROCm Compute Profiler and ROCm Systems Profiler

Omniperf and Omnitrace have been renamed to ROCm Compute Profiler and ROCm Systems Profiler, respectively,
to reflect their new direction as part of the ROCm software stack.

SHARK AI toolkit for high-speed inferencing and serving introduced

SHARK is an open-source toolkit for high-performance serving of popular generative AI and large
language models. In its initial release, SHARK contains the Shortfin high-performance serving engine,
the SHARK inferencing library that includes example server applications for popular models.

This initial release includes support for serving the Stable Diffusion XL model on AMD Instinct™
MI300 devices using ROCm. See the SHARK release page on GitHub to get started.

PyTorch 2.4 support added

ROCm 6.3.0 adds support for PyTorch 2.4. See the Compatibility matrix for the complete list of PyTorch
versions tested for compatibility with ROCm.

Flash Attention kernels in Triton and Composable Kernel (CK) added to Transformer Engine

Composable Kernel-based and Triton-based Flash Attention kernels have been integrated into
Transformer Engine via the ROCm Composable Kernel and AOTriton libraries. Transformer Engine
can now optionally select a flexible and optimized attention solution for AMD GPUs. For more
information, see Fused Attention Backends on ROCm on GitHub.

HIP compatibility

HIP now includes the hipStreamLegacy API, which is equivalent to NVIDIA cudaStreamLegacy. For more
information, see Global enum and defines in the HIP runtime API documentation.
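
As a rough illustration (not from the release notes), the sketch below assumes hipStreamLegacy can be passed wherever a hipStream_t is accepted, mirroring how cudaStreamLegacy is used; error checking is omitted.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void inc(int* x) { *x += 1; }

int main() {
    int h = 0;
    int* d = nullptr;
    hipMalloc(&d, sizeof(int));
    hipMemcpy(d, &h, sizeof(int), hipMemcpyHostToDevice);

    // hipStreamLegacy designates the legacy default stream, which synchronizes
    // with other blocking streams (the cudaStreamLegacy behavior).
    hipLaunchKernelGGL(inc, dim3(1), dim3(1), 0, hipStreamLegacy, d);
    hipMemcpyAsync(&h, d, sizeof(int), hipMemcpyDeviceToHost, hipStreamLegacy);
    hipStreamSynchronize(hipStreamLegacy);

    printf("result: %d\n", h);  // expected: 1
    hipFree(d);
    return 0;
}
```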

Unload active amdgpu-dkms module without a system reboot

On Instinct MI200 and MI300 systems, you can now unload the active amdgpu-dkms modules and reinstall
and reload newer modules without a system reboot. If the new dkms package includes newer firmware
components, the driver first resets the device and then loads the new firmware.

ROCm Offline Installer Creator updates

The ROCm Offline Installer Creator 6.3 introduces a new feature to uninstall the previous version of
ROCm on the non-connected target system before installing a new version. This feature is only supported
on the Ubuntu distribution. See the ROCm Offline Installer Creator documentation for more information.

OpenCL ICD loader separated from ROCm

The OpenCL ICD loader is no longer delivered as part of ROCm and must be installed separately
as part of the ROCm installation process. For Ubuntu and RHEL installations, the required package
is installed as part of the setup described in Prerequisites. On other supported Linux distributions,
such as SUSE, the required package must be installed in separate steps, which are included in the
installation instructions.

Because the OpenCL path is now separate from the ROCm installation for versioned and multi-version
installations, you must manually set LD_LIBRARY_PATH to point to the ROCm installation library, as
described in the Post-installation instructions. If LD_LIBRARY_PATH is not set as needed for versioned
or multi-version installations, OpenCL applications like clinfo will fail to run and return an error.

ROCT Thunk Interface integrated into ROCr runtime

The ROCT Thunk Interface package is now integrated into the ROCr runtime. As a result, the ROCT package
is no longer included as a separate package in the ROCm software stack.

ROCm documentation updates

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a
wider variety of user needs and use cases.

Operating system and hardware support changes

ROCm 6.3.0 adds support for the following operating system and kernel versions:

  • Ubuntu 24.04.2 (kernel: 6.8 [GA], 6.11 [HWE])
  • Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE])
  • RHEL 9.5 (kernel: 5.14.0)
  • Oracle Linux 8.10 (kernel: 5.15.0)

See the installation instructions at ROCm installation for Linux.

ROCm 6.3.0 marks the end of support (EoS) for:

  • Ubuntu 24.04.1
  • Ubuntu 22.04.4
  • RHEL 9.3
  • RHEL 8.9
  • Oracle Linux 8.9

Hardware support r...


ROCm 6.2.4 Release

07 Nov 00:04

ROCm 6.2.4 release notes

The release notes provide a summary of notable changes since the previous ROCm release.

If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon
GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.

Release highlights

The following are notable new features and improvements in ROCm 6.2.4. For changes to individual components, see
Detailed component changes.

ROCm documentation updates

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for
a wider variety of user needs and use cases.

Operating system and hardware support changes

ROCm 6.2.4 adds support for the AMD Radeon PRO V710 GPU for compute workloads. See
Supported GPUs
for more information.

This release maintains the same operating system support as 6.2.2.

ROCm components

The following table lists the versions of ROCm components for ROCm 6.2.4, including any version changes from 6.2.2 to 6.2.4.

Click the component's updated version to go to a detailed list of its changes. Click the GitHub icon to go to the component's source code on GitHub.

| Category | Group | Name | Version |
|----------|-------|------|---------|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.10 |
| | | MIOpen | 3.2.0 |
| | | MIVisionX | 3.0.0 |
| | | rocAL | 2.0.0 |
| | | rocDecode | 0.6.0 |
| | | rocPyDecode | 0.1.0 |
| | | RPP | 1.8.0 |
| | Communication | RCCL | 2.20.5 |
| | Math | hipBLAS | 2.2.0 |
| | | hipBLASLt | 0.8.0 |
| | | hipFFT | 1.0.15 ⇒ 1.0.16 |
| | | hipfort | 0.4.0 |
| | | hipRAND | 2.11.0 ⇒ 2.11.1 |
| | | hipSOLVER | 2.2.0 |
| | | hipSPARSE | 3.1.1 |
| | | hipSPARSELt | 0.2.1 |

ROCm 6.2.2 Release

27 Sep 21:20

ROCm 6.2.2 release notes

These release notes provide a summary of notable changes since the previous ROCm release.

As ROCm 6.2.2 was released shortly after 6.2.1, the changes between these versions
are minimal. For a comprehensive overview of recent updates,
refer to the ROCm 6.2.1 release notes.

The Compatibility matrix
provides the full list of supported hardware, operating systems, ecosystems, third-party components, and ROCm components
for each ROCm release.

Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the ROCm documentation release history.

Release highlights

The following is a significant fix introduced in ROCm 6.2.2.

Fixed Instinct MI300X error recovery failure

Improved the reliability of AMD Instinct MI300X accelerators in scenarios involving
uncorrectable errors. Previously, error recovery did not occur as expected,
potentially leaving the system in an undefined state. This fix ensures that error
recovery functions as expected, maintaining system stability.

See the original issue noted in the ROCm 6.2.1 release notes.

ROCm 6.2.1 Release

21 Sep 00:27

ROCm 6.2.1 release notes

The release notes provide a summary of notable changes since the previous ROCm release.

The Compatibility matrix
provides the full list of supported hardware, operating systems, ecosystems, third-party components, and ROCm components for each ROCm release.

Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the ROCm documentation release history.

Release highlights

The following are notable new features and improvements in ROCm 6.2.1. For changes to individual components, see Detailed component changes.

rocAL major version change

The new version of rocAL introduces many new features, but does not modify any of the existing public API functions. However, the version number was incremented from 1.3 to 2.0.
Applications linked to version 1.3 must be recompiled to link against version 2.0.

See the rocAL detailed changes for more information.

New support for FBGEMM (Facebook General Matrix Multiplication)

As of ROCm 6.2.1, ROCm supports Facebook General Matrix Multiplication (FBGEMM) and the related FBGEMM_GPU library.

FBGEMM is a low-precision, high-performance CPU kernel library for convolution and matrix multiplication. It is used for server-side inference and as a back end for PyTorch quantized operators. FBGEMM_GPU includes a collection of PyTorch GPU operator libraries for training and inference. For more information, see the ROCm Model acceleration libraries guide
and PyTorch's FBGEMM GitHub repository.

ROCm Offline Installer Creator changes

The ROCm Offline Installer Creator 6.2.1 introduces several new features and improvements including:

  • Logging support for create and install logs
  • More stringent checks for Linux versions and distributions
  • Updated prerequisite repositories
  • Fixed CTest issues

ROCm documentation changes

The ROCm documentation, like all ROCm projects, is open source and available on GitHub. To contribute to ROCm documentation, see the [ROCm documentation contribution guidelines](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).

Operating system and hardware support changes

There are no changes to supported hardware or operating systems from ROCm 6.2.0 to ROCm 6.2.1.

See the Compatibility matrix for the full list of supported operating systems and hardware architectures.

ROCm components

The following table lists the versions of ROCm components for ROCm 6.2.1, including any version changes from 6.2.0 to 6.2.1.

Click the component's updated version to go to a detailed list of its changes. Click the GitHub icon to go to the component's source code on GitHub.

| Category | Group | Name | Version |
|----------|-------|------|---------|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.10 |
| | | MIOpen | 3.2.0 |
| | | MIVisionX | 3.0.0 |
| | | rocAL | 1.0.0 ⇒ 2.0.0 |
| | | rocDecode | 0.6.0 |
| | | rocPyDecode | 0.1.0 |
| | | RPP | 1.8.0 |
| | Communication | RCCL | 2.20.5 ⇒ 2.20.5 |
| | Math | hipBLAS | 2.2.0 |
| | | hipBLASLt | 0.8.0 |
| | | hipFFT | 1.0.15 |
| | | hipfort | 0.4.0 |
| | | hipRAND | 2.11.0 |
| | | hipSOLVER | 2.2.0 |
| | | hipSPARSE | 3.1.1 |
| | | hipSPARSELt | 0.2.1 |
| | | rocALUTION | 3.2.0 |
| | | rocBLAS | 4.1.2 ⇒ 4.2.1 |
| | | rocFFT | 1.0.28 ⇒ 1.0.29 |
| | | rocRAND | 3.1.0 |
| | | rocSOLVER | 3.26.0 |
| | | rocSPARSE | 3.2.0 |
| | | rocWMMA | 1.5.0 |
| | | Tensile | 4.41.0 |

ROCm 6.2.0 Release

02 Aug 19:43

ROCm 6.2.0 release notes

The release notes provide a comprehensive summary of changes since the previous ROCm release.

  • Release highlights

  • Operating system and hardware support changes

  • ROCm components versioning

  • Detailed component changes

  • ROCm known issues

  • ROCm upcoming changes

The Compatibility matrix
provides an overview of operating system, hardware, ecosystem, and ROCm component support across ROCm releases.

Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the ROCm documentation release history.

Release highlights

This section introduces notable new features and improvements in ROCm 6.2. See the
Detailed component changes for individual component changes.

New components

ROCm 6.2.0 introduces the following new components to the ROCm software stack.

  • Omniperf -- A kernel-level profiling tool for machine learning and high-performance computing (HPC) workloads
    running on AMD Instinct accelerators. Omniperf offers comprehensive profiling and advanced analysis via command line
    or a GUI dashboard. For more information, see
    Omniperf.

  • Omnitrace -- A multi-purpose analysis tool for profiling and tracing applications running on the CPU or the CPU and GPU.
    It supports dynamic binary instrumentation, call-stack sampling, causal profiling, and other features for determining
    which function and line number are executing. For more information, see
    Omnitrace.

  • rocPyDecode -- A tool to access rocDecode APIs in Python. It connects Python and C/C++ libraries,
    enabling function calls and data passing between the two languages. The rocpydecode.so library is a
    wrapper that exposes the rocDecode APIs, which are written primarily in C/C++, for use within Python.
    For more information, see rocPyDecode.

  • ROCprofiler-SDK -- ROCprofiler-SDK is a profiling and tracing library for HIP and ROCm applications on AMD ROCm software
    used to identify application performance bottlenecks and optimize their performance. The new APIs add restrictions for more
    efficient implementations and improved thread safety. A new window restriction specifies the services the tool can use.
    ROCprofiler-SDK also provides a tool library to help you write your tool implementations. rocprofv3 uses this tool library
    to profile and trace applications for performance bottlenecks. Examples include API tracing, kernel tracing, and so on.
    For more information, see ROCprofiler-SDK.

    ROCprofiler-SDK for ROCm 6.2.0 is a beta release and subject to change.
    

ROCm Offline Installer Creator introduced

The new ROCm Offline Installer Creator creates an installation package for a preconfigured setup of ROCm, the AMDGPU
driver, or a combination of the two on a target system without network access. This new tool customizes
multiple unique configurations for use when installing ROCm on a target. Other notable features include:

  • A lightweight, easy-to-use user interface for configuring the creation of the installer

  • Support for multiple Linux distributions

  • Installer support for different ROCm releases and specific ROCm components

  • Optional driver or driver-only installer creation

  • Optional post-install preferences

  • Lightweight installer packages, which are unique to the preconfigured ROCm setup

  • Resolution and inclusion of dependency packages for offline installation

For more information, see
ROCm Offline Installer Creator.

Math libraries default to Clang instead of HIPCC

The default compiler used to build the math libraries on Linux changes from hipcc to amdclang++.
Appropriate compiler flags are added to ensure these compilations build correctly. This change only applies when
building the libraries. Applications using the libraries can continue to be compiled using hipcc or amdclang++ as
described in ROCm compiler reference.
The math libraries can also be built with hipcc using any of the previously available methods (for example, the CXX
environment variable, the CMAKE_CXX_COMPILER CMake variable, and so on). This change shouldn't affect performance or
functionality.

Framework and library changes

This section highlights updates to supported deep learning frameworks and notable third-party library optimizations.

Additional PyTorch and TensorFlow support

ROCm 6.2.0 supports PyTorch versions 2.2 and 2.3 and TensorFlow version 2.16.

See Installing PyTorch for ROCm
and Installing TensorFlow for ROCm
for installation instructions.

Refer to the
Third-party support matrix
for a comprehensive list of third-party frameworks and libraries supported by ROCm.

Optimized framework support for OpenXLA

PyTorch for ROCm and TensorFlow for ROCm now provide native support for OpenXLA. OpenXLA is an open-source ML compiler
ecosystem that enables developers to compile and optimize models from all leading ML frameworks. For more information, see
Installing PyTorch for ROCm
and Installing TensorFlow for ROCm.

PyTorch support for Autocast (automatic mixed precision)

PyTorch now supports Autocast for recurrent neural networks (RNNs) on ROCm. This can help to reduce computational
workloads and improve performance. Based on the information about the magnitude of values, Autocast can substitute the
original float32 linear layers and convolutions with their float16 or bfloat16 variants. For more information, see
Automatic mixed precision.

Memory savings for bitsandbytes model quantization

The ROCm-aware bitsandbytes library is a lightweight Python wrapper around HIP
custom functions, in particular 8-bit optimizer, matrix multiplication, and 8-bit and 4-bit quantization functions.
ROCm 6.2.0 introduces the following bitsandbytes changes:

  • Int8 matrix multiplication is enabled, and it includes the following functions:
    • extract-outliers – extracts rows and columns that have outliers in the inputs. They’re later used for matrix multiplication without quantization.
    • transform – row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after matmul computation.
    • igemmlt – new function for GEMM computation A*B^T. It uses
      hipblasLtMatMul and performs 8-bit GEMM operations.
    • dequant_mm – dequantizes output matrix to original data type using scaling factors from vector-wise quantization.
  • Blockwise quantization – input tensors are quantized for a fixed block size.
  • 4-bit quantization and dequantization functions – normalized Float4 quantization, quantile estimation, and quantile quantization functions are enabled.
  • 8-bit and 32-bit optimizers are enabled.

These functions are included in bitsandbytes; they are not part of ROCm. However, ROCm 6.2.0 includes the fixes
and features needed to run them.

For more information, see Model quantization techniques.

Improved vLLM support

ROCm 6.2.0 enhances vLLM support for inference on AMD Instinct accelerators, adding
capabilities for FP16/BF16 precision for LLMs, and FP8 support for Llama.
ROCm 6.2.0 adds support for the following vLLM features:

  • MP: Multi-GPU execution. Choose between MP and Ray using a flag. To select MP,
    use --distributed-executor-backend=mp. The default depends on the commit and is currently in flux.

  • FP8 KV cache: Enhances computational efficiency and performance by significantly reducing memory usage and bandwidth requirements.
    The QUARK quantizer currently only supports Llama.

  • Triton Flash Attention:

    ROCm supports both Triton and Composable Kernel Flash Attention 2 in vLLM. The default is Triton, but you can change this
    setting using the VLLM_USE_FLASH_ATTN_TRITON=False environment variable.

  • PyTorch TunableOp:

    Improved optimization and tuning of GEMMs. It requires Docker with PyTorch 2.3 or later.

For more information about enabling these features, see
vLLM inference.

ROCm has a vLLM branch for experimental features. This branch includes performance improvements and accuracy and correctness testing.
These features include:

  • FP8 GEMMs: To improve the performance of FP8 quantization, work is underway on tuning the GEMM using the shapes used
    in the model's execution. It only supp...

ROCm 6.1.2 Release

04 Jun 22:13

ROCm 6.1.2 release notes

ROCm 6.1.2 includes enhancements to SMI tools and improvements to some libraries.

OS support

ROCm 6.1.2 has been tested against a pre-release version of Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE]).

AMD SMI

AMD SMI for ROCm 6.1.2

Additions

  • Added process isolation and clean shader APIs and CLI commands.
    • amdsmi_get_gpu_process_isolation()
    • amdsmi_set_gpu_process_isolation()
    • amdsmi_set_gpu_clear_sram_data()
  • Added the MIN_POWER metric to output provided by amd-smi static --limit.

Optimizations

  • Updated the amd-smi monitor --pcie output to prevent delays with the monitor command.

Changes

  • Updated amdsmi_get_power_cap_info to return values in uW instead of W.
  • Updated Python library return types for amdsmi_get_gpu_memory_reserved_pages and amdsmi_get_gpu_bad_page_info.
  • Updated the output of amd-smi metric --ecc-blocks to show counters available from blocks.

Fixes

  • amdsmi_get_gpu_board_info() no longer returns junk character strings.
  • amd-smi metric --power now correctly details power output for RDNA3, RDNA2, and MI1x devices.
  • Fixed the amdsmitstReadWrite.TestPowerCapReadWrite test for RDNA3, RDNA2, and MI100 devices.
  • Fixed an issue with the amdsmi_get_gpu_memory_reserved_pages and amdsmi_get_gpu_bad_page_info Python interface calls.

Removals

  • Removed the amdsmi_get_gpu_process_info API from the Python library. It was removed from the C library in an earlier release.

See the AMD SMI detailed changelog with code samples for more information.

ROCm SMI

ROCm SMI for ROCm 6.1.2

Additions

  • Added the ring hang event to the amdsmi_evt_notification_type_t enum.

Fixes

  • Fixed an issue causing ROCm SMI to incorrectly report GPU utilization for RDNA3 GPUs. See the issue on GitHub.
  • Fixed the parsing of pp_od_clk_voltage in get_od_clk_volt_info to work better with MI-series hardware.

RCCL

RCCL 2.18.6 for ROCm 6.1.2

Changes

  • Reduced NCCL_TOPO_MAX_NODES to limit stack usage and avoid stack overflow.

rocBLAS

rocBLAS 4.1.2 for ROCm 6.1.2

Optimizations

  • Tuned BBS TN and TT operations on the CDNA3 architecture.

Fixes

  • Fixed an issue related to obtaining solutions for BF16 TT operations.

rocDecode

rocDecode 0.6.0 for ROCm 6.1.2

Additions

  • Added support for FFmpeg v5.x.

Optimizations

  • Updated error checking in the rocDecode-setup.py script.

Changes

  • Updated core dependencies.
  • Updated to support the use of public LibVA headers.

Fixes

  • Fixed some package dependencies.

Upcoming changes

  • A future release will enable the use of HIPCC compiled binaries hipcc.bin and hipconfig.bin by default. No action is needed by users; you may continue calling high-level Perl scripts hipcc and hipconfig. hipcc.bin and hipconfig.bin will be invoked by the high-level Perl scripts. To revert to the previous behavior and invoke hipcc.pl and hipconfig.pl, set the HIP_USE_PERL_SCRIPTS environment variable to 1.
  • A subsequent release will remove high-level HIPCC Perl scripts from hipcc and hipconfig. This release will remove the HIP_USE_PERL_SCRIPTS environment variable. It will rename hipcc.bin and hipconfig.bin to hipcc and hipconfig respectively. No action is needed by the users. To revert to the previous behavior, invoke hipcc.pl and hipconfig.pl explicitly.
  • A subsequent release will remove hipcc.pl and hipconfig.pl for HIPCC.

ROCm 6.1.1 Release

08 May 22:49

ROCm 6.1.1 release notes

ROCm™ 6.1.1 introduces minor fixes and improvements to some tools and libraries.

OS support

ROCm 6.1.1 has been tested against a pre-release version of Ubuntu 22.04.5 (kernel 6.8).

AMD SMI

AMD SMI for ROCm 6.1.1

Additions

  • Added deferred error correctable counts to amd-smi metric --ecc --ecc-blocks.

Changes

  • Updated the output of amd-smi metric --ecc-blocks to show counters available from blocks.
  • Updated the output of amd-smi metric --clock to reflect each engine.
  • Updated the output of amd-smi topology --json to align with output reported by host and guest systems.

Fixes

  • Fixed amd-smi metric --clock's clock lock and deep sleep status.
  • Fixed an issue that would cause an error when resetting non-AMD GPUs.
  • Fixed amd-smi metric --pcie and amdsmi_get_pcie_info() when using RDNA3 (Navi 32 and Navi 31) hardware to prevent "UNKNOWN" reports.
  • Fixed the output results of amd-smi process when getting processes running on a device.

Removals

  • Removed the amdsmi_get_gpu_process_info API from the Python library. It was removed from the C library in an earlier release.

Known issues

  • amd-smi bad-pages can result in a ValueError: Null pointer access error when using some PMU firmware versions.

See the [detailed changelog](/~https://github.com/ROCm/amdsmi/blob/docs/6.1.1/CHANGELOG.md) with code samples for more information.

HIPCC

HIPCC for ROCm 6.1.1

Changes

  • Upcoming: a future release will enable use of compiled binaries hipcc.bin and hipconfig.bin by default. No action is needed by users. You can continue calling high-level Perl scripts hipcc and hipconfig. hipcc.bin and hipconfig.bin will be invoked by the high-level Perl scripts. To revert to the previous behavior and invoke hipcc.pl and hipconfig.pl, set the HIP_USE_PERL_SCRIPTS environment variable to 1.
  • Upcoming: a subsequent release will remove high-level Perl scripts hipcc and hipconfig. This release will remove the HIP_USE_PERL_SCRIPTS environment variable. It will rename hipcc.bin and hipconfig.bin to hipcc and hipconfig respectively. No action is needed by the users. To revert to the previous behavior, invoke hipcc.pl and hipconfig.pl explicitly.
  • Upcoming: a subsequent release will remove hipcc.pl and hipconfig.pl.

HIPIFY

HIPIFY for ROCm 6.1.1

Additions

  • Added support for LLVM 18.1.2.
  • Added support for cuDNN 9.0.0.
  • Added a new option: --clang-resource-directory to specify the clang resource path (the path to the parent folder for the include folder that contains __clang_cuda_runtime_wrapper.h and other header files used during the hipification process).

ROCm SMI

ROCm SMI for ROCm 6.1.1

Known issues

  • ROCm SMI reports GPU utilization incorrectly for RDNA3 GPUs in some situations.

hipBLASLt

hipBLASLt 0.7.0 for ROCm 6.1.1

Additions

  • Added hipblasltExtSoftmax extension API.
  • Added hipblasltExtLayerNorm extension API.
  • Added hipblasltExtAMax extension API.
  • Added GemmTuning extension parameter to set split-k by user.
  • Added support for mixed-precision datatypes: FP16/FP8 in with FP16 out.

Deprecations

  • Upcoming: algoGetHeuristic() ext API for GroupGemm will be deprecated in a future release of hipBLASLt.

hipSOLVER

hipSOLVER 2.1.1 for ROCm 6.1.1

Changes

  • By default, BUILD_WITH_SPARSE is now set to OFF on Microsoft Windows.

Fixes

  • Fixed benchmark client build when BUILD_WITH_SPARSE is OFF.

rocFFT

rocFFT 1.0.27 for ROCm 6.1.1

Additions

  • Enabled multi-GPU testing on systems without direct GPU interconnects.

Fixes

  • Fixed a kernel launch failure when executing very large odd-length real-complex transforms.

ROCm 6.1.0 Release

16 Apr 22:03

ROCm 6.1 release highlights

The ROCm™ 6.1 release consists of new features and fixes to improve the stability and
performance of AMD Instinct™ MI300 GPU applications. Notably, we've added:

  • Full support for Ubuntu 22.04.4.

  • rocDecode, a new ROCm component that provides high-performance video decode support for
    AMD GPUs. With rocDecode, you can decode compressed video streams while keeping the resulting
    YUV frames in video memory. With decoded frames in video memory, you can run video
    post-processing using ROCm HIP, avoiding unnecessary data copies via the PCIe bus.

    To learn more, refer to the rocDecode
    documentation.

OS and GPU support changes

ROCm 6.1 adds the following operating system support:

  • MI300A: Ubuntu 22.04.4 and RHEL 9.3
  • MI300X: Ubuntu 22.04.4

Future releases will add additional operating systems to match the general offering. For older
generations of supported AMD Instinct products, we’ve added Ubuntu 22.04.4 support.

To view the complete list of supported GPUs and operating systems, refer to the system requirements
page for
[Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html)
and
[Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html).

Installation packages

This release includes a new set of packages for every module (all libraries and binaries default to
DT_RPATH). Package names have the suffix rpath; for example, the rpath variant of rocminfo is
rocminfo-rpath.

The new `rpath` packages will conflict with the default packages; they are meant to be used only in
environments where legacy `DT_RPATH` is the preferred form of linking (instead of `DT_RUNPATH`). We
do **not** recommend installing both sets of packages.

ROCm components

The following sections highlight select component-specific changes. For additional details, refer to the
Changelog.

AMD System Management Interface (SMI) Tool

  • New monitor command for GPU metrics.
    Use the monitor command to customize, capture, collect, and observe GPU metrics on
    target devices.

  • Integration with E-SMI.
    The EPYC™ System Management Interface In-band Library is a Linux C-library that provides in-band
    user space software APIs to monitor and control your CPU’s power, energy, performance, and other
    system management functionality. This integration enables access to CPU metrics and telemetry
    through the AMD SMI API and CLI tools.

Composable Kernel (CK)

  • New architecture support.
    CK now supports the following architectures to enable efficient image denoising on AMD GPUs:
    gfx1030, gfx1100, gfx1031, gfx1101, gfx1032, gfx1102, gfx1034, gfx1103, gfx1035, gfx1036

  • FP8 rounding logic is replaced with stochastic rounding.
    Stochastic rounding mimics a more realistic data behavior and improves model convergence.

HIP

  • New environment variable to enable kernel run serialization.
    The default HIP_LAUNCH_BLOCKING value is 0 (disabled), which causes kernels to run as defined in
    the queue. When set to 1 (enabled), the HIP runtime serializes the kernel queue, which behaves the
    same as AMD_SERIALIZE_KERNEL (as sketched below).
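
A minimal sketch of the serialized-launch behavior (not from the release notes). It assumes the runtime reads HIP_LAUNCH_BLOCKING when it initializes, so the variable is set in-process before the first HIP call; in practice you would usually export it in the shell instead.

```cpp
#include <hip/hip_runtime.h>
#include <cstdlib>
#include <cstdio>

__global__ void busy(float* x) {
    for (int i = 0; i < 1000; ++i) x[0] += 1.0f;  // arbitrary work
}

int main() {
    // Assumption: setting the variable before the first HIP API call is early
    // enough for the runtime to honor it. The usual approach is simply:
    //   HIP_LAUNCH_BLOCKING=1 ./app
    setenv("HIP_LAUNCH_BLOCKING", "1", 1);

    float* d = nullptr;
    hipMalloc(&d, sizeof(float));

    hipLaunchKernelGGL(busy, dim3(1), dim3(1), 0, 0, d);
    // With HIP_LAUNCH_BLOCKING=1 the launch above behaves synchronously
    // (the same effect as AMD_SERIALIZE_KERNEL), so the kernel has already
    // finished here even without an explicit hipDeviceSynchronize().
    printf("launch status: %d\n", static_cast<int>(hipGetLastError()));

    hipFree(d);
    return 0;
}
```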

hipBLASLt

  • New GemmTuning extension parameter.
    GemmTuning allows you to set a split-k value for each solution, which is more practical for
    performance tuning.

hipFFT

  • New multi-GPU support for single-process transforms.
    Multiple GPUs can be used to perform a transform in a single process. Note that this initial
    implementation is a functional preview.

HIPIFY

  • Skipped code blocks: Code blocks that are skipped by the preprocessor are no longer hipified under the
    --default-preprocessor option. To hipify everything, despite conditional preprocessor directives
    (#if, #ifdef, #ifndef, #elif, or #else), don't use the --default-preprocessor or --amap options.

hipSPARSELt

  • Structured sparsity matrix support extensions.
    Structured sparsity matrices help speed up deep-learning workloads. We now support B as the
    sparse matrix and A as the dense matrix in Sparse Matrix-Matrix Multiplication (SPMM). Prior to this
    release, only sparse (matrix A) x dense (matrix B) matrix multiplication was supported.

hipTensor

  • 4D tensor permutation and contraction support.
    You can now perform tensor permutation on 4D tensors and 4D contractions for F16, BF16, and
    Complex F32/F64 datatypes.

MIGraphX

  • Improved performance for transformer-based models.
    We added support for FlashAttention, which benefits models like BERT, GPT, and Stable Diffusion.

  • New Torch-MIGraphX driver.
    This driver calls MIGraphX directly from PyTorch. It provides an mgx_module object that you can
    invoke like any other Torch module, but which utilizes the MIGraphX inference engine internally.
    Torch-MIGraphX supports FP32, FP16, and INT8 datatypes.

    • FP8 support. We now offer functional support for inference in the FP8E4M3FNUZ datatype. You
      can load an ONNX model in FP8E4M3FNUZ using C++ or Python APIs, or migraphx-driver.
      You can quantize a floating point model to FP8 format by using the --fp8 flag with migraphx-driver.
      To accelerate inference, MIGraphX uses hardware acceleration on MI300 for FP8 by leveraging FP8
      support in various backend kernel libraries.

MIOpen

  • Improved performance for inference and convolutions.
    Inference support is now provided for Find 2.0 fusion plans. Additionally, we've enhanced the Number of
    samples, Height, Width, and Channels (NHWC) convolution kernels for heuristics. NHWC stores data
    with the number of samples first, followed by the height, width, and channel dimensions.

OpenMP

  • Implicit Zero-copy is triggered automatically in XNACK-enabled MI300A systems.
    Implicit Zero-copy behavior in non-unified_shared_memory programs is triggered automatically in
    XNACK-enabled MI300A systems (for example, when using the HSA_XNACK=1 environment
    variable). OpenMP supports the 'requires unified_shared_memory' directive for programs
    that don't want to copy data explicitly between the CPU and GPU. However, this requires that you add
    these directives to every translation unit of the program (see the sketch after this list).

  • New MI300 FP atomics. Application performance can now improve by leveraging fast floating-point atomics on MI300 (gfx942).
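
The two items above can be combined in a small OpenMP offload sketch (illustrative only, not from the release notes). It assumes an XNACK-enabled MI300A system (HSA_XNACK=1) and an OpenMP offload compiler such as amdclang++ with -fopenmp --offload-arch=gfx942.

```cpp
#include <cstdio>

// With unified shared memory, ordinary host allocations are directly
// accessible from target regions, so no explicit map copies are needed.
#pragma omp requires unified_shared_memory

int main() {
    const int n = 1 << 20;
    float* x = new float[n];
    float* sum = new float(0.0f);
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    // The floating-point atomic update below can be lowered to the fast
    // hardware FP atomics available on MI300 (gfx942).
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i) {
        #pragma omp atomic update
        *sum += x[i];
    }

    printf("sum = %.0f (expected %d)\n", *sum, n);
    delete[] x;
    delete sum;
    return 0;
}
```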

RCCL

  • NCCL 2.18.6 compatibility.
    RCCL is now compatible with NCCL 2.18.6, which includes increasing the maximum IB network interfaces to 32 and fixing network device ordering when creating communicators with only one GPU
    per node.

  • Doubled simultaneous communication channels.
    We improved MI300X performance by increasing the maximum number of simultaneous
    communication channels from 32 to 64.

rocALUTION

  • New multiple node and GPU support.
    Unsmoothed and smoothed aggregations and Ruge-Stueben AMG now work with multiple nodes
    and GPUs. For more information, refer to the
    API documentation.

rocDecode

  • New ROCm component.
    rocDecode is ROCm's newest component, providing high-performance video decode support for AMD
    GPUs. To learn more, refer to the documentation.

ROCm Compiler

  • Combined projects. ROCm Device-Libs, ROCm Compiler Support, and hipCC are now located in
    the llvm-project/amd subdirectory of AMD's fork of the LLVM project. Previously, these projects
    were maintained in separate repositories. Note that the projects themselves will continue to be
    packaged separately.

  • Split the 'rocm-llvm' package. This package has been split into a required and an optional package:

    • rocm-llvm (required): A package containing the essential binaries needed for compilation.

    • rocm-llvm-dev (optional): A package containing binaries for compiler and application developers.

ROCm Data Center Tool (RDC)

  • C++ upgrades.
    RDC was upgraded from C++11 to C++17 to enable a more modern C++ standard when writing RDC plugins.

ROCm Performance Primitives (RPP)

  • New backend support.
    Audio processing support was added for the HOST backend, and 3D Voxel kernel support was added
    for the HOST and HIP backends.

ROCm Validation Suite

  • New datatype support.
    Added support for the BF16 and FP8 datatypes, based on General Matrix Multiply (GEMM) operations, in the GPU Stress Test (GST) module. This provides additional performance benchmarking and stress testing based on the newly supported datatypes.

rocSOLVER

  • New EigenSolver routine.
    Based on the Jacobi algorithm, a new EigenSolver routine was added to the library. This routine computes the eigenvalues and eigenvectors of a matrix with improved performance.

ROCTracer

  • New versioning and callback enhancements.
    Improved to match versioning changes in HIP Runtime and supports runtime API callbacks and activity record logging. The APIs of different runtimes at different levels are considered different API domains with assigned domain IDs.

Upcoming changes

  • ROCm SMI will be deprecated in a future release. We advise migrating to AMD SMI now to
    prevent future workflow disruptions.

  • hipCC supports, by default, the following compiler invocation flags:

    • -mllvm -amdgpu-early-inline-all=true
    • -mllvm -amdgpu-function-calls=false

    ...


ROCm 6.0.2 Release

31 Jan 23:29

ROCm 6.0.2 is a point release with minor bug fixes to improve the stability of MI300 GPU applications. These include fixes in the rocSPARSE library. Several new driver features are introduced for system qualification on our partner server offerings.

hipFFT

Changes

  • Removed the Git submodule for shared files between rocFFT and hipFFT; the files are now copied over
    instead (this should help simplify downstream builds and packaging).