Releases · ggml-org/llama.cpp

28 Feb 14:35

70680c4

b4793 Latest

ggml : upgrade init_tensor API to return a ggml_status (#11854)

* Upgrade init_tensor API to return a ggml_status

To prepare for an 'abort-free' ggml
(ggml should not abort on OOMs but return an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.

* misc fixes

---------

Co-authored-by: slaren <slarengh@gmail.com>
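
The 'abort-free' idea described above can be illustrated with a minimal C sketch: an initializer that returns a status code on allocation failure instead of aborting the process. Everything named `sketch_*` or `SKETCH_*` here is hypothetical and invented for illustration; it is not the ggml API, and the real `init_tensor()` signature is not reproduced.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch (not the real ggml_status). */
enum sketch_status {
    SKETCH_STATUS_ALLOC_FAILED = -2,
    SKETCH_STATUS_FAILED       = -1,
    SKETCH_STATUS_SUCCESS      =  0
};

/* Hypothetical tensor type: a data buffer and its size in bytes. */
struct sketch_tensor {
    void  *data;
    size_t size;
};

/* Initialize a tensor buffer; report failure to the caller instead of aborting. */
enum sketch_status sketch_init_tensor(struct sketch_tensor *t, size_t size) {
    if (t == NULL || size == 0) {
        return SKETCH_STATUS_FAILED;        /* invalid arguments */
    }
    t->data = malloc(size);
    if (t->data == NULL) {
        t->size = 0;
        return SKETCH_STATUS_ALLOC_FAILED;  /* out of memory: caller decides what to do */
    }
    t->size = size;
    return SKETCH_STATUS_SUCCESS;
}

/* Release the buffer and reset the fields so a double free is harmless. */
void sketch_free_tensor(struct sketch_tensor *t) {
    if (t != NULL) {
        free(t->data);
        t->data = NULL;
        t->size = 0;
    }
}
```

A caller would check the returned status and recover (for example by freeing other buffers or reporting the error upward) rather than relying on the library to terminate the program.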

Assets 25

cudart-llama-bin-win-cu11.7-x64.zip    303 MB     2025-02-28T14:35:46Z
cudart-llama-bin-win-cu12.4-x64.zip    373 MB     2025-02-28T14:35:53Z
llama-b4793-bin-macos-arm64.zip        23.3 MB    2025-02-28T14:36:04Z
llama-b4793-bin-macos-x64.zip          24.9 MB    2025-02-28T14:36:05Z
llama-b4793-bin-ubuntu-arm64.zip       25.4 MB    2025-02-28T14:36:06Z
llama-b4793-bin-ubuntu-vulkan-x64.zip  30.8 MB    2025-02-28T14:36:07Z
llama-b4793-bin-ubuntu-x64.zip         26.9 MB    2025-02-28T14:36:09Z
llama-b4793-bin-win-avx-x64.zip        16.4 MB    2025-02-28T14:36:10Z
llama-b4793-bin-win-avx2-x64.zip       16.4 MB    2025-02-28T14:36:11Z
llama-b4793-bin-win-avx512-x64.zip     16.4 MB    2025-02-28T14:36:12Z
Source code (zip)                                 2025-02-28T13:41:47Z
Source code (tar.gz)                              2025-02-28T13:41:47Z

28 Feb 12:25

github-actions

b4792

c43a3e7

llama : add Phi-4-mini support (supersede #12099) (#12108)

* Added Phi-4-mini-instruct support

* Update regex per ngxson

* Change the vocab base to Xenova/gpt-4o

* fix conversion update script

* no need to check longrope

* minor style fix

* fix python style

---------

Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>

Assets 25

28 Feb 09:32

github-actions

b4790

438a839

vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizatio…

Assets 25

28 Feb 09:00

github-actions

b4789

9c42b17

CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (#12098)

Assets 25

28 Feb 08:36

github-actions

b4788

05e6f5a

ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (#12064)

* Added SVE Support for Q2_K Quantized Models

* Use 4-space indentation in the switch cases

* Remove comment lines

* Remove the loop; retain the curly braces for better understanding of the code

* Remove the comment line added for the q3_k_q8_k kernel

---------

Co-authored-by: vithulep <p.m.vithule1517@gmail.com>

Assets 25

28 Feb 08:17

github-actions

b4786

fbeda90

vulkan: matmul dequantization improvements (#12015)

* faster dequant for old quants

* don't use unpack for iq4_nl

* vec2 unpack for q8

Assets 25

28 Feb 07:57

github-actions

b4785

581650b

vulkan: improve im2col (#11826)

* vulkan: improve im2col performance

Assets 25

27 Feb 08:23

github-actions

b4784

b95c8af

cmake: Fix ggml backend dependencies and installation (#11818)

* Fix dependencies between ggml and backends

ggml backends link only to ggml-base and ggml links to all backends.

* Fix installation of ggml backends

Set up GNUInstallDirs before setting the installation directory of ggml backends

Assets 25

26 Feb 15:05

github-actions

b4783

a800ae4

llava : add struct for FFI bindgen (#12079)

* add struct for FFI bindgen

* Apply suggestions from code review

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

Assets 25

25 Feb 16:06

github-actions

b4778

a82c9e7

vulkan: fix assertion when qy_needs_dequant (#12068)

Looks like a copy/paste bug from qx_needs_dequant.

Assets 25
