Releases: ggml-org/llama.cpp
b4793
ggml : upgrade init_tensor API to return a ggml_status (#11854)
* Upgrade init_tensor API to return a ggml_status
  To prepare for an 'abort-free' ggml (one that returns an OOM status instead of aborting on OOM), as agreed with Diego in the ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status.
* misc fixes
---------
Co-authored-by: slaren <slarengh@gmail.com>
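The change from an aborting initializer to a status-returning one can be sketched roughly as follows. This is a minimal illustration, not ggml code: `demo_status`, `demo_tensor`, `demo_init_tensor`, `demo_init_all`, and the `limit` byte budget are all hypothetical names invented for the example, and the real `ggml_status` values and `init_tensor()` signature should be taken from the ggml headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes, for illustration only; this is not the
 * ggml_status definition from the ggml headers. */
enum demo_status {
    DEMO_STATUS_ALLOC_FAILED = -2,
    DEMO_STATUS_FAILED       = -1,
    DEMO_STATUS_SUCCESS      =  0
};

/* Hypothetical tensor with a side buffer owned by a backend. */
struct demo_tensor {
    size_t nbytes;
    void  *extra;
};

/* Status-returning initializer: reports failure instead of aborting.
 * `limit` is an assumed byte budget, used here to simulate an OOM. */
static enum demo_status demo_init_tensor(struct demo_tensor *t, size_t nbytes, size_t limit) {
    if (t == NULL) {
        return DEMO_STATUS_FAILED;
    }
    t->nbytes = 0;
    t->extra  = NULL;
    if (nbytes > limit) {
        return DEMO_STATUS_ALLOC_FAILED;
    }
    t->extra = malloc(nbytes > 0 ? nbytes : 1);
    if (t->extra == NULL) {
        return DEMO_STATUS_ALLOC_FAILED;
    }
    t->nbytes = nbytes;
    return DEMO_STATUS_SUCCESS;
}

/* Initializes n tensors and propagates the first failure to the caller,
 * releasing whatever was already set up. */
static enum demo_status demo_init_all(struct demo_tensor *ts, size_t n, size_t nbytes, size_t limit) {
    for (size_t i = 0; i < n; i++) {
        enum demo_status st = demo_init_tensor(&ts[i], nbytes, limit);
        if (st != DEMO_STATUS_SUCCESS) {
            for (size_t j = 0; j < i; j++) {
                free(ts[j].extra);
                ts[j].extra  = NULL;
                ts[j].nbytes = 0;
            }
            return st;
        }
    }
    return DEMO_STATUS_SUCCESS;
}

/* Usage: an over-budget request yields a status the caller can handle. */
static void demo_usage(void) {
    struct demo_tensor ts[2];
    assert(demo_init_all(ts, 2, 64, 1024) == DEMO_STATUS_SUCCESS);
    free(ts[0].extra);
    free(ts[1].extra);
    assert(demo_init_all(ts, 2, 4096, 1024) == DEMO_STATUS_ALLOC_FAILED);
}
```

Returning a status lets each layer decide whether to retry, fall back, or report the error, which is the general idea behind an abort-free library; how ggml itself handles the returned status is not shown here.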
b4792
llama : add Phi-4-mini support (supersede #12099) (#12108)
* Added Phi-4-mini-instruct support
* Update regex per ngxson
* Change the vocab base to Xenova/gpt-4o
* fix conversion update script
* no need to check longrope
* minor style fix
* fix python style
---------
Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>
b4790
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizatio…
b4789
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (#12098)
b4788
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (#12064)
* Added SVE Support for Q2_K Quantized Models
* Use 4-space indentation in the switch cases
* Removed comment lines
* Remove the loop; retain the curly braces for better understanding of the code
* Remove the comment like added for q3_k_q8_k kernel
---------
Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
b4786
vulkan: matmul dequantization improvements (#12015)
* faster dequant for old quants
* don't use unpack for iq4_nl
* vec2 unpack for q8
b4785
vulkan: improve im2col (#11826)
* vulkan: improve im2col performance
b4784
cmake: Fix ggml backend dependencies and installation (#11818)
* Fix dependencies between ggml and backends
  ggml backends link only to ggml-base and ggml links to all backends.
* Fix installation of ggml backends
  Set up GNUInstallDirs before setting the installation directory of ggml backends.
b4783
llava : add struct for FFI bindgen (#12079)
* add struct for FFI bindgen
* Apply suggestions from code review
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
b4778
vulkan: fix assertion when qy_needs_dequant (#12068)
Looks like a copy/paste bug from qx_needs_dequant.