[Feature request] Implement 8-bit GPT-J #5
Labels: enhancement (New feature or request)
Comments
Results in ~11 GB of weights vs. 16 GB; this is already implemented in PyTorch via load_in_8bit=True:
https://huggingface.co/hivemind/gpt-j-6B-8bit
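For reference, a minimal sketch of what the existing PyTorch path looks like, assuming Hugging Face transformers with bitsandbytes installed and a CUDA GPU; the model id, prompt, and generation settings below are illustrative, not taken from the linked repo:

```python
# Sketch: load GPT-J-6B with int8 weights via transformers' load_in_8bit path.
# Requires: pip install transformers accelerate bitsandbytes (and a CUDA GPU).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # illustrative; the issue links a pre-converted 8-bit variant
tokenizer = AutoTokenizer.from_pretrained(model_id)

# load_in_8bit=True quantizes the linear-layer weights to int8 at load time,
# which is where the ~11 GB vs. ~16 GB weight-size difference cited above comes from.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("GPT-J in 8-bit is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The feature request is to support an equivalent 8-bit weight format on the ggml side rather than relying on the PyTorch/bitsandbytes path.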