FEAT: Support torchao (huggingface#2062)
Supports torchao quantization. Currently supported (see the example below):

- int8_weight_only
- int8_dynamic_activation_int8_weight
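
A minimal sketch of loading a base model with the second supported quant type through transformers' `TorchAoConfig`; the model id is a placeholder, not taken from this commit:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TorchAoConfig

# Placeholder model id; any causal LM with linear layers should work.
model_id = "facebook/opt-125m"

# int8_dynamic_activation_int8_weight stores weights in int8 and
# dynamically quantizes activations at runtime.
quantization_config = TorchAoConfig(quant_type="int8_dynamic_activation_int8_weight")
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)

# Attach a LoRA adapter on top of the quantized base model.
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))
```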

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
BenjaminBossan and SunMarc committed Oct 22, 2024
1 parent 1d55d8b commit 98cf284
Showing 11 changed files with 1,642 additions and 4 deletions.
1 change: 1 addition & 0 deletions docker/peft-gpu/Dockerfile
@@ -62,6 +62,7 @@ RUN source activate peft && \
    librosa \
    "soundfile>=0.12.1" \
    scipy \
+   torchao \
    git+/~https://github.com/huggingface/transformers \
    git+/~https://github.com/huggingface/accelerate \
    peft[test]@git+/~https://github.com/huggingface/peft
24 changes: 24 additions & 0 deletions docs/source/developer_guides/quantization.md
@@ -187,6 +187,30 @@ peft_config = LoraConfig(...)
quantized_model = get_peft_model(quantized_model, peft_config)
```

## torchao (PyTorch Architecture Optimization)

PEFT supports models quantized with [torchao](/~https://github.com/pytorch/ao) ("ao") for int8 quantization.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TorchAoConfig

model_id = ...
quantization_config = TorchAoConfig(quant_type="int8_weight_only")
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
peft_config = LoraConfig(...)
model = get_peft_model(base_model, peft_config)
```

### Caveats

- Use the most recent versions of torchao (>= v0.4.0) and transformers (> 4.42).
- Only linear layers are currently supported.
- `quant_type = "int4_weight_only"` is currently not supported.
- `NF4` is not yet implemented in transformers and is therefore not supported either.
- DoRA only works with `quant_type = "int8_weight_only"` at the moment.
- There is explicit support for torchao when used with LoRA. However, when torchao quantizes a layer, its class does not change, only the type of the underlying tensor. For this reason, PEFT methods other than LoRA will generally also work with torchao, even if not explicitly supported. Be aware, however, that **merging only works correctly with LoRA and with `quant_type = "int8_weight_only"`**. If you use a different PEFT method or dtype, merging will likely raise an error, and even if it doesn't, the results will be incorrect (see the sketch below).
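
To make the last point concrete, here is a minimal sketch of the one supported merging path, LoRA with `int8_weight_only`; the model id and training step are placeholders, not taken from this commit:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TorchAoConfig

# Placeholder model id.
model_id = "facebook/opt-125m"
quantization_config = TorchAoConfig(quant_type="int8_weight_only")
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# ... train the LoRA adapter here ...

# Merging the adapter back into the quantized weights is only expected to
# work for LoRA with int8_weight_only; other PEFT methods or quant types
# may raise an error or silently produce incorrect results.
merged_model = model.merge_and_unload()
```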

## Other Supported PEFT Methods

Besides LoRA, the following PEFT methods also support quantization: