FEAT: Support torchao (huggingface#2062)
Supports torchao quantization. Currently supported (see the example below):

- int8_weight_only
- int8_dynamic_activation_int8_weight
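
A minimal sketch of loading a base model with the second supported quant type through transformers' `TorchAoConfig`; the model id is a placeholder, not taken from this commit:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TorchAoConfig

# Placeholder model id; any causal LM with linear layers should work.
model_id = "facebook/opt-125m"

# int8_dynamic_activation_int8_weight stores weights in int8 and
# dynamically quantizes activations at runtime.
quantization_config = TorchAoConfig(quant_type="int8_dynamic_activation_int8_weight")
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)

# Attach a LoRA adapter on top of the quantized base model.
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))
```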

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
BenjaminBossan and SunMarc committed Oct 22, 2024
1 parent 1d55d8b commit 98cf284
Showing 11 changed files with 1,642 additions and 4 deletions.
1 change: 1 addition & 0 deletions docker/peft-gpu/Dockerfile
@@ -62,6 +62,7 @@ RUN source activate peft && \
    librosa \
    "soundfile>=0.12.1" \
    scipy \
+   torchao \
    git+/~https://github.com/huggingface/transformers \
    git+/~https://github.com/huggingface/accelerate \
    peft[test]@git+/~https://github.com/huggingface/peft
24 changes: 24 additions & 0 deletions docs/source/developer_guides/quantization.md
@@ -187,6 +187,30 @@ peft_config = LoraConfig(...)
quantized_model = get_peft_model(quantized_model, peft_config)
```

## torchao (PyTorch Architecture Optimization)

PEFT supports models quantized with [torchao](/~https://github.com/pytorch/ao) ("ao") for int8 quantization.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TorchAoConfig

model_id = ...
quantization_config = TorchAoConfig(quant_type="int8_weight_only")
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
peft_config = LoraConfig(...)
model = get_peft_model(base_model, peft_config)
```

### Caveats

- Use the most recent versions of torchao (>= v0.4.0) and transformers (> 4.42).
- Only linear layers are currently supported.
- `quant_type = "int4_weight_only"` is currently not supported.
- `NF4` is not yet implemented in transformers and is therefore not supported either.
- DoRA only works with `quant_type = "int8_weight_only"` at the moment.
- There is explicit support for torchao when used with LoRA. However, when torchao quantizes a layer, its class does not change, only the type of the underlying tensor. For this reason, PEFT methods other than LoRA will generally also work with torchao, even if not explicitly supported. Be aware, however, that **merging only works correctly with LoRA and with `quant_type = "int8_weight_only"`**. If you use a different PEFT method or dtype, merging will likely raise an error, and even if it doesn't, the results will be incorrect (see the sketch below).
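
To make the last point concrete, here is a minimal sketch of the one supported merging path, LoRA with `int8_weight_only`; the model id and training step are placeholders, not taken from this commit:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TorchAoConfig

# Placeholder model id.
model_id = "facebook/opt-125m"
quantization_config = TorchAoConfig(quant_type="int8_weight_only")
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# ... train the LoRA adapter here ...

# Merging the adapter back into the quantized weights is only expected to
# work for LoRA with int8_weight_only; other PEFT methods or quant types
# may raise an error or silently produce incorrect results.
merged_model = model.merge_and_unload()
```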

## Other Supported PEFT Methods

Besides LoRA, the following PEFT methods also support quantization: