Speed test

| Quant Scheme | Observer | QuantizationModifier | GPTQModifier |
| --- | --- | --- | --- |
| fp8_dynamic_per_token | MinMax | 0.753–0.754 | |
| | MSE | 0.759–0.760 | |
| fp8_static_per_tensor | MinMax | 0.757–0.758 | |
| | MSE | 0.767–0.770 | |
| int8_w8a8_dynamic_per_token | MinMax | 0.760–0.761 | 0.769–0.771 |
| | MSE | 0.770–0.772 | 0.767–0.767 |
| w4a16_actorder_group | MinMax | | 0.726–0.726 |
| | MSE | | 0.712–0.712 |
| w4a16_actorder_weights | MinMax | | 0.721–0.722 |
| | MSE | | 0.717–0.720 |
| w4a16_grouped_quant | MinMax | 0.666–0.671 | 0.717–0.718 |
| | MSE | 0.657–0.659 | 0.723–0.724 |

AWQ results
MinMax:

| Task | Version | Filter | n-shot | Metric | Value |
| --- | --- | --- | --- | --- | --- |
| wikitext | 2 | none | 5 | bits_per_byte | 0.6291 |
| | | | 5 | byte_perplexity | 1.5466 |
| | | | 5 | word_perplexity | 10.2949 |

MSE:

| Task | Version | Filter | n-shot | Metric | Value |
| --- | --- | --- | --- | --- | --- |
| wikitext | 2 | none | 5 | bits_per_byte | 0.6323 |
| | | none | 5 | byte_perplexity | 1.5500 |
| | | none | 5 | word_perplexity | 10.4192 |

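
As a quick sanity check on the two wikitext tables above, `bits_per_byte` is the base-2 logarithm of `byte_perplexity`, so one can be derived from the other. The sketch below copies the reported values into a small dictionary and recomputes the perplexity. The helper name `byte_perplexity_from_bits` is hypothetical and not part of any library.

```python
# Values copied from the MinMax and MSE wikitext tables above.
results = {
    "MinMax": {"bits_per_byte": 0.6291, "byte_perplexity": 1.5466},
    "MSE": {"bits_per_byte": 0.6323, "byte_perplexity": 1.5500},
}


def byte_perplexity_from_bits(bits_per_byte: float) -> float:
    """Hypothetical helper: convert bits per byte into per-byte perplexity (2 ** bits)."""
    return 2.0 ** bits_per_byte


for observer, metrics in results.items():
    derived = byte_perplexity_from_bits(metrics["bits_per_byte"])
    reported = metrics["byte_perplexity"]
    # Both columns are rounded to four decimals, so expect agreement only to about 1e-3.
    print(f"{observer}: reported={reported:.4f} derived={derived:.4f}")
```

If the derived and reported values disagree by more than rounding error, the two rows likely came from different evaluation runs.
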
MSE Observer (0.2 max shrink)

| Quant Scheme | Observer | QuantizationModifier | GPTQModifier |
| --- | --- | --- | --- |
| fp8_dynamic_per_token | MinMax | 0.753–0.754 | |
| | MSE | 0.759–0.760 | |
| fp8_static_per_tensor | MinMax | 0.757–0.758 | |
| | MSE | 0.770–0.770 | |
| int8_w8a8_dynamic_per_token | MinMax | 0.760–0.761 | 0.769–0.771 |
| | MSE | 0.764–0.767 | |
| vl_fp8_dynamic_per_token | MSE | | 0.833 |
| vl_w4a16_actorder_weight | MSE | | 0.867 |
| w4a16_actorder_group | MinMax | | 0.726–0.726 |
| | MSE | | 0.731–0.731 |
| w4a16_actorder_weights | MinMax | | 0.721–0.722 |
| | MSE | | 0.724–0.726 |
| w4a16_grouped_quant | MinMax | 0.666–0.671 | 0.717–0.718 |
| | MSE | | 0.726–0.727 |

Time Sheets
meta-llama/Meta-Llama-3-8B-Instruct
MinMax:

| Step | Time (seconds) |
| --- | --- |
| _load_model_and_processor | 5.772182941436768 |
| _calibrate | 251.95170068740845 |
| _run_oneshot | 252.93479776382446 |
| _save_compressed_model | 41.454792976379395 |
| _handle_recipe | 0.002226591110229492 |
| _run_lm_eval | 1196.4064140319824 |

MSE: