Quantization Model Compression

reddit.com
https://www.reddit.com › LocalLLaMA › comments
What is quantisizing mean? : r/LocalLLaMA - Reddit
May 18, 2023 · Activation Quantization: Activation quantization is a bit more involved, as it requires modifying the model to include the quantization and dequantization of activations during inference. …
reddit.com
https://www.reddit.com › LocalLLaMA › comments
Quantization: How Much Quality is Lost? : r/LocalLLaMA - Reddit
Jul 19, 2023 · Yeehaw, y'all 🤠 I've been pondering a lot about quantization and its impact on large language models (LLMs). As you all may know, quantization techniques like 4-bit and 8-bit …
reddit.com
https://www.reddit.com › LocalLLaMA › comments
Does quantization harm results? : r/LocalLLaMA - Reddit
Jul 20, 2023 · The point is that quantization enables huge memory savings for apparently small losses in performance, so what seems to be the best approach (as shown by Dettmers) is the following: Use 4 …
reddit.com
https://www.reddit.com › LocalLLaMA › comments
Higher quantization vs larger models - how do they compare ... - Reddit
Dec 19, 2023 · Basically wondering how performance between a high quantization of a smaller model like a 13b q6 or q8 fares vs a lower quantization of a larger model like a 30b q2? Does this change as …
reddit.com
https://www.reddit.com › LocalLLaMA › comments
How much does Quantization actually impact models? - Reddit
Nov 22, 2023 · 206 votes, 62 comments. So, it was bothering me a bit that the only metric people really had to understand the 'loss' of quantization objectively was…
reddit.com
https://www.reddit.com › LocalLLaMA › comments
A comparative look at (GGML) quantization and parameter size - Reddit
May 18, 2023 · Preamble/credits: Based on the llama.cpp repo README section on quantization. Looking at that, it's a little hard to assess how different levels of quantization actually affect the …
reddit.com
https://www.reddit.com › LocalLLaMA › comments
The difference between quantization methods for the same bits
Jul 25, 2023 · Speed will be closely related to the model file size. Smaller model file, faster inference, usually lower accuracy. With the older quantisation method, 4_0 is 4.5 bits per weight and 4_1 is 5 …
reddit.com
https://www.reddit.com › LocalLLaMA › comments
Yet another state of the art in LLM quantization - Reddit
The 2-2.5 bit quantization allows running 70B models on an RTX 3090 or Mixtral-like models on 4060 with significantly lower accuracy loss - notably, better than QuIP# and 3-bit GPTQ. We provide a set …
reddit.com
https://www.reddit.com › ... › quantization_vs_model_size
Quantization vs model size : r/LocalLLaMA - Reddit
Jun 21, 2024 · Quantization is a crucial technique for reducing the size and computational requirements of large language models (LLMs) while maintaining their performance. Here is a comprehensive …
reddit.com
https://www.reddit.com › LocalLLaMA › comments
Question about LLM quantization : r/LocalLLaMA - Reddit
Jul 22, 2023 · There's a lot of 7B models to try. I already know which model I will use, however, there's a lot of quantization versions. Since my system has 8 GB of RAM, 4-bit is a viable option for me. …
