
What is quantisizing mean? : r/LocalLLaMA - Reddit
May 18, 2023 · Activation Quantization: Activation quantization is a bit more involved, as it requires modifying the model to include the quantization and dequantization of activations during inference. …
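Here is a minimal Python sketch of dynamic int8 activation quantization that shows the extra inference-time steps. It assumes symmetric per-tensor scales (absmax / 127) and weights that were quantized ahead of time. The helper names (`quantize_int8`, `quantize_matrix`, `int8_linear`) are hypothetical and do not come from any library. Real frameworks fuse these steps into integer kernels.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8: returns integer codes in [-127, 127] and a scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def quantize_matrix(rows):
    """Quantize a weight matrix offline with one scale for the whole tensor."""
    flat = [v for row in rows for v in row]
    codes, scale = quantize_int8(flat)
    width = len(rows[0])
    return [codes[i:i + width] for i in range(0, len(codes), width)], scale


def int8_linear(w_codes, w_scale, x):
    """Hypothetical linear layer with activation quantization in the forward pass."""
    # Runtime step: quantize the incoming activations with a fresh scale.
    x_codes, x_scale = quantize_int8(x)
    out = []
    for row in w_codes:
        # Integer multiply-accumulate, as an int8 kernel would do.
        acc = sum(wc * xc for wc, xc in zip(row, x_codes))
        # Dequantize: rescale the integer sum back to floating point.
        out.append(acc * w_scale * x_scale)
    return out


w_codes, w_scale = quantize_matrix([[0.5, -1.0], [2.0, 0.25]])
print(int8_linear(w_codes, w_scale, [1.0, -2.0]))  # approximately [2.5, 1.5]
```

The weight scale is fixed once, offline. The activation scale has to be recomputed on every call, because activations depend on the input. That per-call work is the model modification the snippet refers to.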
Quantization: How Much Quality is Lost? : r/LocalLLaMA - Reddit
Jul 19, 2023 · Yeehaw, y'all 🤠 I've been pondering a lot about quantization and its impact on large language models (LLMs). As you all may know, quantization techniques like 4-bit and 8-bit …
Does quantization harm results? : r/LocalLLaMA - Reddit
Jul 20, 2023 · The point is that quantization enables huge memory savings for apparently small losses in performance, so what seems to be the best approach (as shown by Dettmers) is the following: Use 4 …
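The memory savings come from simple arithmetic: weight storage is roughly parameter count times bits per weight. The Python sketch below is a rough estimate only. It assumes a hypothetical 7B-parameter model and counts only the weights, leaving out the KV cache, activations, and per-block scale overhead. Results are in decimal gigabytes.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n = 7e9  # assumed example: a 7B-parameter model
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {weight_memory_gb(n, bits):.1f} GB")
    # 16-bit: 14.0 GB
    # 8-bit: 7.0 GB
    # 4-bit: 3.5 GB
```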
Higher quantization vs larger models - how do they compare ... - Reddit
Dec 19, 2023 · Basically wondering how a smaller model at a higher-bit quantization, like a 13b q6 or q8, fares against a larger model at a lower-bit quantization, like a 30b q2. Does this change as …
How much does Quantization actually impact models? - Reddit
Nov 22, 2023 · So, it was bothering me a bit that the only metric people really had to understand the 'loss' of quantization objectively was…
A comparative look at (GGML) quantization and parameter size - Reddit
May 18, 2023 · Preamble/credits: Based on the llama.cpp repo README section on quantization. Looking at that, it's a little hard to assess how different levels of quantization actually affect the …
The difference between quantization methods for the same bits
Jul 25, 2023 · Speed will be closely related to the model file size. Smaller model file, faster inference, usually lower accuracy. With the older quantisation method, 4_0 is 4.5 bits per weight and 4_1 is 5 …
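The 4.5 and 5 bits-per-weight figures can be reproduced under one assumption. Each block holds 32 weights stored as 4-bit codes plus some per-block metadata. For 4_0 that metadata is one fp16 scale. For 4_1 it is an fp16 scale plus an fp16 minimum.

The Python sketch below computes that overhead. It also shows a simplified symmetric (4_0-like) block quantizer and an asymmetric (4_1-like) one. The function names are hypothetical, and llama.cpp's own kernels differ in detail, for example in how the scale is chosen.

```python
BLOCK = 32  # assumed block size: weights per block


def bits_per_weight(code_bits: int, overhead_bits: int, block: int = BLOCK) -> float:
    """Average storage per weight: packed codes plus per-block metadata."""
    return (block * code_bits + overhead_bits) / block


def quantize_q4_0_like(block):
    """Symmetric 4-bit: one scale per block, codes stored as 0..15 (offset by 8)."""
    max_abs = max(abs(v) for v in block)
    d = max_abs / 7 if max_abs > 0 else 1.0
    return d, [min(15, max(0, round(v / d) + 8)) for v in block]


def dequantize_q4_0_like(d, codes):
    return [(q - 8) * d for q in codes]


def quantize_q4_1_like(block):
    """Asymmetric 4-bit: a scale and a minimum per block, codes 0..15."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15 if hi > lo else 1.0
    return d, lo, [min(15, max(0, round((v - lo) / d))) for v in block]


def dequantize_q4_1_like(d, lo, codes):
    return [lo + q * d for q in codes]


print(bits_per_weight(4, 16))       # 4.5  (4-bit codes + fp16 scale)
print(bits_per_weight(4, 16 + 16))  # 5.0  (4-bit codes + fp16 scale + fp16 min)
```

The extra fp16 minimum costs half a bit per weight. In exchange, the 4_1-style variant can spend all 16 code levels on blocks whose values are not centred on zero.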
Yet another state of the art in LLM quantization - Reddit
The 2-2.5 bit quantization allows running 70B models on an RTX 3090 or Mixtral-like models on a 4060 with significantly lower accuracy loss; notably, better than QuIP# and 3-bit GPTQ. We provide a set …
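Here is a quick back-of-the-envelope check of why 2-2.5 bits per weight matters for the 24 GB RTX 3090. The Python sketch below counts only the weights of a nominal 70B-parameter model, in decimal GB. It ignores the KV cache, activations, and runtime overhead, so treat the numbers as a lower bound on real memory use.

```python
RTX_3090_VRAM_GB = 24  # the RTX 3090 ships with 24 GB of VRAM


def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


for bpw in (2.0, 2.5, 4.0):
    gb = weights_gb(70e9, bpw)
    verdict = "fits" if gb < RTX_3090_VRAM_GB else "does not fit"
    print(f"70B @ {bpw} bpw: {gb:.1f} GB, {verdict} in 24 GB (weights only)")
# 70B @ 2.0 bpw: 17.5 GB, fits in 24 GB (weights only)
# 70B @ 2.5 bpw: 21.9 GB, fits in 24 GB (weights only)
# 70B @ 4.0 bpw: 35.0 GB, does not fit in 24 GB (weights only)
```

At 4 bits the weights alone exceed the card's memory. At 2-2.5 bits they leave roughly 2 to 6.5 GB for the KV cache and everything else.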
Quantization vs model size : r/LocalLLaMA - Reddit
Jun 21, 2024 · Quantization is a crucial technique for reducing the size and computational requirements of large language models (LLMs) while maintaining their performance. Here is a comprehensive …
Question about LLM quantization : r/LocalLLaMA - Reddit
Jul 22, 2023 · There are a lot of 7B models to try. I already know which model I will use; however, there are a lot of quantization versions. Since my system has 8 GB of RAM, 4-bit is a viable option for me. …