  1. What is quantisizing mean? : r/LocalLLaMA - Reddit

    May 18, 2023 · Activation Quantization: Activation quantization is a bit more involved, as it requires modifying the model to include the quantization and dequantization of activations during inference. …

  2. Quantization: How Much Quality is Lost? : r/LocalLLaMA - Reddit

    Jul 19, 2023 · Yeehaw, y'all 🤠 I've been pondering a lot about quantization and its impact on large language models (LLMs). As you all may know, quantization techniques like 4-bit and 8-bit …

  3. Does quantization harm results? : r/LocalLLaMA - Reddit

    Jul 20, 2023 · The point is that quantization enables huge memory savings for apparently small losses in performance, so what seems to be the best approach (as shown by Dettmers) is the following: Use 4 …

  4. Higher quantization vs larger models - how do they compare ... - Reddit

    Dec 19, 2023 · Basically wondering how performance between a high quantization of a smaller model like a 13b q6 or q8 fares vs a lower quantization of a larger model like a 30b q2? Does this change as …

  5. How much does Quantization actually impact models? - Reddit

    Nov 22, 2023 · 206 votes, 62 comments. So, it was bothering me a bit that the only metric people really had to understand the 'loss' of quantization objectively was…

  6. A comparative look at (GGML) quantization and parameter size - Reddit

    May 18, 2023 · Preamble/credits Based on: the llama.cpp repo README section on quantization. Looking at that, it's a little hard to assess how different levels of quantization actually affect the …

  7. The difference between quantization methods for the same bits

    Jul 25, 2023 · Speed will be closely related to the model file size. Smaller model file, faster inference, usually lower accuracy. With the older quantisation method, 4_0 is 4.5 bits per weight and 4_1 is 5 …

  8. Yet another state of the art in LLM quantization - Reddit

    The 2-2.5 bit quantization allows running 70B models on an RTX 3090 or Mixtral-like models on 4060 with significantly lower accuracy loss - notably, better than QuIP# and 3-bit GPTQ. We provide a set …

  9. Quantization vs model size : r/LocalLLaMA - Reddit

    Jun 21, 2024 · Quantization is a crucial technique for reducing the size and computational requirements of large language models (LLMs) while maintaining their performance. Here is a comprehensive …

  10. Question about LLM quantization : r/LocalLLaMA - Reddit

    Jul 22, 2023 · There's a lot of 7B models to try. I already know which model I will use, however, there's a lot of quantization versions. Since my system has 8 GB of RAM, 4-bit is a viable option for me. …
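Results 1 and 7 touch on quantizing and dequantizing values, and on why an older 4-bit llama.cpp format such as 4_0 costs 4.5 bits per weight rather than 4. The extra half bit comes from per-block metadata. Below is a minimal Python sketch of block-wise symmetric 4-bit quantization, assuming 32-weight blocks with one fp16 scale each. It is a simplified scheme for illustration, not llama.cpp's actual Q4_0 implementation, and the function names are hypothetical.

```python
import struct

BLOCK_SIZE = 32  # weights per block (assumed, matching the 4.5 bpw arithmetic)

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def bits_per_weight(block_size: int, value_bits: int, overhead_bits: int) -> float:
    """Average storage cost per weight, including per-block metadata."""
    return (block_size * value_bits + overhead_bits) / block_size

def quantize_block(block):
    """Hypothetical helper: symmetric absmax 4-bit quantization of one block."""
    amax = max(abs(v) for v in block)
    scale = to_fp16(amax / 7)  # the scale is stored as fp16 metadata
    if scale == 0.0:
        scale = 1.0  # all-zero (or tiny) block: any nonzero scale works
    # Map each weight to an integer in [-7, 7]; clip guards fp16 rounding.
    q = [max(-7, min(7, round(v / scale))) for v in block]
    return q, scale

def dequantize_block(q, scale):
    """Hypothetical helper: recover approximate weights from ints and scale."""
    return [qi * scale for qi in q]

# 32 four-bit values plus one 16-bit scale -> 4.5 bits per weight.
print(bits_per_weight(BLOCK_SIZE, 4, 16))
# Adding a second 16-bit field (e.g. a per-block minimum) -> 5.0 bits per weight.
print(bits_per_weight(BLOCK_SIZE, 4, 32))
```

The per-element reconstruction error is at most about half the block's scale, which is why blocks with one large outlier lose more precision: the outlier inflates the scale for all 32 weights.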
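Results 8 and 10 rest on a back-of-the-envelope memory estimate: weight storage is roughly parameter count times bits per weight, divided by 8. The sketch below does only that arithmetic. It counts the weights alone and ignores the KV cache, activations, and runtime overhead, so actual memory use will be higher. The helper name is hypothetical.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# A 7B model: unquantized fp16 versus a 4.5 bits-per-weight 4-bit format.
print(weight_footprint_gb(7e9, 16))   # 14.0 GB
print(weight_footprint_gb(7e9, 4.5))  # 3.9375 GB
# A 70B model at 2.5 bits per weight, compared with a 24 GB card such as the RTX 3090.
print(weight_footprint_gb(70e9, 2.5))  # 21.875 GB
```

By this estimate a 4-bit 7B model needs under 4 GB for its weights, leaving room within 8 GB of RAM. A 70B model at 2-2.5 bits per weight lands just under 24 GB. That is why the margin for context and overhead is thin in that setting.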