1.78bit DeepSeek-V3-0324 - 230GB Unsloth Dynamic GGUF

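We initially provided a 1.58-bit version of DeepSeek-V3-0324. You can still use it, but its outputs were not the best. We therefore found it necessary to upgrade to 1.78-bit by increasing the down_proj size, which gives better performance.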

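To get the best trade-off between accuracy and size, we do not quantize all layers. Instead, we selectively quantize the MoE layers to lower bit widths and leave attention and other layers at 4 or 6 bits. This time we also added 3.5-bit and 4.5-bit dynamic quants.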
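The selective rule described above can be sketched in a few lines. This is only an illustration: the tensor-name checks, the choice of which non-MoE tensors get 4 bits versus 6 bits, and the toy parameter counts are all assumptions made for the example, not Unsloth's actual recipe.

```python
# Illustrative only: the matching rule and bit widths are assumptions for
# this sketch, not Unsloth's actual quantization recipe.

def pick_bits(tensor_name: str, moe_bits: float = 1.78) -> float:
    """Return a bit width for one tensor under a simple selective rule."""
    if "exps" in tensor_name:      # MoE expert weights: quantize aggressively
        return moe_bits
    if "attn" in tensor_name:      # attention weights: keep at 4 bits
        return 4.0
    return 6.0                     # everything else: keep at 6 bits


def average_bits(param_counts: dict[str, int], moe_bits: float = 1.78) -> float:
    """Parameter-weighted mean bit width across all tensors."""
    total = sum(param_counts.values())
    weighted = sum(n * pick_bits(name, moe_bits) for name, n in param_counts.items())
    return weighted / total


# Toy tensor sizes, made up for the example (not real model dimensions).
toy_model = {
    "blk.3.ffn_down_exps.weight": 900,
    "blk.3.attn_q.weight": 60,
    "output.weight": 40,
}
print(round(average_bits(toy_model), 3))
```

Because the MoE experts hold most of the parameters, the average stays close to the MoE bit width even though the remaining tensors are kept at higher precision.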

Read our guide on how to run the GGUFs with llama.cpp: https://docs.unsloth.ai/basics/tutorial-how-to-run-deepseek-v3-0324-locally

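We also found that if you convert all layers to 2-bit (a standard 2-bit GGUF), the model is still very bad: it produces endless loops, gibberish, and very poor code. Our dynamic 2.51-bit quant largely solves this issue. The same is true of the 1.78-bit quant, but we recommend our 2.51 version for the best results.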

Model uploads:

MoE Bits             Type           Disk Size   HF Link
1.78-bit (prelim)    IQ1_S          151GB       Link
1.93-bit (prelim)    IQ1_M          178GB       Link
2.42-bit (prelim)    IQ2_XXS        203GB       Link
2.71-bit (best)      (not listed)   231GB       Link
3.5-bit              (not listed)   321GB       Link
4.5-bit              (not listed)   406GB       Link

Recommended settings:

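Temperature of 0.3 (possibly 0.0; the original post links an example)
Min_P of 0.00 (optional, but 0.01 also works; the llama.cpp default is 0.1)
Chat template: <｜User｜>Create a simple playable Flappy Bird Game in Python. Place the final game inside of a markdown section.<｜Assistant｜>
The BOS token <｜begin▁of▁sentence｜> is added automatically during tokenization (do NOT add it manually!)
DeepSeek also mentions using a system prompt (optional). It is in Chinese: 该助手为DeepSeek Chat，由深度求索公司创造。\n今天是3月24日，星期一。 This translates to: The assistant is DeepSeek Chat, created by DeepSeek.\nToday is Monday, March 24th.
For KV cache quantization, use 8-bit rather than 4-bit; we found 4-bit performs noticeably worse.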
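As a rough sketch, the settings above could be passed to llama.cpp's llama-cli from Python like this. MODEL_PATH is a hypothetical placeholder, and putting the optional system prompt directly in front of <｜User｜> is an assumption not stated in the post. Flag names can differ between llama.cpp versions, so check `llama-cli --help` before relying on them.

```python
# Sketch only: MODEL_PATH is a hypothetical placeholder, and placing the
# system prompt directly before <｜User｜> is an assumption.
import subprocess

MODEL_PATH = "path/to/first-gguf-shard.gguf"  # hypothetical; point at your download


def build_prompt(user_message: str, system_prompt: str = "") -> str:
    """Wrap a message in the chat template; the BOS token is left to the tokenizer."""
    return f"{system_prompt}<｜User｜>{user_message}<｜Assistant｜>"


def llama_cli_args(prompt: str) -> list[str]:
    """Assemble a llama-cli command line using the recommended settings."""
    return [
        "llama-cli",
        "--model", MODEL_PATH,
        "--temp", "0.3",
        "--min-p", "0.01",
        "--cache-type-k", "q8_0",  # 8-bit K cache rather than 4-bit
        "--prompt", prompt,
    ]


if __name__ == "__main__":
    args = llama_cli_args(build_prompt("Create a simple playable Flappy Bird Game in Python."))
    print(" ".join(args))
    # subprocess.run(args, check=True)  # uncomment once MODEL_PATH is real
```

The prompt builder deliberately never emits <｜begin▁of▁sentence｜>, since the tokenizer adds it.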

I suggest people run the 2.71-bit quant for now. The other quants (listed as prelim) are still processing.

# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/DeepSeek-V3-0324-GGUF",
    local_dir = "unsloth/DeepSeek-V3-0324-GGUF",
    allow_patterns = ["*UD-Q2_K_XL*"], # Dynamic 2.7bit (230GB)
)
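Since the download is roughly 230GB, it may be worth checking free disk space first and confirming afterwards how much actually landed. The helper names below are hypothetical and the 10% headroom is an assumption; only standard-library calls are used.

```python
# Sketch: check free space before pulling ~230GB, then total up what landed.
# The helper names and the 10% headroom are assumptions for this example.
import shutil
from pathlib import Path


def has_room(target_dir: str, required_gb: float, headroom: float = 1.1) -> bool:
    """True if the filesystem holding target_dir has required_gb * headroom free."""
    probe = Path(target_dir).resolve()
    while not probe.exists():  # walk up to the nearest existing directory
        probe = probe.parent
    free_gb = shutil.disk_usage(probe).free / 1e9
    return free_gb >= required_gb * headroom


def downloaded_gb(local_dir: str) -> float:
    """Total size of all .gguf files under local_dir, in GB (10^9 bytes)."""
    return sum(p.stat().st_size for p in Path(local_dir).rglob("*.gguf")) / 1e9


if __name__ == "__main__":
    print("enough space:", has_room("unsloth/DeepSeek-V3-0324-GGUF", 230))
    print("on disk (GB):", round(downloaded_gb("unsloth/DeepSeek-V3-0324-GGUF"), 1))
```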

I ran the Flappy Bird and Heptagon tests (https://www.reddit.com/r/LocalLLaMA/comments/1j7r47l/i_just_made_an_animation_of_a_ball_bouncing/)

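Reader questions:
1. What specs does Unsloth usually use?
We usually use cloud PCs, because they are cheap! What error did you get? You have to use llama.cpp to run it. Read our guide: https://docs.unsloth.ai/basics/tutorial-how-to-run-deepseek-v3-0324-locally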

2. How does this perform on a Mac Studio?
With 512GB of unified memory, I think you would get at least 4 tokens/s.

3. Where is the code used to make the dynamic quants?
https://github.com/unslothai/llama.cpp

