llama.cpp
The large-model inference framework llama.cpp is a pure C/C++ implementation of Meta's LLaMA model, hand-written by developer Georgi Gerganov:
Georgi Gerganov is a veteran open-source developer who previously built whisper.cpp for OpenAI's Whisper automatic speech recognition model.
Features
- Pure C/C++ implementation with no external dependencies
- CPU inference, with CUDA and OpenCL back ends also supported
- Mixed F16 / F32 precision
- 4-bit, 5-bit, and 8-bit integer quantization (a build-and-quantize sketch follows this list)
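A minimal sketch of how those features are exercised in practice: build from source, then quantize a converted F16 model down to 4 bits. Script names and model paths have changed across llama.cpp versions, so the lines below are illustrative rather than exact:

```sh
# Build (CPU-only by default; plain make works on Linux and macOS)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Convert the original LLaMA weights to ggml F16, then quantize to q4_0.
# The conversion script name and the model layout depend on the version in use.
python3 convert.py models/7B/
./quantize models/7B/ggml-model-f16.bin models/7B/ggml-model-q4_0.bin q4_0
```

At q4_0 the 7B weights shrink from roughly 13 GB (F16) to around 4 GB, which is what makes the Raspberry Pi experiments below feasible at all.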
Topics
Supported models
- LLaMA
- Alpaca
- GPT4All
- Chinese LLaMA / Alpaca
- Vigogne (French)
- Vicuna
- Koala
- OpenBuddy (multilingual)
- Pygmalion 7B / Metharme 7B
- WizardLM
Bindings
- Python: abetlen/llama-cpp-python
- Go: go-skynet/go-llama.cpp
- Node.js: hlhr202/llama-node
- Ruby: yoshoku/llama_cpp.rb
- C#/.NET: SciSharp/LLamaSharp
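As an example of using one of these bindings, llama-cpp-python installs from PyPI and, with its server extra, exposes a local OpenAI-compatible HTTP API. The model path below is a placeholder:

```sh
pip install 'llama-cpp-python[server]'

# Serve a local quantized model over an OpenAI-compatible REST API
python3 -m llama_cpp.server --model models/7B/ggml-model-q4_0.bin
```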
Running on a Raspberry Pi
RPi 4
LLaMA 7B runs successfully on a Raspberry Pi 4 with 4 GB of RAM, but generation is very slow, at about 10 sec/token.
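A sketch of the kind of command used in that experiment; the prompt and model path are illustrative, and `-t 4` matches the Pi 4's four cores:

```sh
# 4-bit LLaMA 7B on a Raspberry Pi 4: expect on the order of 10 s per token
./main -m models/7B/ggml-model-q4_0.bin \
    -p "Building a website can be done in 10 simple steps:" \
    -n 64 -t 4
```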
From the issue discussion:

> Ah, yes. A 3-bit implementation of 7B would fit fully in 4GB of RAM and lead to much greater speeds. This is the same issue as in #97.

> 3-bit support is a proposed enhancement in GPTQ Quantization (3-bit and 4-bit) #9. GPTQ 3-bit has been shown to have negligible output quality loss vs. uncompressed 16-bit and may even provide better output quality than the current naive 4-bit implementation in llama.cpp while requiring 25% less RAM.
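The RAM figures follow from simple arithmetic: 7B parameters at 4 bits per weight is about 7e9 × 0.5 bytes ≈ 3.5 GB, while 3 bits per weight is about 7e9 × 0.375 bytes ≈ 2.6 GB. Three bits is 3/4 of four bits, hence the quoted 25% saving, and 2.6 GB leaves headroom within 4 GB even after activations and OS overhead.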
A configuration that fails to build:
> @Ronsor something wrong with your environment.

```
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: aarch64
```
A configuration that builds successfully:
My build log on the RPi starts with:

```
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
```

The visible difference is `UNAME_P` (the output of `uname -p`): `unknown` in the failing environment versus `aarch64` in the working one; the Makefile uses these values to choose processor-specific compiler flags.
Raspberry Pi 4 4GB · Issue #58 · ggerganov/llama.cpp (github.com)
RPi 3
A Raspberry Pi 3 Model B Rev 1.2 with 1 GB RAM plus swap (5.7 GB on microSD) also works, but very slowly.
Running on phones
From the same discussion, on phone performance:

> 1.2 tokens/s on a Samsung S22 Ultra running 4 threads.

> The S22 obviously has a more powerful processor, but I do not think it is 12 times more powerful. It's likely you could get much faster speeds on the Pi. I'd be willing to bet that the bottleneck is not the processor.

That hunch is plausible: during generation every token touches the full weight file, so memory bandwidth rather than raw CPU speed is typically the limiting factor.
Web resources
[2023-11-25] What do you think of llama.cpp?
ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++ (github.com)
Further tinkering with llama.cpp + Chinese-LLaMA-Alpaca (listera.top)
Deploying and running a Chinese LLaMA model locally - Zhihu (zhihu.com)