
Far Ahead! A Hands-On Guide to Deploying Tsinghua's AI Language Model on the Domestic Orange Pi, Rivaling GPT. Can a Raspberry Pi Do the Same?

Thanks to @顾子韵, Tass, and other friends; this tutorial could not have been completed without their help. Anyone interested can message me or them to join the group and learn together.

TL;DR: The Quick Version

There is also a step-by-step tutorial on Bilibili; you can follow along with the video directly:

1. cd /root (change to the root home directory)
2. apt update && apt upgrade -y && apt install cmake -y (update packages and install cmake)
3. export ALL_PROXY=socks5://<hostname>:<port> (set a proxy; prepare your own)
4. wget -e https_proxy=<hostname>:<port> https://github.com/conda-forge/miniforge/releases/download/23.3.1-1/Miniforge3-Linux-aarch64.sh (download Miniforge)
5. sudo bash Miniforge3-Linux-aarch64.sh
6. Step through the conda-style installer, pressing Space to page through the license. (You can look up conda installation guides online; the steps are much the same.)
7. source ~/.bashrc (activate conda's Python environment)
8. wget -e https_proxy=<hostname>:<port> https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.2/clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz (download LLVM)
9. sudo tar -xvf clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz
10. git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
11. mkdir -p build && cd build
12. cp ../cmake/config.cmake .
13. Use vim to edit the following entries in config.cmake:
set(CMAKE_BUILD_TYPE RelWithDebInfo) # not in the file by default; add it
set(USE_OPENCL ON) # already in the file; change its value
set(HIDE_PRIVATE_SYMBOLS ON) # not in the file by default; add it
set(USE_LLVM /root/clang+llvm-17.0.2-aarch64-linux-gnu/bin/llvm-config) # already in the file; change its value
14. cmake ..
15. make -j8 (start compiling TVM)
16. cd ../python
17. pip3 install --user .
18. Use vim to append this environment variable at the bottom of /root/.bashrc: export PATH="$PATH:/root/.local/bin" (or see the snippet after this list)
19. source ~/.bashrc (reload the environment variables)
20. tvmc (check that TVM installed correctly)
21. git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm
22. pip3 install --user .
23. python3 -m mlc_llm.build --help
24. mkdir -p dist/models && cd dist/models
25. git lfs install && git clone https://huggingface.co/THUDM/chatglm2-6b-32k
26. vim chatglm2-6b-32k/config.json
27. Add this entry: "vocab_size": 65024
28. cd ../..
Before the next step, install the OpenCL driver by following the instructions at https://llm.mlc.ai/docs/install/gpu.html#orange-pi-5-rk3588-based-sbc, then continue with the steps below.
29. python3 -m mlc_llm.build --model chatglm2-6b-32k --target opencl --max-seq-len 32768 --quantization q8f16_1 (run this step in the /root/mlc-llm directory)
30. curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh (install Rust)
31. Set the Rust environment variable by appending this at the bottom of /root/.bashrc: export PATH="$PATH:/root/.cargo/bin" (see the snippet after this list)
32. mkdir -p build && cd build (run this in the /root/mlc-llm directory)
33. python3 ../cmake/gen_cmake_config.py
34. cmake .. && cmake --build . --parallel $(nproc) && cd ..
35. ls -l ./build/
36. ./build/mlc_chat_cli --help
37. ./build/mlc_chat_cli --model chatglm2-6b-32k-q8f16_1 --device opencl (run this in /root/mlc-llm; note the leading "." is a single dot!)
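
Steps 18 and 31 both add a directory to PATH by hand-editing /root/.bashrc in vim. A minimal shell-only alternative, assuming the default install paths used above:

echo 'export PATH="$PATH:/root/.local/bin"' >> /root/.bashrc   # step 18: pip --user puts scripts here
echo 'export PATH="$PATH:/root/.cargo/bin"' >> /root/.bashrc   # step 31: rustup installs cargo/rustc here
source ~/.bashrc                                               # reload so the new PATH takes effect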

Prerequisites

  1. RK3588 device (OrangePi 5 Plus 16GB, Radxa Rock 5B 16GB, Nanopc T6 16GB)
  2. LLVM
  3. TVM
  4. OpenCL
  5. MLC-LLM
  6. Python 3.10 or higher (with pip)
  7. Models you want to compile
  8. The ability to access GitHub and Hugging Face

You can follow the instructions at https://llm.mlc.ai/docs/install/gpu.html#orange-pi-5-rk3588-based-sbc to install OpenCL.
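
After the driver is installed, it is worth confirming that the GPU is actually visible to OpenCL. The clinfo utility used below is my addition and is not part of the linked instructions:

sudo apt-get install -y clinfo
clinfo | grep -i "device name"   # the RK3588's Mali GPU should be listed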

Dependencies

Install the minimal prerequisites.

sudo apt-get update
sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev

LLVM

wget https://github.com/llvm/llvm-project/releases/download/llvmorg-16.0.4/clang+llvm-16.0.4-aarch64-linux-gnu.tar.xz

tar xvf clang+llvm-16.0.4-aarch64-linux-gnu.tar.xz
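
Before pointing TVM at this toolchain, a quick sanity check that the prebuilt binaries run on your board:

./clang+llvm-16.0.4-aarch64-linux-gnu/bin/llvm-config --version   # should print 16.0.4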

TVM

There are two TVM repositories. Do not use the one from https://tvm.apache.org/ or https://github.com/apache/tvm/, because with that repository you cannot import tvm.relax in Python. Download TVM from mlc-ai/relax.git instead.

Download TVM from GitHub.

# clone from GitHub
git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
# create build directory
mkdir -p build && cd build
# generate build configuration
cp ../cmake/config.cmake .

Use vim or any editor you like to edit build/config.cmake, appending or updating these settings (USE_OPENCL and USE_LLVM are already present in the file; the other two need to be added):

set(CMAKE_BUILD_TYPE RelWithDebInfo)
set(USE_OPENCL ON)
set(HIDE_PRIVATE_SYMBOLS ON)

# Replace <LLVM_PATH> with your LLVM location.
set(USE_LLVM <LLVM_PATH>/clang+llvm-16.0.4-aarch64-linux-gnu/bin/llvm-config)
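
If you prefer a non-interactive edit, the same settings can be appended from the shell. This sketch relies on CMake's sequential evaluation (a later set() overrides the USE_OPENCL and USE_LLVM defaults earlier in the file) and assumes LLVM was unpacked under /root as in the quick version above:

cat >> config.cmake <<'EOF'
set(CMAKE_BUILD_TYPE RelWithDebInfo)
set(USE_OPENCL ON)
set(HIDE_PRIVATE_SYMBOLS ON)
set(USE_LLVM /root/clang+llvm-16.0.4-aarch64-linux-gnu/bin/llvm-config)
EOF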

Then compile it. It takes about 20 minutes.

cmake ..
make -j4

Finally, install the TVM Python package.

cd ../python
pip3 install --user .

If you later move the TVM directory, you must reinstall this Python package.

Verify the installation; you should see a help message if the package was installed successfully.

tvmc
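
Since the whole reason for using the mlc-ai/relax fork is the relax module, it is also worth checking that import directly; a one-liner:

python3 -c 'import tvm; from tvm import relax; print(tvm.__file__)'   # should print the install path with no ImportError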

MLC-LLM

Install the Rust environment.

sudo apt-get update
sudo apt-get install -y rustc cargo

Return to the top-level folder, download mlc-llm from GitHub, and install the Python package.

# clone mlc-llm from GitHub
git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm
pip3 install --user .

Verify the installation; you should see a help message if the package was installed successfully.

python3 -m mlc_llm.build --help

Compile Model

I use ChatGLM2-6B as the example here. In the mlc-llm folder, download the model.

mkdir -p dist/models && cd dist/models

# requires about 11 GB of disk space.
git lfs install
git clone https://huggingface.co/THUDM/chatglm2-6b
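
The clone is large, and Git LFS occasionally leaves pointer stubs behind instead of real weights; two quick checks once it finishes:

du -sh chatglm2-6b               # expect roughly 11 GB
git -C chatglm2-6b lfs ls-files  # entries marked "*" are fully downloaded; "-" means still a pointer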

Add the vocab_size field to the model's config.json:

vim chatglm2-6b/config.json
{
   ...,
  "vocab_size": 65024
}
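
If you would rather not edit the JSON by hand, a small inline Python script (a sketch; it rewrites config.json in place) does the same thing:

python3 - <<'EOF'
import json

path = "chatglm2-6b/config.json"
with open(path) as f:
    cfg = json.load(f)
cfg["vocab_size"] = 65024          # the field mlc_llm.build expects for ChatGLM2
with open(path, "w") as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)
EOF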

Then compile it in the mlc-llm folder.

cd ../..
python3 -m mlc_llm.build --model chatglm2-6b --target opencl --max-seq-len 8192 --quantization q4f16_1

After about 5 minutes, you will see the dist/chatglm2-6b-q4f16_1 folder, containing three files:

chatglm2-6b-q4f16_1-opencl.so  mod_cache_before_build.pkl  params

chatglm2-6b-q4f16_1-opencl.so is the final product.
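
You can confirm the library was actually built for the board's architecture:

file dist/chatglm2-6b-q4f16_1/chatglm2-6b-q4f16_1-opencl.so   # expect: ELF 64-bit LSB shared object, ARM aarch64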

Notes

  1. You can change quantization to a different option, such as: autogptq_llama_q4f16_0, autogptq_llama_q4f16_1, q0f16, q0f32, q3f16_0, q3f16_1, q4f16_0, q4f16_1, q4f16_2, q4f16_ft, q4f32_0, q4f32_1, q8f16_ft, q8f16_1 (see the example after this list).
  2. q6f16_1 takes about 5 GB of memory and q8f16_1 about 8 GB. Make sure your device has enough memory; 16 GB is necessary in most cases.
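
For example, to rebuild the same model with 8-bit weights (the variant used in the quick version above, at roughly 8 GB of runtime memory):

python3 -m mlc_llm.build --model chatglm2-6b --target opencl --max-seq-len 8192 --quantization q8f16_1

This produces dist/chatglm2-6b-q8f16_1 alongside the q4f16_1 build.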

Use the model

Compile mlc_chat_cli

You need to build either the mlc_chat_cli command-line tool or the mlc_chat Python package.

Return to the mlc-llm folder.

# create build directory
mkdir -p build && cd build
# generate build configuration
python3 ../cmake/gen_cmake_config.py
# build `mlc_chat_cli`
cmake .. && cmake --build . --parallel $(nproc) && cd ..

Verify the build

# expected to see `mlc_chat_cli`, `libmlc_llm.so` and `libtvm_runtime.so`
ls -l ./build/
# expected to see help message
./build/mlc_chat_cli --help

Use mlc_chat_cli

We use mlc_chat_cli in this case:

./build/mlc_chat_cli --model <model_name> --device <device_name>

In the previous section we produced chatglm2-6b-q4f16_1-opencl.so, so you can replace <model_name> with chatglm2-6b-q4f16_1 and <device_name> with opencl:

./build/mlc_chat_cli --model chatglm2-6b-q4f16_1 --device opencl
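
This drops you into an interactive chat loop. The in-chat commands vary by mlc-llm revision, but at the time of writing the REPL understood at least /help, /stats, /reset, and /exit; a typical session looks like:

USER: Hello
ASSISTANT: ...
USER: /stats   # print prefill and decode speed
USER: /exit    # quit the REPL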

(Screenshot: the final result)

References

https://tvm.apache.org/docs/i...

https://llm.mlc.ai/docs/compi...

https://blog.mlc.ai/2023/08/0...

https://zhuanlan.zhihu.com/p/...

https://llm.mlc.ai/docs/install

Author: A Chang
Source: Zhihu
