
Far Ahead! A Hands-On Guide to Deploying Tsinghua's AI Language Model on the Domestic Orange Pi, Rivaling GPT. Can a Raspberry Pi Do the Same?

Thanks to @顾子韵, Tass, and other friends; this tutorial could not have been completed without their help. Anyone interested can message me or them to join the group and learn together.

TL;DR: The Quick Version

There is also a step-by-step tutorial on Bilibili; you can follow along with the video directly:

1. cd /root (change to the root home directory)
2. apt update && apt upgrade -y && apt install cmake -y (update packages and install cmake)
3. export ALL_PROXY=socks5://<hostname>:<port> (set a proxy; prepare your own)
4. wget -e https_proxy=<hostname>:<port> https://github.com/conda-forge/miniforge/releases/download/23.3.1-1/Miniforge3-Linux-aarch64.sh (download Miniforge)
5. sudo bash Miniforge3-Linux-aarch64.sh
6. Step through the conda-style installer, pressing Space to page through the license. (You can look up conda installation guides online; the steps are much the same.)
7. source ~/.bashrc (activate conda's Python environment)
8. wget -e https_proxy=<hostname>:<port> https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.2/clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz (download LLVM)
9. sudo tar -xvf clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz
10. git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
11. mkdir -p build && cd build
12. cp ../cmake/config.cmake .
13. Use vim to edit the following entries in config.cmake:
set(CMAKE_BUILD_TYPE RelWithDebInfo) # not in the file by default; add it
set(USE_OPENCL ON) # already in the file; change its value
set(HIDE_PRIVATE_SYMBOLS ON) # not in the file by default; add it
set(USE_LLVM /root/clang+llvm-17.0.2-aarch64-linux-gnu/bin/llvm-config) # already in the file; change its value
14. cmake ..
15. make -j8 (start compiling TVM)
16. cd ../python
17. pip3 install --user .
18. Use vim to append this environment variable at the bottom of /root/.bashrc: export PATH="$PATH:/root/.local/bin" (or see the snippet after this list)
19. source ~/.bashrc (reload the environment variables)
20. tvmc (check that TVM installed correctly)
21. git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm
22. pip3 install --user .
23. python3 -m mlc_llm.build --help
24. mkdir -p dist/models && cd dist/models
25. git lfs install && git clone https://huggingface.co/THUDM/chatglm2-6b-32k
26. vim chatglm2-6b-32k/config.json
27. Add this entry: "vocab_size": 65024
28. cd ../..
Before the next step, install the OpenCL driver by following the instructions at https://llm.mlc.ai/docs/install/gpu.html#orange-pi-5-rk3588-based-sbc, then continue with the steps below.
29. python3 -m mlc_llm.build --model chatglm2-6b-32k --target opencl --max-seq-len 32768 --quantization q8f16_1 (run this step in the /root/mlc-llm directory)
30. curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh (install Rust)
31. Set the Rust environment variable by appending this at the bottom of /root/.bashrc: export PATH="$PATH:/root/.cargo/bin" (see the snippet after this list)
32. mkdir -p build && cd build (run this in the /root/mlc-llm directory)
33. python3 ../cmake/gen_cmake_config.py
34. cmake .. && cmake --build . --parallel $(nproc) && cd ..
35. ls -l ./build/
36. ./build/mlc_chat_cli --help
37. ./build/mlc_chat_cli --model chatglm2-6b-32k-q8f16_1 --device opencl (run this in /root/mlc-llm; note the leading "." is a single dot!)
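
Steps 18 and 31 both add a directory to PATH by hand-editing /root/.bashrc in vim. A minimal shell-only alternative, assuming the default install paths used above:

echo 'export PATH="$PATH:/root/.local/bin"' >> /root/.bashrc   # step 18: pip --user puts scripts here
echo 'export PATH="$PATH:/root/.cargo/bin"' >> /root/.bashrc   # step 31: rustup installs cargo/rustc here
source ~/.bashrc                                               # reload so the new PATH takes effect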

Prerequisites

  1. RK3588 device (OrangePi 5 Plus 16GB, Radxa Rock 5B 16GB, Nanopc T6 16GB)
  2. LLVM
  3. TVM
  4. OpenCL
  5. MLC-LLM
  6. Python 3.10 or higher (with pip)
  7. Models you want to compile
  8. The ability to access GitHub and Hugging Face

You can follow the instructions at https://llm.mlc.ai/docs/install/gpu.html#orange-pi-5-rk3588-based-sbc to install OpenCL.
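
After the driver is installed, it is worth confirming that the GPU is actually visible to OpenCL. The clinfo utility used below is my addition and is not part of the linked instructions:

sudo apt-get install -y clinfo
clinfo | grep -i "device name"   # the RK3588's Mali GPU should be listed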

Dependencies

Install the minimal prerequisites.

sudo apt-get update
sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev

LLVM

wget https://github.com/llvm/llvm-project/releases/download/llvmorg-16.0.4/clang+llvm-16.0.4-aarch64-linux-gnu.tar.xz

tar xvf clang+llvm-16.0.4-aarch64-linux-gnu.tar.xz
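
Before pointing TVM at this toolchain, a quick sanity check that the prebuilt binaries run on your board:

./clang+llvm-16.0.4-aarch64-linux-gnu/bin/llvm-config --version   # should print 16.0.4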

TVM

There are two TVM repositories. Do not use the one from https://tvm.apache.org/ or https://github.com/apache/tvm/, because with that repository you cannot import tvm.relax in Python. Download TVM from mlc-ai/relax.git instead.

Download TVM from GitHub.

# clone from GitHub
git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
# create build directory
mkdir -p build && cd build
# generate build configuration
cp ../cmake/config.cmake .

Use vim or any editor you like to edit build/config.cmake, appending or updating these settings (USE_OPENCL and USE_LLVM are already present in the file; the other two need to be added):

set(CMAKE_BUILD_TYPE RelWithDebInfo)
set(USE_OPENCL ON)
set(HIDE_PRIVATE_SYMBOLS ON)

# Replace <LLVM_PATH> with your LLVM location.
set(USE_LLVM <LLVM_PATH>/clang+llvm-16.0.4-aarch64-linux-gnu/bin/llvm-config)
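
If you prefer a non-interactive edit, the same settings can be appended from the shell. This sketch relies on CMake's sequential evaluation (a later set() overrides the USE_OPENCL and USE_LLVM defaults earlier in the file) and assumes LLVM was unpacked under /root as in the quick version above:

cat >> config.cmake <<'EOF'
set(CMAKE_BUILD_TYPE RelWithDebInfo)
set(USE_OPENCL ON)
set(HIDE_PRIVATE_SYMBOLS ON)
set(USE_LLVM /root/clang+llvm-16.0.4-aarch64-linux-gnu/bin/llvm-config)
EOF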

Then compile it. It takes about 20 minutes.

cmake ..
make -j4

Finally, install the TVM Python package.

cd ../python
pip3 install --user .

If you later move the TVM directory, you must reinstall this Python package.

Verify the installation; you should see a help message if the package was installed successfully.

tvmc
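
Since the whole reason for using the mlc-ai/relax fork is the relax module, it is also worth checking that import directly; a one-liner:

python3 -c 'import tvm; from tvm import relax; print(tvm.__file__)'   # should print the install path with no ImportError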

MLC-LLM

Install the Rust environment.

sudo apt-get update
sudo apt-get install -y rustc cargo

Return to the top-level folder, download mlc-llm from GitHub, and install the Python package.

# clone mlc-llm from GitHub
git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm
pip3 install --user .

Verify the installation; you should see a help message if the package was installed successfully.

python3 -m mlc_llm.build --help

Compile Model

I use ChatGLM2-6B as the example here. In the mlc-llm folder, download the model.

mkdir -p dist/models && cd dist/models

# requires about 11 GB of disk space.
git lfs install
git clone https://huggingface.co/THUDM/chatglm2-6b
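
The clone is large, and Git LFS occasionally leaves pointer stubs behind instead of real weights; two quick checks once it finishes:

du -sh chatglm2-6b               # expect roughly 11 GB
git -C chatglm2-6b lfs ls-files  # entries marked "*" are fully downloaded; "-" means still a pointer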

Add the vocab_size field to the model's config.json:

vim chatglm2-6b/config.json
{
   ...,
  "vocab_size": 65024
}
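
If you would rather not edit the JSON by hand, a small inline Python script (a sketch; it rewrites config.json in place) does the same thing:

python3 - <<'EOF'
import json

path = "chatglm2-6b/config.json"
with open(path) as f:
    cfg = json.load(f)
cfg["vocab_size"] = 65024          # the field mlc_llm.build expects for ChatGLM2
with open(path, "w") as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)
EOF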

Then compile it in the mlc-llm folder.

cd ../..
python3 -m mlc_llm.build --model chatglm2-6b --target opencl --max-seq-len 8192 --quantization q4f16_1

After about 5 minutes, you will see the dist/chatglm2-6b-q4f16_1 folder, containing three files:

chatglm2-6b-q4f16_1-opencl.so  mod_cache_before_build.pkl  params

chatglm2-6b-q4f16_1-opencl.so is the final product.
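
You can confirm the library was actually built for the board's architecture:

file dist/chatglm2-6b-q4f16_1/chatglm2-6b-q4f16_1-opencl.so   # expect: ELF 64-bit LSB shared object, ARM aarch64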

Notes

  1. You can change quantization to a different option, such as: autogptq_llama_q4f16_0, autogptq_llama_q4f16_1, q0f16, q0f32, q3f16_0, q3f16_1, q4f16_0, q4f16_1, q4f16_2, q4f16_ft, q4f32_0, q4f32_1, q8f16_ft, q8f16_1 (see the example after this list).
  2. q6f16_1 takes about 5 GB of memory and q8f16_1 about 8 GB. Make sure your device has enough memory; 16 GB is necessary in most cases.
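
For example, to rebuild the same model with 8-bit weights (the variant used in the quick version above, at roughly 8 GB of runtime memory):

python3 -m mlc_llm.build --model chatglm2-6b --target opencl --max-seq-len 8192 --quantization q8f16_1

This produces dist/chatglm2-6b-q8f16_1 alongside the q4f16_1 build.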

Use the model

Compile mlc_chat_cli

You need to build either the mlc_chat_cli command-line tool or the mlc_chat Python package.

Return to the mlc-llm folder.

# create build directory
mkdir -p build && cd build
# generate build configuration
python3 ../cmake/gen_cmake_config.py
# build `mlc_chat_cli`
cmake .. && cmake --build . --parallel $(nproc) && cd ..

Verify the build

# expected to see `mlc_chat_cli`, `libmlc_llm.so` and `libtvm_runtime.so`
ls -l ./build/
# expected to see help message
./build/mlc_chat_cli --help

Use mlc_chat_cli

We use mlc_chat_cli in this case:

./build/mlc_chat_cli --model <model_name> --device <device_name>

In the previous section we produced chatglm2-6b-q4f16_1-opencl.so, so you can replace <model_name> with chatglm2-6b-q4f16_1 and <device_name> with opencl:

./build/mlc_chat_cli --model chatglm2-6b-q4f16_1 --device opencl
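
This drops you into an interactive chat loop. The in-chat commands vary by mlc-llm revision, but at the time of writing the REPL understood at least /help, /stats, /reset, and /exit; a typical session looks like:

USER: Hello
ASSISTANT: ...
USER: /stats   # print prefill and decode speed
USER: /exit    # quit the REPL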

(Screenshot: the final result)

References

https://tvm.apache.org/docs/i...

https://llm.mlc.ai/docs/compi...

https://blog.mlc.ai/2023/08/0...

https://zhuanlan.zhihu.com/p/...

https://llm.mlc.ai/docs/install

Author: A Chang
Source: Zhihu
