Building a RAG System on AMD Radeon GPUs with LlamaIndex and Ollama

This article is reposted from "AMD Developer Center" (AMD开发者中心). Original link: https://zhuanlan.zhihu.com/p/...

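AMD Radeon GPUs are officially supported by ROCm and are compatible with industry-standard software frameworks. This Jupyter notebook uses Ollama and LlamaIndex, both of which are supported on ROCm, to build a retrieval-augmented generation (RAG) application. LlamaIndex builds the pipeline that reads the PDFs, indexes the dataset, and creates the query engine, while Ollama provides the backend service for large language model (LLM) inference.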

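Prerequisites

This tutorial was developed and tested with the following setup:

Hardware

AMD Radeon GPU: make sure you are using an AMD Radeon GPU supported by ROCm. This tutorial was tested on an AMD Radeon PRO W7900.

Software

ROCm 6.2: install ROCm by following the Radeon GPU installation guide (https://rocm.docs.amd.com/pro...).
Python 3.8: make sure Python is installed and accessible in your environment.

Environment

Installing and configuring the software requires root or sudo access.

Install and launch Jupyter Notebooks

If Jupyter is not yet installed on your system, install and launch JupyterLab with the following commands: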

pip install jupyter

jupyter-lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root

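Note: before running the command above, make sure port 8888 is not already in use on your system. If it is, specify a different port by replacing `--port=8888` with another port number, for example `--port=8890`.

After the command runs, the terminal output shows a URL and a token. Copy and paste this URL into a web browser on the host to access JupyterLab. Once JupyterLab is running, upload this notebook to the environment and continue with the steps in this tutorial.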
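To check the port mentioned in the note before launching JupyterLab, a small Python helper can try to bind to it. This is a minimal sketch, not part of the original notebook: `port_is_free` and `first_free_port` are hypothetical helper names, and the default host `0.0.0.0` is chosen only to mirror the `--ip=0.0.0.0` flag used above.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use".
            return False
    return True


def first_free_port(start: int = 8888, attempts: int = 20) -> int:
    """Scan upward from `start` and return the first port that can be bound."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port found in {start}..{start + attempts - 1}")


if __name__ == "__main__":
    print("Suggested JupyterLab port:", first_free_port())
```

The value it prints can be passed to `--port=`. The check is only a snapshot: another process could still take the port between the check and the launch.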

Install Ollama

Ollama supports AMD ROCm GPUs out of the box and delivers optimized performance without further configuration. To install Ollama on Linux, use the following command:

!curl -fsSL https://ollama.com/install.sh | sh

Start Ollama and verify that it is running:

!sudo systemctl start ollama

!sudo systemctl status ollama

Note: the Ollama installation guide is available at https://github.com/ollama/ollama/blob/main/docs/linux.md.

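Download the models

Use Ollama to pull the models required for RAG:

Important: if the Ollama server is running as a foreground process, you must run the remaining steps in a new instance.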

!ollama pull nomic-embed-text

!ollama pull llama3.1:8b

Verify the downloaded models:

!ollama list llama3.1

NAME ID SIZE MODIFIED

llama3.1:8b 42182419e950 4.7 GB 2 months ago

For more details, see the Ollama documentation (https://github.com/ollama/oll...).

Note: other available models are listed at https://ollama.com/search.
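The same check can be done from Python through the Ollama HTTP API. The sketch below assumes the server listens on its default address `http://localhost:11434` and uses the `/api/tags` endpoint, which lists locally available models. The helper names (`normalize`, `missing_models`, `fetch_tags`) are illustrative and not part of the original notebook.

```python
import json
import urllib.error
import urllib.request

REQUIRED_MODELS = ["nomic-embed-text", "llama3.1:8b"]


def normalize(name: str) -> str:
    """Treat a model name without a tag as `<name>:latest`."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_payload: dict, required: list) -> list:
    """Return the required models that are absent from an /api/tags payload."""
    available = {normalize(m["name"]) for m in tags_payload.get("models", [])}
    return [name for name in required if normalize(name) not in available]


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0):
    """Fetch the local model list, or return None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    payload = fetch_tags()
    if payload is None:
        print("Ollama server not reachable")
    else:
        print("Missing models:", missing_models(payload, REQUIRED_MODELS) or "none")
```

Keeping the comparison in a pure function (`missing_models`) makes it easy to test without a running server.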

Install PyTorch (optional)

PyTorch is optional for this tutorial. This section uses PyTorch utilities for verification.

!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

Verify the list of installed packages:

!pip list | grep torch

pytorch-triton-rocm 3.1.0

torch 2.5.1+rocm6.2

torchaudio 2.5.1+rocm6.2

torchvision 0.20.1+rocm6.2

Verify GPU functionality:

import torch

# Query the GPU. ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API.
if torch.cuda.is_available():
    device = torch.device("cuda")  # GPU device object
    print('Using GPU:', torch.cuda.get_device_name(0))
    print('GPU properties:', torch.cuda.get_device_properties(0))
else:
    device = torch.device("cpu")
    print('Using CPU')
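Beyond querying the device, it can help to run a small computation on it. The following is a minimal sketch rather than part of the original notebook: `gpu_smoke_test` is a hypothetical helper that multiplies two random matrices on the best available device and reports `torch.version.hip`, which is a version string on ROCm builds of PyTorch and `None` otherwise. It returns `None` when PyTorch is not installed.

```python
def gpu_smoke_test(size: int = 1024):
    """Multiply two random matrices on the best available device.

    Returns (device_name, hip_version), or None if PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return None

    use_gpu = torch.cuda.is_available()
    device = torch.device("cuda" if use_gpu else "cpu")
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    product = a @ b
    if use_gpu:
        torch.cuda.synchronize()  # wait for the GPU kernel to finish
    assert product.shape == (size, size)
    # torch.version.hip is set on ROCm builds and None on other builds.
    return str(device), torch.version.hip


print(gpu_smoke_test())
```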

Install LlamaIndex and its dependencies

Install LlamaIndex and the related packages with the following command:

!pip install llama-index llama-index-llms-ollama llama-index-embeddings-ollama llama-index-vector-stores-chroma chromadb

Verify the installation:

!pip list | grep llama-index

llama-index 0.12.10

llama-index-agent-openai 0.4.1

llama-index-cli 0.4.0

llama-index-core 0.12.10.post1

llama-index-embeddings-ollama 0.5.0

llama-index-embeddings-openai 0.3.1

llama-index-indices-managed-llama-cloud 0.6.3

llama-index-llms-ollama 0.5.0

llama-index-llms-openai 0.3.13

llama-index-multi-modal-llms-openai 0.4.2

llama-index-program-openai 0.3.1

llama-index-question-gen-openai 0.3.0

llama-index-readers-file 0.4.3

llama-index-readers-llama-parse 0.4.0

llama-index-readers-web 0.3.3

llama-index-vector-stores-chroma 0.4.1
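The same verification can be scripted with the standard library instead of reading the `pip list` output by eye. This is a sketch under the assumption that the distribution names match the ones passed to `pip install` above; `installed_versions` is an illustrative helper name.

```python
from importlib import metadata

REQUIRED = [
    "llama-index",
    "llama-index-llms-ollama",
    "llama-index-embeddings-ollama",
    "llama-index-vector-stores-chroma",
    "chromadb",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:40s} {version or 'NOT INSTALLED'}")
```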

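Build the RAG pipeline

This section explains how to configure and build the RAG pipeline.

Set up the index and query engine

Import the required libraries: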

import chromadb

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings

from llama_index.core.node_parser import SentenceSplitter

from llama_index.core import StorageContext

from llama_index.vector_stores.chroma import ChromaVectorStore

from llama_index.embeddings.ollama import OllamaEmbedding

from llama_index.llms.ollama import Ollama

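Configure the embedding and LLM models

LlamaIndex implements client interfaces for talking to the Ollama service. In this example, both the embedding service and the LLM service are requested from Ollama.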

# Set embedding model

emb_fn="nomic-embed-text"

Settings.embed_model = OllamaEmbedding(model_name=emb_fn)

# Set ollama model

Settings.llm = Ollama(model="llama3.1:8b", request_timeout=120.0)
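To make the client/server relationship concrete, the sketch below builds a raw HTTP request for an embedding. It assumes Ollama's default address `http://localhost:11434` and its `/api/embed` endpoint. The LlamaIndex integration may use a different client or endpoint internally, so this only illustrates the request and response shape. `build_embed_request` and `embed` are hypothetical names.

```python
import json
import urllib.request


def build_embed_request(text: str,
                        model: str = "nomic-embed-text",
                        base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/embed endpoint."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, timeout: float = 30.0) -> list:
    """Send the request and return the embedding vector for `text`."""
    with urllib.request.urlopen(build_embed_request(text), timeout=timeout) as resp:
        return json.load(resp)["embeddings"][0]
```

With the server running and the model pulled, `len(embed("hello"))` gives the embedding dimension.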

Download the RAG data

Download the PDFs (for example, the ROCm Radeon documentation) and save them to the ./data directory:

!mkdir ./data && cd ./data && wget --recursive --level=1 --content-disposition --accept=pdf -np -nH --cut-dirs=6 https://rocm.docs.amd.com/_/d... && cd ..

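SimpleDirectoryReader is the most commonly used data connector. Given an input directory or a list of files, it selects the most suitable file reader based on each file's extension.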

documents = SimpleDirectoryReader(input_dir="./data/").load_data()

# Check the content

print(documents[10])
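Because the reader is chosen per file extension, it is worth checking what actually landed in ./data before loading it. The following sketch uses a hypothetical helper, `count_by_extension`, that is not part of the original notebook.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(directory: str) -> Counter:
    """Count files under `directory` (recursively) by lowercase extension."""
    root = Path(directory)
    return Counter(
        path.suffix.lower() or "<none>"
        for path in root.rglob("*")
        if path.is_file()
    )


if Path("./data").is_dir():
    print(count_by_extension("./data"))
```

If the counter shows no `.pdf` entries, the download step did not produce the expected input and `load_data()` will have nothing useful to index.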

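Create a vector dataset with Chroma

Chroma DB (https://www.trychroma.com/) is a database for storing and querying embeddings, documents, and metadata. It is suited to LLM applications and integrates with LlamaIndex. Here it is used to build a vector dataset from the provided PDF files.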

# Initialize client and save data

db = chromadb.PersistentClient(path="./chroma_db/rocm_db")

# create collection

chroma_collection = db.get_or_create_collection("rocm_db")

# assign chroma as the vector_store to the context

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build vector index per-document

vector_index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=20)],
)
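The `chunk_size=512, chunk_overlap=20` arguments describe overlapping windows over the text. The sketch below shows that idea on a plain list of whitespace-separated words. It is a simplification and not the SentenceSplitter algorithm, which measures length with a tokenizer and tries to respect sentence boundaries. `chunk_tokens` is an illustrative name.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=20):
    """Split a token list into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


words = [f"w{i}" for i in range(1200)]
chunks = chunk_tokens(words)
print([len(c) for c in chunks])  # [512, 512, 216]
```

The overlap means the last 20 words of one chunk reappear at the start of the next, so a sentence cut at a boundary still appears whole in at least one neighbor in many cases.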

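Create the query engine

Next, create a query engine with a response mode. Choose the response mode that fits your needs. For a detailed guide, see the LlamaIndex response modes documentation (https://docs.llamaindex.ai/en...).

# Query your data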

query_engine = vector_index.as_query_engine(response_mode="refine", similarity_top_k=10)
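`similarity_top_k=10` asks the retriever for the ten chunks whose embeddings are closest to the query embedding. The toy example below illustrates that ranking step with cosine similarity over three-dimensional vectors. The vectors and chunk IDs are made up, and cosine similarity is an illustrative choice, since the metric actually used depends on how the Chroma collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=10):
    """Return (chunk_id, score) pairs for the k most similar chunks, best first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings for three hypothetical chunks.
toy_chunks = {
    "install": [0.9, 0.1, 0.0],
    "pytorch": [0.1, 0.9, 0.1],
    "onnx": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], toy_chunks, k=2))
```

A larger `k` gives the LLM more context to draw on, at the cost of more text to process per query.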

Customize the query prompts

Define task-specific prompts:

# Update the prompt for Q&A

from llama_index.core import PromptTemplate

template = (
    "You are an AMD ROCm expert who is very familiar with the ROCm and Radeon GPU documentation and provides guidance to the end user.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the information from multiple sources and not prior knowledge\n"
    "answer the question according to the index dataset.\n"
    "if the question is not related to ROCm and Radeon GPU, just say it is not related to my knowledge base.\n"
    "if you don't know the answer, just say that I don't know.\n"
    "Answers need to be precise and concise.\n"
    "if the question is in Chinese, please translate Chinese to English in advance.\n"
    "Query: {query_str}\n"
    "Answer: "
)

qa_template = PromptTemplate(template)

query_engine.update_prompts(

{"response_synthesizer:text_qa_template": qa_template}

)

template = (
    "The original query is as follows: {query_str}.\n"
    "We have provided an existing answer: {existing_answer}.\n"
    "We have the opportunity to refine the existing answer (only if needed) with some more context below.\n"
    "-------------\n"
    "{context_msg}\n"
    "-------------\n"
    "Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.\n"
    "if the question is 'who are you', just say I am an expert of AMD ROCm.\n"
    "Answers need to be precise and concise.\n"
    "Refined Answer: "
)

qa_template = PromptTemplate(template)

query_engine.update_prompts(

{"response_synthesizer:refine_template": qa_template}

)
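The two templates correspond to the two stages of the "refine" response mode: the first retrieved chunk is answered with the Q&A template, and each further chunk is used to revise that answer with the refine template. The sketch below shows this control flow with a stand-in `fake_llm` function and shortened templates. All names here are illustrative, and the real LlamaIndex implementation may differ in detail.

```python
def refine_answer(query, chunks, llm, qa_template, refine_template):
    """Answer from the first chunk, then refine once per remaining chunk."""
    if not chunks:
        return ""
    answer = llm(qa_template.format(context_str=chunks[0], query_str=query))
    for chunk in chunks[1:]:
        answer = llm(
            refine_template.format(
                query_str=query, existing_answer=answer, context_msg=chunk
            )
        )
    return answer


QA = "Context: {context_str}\nQuery: {query_str}\nAnswer: "
REFINE = (
    "Query: {query_str}\nExisting answer: {existing_answer}\n"
    "New context: {context_msg}\nRefined Answer: "
)

calls = []


def fake_llm(prompt):
    """Stand-in for a model call: records the prompt and returns a numbered draft."""
    calls.append(prompt)
    return f"draft {len(calls)}"


print(refine_answer("How to install ROCm?", ["chunk A", "chunk B", "chunk C"], fake_llm, QA, REFINE))
# draft 3
```

One consequence of this design is that the number of LLM calls grows with the number of retrieved chunks, so a large `similarity_top_k` makes each query slower.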

Q&A examples

Run the following queries:

# Q&A 1: briefly describe the steps to install ROCm

response = query_engine.query("Briefly describe the steps to install ROCm?")

print(response)

# Q&A 2: which chapter is about installing PyTorch?

response = query_engine.query("Which chapter is about installing PyTorch?")

print(response)

# Q&A 3: how to verify a PyTorch installation?

response = query_engine.query("How to verify a PyTorch installation?")

print(response)

# Q&A 4: can ONNX run on a Radeon GPU?

response = query_engine.query("Could ONNX run on a Radeon GPU?")

print(response)
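The four queries above can also be run in a loop that records how long each one takes, which is useful when tuning `similarity_top_k` or `request_timeout`. This is a sketch with a placeholder query function so that it runs on its own. In the notebook, `query_engine.query` would be passed in its place. `run_queries` and `placeholder_query` are hypothetical names.

```python
import time


def run_queries(query_fn, questions):
    """Call query_fn on each question; return (question, answer, seconds) tuples."""
    results = []
    for question in questions:
        start = time.perf_counter()
        answer = str(query_fn(question))
        results.append((question, answer, time.perf_counter() - start))
    return results


QUESTIONS = [
    "Briefly describe the steps to install ROCm?",
    "Which chapter is about installing PyTorch?",
    "How to verify a PyTorch installation?",
    "Could ONNX run on a Radeon GPU?",
]


def placeholder_query(question):
    """Placeholder; in the notebook, pass query_engine.query instead."""
    return f"(no engine attached) {question}"


for question, answer, seconds in run_queries(placeholder_query, QUESTIONS):
    print(f"[{seconds:.2f}s] {question}\n  {answer}")
```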

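Conclusion

This tutorial showed how to build a RAG system with LlamaIndex and Ollama on a ROCm-supported AMD Radeon GPU. For details, see the documentation of each component.

Read the original: https://rocm.docs.amd.com/pro...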


Author: Alex He
Source: 企业存储技术 (Enterprise Storage Technology)
