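Since my project plans to use TensorFlow, this article serves as a starting point: it describes how to set up a TensorFlow development environment on the “星睿O6” and verify that it works. The article is divided into the following parts:
- Building and installing TensorFlow on the “星睿O6”
- Training and evaluating a model on the MNIST dataset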
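Building and installing TensorFlow from source
Building TensorFlow inevitably turns up missing programs that have to be installed with apt install. To speed up apt install, I switched to the Aliyun mirror by changing /etc/apt/sources.list to the following: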
deb https://mirrors.aliyun.com/debian/ bookworm main non-free non-free-firmware contrib
deb-src https://mirrors.aliyun.com/debian/ bookworm main non-free non-free-firmware contrib
deb https://mirrors.aliyun.com/debian-security/ bookworm-security main
deb-src https://mirrors.aliyun.com/debian-security/ bookworm-security main
deb https://mirrors.aliyun.com/debian/ bookworm-updates main non-free non-free-firmware contrib
deb-src https://mirrors.aliyun.com/debian/ bookworm-updates main non-free non-free-firmware contrib
deb https://mirrors.aliyun.com/debian/ bookworm-backports main non-free non-free-firmware contrib
deb-src https://mirrors.aliyun.com/debian/ bookworm-backports main non-free non-free-firmware contrib
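Next comes the TensorFlow build itself, which mainly follows Build from source | TensorFlow. To balance stability with trying out a recent release, I chose the r2.19 branch: after cloning the repository, switch to it with git checkout r2.19. Once the built wheel is installed, the reported version can be checked against this branch. The whole process consists of the following steps: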
- Install the bazelisk deb package: download bazelisk-arm64.deb from Releases · bazelbuild/bazelisk, then run the following on the “星睿O6”:
sudo dpkg -i bazelisk-arm64.deb
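- Install the clang compiler, plus libhdf5-dev, which is needed when packaging the wheel:
sudo apt install clang
sudo apt install libhdf5-dev
- Run the configure script; I accepted the default for every prompt: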
-sh-5.2$ ./configure
You have bazel 6.5.0 installed.
Please specify the location of python. [Default is /usr/bin/python3]:
Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/local/lib/python3.11/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python3/dist-packages]
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]:
No CUDA support will be enabled for TensorFlow.
Do you want to use Clang to build TensorFlow? [Y/n]:
Clang will be used to compile TensorFlow.
Please specify the path to clang executable. [Default is /usr/bin/clang]:
You have Clang 14.0.6 installed.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
  --config=mkl             # Build with MKL support.
  --config=mkl_aarch64     # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
  --config=monolithic      # Config for mostly static monolithic build.
  --config=numa            # Build with NUMA support.
  --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
  --config=v1              # Build with TensorFlow 1 API instead of TF 2 API.
Preconfigured Bazel build configs to DISABLE default on features:
  --config=nogcp           # Disable GCP support.
  --config=nonccl          # Disable NVIDIA NCCL support.
Configuration finished
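- During the build, the main problem was that packages could not be downloaded from GitHub. My workaround was to replace the GitHub URLs in bulk with a mirror address. For example, in
/home/radxa/.cache/bazel/_bazel_radxa/6b12cd9b265767cc77a16c7f64b094ec/external/rules_python/python/versions.bzl
set DEFAULT_RELEASE_BASE_URL = "https://github.moeyy.xyz/https://github.com/indygreg/python-build-standalone/releases/download"
There were too many such changes to list them all here; apart from that, the build went smoothly. The command to build the wheel package: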
bazel build //tensorflow/tools/pip_package:wheel --repo_env=USE_PYWRAP_RULES=1 --repo_env=WHEEL_NAME=tensorflow_cpu
Screenshot of the successful build:
As the screenshot shows, the build took nearly 3 hours, which was no small effort.
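The bulk replacement of GitHub URLs described above could be scripted rather than done by hand. The following is only a minimal sketch, not the exact procedure used for this build: the helper names `add_mirror_prefix` and `rewrite_tree`, the choice to touch only `.bzl` files, and the regular expression are assumptions made for illustration. The mirror prefix is the one from the versions.bzl example.

```python
import re
from pathlib import Path

# Mirror prefix copied from the versions.bzl example; whether the mirror is
# reachable from your network is not guaranteed.
MIRROR_PREFIX = "https://github.moeyy.xyz/"


def add_mirror_prefix(text, mirror=MIRROR_PREFIX):
    """Prefix every https://github.com/ URL in text with the mirror.

    URLs that already carry the mirror prefix are left alone, so the
    function can be applied repeatedly without stacking prefixes.
    """
    pattern = re.compile(r"(?<!" + re.escape(mirror) + r")https://github\.com/")
    return pattern.sub(lambda match: mirror + match.group(0), text)


def rewrite_tree(root, suffix=".bzl"):
    """Rewrite matching files under root in place; return the paths that changed."""
    changed = []
    for path in sorted(Path(root).rglob("*" + suffix)):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        updated = add_mirror_prefix(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Pointing `rewrite_tree` at the `external` directory under the Bazel cache would patch files such as versions.bzl. Bazel may re-fetch external repositories, in which case the rewrite would have to be applied again.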
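After the build succeeds, the generated files can be found in the following directory:
Install the freshly built tensorflow_cpu package and run a quick test to check that it works: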
python3 -m pip install tensorflow_cpu-2.19.0-cp311-cp311-linux_aarch64.whl --break-system-packages
sh-5.2$ python3
Python 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print("TensorFlow version:", tf.__version__)
TensorFlow version: 2.19.0
>>>
The reported version, 2.19.0, matches the r2.19 branch we checked out.
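The manual comparison between the installed version and the checked-out branch can also be expressed as a small check. This is a sketch under the assumption that release branches are named `r<major>.<minor>`, as with r2.19 here; the function name `version_matches_branch` is hypothetical.

```python
def version_matches_branch(version, branch):
    """Return True if a version such as '2.19.0' belongs to a branch such as 'r2.19'.

    Assumes the branch name is the letter 'r' followed by the leading
    version components, which is the convention seen with r2.19.
    """
    if not branch.startswith("r"):
        return False
    branch_parts = branch[1:].split(".")
    version_parts = version.split(".")
    # Compare whole components so that '2.190.0' does not match 'r2.19'.
    return version_parts[:len(branch_parts)] == branch_parts
```

After `import tensorflow as tf`, calling `version_matches_branch(tf.__version__, "r2.19")` would return True for the wheel built here.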
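Model training and evaluation
This part records how to use our self-built TensorFlow on the O6 to build, train, and evaluate a model. It follows the TensorFlow tutorial TensorFlow 2 quickstart for beginners | TensorFlow Core, going step by step through loading the dataset, building the model, and training and evaluating it.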
Loading the dataset
Because of network issues, I downloaded the mnist.npz dataset manually and placed it in the ~/.keras/datasets/ directory.
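Before training, it can be useful to confirm that the manually downloaded file is readable. The sketch below loads it directly with NumPy; the function name `load_local_mnist` is hypothetical, and the code assumes the archive uses the array names `x_train`, `y_train`, `x_test`, and `y_test`, as the Keras-distributed mnist.npz does.

```python
from pathlib import Path

import numpy as np

# Location used in this article for the manually downloaded file.
DEFAULT_PATH = Path.home() / ".keras" / "datasets" / "mnist.npz"


def load_local_mnist(path=DEFAULT_PATH):
    """Load MNIST from a local .npz archive without touching the network."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download mnist.npz first")
    with np.load(path) as data:
        x_train, y_train = data["x_train"], data["y_train"]
        x_test, y_test = data["x_test"], data["y_test"]
    # Basic sanity checks: 28x28 images and one label per image.
    if x_train.shape[1:] != (28, 28) or len(x_train) != len(y_train):
        raise ValueError(f"unexpected training data shape {x_train.shape}")
    if x_test.shape[1:] != (28, 28) or len(x_test) != len(y_test):
        raise ValueError(f"unexpected test data shape {x_test.shape}")
    return (x_train, y_train), (x_test, y_test)
```

The return value has the same layout as `mnist.load_data()`, so it can be unpacked the same way.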
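Building and training the model
This step covers defining the model, defining the loss function, and training the model.
The complete test code is as follows: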
#!/usr/bin/python3
import tensorflow as tf
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
print("TensorFlow version:", tf.__version__)

# save gray_array to file
def save_grayscale_array_to_image(gray_array, filename="grayscale_image.png"):
    """
    Saves a 2D NumPy array (representing grayscale data) to an image file.

    Args:
        gray_array (numpy.ndarray): A 2D NumPy array where each element
            represents the intensity of a pixel (0-255).
        filename (str): The name of the file to save the image to.
            Common formats are 'png', 'jpg', 'bmp', etc.
    """
    try:
        # Ensure the data is in the correct format (uint8)
        if gray_array.dtype != np.uint8:
            gray_array = gray_array.astype(np.uint8)
        # Create a PIL Image object in grayscale ('L' mode)
        img = Image.fromarray(gray_array, mode='L')
        # Save the image to the specified filename
        img.save(filename)
        print(f"Grayscale image saved successfully as '{filename}'")
    except Exception as e:
        print(f"Error saving grayscale image: {e}")

def plt_predict(x, y, ya):
    fig = plt.figure(figsize=(8, 6))  # Set the figure size to 8x6 inches
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(x, y, label='predict', color='blue', linestyle='-')  # Plot the prediction before training
    ax.plot(x, ya, label='predict_after', color='red', linestyle='--')  # Plot the prediction after training
    ax.set_xlabel('X Axis')  # Set the x-axis label
    ax.set_ylabel('Y Axis')  # Set the y-axis label
    ax.set_title('Predict compare Waves')  # Set the subplot title
    ax.legend()  # Display the legend
    plt.savefig('predict.png', dpi=300)

mnist = tf.keras.datasets.mnist
x = np.linspace(0, 9, 10)
# print(x)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# print(type(x_train), type(x_train[0]), x_train[0], x_train[:1])
# print( x_train[0].shape, x_train[:1].shape, x_train[0].dtype)
# save_grayscale_array_to_image(x_train[0], "train0.png")
x_train, x_test = x_train / 255.0, x_test / 255.0
# print(type(x_train), type(x_train[0]), x_train[0])
# print(type(y_train), type(y_train[0]), y_train[0])
# print(y_train.shape, y_train[0].shape, y_train[:1].shape, y_train[0].dtype)
# exit(-1)
# define the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
# test model before train
predictions = model(x_train[:1]).numpy()
y = tf.nn.softmax(predictions).numpy().reshape([10,])
# print(type(y), type(yz), yz.shape)
print("predictions before", y)
# print("predictions before yz", yz)
# loss function
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# print(loss_fn(y_train[:1], predictions).numpy())
# construct model
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
# train model
model.fit(x_train, y_train, epochs=5)
# test model
predictions = model(x_train[:1]).numpy()
ya = tf.nn.softmax(predictions).numpy().reshape([10,])
# print("new predictions again", predictions)
print("predictions after", ya)
plt_predict(x,y,ya)
model.evaluate(x_test, y_test, verbose=2)
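To make the loss function in the script less of a black box: with from_logits=True, SparseCategoricalCrossentropy applies softmax to the raw model outputs and takes the negative log of the probability assigned to the true class, averaged over the batch. The NumPy sketch below reproduces that calculation for illustration only; it is not TensorFlow's implementation, and the function names are chosen here.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    # Subtracting the row maximum keeps exp() from overflowing.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def sparse_categorical_crossentropy(labels, logits):
    """Mean negative log-probability of the true class for integer labels."""
    labels = np.asarray(labels)
    probs = softmax(np.asarray(logits, dtype=np.float64))
    # Pick, for each sample, the probability assigned to its true label.
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))
```

For an untrained model whose ten outputs are roughly equal, each class gets a probability near 1/10, so the loss starts close to ln(10), about 2.3.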
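Test results
During training I used btop to watch CPU and memory usage: memory stayed at around 1 GB and CPU usage at around 10%. Judging by the printed log, the whole training run finished in about 80 s.
To compare the model's output before and after training more directly, I plotted the predicted probabilities as a comparison chart (saved by the script as predict.png):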
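After training, the model assigns the highest probability to the digit 5. I saved the original sample as a grayscale image, shown below:
It does look like a 5. Taken together, the two figures show that the trained model predicts this sample far more accurately than the untrained one.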
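Reading the predicted digit off the probability curve amounts to taking the index of the largest output, and the accuracy metric is the fraction of samples where that index equals the label. A minimal NumPy sketch of both, with function names chosen here for illustration:

```python
import numpy as np


def predicted_digits(outputs):
    """Return the index of the largest output for each sample.

    Softmax preserves ordering, so this gives the same result for raw
    logits and for softmax probabilities.
    """
    return np.argmax(np.asarray(outputs), axis=-1)


def accuracy(outputs, labels):
    """Fraction of samples whose predicted digit equals the true label."""
    return float(np.mean(predicted_digits(outputs) == np.asarray(labels)))
```

Applied to the `ya` vector from the script, `predicted_digits(ya)` would return the digit with the highest probability after training.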
I also wanted a visual view of the model's layer graph, so I used TensorBoard. The first step is to define a callback and pass it to fit; the change looks like this:
# define callback func for tensorboard
tf_callback = tf.keras.callbacks.TensorBoard(log_dir="./logs")
# train model
model.fit(x_train, y_train, epochs=5, callbacks=[tf_callback], verbose=1)
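Then retrain the model. Once training finishes, run tensorboard --logdir=logs so that TensorBoard loads the logged data, and open localhost:6006 in a browser to see the layer hierarchy.
The graph corresponds to the layer structure defined in the source code: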
# define the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
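The size of each layer in this structure can be worked out by hand. A Dense layer has one weight per input-output pair plus one bias per output, while Flatten and Dropout have no trainable parameters. The short calculation below is a sketch of that arithmetic; `dense_params` is a helper named here for illustration.

```python
def dense_params(inputs, units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return inputs * units + units


flatten_outputs = 28 * 28                      # Flatten turns a 28x28 image into 784 values
hidden = dense_params(flatten_outputs, 128)    # Dense(128)
output = dense_params(128, 10)                 # Dense(10)
total = hidden + output

print(hidden, output, total)  # 100480 1290 101770
```

These figures can be compared with what `model.summary()` reports for the model.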
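This concludes a simple walkthrough from building and installing TensorFlow from source to building, training, and comparing the results of a basic model. The next article will look at deploying a more complex model.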