AX650N Accelerator Card

An M.2 2280 compute card based on the AX650N chip.

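Processor	Octa-core Cortex-A55 @ 1.7 GHz
Memory	8 GB, 64-bit LPDDR4x
Storage	16 MB
Peak compute	18 TOPS @ INT8. Supports deploying CNN and Transformer models, as well as LLMs and VLMs
Video encoding	H.264/H.265, 16 channels of 1080P@30fps
Video decoding	H.264/H.265, 32 channels of 1080P@30fps
PCIe	PCIe Gen2 x4
Host CPU support	Intel, AMD, NXP, Xilinx, Raspberry Pi, Rockchip, and others
Host OS	Ubuntu, Debian, CentOS, Kylin
Form factor	M.2 2280, M Key
Operating voltage	3.3 V
Total system power	< 8 W
Algorithm support	Image classification, object detection, natural language processing, speech networks, and general large models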
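
For a rough sense of scale, the arithmetic below works out the aggregate pixel rates implied by the encode and decode rows and the theoretical bandwidth of a PCIe Gen2 x4 link. It assumes that 1080P means 1920x1080 and uses the standard PCIe Gen2 figures (5 GT/s per lane, 8b/10b encoding). These are back-of-the-envelope numbers, not measurements of this card.

```python
# Back-of-the-envelope figures derived from the spec table.
# Assumption: "1080P" means 1920x1080 pixels per frame.
WIDTH, HEIGHT, FPS = 1920, 1080, 30


def pixel_rate(channels):
    """Aggregate pixels per second for `channels` streams of 1080P@30fps."""
    return channels * WIDTH * HEIGHT * FPS


def pcie_gen2_bandwidth_mb_s(lanes):
    """Theoretical one-direction bandwidth in MB/s (integer arithmetic).

    PCIe Gen2 signals at 5 GT/s per lane with 8b/10b encoding, so 8 of
    every 10 bits carry data: 5e9 * 8/10 bits/s = 500 MB/s per lane.
    Protocol overhead is ignored, so real throughput will be lower.
    """
    bits_per_second = lanes * 5_000_000_000 * 8 // 10
    return bits_per_second // 8 // 1_000_000


if __name__ == "__main__":
    print(f"encode: {pixel_rate(16):,} pixels/s")
    print(f"decode: {pixel_rate(32):,} pixels/s")
    print(f"PCIe Gen2 x4: {pcie_gen2_bandwidth_mb_s(4)} MB/s per direction")
```

Under these assumptions the link tops out at 2000 MB/s in each direction, which is the ceiling for moving model inputs, outputs, and video streams between the host and the card.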

Development Tools

Pulsar2 toolchain

card-doc-yunji1.read...

axcl-docs.readthedoc...

huggingface.co/AXERA...

github.com/AXERA-TEC...

Model Conversion

This section uses Ubuntu 20.04 and the toolchain package ax_pulsar2_${version}.tar.gz as an example to show how to install the Pulsar2 toolchain.

Load the Docker image

$ sudo docker load -i ax_pulsar2_${version}.tar.gz
Loaded image: pulsar2:${version}

After loading, list the images to confirm that it is present

$ sudo docker image ls
REPOSITORY   TAG          IMAGE ID       CREATED         SIZE
pulsar2      ${version}   xxxxxxxxxxxx   9 seconds ago   3.27GB

Start a container from the image

$ sudo docker run -it --net host --rm -v $PWD:/data pulsar2:${version}

Once inside the container, check that the version information is correct

pulsar2 version
version: ${version}
commit: xxxxxxxx

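All Pulsar2 toolchain commands start with pulsar2. The ones users need most are pulsar2 build, pulsar2 run, and pulsar2 version.

pulsar2 build converts an ONNX model to the axmodel format
pulsar2 run runs a simulation of the converted model
pulsar2 version prints the current toolchain version; include this information when reporting issues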

Model conversion command

pulsar2 build --target_hardware AX650 --input model/mobilenetv2-sim.onnx --output_dir output --config config/mobilenet_v2_build_config.json
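
When several models need converting, it can help to assemble this command programmatically. The sketch below only builds the argument list from the four flags that appear in the command above. The helper name build_command and its parameters are illustrative and are not part of Pulsar2.

```python
import shlex


def build_command(onnx_path, config_path, output_dir="output", target="AX650"):
    """Return the `pulsar2 build` argument list for one model.

    Only the flags shown in the example command are used.
    build_command is a hypothetical helper, not part of the toolchain.
    """
    return [
        "pulsar2", "build",
        "--target_hardware", target,
        "--input", str(onnx_path),
        "--output_dir", str(output_dir),
        "--config", str(config_path),
    ]


if __name__ == "__main__":
    cmd = build_command(
        "model/mobilenetv2-sim.onnx",
        "config/mobilenet_v2_build_config.json",
    )
    # Print a copy-pasteable command line. Inside the container it could be
    # executed with subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell-quoting problems if a model path contains spaces.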

Compilation output files

root@xxx:/data# tree output/
output/
├── build_context.json
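├── compiled.axmodel            # final model that runs on the board (AxModel)
├── compiler                    # compiler back-end intermediate results and debug info
├── frontend                    # front-end graph optimization intermediate results and debug info
│   └── optimized.onnx          # floating-point ONNX model after graph optimization of the input model
└── quant                       # quantization tool output and debug info
    ├── dataset                 # extracted calibration dataset
    │   └── input
    │       ├── ILSVRC2012_val_00000001.JPEG
    │       ├── ......
    │       └── ILSVRC2012_val_00000032.JPEG
    ├── debug
    ├── quant_axmodel.json      # quantization configuration
    └── quant_axmodel.onnx      # quantized model (QuantAxModel)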

compiled.axmodel is the final compiled .axmodel file that runs on the board.
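
Before copying the result to the board, a quick check that the build produced the expected files can save a round trip. The sketch below looks for three files named in the tree above. check_build_output is a hypothetical helper, and the list of expected paths is taken from this example's output, not from any guarantee made by Pulsar2.

```python
from pathlib import Path

# Paths relative to the output directory, as listed in the example tree.
EXPECTED = (
    "compiled.axmodel",
    "frontend/optimized.onnx",
    "quant/quant_axmodel.onnx",
)


def check_build_output(output_dir="output"):
    """Return the expected files that are missing from output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = check_build_output("output")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all expected files are present")
```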

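Model Deployment

To make the LLM demo easier to run and more accurate, we use the tokenizer built into transformers as a parsing service, so a Python environment and the transformers library must be installed.

Install Miniconda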

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
chmod a+x Miniconda3-latest-Linux-aarch64.sh
./Miniconda3-latest-Linux-aarch64.sh

Create and activate the Python environment

conda create --name axcl python=3.9
conda activate axcl

Install transformers

pip install transformers==4.41.1 -i https://mirrors.aliyun.com/pypi/simple

Start the tokenizer service

Run the tokenizer service. The host IP defaults to localhost and the port is set to 12345. Once the service is running, the output looks like this:

(axcl) axera@raspberrypi:~/qwen2.5-0.5b-prefill-ax650 $ python qwen2.5_tokenizer.py --port 12345
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
None None 151645 <|im_end|>
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
hello world<|im_end|>
<|im_start|>assistant

[151644, 8948, 198, 2610, 525, 1207, 16948, 11, 3465, 553, 54364, 14817, 13, 1446, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 14990, 1879, 151645, 198, 151644, 77091, 198]
http://localhost:12345
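
The prompt printed in the log follows the ChatML-style layout used by Qwen chat models: each turn is wrapped in <|im_start|> and <|im_end|>, and the prompt ends with an open assistant turn. The sketch below rebuilds that string by hand so the structure is visible. build_prompt is a hypothetical helper; the actual qwen2.5_tokenizer.py script may build the prompt differently, for example through the tokenizer's chat template.

```python
# System prompt copied from the log output.
SYSTEM_PROMPT = (
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
)


def build_prompt(user_text, system_text=SYSTEM_PROMPT):
    """Rebuild the prompt layout shown in the tokenizer log.

    Hypothetical helper for illustration only.
    """
    return (
        f"<|im_start|>system\n{system_text}<|im_end|>\n"
        f"<|im_start|>user\n{user_text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_prompt("hello world"))
```

In the ID list from the log, 151645 is the <|im_end|> token (as the log's first status line shows), and it appears once at the end of the system turn and once at the end of the user turn.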

Run Qwen2

./run_qwen2_0.5B_prefill_pcie.sh

Author: SteveChen. Created: 2025-05-27 00:58
Last edited by: SteveChen. Updated: 2025-06-14 01:15