Test notes: running the Qwen-7B model on Kylin OS SP2 with the Ascend 300I chip

2023-12-20 12:44

1. Check the system version

uname -a

Linux localhost.localdomain 4.19.90-24.4.v2101.ky10.aarch64 #1 SMP Mon May 24 14:45:37 CST 2021 aarch64 aarch64 aarch64 GNU/Linux

2. Check the accelerator card (NPU)

npu-smi info

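Background:

The official repository states support for the Ascend 910 architecture. I happened to have 300I hardware available, so I tested it to give others a reference point. I am still a beginner and have a lot to learn from more experienced readers.

https://github.com/QwenLM/Qwen/tree/5aa84bdfd3237b37f01bc88cd49b3279b9a71d0b/ascend-support

The test mainly follows the method at the link above; I have not investigated it in depth for now.

My current understanding: this system can run simple models, but because the architecture is different, algorithms have to be rewritten for it. PyTorch, TensorFlow, MindFormers and similar frameworks can be installed.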

Check the detailed parameters:

uname -m && cat /etc/*release

aarch64
Kylin Linux Advanced Server release V10 (Sword)
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"
Kylin Linux Advanced Server release V10 (Sword)
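The KEY="value" lines in this output follow the os-release format, so they can be read programmatically when a script needs to confirm which distribution it is running on. A minimal sketch, assuming only the lines shown above; `parse_os_release` is a hypothetical helper name, not part of any tool used in this test.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines (os-release style) into a dict.

    Lines without '=' (such as the plain 'release' banner) are skipped.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the optional shell-style quotes around the value
        parts = shlex.split(raw)
        info[key] = parts[0] if parts else ""
    return info


# Sample text copied from the command output above
sample = '''Kylin Linux Advanced Server release V10 (Sword)
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
'''

if __name__ == "__main__":
    info = parse_os_release(sample)
    print(info["ID"], info["VERSION_ID"])
```

On a real machine the same function could be fed the contents of /etc/os-release instead of the sample string.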

3. Set up Docker. There are two ways to do this: download it from the official site, or install it directly with yum.

4. Install Miniconda. Make sure to install the aarch64 build.
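Picking the wrong installer architecture is an easy mistake on this machine, since `uname -m` reports aarch64 rather than x86_64. A minimal sketch of choosing the installer file name from the machine string; `miniconda_installer_name` is a hypothetical helper, and the `Miniconda3-latest-Linux-<arch>.sh` naming pattern is an assumption based on the Miniconda download page, so check it against the current listing before use.

```python
def miniconda_installer_name(machine: str) -> str:
    """Map a machine string (as printed by `uname -m`) to an installer file name."""
    # Assumed naming pattern: Miniconda3-latest-Linux-<arch>.sh
    arch_map = {
        "aarch64": "aarch64",
        "arm64": "aarch64",
        "x86_64": "x86_64",
        "amd64": "x86_64",
    }
    arch = arch_map.get(machine.lower())
    if arch is None:
        raise ValueError(f"unsupported architecture: {machine}")
    return f"Miniconda3-latest-Linux-{arch}.sh"


if __name__ == "__main__":
    # "aarch64" is what `uname -m` printed on the test machine;
    # platform.machine() from the standard library returns the same string.
    print(miniconda_installer_name("aarch64"))
```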

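5. Configure the environment by following the tutorial. The details are not covered here; only the record of the run is given.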

6. I did not use the docker start command from the tutorial. I used the following command instead.

sudo docker run -it --rm -u root --network=host --ipc=host \
  --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 \
  --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 \
  --name=6bff46b104b8 \
  --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/qwen/Qwen-7B-Chat:/data/qwen/models/Qwen-7B-Chat \
  -v /var/log/npu/:/usr/slog \
  qwenllm/qwen-mindspore /bin/bash

The docker container started successfully.
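The command above repeats one `--device` flag per NPU plus a handful of bind mounts, which is easy to mistype by hand. A minimal sketch of assembling the same argument list in Python; `build_docker_run` is a hypothetical helper, the flags, device nodes and paths are copied from the command above, and the count of 8 NPUs is an assumption matching /dev/davinci0 through /dev/davinci7. The `--name` flag is left out here.

```python
import shlex
from typing import List, Tuple


def build_docker_run(num_npus: int, image: str,
                     mounts: List[Tuple[str, str]]) -> List[str]:
    """Build a `docker run` argument list that exposes Ascend NPU devices."""
    cmd = ["docker", "run", "-it", "--rm", "-u", "root",
           "--network=host", "--ipc=host"]
    # One device node per NPU: /dev/davinci0 ... /dev/davinci{num_npus-1}
    cmd += [f"--device=/dev/davinci{i}" for i in range(num_npus)]
    # Management devices taken from the original command
    for dev in ("/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"):
        cmd.append(f"--device={dev}")
    # Bind mounts as (host_path, container_path) pairs
    for host_path, container_path in mounts:
        cmd += ["-v", f"{host_path}:{container_path}"]
    cmd += [image, "/bin/bash"]
    return cmd


# Mount pairs copied from the original command
MOUNTS = [
    ("/usr/local/Ascend/driver", "/usr/local/Ascend/driver"),
    ("/usr/local/Ascend/add-ons/", "/usr/local/Ascend/add-ons/"),
    ("/usr/local/bin/npu-smi", "/usr/local/bin/npu-smi"),
    ("/usr/local/sbin/npu-smi", "/usr/local/sbin/npu-smi"),
    ("/etc/ascend_install.info", "/etc/ascend_install.info"),
    ("/root/qwen/Qwen-7B-Chat", "/data/qwen/models/Qwen-7B-Chat"),
    ("/var/log/npu/", "/usr/slog"),
]

if __name__ == "__main__":
    # Print the command for inspection instead of executing it
    print(shlex.join(build_docker_run(8, "qwenllm/qwen-mindspore", MOUNTS)))
```

Printing the command rather than running it keeps the sketch safe to try on a machine without NPUs; the printed line can be pasted after `sudo`.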

7. Convert the model

python3 /data/qwen/mindformers/research/qwen/convert_weight.py
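The log this script prints (reproduced below) pairs each PyTorch parameter name with the name it receives in the converted checkpoint, and the pairs follow a fixed pattern. A minimal sketch of that renaming, inferred only from the log lines in this record and not from the source of convert_weight.py; `RENAMES` and `rename_key` are hypothetical names.

```python
# (old substring, new substring) pairs read off the conversion log.
# Note the swap: mlp.w2 becomes feed_forward.w3 and mlp.c_proj becomes feed_forward.w2.
RENAMES = [
    ("transformer.h.", "transformer.layers."),
    (".ln_1.", ".attention_norm."),
    (".ln_2.", ".ffn_norm."),
    (".attn.c_proj.", ".attention.wo."),
    (".mlp.w1.", ".feed_forward.w1."),
    (".mlp.w2.", ".feed_forward.w3."),
    (".mlp.c_proj.", ".feed_forward.w2."),
]


def rename_key(name: str) -> str:
    """Map a Qwen PyTorch parameter name to the name shown in the log."""
    # The embedding table is the one special case seen in the log
    if name == "transformer.wte.weight":
        return "transformer.wte.embedding_weight"
    for old, new in RENAMES:
        name = name.replace(old, new)
    return name


if __name__ == "__main__":
    for key in ("transformer.wte.weight",
                "transformer.h.0.ln_1.weight",
                "transformer.h.0.attn.c_attn.bias",
                "transformer.h.0.mlp.w2.weight"):
        print(key, "->", rename_key(key))
```

Names not covered by any rule, such as attn.c_attn.weight and attn.c_attn.bias, only have their layer prefix changed, which matches the log.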

/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
Flash attention will be disabled because it does NOT support fp32.
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 8/8 [00:03<00:00,  2.35it/s]
Parameter (name=transformer.wte.weight, shape=torch.Size([151936, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.wte.weight->transformer.wte.embedding_weight
Parameter (name=transformer.h.0.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.ln_1.weight->transformer.layers.0.attention_norm.weight
Parameter (name=transformer.h.0.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_attn.weight->transformer.layers.0.attn.c_attn.weight
Parameter (name=transformer.h.0.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_attn.bias->transformer.layers.0.attn.c_attn.bias
Parameter (name=transformer.h.0.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_proj.weight->transformer.layers.0.attention.wo.weight
Parameter (name=transformer.h.0.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.ln_2.weight->transformer.layers.0.ffn_norm.weight
Parameter (name=transformer.h.0.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.w1.weight->transformer.layers.0.feed_forward.w1.weight
Parameter (name=transformer.h.0.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.w2.weight->transformer.layers.0.feed_forward.w3.weight
Parameter (name=transformer.h.0.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.c_proj.weight->transformer.layers.0.feed_forward.w2.weight
Parameter (name=transformer.h.1.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.ln_1.weight->transformer.layers.1.attention_norm.weight
Parameter (name=transformer.h.1.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_attn.weight->transformer.layers.1.attn.c_attn.weight
Parameter (name=transformer.h.1.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_attn.bias->transformer.layers.1.attn.c_attn.bias
Parameter (name=transformer.h.1.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_proj.weight->transformer.layers.1.attention.wo.weight
Parameter (name=transformer.h.1.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.ln_2.weight->transformer.layers.1.ffn_norm.weight
Parameter (name=transformer.h.1.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.w1.weight->transformer.layers.1.feed_forward.w1.weight
Parameter (name=transformer.h.1.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.w2.weight->transformer.layers.1.feed_forward.w3.weight
Parameter (name=transformer.h.1.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.c_proj.weight->transformer.layers.1.feed_forward.w2.weight
Parameter (name=transformer.h.2.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.ln_1.weight->transformer.layers.2.attention_norm.weight
Parameter (name=transformer.h.2.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_attn.weight->transformer.layers.2.attn.c_attn.weight
Parameter (name=transformer.h.2.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_attn.bias->transformer.layers.2.attn.c_attn.bias
Parameter (name=transformer.h.2.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_proj.weight->transformer.layers.2.attention.wo.weight
Parameter (name=transformer.h.2.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.ln_2.weight->transformer.layers.2.ffn_norm.weight
Parameter (name=transformer.h.2.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.w1.weight->transformer.layers.2.feed_forward.w1.weight
Parameter (name=transformer.h.2.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.w2.weight->transformer.layers.2.feed_forward.w3.weight
Parameter (name=transformer.h.2.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.c_proj.weight->transformer.layers.2.feed_forward.w2.weight
Parameter (name=transformer.h.3.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.ln_1.weight->transformer.layers.3.attention_norm.weight
Parameter (name=transformer.h.3.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_attn.weight->transformer.layers.3.attn.c_attn.weight
Parameter (name=transformer.h.3.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_attn.bias->transformer.layers.3.attn.c_attn.bias
Parameter (name=transformer.h.3.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_proj.weight->transformer.layers.3.attention.wo.weight
Parameter (name=transformer.h.3.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.ln_2.weight->transformer.layers.3.ffn_norm.weight
Parameter (name=transformer.h.3.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.w1.weight->transformer.layers.3.feed_forward.w1.weight
Parameter (name=transformer.h.3.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.w2.weight->transformer.layers.3.feed_forward.w3.weight
Parameter (name=transformer.h.3.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.c_proj.weight->transformer.layers.3.feed_forward.w2.weight
Parameter (name=transformer.h.4.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.ln_1.weight->transformer.layers.4.attention_norm.weight
Parameter (name=transformer.h.4.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_attn.weight->transformer.layers.4.attn.c_attn.weight
Parameter (name=transformer.h.4.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_attn.bias->transformer.layers.4.attn.c_attn.bias
Parameter (name=transformer.h.4.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_proj.weight->transformer.layers.4.attention.wo.weight
Parameter (name=transformer.h.4.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.ln_2.weight->transformer.layers.4.ffn_norm.weight
Parameter (name=transformer.h.4.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.w1.weight->transformer.layers.4.feed_forward.w1.weight
Parameter (name=transformer.h.4.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.w2.weight->transformer.layers.4.feed_forward.w3.weight
Parameter (name=transformer.h.4.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.c_proj.weight->transformer.layers.4.feed_forward.w2.weight
Parameter (name=transformer.h.5.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.ln_1.weight->transformer.layers.5.attention_norm.weight
Parameter (name=transformer.h.5.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_attn.weight->transformer.layers.5.attn.c_attn.weight
Parameter (name=transformer.h.5.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_attn.bias->transformer.layers.5.attn.c_attn.bias
Parameter (name=transformer.h.5.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_proj.weight->transformer.layers.5.attention.wo.weight
Parameter (name=transformer.h.5.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.ln_2.weight->transformer.layers.5.ffn_norm.weight
Parameter (name=transformer.h.5.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.w1.weight->transformer.layers.5.feed_forward.w1.weight
Parameter (name=transformer.h.5.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.w2.weight->transformer.layers.5.feed_forward.w3.weight
Parameter (name=transformer.h.5.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.c_proj.weight->transformer.layers.5.feed_forward.w2.weight
Parameter (name=transformer.h.6.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.ln_1.weight->transformer.layers.6.attention_norm.weight
Parameter (name=transformer.h.6.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_attn.weight->transformer.layers.6.attn.c_attn.weight
Parameter (name=transformer.h.6.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_attn.bias->transformer.layers.6.attn.c_attn.bias
Parameter (name=transformer.h.6.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_proj.weight->transformer.layers.6.attention.wo.weight
Parameter (name=transformer.h.6.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.ln_2.weight->transformer.layers.6.ffn_norm.weight
Parameter (name=transformer.h.6.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.w1.weight->transformer.layers.6.feed_forward.w1.weight
Parameter (name=transformer.h.6.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.w2.weight->transformer.layers.6.feed_forward.w3.weight
Parameter (name=transformer.h.6.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.c_proj.weight->transformer.layers.6.feed_forward.w2.weight
Parameter (name=transformer.h.7.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.ln_1.weight->transformer.layers.7.attention_norm.weight
Parameter (name=transformer.h.7.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_attn.weight->transformer.layers.7.attn.c_attn.weight
Parameter (name=transformer.h.7.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_attn.bias->transformer.layers.7.attn.c_attn.bias
Parameter (name=transformer.h.7.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_proj.weight->transformer.layers.7.attention.wo.weight
Parameter (name=transformer.h.7.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.ln_2.weight->transformer.layers.7.ffn_norm.weight
Parameter (name=transformer.h.7.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.w1.weight->transformer.layers.7.feed_forward.w1.weight
Parameter (name=transformer.h.7.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.w2.weight->transformer.layers.7.feed_forward.w3.weight
Parameter (name=transformer.h.7.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.c_proj.weight->transformer.layers.7.feed_forward.w2.weight
Parameter (name=transformer.h.8.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.ln_1.weight->transformer.layers.8.attention_norm.weight
Parameter (name=transformer.h.8.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_attn.weight->transformer.layers.8.attn.c_attn.weight
Parameter (name=transformer.h.8.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_attn.bias->transformer.layers.8.attn.c_attn.bias
Parameter (name=transformer.h.8.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_proj.weight->transformer.layers.8.attention.wo.weight
Parameter (name=transformer.h.8.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.ln_2.weight->transformer.layers.8.ffn_norm.weight
Parameter (name=transformer.h.8.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.w1.weight->transformer.layers.8.feed_forward.w1.weight
Parameter (name=transformer.h.8.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.w2.weight->transformer.layers.8.feed_forward.w3.weight
Parameter (name=transformer.h.8.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.c_proj.weight->transformer.layers.8.feed_forward.w2.weight
Parameter (name=transformer.h.9.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.ln_1.weight->transformer.layers.9.attention_norm.weight
Parameter (name=transformer.h.9.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_attn.weight->transformer.layers.9.attn.c_attn.weight
Parameter (name=transformer.h.9.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_attn.bias->transformer.layers.9.attn.c_attn.bias
Parameter (name=transformer.h.9.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_proj.weight->transformer.layers.9.attention.wo.weight
Parameter (name=transformer.h.9.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.ln_2.weight->transformer.layers.9.ffn_norm.weight
Parameter (name=transformer.h.9.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.w1.weight->transformer.layers.9.feed_forward.w1.weight
Parameter (name=transformer.h.9.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.w2.weight->transformer.layers.9.feed_forward.w3.weight
Parameter (name=transformer.h.9.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.c_proj.weight->transformer.layers.9.feed_forward.w2.weight
Parameter (name=transformer.h.10.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.ln_1.weight->transformer.layers.10.attention_norm.weight
Parameter (name=transformer.h.10.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_attn.weight->transformer.layers.10.attn.c_attn.weight
Parameter (name=transformer.h.10.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_attn.bias->transformer.layers.10.attn.c_attn.bias
Parameter (name=transformer.h.10.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_proj.weight->transformer.layers.10.attention.wo.weight
Parameter (name=transformer.h.10.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.ln_2.weight->transformer.layers.10.ffn_norm.weight
Parameter (name=transformer.h.10.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.w1.weight->transformer.layers.10.feed_forward.w1.weight
Parameter (name=transformer.h.10.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.w2.weight->transformer.layers.10.feed_forward.w3.weight
Parameter (name=transformer.h.10.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.c_proj.weight->transformer.layers.10.feed_forward.w2.weight
Parameter (name=transformer.h.11.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.ln_1.weight->transformer.layers.11.attention_norm.weight
Parameter (name=transformer.h.11.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_attn.weight->transformer.layers.11.attn.c_attn.weight
Parameter (name=transformer.h.11.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_attn.bias->transformer.layers.11.attn.c_attn.bias
Parameter (name=transformer.h.11.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_proj.weight->transformer.layers.11.attention.wo.weight
Parameter (name=transformer.h.11.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.ln_2.weight->transformer.layers.11.ffn_norm.weight
Parameter (name=transformer.h.11.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.w1.weight->transformer.layers.11.feed_forward.w1.weight
Parameter (name=transformer.h.11.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.w2.weight->transformer.layers.11.feed_forward.w3.weight
Parameter (name=transformer.h.11.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.c_proj.weight->transformer.layers.11.feed_forward.w2.weight
Parameter (name=transformer.h.12.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.ln_1.weight->transformer.layers.12.attention_norm.weight
Parameter (name=transformer.h.12.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_attn.weight->transformer.layers.12.attn.c_attn.weight
Parameter (name=transformer.h.12.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_attn.bias->transformer.layers.12.attn.c_attn.bias
Parameter (name=transformer.h.12.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_proj.weight->transformer.layers.12.attention.wo.weight
Parameter (name=transformer.h.12.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.ln_2.weight->transformer.layers.12.ffn_norm.weight
Parameter (name=transformer.h.12.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.w1.weight->transformer.layers.12.feed_forward.w1.weight
Parameter (name=transformer.h.12.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.w2.weight->transformer.layers.12.feed_forward.w3.weight
Parameter (name=transformer.h.12.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.c_proj.weight->transformer.layers.12.feed_forward.w2.weight
Parameter (name=transformer.h.13.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.ln_1.weight->transformer.layers.13.attention_norm.weight
Parameter (name=transformer.h.13.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_attn.weight->transformer.layers.13.attn.c_attn.weight
Parameter (name=transformer.h.13.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_attn.bias->transformer.layers.13.attn.c_attn.bias
Parameter (name=transformer.h.13.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_proj.weight->transformer.layers.13.attention.wo.weight
Parameter (name=transformer.h.13.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.ln_2.weight->transformer.layers.13.ffn_norm.weight
Parameter (name=transformer.h.13.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.w1.weight->transformer.layers.13.feed_forward.w1.weight
Parameter (name=transformer.h.13.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.w2.weight->transformer.layers.13.feed_forward.w3.weight
Parameter (name=transformer.h.13.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.c_proj.weight->transformer.layers.13.feed_forward.w2.weight
Parameter (name=transformer.h.14.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.ln_1.weight->transformer.layers.14.attention_norm.weight
Parameter (name=transformer.h.14.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_attn.weight->transformer.layers.14.attn.c_attn.weight
Parameter (name=transformer.h.14.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_attn.bias->transformer.layers.14.attn.c_attn.bias
Parameter (name=transformer.h.14.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_proj.weight->transformer.layers.14.attention.wo.weight
Parameter (name=transformer.h.14.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.ln_2.weight->transformer.layers.14.ffn_norm.weight
Parameter (name=transformer.h.14.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.w1.weight->transformer.layers.14.feed_forward.w1.weight
Parameter (name=transformer.h.14.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.w2.weight->transformer.layers.14.feed_forward.w3.weight
Parameter (name=transformer.h.14.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.c_proj.weight->transformer.layers.14.feed_forward.w2.weight
Parameter (name=transformer.h.15.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.ln_1.weight->transformer.layers.15.attention_norm.weight
Parameter (name=transformer.h.15.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_attn.weight->transformer.layers.15.attn.c_attn.weight
Parameter (name=transformer.h.15.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_attn.bias->transformer.layers.15.attn.c_attn.bias
Parameter (name=transformer.h.15.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_proj.weight->transformer.layers.15.attention.wo.weight
Parameter (name=transformer.h.15.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.ln_2.weight->transformer.layers.15.ffn_norm.weight
Parameter (name=transformer.h.15.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.w1.weight->transformer.layers.15.feed_forward.w1.weight
Parameter (name=transformer.h.15.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.w2.weight->transformer.layers.15.feed_forward.w3.weight
Parameter (name=transformer.h.15.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.c_proj.weight->transformer.layers.15.feed_forward.w2.weight
Parameter (name=transformer.h.16.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.ln_1.weight->transformer.layers.16.attention_norm.weight
Parameter (name=transformer.h.16.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_attn.weight->transformer.layers.16.attn.c_attn.weight
Parameter (name=transformer.h.16.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_attn.bias->transformer.layers.16.attn.c_attn.bias
Parameter (name=transformer.h.16.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_proj.weight->transformer.layers.16.attention.wo.weight
Parameter (name=transformer.h.16.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.ln_2.weight->transformer.layers.16.ffn_norm.weight
Parameter (name=transformer.h.16.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.w1.weight->transformer.layers.16.feed_forward.w1.weight
Parameter (name=transformer.h.16.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.w2.weight->transformer.layers.16.feed_forward.w3.weight
Parameter (name=transformer.h.16.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.c_proj.weight->transformer.layers.16.feed_forward.w2.weight
Parameter (name=transformer.h.17.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.ln_1.weight->transformer.layers.17.attention_norm.weight
Parameter (name=transformer.h.17.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_attn.weight->transformer.layers.17.attn.c_attn.weight
Parameter (name=transformer.h.17.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_attn.bias->transformer.layers.17.attn.c_attn.bias
Parameter (name=transformer.h.17.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_proj.weight->transformer.layers.17.attention.wo.weight
Parameter (name=transformer.h.17.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.ln_2.weight->transformer.layers.17.ffn_norm.weight
Parameter (name=transformer.h.17.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.w1.weight->transformer.layers.17.feed_forward.w1.weight
Parameter (name=transformer.h.17.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.w2.weight->transformer.layers.17.feed_forward.w3.weight
Parameter (name=transformer.h.17.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.c_proj.weight->transformer.layers.17.feed_forward.w2.weight
Parameter (name=transformer.h.18.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.ln_1.weight->transformer.layers.18.attention_norm.weight
Parameter (name=transformer.h.18.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_attn.weight->transformer.layers.18.attn.c_attn.weight
Parameter (name=transformer.h.18.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_attn.bias->transformer.layers.18.attn.c_attn.bias
Parameter (name=transformer.h.18.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_proj.weight->transformer.layers.18.attention.wo.weight
Parameter (name=transformer.h.18.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.ln_2.weight->transformer.layers.18.ffn_norm.weight
Parameter (name=transformer.h.18.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.w1.weight->transformer.layers.18.feed_forward.w1.weight
Parameter (name=transformer.h.18.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.w2.weight->transformer.layers.18.feed_forward.w3.weight
Parameter (name=transformer.h.18.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.c_proj.weight->transformer.layers.18.feed_forward.w2.weight
Parameter (name=transformer.h.19.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.ln_1.weight->transformer.layers.19.attention_norm.weight
Parameter (name=transformer.h.19.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_attn.weight->transformer.layers.19.attn.c_attn.weight
Parameter (name=transformer.h.19.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_attn.bias->transformer.layers.19.attn.c_attn.bias
Parameter (name=transformer.h.19.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_proj.weight->transformer.layers.19.attention.wo.weight
Parameter (name=transformer.h.19.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.ln_2.weight->transformer.layers.19.ffn_norm.weight
Parameter (name=transformer.h.19.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.w1.weight->transformer.layers.19.feed_forward.w1.weight
Parameter (name=transformer.h.19.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.w2.weight->transformer.layers.19.feed_forward.w3.weight
Parameter (name=transformer.h.19.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.c_proj.weight->transformer.layers.19.feed_forward.w2.weight
Parameter (name=transformer.h.20.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.ln_1.weight->transformer.layers.20.attention_norm.weight
Parameter (name=transformer.h.20.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_attn.weight->transformer.layers.20.attn.c_attn.weight
Parameter (name=transformer.h.20.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_attn.bias->transformer.layers.20.attn.c_attn.bias
Parameter (name=transformer.h.20.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_proj.weight->transformer.layers.20.attention.wo.weight
Parameter (name=transformer.h.20.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.ln_2.weight->transformer.layers.20.ffn_norm.weight
Parameter (name=transformer.h.20.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.w1.weight->transformer.layers.20.feed_forward.w1.weight
Parameter (name=transformer.h.20.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.w2.weight->transformer.layers.20.feed_forward.w3.weight
Parameter (name=transformer.h.20.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.c_proj.weight->transformer.layers.20.feed_forward.w2.weight
Parameter (name=transformer.h.21.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.ln_1.weight->transformer.layers.21.attention_norm.weight
Parameter (name=transformer.h.21.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_attn.weight->transformer.layers.21.attn.c_attn.weight
Parameter (name=transformer.h.21.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_attn.bias->transformer.layers.21.attn.c_attn.bias
Parameter (name=transformer.h.21.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_proj.weight->transformer.layers.21.attention.wo.weight
Parameter (name=transformer.h.21.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.ln_2.weight->transformer.layers.21.ffn_norm.weight
Parameter (name=transformer.h.21.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.w1.weight->transformer.layers.21.feed_forward.w1.weight
Parameter (name=transformer.h.21.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.w2.weight->transformer.layers.21.feed_forward.w3.weight
Parameter (name=transformer.h.21.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.c_proj.weight->transformer.layers.21.feed_forward.w2.weight
Parameter (name=transformer.h.22.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.ln_1.weight->transformer.layers.22.attention_norm.weight
Parameter (name=transformer.h.22.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_attn.weight->transformer.layers.22.attn.c_attn.weight
Parameter (name=transformer.h.22.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_attn.bias->transformer.layers.22.attn.c_attn.bias
Parameter (name=transformer.h.22.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_proj.weight->transformer.layers.22.attention.wo.weight
Parameter (name=transformer.h.22.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.ln_2.weight->transformer.layers.22.ffn_norm.weight
Parameter (name=transformer.h.22.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.w1.weight->transformer.layers.22.feed_forward.w1.weight
Parameter (name=transformer.h.22.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.w2.weight->transformer.layers.22.feed_forward.w3.weight
Parameter (name=transformer.h.22.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.c_proj.weight->transformer.layers.22.feed_forward.w2.weight
Parameter (name=transformer.h.23.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.ln_1.weight->transformer.layers.23.attention_norm.weight
Parameter (name=transformer.h.23.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_attn.weight->transformer.layers.23.attn.c_attn.weight
Parameter (name=transformer.h.23.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_attn.bias->transformer.layers.23.attn.c_attn.bias
Parameter (name=transformer.h.23.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_proj.weight->transformer.layers.23.attention.wo.weight
Parameter (name=transformer.h.23.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.ln_2.weight->transformer.layers.23.ffn_norm.weight
Parameter (name=transformer.h.23.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.w1.weight->transformer.layers.23.feed_forward.w1.weight
Parameter (name=transformer.h.23.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.w2.weight->transformer.layers.23.feed_forward.w3.weight
Parameter (name=transformer.h.23.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.c_proj.weight->transformer.layers.23.feed_forward.w2.weight
Parameter (name=transformer.h.24.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.ln_1.weight->transformer.layers.24.attention_norm.weight
Parameter (name=transformer.h.24.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_attn.weight->transformer.layers.24.attn.c_attn.weight
Parameter (name=transformer.h.24.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_attn.bias->transformer.layers.24.attn.c_attn.bias
Parameter (name=transformer.h.24.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_proj.weight->transformer.layers.24.attention.wo.weight
Parameter (name=transformer.h.24.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.ln_2.weight->transformer.layers.24.ffn_norm.weight
Parameter (name=transformer.h.24.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.w1.weight->transformer.layers.24.feed_forward.w1.weight
Parameter (name=transformer.h.24.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.w2.weight->transformer.layers.24.feed_forward.w3.weight
Parameter (name=transformer.h.24.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.c_proj.weight->transformer.layers.24.feed_forward.w2.weight
Parameter (name=transformer.h.25.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.ln_1.weight->transformer.layers.25.attention_norm.weight
Parameter (name=transformer.h.25.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_attn.weight->transformer.layers.25.attn.c_attn.weight
Parameter (name=transformer.h.25.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_attn.bias->transformer.layers.25.attn.c_attn.bias
Parameter (name=transformer.h.25.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_proj.weight->transformer.layers.25.attention.wo.weight
Parameter (name=transformer.h.25.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.ln_2.weight->transformer.layers.25.ffn_norm.weight
Parameter (name=transformer.h.25.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.w1.weight->transformer.layers.25.feed_forward.w1.weight
Parameter (name=transformer.h.25.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.w2.weight->transformer.layers.25.feed_forward.w3.weight
Parameter (name=transformer.h.25.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.c_proj.weight->transformer.layers.25.feed_forward.w2.weight
Parameter (name=transformer.h.26.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.ln_1.weight->transformer.layers.26.attention_norm.weight
Parameter (name=transformer.h.26.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_attn.weight->transformer.layers.26.attn.c_attn.weight
Parameter (name=transformer.h.26.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_attn.bias->transformer.layers.26.attn.c_attn.bias
Parameter (name=transformer.h.26.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_proj.weight->transformer.layers.26.attention.wo.weight
Parameter (name=transformer.h.26.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.ln_2.weight->transformer.layers.26.ffn_norm.weight
Parameter (name=transformer.h.26.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.w1.weight->transformer.layers.26.feed_forward.w1.weight
Parameter (name=transformer.h.26.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.w2.weight->transformer.layers.26.feed_forward.w3.weight
Parameter (name=transformer.h.26.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.c_proj.weight->transformer.layers.26.feed_forward.w2.weight
Parameter (name=transformer.h.27.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.ln_1.weight->transformer.layers.27.attention_norm.weight
Parameter (name=transformer.h.27.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_attn.weight->transformer.layers.27.attn.c_attn.weight
Parameter (name=transformer.h.27.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_attn.bias->transformer.layers.27.attn.c_attn.bias
Parameter (name=transformer.h.27.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_proj.weight->transformer.layers.27.attention.wo.weight
Parameter (name=transformer.h.27.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.ln_2.weight->transformer.layers.27.ffn_norm.weight
Parameter (name=transformer.h.27.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.w1.weight->transformer.layers.27.feed_forward.w1.weight
Parameter (name=transformer.h.27.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.w2.weight->transformer.layers.27.feed_forward.w3.weight
Parameter (name=transformer.h.27.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.c_proj.weight->transformer.layers.27.feed_forward.w2.weight
Parameter (name=transformer.h.28.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.ln_1.weight->transformer.layers.28.attention_norm.weight
Parameter (name=transformer.h.28.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_attn.weight->transformer.layers.28.attn.c_attn.weight
Parameter (name=transformer.h.28.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_attn.bias->transformer.layers.28.attn.c_attn.bias
Parameter (name=transformer.h.28.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_proj.weight->transformer.layers.28.attention.wo.weight
Parameter (name=transformer.h.28.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.ln_2.weight->transformer.layers.28.ffn_norm.weight
Parameter (name=transformer.h.28.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.w1.weight->transformer.layers.28.feed_forward.w1.weight
Parameter (name=transformer.h.28.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.w2.weight->transformer.layers.28.feed_forward.w3.weight
Parameter (name=transformer.h.28.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.c_proj.weight->transformer.layers.28.feed_forward.w2.weight
Parameter (name=transformer.h.29.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.ln_1.weight->transformer.layers.29.attention_norm.weight
Parameter (name=transformer.h.29.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_attn.weight->transformer.layers.29.attn.c_attn.weight
Parameter (name=transformer.h.29.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_attn.bias->transformer.layers.29.attn.c_attn.bias
Parameter (name=transformer.h.29.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_proj.weight->transformer.layers.29.attention.wo.weight
Parameter (name=transformer.h.29.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.ln_2.weight->transformer.layers.29.ffn_norm.weight
Parameter (name=transformer.h.29.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.w1.weight->transformer.layers.29.feed_forward.w1.weight
Parameter (name=transformer.h.29.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.w2.weight->transformer.layers.29.feed_forward.w3.weight
Parameter (name=transformer.h.29.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.c_proj.weight->transformer.layers.29.feed_forward.w2.weight
Parameter (name=transformer.h.30.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.ln_1.weight->transformer.layers.30.attention_norm.weight
Parameter (name=transformer.h.30.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_attn.weight->transformer.layers.30.attn.c_attn.weight
Parameter (name=transformer.h.30.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_attn.bias->transformer.layers.30.attn.c_attn.bias
Parameter (name=transformer.h.30.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_proj.weight->transformer.layers.30.attention.wo.weight
Parameter (name=transformer.h.30.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.ln_2.weight->transformer.layers.30.ffn_norm.weight
Parameter (name=transformer.h.30.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.w1.weight->transformer.layers.30.feed_forward.w1.weight
Parameter (name=transformer.h.30.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.w2.weight->transformer.layers.30.feed_forward.w3.weight
Parameter (name=transformer.h.30.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.c_proj.weight->transformer.layers.30.feed_forward.w2.weight
Parameter (name=transformer.h.31.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.ln_1.weight->transformer.layers.31.attention_norm.weight
Parameter (name=transformer.h.31.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_attn.weight->transformer.layers.31.attn.c_attn.weight
Parameter (name=transformer.h.31.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_attn.bias->transformer.layers.31.attn.c_attn.bias
Parameter (name=transformer.h.31.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_proj.weight->transformer.layers.31.attention.wo.weight
Parameter (name=transformer.h.31.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.ln_2.weight->transformer.layers.31.ffn_norm.weight
Parameter (name=transformer.h.31.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.w1.weight->transformer.layers.31.feed_forward.w1.weight
Parameter (name=transformer.h.31.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.w2.weight->transformer.layers.31.feed_forward.w3.weight
Parameter (name=transformer.h.31.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.c_proj.weight->transformer.layers.31.feed_forward.w2.weight
Parameter (name=transformer.ln_f.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
Parameter (name=lm_head.weight, shape=torch.Size([151936, 4096]), dtype=torch.float32, requires_grad=True)
Saving converted weights to /data/qwen/models/Qwen-7B-Chat/qwen-7b-chat.ckpt...
Done
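
The renaming pattern visible in the log above can be summarized in a few lines of Python. This is only a sketch read off the printed `name: old->new` pairs, not the actual code of `convert_weight.py`; the names `SUFFIX_MAP`, `LAYER_RE` and `rename_param` are hypothetical. The log prints no renamed target for `transformer.ln_f.weight` or `lm_head.weight`, so the sketch simply returns such names unchanged, which is an assumption.

```python
import re

# Suffix renames read off the conversion log. Suffixes that are not listed
# (attn.c_attn.weight, attn.c_attn.bias) keep their original form in the log.
SUFFIX_MAP = {
    "ln_1.weight": "attention_norm.weight",
    "attn.c_proj.weight": "attention.wo.weight",
    "ln_2.weight": "ffn_norm.weight",
    "mlp.w1.weight": "feed_forward.w1.weight",
    "mlp.w2.weight": "feed_forward.w3.weight",      # note: w2 -> w3
    "mlp.c_proj.weight": "feed_forward.w2.weight",  # note: c_proj -> w2
}

# Matches per-layer parameters such as "transformer.h.16.mlp.w1.weight".
LAYER_RE = re.compile(r"^transformer\.h\.(\d+)\.(.+)$")


def rename_param(name: str) -> str:
    """Return the converted name for a per-layer parameter.

    Names outside "transformer.h.<N>." are returned unchanged, because the
    log shows no mapping for them (assumption, see the text above).
    """
    match = LAYER_RE.match(name)
    if match is None:
        return name
    layer_idx, suffix = match.groups()
    return f"transformer.layers.{layer_idx}.{SUFFIX_MAP.get(suffix, suffix)}"
```

The two lines worth noticing are the MLP ones: the source `mlp.w2` lands in `feed_forward.w3`, while `mlp.c_proj` becomes `feed_forward.w2`, so the converted names should not be matched to the original ones by number alone.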

Set the Python path and launch the inference script.

cd /data/qwen/mindformers/research/qwen

export PYTHONPATH=/data/qwen/mindformers:$PYTHONPATH

python3 infer_qwen.py

/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
[Warning]Can not find libascendalog.so
[Warning]Can not find libascendalog.so
Traceback (most recent call last):
  File "/data/qwen/mindformers/research/qwen/infer_qwen.py", line 4, in <module>
    from mindformers.trainer import Trainer
  File "/data/qwen/mindformers/mindformers/__init__.py", line 17, in <module>
    from mindformers import core, auto_class, dataset, \
  File "/data/qwen/mindformers/mindformers/core/__init__.py", line 19, in <module>
    from .metric import build_metric
  File "/data/qwen/mindformers/mindformers/core/metric/__init__.py", line 17, in <module>
    from .metric import *
  File "/data/qwen/mindformers/mindformers/core/metric/metric.py", line 37, in <module>
    from mindformers.models import BasicTokenizer
  File "/data/qwen/mindformers/mindformers/models/__init__.py", line 21, in <module>
    from .blip2 import *
  File "/data/qwen/mindformers/mindformers/models/blip2/__init__.py", line 17, in <module>
    from .blip2_config import Blip2Config
  File "/data/qwen/mindformers/mindformers/models/blip2/blip2_config.py", line 23, in <module>
    from mindformers.models.llama import LlamaConfig
  File "/data/qwen/mindformers/mindformers/models/llama/__init__.py", line 18, in <module>
    from .llama import LlamaForCausalLM, LlamaForCausalLMWithLora, LlamaModel
  File "/data/qwen/mindformers/mindformers/models/llama/llama.py", line 30, in <module>
    from mindspore.nn.layer.flash_attention import FlashAttention
  File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/nn/layer/flash_attention.py", line 24, in <module>
    from mindspore.ops._op_impl._custom_op.flash_attention.flash_attention_impl import get_flash_attention
  File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/__init__.py", line 17, in <module>
    from mindspore.ops._op_impl._custom_op.dsd_impl import dsd_matmul
  File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/dsd_impl.py", line 17, in <module>
    from te import tik
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/te/__init__.py", line 128, in <module>
    from tbe import tvm
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/__init__.py", line 44, in <module>
    import tvm
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/__init__.py", line 26, in <module>
    from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/__init__.py", line 28, in <module>
    from .base import register_error
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/base.py", line 72, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/base.py", line 52, in _load_lib
    lib_path = libinfo.find_lib_path()
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/libinfo.py", line 147, in find_lib_path
    raise RuntimeError(message)
RuntimeError: Cannot find the files.
List of candidates:
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/lib/plugin/cpu/libtvm.so
/usr/local/Ascend/driver/libtvm.so
/data/qwen/mindformers/research/qwen/libtvm.so
/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin/libtvm.so
/root/miniconda3/envs/mindspore2.2_py39/bin/libtvm.so
/root/miniconda3/condabin/libtvm.so
/usr/local/sbin/libtvm.so
/usr/local/bin/libtvm.so
/usr/sbin/libtvm.so
/usr/bin/libtvm.so
/usr/sbin/libtvm.so
/usr/bin/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/libtvm.so
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/lib/plugin/cpu/libtvm_runtime.so
/usr/local/Ascend/driver/libtvm_runtime.so
/data/qwen/mindformers/research/qwen/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin/libtvm_runtime.so
/root/miniconda3/envs/mindspore2.2_py39/bin/libtvm_runtime.so
/root/miniconda3/condabin/libtvm_runtime.so
/usr/local/sbin/libtvm_runtime.so
/usr/local/bin/libtvm_runtime.so
/usr/sbin/libtvm_runtime.so
/usr/bin/libtvm_runtime.so
/usr/sbin/libtvm_runtime.so
/usr/bin/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/libtvm_runtime.so

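Judging from the error message, the failure is probably caused by library files (libtvm.so / libtvm_runtime.so) that are missing from the toolkit configuration for this chip architecture. I have not investigated it further for now.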
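
As a first diagnostic step, one could check whether the two libraries named in the candidate list exist anywhere else on the machine. The helper below is a minimal sketch using only the Python standard library; the function name `find_libs` is hypothetical, and the choice of `/usr/local/Ascend` as the search root is an assumption taken from the paths in the traceback. It only reports where the files are and does not fix the import error.

```python
import os
from typing import Iterable, List, Tuple


def find_libs(
    roots: Iterable[str],
    names: Tuple[str, ...] = ("libtvm.so", "libtvm_runtime.so"),
) -> List[str]:
    """Walk each root directory and collect full paths of the wanted files.

    Roots that do not exist are silently skipped, because os.walk ignores
    listing errors by default.
    """
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if filename in names:
                    hits.append(os.path.join(dirpath, filename))
    return sorted(hits)
```

Calling `find_libs(["/usr/local/Ascend"])` inside the container returns a list of matching paths. An empty list would suggest that the installed toolkit does not ship these libraries at all; a non-empty list would suggest that they exist but sit outside the directories TVM searched.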




http://www.chinasem.cn/article/516092
