Notes on testing the Qwen-7B model on Kylin OS SP2 with the Ascend 300I chip

2023-12-20 12:44

1. Check the system version

uname -a

Linux localhost.localdomain 4.19.90-24.4.v2101.ky10.aarch64 #1 SMP Mon May 24 14:45:37 CST 2021 aarch64 aarch64 aarch64 GNU/Linux

2. Check the accelerator card (NPU)

npu-smi info
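
As a complement to `npu-smi info`, here is a minimal Python sketch that checks whether `npu-smi` is on the PATH and lists the numbered `/dev/davinciN` device nodes (the same nodes that are passed to Docker in step 6). The helper name `list_npu_devices` and its `dev_dir` parameter are my own, purely for illustration; this is not part of any Ascend tooling.

```python
import glob
import os
import re
import shutil


def list_npu_devices(dev_dir="/dev"):
    """Return the davinciN device nodes under dev_dir, sorted by index.

    Hypothetical helper: only nodes named davinci<number> are returned,
    so davinci_manager and similar management nodes are skipped.
    """
    paths = glob.glob(os.path.join(dev_dir, "davinci[0-9]*"))
    numbered = [
        p for p in paths
        if re.fullmatch(r"davinci\d+", os.path.basename(p))
    ]
    # Sort numerically so davinci10 comes after davinci2.
    return sorted(numbered, key=lambda p: int(re.search(r"\d+$", p).group()))


print("npu-smi on PATH:", shutil.which("npu-smi") is not None)
print("NPU device nodes:", list_npu_devices())
```

On a machine without an Ascend driver this simply reports that `npu-smi` is missing and prints an empty list.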

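Background:

The official repository states support for the Ascend 910 architecture. I happened to have a 300I available, so I ran the test on it to give everyone a reference. I'm a beginner and still have a lot to learn from the experts here.

https://github.com/QwenLM/Qwen/tree/5aa84bdfd3237b37f01bc88cd49b3279b9a71d0b/ascend-support

The test mainly follows the guide at this link; I did not look into it any further for now.

My understanding so far: this system can run simple models. Because the architecture is different, algorithms have to be rewritten for it. PyTorch, TensorFlow, MindFormers and similar frameworks can be installed.

Check the detailed parameters: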

uname -m && cat /etc/*release

 

aarch64
Kylin Linux Advanced Server release V10 (Sword)
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"
Kylin Linux Advanced Server release V10 (Sword)
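
If you want to check these values from a script rather than by eye, a small sketch like the following parses the `KEY="VALUE"` lines into a dict. The sample text is copied from the output above; the function name `parse_os_release` is my own and only illustrative.

```python
import platform


def parse_os_release(text):
    """Parse os-release style KEY=VALUE lines into a dict, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and free-form banner lines without "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields


# Sample copied from the command output in this post.
SAMPLE = '''Kylin Linux Advanced Server release V10 (Sword)
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"
'''

info = parse_os_release(SAMPLE)
# platform.machine() reports the architecture of the machine running this script.
print(platform.machine(), info["ID"], info["VERSION_ID"])
```

On the real machine you would read `/etc/os-release` instead of the embedded sample.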

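3. Set up Docker. There are two ways: download it from the official site, or install it directly with yum.

4. Install Miniconda; make sure to pick the aarch64 build.

5. Configure everything according to the tutorial. I won't go into detail here and will just give my log.

6. I did not use the tutorial's command to start Docker; I used the following command instead.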

sudo docker run -it --rm -u root --network=host --ipc=host --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2  --device=/dev/davinci3 --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 --name=6bff46b104b8 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /etc/ascend_install.info:/etc/ascend_install.info  -v /root/qwen/Qwen-7B-Chat:/data/qwen/models/Qwen-7B-Chat -v /var/log/npu/:/usr/slog  qwenllm/qwen-mindspore /bin/bash

Docker started successfully.
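
The command above is long mostly because each NPU and each driver path needs its own flag. As a rough sketch (not the tutorial's method), the same argument list can be assembled in Python so the repeated parts are easier to read and adjust. The function `build_docker_cmd` and its parameters are hypothetical names of mine; the device names, mounts, and image name are taken from the command above. `sudo` is left out and would need to be prepended as in the original.

```python
import shlex

# Host paths mounted at the same location inside the container (from the command above).
SAME_PATH_MOUNTS = (
    "/usr/local/Ascend/driver",
    "/usr/local/Ascend/add-ons/",
    "/usr/local/bin/npu-smi",
    "/usr/local/sbin/npu-smi",
    "/etc/ascend_install.info",
)


def build_docker_cmd(image, model_dir, num_devices=8, name=None):
    """Assemble a `docker run` argument list that passes Ascend devices through."""
    cmd = ["docker", "run", "-it", "--rm", "-u", "root",
           "--network=host", "--ipc=host"]
    if name:
        cmd.append(f"--name={name}")
    # One --device flag per NPU: /dev/davinci0 .. /dev/davinci{num_devices - 1}.
    cmd += [f"--device=/dev/davinci{i}" for i in range(num_devices)]
    # Management devices listed in the original command.
    cmd += [f"--device=/dev/{d}"
            for d in ("davinci_manager", "devmm_svm", "hisi_hdc")]
    for path in SAME_PATH_MOUNTS:
        cmd += ["-v", f"{path}:{path}"]
    # Model weights and NPU log directory, mapped as in the original command.
    cmd += ["-v", f"{model_dir}:/data/qwen/models/Qwen-7B-Chat",
            "-v", "/var/log/npu/:/usr/slog"]
    cmd += [image, "/bin/bash"]
    return cmd


cmd = build_docker_cmd("qwenllm/qwen-mindspore", "/root/qwen/Qwen-7B-Chat")
print(shlex.join(cmd))
```

The script only prints the command; nothing is executed. On a host with fewer cards, lowering `num_devices` keeps the flags in line with the device nodes that actually exist.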

7. Convert the model

python3 /data/qwen/mindformers/research/qwen/convert_weight.py
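
The log that follows shows each PyTorch parameter name being rewritten to a MindFormers-style name (for example `ln_1` becomes `attention_norm`, and `mlp.w2` becomes `feed_forward.w3`). The sketch below reproduces only the renaming pattern visible in that log. The rule table is inferred from the log lines, not copied from `convert_weight.py`, so treat it as an illustration rather than the script's actual code.

```python
# Substring rewrites inferred from the "name: old->new" lines in the log.
# This is an assumption about the pattern, not the real convert_weight.py logic.
RULES = [
    ("transformer.wte.weight", "transformer.wte.embedding_weight"),
    (".h.", ".layers."),
    (".ln_1.", ".attention_norm."),
    (".ln_2.", ".ffn_norm."),
    (".attn.c_proj.", ".attention.wo."),
    (".mlp.w1.", ".feed_forward.w1."),
    (".mlp.w2.", ".feed_forward.w3."),      # note: w2 maps to w3
    (".mlp.c_proj.", ".feed_forward.w2."),  # and c_proj maps to w2
]


def rename(name):
    """Apply every rewrite rule in order and return the new parameter name."""
    for old, new in RULES:
        name = name.replace(old, new)
    return name


for example in (
    "transformer.wte.weight",
    "transformer.h.0.ln_1.weight",
    "transformer.h.0.attn.c_attn.bias",
    "transformer.h.0.mlp.w2.weight",
):
    print(example, "->", rename(example))
```

No rule produces text that a later rule matches, so the order of the table does not affect the result. Names such as `attn.c_attn` keep their suffix and only change the `h` prefix to `layers`, which matches what the log shows.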

/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
Flash attention will be disabled because it does NOT support fp32.
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00,  2.35it/s]
Parameter (name=transformer.wte.weight, shape=torch.Size([151936, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.wte.weight->transformer.wte.embedding_weight
Parameter (name=transformer.h.0.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.ln_1.weight->transformer.layers.0.attention_norm.weight
Parameter (name=transformer.h.0.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_attn.weight->transformer.layers.0.attn.c_attn.weight
Parameter (name=transformer.h.0.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_attn.bias->transformer.layers.0.attn.c_attn.bias
Parameter (name=transformer.h.0.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_proj.weight->transformer.layers.0.attention.wo.weight
Parameter (name=transformer.h.0.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.ln_2.weight->transformer.layers.0.ffn_norm.weight
Parameter (name=transformer.h.0.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.w1.weight->transformer.layers.0.feed_forward.w1.weight
Parameter (name=transformer.h.0.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.w2.weight->transformer.layers.0.feed_forward.w3.weight
Parameter (name=transformer.h.0.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.c_proj.weight->transformer.layers.0.feed_forward.w2.weight
Parameter (name=transformer.h.1.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.ln_1.weight->transformer.layers.1.attention_norm.weight
Parameter (name=transformer.h.1.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_attn.weight->transformer.layers.1.attn.c_attn.weight
Parameter (name=transformer.h.1.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_attn.bias->transformer.layers.1.attn.c_attn.bias
Parameter (name=transformer.h.1.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_proj.weight->transformer.layers.1.attention.wo.weight
Parameter (name=transformer.h.1.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.ln_2.weight->transformer.layers.1.ffn_norm.weight
Parameter (name=transformer.h.1.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.w1.weight->transformer.layers.1.feed_forward.w1.weight
Parameter (name=transformer.h.1.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.w2.weight->transformer.layers.1.feed_forward.w3.weight
Parameter (name=transformer.h.1.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.c_proj.weight->transformer.layers.1.feed_forward.w2.weight
Parameter (name=transformer.h.2.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.ln_1.weight->transformer.layers.2.attention_norm.weight
Parameter (name=transformer.h.2.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_attn.weight->transformer.layers.2.attn.c_attn.weight
Parameter (name=transformer.h.2.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_attn.bias->transformer.layers.2.attn.c_attn.bias
Parameter (name=transformer.h.2.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_proj.weight->transformer.layers.2.attention.wo.weight
Parameter (name=transformer.h.2.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.ln_2.weight->transformer.layers.2.ffn_norm.weight
Parameter (name=transformer.h.2.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.w1.weight->transformer.layers.2.feed_forward.w1.weight
Parameter (name=transformer.h.2.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.w2.weight->transformer.layers.2.feed_forward.w3.weight
Parameter (name=transformer.h.2.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.c_proj.weight->transformer.layers.2.feed_forward.w2.weight
Parameter (name=transformer.h.3.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.ln_1.weight->transformer.layers.3.attention_norm.weight
Parameter (name=transformer.h.3.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_attn.weight->transformer.layers.3.attn.c_attn.weight
Parameter (name=transformer.h.3.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_attn.bias->transformer.layers.3.attn.c_attn.bias
Parameter (name=transformer.h.3.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_proj.weight->transformer.layers.3.attention.wo.weight
Parameter (name=transformer.h.3.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.ln_2.weight->transformer.layers.3.ffn_norm.weight
Parameter (name=transformer.h.3.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.w1.weight->transformer.layers.3.feed_forward.w1.weight
Parameter (name=transformer.h.3.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.w2.weight->transformer.layers.3.feed_forward.w3.weight
Parameter (name=transformer.h.3.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.c_proj.weight->transformer.layers.3.feed_forward.w2.weight
Parameter (name=transformer.h.4.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.ln_1.weight->transformer.layers.4.attention_norm.weight
Parameter (name=transformer.h.4.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_attn.weight->transformer.layers.4.attn.c_attn.weight
Parameter (name=transformer.h.4.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_attn.bias->transformer.layers.4.attn.c_attn.bias
Parameter (name=transformer.h.4.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_proj.weight->transformer.layers.4.attention.wo.weight
Parameter (name=transformer.h.4.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.ln_2.weight->transformer.layers.4.ffn_norm.weight
Parameter (name=transformer.h.4.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.w1.weight->transformer.layers.4.feed_forward.w1.weight
Parameter (name=transformer.h.4.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.w2.weight->transformer.layers.4.feed_forward.w3.weight
Parameter (name=transformer.h.4.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.c_proj.weight->transformer.layers.4.feed_forward.w2.weight
Parameter (name=transformer.h.5.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.ln_1.weight->transformer.layers.5.attention_norm.weight
Parameter (name=transformer.h.5.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_attn.weight->transformer.layers.5.attn.c_attn.weight
Parameter (name=transformer.h.5.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_attn.bias->transformer.layers.5.attn.c_attn.bias
Parameter (name=transformer.h.5.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_proj.weight->transformer.layers.5.attention.wo.weight
Parameter (name=transformer.h.5.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.ln_2.weight->transformer.layers.5.ffn_norm.weight
Parameter (name=transformer.h.5.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.w1.weight->transformer.layers.5.feed_forward.w1.weight
Parameter (name=transformer.h.5.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.w2.weight->transformer.layers.5.feed_forward.w3.weight
Parameter (name=transformer.h.5.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.c_proj.weight->transformer.layers.5.feed_forward.w2.weight
Parameter (name=transformer.h.6.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.ln_1.weight->transformer.layers.6.attention_norm.weight
Parameter (name=transformer.h.6.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_attn.weight->transformer.layers.6.attn.c_attn.weight
Parameter (name=transformer.h.6.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_attn.bias->transformer.layers.6.attn.c_attn.bias
Parameter (name=transformer.h.6.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_proj.weight->transformer.layers.6.attention.wo.weight
Parameter (name=transformer.h.6.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.ln_2.weight->transformer.layers.6.ffn_norm.weight
Parameter (name=transformer.h.6.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.w1.weight->transformer.layers.6.feed_forward.w1.weight
Parameter (name=transformer.h.6.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.w2.weight->transformer.layers.6.feed_forward.w3.weight
Parameter (name=transformer.h.6.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.c_proj.weight->transformer.layers.6.feed_forward.w2.weight
Parameter (name=transformer.h.7.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.ln_1.weight->transformer.layers.7.attention_norm.weight
Parameter (name=transformer.h.7.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_attn.weight->transformer.layers.7.attn.c_attn.weight
Parameter (name=transformer.h.7.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_attn.bias->transformer.layers.7.attn.c_attn.bias
Parameter (name=transformer.h.7.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_proj.weight->transformer.layers.7.attention.wo.weight
Parameter (name=transformer.h.7.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.ln_2.weight->transformer.layers.7.ffn_norm.weight
Parameter (name=transformer.h.7.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.w1.weight->transformer.layers.7.feed_forward.w1.weight
Parameter (name=transformer.h.7.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.w2.weight->transformer.layers.7.feed_forward.w3.weight
Parameter (name=transformer.h.7.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.c_proj.weight->transformer.layers.7.feed_forward.w2.weight
Parameter (name=transformer.h.8.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.ln_1.weight->transformer.layers.8.attention_norm.weight
Parameter (name=transformer.h.8.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_attn.weight->transformer.layers.8.attn.c_attn.weight
Parameter (name=transformer.h.8.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_attn.bias->transformer.layers.8.attn.c_attn.bias
Parameter (name=transformer.h.8.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_proj.weight->transformer.layers.8.attention.wo.weight
Parameter (name=transformer.h.8.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.ln_2.weight->transformer.layers.8.ffn_norm.weight
Parameter (name=transformer.h.8.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.w1.weight->transformer.layers.8.feed_forward.w1.weight
Parameter (name=transformer.h.8.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.w2.weight->transformer.layers.8.feed_forward.w3.weight
Parameter (name=transformer.h.8.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.c_proj.weight->transformer.layers.8.feed_forward.w2.weight
Parameter (name=transformer.h.9.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.ln_1.weight->transformer.layers.9.attention_norm.weight
Parameter (name=transformer.h.9.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_attn.weight->transformer.layers.9.attn.c_attn.weight
Parameter (name=transformer.h.9.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_attn.bias->transformer.layers.9.attn.c_attn.bias
Parameter (name=transformer.h.9.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_proj.weight->transformer.layers.9.attention.wo.weight
Parameter (name=transformer.h.9.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.ln_2.weight->transformer.layers.9.ffn_norm.weight
Parameter (name=transformer.h.9.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.w1.weight->transformer.layers.9.feed_forward.w1.weight
Parameter (name=transformer.h.9.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.w2.weight->transformer.layers.9.feed_forward.w3.weight
Parameter (name=transformer.h.9.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.c_proj.weight->transformer.layers.9.feed_forward.w2.weight
Parameter (name=transformer.h.10.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.ln_1.weight->transformer.layers.10.attention_norm.weight
Parameter (name=transformer.h.10.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_attn.weight->transformer.layers.10.attn.c_attn.weight
Parameter (name=transformer.h.10.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_attn.bias->transformer.layers.10.attn.c_attn.bias
Parameter (name=transformer.h.10.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_proj.weight->transformer.layers.10.attention.wo.weight
Parameter (name=transformer.h.10.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.ln_2.weight->transformer.layers.10.ffn_norm.weight
Parameter (name=transformer.h.10.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.w1.weight->transformer.layers.10.feed_forward.w1.weight
Parameter (name=transformer.h.10.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.w2.weight->transformer.layers.10.feed_forward.w3.weight
Parameter (name=transformer.h.10.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.c_proj.weight->transformer.layers.10.feed_forward.w2.weight
Parameter (name=transformer.h.11.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.ln_1.weight->transformer.layers.11.attention_norm.weight
Parameter (name=transformer.h.11.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_attn.weight->transformer.layers.11.attn.c_attn.weight
Parameter (name=transformer.h.11.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_attn.bias->transformer.layers.11.attn.c_attn.bias
Parameter (name=transformer.h.11.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_proj.weight->transformer.layers.11.attention.wo.weight
Parameter (name=transformer.h.11.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.ln_2.weight->transformer.layers.11.ffn_norm.weight
Parameter (name=transformer.h.11.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.w1.weight->transformer.layers.11.feed_forward.w1.weight
Parameter (name=transformer.h.11.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.w2.weight->transformer.layers.11.feed_forward.w3.weight
Parameter (name=transformer.h.11.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.c_proj.weight->transformer.layers.11.feed_forward.w2.weight
Parameter (name=transformer.h.12.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.ln_1.weight->transformer.layers.12.attention_norm.weight
Parameter (name=transformer.h.12.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_attn.weight->transformer.layers.12.attn.c_attn.weight
Parameter (name=transformer.h.12.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_attn.bias->transformer.layers.12.attn.c_attn.bias
Parameter (name=transformer.h.12.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_proj.weight->transformer.layers.12.attention.wo.weight
Parameter (name=transformer.h.12.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.ln_2.weight->transformer.layers.12.ffn_norm.weight
Parameter (name=transformer.h.12.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.w1.weight->transformer.layers.12.feed_forward.w1.weight
Parameter (name=transformer.h.12.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.w2.weight->transformer.layers.12.feed_forward.w3.weight
Parameter (name=transformer.h.12.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.c_proj.weight->transformer.layers.12.feed_forward.w2.weight
Parameter (name=transformer.h.13.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.ln_1.weight->transformer.layers.13.attention_norm.weight
Parameter (name=transformer.h.13.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_attn.weight->transformer.layers.13.attn.c_attn.weight
Parameter (name=transformer.h.13.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_attn.bias->transformer.layers.13.attn.c_attn.bias
Parameter (name=transformer.h.13.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_proj.weight->transformer.layers.13.attention.wo.weight
Parameter (name=transformer.h.13.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.ln_2.weight->transformer.layers.13.ffn_norm.weight
Parameter (name=transformer.h.13.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.w1.weight->transformer.layers.13.feed_forward.w1.weight
Parameter (name=transformer.h.13.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.w2.weight->transformer.layers.13.feed_forward.w3.weight
Parameter (name=transformer.h.13.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.c_proj.weight->transformer.layers.13.feed_forward.w2.weight
Parameter (name=transformer.h.14.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.ln_1.weight->transformer.layers.14.attention_norm.weight
Parameter (name=transformer.h.14.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_attn.weight->transformer.layers.14.attn.c_attn.weight
Parameter (name=transformer.h.14.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_attn.bias->transformer.layers.14.attn.c_attn.bias
Parameter (name=transformer.h.14.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_proj.weight->transformer.layers.14.attention.wo.weight
Parameter (name=transformer.h.14.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.ln_2.weight->transformer.layers.14.ffn_norm.weight
Parameter (name=transformer.h.14.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.w1.weight->transformer.layers.14.feed_forward.w1.weight
Parameter (name=transformer.h.14.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.w2.weight->transformer.layers.14.feed_forward.w3.weight
Parameter (name=transformer.h.14.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.c_proj.weight->transformer.layers.14.feed_forward.w2.weight
Parameter (name=transformer.h.15.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.ln_1.weight->transformer.layers.15.attention_norm.weight
Parameter (name=transformer.h.15.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_attn.weight->transformer.layers.15.attn.c_attn.weight
Parameter (name=transformer.h.15.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_attn.bias->transformer.layers.15.attn.c_attn.bias
Parameter (name=transformer.h.15.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_proj.weight->transformer.layers.15.attention.wo.weight
Parameter (name=transformer.h.15.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.ln_2.weight->transformer.layers.15.ffn_norm.weight
Parameter (name=transformer.h.15.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.w1.weight->transformer.layers.15.feed_forward.w1.weight
Parameter (name=transformer.h.15.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.w2.weight->transformer.layers.15.feed_forward.w3.weight
Parameter (name=transformer.h.15.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.c_proj.weight->transformer.layers.15.feed_forward.w2.weight
Parameter (name=transformer.h.16.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.ln_1.weight->transformer.layers.16.attention_norm.weight
Parameter (name=transformer.h.16.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_attn.weight->transformer.layers.16.attn.c_attn.weight
Parameter (name=transformer.h.16.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_attn.bias->transformer.layers.16.attn.c_attn.bias
Parameter (name=transformer.h.16.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_proj.weight->transformer.layers.16.attention.wo.weight
Parameter (name=transformer.h.16.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.ln_2.weight->transformer.layers.16.ffn_norm.weight
Parameter (name=transformer.h.16.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.w1.weight->transformer.layers.16.feed_forward.w1.weight
Parameter (name=transformer.h.16.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.w2.weight->transformer.layers.16.feed_forward.w3.weight
Parameter (name=transformer.h.16.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.c_proj.weight->transformer.layers.16.feed_forward.w2.weight
Parameter (name=transformer.h.17.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.ln_1.weight->transformer.layers.17.attention_norm.weight
Parameter (name=transformer.h.17.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_attn.weight->transformer.layers.17.attn.c_attn.weight
Parameter (name=transformer.h.17.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_attn.bias->transformer.layers.17.attn.c_attn.bias
Parameter (name=transformer.h.17.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_proj.weight->transformer.layers.17.attention.wo.weight
Parameter (name=transformer.h.17.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.ln_2.weight->transformer.layers.17.ffn_norm.weight
Parameter (name=transformer.h.17.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.w1.weight->transformer.layers.17.feed_forward.w1.weight
Parameter (name=transformer.h.17.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.w2.weight->transformer.layers.17.feed_forward.w3.weight
Parameter (name=transformer.h.17.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.c_proj.weight->transformer.layers.17.feed_forward.w2.weight
Parameter (name=transformer.h.18.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.ln_1.weight->transformer.layers.18.attention_norm.weight
Parameter (name=transformer.h.18.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_attn.weight->transformer.layers.18.attn.c_attn.weight
Parameter (name=transformer.h.18.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_attn.bias->transformer.layers.18.attn.c_attn.bias
Parameter (name=transformer.h.18.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_proj.weight->transformer.layers.18.attention.wo.weight
Parameter (name=transformer.h.18.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.ln_2.weight->transformer.layers.18.ffn_norm.weight
Parameter (name=transformer.h.18.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.w1.weight->transformer.layers.18.feed_forward.w1.weight
Parameter (name=transformer.h.18.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.w2.weight->transformer.layers.18.feed_forward.w3.weight
Parameter (name=transformer.h.18.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.c_proj.weight->transformer.layers.18.feed_forward.w2.weight
Parameter (name=transformer.h.19.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.ln_1.weight->transformer.layers.19.attention_norm.weight
Parameter (name=transformer.h.19.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_attn.weight->transformer.layers.19.attn.c_attn.weight
Parameter (name=transformer.h.19.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_attn.bias->transformer.layers.19.attn.c_attn.bias
Parameter (name=transformer.h.19.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_proj.weight->transformer.layers.19.attention.wo.weight
Parameter (name=transformer.h.19.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.ln_2.weight->transformer.layers.19.ffn_norm.weight
Parameter (name=transformer.h.19.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.w1.weight->transformer.layers.19.feed_forward.w1.weight
Parameter (name=transformer.h.19.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.w2.weight->transformer.layers.19.feed_forward.w3.weight
Parameter (name=transformer.h.19.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.c_proj.weight->transformer.layers.19.feed_forward.w2.weight
Parameter (name=transformer.h.20.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.ln_1.weight->transformer.layers.20.attention_norm.weight
Parameter (name=transformer.h.20.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_attn.weight->transformer.layers.20.attn.c_attn.weight
Parameter (name=transformer.h.20.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_attn.bias->transformer.layers.20.attn.c_attn.bias
Parameter (name=transformer.h.20.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_proj.weight->transformer.layers.20.attention.wo.weight
Parameter (name=transformer.h.20.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.ln_2.weight->transformer.layers.20.ffn_norm.weight
Parameter (name=transformer.h.20.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.w1.weight->transformer.layers.20.feed_forward.w1.weight
Parameter (name=transformer.h.20.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.w2.weight->transformer.layers.20.feed_forward.w3.weight
Parameter (name=transformer.h.20.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.c_proj.weight->transformer.layers.20.feed_forward.w2.weight
Parameter (name=transformer.h.21.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.ln_1.weight->transformer.layers.21.attention_norm.weight
Parameter (name=transformer.h.21.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_attn.weight->transformer.layers.21.attn.c_attn.weight
Parameter (name=transformer.h.21.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_attn.bias->transformer.layers.21.attn.c_attn.bias
Parameter (name=transformer.h.21.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_proj.weight->transformer.layers.21.attention.wo.weight
Parameter (name=transformer.h.21.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.ln_2.weight->transformer.layers.21.ffn_norm.weight
Parameter (name=transformer.h.21.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.w1.weight->transformer.layers.21.feed_forward.w1.weight
Parameter (name=transformer.h.21.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.w2.weight->transformer.layers.21.feed_forward.w3.weight
Parameter (name=transformer.h.21.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.c_proj.weight->transformer.layers.21.feed_forward.w2.weight
Parameter (name=transformer.h.22.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.ln_1.weight->transformer.layers.22.attention_norm.weight
Parameter (name=transformer.h.22.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_attn.weight->transformer.layers.22.attn.c_attn.weight
Parameter (name=transformer.h.22.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_attn.bias->transformer.layers.22.attn.c_attn.bias
Parameter (name=transformer.h.22.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_proj.weight->transformer.layers.22.attention.wo.weight
Parameter (name=transformer.h.22.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.ln_2.weight->transformer.layers.22.ffn_norm.weight
Parameter (name=transformer.h.22.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.w1.weight->transformer.layers.22.feed_forward.w1.weight
Parameter (name=transformer.h.22.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.w2.weight->transformer.layers.22.feed_forward.w3.weight
Parameter (name=transformer.h.22.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.c_proj.weight->transformer.layers.22.feed_forward.w2.weight
Parameter (name=transformer.h.23.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.ln_1.weight->transformer.layers.23.attention_norm.weight
Parameter (name=transformer.h.23.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_attn.weight->transformer.layers.23.attn.c_attn.weight
Parameter (name=transformer.h.23.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_attn.bias->transformer.layers.23.attn.c_attn.bias
Parameter (name=transformer.h.23.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_proj.weight->transformer.layers.23.attention.wo.weight
Parameter (name=transformer.h.23.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.ln_2.weight->transformer.layers.23.ffn_norm.weight
Parameter (name=transformer.h.23.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.w1.weight->transformer.layers.23.feed_forward.w1.weight
Parameter (name=transformer.h.23.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.w2.weight->transformer.layers.23.feed_forward.w3.weight
Parameter (name=transformer.h.23.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.c_proj.weight->transformer.layers.23.feed_forward.w2.weight
Parameter (name=transformer.h.24.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.ln_1.weight->transformer.layers.24.attention_norm.weight
Parameter (name=transformer.h.24.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_attn.weight->transformer.layers.24.attn.c_attn.weight
Parameter (name=transformer.h.24.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_attn.bias->transformer.layers.24.attn.c_attn.bias
Parameter (name=transformer.h.24.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_proj.weight->transformer.layers.24.attention.wo.weight
Parameter (name=transformer.h.24.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.ln_2.weight->transformer.layers.24.ffn_norm.weight
Parameter (name=transformer.h.24.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.w1.weight->transformer.layers.24.feed_forward.w1.weight
Parameter (name=transformer.h.24.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.w2.weight->transformer.layers.24.feed_forward.w3.weight
Parameter (name=transformer.h.24.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.c_proj.weight->transformer.layers.24.feed_forward.w2.weight
Parameter (name=transformer.h.25.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.ln_1.weight->transformer.layers.25.attention_norm.weight
Parameter (name=transformer.h.25.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_attn.weight->transformer.layers.25.attn.c_attn.weight
Parameter (name=transformer.h.25.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_attn.bias->transformer.layers.25.attn.c_attn.bias
Parameter (name=transformer.h.25.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_proj.weight->transformer.layers.25.attention.wo.weight
Parameter (name=transformer.h.25.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.ln_2.weight->transformer.layers.25.ffn_norm.weight
Parameter (name=transformer.h.25.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.w1.weight->transformer.layers.25.feed_forward.w1.weight
Parameter (name=transformer.h.25.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.w2.weight->transformer.layers.25.feed_forward.w3.weight
Parameter (name=transformer.h.25.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.c_proj.weight->transformer.layers.25.feed_forward.w2.weight
Parameter (name=transformer.h.26.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.ln_1.weight->transformer.layers.26.attention_norm.weight
Parameter (name=transformer.h.26.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_attn.weight->transformer.layers.26.attn.c_attn.weight
Parameter (name=transformer.h.26.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_attn.bias->transformer.layers.26.attn.c_attn.bias
Parameter (name=transformer.h.26.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_proj.weight->transformer.layers.26.attention.wo.weight
Parameter (name=transformer.h.26.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.ln_2.weight->transformer.layers.26.ffn_norm.weight
Parameter (name=transformer.h.26.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.w1.weight->transformer.layers.26.feed_forward.w1.weight
Parameter (name=transformer.h.26.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.w2.weight->transformer.layers.26.feed_forward.w3.weight
Parameter (name=transformer.h.26.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.c_proj.weight->transformer.layers.26.feed_forward.w2.weight
Parameter (name=transformer.h.27.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.ln_1.weight->transformer.layers.27.attention_norm.weight
Parameter (name=transformer.h.27.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_attn.weight->transformer.layers.27.attn.c_attn.weight
Parameter (name=transformer.h.27.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_attn.bias->transformer.layers.27.attn.c_attn.bias
Parameter (name=transformer.h.27.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_proj.weight->transformer.layers.27.attention.wo.weight
Parameter (name=transformer.h.27.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.ln_2.weight->transformer.layers.27.ffn_norm.weight
Parameter (name=transformer.h.27.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.w1.weight->transformer.layers.27.feed_forward.w1.weight
Parameter (name=transformer.h.27.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.w2.weight->transformer.layers.27.feed_forward.w3.weight
Parameter (name=transformer.h.27.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.c_proj.weight->transformer.layers.27.feed_forward.w2.weight
Parameter (name=transformer.h.28.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.ln_1.weight->transformer.layers.28.attention_norm.weight
Parameter (name=transformer.h.28.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_attn.weight->transformer.layers.28.attn.c_attn.weight
Parameter (name=transformer.h.28.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_attn.bias->transformer.layers.28.attn.c_attn.bias
Parameter (name=transformer.h.28.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_proj.weight->transformer.layers.28.attention.wo.weight
Parameter (name=transformer.h.28.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.ln_2.weight->transformer.layers.28.ffn_norm.weight
Parameter (name=transformer.h.28.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.w1.weight->transformer.layers.28.feed_forward.w1.weight
Parameter (name=transformer.h.28.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.w2.weight->transformer.layers.28.feed_forward.w3.weight
Parameter (name=transformer.h.28.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.c_proj.weight->transformer.layers.28.feed_forward.w2.weight
Parameter (name=transformer.h.29.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.ln_1.weight->transformer.layers.29.attention_norm.weight
Parameter (name=transformer.h.29.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_attn.weight->transformer.layers.29.attn.c_attn.weight
Parameter (name=transformer.h.29.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_attn.bias->transformer.layers.29.attn.c_attn.bias
Parameter (name=transformer.h.29.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_proj.weight->transformer.layers.29.attention.wo.weight
Parameter (name=transformer.h.29.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.ln_2.weight->transformer.layers.29.ffn_norm.weight
Parameter (name=transformer.h.29.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.w1.weight->transformer.layers.29.feed_forward.w1.weight
Parameter (name=transformer.h.29.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.w2.weight->transformer.layers.29.feed_forward.w3.weight
Parameter (name=transformer.h.29.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.c_proj.weight->transformer.layers.29.feed_forward.w2.weight
Parameter (name=transformer.h.30.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.ln_1.weight->transformer.layers.30.attention_norm.weight
Parameter (name=transformer.h.30.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_attn.weight->transformer.layers.30.attn.c_attn.weight
Parameter (name=transformer.h.30.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_attn.bias->transformer.layers.30.attn.c_attn.bias
Parameter (name=transformer.h.30.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_proj.weight->transformer.layers.30.attention.wo.weight
Parameter (name=transformer.h.30.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.ln_2.weight->transformer.layers.30.ffn_norm.weight
Parameter (name=transformer.h.30.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.w1.weight->transformer.layers.30.feed_forward.w1.weight
Parameter (name=transformer.h.30.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.w2.weight->transformer.layers.30.feed_forward.w3.weight
Parameter (name=transformer.h.30.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.c_proj.weight->transformer.layers.30.feed_forward.w2.weight
Parameter (name=transformer.h.31.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.ln_1.weight->transformer.layers.31.attention_norm.weight
Parameter (name=transformer.h.31.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_attn.weight->transformer.layers.31.attn.c_attn.weight
Parameter (name=transformer.h.31.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_attn.bias->transformer.layers.31.attn.c_attn.bias
Parameter (name=transformer.h.31.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_proj.weight->transformer.layers.31.attention.wo.weight
Parameter (name=transformer.h.31.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.ln_2.weight->transformer.layers.31.ffn_norm.weight
Parameter (name=transformer.h.31.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.w1.weight->transformer.layers.31.feed_forward.w1.weight
Parameter (name=transformer.h.31.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.w2.weight->transformer.layers.31.feed_forward.w3.weight
Parameter (name=transformer.h.31.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.c_proj.weight->transformer.layers.31.feed_forward.w2.weight
Parameter (name=transformer.ln_f.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
Parameter (name=lm_head.weight, shape=torch.Size([151936, 4096]), dtype=torch.float32, requires_grad=True)
Saving converted weights to /data/qwen/models/Qwen-7B-Chat/qwen-7b-chat.ckpt...
Done
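
The renames in the log above follow a regular pattern: `transformer.h.N.` becomes `transformer.layers.N.`, and a handful of suffixes are swapped for new names. The sketch below reproduces only the mapping visible in this log. It is not the code of `convert_weight.py`; `SUFFIX_RENAMES` and `rename_layer_param` are hypothetical names chosen for illustration. Since the log prints no rename for `transformer.ln_f.weight` or `lm_head.weight`, the sketch leaves non-layer names unchanged, which is an assumption about what the real script does.

```python
import re

# Suffix renames observed in the conversion log (PyTorch name -> converted name).
# Suffixes not listed here (e.g. attn.c_attn.weight / attn.c_attn.bias)
# keep their original form, as in the log.
SUFFIX_RENAMES = {
    "ln_1.weight": "attention_norm.weight",
    "attn.c_proj.weight": "attention.wo.weight",
    "ln_2.weight": "ffn_norm.weight",
    "mlp.w1.weight": "feed_forward.w1.weight",
    "mlp.w2.weight": "feed_forward.w3.weight",      # note: w2 -> w3
    "mlp.c_proj.weight": "feed_forward.w2.weight",  # note: c_proj -> w2
}

# Matches per-layer parameters such as "transformer.h.16.ln_2.weight".
_LAYER_RE = re.compile(r"^transformer\.h\.(\d+)\.(.+)$")


def rename_layer_param(name: str) -> str:
    """Map a per-layer parameter name the way the log shows.

    Names that are not per-layer parameters are returned unchanged
    (an assumption: the log shows no rename for them).
    """
    match = _LAYER_RE.match(name)
    if match is None:
        return name
    layer_index, suffix = match.groups()
    new_suffix = SUFFIX_RENAMES.get(suffix, suffix)
    return f"transformer.layers.{layer_index}.{new_suffix}"


if __name__ == "__main__":
    for example in (
        "transformer.h.16.mlp.w2.weight",
        "transformer.h.17.attn.c_attn.bias",
        "lm_head.weight",
    ):
        print(example, "->", rename_layer_param(example))
```

One detail worth noticing in the mapping: `mlp.w2` lands on `feed_forward.w3` while `mlp.c_proj` lands on `feed_forward.w2`, so the target names cannot be derived by a simple prefix swap.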

Set the Python path and start the inference script.

cd /data/qwen/mindformers/research/qwen

export PYTHONPATH=/data/qwen/mindformers:$PYTHONPATH

python3 infer_qwen.py

/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
[Warning]Can not find libascendalog.so
[Warning]Can not find libascendalog.so
Traceback (most recent call last):
  File "/data/qwen/mindformers/research/qwen/infer_qwen.py", line 4, in <module>
    from mindformers.trainer import Trainer
  File "/data/qwen/mindformers/mindformers/__init__.py", line 17, in <module>
    from mindformers import core, auto_class, dataset, \
  File "/data/qwen/mindformers/mindformers/core/__init__.py", line 19, in <module>
    from .metric import build_metric
  File "/data/qwen/mindformers/mindformers/core/metric/__init__.py", line 17, in <module>
    from .metric import *
  File "/data/qwen/mindformers/mindformers/core/metric/metric.py", line 37, in <module>
    from mindformers.models import BasicTokenizer
  File "/data/qwen/mindformers/mindformers/models/__init__.py", line 21, in <module>
    from .blip2 import *
  File "/data/qwen/mindformers/mindformers/models/blip2/__init__.py", line 17, in <module>
    from .blip2_config import Blip2Config
  File "/data/qwen/mindformers/mindformers/models/blip2/blip2_config.py", line 23, in <module>
    from mindformers.models.llama import LlamaConfig
  File "/data/qwen/mindformers/mindformers/models/llama/__init__.py", line 18, in <module>
    from .llama import LlamaForCausalLM, LlamaForCausalLMWithLora, LlamaModel
  File "/data/qwen/mindformers/mindformers/models/llama/llama.py", line 30, in <module>
    from mindspore.nn.layer.flash_attention import FlashAttention
  File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/nn/layer/flash_attention.py", line 24, in <module>
    from mindspore.ops._op_impl._custom_op.flash_attention.flash_attention_impl import get_flash_attention
  File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/__init__.py", line 17, in <module>
    from mindspore.ops._op_impl._custom_op.dsd_impl import dsd_matmul
  File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/dsd_impl.py", line 17, in <module>
    from te import tik
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/te/__init__.py", line 128, in <module>
    from tbe import tvm
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/__init__.py", line 44, in <module>
    import tvm
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/__init__.py", line 26, in <module>
    from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/__init__.py", line 28, in <module>
    from .base import register_error
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/base.py", line 72, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/base.py", line 52, in _load_lib
    lib_path = libinfo.find_lib_path()
  File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/libinfo.py", line 147, in find_lib_path
    raise RuntimeError(message)
RuntimeError: Cannot find the files.
List of candidates:
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/lib/plugin/cpu/libtvm.so
/usr/local/Ascend/driver/libtvm.so
/data/qwen/mindformers/research/qwen/libtvm.so
/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin/libtvm.so
/root/miniconda3/envs/mindspore2.2_py39/bin/libtvm.so
/root/miniconda3/condabin/libtvm.so
/usr/local/sbin/libtvm.so
/usr/local/bin/libtvm.so
/usr/sbin/libtvm.so
/usr/bin/libtvm.so
/usr/sbin/libtvm.so
/usr/bin/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/libtvm.so
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/lib/plugin/cpu/libtvm_runtime.so
/usr/local/Ascend/driver/libtvm_runtime.so
/data/qwen/mindformers/research/qwen/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin/libtvm_runtime.so
/root/miniconda3/envs/mindspore2.2_py39/bin/libtvm_runtime.so
/root/miniconda3/condabin/libtvm_runtime.so
/usr/local/sbin/libtvm_runtime.so
/usr/local/bin/libtvm_runtime.so
/usr/sbin/libtvm_runtime.so
/usr/bin/libtvm_runtime.so
/usr/sbin/libtvm_runtime.so
/usr/bin/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/libtvm_runtime.so

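Judging from the error message, the failure is probably caused by files missing from the setup for this chip architecture: neither libtvm.so nor libtvm_runtime.so was found in any of the candidate paths. I am not investigating it further for now.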
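
To confirm which of the candidate directories, if any, actually contain the TVM libraries, a small standard-library check like the one below can be run inside the container. It is only a sketch: `find_existing_libs` is a hypothetical helper, not part of MindSpore or the Ascend toolkit, and the directory list is copied from a few of the candidates in the traceback rather than from any official search order.

```python
from pathlib import Path
from typing import Iterable, List

# Library file names taken from the "List of candidates" in the error output.
LIB_NAMES = ("libtvm.so", "libtvm_runtime.so")


def find_existing_libs(
    search_dirs: Iterable[str],
    lib_names: Iterable[str] = LIB_NAMES,
) -> List[Path]:
    """Return every <dir>/<lib_name> path that exists as a regular file."""
    names = tuple(lib_names)
    found: List[Path] = []
    for directory in search_dirs:
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file():
                found.append(candidate)
    return found


if __name__ == "__main__":
    # A few of the directories listed in the traceback (not exhaustive).
    dirs = [
        "/usr/local/Ascend/driver",
        "/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin",
        "/usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin",
        "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe",
        "/usr/local/Ascend/ascend-toolkit/7.0.RC1",
    ]
    hits = find_existing_libs(dirs)
    if hits:
        for path in hits:
            print("found:", path)
    else:
        print("none of the candidate directories contain libtvm*.so")
```

An empty result would be consistent with the RuntimeError above; it does not by itself say why the files are absent.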



http://www.chinasem.cn/article/516092
