Internlm_xcomposer2 Model Structure Explained

2024-06-10 18:52

This article walks through the structure of the Internlm_xcomposer2 model and is intended as a practical reference for developers working with it; interested readers can follow along.

Internlm_xcomposer2 Model Structure Explained

Project address

Overall structure of the Internlm_xcomposer2 model

<class 'transformers_modules.internlm-xcomposer2-4khd-7b.modeling_internlm_xcomposer2.InternLMXComposer2ForCausalLM'>
InternLMXComposer2ForCausalLM(
  (model): InternLM2Model(
    (tok_embeddings): Embedding(92544, 4096, padding_idx=2)
    (layers): ModuleList(
      (0-31): 32 x InternLM2DecoderLayer(
        (attention): InternLM2FlashAttention2(
          (wqkv): PLoRA(
            in_features=4096, out_features=6144, bias=False
            (lora_dropout): Dropout(p=0.05, inplace=False)
            (Plora_A): Linear(in_features=4096, out_features=8, bias=False)
            (Plora_B): Linear(in_features=8, out_features=6144, bias=False)
          )
          (wo): PLoRA(
            in_features=4096, out_features=4096, bias=False
            (lora_dropout): Dropout(p=0.05, inplace=False)
            (Plora_A): Linear(in_features=4096, out_features=256, bias=False)
            (Plora_B): Linear(in_features=256, out_features=4096, bias=False)
          )
          (rotary_emb): InternLM2RotaryEmbedding()
        )
        (feed_forward): InternLM2MLP(
          (w1): PLoRA(
            in_features=4096, out_features=14336, bias=False
            (lora_dropout): Dropout(p=0.05, inplace=False)
            (Plora_A): Linear(in_features=4096, out_features=256, bias=False)
            (Plora_B): Linear(in_features=256, out_features=14336, bias=False)
          )
          (w3): PLoRA(
            in_features=4096, out_features=14336, bias=False
            (lora_dropout): Dropout(p=0.05, inplace=False)
            (Plora_A): Linear(in_features=4096, out_features=256, bias=False)
            (Plora_B): Linear(in_features=256, out_features=14336, bias=False)
          )
          (w2): PLoRA(
            in_features=14336, out_features=4096, bias=False
            (lora_dropout): Dropout(p=0.05, inplace=False)
            (Plora_A): Linear(in_features=14336, out_features=256, bias=False)
            (Plora_B): Linear(in_features=256, out_features=4096, bias=False)
          )
          (act_fn): SiLUActivation()
        )
        (attention_norm): InternLM2RMSNorm()
        (ffn_norm): InternLM2RMSNorm()
      )
    )
    (norm): InternLM2RMSNorm()
  )
  (output): Linear(in_features=4096, out_features=92544, bias=False)
  (vit): CLIPVisionTower(
    (vision_tower): CLIPVisionModel(
      (vision_model): CLIPVisionTransformer(
        (embeddings): CLIPVisionEmbeddings(
          (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
          (position_embedding): Embedding(577, 1024)
        )
        (pre_layrnorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder): CLIPEncoder(
          (layers): ModuleList(
            (0-23): 24 x CLIPEncoderLayer(
              (self_attn): CLIPAttention(
                (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
                (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
                (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
                (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
              )
              (layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (mlp): CLIPMLP(
                (activation_fn): QuickGELUActivation()
                (fc1): Linear(in_features=1024, out_features=4096, bias=True)
                (fc2): Linear(in_features=4096, out_features=1024, bias=True)
              )
              (layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
        (post_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
  (vision_proj): Sequential(
    (0): Linear(in_features=4096, out_features=4096, bias=True)
    (1): GELU(approximate='none')
    (2): Linear(in_features=4096, out_features=4096, bias=True)
  )
)
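The dump above can be reproduced with a short script along the following lines. This is a minimal sketch, not the official usage: the repo id is assumed from the class path in the dump, and the exact loading arguments may differ from the project's README.

# Minimal sketch (assumed usage): load the 4KHD checkpoint with
# trust_remote_code and print the module tree.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "internlm/internlm-xcomposer2-4khd-7b",   # repo id assumed from the class path above
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
print(type(model))   # <class '...InternLMXComposer2ForCausalLM'>
print(model)         # prints the nested module structure shown above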

Detailed structure of the Internlm_xcomposer2 model (the shape of every parameter, listed in order from input to output)
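A listing like the one below can be produced by iterating over the loaded model's parameters, continuing the loading sketch above; parameter registration order follows the input-to-output module order.

# Print every parameter name together with its shape,
# which yields the listing that follows.
for name, param in model.named_parameters():
    print(f"{name}: {param.shape}")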


plora_glb_GN: torch.Size([1, 1, 4096])
plora_sub_GN: torch.Size([1, 1, 1, 4096])
# Main LLM body (takes the text token embeddings and, later in the sequence, the projected ViT image features)
model.tok_embeddings.weight: torch.Size([92544, 4096])
model.layers.0.attention.wqkv.weight: torch.Size([6144, 4096])
model.layers.0.attention.wqkv.Plora_A.weight: torch.Size([8, 4096])
model.layers.0.attention.wqkv.Plora_B.weight: torch.Size([6144, 8])
model.layers.0.attention.wo.weight: torch.Size([4096, 4096])
model.layers.0.attention.wo.Plora_A.weight: torch.Size([256, 4096])
model.layers.0.attention.wo.Plora_B.weight: torch.Size([4096, 256])
model.layers.0.feed_forward.w1.weight: torch.Size([14336, 4096])
model.layers.0.feed_forward.w1.Plora_A.weight: torch.Size([256, 4096])
model.layers.0.feed_forward.w1.Plora_B.weight: torch.Size([14336, 256])
model.layers.0.feed_forward.w3.weight: torch.Size([14336, 4096])
model.layers.0.feed_forward.w3.Plora_A.weight: torch.Size([256, 4096])
model.layers.0.feed_forward.w3.Plora_B.weight: torch.Size([14336, 256])
model.layers.0.feed_forward.w2.weight: torch.Size([4096, 14336])
model.layers.0.feed_forward.w2.Plora_A.weight: torch.Size([256, 14336])
model.layers.0.feed_forward.w2.Plora_B.weight: torch.Size([4096, 256])
model.layers.0.attention_norm.weight: torch.Size([4096])
model.layers.0.ffn_norm.weight: torch.Size([4096])
... (32 model.layers.* blocks in total; model.layers.1 through model.layers.30 are omitted here)
model.layers.31.attention.wqkv.weight: torch.Size([6144, 4096])
model.layers.31.attention.wqkv.Plora_A.weight: torch.Size([8, 4096])
model.layers.31.attention.wqkv.Plora_B.weight: torch.Size([6144, 8])
model.layers.31.attention.wo.weight: torch.Size([4096, 4096])
model.layers.31.attention.wo.Plora_A.weight: torch.Size([256, 4096])
model.layers.31.attention.wo.Plora_B.weight: torch.Size([4096, 256])
model.layers.31.feed_forward.w1.weight: torch.Size([14336, 4096])
model.layers.31.feed_forward.w1.Plora_A.weight: torch.Size([256, 4096])
model.layers.31.feed_forward.w1.Plora_B.weight: torch.Size([14336, 256])
model.layers.31.feed_forward.w3.weight: torch.Size([14336, 4096])
model.layers.31.feed_forward.w3.Plora_A.weight: torch.Size([256, 4096])
model.layers.31.feed_forward.w3.Plora_B.weight: torch.Size([14336, 256])
model.layers.31.feed_forward.w2.weight: torch.Size([4096, 14336])
model.layers.31.feed_forward.w2.Plora_A.weight: torch.Size([256, 14336])
model.layers.31.feed_forward.w2.Plora_B.weight: torch.Size([4096, 256])
model.layers.31.attention_norm.weight: torch.Size([4096])
model.layers.31.ffn_norm.weight: torch.Size([4096])
# Output layers (final norm and LM head)
model.norm.weight: torch.Size([4096])
output.weight: torch.Size([92544, 4096])
# ViT image encoder (CLIP vision tower)
vit.vision_tower.vision_model.embeddings.class_embedding: torch.Size([1024])
vit.vision_tower.vision_model.embeddings.patch_embedding.weight: torch.Size([1024, 3, 14, 14])
vit.vision_tower.vision_model.embeddings.position_embedding.weight: torch.Size([577, 1024])
vit.vision_tower.vision_model.pre_layrnorm.weight: torch.Size([1024])
vit.vision_tower.vision_model.pre_layrnorm.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight: torch.Size([4096, 1024])
vit.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias: torch.Size([4096])
vit.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight: torch.Size([1024, 4096])
vit.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias: torch.Size([1024])
... (24 vit.vision_tower.vision_model.encoder.layers.* blocks in total; vit.vision_tower.vision_model.encoder.layers.1 through vit.vision_tower.vision_model.encoder.layers.22 are omitted here)
vit.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight: torch.Size([1024, 1024])
vit.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight: torch.Size([4096, 1024])
vit.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias: torch.Size([4096])
vit.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight: torch.Size([1024, 4096])
vit.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight: torch.Size([1024])
vit.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias: torch.Size([1024])
vit.vision_tower.vision_model.post_layernorm.weight: torch.Size([1024])
vit.vision_tower.vision_model.post_layernorm.bias: torch.Size([1024])
# vision_proj aligns the ViT output to the input of model.layers.0
vision_proj.0.weight: torch.Size([4096, 4096])
vision_proj.0.bias: torch.Size([4096])
vision_proj.2.weight: torch.Size([4096, 4096])
vision_proj.2.bias: torch.Size([4096])
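The PLoRA modules that wrap every attention and feed-forward projection are the model's Partial-LoRA layers: the frozen base weight is applied to all tokens, while the low-rank Plora_A/Plora_B branch is added only at image-token positions. The following is a minimal illustrative sketch of that idea under stated assumptions, not the repository's actual implementation; the class name, constructor, and the im_mask argument are invented for illustration. It uses the rank-256 wo projection from the listing above as the example.

import torch
import torch.nn as nn

class PLoRALinear(nn.Linear):
    """Illustrative Partial-LoRA linear layer (hypothetical, not the repo's code).

    The frozen base weight is applied to every token; the low-rank
    Plora_A/Plora_B branch is added only at positions marked as image tokens.
    """
    def __init__(self, in_features, out_features, lora_r=256, lora_dropout=0.05, bias=False):
        super().__init__(in_features, out_features, bias=bias)
        self.lora_dropout = nn.Dropout(p=lora_dropout)
        self.Plora_A = nn.Linear(in_features, lora_r, bias=False)
        self.Plora_B = nn.Linear(lora_r, out_features, bias=False)

    def forward(self, x, im_mask=None):
        # Base projection over all tokens (text and image).
        out = super().forward(x)
        if im_mask is not None and im_mask.any():
            # Low-rank update only where im_mask marks image tokens.
            lora = self.Plora_B(self.Plora_A(self.lora_dropout(x[im_mask])))
            out[im_mask] = out[im_mask] + lora
        return out

# Example matching the wo projection of one decoder layer (rank 256, per the listing above).
layer = PLoRALinear(4096, 4096, lora_r=256)
tokens = torch.randn(1, 10, 4096)            # 10 hidden states
im_mask = torch.zeros(1, 10, dtype=torch.bool)
im_mask[:, :4] = True                        # pretend the first 4 tokens came from the image
print(layer(tokens, im_mask).shape)          # torch.Size([1, 10, 4096])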

That concludes this walkthrough of the Internlm_xcomposer2 model structure; we hope it serves as a useful reference.



