Building a Tumor Dependency Map with Deep Learning

2024-01-31 16:30

This post walks through a deep-learning pipeline for building a tumor dependency map; developers interested in the topic are welcome to follow along.

Source paper:

Predicting and characterizing a cancer dependency map of tumors with deep learning

Code: Code Ocean

Hello everyone! Today we introduce a paper published in Science Advances in 2021.

Genome-wide loss-of-function screens have revealed genes essential for cancer cell proliferation, known as tumor dependencies.

However, linking tumor dependencies to the molecular makeup of cancer cells, and further to tumors, remains a major challenge.

In this study, the authors propose DeepDEP, a deep learning model built on the TensorFlow framework.

Version requirements:

  • tensorflow==1.4.0
  • python==3.5.2
  • cuda==8.0.61
  • cudnn==6.0.21
  • h5py==2.7.1
  • keras==1.2.2

The authors first pretrained the model without supervision on unlabeled tumor genomic profiles (TCGA) and saved the resulting weights.

Unsupervised pretraining (the training inputs serve as their own labels; layers use activation functions).
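The "inputs serve as their own labels" idea can be sketched with a tiny NumPy linear autoencoder. This is a toy illustration only: the matrix sizes, learning rate, and random data are made up and unrelated to the paper's TCGA setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20)).astype('float32')  # toy "samples x genes" matrix

# Two-layer linear autoencoder: encode with W1 (20 -> 5), decode with W2 (5 -> 20).
W1 = rng.standard_normal((20, 5)) * 0.1
W2 = rng.standard_normal((5, 20)) * 0.1
lr = 0.01

for step in range(500):
    H = X @ W1       # encoding
    X_hat = H @ W2   # reconstruction
    err = X_hat - X  # the target is the input itself: no external labels needed
    # gradient descent on the squared reconstruction error
    W2 -= lr * H.T @ err / len(X)
    W1 -= lr * X.T @ (err @ W2.T) / len(X)

loss = np.mean((X @ W1 @ W2 - X) ** 2)
print(loss)  # lower than the initial reconstruction error np.mean(X ** 2)
```

The real pretraining below does the same thing with Keras Dense layers and a ReLU nonlinearity, fitting `data_mut_tcga` against itself.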

Model workflow diagram: (figure not reproduced here)

The authors validated DeepDEP's performance on three independent datasets. Through systematic model interpretation, they extended the current cancer dependency map. Applying DeepDEP to pan-cancer tumor genomic data, they built the first clinically relevant pan-cancer dependency map. Overall, DeepDEP serves as a new tool for studying cancer dependencies.

Unsupervised pretraining

# Pretrain an autoencoder (AE) of tumor genomics (TCGA) to be used to initialize DeepDEP model training
print("\n\nStarting to run PretrainAE.py with a demo example of gene mutation data of 50 TCGA tumors...")

import pickle
from keras import models
from keras.layers import Dense, Merge
from keras.callbacks import EarlyStopping
import numpy as np
import time

def load_data(filename):
    data = []
    gene_names = []
    data_labels = []
    lines = open(filename).readlines()  # read the whole file at once
    sample_names = lines[0].replace('\n', '').split('\t')[1:]  # header row: sample names
    dx = 1
    for line in lines[dx:]:
        values = line.replace('\n', '').split('\t')
        gene = str.upper(values[0])  # convert gene symbols to upper case
        gene_names.append(gene)
        data.append(values[1:])
    data = np.array(data, dtype='float32')
    data = np.transpose(data)  # samples x genes
    return data, data_labels, sample_names, gene_names

def AE_dense_3layers(input_dim, first_layer_dim, second_layer_dim, third_layer_dim, activation_func, init='he_uniform'):
    print('input_dim = ', input_dim)
    print('first_layer_dim = ', first_layer_dim)
    print('second_layer_dim = ', second_layer_dim)
    print('third_layer_dim = ', third_layer_dim)
    print('init = ', init)
    model = models.Sequential()
    model.add(Dense(output_dim=first_layer_dim, input_dim=input_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=second_layer_dim, input_dim=first_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=third_layer_dim, input_dim=second_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=second_layer_dim, input_dim=third_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=first_layer_dim, input_dim=second_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=input_dim, input_dim=first_layer_dim, activation=activation_func, init=init))
    return model

def save_weight_to_pickle(model, file_name):
    print('saving weights')
    weight_list = []
    for layer in model.layers:
        weight_list.append(layer.get_weights())
    with open(file_name, 'wb') as handle:
        pickle.dump(weight_list, handle)

if __name__ == '__main__':
    # load TCGA mutation data; substitute here with other genomics
    data_mut_tcga, data_labels_mut_tcga, sample_names_mut_tcga, gene_names_mut_tcga = load_data(
        r"D:\DEPOI\data/tcga_mut_data_paired_with_ccl.txt")
    print("\n\nDatasets successfully loaded.")

    samples_to_predict = np.arange(0, 50)
    # first 50 samples for DEMO ONLY; for all samples substitute 50 by data_mut_tcga.shape[0]
    # prediction results of all 8238 TCGA samples can be found in /data/premodel_tcga_*.pickle

    input_dim = data_mut_tcga.shape[1]
    first_layer_dim = 1000
    second_layer_dim = 100
    third_layer_dim = 50
    batch_size = 64
    epoch_size = 100
    activation_function = 'relu'
    init = 'he_uniform'
    model_save_name = "premodel_tcga_mut_%d_%d_%d" % (first_layer_dim, second_layer_dim, third_layer_dim)

    t = time.time()
    model = AE_dense_3layers(input_dim=input_dim, first_layer_dim=first_layer_dim,
                             second_layer_dim=second_layer_dim, third_layer_dim=third_layer_dim,
                             activation_func=activation_function, init=init)
    model.compile(loss='mse', optimizer='adam')
    model.fit(data_mut_tcga[samples_to_predict], data_mut_tcga[samples_to_predict],
              nb_epoch=epoch_size, batch_size=batch_size, shuffle=True)
    cost = model.evaluate(data_mut_tcga[samples_to_predict], data_mut_tcga[samples_to_predict], verbose=0)
    print('\n\nAutoencoder training completed in %.1f mins.\nwith test loss: %.4f' % ((time.time() - t) / 60, cost))
    save_weight_to_pickle(model, r'D:\DEPOI/results/autoencoders/' + model_save_name + '_demo.pickle')
    print("\nResults saved in /results/autoencoders/%s_demo.pickle\n\n" % model_save_name)

After unsupervised pretraining, the weights are saved to a pickle file to be loaded later by the main training script.
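The pickle file written by save_weight_to_pickle is simply a list with one [weights, biases] pair per Dense layer. A minimal round-trip sketch of that format (the 4539-gene input size here is a made-up placeholder, not the real data size):

```python
import os
import pickle
import tempfile

import numpy as np

# Mimic the format produced by save_weight_to_pickle: a list with one
# [weights, biases] pair per Dense layer of the 1000-100-50 autoencoder.
dims = [4539, 1000, 100, 50, 100, 1000, 4539]
weight_list = [[np.zeros((dims[i], dims[i + 1]), dtype='float32'),
                np.zeros(dims[i + 1], dtype='float32')]
               for i in range(len(dims) - 1)]

path = os.path.join(tempfile.gettempdir(), 'premodel_demo.pickle')
with open(path, 'wb') as handle:
    pickle.dump(weight_list, handle)

with open(path, 'rb') as f:
    premodel = pickle.load(f)

# Main training reuses only the encoder half: premodel[0], premodel[1], premodel[2]
# are handed to Dense layers via the `weights=` argument.
for W, b in premodel[:3]:
    print(W.shape, b.shape)
```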

Main training

# Train, validate, and test single-, 2-, and full 4-omics DeepDEP models
print("\n\nStarting to run TrainNewModel.py with a demo example of 28 CCLs x 1298 DepOIs...")

import pickle
from keras import models
from keras.layers import Dense, Merge
from keras.callbacks import EarlyStopping
import numpy as np
import time
from matplotlib import pyplot as plt

if __name__ == '__main__':
    with open(r'D:\DEPOI/data/ccl_complete_data_278CCL_1298DepOI_360844samples.pickle', 'rb') as f:
        data_mut, data_exp, data_cna, data_meth, data_dep, data_fprint = pickle.load(f)
    # This pickle file is for DEMO ONLY (containing 28 CCLs x 1298 DepOIs = 36344 samples)!
    # First 1298 samples correspond to 1298 DepOIs of the first CCL, and so on.
    # For the complete data used in the paper (278 CCLs x 1298 DepOIs = 360844 samples),
    # please substitute by 'ccl_complete_data_278CCL_1298DepOI_360844samples.pickle',
    # to which a link can be found in README.md

    # Load autoencoders of each genomics that were pre-trained using 8238 TCGA samples.
    # New autoencoders can be pretrained using PretrainAE.py
    premodel_mut = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_mut_1000_100_50.pickle', 'rb'))
    premodel_exp = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_exp_500_200_50.pickle', 'rb'))
    premodel_cna = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_cna_500_200_50.pickle', 'rb'))
    premodel_meth = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_meth_500_200_50.pickle', 'rb'))
    print("\n\nDatasets successfully loaded.")

    activation_func = 'relu'     # for all middle layers
    activation_func2 = 'linear'  # for output layer to output unbounded gene-effect scores
    init = 'he_uniform'
    dense_layer_dim = 250
    batch_size = 10000
    num_epoch = 100
    num_DepOI = 1298  # 1298 DepOIs as defined in the paper
    num_ccl = int(data_mut.shape[0] / num_DepOI)

    # 90% of CCLs for training/validation, and 10% for testing
    id_rand = np.random.permutation(num_ccl)
    id_cell_train = id_rand[np.arange(0, round(num_ccl * 0.9))]
    id_cell_test = id_rand[np.arange(round(num_ccl * 0.9), num_ccl)]

    # prepare sample indices (selected CCLs x 1298 DepOIs)
    id_train = np.arange(0, 1298) + id_cell_train[0] * 1298
    for y in id_cell_train:
        id_train = np.union1d(id_train, np.arange(0, 1298) + y * 1298)
    id_test = np.arange(0, 1298) + id_cell_test[0] * 1298
    for y in id_cell_test:
        id_test = np.union1d(id_test, np.arange(0, 1298) + y * 1298)
    print("\n\nTraining/validation on %d samples (%d CCLs x %d DepOIs) and testing on %d samples (%d CCLs x %d DepOIs).\n\n" % (
        len(id_train), len(id_cell_train), num_DepOI, len(id_test), len(id_cell_test), num_DepOI))

    # Full 4-omic DeepDEP model, composed of 6 sub-networks:
    # model_mut, model_exp, model_cna, model_meth: to learn data embedding of each omics
    # model_gene: to learn data embedding of gene fingerprints (involvement of a gene in 3115 functions)
    # model_final: to merge the above 5 sub-networks and predict gene-effect scores
    t = time.time()

    # subnetwork of mutations
    model_mut = models.Sequential()
    model_mut.add(Dense(output_dim=1000, input_dim=premodel_mut[0][0].shape[0], activation=activation_func,
                        weights=premodel_mut[0], trainable=True))
    model_mut.add(Dense(output_dim=100, input_dim=1000, activation=activation_func, weights=premodel_mut[1],
                        trainable=True))
    model_mut.add(Dense(output_dim=50, input_dim=100, activation=activation_func, weights=premodel_mut[2],
                        trainable=True))

    # subnetwork of expression
    model_exp = models.Sequential()
    model_exp.add(Dense(output_dim=500, input_dim=premodel_exp[0][0].shape[0], activation=activation_func,
                        weights=premodel_exp[0], trainable=True))
    model_exp.add(Dense(output_dim=200, input_dim=500, activation=activation_func, weights=premodel_exp[1],
                        trainable=True))
    model_exp.add(Dense(output_dim=50, input_dim=200, activation=activation_func, weights=premodel_exp[2],
                        trainable=True))

    # subnetwork of copy number alterations
    model_cna = models.Sequential()
    model_cna.add(Dense(output_dim=500, input_dim=premodel_cna[0][0].shape[0], activation=activation_func,
                        weights=premodel_cna[0], trainable=True))
    model_cna.add(Dense(output_dim=200, input_dim=500, activation=activation_func, weights=premodel_cna[1],
                        trainable=True))
    model_cna.add(Dense(output_dim=50, input_dim=200, activation=activation_func, weights=premodel_cna[2],
                        trainable=True))

    # subnetwork of DNA methylation
    model_meth = models.Sequential()
    model_meth.add(Dense(output_dim=500, input_dim=premodel_meth[0][0].shape[0], activation=activation_func,
                         weights=premodel_meth[0], trainable=True))
    model_meth.add(Dense(output_dim=200, input_dim=500, activation=activation_func, weights=premodel_meth[1],
                         trainable=True))
    model_meth.add(Dense(output_dim=50, input_dim=200, activation=activation_func, weights=premodel_meth[2],
                         trainable=True))

    # subnetwork of gene fingerprints
    model_gene = models.Sequential()
    model_gene.add(Dense(output_dim=1000, input_dim=data_fprint.shape[1], activation=activation_func, init=init,
                         trainable=True))
    model_gene.add(Dense(output_dim=100, input_dim=1000, activation=activation_func, init=init, trainable=True))
    model_gene.add(Dense(output_dim=50, input_dim=100, activation=activation_func, init=init, trainable=True))

    # prediction network
    model_final = models.Sequential()
    model_final.add(Merge([model_mut, model_exp, model_cna, model_meth, model_gene], mode='concat'))
    model_final.add(Dense(output_dim=dense_layer_dim, input_dim=250, activation=activation_func, init=init,
                          trainable=True))
    model_final.add(Dense(output_dim=dense_layer_dim, input_dim=dense_layer_dim, activation=activation_func,
                          init=init, trainable=True))
    model_final.add(Dense(output_dim=1, input_dim=dense_layer_dim, activation=activation_func2, init=init,
                          trainable=True))

    # training with early stopping on validation loss (patience=100)
    history = EarlyStopping(monitor='val_loss', min_delta=0, patience=100, verbose=0, mode='min')
    model_final.compile(loss='mse', optimizer='adam')
    model_final.fit([data_mut[id_train], data_exp[id_train], data_cna[id_train], data_meth[id_train],
                     data_fprint[id_train]],
                    data_dep[id_train], nb_epoch=num_epoch, validation_split=1/9, batch_size=batch_size,
                    shuffle=True, callbacks=[history])
    cost_testing = model_final.evaluate([data_mut[id_test], data_exp[id_test], data_cna[id_test],
                                         data_meth[id_test], data_fprint[id_test]],
                                        data_dep[id_test], verbose=0, batch_size=batch_size)
    print("\n\nFull DeepDEP model training completed in %.1f mins.\nloss:%.4f valloss:%.4f testloss:%.4f" % (
        (time.time() - t) / 60,
        history.model.model.history.history['loss'][history.stopped_epoch],
        history.model.model.history.history['val_loss'][history.stopped_epoch], cost_testing))

    model_final.save(r'D:\DEPOI\results_cai/models/model_demo.h5')
    print("\n\nFull DeepDEP model saved in /results/models/model_demo.h5\n\n")

    # plot training and validation loss curves
    loss = history.model.model.history.history['loss']
    val_loss = history.model.model.history.history['val_loss']
    fig = plt.figure()
    plt.plot(loss, label="Training Loss")
    plt.plot(val_loss, label="Validation Loss")
    plt.title("Training and Validation Loss")
    plt.legend()
    fig.savefig("loss.png")
    plt.show()
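Note that the train/test split above is done at the cell-line level: each CCL occupies a contiguous block of 1298 rows (one per DepOI), so cell line y maps to rows np.arange(1298) + y * 1298, and no cell line's samples can leak across the split. A standalone sketch of that index expansion with toy sizes (5 CCLs x 4 DepOIs and an 80/20 split; these numbers are illustrative, not the paper's):

```python
import numpy as np

num_ccl, num_depoi = 5, 4  # toy sizes; the paper uses 278 CCLs x 1298 DepOIs
rng = np.random.RandomState(0)
id_rand = rng.permutation(num_ccl)
id_cell_train = id_rand[:round(num_ccl * 0.8)]  # 80/20 split for this toy example
id_cell_test = id_rand[round(num_ccl * 0.8):]

def expand(cell_ids):
    # every selected CCL contributes its full contiguous block of DepOI rows
    ids = np.arange(num_depoi) + cell_ids[0] * num_depoi
    for y in cell_ids:
        ids = np.union1d(ids, np.arange(num_depoi) + y * num_depoi)
    return ids

id_train, id_test = expand(id_cell_train), expand(id_cell_test)
print(len(id_train), len(id_test))        # 16 4
print(np.intersect1d(id_train, id_test))  # [] : no sample appears in both splits
```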

Prediction: checking model performance

# Predict TCGA (or other new) samples using a trained model
print("\n\nStarting to run PredictNewSamples.py with a demo example of 10 TCGA tumors...")

import numpy as np
import pandas as pd
from keras import models
import time
import tensorflow as tf
import pickle

if __name__ == '__main__':
    model_name = "model_demo"
    model_saved = models.load_model(r"D:\DEPOI\results_cai/models/%s.h5" % model_name)
    # model_paper is the full 4-omics DeepDEP model used in the paper.
    # Users can choose single-omics, 2-omics, or full DeepDEP models from the
    # /data/full_results_models_paper/models/ directory
    with open(r'D:\DEPOI/data/ccl_complete_data_28CCL_1298DepOI_36344samples_demo.pickle', 'rb') as f:
        data_mut, data_exp, data_cna, data_meth, data_dep, data_fprint = pickle.load(f)
    print("\n\nDatasets successfully loaded.\n\n")

    batch_size = 500
    # first 10 samples for DEMO ONLY; for all samples substitute 10 by data_mut_tcga.shape[0]
    # prediction results of all 8238 TCGA samples can be found in /data/full_results_models_paper/predictions/
    t = time.time()
    y = data_dep
    data_pred_tmp = model_saved.predict([data_mut, data_exp, data_cna, data_meth, data_fprint],
                                        batch_size=batch_size, verbose=0)

    def MSE(y, t):
        return np.sum((y - t) ** 2)

    # mean squared error between measured and predicted gene-effect scores
    T = np.array(y[:, 0])
    P = np.array(data_pred_tmp[:, 0])
    X = MSE(P, T) / data_mut.shape[0]
    print(X)
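The evaluation above amounts to the mean squared error between measured and predicted gene-effect scores (sum of squared errors divided by the number of samples). A self-contained sketch on synthetic scores, with Pearson correlation added as a commonly used companion metric (the noise level here is arbitrary, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic stand-ins for measured (data_dep) and predicted gene-effect scores
y_true = rng.standard_normal(1000)
y_pred = y_true + 0.3 * rng.standard_normal(1000)  # arbitrary noise level

# mean squared error, as in the script above: sum of squared errors / n samples
mse = np.sum((y_pred - y_true) ** 2) / len(y_true)

# Pearson correlation between predicted and measured scores
r = np.corrcoef(y_pred, y_true)[0, 1]
print('MSE: %.3f  Pearson r: %.3f' % (mse, r))
```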

That concludes this walkthrough of building a tumor dependency map with deep learning; we hope it proves helpful.



http://www.chinasem.cn/article/664379
