Building a Cancer Dependency Map of Tumors with Deep Learning

2024-01-31 16:30

This article introduces a deep-learning approach to building a cancer dependency map of tumors. I hope it offers developers some useful reference; let's walk through it together!

Source paper:

Predicting and characterizing a cancer dependency map of tumors with deep learning

Code available at: Code Ocean

Hi everyone! Today I'd like to introduce a paper published in Science Advances in 2021.

Genome-wide loss-of-function screens have revealed genes that are essential for cancer cell proliferation, referred to as tumor dependencies.

However, linking tumor dependencies to the molecular makeup of cancer cells, and further to tumors, remains a major challenge.

In this study, the authors propose DeepDEP, a deep-learning model built on the TensorFlow framework.

Version requirements:

  • tensorflow 1.4.0
  • python 3.5.2
  • cuda 8.0.61
  • cudnn 6.0.21
  • h5py==2.7.1
  • keras==1.2.2

First, the authors pretrain the model in an unsupervised fashion on unlabeled tumor genomic data (TCGA) and save the resulting weights.

Unsupervised pretraining (the training labels are identical to the inputs, with activation functions applied, i.e., an autoencoder)
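The idea above can be shown with a minimal NumPy sketch (my illustration, not the authors' code): an autoencoder is trained with the input itself as the label, minimizing reconstruction MSE by plain gradient descent.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(8, 5).astype('float32')               # 8 samples, 5 features

W_enc = rng.randn(5, 2).astype('float32') * 0.1    # encoder weights: 5 -> 2
W_dec = rng.randn(2, 5).astype('float32') * 0.1    # decoder weights: 2 -> 5

def loss(X, W_enc, W_dec):
    X_hat = (X @ W_enc) @ W_dec                    # reconstruction
    return np.mean((X_hat - X) ** 2)               # label is the input itself

loss_before = loss(X, W_enc, W_dec)
for _ in range(200):                               # plain gradient descent
    H = X @ W_enc
    E = (H @ W_dec) - X                            # reconstruction error
    grad_dec = H.T @ (2 * E / E.size)
    grad_enc = X.T @ ((2 * E / E.size) @ W_dec.T)
    W_dec -= 0.1 * grad_dec
    W_enc -= 0.1 * grad_enc
loss_after = loss(X, W_enc, W_dec)
print(loss_before, '->', loss_after)               # reconstruction loss decreases
```

The real pretraining below does the same thing with a deeper Keras network, but the objective (input as label, MSE loss) is identical.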

Model flowchart: (figure from the paper, not reproduced here)

The authors validated DeepDEP's performance on three independent datasets. Through systematic model interpretation, they extended the current cancer dependency map. Applying DeepDEP to pan-cancer tumor genomic data, they built the first clinically relevant pan-cancer dependency map. Overall, DeepDEP serves as a new tool for studying cancer dependencies.

Unsupervised pretraining

# Pretrain an autoencoder (AE) of tumor genomics (TCGA) to be used to initialize DeepDEP model training
print("\n\nStarting to run PretrainAE.py with a demo example of gene mutation data of 50 TCGA tumors...")

import pickle
from keras import models
from keras.layers import Dense, Merge
from keras.callbacks import EarlyStopping
import numpy as np
import time

def load_data(filename):
    data = []
    gene_names = []
    data_labels = []
    lines = open(filename).readlines()  # read the whole file
    sample_names = lines[0].replace('\n', '').split('\t')[1:]  # strip newline, split header by tab
    dx = 1
    for line in lines[dx:]:
        values = line.replace('\n', '').split('\t')
        gene = str.upper(values[0])  # convert gene symbols to upper case
        gene_names.append(gene)
        data.append(values[1:])
    data = np.array(data, dtype='float32')
    data = np.transpose(data)
    return data, data_labels, sample_names, gene_names

def AE_dense_3layers(input_dim, first_layer_dim, second_layer_dim, third_layer_dim, activation_func, init='he_uniform'):
    print('input_dim = ', input_dim)
    print('first_layer_dim = ', first_layer_dim)
    print('second_layer_dim = ', second_layer_dim)
    print('third_layer_dim = ', third_layer_dim)
    print('init = ', init)
    # symmetric encoder-decoder: input -> 1000 -> 100 -> 50 -> 100 -> 1000 -> input
    model = models.Sequential()
    model.add(Dense(output_dim=first_layer_dim, input_dim=input_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=second_layer_dim, input_dim=first_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=third_layer_dim, input_dim=second_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=second_layer_dim, input_dim=third_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=first_layer_dim, input_dim=second_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=input_dim, input_dim=first_layer_dim, activation=activation_func, init=init))
    return model

def save_weight_to_pickle(model, file_name):
    print('saving weights')
    weight_list = []
    for layer in model.layers:
        weight_list.append(layer.get_weights())
    with open(file_name, 'wb') as handle:
        pickle.dump(weight_list, handle)

if __name__ == '__main__':
    # load TCGA mutation data; substitute here with other genomics
    data_mut_tcga, data_labels_mut_tcga, sample_names_mut_tcga, gene_names_mut_tcga = load_data(
        r"D:\DEPOI\data/tcga_mut_data_paired_with_ccl.txt")
    print("\n\nDatasets successfully loaded.")

    samples_to_predict = np.arange(0, 50)
    # the first 50 samples are used for DEMO ONLY; for all samples substitute 50 by data_mut_tcga.shape[0]
    # prediction results of all 8238 TCGA samples can be found in /data/premodel_tcga_*.pickle

    input_dim = data_mut_tcga.shape[1]
    first_layer_dim = 1000
    second_layer_dim = 100
    third_layer_dim = 50
    batch_size = 64
    epoch_size = 100
    activation_function = 'relu'
    init = 'he_uniform'
    model_save_name = "premodel_tcga_mut_%d_%d_%d" % (first_layer_dim, second_layer_dim, third_layer_dim)

    t = time.time()
    model = AE_dense_3layers(input_dim=input_dim, first_layer_dim=first_layer_dim,
                             second_layer_dim=second_layer_dim, third_layer_dim=third_layer_dim,
                             activation_func=activation_function, init=init)
    model.compile(loss='mse', optimizer='adam')
    # the training labels are the inputs themselves (autoencoder)
    model.fit(data_mut_tcga[samples_to_predict], data_mut_tcga[samples_to_predict],
              nb_epoch=epoch_size, batch_size=batch_size, shuffle=True)
    cost = model.evaluate(data_mut_tcga[samples_to_predict], data_mut_tcga[samples_to_predict], verbose=0)
    print('\n\nAutoencoder training completed in %.1f mins.\n with testloss:%.4f' % ((time.time() - t) / 60, cost))

    save_weight_to_pickle(model, r'D:\DEPOI/results/autoencoders/' + model_save_name + '_demo.pickle')
    print("\nResults saved in /results/autoencoders/%s_demo.pickle\n\n" % model_save_name)

After unsupervised pretraining, the weights are saved to a pickle file and later loaded into the main training model.
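The weight hand-off works as follows (a minimal sketch, my illustration rather than the repo's code): save_weight_to_pickle stores a list with one [kernel, bias] pair per Dense layer, and the training script later passes each pair to Dense(..., weights=...). The file name here is hypothetical.

```python
import os
import pickle
import tempfile
import numpy as np

rng = np.random.RandomState(0)
# simulate layer.get_weights() output for two Dense layers: [kernel, bias]
weight_list = [
    [rng.randn(100, 50).astype('float32'), np.zeros(50, dtype='float32')],
    [rng.randn(50, 10).astype('float32'), np.zeros(10, dtype='float32')],
]

path = os.path.join(tempfile.gettempdir(), 'premodel_demo.pickle')
with open(path, 'wb') as handle:
    pickle.dump(weight_list, handle)    # same format as save_weight_to_pickle

with open(path, 'rb') as handle:
    loaded = pickle.load(handle)

# each loaded[i] can then initialize a Dense layer, e.g.
# Dense(output_dim=50, input_dim=100, weights=loaded[0], trainable=True)
print(len(loaded), loaded[0][0].shape)  # -> 2 (100, 50)
```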

Main training

# Train, validate, and test single-, 2-, and full 4-omics DeepDEP models
print("\n\nStarting to run TrainNewModel.py with a demo example of 28 CCLs x 1298 DepOIs...")

import pickle
from keras import models
from keras.layers import Dense, Merge
from keras.callbacks import EarlyStopping
import numpy as np
import time
from matplotlib import pyplot as plt

if __name__ == '__main__':
    with open(r'D:\DEPOI/data/ccl_complete_data_278CCL_1298DepOI_360844samples.pickle', 'rb') as f:
        data_mut, data_exp, data_cna, data_meth, data_dep, data_fprint = pickle.load(f)
    # This pickle file is for DEMO ONLY (containing 28 CCLs x 1298 DepOIs = 36344 samples)!
    # First 1298 samples correspond to 1298 DepOIs of the first CCL, and so on.
    # For the complete data used in the paper (278 CCLs x 1298 DepOIs = 360844 samples),
    # please substitute by 'ccl_complete_data_278CCL_1298DepOI_360844samples.pickle',
    # to which a link can be found in README.md

    # Load autoencoders of each genomics that were pre-trained using 8238 TCGA samples
    # New autoencoders can be pretrained using PretrainAE.py
    premodel_mut = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_mut_1000_100_50.pickle', 'rb'))
    premodel_exp = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_exp_500_200_50.pickle', 'rb'))
    premodel_cna = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_cna_500_200_50.pickle', 'rb'))
    premodel_meth = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_meth_500_200_50.pickle', 'rb'))
    print("\n\nDatasets successfully loaded.")

    activation_func = 'relu'     # for all middle layers
    activation_func2 = 'linear'  # for output layer to output unbounded gene-effect scores
    init = 'he_uniform'
    dense_layer_dim = 250
    batch_size = 10000
    num_epoch = 100
    num_DepOI = 1298  # 1298 DepOIs as defined in the paper
    num_ccl = int(data_mut.shape[0] / num_DepOI)

    # 90% CCLs for training/validation, and 10% for testing
    id_rand = np.random.permutation(num_ccl)
    id_cell_train = id_rand[np.arange(0, round(num_ccl * 0.9))]
    id_cell_test = id_rand[np.arange(round(num_ccl * 0.9), num_ccl)]

    # prepare sample indices (selected CCLs x 1298 DepOIs)
    id_train = np.arange(0, 1298) + id_cell_train[0] * 1298
    for y in id_cell_train:
        id_train = np.union1d(id_train, np.arange(0, 1298) + y * 1298)
    id_test = np.arange(0, 1298) + id_cell_test[0] * 1298
    for y in id_cell_test:
        id_test = np.union1d(id_test, np.arange(0, 1298) + y * 1298)
    print("\n\nTraining/validation on %d samples (%d CCLs x %d DepOIs) and testing on %d samples (%d CCLs x %d DepOIs).\n\n" % (
        len(id_train), len(id_cell_train), num_DepOI, len(id_test), len(id_cell_test), num_DepOI))

    # Full 4-omic DeepDEP model, composed of 6 sub-networks:
    # model_mut, model_exp, model_cna, model_meth: to learn data embedding of each omics
    # model_gene: to learn data embedding of gene fingerprints (involvement of a gene in 3115 functions)
    # model_final: to merge the above 5 sub-networks and predict gene-effect scores
    t = time.time()

    # subnetwork of mutations
    model_mut = models.Sequential()
    model_mut.add(Dense(output_dim=1000, input_dim=premodel_mut[0][0].shape[0], activation=activation_func,
                        weights=premodel_mut[0], trainable=True))
    model_mut.add(Dense(output_dim=100, input_dim=1000, activation=activation_func,
                        weights=premodel_mut[1], trainable=True))
    model_mut.add(Dense(output_dim=50, input_dim=100, activation=activation_func,
                        weights=premodel_mut[2], trainable=True))

    # subnetwork of expression
    model_exp = models.Sequential()
    model_exp.add(Dense(output_dim=500, input_dim=premodel_exp[0][0].shape[0], activation=activation_func,
                        weights=premodel_exp[0], trainable=True))
    model_exp.add(Dense(output_dim=200, input_dim=500, activation=activation_func,
                        weights=premodel_exp[1], trainable=True))
    model_exp.add(Dense(output_dim=50, input_dim=200, activation=activation_func,
                        weights=premodel_exp[2], trainable=True))

    # subnetwork of copy number alterations
    model_cna = models.Sequential()
    model_cna.add(Dense(output_dim=500, input_dim=premodel_cna[0][0].shape[0], activation=activation_func,
                        weights=premodel_cna[0], trainable=True))
    model_cna.add(Dense(output_dim=200, input_dim=500, activation=activation_func,
                        weights=premodel_cna[1], trainable=True))
    model_cna.add(Dense(output_dim=50, input_dim=200, activation=activation_func,
                        weights=premodel_cna[2], trainable=True))

    # subnetwork of DNA methylations
    model_meth = models.Sequential()
    model_meth.add(Dense(output_dim=500, input_dim=premodel_meth[0][0].shape[0], activation=activation_func,
                         weights=premodel_meth[0], trainable=True))
    model_meth.add(Dense(output_dim=200, input_dim=500, activation=activation_func,
                         weights=premodel_meth[1], trainable=True))
    model_meth.add(Dense(output_dim=50, input_dim=200, activation=activation_func,
                         weights=premodel_meth[2], trainable=True))

    # subnetwork of gene fingerprints
    model_gene = models.Sequential()
    model_gene.add(Dense(output_dim=1000, input_dim=data_fprint.shape[1], activation=activation_func,
                         init=init, trainable=True))
    model_gene.add(Dense(output_dim=100, input_dim=1000, activation=activation_func, init=init, trainable=True))
    model_gene.add(Dense(output_dim=50, input_dim=100, activation=activation_func, init=init, trainable=True))

    # prediction network
    model_final = models.Sequential()
    model_final.add(Merge([model_mut, model_exp, model_cna, model_meth, model_gene], mode='concat'))
    model_final.add(Dense(output_dim=dense_layer_dim, input_dim=250, activation=activation_func,
                          init=init, trainable=True))
    model_final.add(Dense(output_dim=dense_layer_dim, input_dim=dense_layer_dim, activation=activation_func,
                          init=init, trainable=True))
    model_final.add(Dense(output_dim=1, input_dim=dense_layer_dim, activation=activation_func2,
                          init=init, trainable=True))

    # training with early stopping (patience set to 100 epochs here)
    history = EarlyStopping(monitor='val_loss', min_delta=0, patience=100, verbose=0, mode='min')
    model_final.compile(loss='mse', optimizer='adam')
    model_final.fit([data_mut[id_train], data_exp[id_train], data_cna[id_train], data_meth[id_train],
                     data_fprint[id_train]], data_dep[id_train], nb_epoch=num_epoch, validation_split=1/9,
                    batch_size=batch_size, shuffle=True, callbacks=[history])
    cost_testing = model_final.evaluate([data_mut[id_test], data_exp[id_test], data_cna[id_test],
                                         data_meth[id_test], data_fprint[id_test]], data_dep[id_test],
                                        verbose=0, batch_size=batch_size)
    print("\n\nFull DeepDEP model training completed in %.1f mins.\nloss:%.4f valloss:%.4f testloss:%.4f" % (
        (time.time() - t) / 60,
        history.model.model.history.history['loss'][history.stopped_epoch],
        history.model.model.history.history['val_loss'][history.stopped_epoch], cost_testing))

    model_final.save(r'D:\DEPOI\results_cai/models/model_demo.h5')
    print("\n\nFull DeepDEP model saved in /results/models/model_demo.h5\n\n")

    # plot training and validation loss curves
    loss = history.model.model.history.history['loss']
    val_loss = history.model.model.history.history['val_loss']
    fig = plt.figure()
    plt.plot(loss, label="Training Loss")
    plt.plot(val_loss, label="Validation Loss")
    plt.title("Training and Validation Loss")
    plt.legend()
    fig.savefig("loss.png")
    plt.show()
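The sample-index construction in the training script can be illustrated with toy sizes (my sketch, using 4 DepOIs instead of the paper's 1298): each CCL occupies a contiguous block of num_DepOI rows, so a CCL id y expands to rows y*num_DepOI through (y+1)*num_DepOI - 1, and np.union1d merges the blocks of all selected CCLs.

```python
import numpy as np

num_DepOI = 4                       # toy value (the paper uses 1298)
id_cell_train = np.array([2, 0])    # toy CCL ids selected for training

# same pattern as the training script: seed with the first CCL's block,
# then union in the block of every selected CCL
id_train = np.arange(0, num_DepOI) + id_cell_train[0] * num_DepOI
for y in id_cell_train:
    id_train = np.union1d(id_train, np.arange(0, num_DepOI) + y * num_DepOI)

print(id_train)  # rows of CCLs 0 and 2: [ 0  1  2  3  8  9 10 11]
```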

Prediction: examining model performance

# Predict TCGA (or other new) samples using a trained model
print("\n\nStarting to run PredictNewSamples.py with a demo example of 10 TCGA tumors...")

import numpy as np
import pandas as pd
from keras import models
import time
import tensorflow as tf
import pickle

if __name__ == '__main__':
    model_name = "model_demo"
    model_saved = models.load_model(r"D:\DEPOI\results_cai/models/%s.h5" % model_name)
    # model_paper is the full 4-omics DeepDEP model used in the paper
    # users can choose from single-omics, 2-omics, or full DeepDEP models in the
    # /data/full_results_models_paper/models/ directory

    with open(r'D:\DEPOI/data/ccl_complete_data_28CCL_1298DepOI_36344samples_demo.pickle', 'rb') as f:
        data_mut, data_exp, data_cna, data_meth, data_dep, data_fprint = pickle.load(f)
    print("\n\nDatasets successfully loaded.\n\n")

    batch_size = 500
    # predict the first 10 samples for DEMO ONLY; for all samples substitute 10 by data_mut_tcga.shape[0]
    # prediction results of all 8238 TCGA samples can be found in /data/full_results_models_paper/predictions/
    # t = time.time()

    y = data_dep
    data_pred_tmp = model_saved.predict([data_mut, data_exp, data_cna, data_meth, data_fprint],
                                        batch_size=batch_size, verbose=0)

    def MSE(y, t):
        return np.sum((y - t) ** 2)

    T = []
    T[:] = y[:, 0]              # measured gene-effect scores
    P = []
    P[:] = data_pred_tmp[:, 0]  # predicted gene-effect scores
    x = MSE(np.array(P), np.array(T)).sum()
    X = x / data_mut.shape[0]   # sum of squared errors divided by number of samples
    print(X)
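The error metric computed at the end of the prediction script reduces to this (a small sketch with toy numbers, not real predictions): the sum of squared errors between predicted and measured scores, divided by the number of samples.

```python
import numpy as np

def MSE(y, t):
    # sum of squared errors, as in the prediction script above
    return np.sum((y - t) ** 2)

T = np.array([1.0, 2.0, 3.0, 4.0])   # measured gene-effect scores (toy)
P = np.array([1.5, 2.0, 2.0, 4.0])   # predicted scores (toy)

X = MSE(P, T) / T.shape[0]           # per-sample mean squared error
print(X)  # -> 0.3125
```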

That concludes this introduction to building a cancer dependency map of tumors with deep learning. I hope the article is helpful to fellow programmers!



