Constructing a Cancer Dependency Map of Tumors with Deep Learning

2024-01-31 16:30

This article walks through how deep learning is used to construct a cancer dependency map of tumors. Hopefully it offers some useful reference for developers working on similar problems; if that is you, read on.

Source paper:

Predicting and characterizing a cancer dependency map of tumors with deep learning

Code: Code Ocean

Hello everyone! Today I would like to introduce a paper published in Science Advances in 2021.

Genome-wide loss-of-function screens have revealed genes that are essential for cancer cell proliferation, known as tumor dependencies.

However, linking these tumor dependencies to the molecular makeup of cancer cells, and further to actual tumors, remains a major challenge.

In this study, the authors propose DeepDEP, a deep learning model built on the TensorFlow framework.

Version requirements (a quick environment check is sketched right after this list):

  • tensorflow 1.4.0
  • python 3.5.2
  • CUDA 8.0.61
  • cuDNN 6.0.21
  • h5py 2.7.1
  • keras 1.2.2
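
Since the scripts rely on the old Keras 1.x API (Merge, output_dim, nb_epoch), it is worth confirming the environment before running anything. A minimal sanity check, assuming the packages above are installed (CUDA/cuDNN versions are set at the toolkit/driver level and are not checked here):

import sys
print("python:", sys.version.split()[0])   # expected 3.5.2

import tensorflow as tf
import keras
import h5py
print("tensorflow:", tf.__version__)       # expected 1.4.0
print("keras:", keras.__version__)         # expected 1.2.2
print("h5py:", h5py.__version__)           # expected 2.7.1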

First, the authors pretrain the model in an unsupervised manner on unlabeled tumor genomic profiles (TCGA data, with features matched to the CCLs) and save the resulting weights.

Unsupervised pretraining (the training inputs double as their own labels, and every layer has an activation function)
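
In other words, the pretraining step simply trains each autoencoder to reconstruct its own input; no gene-dependency labels are involved. A minimal conceptual sketch of this idea, using the same Keras 1.x-style API as the scripts below (the layer sizes here are illustrative, not the paper's):

import numpy as np
from keras import models
from keras.layers import Dense

# toy stand-in for a genomic matrix: 100 samples x 2000 binary mutation features
X = np.random.randint(0, 2, size=(100, 2000)).astype('float32')

ae = models.Sequential()
ae.add(Dense(output_dim=64, input_dim=X.shape[1], activation='relu', init='he_uniform'))  # encoder
ae.add(Dense(output_dim=X.shape[1], input_dim=64, activation='relu', init='he_uniform'))  # decoder
ae.compile(loss='mse', optimizer='adam')

# the "label" is the input itself: unsupervised reconstruction
ae.fit(X, X, nb_epoch=5, batch_size=32, shuffle=True)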

Model flow diagram (figure not reproduced here; see the DeepDEP architecture overview in the paper).

The authors validated DeepDEP's performance on three independent datasets. Through systematic model interpretation, they extended the current cancer dependency map. Applying DeepDEP to pan-cancer tumor genomic data, they constructed, for the first time, a clinically relevant pan-cancer dependency map. Overall, DeepDEP serves as a new tool for studying cancer dependencies.

Unsupervised pretraining

# Pretrain an autoencoder (AE) of tumor genomics (TCGA) to be used to initialize DeepDEP model training
print("\n\nStarting to run PretrainAE.py with a demo example of gene mutation data of 50 TCGA tumors...")

import pickle
from keras import models
from keras.layers import Dense, Merge
from keras.callbacks import EarlyStopping
import numpy as np
import time

def load_data(filename):
    data = []
    gene_names = []
    data_labels = []
    lines = open(filename).readlines()  # readlines() pulls in the whole file
    sample_names = lines[0].replace('\n', '').split('\t')[1:]  # strip the newline and split the header row by tabs
    dx = 1
    for line in lines[dx:]:
        values = line.replace('\n', '').split('\t')
        gene = str.upper(values[0])  # upper() converts the gene symbol to upper case
        gene_names.append(gene)
        data.append(values[1:])
    data = np.array(data, dtype='float32')
    data = np.transpose(data)  # samples x genes
    return data, data_labels, sample_names, gene_names

def AE_dense_3layers(input_dim, first_layer_dim, second_layer_dim, third_layer_dim, activation_func, init='he_uniform'):
    print('input_dim = ', input_dim)
    print('first_layer_dim = ', first_layer_dim)
    print('second_layer_dim = ', second_layer_dim)
    print('third_layer_dim = ', third_layer_dim)
    print('init = ', init)
    # symmetric encoder-decoder: input -> 1000 -> 100 -> 50 -> 100 -> 1000 -> input
    model = models.Sequential()
    model.add(Dense(output_dim=first_layer_dim, input_dim=input_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=second_layer_dim, input_dim=first_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=third_layer_dim, input_dim=second_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=second_layer_dim, input_dim=third_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=first_layer_dim, input_dim=second_layer_dim, activation=activation_func, init=init))
    model.add(Dense(output_dim=input_dim, input_dim=first_layer_dim, activation=activation_func, init=init))
    return model

def save_weight_to_pickle(model, file_name):
    print('saving weights')
    weight_list = []
    for layer in model.layers:
        weight_list.append(layer.get_weights())
    with open(file_name, 'wb') as handle:
        pickle.dump(weight_list, handle)

if __name__ == '__main__':
    # load TCGA mutation data, substitute here with other genomics
    data_mut_tcga, data_labels_mut_tcga, sample_names_mut_tcga, gene_names_mut_tcga = load_data(
        r"D:\DEPOI\data/tcga_mut_data_paired_with_ccl.txt")
    print("\n\nDatasets successfully loaded.")

    samples_to_predict = np.arange(0, 50)
    # the first 50 samples are used for DEMO ONLY; for all samples substitute 50 by data_mut_tcga.shape[0]
    # prediction results of all 8238 TCGA samples can be found in /data/premodel_tcga_*.pickle
    print()

    input_dim = data_mut_tcga.shape[1]
    first_layer_dim = 1000
    second_layer_dim = 100
    third_layer_dim = 50
    batch_size = 64
    epoch_size = 100
    activation_function = 'relu'
    init = 'he_uniform'
    model_save_name = "premodel_tcga_mut_%d_%d_%d" % (first_layer_dim, second_layer_dim, third_layer_dim)

    t = time.time()
    model = AE_dense_3layers(input_dim=input_dim, first_layer_dim=first_layer_dim,
                             second_layer_dim=second_layer_dim, third_layer_dim=third_layer_dim,
                             activation_func=activation_function, init=init)
    model.compile(loss='mse', optimizer='adam')
    # unsupervised pretraining: the input is its own reconstruction target
    model.fit(data_mut_tcga[samples_to_predict], data_mut_tcga[samples_to_predict],
              nb_epoch=epoch_size, batch_size=batch_size, shuffle=True)
    cost = model.evaluate(data_mut_tcga[samples_to_predict], data_mut_tcga[samples_to_predict], verbose=0)
    print('\n\nAutoencoder training completed in %.1f mins.\n with testloss:%.4f' % ((time.time() - t) / 60, cost))

    save_weight_to_pickle(model, r'D:\DEPOI/results/autoencoders/' + model_save_name + '_demo.pickle')
    print("\nResults saved in /results/autoencoders/%s_demo.pickle\n\n" % model_save_name)

After unsupervised pretraining, the weights are saved to a pickle file so they can later be loaded into the main training model. The same script is rerun for the other omics (expression, copy number, methylation), which is where the premodel_tcga_exp/cna/meth pickle files loaded below come from.
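
Because save_weight_to_pickle simply dumps a list with one layer.get_weights() entry (a [weights, biases] pair) per Dense layer, the saved file can be inspected before transfer. A small sketch, assuming the demo pickle produced by the script above:

import pickle

# path written by the pretraining demo above
with open(r'D:\DEPOI/results/autoencoders/premodel_tcga_mut_1000_100_50_demo.pickle', 'rb') as f:
    premodel_mut = pickle.load(f)

# one entry per Dense layer of the autoencoder: [weight_matrix, bias_vector]
for i, (W, b) in enumerate(premodel_mut):
    print("layer %d: W %s, b %s" % (i, W.shape, b.shape))

# only the first three (encoder) entries are reused below, e.g.
# Dense(output_dim=1000, ..., weights=premodel_mut[0]) in the training script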

Main training

# Train, validate, and test single-, 2-, and full 4-omics DeepDEP models
print("\n\nStarting to run TrainNewModel.py with a demo example of 28 CCLs x 1298 DepOIs...")

import pickle
from keras import models
from keras.layers import Dense, Merge
from keras.callbacks import EarlyStopping
import numpy as np
import time
from matplotlib import pyplot as plt

if __name__ == '__main__':
    with open(r'D:\DEPOI/data/ccl_complete_data_278CCL_1298DepOI_360844samples.pickle', 'rb') as f:
        data_mut, data_exp, data_cna, data_meth, data_dep, data_fprint = pickle.load(f)
    # The demo pickle file contains 28 CCLs x 1298 DepOIs = 36344 samples.
    # First 1298 samples correspond to 1298 DepOIs of the first CCL, and so on.
    # For the complete data used in the paper (278 CCLs x 1298 DepOIs = 360844 samples),
    # please substitute by 'ccl_complete_data_278CCL_1298DepOI_360844samples.pickle',
    # to which a link can be found in README.md

    # Load autoencoders of each omics that were pre-trained using 8238 TCGA samples.
    # New autoencoders can be pretrained using PretrainAE.py.
    premodel_mut = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_mut_1000_100_50.pickle', 'rb'))
    premodel_exp = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_exp_500_200_50.pickle', 'rb'))
    premodel_cna = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_cna_500_200_50.pickle', 'rb'))
    premodel_meth = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_meth_500_200_50.pickle', 'rb'))
    print("\n\nDatasets successfully loaded.")

    activation_func = 'relu'     # for all middle layers
    activation_func2 = 'linear'  # for the output layer, to output unbounded gene-effect scores
    init = 'he_uniform'
    dense_layer_dim = 250
    batch_size = 10000
    num_epoch = 100
    num_DepOI = 1298  # 1298 DepOIs as defined in the paper
    num_ccl = int(data_mut.shape[0] / num_DepOI)

    # 90% of CCLs for training/validation, 10% for testing
    id_rand = np.random.permutation(num_ccl)
    id_cell_train = id_rand[np.arange(0, round(num_ccl * 0.9))]
    id_cell_test = id_rand[np.arange(round(num_ccl * 0.9), num_ccl)]

    # prepare sample indices (selected CCLs x 1298 DepOIs)
    id_train = np.arange(0, 1298) + id_cell_train[0] * 1298
    for y in id_cell_train:
        id_train = np.union1d(id_train, np.arange(0, 1298) + y * 1298)
    id_test = np.arange(0, 1298) + id_cell_test[0] * 1298
    for y in id_cell_test:
        id_test = np.union1d(id_test, np.arange(0, 1298) + y * 1298)
    print("\n\nTraining/validation on %d samples (%d CCLs x %d DepOIs) and testing on %d samples (%d CCLs x %d DepOIs).\n\n" % (
        len(id_train), len(id_cell_train), num_DepOI, len(id_test), len(id_cell_test), num_DepOI))

    # Full 4-omics DeepDEP model, composed of 6 sub-networks:
    # model_mut, model_exp, model_cna, model_meth: learn a data embedding of each omics
    # model_gene: learns a data embedding of gene fingerprints (involvement of a gene in 3115 functions)
    # model_final: merges the above 5 sub-networks and predicts gene-effect scores
    t = time.time()

    # subnetwork of mutations
    model_mut = models.Sequential()
    model_mut.add(Dense(output_dim=1000, input_dim=premodel_mut[0][0].shape[0], activation=activation_func,
                        weights=premodel_mut[0], trainable=True))
    model_mut.add(Dense(output_dim=100, input_dim=1000, activation=activation_func, weights=premodel_mut[1],
                        trainable=True))
    model_mut.add(Dense(output_dim=50, input_dim=100, activation=activation_func, weights=premodel_mut[2],
                        trainable=True))

    # subnetwork of expression
    model_exp = models.Sequential()
    model_exp.add(Dense(output_dim=500, input_dim=premodel_exp[0][0].shape[0], activation=activation_func,
                        weights=premodel_exp[0], trainable=True))
    model_exp.add(Dense(output_dim=200, input_dim=500, activation=activation_func, weights=premodel_exp[1],
                        trainable=True))
    model_exp.add(Dense(output_dim=50, input_dim=200, activation=activation_func, weights=premodel_exp[2],
                        trainable=True))

    # subnetwork of copy number alterations
    model_cna = models.Sequential()
    model_cna.add(Dense(output_dim=500, input_dim=premodel_cna[0][0].shape[0], activation=activation_func,
                        weights=premodel_cna[0], trainable=True))
    model_cna.add(Dense(output_dim=200, input_dim=500, activation=activation_func, weights=premodel_cna[1],
                        trainable=True))
    model_cna.add(Dense(output_dim=50, input_dim=200, activation=activation_func, weights=premodel_cna[2],
                        trainable=True))

    # subnetwork of DNA methylation
    model_meth = models.Sequential()
    model_meth.add(Dense(output_dim=500, input_dim=premodel_meth[0][0].shape[0], activation=activation_func,
                         weights=premodel_meth[0], trainable=True))
    model_meth.add(Dense(output_dim=200, input_dim=500, activation=activation_func, weights=premodel_meth[1],
                         trainable=True))
    model_meth.add(Dense(output_dim=50, input_dim=200, activation=activation_func, weights=premodel_meth[2],
                         trainable=True))

    # subnetwork of gene fingerprints (trained from scratch, no pretrained weights)
    model_gene = models.Sequential()
    model_gene.add(Dense(output_dim=1000, input_dim=data_fprint.shape[1], activation=activation_func, init=init,
                         trainable=True))
    model_gene.add(Dense(output_dim=100, input_dim=1000, activation=activation_func, init=init, trainable=True))
    model_gene.add(Dense(output_dim=50, input_dim=100, activation=activation_func, init=init, trainable=True))

    # prediction network: concatenate the 5 embeddings (5 x 50 = 250) and regress the gene-effect score
    model_final = models.Sequential()
    model_final.add(Merge([model_mut, model_exp, model_cna, model_meth, model_gene], mode='concat'))
    model_final.add(Dense(output_dim=dense_layer_dim, input_dim=250, activation=activation_func, init=init,
                          trainable=True))
    model_final.add(Dense(output_dim=dense_layer_dim, input_dim=dense_layer_dim, activation=activation_func, init=init,
                          trainable=True))
    model_final.add(Dense(output_dim=1, input_dim=dense_layer_dim, activation=activation_func2, init=init,
                          trainable=True))

    # training with early stopping on the validation loss (patience = 100)
    history = EarlyStopping(monitor='val_loss', min_delta=0, patience=100, verbose=0, mode='min')
    model_final.compile(loss='mse', optimizer='adam')
    model_final.fit([data_mut[id_train], data_exp[id_train], data_cna[id_train], data_meth[id_train],
                     data_fprint[id_train]], data_dep[id_train], nb_epoch=num_epoch, validation_split=1/9,
                    batch_size=batch_size, shuffle=True, callbacks=[history])
    cost_testing = model_final.evaluate([data_mut[id_test], data_exp[id_test], data_cna[id_test], data_meth[id_test],
                                         data_fprint[id_test]], data_dep[id_test], verbose=0, batch_size=batch_size)
    print("\n\nFull DeepDEP model training completed in %.1f mins.\nloss:%.4f valloss:%.4f testloss:%.4f" % (
        (time.time() - t) / 60,
        history.model.model.history.history['loss'][history.stopped_epoch],
        history.model.model.history.history['val_loss'][history.stopped_epoch], cost_testing))

    model_final.save(r'D:\DEPOI\results_cai/models/model_demo.h5')
    print("\n\nFull DeepDEP model saved in /results/models/model_demo.h5\n\n")

    # plot training and validation loss curves
    loss = history.model.model.history.history['loss']
    val_loss = history.model.model.history.history['val_loss']
    fig = plt.figure()
    plt.plot(loss, label="Training Loss")
    plt.plot(val_loss, label="Validation Loss")
    plt.title("Training and Validation Loss")
    plt.legend()
    fig.savefig("loss.png")
    plt.show()
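
A practical note: the Merge layer used above exists only in Keras 1.x (hence the keras==1.2.2 requirement); it was removed in Keras 2. If you ever port the script, the same five-branch concatenation can be expressed with the functional API instead. A rough sketch under that assumption (a hypothetical Keras 2 port, not the authors' code; it assumes the data_* arrays from the script above are already loaded, and transferring the pretrained weights would additionally require layer.set_weights on each layer):

# Hypothetical Keras 2.x port of the merge step (NOT the original environment)
from keras.layers import Input, Dense, Concatenate
from keras.models import Model

def branch(x, dims):
    # stack of fully connected layers, mirroring one DeepDEP sub-network
    for d in dims:
        x = Dense(d, activation='relu', kernel_initializer='he_uniform')(x)
    return x

in_mut = Input(shape=(data_mut.shape[1],))
in_exp = Input(shape=(data_exp.shape[1],))
in_cna = Input(shape=(data_cna.shape[1],))
in_meth = Input(shape=(data_meth.shape[1],))
in_fprint = Input(shape=(data_fprint.shape[1],))

merged = Concatenate()([
    branch(in_mut, [1000, 100, 50]),
    branch(in_exp, [500, 200, 50]),
    branch(in_cna, [500, 200, 50]),
    branch(in_meth, [500, 200, 50]),
    branch(in_fprint, [1000, 100, 50]),
])
h = Dense(250, activation='relu', kernel_initializer='he_uniform')(merged)
h = Dense(250, activation='relu', kernel_initializer='he_uniform')(h)
out = Dense(1, activation='linear', kernel_initializer='he_uniform')(h)

model_final = Model(inputs=[in_mut, in_exp, in_cna, in_meth, in_fprint], outputs=out)
model_final.compile(loss='mse', optimizer='adam')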

Prediction: checking model performance

# Predict TCGA (or other new) samples using a trained model
print("\n\nStarting to run PredictNewSamples.py with a demo example of 10 TCGA tumors...")

import numpy as np
import pandas as pd
from keras import models
import time
import tensorflow as tf
import pickle

if __name__ == '__main__':
    model_name = "model_demo"
    model_saved = models.load_model(r"D:\DEPOI\results_cai/models/%s.h5" % model_name)
    # model_paper is the full 4-omics DeepDEP model used in the paper;
    # single-omics, 2-omics, or full DeepDEP models are available in the
    # /data/full_results_models_paper/models/ directory

    with open(r'D:\DEPOI/data/ccl_complete_data_28CCL_1298DepOI_36344samples_demo.pickle', 'rb') as f:
        data_mut, data_exp, data_cna, data_meth, data_dep, data_fprint = pickle.load(f)
    print("\n\nDatasets successfully loaded.\n\n")

    batch_size = 500
    # predict the first 10 samples for DEMO ONLY; for all samples substitute 10 by data_mut_tcga.shape[0]
    # prediction results of all 8238 TCGA samples can be found in /data/full_results_models_paper/predictions/
    # t = time.time()

    y = data_dep
    data_pred_tmp = model_saved.predict([data_mut, data_exp, data_cna, data_meth, data_fprint],
                                        batch_size=batch_size, verbose=0)

    def MSE(y, t):
        # sum of squared errors between predicted and measured dependency scores
        return np.sum((y - t) ** 2)

    T = []
    T[:] = y[:, 0]               # measured gene-effect scores
    P = []
    P[:] = data_pred_tmp[:, 0]   # predicted gene-effect scores
    x = MSE(np.array(P), np.array(T)).sum()
    X = x / data_mut.shape[0]    # average squared error per sample
    print(X)
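
The value printed above is the total squared error divided by the number of samples, i.e. a mean squared error over the demo predictions. To see how well predicted and measured gene-effect scores track each other, one can also compute a Pearson correlation; a minimal sketch, assuming data_dep and data_pred_tmp from the script above:

import numpy as np

y_true = np.array(data_dep[:, 0], dtype='float32')
y_pred = np.array(data_pred_tmp[:, 0], dtype='float32')

mse = np.mean((y_pred - y_true) ** 2)   # same quantity as X above
r = np.corrcoef(y_pred, y_true)[0, 1]   # Pearson correlation across all demo samples
print("MSE = %.4f, Pearson r = %.4f" % (mse, r))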

That wraps up this walkthrough of constructing a cancer dependency map of tumors with deep learning; I hope it proves helpful.



