Seurat Dataset Processing Workflow

2023-10-08 07:40

This article walks through three Seurat workflows: processing a multi-batch dataset, processing a single dataset, and integrating batches.

Multiple datasets

The pancreas dataset

suppressMessages(require(Seurat))
suppressMessages(require(ggplot2))
suppressMessages(require(cowplot))
suppressMessages(require(scater))
suppressMessages(require(scran))
suppressMessages(require(BiocParallel))
suppressMessages(require(BiocNeighbors))
setwd("/Users/xiaokangyu/Desktop/程序学习总结/库学习/Seurat/data/pancreas_v3_file")
pancreas.data <- readRDS(file = "pancreas_expression_matrix.rds")
metadata <- readRDS(file = "pancreas_metadata.rds")
pancreas <- CreateSeuratObject(pancreas.data, meta.data = metadata)
# Note: unlike the other examples, this dataset ships with annotation metadata,
# so meta.data is passed together with the count matrix when creating the object.
# (The earlier datasets passed only the count matrix.)
# Normalize the data (cell filtering is omitted; it has little effect here)
# Normalize and find variable features
pancreas <- NormalizeData(pancreas, verbose = FALSE)
pancreas <- FindVariableFeatures(pancreas, selection.method = "vst", nfeatures = 2000, verbose = FALSE)
# Run the standard workflow for visualization and clustering
pancreas <- ScaleData(pancreas, verbose = FALSE)
pancreas <- RunPCA(pancreas, npcs = 30, verbose = FALSE)
pancreas <- RunUMAP(pancreas, reduction = "pca", dims = 1:30)
p1 <- DimPlot(pancreas, reduction = "umap", group.by = "tech")  # plot by batch
p2 <- DimPlot(pancreas, reduction = "umap", group.by = "celltype", label = TRUE, repel = TRUE) + NoLegend()  # plot by cell type
print(p1+p2)
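
CreateSeuratObject matches meta.data to the count matrix by cell name: the row names of the metadata data frame are expected to be the column names of the matrix. A minimal base-R sketch of a pre-check follows. The helper `align_metadata` and the toy `counts` and `metadata` objects are hypothetical names used only for illustration; they stand in for pancreas.data and metadata.

```r
# Hypothetical helper: reorder metadata rows to match the cells (columns)
# of a count matrix, and fail early if any cell has no metadata row.
align_metadata <- function(counts, metadata) {
  cells <- colnames(counts)
  missing <- setdiff(cells, rownames(metadata))
  if (length(missing) > 0) {
    stop("cells without metadata: ", paste(head(missing), collapse = ", "))
  }
  metadata[cells, , drop = FALSE]
}

# Toy data (assumption): 2 genes x 3 cells, metadata rows in a different order.
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("g1", "g2"), c("c1", "c2", "c3")))
metadata <- data.frame(tech = c("a", "b", "a"),
                       row.names = c("c3", "c1", "c2"))

aligned <- align_metadata(counts, metadata)
print(aligned)
```

The aligned data frame could then be passed as the meta.data argument, which avoids silently mismatched annotations.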

The resulting plot shows the batch UMAP (p1) and the cell-type UMAP (p2) side by side.

Single dataset

rm(list = ls())
setwd("/Users/xiaokangyu/Desktop/程序学习总结/库学习/Seurat/")
library(Seurat)
library(dplyr)
library(aricode)
## Load data (./Seurat/Koh.Rdata)
## Unnormalized data such as raw counts or TPMs
dataname <- "./data/Koh.Rdata"
load(dataname)
colnames(label) <- "celltype"
## Create the Seurat object
pbmc_small <- CreateSeuratObject(data, meta.data = label)
## Normalize the count data present in a given assay.
pbmc_small <- NormalizeData(object = pbmc_small)
## Identify features that are outliers on a 'mean variability plot'.
pbmc_small <- FindVariableFeatures(object = pbmc_small)
## Scale and center features in the dataset. If variables are provided in vars.to.regress, they are individually regressed against each feature, and the resulting residuals are then scaled and centered.
pbmc_small <- ScaleData(object = pbmc_small)
## Run a PCA dimensionality reduction on the variable features (passed via `features`).
pbmc_small <- RunPCA(object = pbmc_small, features = VariableFeatures(object = pbmc_small))
# RunPCA and RunUMAP have equal standing: both are dimensionality-reduction steps.
pbmc_small <- RunUMAP(pbmc_small, reduction = "pca", dims = 1:30)
## Randomly permute a subset of data, and calculate projected PCA scores for these 'random' genes. Then compare the PCA scores for the 'random' genes with the observed PCA scores to determine statistical significance. The end result is a p-value for each gene's association with each principal component.
pbmc_small <- JackStraw(object = pbmc_small)
## Construct a Shared Nearest Neighbor (SNN) graph for a given dataset.
pbmc_small <- FindNeighbors(pbmc_small)
## Clustering
res <- FindClusters(object = pbmc_small)
# res$seurat_clusters
# Note: the output of FindClusters is assigned to a new object, res.
# pbmc_small is still the object from before clustering, so it has no $seurat_clusters column,
# whereas res has everything pbmc_small had.
p1 <- DimPlot(res, reduction = "umap", label = TRUE)  # plot by cluster
p2 <- DimPlot(res, reduction = "umap", group.by = "celltype")  # plot by cell type
print(p1 + p2)  # plots can be combined directly with +, shown side by side
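
The script loads aricode but never calls it. One plausible use is to score the clusters against the known celltype labels with the Adjusted Rand Index (ARI), which aricode provides as `ARI(c1, c2)`. The sketch below computes the same index in base R so the arithmetic is visible; `adjusted_rand_index` and the toy vectors are hypothetical, standing in for res$seurat_clusters and res$celltype.

```r
# Hypothetical base-R implementation of the Adjusted Rand Index.
# It counts pairs of cells that are grouped together in both labelings
# and corrects that count for chance agreement.
adjusted_rand_index <- function(labels_a, labels_b) {
  stopifnot(length(labels_a) == length(labels_b))
  tab <- table(labels_a, labels_b)            # contingency table
  pairs <- function(x) sum(choose(x, 2))      # number of within-group pairs
  sum_cells <- pairs(tab)
  sum_rows <- pairs(rowSums(tab))
  sum_cols <- pairs(colSums(tab))
  expected <- sum_rows * sum_cols / choose(length(labels_a), 2)
  max_index <- (sum_rows + sum_cols) / 2
  # Undefined (NaN) when both labelings put every cell in a single group.
  (sum_cells - expected) / (max_index - expected)
}

# Toy labels (assumption): six cells, three clusters, three cell types.
clusters <- c(0, 0, 1, 1, 2, 2)
celltype <- c("A", "A", "B", "B", "B", "C")
print(adjusted_rand_index(clusters, celltype))
```

An ARI of 1 means the two labelings agree exactly, and values near 0 mean agreement no better than chance.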


Showing labels

To get the same effect as scanpy's sc.pl.umap(adata, color=["celltype"], legend_loc="on data"), set the label argument of DimPlot in Seurat.

pp <- DimPlot(data_seurat, reduction = "umap", group.by = "celltype", label = TRUE, label.size = 5) + ggtitle("Integrated Celltype") + NoLegend()
print(pp)

The resulting plot draws each celltype name directly on the UMAP, with the legend removed.

Integration

rm(list=ls())
options(future.globals.maxSize = 8000 * 1024^2)
suppressMessages(require(Seurat))
suppressMessages(require(ggplot2))
suppressMessages(require(cowplot))
#suppressMessages(require(scater))
#suppressMessages(require(scran))
#suppressMessages(require(BiocParallel))
#suppressMessages(require(BiocNeighbors))
setwd("/home/zhangjingxiao/yxk/Seurat3")
start.time <- Sys.time()
pancreas.data <- readRDS(file = "/DATA2/zhangjingxiao/yxk/dataset/pancreas_v3/pancreas_expression_matrix.rds")
metadata <- readRDS(file = "/DATA2/zhangjingxiao/yxk/dataset/pancreas_v3/pancreas_metadata.rds")
#pancreas <- CreateSeuratObject(pancreas.data, meta.data = metadata)
# Note: unlike the other examples, this dataset ships with annotation metadata.
# The earlier datasets passed only the count matrix, without meta.data.
# Normalize the data (cell filtering is omitted; it has little effect here)
# Normalize and find variable features
# pancreas <- NormalizeData(pancreas, verbose = FALSE)
# pancreas <- FindVariableFeatures(pancreas, selection.method = "vst", nfeatures = 2000, verbose = FALSE)
# 
# # Run the standard workflow for visualization and clustering
# pancreas <- ScaleData(pancreas, verbose = FALSE)
# pancreas <- RunPCA(pancreas, npcs = 30, verbose = FALSE)
# pancreas <- RunUMAP(pancreas, reduction = "pca", dims = 1:30)
# 
# p1 <- DimPlot(pancreas, reduction = "umap", group.by = "tech")  # plot by batch
# p2 <- DimPlot(pancreas, reduction = "umap", group.by = "celltype", label = TRUE, repel = TRUE) + 
#   NoLegend()  # plot by cell type
# print(p1+p2)
# # ggsave("vis_pancras.png",plot=p1+p2)
print("===================Creating SeuratObject========")
data_seurat <- CreateSeuratObject(pancreas.data, meta.data = metadata)
print("===================Split SeuratObject===========")
scRNAlist <- SplitObject(data_seurat, split.by = "tech")
print("===================Normalize SeuratObject=======")
scRNAlist <- lapply(scRNAlist, FUN = function(x) NormalizeData(x, verbose = FALSE))
print("===================Find HVG=====================")
scRNAlist <- lapply(scRNAlist, FUN = function(x) FindVariableFeatures(x, verbose = FALSE))
print("preprocessing done")
print(scRNAlist)
data.anchors <- FindIntegrationAnchors(object.list = scRNAlist, dims = 1:20, verbose = FALSE)
data.combined <- IntegrateData(anchorset = data.anchors, dims = 1:20, verbose = FALSE)
DefaultAssay(data.combined) <- "integrated"
################### scale data  =====================
data.combined <- ScaleData(data.combined, verbose = FALSE)
data.combined <- RunPCA(data.combined, npcs = 30, verbose = FALSE)
# UMAP visualization
data.combined <- RunUMAP(data.combined, reduction = "pca", dims = 1:20, verbose = FALSE)
p1 <- DimPlot(data.combined, reduction = "umap", group.by = "tech", label.size = 10) + ggtitle("Integrated Batch")
p2 <- DimPlot(data.combined, reduction = "umap", group.by = "celltype", label.size = 10) + ggtitle("Integrated Celltype")
p <- p1 + p2
print(p)
# Report the elapsed time since start.time was recorded
print(Sys.time() - start.time)
# adata <- CreateSeuratObject(pancreas.data, meta.data = metadata)
# message('Preprocessing...')
# adata.list <- SplitObject(adata, split.by = "tech")
# 
# print(dim(adata)[2])
# if(dim(adata)[2] < 50000){
#   for (i in 1:length(adata.list)) {
#     adata.list[[i]] <- NormalizeData(adata.list[[i]], verbose = FALSE)
#     adata.list[[i]] <- FindVariableFeatures(adata.list[[i]], selection.method = "vst", nfeatures = args$n_top_features, verbose = FALSE)
#   }
#   message('FindIntegrationAnchors...')
#   adata.anchors <- FindIntegrationAnchors(object.list = adata.list, dims = 1:30,verbose =FALSE,k.filter = 30)
#   #     adata.anchors <- FindIntegrationAnchors(object.list = adata.list, dims = 1:30,verbose =FALSE,k.filter = 100)
#   
#   message('IntegrateData...')
#   adata.integrated <- IntegrateData(anchorset = adata.anchors, dims = 1:30, verbose = FALSE)
# }else{
#   adata.list <- future_lapply(X = adata.list, FUN = function(x) {
#     x <- NormalizeData(x, verbose = FALSE)
#     x <- FindVariableFeatures(x, nfeatures = args$n_top_features, verbose = FALSE)
#   })
#   
#   features <- SelectIntegrationFeatures(object.list = adata.list)
#   adata.list <- future_lapply(X = adata.list, FUN = function(x) {
#     x <- ScaleData(x, features = features, verbose = FALSE)
#     x <- RunPCA(x, features = features, verbose = FALSE)
#   })
#   message('FindIntegrationAnchors...')
#   adata.anchors <- FindIntegrationAnchors(object.list = adata.list, dims = 1:30, verbose =FALSE, reduction = 'rpca', reference = c(1, 2))
#   message('IntegrateData...')
#   adata.integrated <- IntegrateData(anchorset = adata.anchors, dims = 1:30, verbose = FALSE)
# }
# 
# if (!file.exists(args$output_path)){
#   dir.create(file.path(args$output_path),recursive = TRUE)
# }
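
The integration steps in this section (split by batch, normalize, find variable features, find anchors, integrate) can be collected into one reusable function. This is a rough sketch assuming the anchor-based Seurat v3/v4 API used in this article; `integrate_by_batch` and its argument names are hypothetical, and the explicit SelectIntegrationFeatures step is an addition that the script above leaves to FindIntegrationAnchors.

```r
# Hypothetical wrapper around the anchor-based integration steps.
# Calls are namespace-qualified, so defining the function does not need
# Seurat attached; calling it does require Seurat to be installed.
integrate_by_batch <- function(object, split_by = "tech",
                               n_dims = 20, n_features = 2000) {
  # One Seurat object per batch
  batches <- Seurat::SplitObject(object, split.by = split_by)

  # Normalize and pick highly variable genes within each batch
  batches <- lapply(batches, function(x) {
    x <- Seurat::NormalizeData(x, verbose = FALSE)
    Seurat::FindVariableFeatures(x, selection.method = "vst",
                                 nfeatures = n_features, verbose = FALSE)
  })

  # Genes that are repeatedly variable across batches
  features <- Seurat::SelectIntegrationFeatures(object.list = batches,
                                                nfeatures = n_features)

  # Anchors between batches, then the batch-corrected assay
  anchors <- Seurat::FindIntegrationAnchors(object.list = batches,
                                            anchor.features = features,
                                            dims = seq_len(n_dims),
                                            verbose = FALSE)
  Seurat::IntegrateData(anchorset = anchors, dims = seq_len(n_dims),
                        verbose = FALSE)
}

# Usage (not run here; data_seurat is the object created earlier):
# data.combined <- integrate_by_batch(data_seurat, split_by = "tech", n_dims = 20)
```

Exposing the number of dimensions and features as arguments makes it easier to compare settings such as dims = 1:20 and dims = 1:30, both of which appear in this section.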
