purr map walk 学习教程 完整版教程学习

2023-10-12 02:52

本文主要是介绍purr map walk 学习教程 完整版教程学习,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

Function reference • purrricon-default.png?t=N7T8https://purrr.tidyverse.org/reference/index.htmlMap over multiple input simultaneously (in "parallel") — pmap • purrr

11 Other purrr functions | Functional Programming (stanford.edu)

关注微信:生信小博士

11.1 Map functions that output tibbles

Instead of creating an atomic vector or list, the map variants map_dfr() and map_dfc() create a tibble.

With these map functions, the assembly line worker creates a tibble for each input element, and the output conveyor belt ends up with a collection of tibbles.

The worker then combines all the small tibbles into a single, larger tibble. There are multiple ways to combine smaller tibbles into a larger tibble. map_dfr() (r for rows) stacks the smaller tibbles on top of each other.

map_dfc() (c for columns) stacks them side-by-side.

There are _dfr and _dfc variants of pmap() and map2() as well. In the following sections, we’ll cover map_dfr() and map_dfc() in more detail.

11.1.1 _dfr

map_dfr() is useful when reading in data from multiple files. The following code reads in several very simple csv files, each of which contains the name of a different dinosaur genus.

read_csv("data/purrr-extras/file_001.csv")
#> # A tibble: 1 × 2
#>      id genus        
#>   <dbl> <chr>        
#> 1     1 Hoplitosaurusread_csv("data/purrr-extras/file_002.csv")
#> # A tibble: 1 × 2
#>      id genus        
#>   <dbl> <chr>        
#> 1     2 Herrerasaurusread_csv("data/purrr-extras/file_003.csv")
#> # A tibble: 1 × 2
#>      id genus      
#>   <dbl> <chr>      
#> 1     3 Coelophysis

read_csv() produces a tibble, and so we can use map_dfr() to map over all three file names and bind the resulting individual tibbles into a single tibble.

files <- str_glue("data/purrr-extras/file_00{1:3}.csv")
files
#> data/purrr-extras/file_001.csv
#> data/purrr-extras/file_002.csv
#> data/purrr-extras/file_003.csvfiles %>% map_dfr(read_csv)
#> # A tibble: 3 × 2
#>      id genus        
#>   <dbl> <chr>        
#> 1     1 Hoplitosaurus
#> 2     2 Herrerasaurus
#> 3     3 Coelophysis

The result is a tibble with three rows and two columns, because map_dfr() aligns the columns of the individual tibbles by name.

The individual tibbles can have different numbers of rows or columns. map_dfr() just creates a column for each unique column name. If some of the individual tibbles lack a column that others have, map_dfr() fills in with NA values.

read_csv("data/purrr-extras/file_004.csv")
#> # A tibble: 2 × 3
#>      id genus         start_period 
#>   <dbl> <chr>         <chr>        
#> 1     4 Dilophosaurus Sinemurian   
#> 2     5 Segisaurus    Pliensbachianc(files, "data/purrr-extras/file_004.csv") %>% map_dfr(read_csv)
#> # A tibble: 5 × 3
#>      id genus         start_period 
#>   <dbl> <chr>         <chr>        
#> 1     1 Hoplitosaurus <NA>         
#> 2     2 Herrerasaurus <NA>         
#> 3     3 Coelophysis   <NA>         
#> 4     4 Dilophosaurus Sinemurian   
#> 5     5 Segisaurus    Pliensbachian

11.1.2 _dfc

map_dfc() is typically less useful than map_dfr() because it relies on row position to stack the tibbles side-by-side. Row position is prone to error, and it will often be difficult to check if the data in each row is aligned correctly. However, if you have data with variables in different places and are positive the rows are aligned, map_dfc() may be appropriate.

Unfortunately, even if the individual tibbles contain a unique identifier for each row, map_dfc() doesn’t use the identifiers to verify that the rows are aligned correctly, nor does it combine identically named columns.

read_csv("data/purrr-extras/file_005.csv")
#> # A tibble: 1 × 3
#>      id diet      start_period
#>   <dbl> <chr>     <chr>       
#> 1     1 herbivore Barremianc("data/purrr-extras/file_001.csv", "data/purrr-extras/file_005.csv") %>% map_dfc(read_csv)
#> # A tibble: 1 × 5
#>   id...1 genus         id...3 diet      start_period
#>    <dbl> <chr>          <dbl> <chr>     <chr>       
#> 1      1 Hoplitosaurus      1 herbivore Barremian

Instead, you end up with a duplicated column (id...1 and id...3).

If you have a unique identifier for each row, it is much better to join on that identifier.

left_join(read_csv("data/purrr-extras/file_001.csv"),read_csv("data/purrr-extras/file_005.csv"),by = "id"
)
#> # A tibble: 1 × 4
#>      id genus         diet      start_period
#>   <dbl> <chr>         <chr>     <chr>       
#> 1     1 Hoplitosaurus herbivore Barremian

Also, because map_dfc() combines tibbles by row position, the tibbles can have different numbers of columns, but they should have the same number of rows.

11.2 Walk

The walk functions work similarly to the map functions, but you use them when you’re interested in applying a function that performs an action instead of producing data (e.g., print()).

The walk functions are useful for performing actions like writing files and printing plots. For example, say we used purrr to generate a list of plots.

set.seed(745)plot_rnorm <- function(sd) {tibble(x = rnorm(n = 5000, mean = 0, sd = sd)) %>% ggplot(aes(x)) +geom_histogram(bins = 40) +geom_vline(xintercept = 0, color = "blue")
}plots <-c(5, 1, 9) %>% map(plot_rnorm)

We can now use walk() to print them out.

plots %>% walk(print)

这篇关于purr map walk 学习教程 完整版教程学习的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/192734

相关文章

python使用Akshare与Streamlit实现股票估值分析教程(图文代码)

《python使用Akshare与Streamlit实现股票估值分析教程(图文代码)》入职测试中的一道题,要求:从Akshare下载某一个股票近十年的财务报表包括,资产负债表,利润表,现金流量表,保存... 目录一、前言二、核心知识点梳理1、Akshare数据获取2、Pandas数据处理3、Matplotl

Python pandas库自学超详细教程

《Pythonpandas库自学超详细教程》文章介绍了Pandas库的基本功能、安装方法及核心操作,涵盖数据导入(CSV/Excel等)、数据结构(Series、DataFrame)、数据清洗、转换... 目录一、什么是Pandas库(1)、Pandas 应用(2)、Pandas 功能(3)、数据结构二、安

2025版mysql8.0.41 winx64 手动安装详细教程

《2025版mysql8.0.41winx64手动安装详细教程》本文指导Windows系统下MySQL安装配置,包含解压、设置环境变量、my.ini配置、初始化密码获取、服务安装与手动启动等步骤,... 目录一、下载安装包二、配置环境变量三、安装配置四、启动 mysql 服务,修改密码一、下载安装包安装地

电脑提示d3dx11_43.dll缺失怎么办? DLL文件丢失的多种修复教程

《电脑提示d3dx11_43.dll缺失怎么办?DLL文件丢失的多种修复教程》在使用电脑玩游戏或运行某些图形处理软件时,有时会遇到系统提示“d3dx11_43.dll缺失”的错误,下面我们就来分享超... 在计算机使用过程中,我们可能会遇到一些错误提示,其中之一就是缺失某个dll文件。其中,d3dx11_4

Linux下在线安装启动VNC教程

《Linux下在线安装启动VNC教程》本文指导在CentOS7上在线安装VNC,包含安装、配置密码、启动/停止、清理重启步骤及注意事项,强调需安装VNC桌面以避免黑屏,并解决端口冲突和目录权限问题... 目录描述安装VNC安装 VNC 桌面可能遇到的问题总结描js述linux中的VNC就类似于Window

Go语言编译环境设置教程

《Go语言编译环境设置教程》Go语言支持高并发(goroutine)、自动垃圾回收,编译为跨平台二进制文件,云原生兼容且社区活跃,开发便捷,内置测试与vet工具辅助检测错误,依赖模块化管理,提升开发效... 目录Go语言优势下载 Go  配置编译环境配置 GOPROXYIDE 设置(VS Code)一些基本

Kotlin Map映射转换问题小结

《KotlinMap映射转换问题小结》文章介绍了Kotlin集合转换的多种方法,包括map(一对一转换)、mapIndexed(带索引)、mapNotNull(过滤null)、mapKeys/map... 目录Kotlin 集合转换:map、mapIndexed、mapNotNull、mapKeys、map

Windows环境下解决Matplotlib中文字体显示问题的详细教程

《Windows环境下解决Matplotlib中文字体显示问题的详细教程》本文详细介绍了在Windows下解决Matplotlib中文显示问题的方法,包括安装字体、更新缓存、配置文件设置及编码調整,并... 目录引言问题分析解决方案详解1. 检查系统已安装字体2. 手动添加中文字体(以SimHei为例)步骤

Java JDK1.8 安装和环境配置教程详解

《JavaJDK1.8安装和环境配置教程详解》文章简要介绍了JDK1.8的安装流程,包括官网下载对应系统版本、安装时选择非系统盘路径、配置JAVA_HOME、CLASSPATH和Path环境变量,... 目录1.下载JDK2.安装JDK3.配置环境变量4.检验JDK官网下载地址:Java Downloads

使用Docker构建Python Flask程序的详细教程

《使用Docker构建PythonFlask程序的详细教程》在当今的软件开发领域,容器化技术正变得越来越流行,而Docker无疑是其中的佼佼者,本文我们就来聊聊如何使用Docker构建一个简单的Py... 目录引言一、准备工作二、创建 Flask 应用程序三、创建 dockerfile四、构建 Docker