Configuring HDFS and MapReduce on Hadoop 2.x to Run WordCount

2024-06-14 06:18

This article walks through configuring HDFS and MapReduce on Hadoop 2.x and running the bundled WordCount example; I hope it serves as a useful reference for anyone doing the same setup.

Host    HDFS                           MapReduce
node1   NameNode                       ResourceManager
node2   SecondaryNameNode & DataNode   NodeManager
node3   DataNode                       NodeManager
node4   DataNode                       NodeManager

1. Configure hadoop-env.sh

export JAVA_HOME=/csh/link/jdk
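Before going further it is worth confirming that this path really points at a JDK; a small check helper (the function name is my own, and /csh/link/jdk is just this cluster's symlink):

```shell
# Hypothetical helper: verify that a candidate JAVA_HOME actually
# contains an executable java binary before writing it into hadoop-env.sh.
check_java_home() {
  [ -x "$1/bin/java" ]
}

# Usage on this cluster:
#   check_java_home /csh/link/jdk && echo "JAVA_HOME looks sane"
```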

2. Configure core-site.xml

(All of the <property> blocks in this and the following files belong inside the file's <configuration> root element.)

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://node1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/csh/hadoop/hadoop2.7.2/tmp</value>
</property>

3. Configure hdfs-site.xml

<property>
  <name>dfs.namenode.http-address</name>
  <value>node1:50070</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>node2:50090</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/csh/hadoop/hadoop2.7.2/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/csh/hadoop/hadoop2.7.2/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
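The three local directories named in core-site.xml and hdfs-site.xml should exist before the first format. A sketch (the helper is mine; the paths are the article's, and DataNode-only hosts strictly need only data/):

```shell
# Hypothetical helper: create the tmp/, name/ and data/ directories that
# core-site.xml and hdfs-site.xml point at, under a given base directory.
prepare_hadoop_dirs() {
  mkdir -p "$1/tmp" "$1/name" "$1/data"
}

# On each node of this cluster:
#   prepare_hadoop_dirs /csh/hadoop/hadoop2.7.2
```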

4. Configure mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
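One detail worth noting: the 2.7.x tarball ships only mapred-site.xml.template, so the real file has to be created first. A sketch, wrapped in a small function of my own so the conf directory is explicit:

```shell
# The Hadoop 2.7.x distribution contains mapred-site.xml.template but no
# mapred-site.xml; copy the template first, then add the property above.
# The function is my own convenience; pass your $HADOOP_HOME/etc/hadoop.
make_mapred_site() {
  cp "$1/mapred-site.xml.template" "$1/mapred-site.xml"
}

# Example:
#   make_mapred_site /csh/software/hadoop-2.7.2/etc/hadoop
```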

5. Configure yarn-site.xml

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>node1</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
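A common pitfall here is a typo in the aux-services value: it must be exactly mapreduce_shuffle (underscore, not a dot), or the NodeManagers will refuse to start the shuffle service. A trivial grep check (the helper name is mine):

```shell
# Hypothetical sanity check: the aux-service name must be exactly
# "mapreduce_shuffle"; Hadoop 2.x rejects service names containing dots.
check_shuffle_config() {
  grep -q '<value>mapreduce_shuffle</value>' "$1"
}
```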

6. Configure masters (conventionally, the SecondaryNameNode host)

node2

7. Configure slaves (the hosts that run DataNode and NodeManager)

node2
node3
node4
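The start scripts do nothing magical with this file: sbin/slaves.sh simply reads one hostname per line and runs commands over ssh. A simplified sketch of that loop (echoing instead of actually ssh-ing):

```shell
# Simplified sketch of what sbin/slaves.sh does with the slaves file:
# run a command on every listed host (here echoed instead of ssh'd).
for_each_slave() {
  slaves_file="$1"; shift
  grep -v '^#' "$slaves_file" | while read -r host; do
    [ -n "$host" ] || continue
    echo "$host: $*"      # the real script runs: ssh "$host" "$@"
  done
}
```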

8. Start Hadoop

# Format the NameNode (first start only; 'hadoop namenode -format' is deprecated in 2.x)
bin/hdfs namenode -format
# Start HDFS, then YARN
sbin/start-dfs.sh
sbin/start-yarn.sh
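After both start scripts finish, `jps` on each node should show the daemons from the layout table at the top of the article. A small lookup helper of my own, handy when checking node by node:

```shell
# My own lookup helper: which daemons jps should list on each node of
# this cluster (plus the Jps process itself), per the layout table above.
expected_daemons() {
  case "$1" in
    node1)       echo "NameNode ResourceManager" ;;
    node2)       echo "SecondaryNameNode DataNode NodeManager" ;;
    node3|node4) echo "DataNode NodeManager" ;;
    *)           echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# e.g. on node2, compare against the output of:  jps
```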

9. Run the WordCount example

# Create the input file wc.txt
echo "I love Java I love Hadoop I love BigData Good Good Study, Day Day Up" > wc.txt
# Create the input directory in HDFS
hdfs dfs -mkdir -p /input/wordcount/
# Upload wc.txt to HDFS
hdfs dfs -put wc.txt /input/wordcount
# Run the WordCount example
hadoop jar /csh/software/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input/wordcount/ /output/wordcount/
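As a quick sanity check that needs no cluster at all, the same counts can be produced with plain Unix tools and compared with the job output in the next section:

```shell
# The same word count done locally with standard tools: split on spaces,
# sort, and count duplicates. Counts should match the MapReduce output.
echo "I love Java I love Hadoop I love BigData Good Good Study, Day Day Up" \
  | tr -s ' ' '\n' | sort | uniq -c
```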

10. Results

[root@node1 sbin]# hadoop jar /csh/software/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input/wordcount/ /output/wordcount/
16/03/24 19:26:48 INFO client.RMProxy: Connecting to ResourceManager at node1/192.161.11:8032
16/03/24 19:26:56 INFO input.FileInputFormat: Total input paths to process : 1
16/03/24 19:26:56 INFO mapreduce.JobSubmitter: number of splits:1
16/03/24 19:26:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1458872237175_0001
16/03/24 19:26:59 INFO impl.YarnClientImpl: Submitted application application_1458872237175_0001
16/03/24 19:27:00 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1458872237175_0001/
16/03/24 19:27:00 INFO mapreduce.Job: Running job: job_1458872237175_0001
16/03/24 19:28:13 INFO mapreduce.Job: Job job_1458872237175_0001 running in uber mode : false
16/03/24 19:28:13 INFO mapreduce.Job:  map 0% reduce 0%
16/03/24 19:30:07 INFO mapreduce.Job:  map 100% reduce 0%
16/03/24 19:31:13 INFO mapreduce.Job:  map 100% reduce 33%
16/03/24 19:31:16 INFO mapreduce.Job:  map 100% reduce 100%
16/03/24 19:31:23 INFO mapreduce.Job: Job job_1458872237175_0001 completed successfully
16/03/24 19:31:24 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=106
        FILE: Number of bytes written=235387
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=174
        HDFS: Number of bytes written=64
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=116501
        Total time spent by all reduces in occupied slots (ms)=53945
        Total time spent by all map tasks (ms)=116501
        Total time spent by all reduce tasks (ms)=53945
        Total vcore-milliseconds taken by all map tasks=116501
        Total vcore-milliseconds taken by all reduce tasks=53945
        Total megabyte-milliseconds taken by all map tasks=119297024
        Total megabyte-milliseconds taken by all reduce tasks=55239680
    Map-Reduce Framework
        Map input records=4
        Map output records=15
        Map output bytes=129
        Map output materialized bytes=106
        Input split bytes=105
        Combine input records=15
        Combine output records=9
        Reduce input groups=9
        Reduce shuffle bytes=106
        Reduce input records=9
        Reduce output records=9
        Spilled Records=18
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=1468
        CPU time spent (ms)=6780
        Physical memory (bytes) snapshot=230531072
        Virtual memory (bytes) snapshot=4152713216
        Total committed heap usage (bytes)=134795264
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=69
    File Output Format Counters
        Bytes Written=64
[root@node1 sbin]# hdfs dfs -cat /output/wordcount/*
BigData 1
Day 2
Good    2
Hadoop  1
I   3
Java    1
Study,  1
Up  1
love    3

