mapreduce | 自定义Partition分区（案例1）

本文主要是介绍mapreduce | 自定义Partition分区（案例1），希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

1.需求

将学生成绩，按照各个成绩降序排序，各个科目成绩单独输出。

# 自定义partition 将下面数据分区处理：

人名科目成绩

张三语文 10

李四数学 30

王五语文 20

赵6 英语 40

张三数据 50

李四语文 10

张三英语 70

李四英语 80

王五英语 45

王五数学 10

赵6 数学 10

赵6 语文 100

2.思路分析

# 自定义分区

1. 编写自定义分区类，继承Partitioner覆盖getPartition方法注意：分区号从0开始算。

2. 给job注册分区类【覆盖默认分区】 job.setPartitionerClass(自定义Partitioner.class); 3. 设置ReduceTask个数(开启分区) job.setNumReduceTasks(数字);//reduceTask数量要和分区数量一样。

3.Idea代码

DefinePartitionJob

package demo7;import demo5.DescIntWritable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;import java.io.IOException;public class DefinePartitionJob {public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {Configuration conf = new Configuration();conf.set("fs.defaultFS","hdfs://hadoop10:8020");Job job = Job.getInstance(conf);job.setJarByClass(DefinePartitionJob.class);job.setInputFormatClass(TextInputFormat.class);job.setOutputFormatClass(TextOutputFormat.class);TextInputFormat.addInputPath(job,new Path("/mapreduce/demo10"));TextOutputFormat.setOutputPath(job,new Path("/mapreduce/demo10/out"));job.setMapperClass(DefinePartitonMapper.class);job.setReducerClass(DefinePartitonReducer.class);//map输出的键与值类型job.setMapOutputKeyClass(DescIntWritable.class);job.setMapOutputValueClass(Subject.class);//reducer输出的键与值类型job.setOutputKeyClass(Subject.class);job.setOutputValueClass(DescIntWritable.class);//设置reduceTask的个数job.setNumReduceTasks(4);//设置自定义分区job.setPartitionerClass(MyPartition.class);boolean b = job.waitForCompletion(true);System.out.println(b);}static class DefinePartitonMapper extends Mapper<LongWritable, Text, DescIntWritable,Subject> {@Overrideprotected void map(LongWritable key, Text value,Context context) throws IOException, InterruptedException {String[] arr = value.toString().split("\t");context.write(new DescIntWritable(Integer.parseInt(arr[2])),new Subject(arr[0],arr[1]));}}static class DefinePartitonReducer extends Reducer<DescIntWritable,Subject,Subject,DescIntWritable> {@Overrideprotected void reduce(DescIntWritable key, Iterable<Subject> values, Context context) throws IOException, InterruptedException {for (Subject subject : values) {context.write(subject, key);}}}}

MyPartition

package demo7;import demo5.DescIntWritable;
import org.apache.hadoop.mapreduce.Partitioner;public class MyPartition extends Partitioner<DescIntWritable,Subject> {@Overridepublic int getPartition(DescIntWritable key, Subject value, int numPartitions) {if ("语文".equals(value.getKemu())){return 0;}else if ("数学".equals(value.getKemu())) {return 1;}else if ("英语".equals(value.getKemu())) {return 2;}return 3;}
}

Subject

package demo7;import org.apache.hadoop.io.Writable;import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;public class Subject implements Writable{private String name;private String kemu;public Subject() {}public Subject(String name, String kemu) {this.name = name;this.kemu = kemu;}public String getName() {return name;}public void setName(String name) {this.name = name;}public String getKemu() {return kemu;}public void setKemu(String kemu) {this.kemu = kemu;}@Overridepublic void write(DataOutput out) throws IOException {out.writeUTF(name);out.writeUTF(kemu);}@Overridepublic void readFields(DataInput in) throws IOException {this.name = in.readUTF();this.kemu = in.readUTF();}@Overridepublic String toString() {return name + " " +kemu;}
}

4.在hdfs查看结果

不要去争辩，多提升自己~

这篇关于mapreduce | 自定义Partition分区（案例1）的文章就介绍到这儿，希望我们推荐的文章对编程师们有所帮助！

原文地址:
本文来自互联网用户投稿，该文观点仅代表作者本人，不代表本站立场。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如若转载，请注明出处：http://www.chinasem.cn/article/984667。如若内容造成侵权/违法违规/事实不符，请联系我们进行投诉反馈，一经查实，立即删除！我们的邮箱：23002807@qq.com

mapreduce | 自定义Partition分区（案例1）

1.需求

2.思路分析

3.Idea代码

DefinePartitionJob

MyPartition

Subject

4.在hdfs查看结果

相关文章

Python通用唯一标识符模块uuid使用案例详解

PostgreSQL的扩展dict_int应用案例解析

MySQL 定时新增分区的实现示例

Python中re模块结合正则表达式的实际应用案例

Python get()函数用法案例详解

MySQL中的索引结构和分类实战案例详解

Java实现自定义table宽高的示例代码

一文详解Java Stream的sorted自定义排序

从入门到精通MySQL 数据库索引(实战案例)

HTML中meta标签的常见使用案例(示例详解)