[Hadoop] Hadoop Official Documentation Translated: YARN ResourceManager High Availability 2.7.3


ResourceManager High Availability (RM HA)

 

    • Introduction
    • Architecture
      • RM Failover
      • Recovering previous active-RM's state
    • Deployment
      • Configurations
      • Admin commands
      • ResourceManager Web UI services
      • Web Services

Introduction

This guide provides an overview of High Availability of YARN's ResourceManager, and details how to configure and use this feature. The ResourceManager (RM) is responsible for tracking the resources in a cluster, and scheduling applications (e.g., MapReduce jobs). Prior to Hadoop 2.4, the ResourceManager was the single point of failure in a YARN cluster. The High Availability feature adds redundancy in the form of an Active/Standby ResourceManager pair to remove this otherwise single point of failure.

Architecture

RM Failover

ResourceManager HA is realized through an Active/Standby architecture: at any point in time, one of the RMs is Active, and one or more RMs are in Standby mode, waiting to take over should anything happen to the Active. The trigger to transition to Active comes either from the admin (through the CLI) or from the integrated failover controller when automatic failover is enabled.

Manual transitions and failover

When automatic failover is not enabled, admins have to manually transition one of the RMs to Active. To fail over from one RM to the other, they are expected to first transition the Active RM to Standby and then transition a Standby RM to Active. All of this can be done using the “yarn rmadmin” CLI.
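The ordering described above (demote the current Active first, then promote a Standby) can be sketched as a small helper that only builds the two command lines. This is a minimal illustration, not part of Hadoop: the function name manual_failover_commands is hypothetical, and rm1/rm2 are example logical IDs.

```python
def manual_failover_commands(current_active, new_active, rm_ids):
    """Return the two `yarn rmadmin` invocations for a manual failover, in order.

    Hypothetical helper for illustration; it builds commands but does not run them.
    """
    for rm in (current_active, new_active):
        if rm not in rm_ids:
            raise ValueError(f"unknown RM id: {rm}")
    if current_active == new_active:
        raise ValueError("source and target RM must differ")
    return [
        # Step 1: demote the current Active so two RMs are never Active at once.
        ["yarn", "rmadmin", "-transitionToStandby", current_active],
        # Step 2: promote the chosen Standby.
        ["yarn", "rmadmin", "-transitionToActive", new_active],
    ]


if __name__ == "__main__":
    for cmd in manual_failover_commands("rm1", "rm2", ["rm1", "rm2"]):
        print(" ".join(cmd))
```

Returning argument lists rather than running anything keeps the sketch safe to try on a machine without a cluster; the lists could be handed to subprocess.run on a real admin host.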

Automatic failover

The RMs have an option to embed the ZooKeeper-based ActiveStandbyElector to decide which RM should be the Active. When the Active goes down or becomes unresponsive, another RM is automatically elected to be the Active, which then takes over. Note that there is no need to run a separate ZKFC daemon, as is the case for HDFS, because the ActiveStandbyElector embedded in the RMs acts as both failure detector and leader elector.
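As a rough mental model of lock-based election, the toy below lets RMs compete for a single in-memory lock: the holder is Active, everyone else is Standby, and losing the holder's session frees the lock for the next contender. It is only a sketch under that assumption; ToyLock and elect are hypothetical names, and nothing here talks to ZooKeeper or mirrors the real ActiveStandbyElector API.

```python
class ToyLock:
    """In-memory stand-in for a shared lock; NOT ZooKeeper."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, rm_id):
        # The first contender takes the lock; the holder keeps it on re-entry.
        if self.holder is None:
            self.holder = rm_id
        return self.holder == rm_id

    def session_lost(self, rm_id):
        # Models the Active going down: its lock disappears with its session.
        if self.holder == rm_id:
            self.holder = None


def elect(lock, live_rm_ids):
    """Each live RM tries for the lock; the holder is Active, the rest Standby."""
    return {
        rm_id: ("active" if lock.try_acquire(rm_id) else "standby")
        for rm_id in live_rm_ids
    }


if __name__ == "__main__":
    lock = ToyLock()
    print(elect(lock, ["rm1", "rm2"]))  # rm1 wins the first round
    lock.session_lost("rm1")            # rm1 goes down
    print(elect(lock, ["rm2"]))         # rm2 takes over
```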

Client, ApplicationMaster and NodeManager on RM failover

When there are multiple RMs, the configuration (yarn-site.xml) used by clients and nodes is expected to list all the RMs. Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs in a round-robin fashion until they hit the Active RM. If the Active goes down, they resume the round-robin polling until they hit the “new” Active. This default retry logic is implemented as org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider. You can override the logic by implementing org.apache.hadoop.yarn.client.RMFailoverProxyProvider and setting the value of yarn.client.failover-proxy-provider to the class name.
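The round-robin behavior can be illustrated with a small simulation. This is a sketch of the idea only, not the ConfiguredRMFailoverProxyProvider code: the class RoundRobinFailoverProxy, the StandbyError exception and the make_cluster test double are all hypothetical names introduced for the example.

```python
class StandbyError(Exception):
    """Raised by a simulated RM that is not currently Active."""


class RoundRobinFailoverProxy:
    """Tries the configured RMs in order, remembering the last one that worked."""

    def __init__(self, rm_ids, connect, max_attempts):
        self.rm_ids = list(rm_ids)
        self.connect = connect          # hypothetical: rm_id -> callable(request)
        self.max_attempts = max_attempts
        self.index = 0                  # position of the RM we currently believe is Active

    def call(self, request):
        last_error = None
        for _ in range(self.max_attempts):
            rm_id = self.rm_ids[self.index]
            try:
                return self.connect(rm_id)(request)
            except (StandbyError, ConnectionError) as err:
                last_error = err
                # Move on to the next RM in the list, wrapping around.
                self.index = (self.index + 1) % len(self.rm_ids)
        raise RuntimeError(f"no Active RM after {self.max_attempts} attempts") from last_error


def make_cluster(states):
    """Hypothetical test double: states maps rm_id -> 'active' or 'standby'."""
    def connect(rm_id):
        def client(request):
            if states[rm_id] != "active":
                raise StandbyError(rm_id)
            return f"{rm_id} handled {request}"
        return client
    return connect


if __name__ == "__main__":
    states = {"rm1": "standby", "rm2": "active"}
    proxy = RoundRobinFailoverProxy(["rm1", "rm2"], make_cluster(states), max_attempts=4)
    print(proxy.call("submit"))
    states["rm1"], states["rm2"] = "active", "standby"   # simulate a failover
    print(proxy.call("status"))
```

Keeping the index between calls means a healthy Active is contacted directly on the next request, and polling only resumes after a failure.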

Recovering previous active-RM's state

With ResourceManager Restart enabled, the RM being promoted to the active state loads the RM internal state and continues to operate from where the previous active left off, as far as the RM restart feature allows. A new attempt is spawned for each managed application previously submitted to the RM. Applications can checkpoint periodically to avoid losing any work. The state-store must be visible from both the Active and Standby RMs. Currently, there are two RMStateStore implementations for persistence: FileSystemRMStateStore and ZKRMStateStore. The ZKRMStateStore implicitly allows write access to a single RM at any point in time, and hence is the recommended store to use in an HA cluster. When using the ZKRMStateStore, there is no need for a separate fencing mechanism to address a potential split-brain situation where multiple RMs can potentially assume the Active role. When using the ZKRMStateStore, it is advisable NOT to set the “zookeeper.DigestAuthenticationProvider.superDigest” property on the ZooKeeper cluster, to ensure that the ZooKeeper admin does not have access to YARN application/user credential information.

Deployment

Configurations

Most of the failover functionality is tunable using various configuration properties. The following is a list of the required or important ones; yarn-default.xml carries the full list of knobs, including default values. See also the document for ResourceManager Restart for instructions on setting up the state-store.

| Configuration Properties | Description |
|---|---|
| yarn.resourcemanager.zk-address | Address of the ZK-quorum. Used both for the state-store and embedded leader-election. |
| yarn.resourcemanager.ha.enabled | Enable RM HA. |
| yarn.resourcemanager.ha.rm-ids | List of logical IDs for the RMs, e.g., “rm1,rm2”. |
| yarn.resourcemanager.hostname.rm-id | For each rm-id, specify the hostname the RM corresponds to. Alternatively, one could set each of the RM's service addresses. |
| yarn.resourcemanager.address.rm-id | For each rm-id, specify host:port for clients to submit jobs. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
| yarn.resourcemanager.scheduler.address.rm-id | For each rm-id, specify scheduler host:port for ApplicationMasters to obtain resources. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
| yarn.resourcemanager.resource-tracker.address.rm-id | For each rm-id, specify host:port for NodeManagers to connect. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
| yarn.resourcemanager.admin.address.rm-id | For each rm-id, specify host:port for administrative commands. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
| yarn.resourcemanager.webapp.address.rm-id | For each rm-id, specify the host:port the RM web application corresponds to. You do not need this if you set yarn.http.policy to HTTPS_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
| yarn.resourcemanager.webapp.https.address.rm-id | For each rm-id, specify the host:port the RM https web application corresponds to. You do not need this if you set yarn.http.policy to HTTP_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
| yarn.resourcemanager.ha.id | Identifies the RM in the ensemble. This is optional; however, if set, admins have to ensure that all the RMs have their own IDs in the config. |
| yarn.resourcemanager.ha.automatic-failover.enabled | Enable automatic failover. By default, it is enabled only when HA is enabled. |
| yarn.resourcemanager.ha.automatic-failover.embedded | Use the embedded leader-elector to pick the Active RM when automatic failover is enabled. By default, it is enabled only when HA is enabled. |
| yarn.resourcemanager.cluster-id | Identifies the cluster. Used by the elector to ensure an RM doesn't take over as Active for another cluster. |
| yarn.client.failover-proxy-provider | The class to be used by Clients, AMs and NMs to fail over to the Active RM. |
| yarn.client.failover-max-attempts | The max number of times FailoverProxyProvider should attempt failover. |
| yarn.client.failover-sleep-base-ms | The sleep base (in milliseconds) to be used for calculating the exponential delay between failovers. |
| yarn.client.failover-sleep-max-ms | The maximum sleep time (in milliseconds) between failovers. |
| yarn.client.failover-retries | The number of retries per attempt to connect to a ResourceManager. |
| yarn.client.failover-retries-on-socket-timeouts | The number of retries per attempt to connect to a ResourceManager on socket timeouts. |

In the property names above, rm-id stands for one of the logical IDs listed in yarn.resourcemanager.ha.rm-ids (for example, yarn.resourcemanager.hostname.rm1).
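To make the interplay of yarn.client.failover-sleep-base-ms and yarn.client.failover-sleep-max-ms concrete, here is a deliberately simplified, deterministic model of a capped exponential delay. The function name failover_delays_ms and the numbers 500 and 15000 are illustrative assumptions, not Hadoop defaults, and the real client may add randomization or differ in other details.

```python
def failover_delays_ms(base_ms, max_ms, attempts):
    """Capped exponential schedule: base * 2**n, never above max_ms.

    Simplified model for illustration; not the actual Hadoop retry policy.
    """
    return [min(max_ms, base_ms * (2 ** n)) for n in range(attempts)]


if __name__ == "__main__":
    # Illustrative values only: base 500 ms, cap 15000 ms, 7 failovers.
    print(failover_delays_ms(500, 15000, 7))
    # → [500, 1000, 2000, 4000, 8000, 15000, 15000]
```

The base controls how quickly the delay grows from the first failover, while the cap bounds how long a client can sit idle between attempts once the Active has been gone for a while.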

Sample configurations

Here is a sample of a minimal setup for RM failover.

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>master1:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>master2:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
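A quick sanity check of such a configuration can be scripted. The sketch below parses the properties with Python's standard xml.etree.ElementTree and reports obvious gaps. The helper names load_properties and check_ha_config are hypothetical, the rules (at least two RM IDs, a ZK address present) are assumptions chosen for this example rather than checks Hadoop itself performs, and the sample is wrapped in a <configuration> root element as in a complete yarn-site.xml.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<configuration>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.cluster-id</name><value>cluster1</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>master1</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>master2</value></property>
  <property><name>yarn.resourcemanager.webapp.address.rm1</name><value>master1:8088</value></property>
  <property><name>yarn.resourcemanager.webapp.address.rm2</name><value>master2:8088</value></property>
  <property><name>yarn.resourcemanager.zk-address</name><value>zk1:2181,zk2:2181,zk3:2181</value></property>
</configuration>"""


def load_properties(xml_text):
    """Parse Hadoop-style <property> entries into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def check_ha_config(props):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if (props.get("yarn.resourcemanager.ha.enabled") or "").strip().lower() != "true":
        problems.append("yarn.resourcemanager.ha.enabled is not true")
    raw_ids = props.get("yarn.resourcemanager.ha.rm-ids") or ""
    rm_ids = [i.strip() for i in raw_ids.split(",") if i.strip()]
    if len(rm_ids) < 2:
        problems.append("yarn.resourcemanager.ha.rm-ids should list at least two RMs")
    for rm_id in rm_ids:
        # Either a hostname or an explicit client address identifies the RM.
        if not (props.get(f"yarn.resourcemanager.hostname.{rm_id}")
                or props.get(f"yarn.resourcemanager.address.{rm_id}")):
            problems.append(f"no hostname or address configured for {rm_id}")
    if not props.get("yarn.resourcemanager.zk-address"):
        problems.append("yarn.resourcemanager.zk-address is not set")
    return problems


if __name__ == "__main__":
    print(check_ha_config(load_properties(SAMPLE)))
```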

Admin commands

yarn rmadmin has a few HA-specific command options to check the health/state of an RM, and to transition it to Active/Standby. The HA commands take the service ID of an RM, as set in yarn.resourcemanager.ha.rm-ids, as their argument.

 $ yarn rmadmin -getServiceState rm1
 active

 $ yarn rmadmin -getServiceState rm2
 standby
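Checking every RM by hand gets tedious, so the state query can be wrapped in a small script. The sketch below assumes the yarn command is on the PATH and prints only the state word on standard output, as in the transcript above; find_active and get_service_state are hypothetical helper names, and the state lookup is injectable so the logic can be tried without a cluster.

```python
import subprocess


def get_service_state(rm_id):
    """Run `yarn rmadmin -getServiceState <rm_id>` and return its trimmed stdout.

    Assumes `yarn` is on the PATH and prints just the state (e.g. "active").
    """
    result = subprocess.run(
        ["yarn", "rmadmin", "-getServiceState", rm_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def find_active(rm_ids, state_of=get_service_state):
    """Return (list of Active RM ids, full rm_id -> state mapping)."""
    states = {rm_id: state_of(rm_id) for rm_id in rm_ids}
    active = [rm_id for rm_id, state in states.items() if state == "active"]
    return active, states


if __name__ == "__main__":
    # Offline demo with canned states instead of a real cluster.
    canned = {"rm1": "active", "rm2": "standby"}
    print(find_active(["rm1", "rm2"], state_of=canned.get))
```

A healthy HA pair should yield exactly one Active; an empty list or more than one entry is a signal to investigate before attempting any manual transition.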

If automatic failover is enabled, you cannot use the manual transition commands. You can override this with the --forcemanual flag, but do so with caution.

 $ yarn rmadmin -transitionToStandby rm1
 Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@1d8299fd
 Refusing to manually manage HA state, since it may cause
 a split-brain scenario or other incorrect state.
 If you are very sure you know what you are doing, please
 specify the forcemanual flag.

See YarnCommands for more details.

ResourceManager Web UI services

Assuming a standby RM is up and running, the Standby automatically redirects all web requests to the Active, except for the “About” page.

Web Services

Assuming a standby RM is up and running, the RM web services described in ResourceManager REST APIs are automatically redirected to the Active RM when invoked on a standby RM.


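*The translator's ability is limited, so the translation is bound to contain inaccurate wording. Please bear with it, and feel free to point out anything that is translated incorrectly or imprecisely so that we can discuss and improve together. Thank you.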
