[Hadoop] Flume Official Documentation Translation: Selected Topics from the Flume 1.7.0 User Guide (unreleased version)


Flume Official Documentation Translation: Flume 1.7.0 User Guide (unreleased version), Part 1

Flume Official Documentation Translation: Flume 1.7.0 User Guide (unreleased version), Part 2

Flume Properties

| Property Name | Default | Description |
|---|---|---|
| flume.called.from.service | – | If this property is specified, the Flume agent keeps polling for the config file even if it is not found at the expected location. Otherwise, the Flume agent terminates if the config file does not exist at the expected location. No property value is needed when setting this property (e.g., just specifying -Dflume.called.from.service is enough). |

Property: flume.called.from.service

Flume polls the specified config file for changes every 30 seconds. A Flume agent loads a new configuration from the config file if either an existing file is polled for the first time, or an existing file's modification time has changed since the last time it was polled. Renaming or moving a file does not change its modification time. When a Flume agent polls a non-existent file, one of two things happens:

1. If this is the first time the agent polls the config file, the agent behaves according to the flume.called.from.service property. If the property is set, the agent continues polling, always at the same 30-second period. If the property is not set, the agent terminates immediately.
2. If this is not the first time the file is polled, the agent makes no config changes for this polling period. The agent continues polling rather than terminating.
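The polling rules above can be summarised as a small decision function. The following is only a sketch of the behaviour as described in this section, not Flume's actual implementation; `PollState`, `poll_once`, and the returned action names are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

POLL_INTERVAL_SECONDS = 30  # polling period stated in the guide


@dataclass
class PollState:
    """What the agent remembers between polls (hypothetical structure)."""
    polled_before: bool = False          # has any poll happened yet?
    last_mtime: Optional[float] = None   # mtime of the last loaded config


def poll_once(state: PollState, mtime: Optional[float],
              called_from_service: bool) -> str:
    """Return the action for one poll; mtime is None when the file is missing."""
    first_poll = not state.polled_before
    state.polled_before = True

    if mtime is None:
        # Missing file: only the very first poll can terminate the agent,
        # and only when flume.called.from.service is not set.
        if first_poll and not called_from_service:
            return "terminate"
        return "keep_polling"  # no config change for this polling period

    # Existing file: load it when seen for the first time or when its
    # modification time differs from the one recorded at the last load.
    if state.last_mtime is None or mtime != state.last_mtime:
        state.last_mtime = mtime
        return "load_config"
    return "no_change"
```

A caller would invoke `poll_once` every `POLL_INTERVAL_SECONDS`, passing the file's current modification time, or `None` when the file does not exist.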

Log4J Appender

Appends Log4j events to a Flume agent's Avro source. A client using this appender must have the flume-ng-sdk in the classpath (e.g., flume-ng-sdk-1.8.0-SNAPSHOT.jar). Required properties are in bold.

| Property Name | Default | Description |
|---|---|---|
| Hostname | – | The hostname on which a remote Flume agent is running with an Avro source. |
| Port | – | The port at which the remote Flume agent's Avro source is listening. |
| UnsafeMode | false | If true, the appender will not throw exceptions on failure to send the events. |
| AvroReflectionEnabled | false | Use Avro Reflection to serialize Log4j events. (Do not use when users log strings.) |
| AvroSchemaUrl | – | A URL from which the Avro schema can be retrieved. |

Sample log4j.properties file:

    #...
    log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
    log4j.appender.flume.Hostname = example.com
    log4j.appender.flume.Port = 41414
    log4j.appender.flume.UnsafeMode = true

    # configure a class's logger to output to the flume appender
    log4j.logger.org.example.MyClass = DEBUG,flume
    #...
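To make the effect of UnsafeMode concrete, here is a minimal sketch of the two behaviours. It is an illustration of the idea only and does not reflect the appender's real code; `SendError`, `append`, and `failing_send` are hypothetical names.

```python
class SendError(Exception):
    """Raised by the hypothetical transport when delivery fails."""


def append(event: str, send, unsafe_mode: bool = False) -> bool:
    """Try to send one event; return True on success.

    With unsafe_mode=False a failure propagates to the caller, mirroring
    the default UnsafeMode = false. With unsafe_mode=True the failure is
    swallowed and False is returned instead.
    """
    try:
        send(event)
        return True
    except SendError:
        if unsafe_mode:
            return False
        raise


def failing_send(event: str) -> None:
    """Stand-in transport that always fails, as if the agent were down."""
    raise SendError("agent unreachable")
```

Setting the flag trades visibility for availability: the logging application keeps running when the agent is unreachable, but failed sends go unnoticed unless the caller checks for them.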

By default each event is converted to a string by calling toString(), or by using the Log4j layout, if specified.

If the event is an instance of org.apache.avro.generic.GenericRecord or org.apache.avro.specific.SpecificRecord, or if the property AvroReflectionEnabled is set to true, then the event will be serialized using Avro serialization.

Serializing every event with its Avro schema is inefficient, so it is good practice to provide a schema URL from which the schema can be retrieved by the downstream sink, typically the HDFS sink. If AvroSchemaUrl is not specified, then the schema will be included as a Flume header.

Sample log4j.properties file configured to use Avro serialization:

    #...
    log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
    log4j.appender.flume.Hostname = example.com
    log4j.appender.flume.Port = 41414
    log4j.appender.flume.AvroReflectionEnabled = true
    log4j.appender.flume.AvroSchemaUrl = hdfs://namenode/path/to/schema.avsc

    # configure a class's logger to output to the flume appender
    log4j.logger.org.example.MyClass = DEBUG,flume
    #...
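The trade-off between shipping the schema with every event and pointing to it by URL can be sketched as follows. This is an assumption-laden illustration: the header keys and the `schema_headers` function are hypothetical and are not taken from Flume's source.

```python
from typing import Dict, Optional

# Hypothetical header keys, chosen for this sketch only.
SCHEMA_URL_HEADER = "avro.schema.url"
SCHEMA_LITERAL_HEADER = "avro.schema.literal"


def schema_headers(schema_json: str, schema_url: Optional[str]) -> Dict[str, str]:
    """Build the schema-related headers for one event.

    When a schema URL is configured, only the short URL travels with the
    event and the downstream sink fetches the schema itself. Otherwise the
    full schema text is embedded in the header of every event.
    """
    if schema_url:
        return {SCHEMA_URL_HEADER: schema_url}
    return {SCHEMA_LITERAL_HEADER: schema_json}
```

With a URL such as the `hdfs://namenode/path/to/schema.avsc` value from the sample above, each event carries a few dozen bytes of header instead of the whole schema document.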

Load Balancing Log4J Appender

Appends Log4j events to the Avro sources of a list of Flume agents. A client using this appender must have the flume-ng-sdk in the classpath (e.g., flume-ng-sdk-1.8.0-SNAPSHOT.jar). This appender supports round-robin and random schemes for performing the load balancing. It also supports a configurable backoff timeout so that agents that are down are temporarily removed from the set of hosts. Required properties are in bold.

| Property Name | Default | Description |
|---|---|---|
| Hosts | – | A space-separated list of host:port at which Flume (through an AvroSource) is listening for events. |
| Selector | ROUND_ROBIN | Selection mechanism. Must be either ROUND_ROBIN, RANDOM, or the fully qualified name of a custom class that inherits from LoadBalancingSelector. |
| MaxBackoff | – | A long value representing the maximum amount of time, in milliseconds, that the load-balancing client will back off from a node that has failed to consume an event. Defaults to no backoff. |
| UnsafeMode | false | If true, the appender will not throw exceptions on failure to send the events. |
| AvroReflectionEnabled | false | Use Avro Reflection to serialize Log4j events. |
| AvroSchemaUrl | – | A URL from which the Avro schema can be retrieved. |

Sample log4j.properties file configured using defaults:

    #...
    log4j.appender.out2 = org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppender
    log4j.appender.out2.Hosts = localhost:25430 localhost:25431

    # configure a class's logger to output to the load-balancing appender
    log4j.logger.org.example.MyClass = DEBUG,out2
    #...

Sample log4j.properties file configured using RANDOM load balancing:

    #...
    log4j.appender.out2 = org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppender
    log4j.appender.out2.Hosts = localhost:25430 localhost:25431
    log4j.appender.out2.Selector = RANDOM

    # configure a class's logger to output to the load-balancing appender
    log4j.logger.org.example.MyClass = DEBUG,out2
    #...

Sample log4j.properties file configured using backoff:

    #...
    log4j.appender.out2 = org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppender
    log4j.appender.out2.Hosts = localhost:25430 localhost:25431 localhost:25432
    log4j.appender.out2.Selector = ROUND_ROBIN
    log4j.appender.out2.MaxBackoff = 30000

    # configure a class's logger to output to the load-balancing appender
    log4j.logger.org.example.MyClass = DEBUG,out2
    #...
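How the Selector and MaxBackoff settings interact can be illustrated with a small host selector. This is a simplified sketch, not the appender's real algorithm: the `HostSelector` class and its methods are hypothetical, and it assumes a failed host is skipped for a fixed `max_backoff_ms` rather than modelling whatever backoff schedule Flume actually uses.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class HostSelector:
    """Pick the next host, skipping hosts that are still backing off."""

    def __init__(self, hosts: List[str], selector: str = "ROUND_ROBIN",
                 max_backoff_ms: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not hosts:
            raise ValueError("at least one host is required")
        if selector not in ("ROUND_ROBIN", "RANDOM"):
            raise ValueError(f"unsupported selector: {selector}")
        self.hosts = list(hosts)
        self.selector = selector
        self.max_backoff_ms = max_backoff_ms      # None means no backoff
        self.clock = clock
        self._next = 0                            # round-robin cursor
        self._retry_at: Dict[str, float] = {}     # host -> earliest reuse time

    def _available(self) -> List[str]:
        now = self.clock()
        return [h for h in self.hosts if self._retry_at.get(h, 0.0) <= now]

    def pick(self) -> str:
        # If every host is backing off, fall back to the full list.
        candidates = self._available() or self.hosts
        if self.selector == "RANDOM":
            return random.choice(candidates)
        # Advance the cursor until it lands on an available host.
        for _ in range(len(self.hosts)):
            host = self.hosts[self._next % len(self.hosts)]
            self._next += 1
            if host in candidates:
                return host
        return candidates[0]

    def report_failure(self, host: str) -> None:
        """Remove a failed host from rotation for the backoff period."""
        if self.max_backoff_ms is not None:
            self._retry_at[host] = self.clock() + self.max_backoff_ms / 1000.0
```

With the three hosts and `MaxBackoff = 30000` from the last sample, a host reported as failed would be skipped for 30 seconds in this sketch while the remaining two keep alternating. Without a backoff value, failures are not remembered and the rotation is unchanged.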

Security

The HDFS sink, HBase sink, Thrift source, Thrift sink and Kite Dataset sink all support Kerberos authentication. Please refer to the corresponding sections for configuring the Kerberos-related options.

The Flume agent authenticates to the Kerberos KDC as a single principal, which is shared by all components that require Kerberos authentication. The principal and keytab configured for the Thrift source, Thrift sink, HDFS sink, HBase sink and Kite Dataset sink should be the same; otherwise the component will fail to start.
