Nifi

来源:互联网 发布:淘宝账号怎么升级心 编辑:程序博客网 时间:2024/06/07 01:19

When creating a Processor, the developer is able to provide hints to the framework about how to utilize the Processor most effectively. This is done by applying annotations to the Processor’s class. The annotations that can be applied to a Processor exist in three sub-packages of org.apache.nifi.annotations. Those in the documentation sub-package are used to provide documentation to the user. Those in the lifecycle sub-package instruct the framework which methods should be called on the Processor in order to respond to the appropriate life-cycle events. Those in the behavior package help the framework understand how to interact with the Processor in terms of scheduling and general behavior.
在创建Processor时,框架需要开发者告知如何才能高效地使用该处理器。这部分的工作是通过在Processor内部使用注解实现的,能够运用于Processor的注解位于org.apache.nifi.annotations中的一个子包内。其他的注解是用于实现Processor类的生命周期类事件。behavior包内的这些东西是为了帮助框架去理解如何在调度和一般行为方面与处理器交互。
The following annotations from the org.apache.nifi.annotations.behavior package can be used to modify how the framework will handle your Processor:
以下的标注目的是帮助框架了解如何去使用你的Processor
EventDriven: Instructs the framework that the Processor can be scheduled using the Event-Driven scheduling strategy. This strategy is still experimental at this point, but can result in reduced resource utilization on dataflows that do not handle extremely high data rates.
@EventDriven:提示框架这个处理器能够使用事件驱动的调度策略,这个调度策略暂时还是在试验阶段,但可以减少低的数据传输速率的数据流的资源需要。

SideEffectFree: Indicates that the Processor does not have any side effects external to NiFi. As a result, the framework is free to invoke the Processor many times with the same input without causing any unexpected results to occur. This implies idempotent behavior. This can be used by the framework to improve efficiency by performing actions such as transferring a ProcessSession from one Processor to another, such that if a problem occurs many Processors’ actions can be rolled back and performed again.
@SideEffectFree:提示框架这个处理器对Nifi没有任何的副作用,框架可以多次无压力地调用与启动Processor,只要输入的数据相同,得到的结果就会一定是相同的,而不会导致任何不可预料的影响。这意味着所谓的幂等效应。可以由框架用于执行操作如从一个处理器到另一个传输process session以提高效率,这样如果出现问题,很多处理器的行动可以被回滚并再次执行。

SupportsBatching: This annotation indicates that it is okay for the framework to batch together multiple ProcessSession commits into a single commit. If this annotation is present, the user will be able to choose whether they prefer high throughput or lower latency in the Processor’s Scheduling tab. This annotation should be applied to most Processors, but it comes with a caveat: if the Processor calls ProcessSession.commit, there is no guarantee that the data has been safely stored in NiFi’s Content, FlowFile, and Provenance Repositories. As a result, it is not appropriate for those Processors that receive data from an external source, commit the session, and then delete the remote data or confirm a transaction with a remote resource.
@SupportsBatching:该标注代表框架可以批量处理多个ProcessorSession。如果此注释存在,用户将能够选择他们高吞吐量或更低的延迟。该标注可以被应用于大多数处理器。
TriggerSerially: When this annotation is present, the framework will not allow the user to schedule more than one concurrent thread to execute the onTrigger method at a time. Instead, the number of thread (“Concurrent Tasks”) will always be set to 1. This does not, however, mean that the Processor does not have to be thread-safe, as the thread that is executing onTrigger may change between invocations.

TriggerWhenAnyDestinationAvailable: By default, NiFi will not schedule a Processor to run if any of its outbound queues is full. This allows back-pressure to be applied all the way a chain of Processors. However, some Processors may need to run even if one of the outbound queues is full. This annotations indicates that the Processor should run if any Relationship is “available.” A Relationship is said to be “available” if none of the connections that use that Relationship is full. For example, the DistributeLoad Processor makes use of this annotation. If the “round robin” scheduling strategy is used, the Processor will not run if any outbound queue is full. However, if the “next available” scheduling strategy is used, the Processor will run if any Relationship at all is available and will route FlowFiles only to those relationships that are available.

TriggerWhenEmpty: The default behavior is to trigger a Processor to run only if its input queue has at least one FlowFile or if the Processor has no input queues (which is typical of a “source” Processor). Applying this annotation will cause the framework to ignore the size of the input queues and trigger the Processor regardless of whether or not there is any data on an input queue. This is useful, for example, if the Processor needs to be triggered to run periodically to time out a network connection.

InputRequirement: By default, all Processors will allow users to create incoming connections for the Processor, but if the user does not create an incoming connection, the Processor is still valid and can be scheduled to run. For Processors that are expected to be used as a “Source Processor,” though, this can be confusing to the user, and the user may attempt to send FlowFiles to that Processor, only for the FlowFiles to queue up without being processed. Conversely, if the Processor expects incoming FlowFiles but does not have an input queue, the Processor will be scheduled to run but will perform no work, as it will receive no FlowFile, and this leads to confusion as well. As a result, we can use the @InputRequirement annotation and provide it a value of INPUT_REQUIRED, INPUT_ALLOWED, or INPUT_FORBIDDEN. This provides information to the framework about when the Processor should be made invalid, or whether or not the user should even be able to draw a Connection to the Processor. For instance, if a Processor is annotated with InputRequirement(Requirement.INPUT_FORBIDDEN), then the user will not even be able to create a Connection with that Processor as the destination.