The Role of Delegation Tokens in Apache Hadoop Security
Delegation tokens play a critical part in Apache Hadoop security, and understanding their design and use is important for comprehending Hadoop’s security model.
Authentication in Apache Hadoop
Apache Hadoop provides strong authentication for HDFS data. All HDFS accesses must be authenticated:
1. Access from users logged in on cluster gateways
2. Access from any other service or daemon (e.g. HCatalog server)
3. Access from MapReduce tasks
Hadoop relies on Kerberos, a three-party authentication protocol, for #1 and #2 above. Users and services use their Kerberos credentials to talk securely with the NameNode. For #3, one of the conscious decisions was not to use Kerberos. I won’t cover all of the reasons, but they had to do with secure delegation of credentials, performance, and reliability. A technical report on security design in Apache Hadoop will be published soon with more detail.
Delegation Tokens
Delegation token authentication is a two-party protocol based on Java SASL Digest-MD5. Tokens are obtained during job submission and handed to the JobTracker along with the job. The typical steps are:
1. User authenticates herself to the JobTracker using Kerberos.
2. User authenticates herself (using Kerberos) to the NameNode(s) that the tasks would interact with at runtime. User then gets a delegation token from each of the NameNodes.
3. User passes the tokens to the JobTracker as part of the job submission.
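The two-party property rests on a shared secret: the NameNode derives each token’s password from the token identifier and a master key it alone holds, so it can recompute the password at authentication time without storing per-token state, and Digest-MD5 lets both sides prove possession of that secret without sending it. The sketch below illustrates this derivation; the class and field names are hypothetical, and HMAC-SHA1 over a string identifier is an assumption used only to make the idea concrete.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of password derivation for a delegation token.
// The issuing service keeps only the master key; the token identifier
// (owner, renewer, issue/max dates, sequence number, ...) travels with
// the token, so the password can always be recomputed from the two.
public class TokenPasswordSketch {
    static byte[] derivePassword(byte[] masterKey, byte[] tokenIdentifier) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(masterKey, "HmacSHA1"));
        return mac.doFinal(tokenIdentifier);
    }

    public static void main(String[] args) throws Exception {
        byte[] masterKey = "namenode-master-key".getBytes(StandardCharsets.UTF_8);
        byte[] identifier = "owner=alice,renewer=jobtracker,seq=42".getBytes(StandardCharsets.UTF_8);

        // Computed once when the token is issued...
        byte[] issued = derivePassword(masterKey, identifier);
        // ...and recomputed from the identifier alone when the client
        // later authenticates with Digest-MD5.
        byte[] recomputed = derivePassword(masterKey, identifier);
        System.out.println(Arrays.equals(issued, recomputed));
    }
}
```

Because the server can rederive the password, validating a token is a cheap local computation rather than a round trip to a third party, which is part of why token authentication is lighter than Kerberos.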
All TaskTrackers running the job’s tasks get a copy of the tokens (via an HDFS location that is private to the user that the MapReduce daemons run as). The tokens are written to a file in a private area (visible only to the job-owner user) on the TaskTracker machine.
As part of launching the task, the TaskTracker exports the location of the token file as an environment variable. The task process loads the tokens into memory (the file is read during static initialization of the UserGroupInformation class), where the RPC client can then use them.
When security is enabled, the Apache Hadoop RPC client can talk securely with a server using either tokens or Kerberos. The RPC client is programmed so that if a token exists for a service, the token is used for secure communication; if no token exists, Kerberos is used.
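The fallback rule described above is simple enough to sketch directly. This is not the actual RPC client code; the class, enum, and map representation below are illustrative stand-ins.

```java
import java.util.Map;

// Illustrative sketch of the RPC client's choice of authentication
// method: prefer a token if one exists for the target service,
// otherwise fall back to Kerberos.
public class AuthMethodSketch {
    enum AuthMethod { TOKEN, KERBEROS }

    // tokensByService maps a service address (e.g. "host:port") to a token.
    static AuthMethod chooseAuth(Map<String, String> tokensByService, String service) {
        return tokensByService.containsKey(service) ? AuthMethod.TOKEN : AuthMethod.KERBEROS;
    }

    public static void main(String[] args) {
        Map<String, String> tokens = Map.of("10.0.0.1:8020", "hdfs-delegation-token");
        System.out.println(chooseAuth(tokens, "10.0.0.1:8020")); // TOKEN
        System.out.println(chooseAuth(tokens, "10.0.0.2:8020")); // KERBEROS
    }
}
```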
Any Apache Hadoop process that is launched from the task (for example, a Hadoop Streaming process running a standard HDFS CLI command) can get access to those tokens since the environment variable is also visible to the child processes. Using the tokens, these processes can transparently authenticate themselves with the NameNodes (since the CLI implementation uses the standard security aware UserGroupInformation and RPC client classes).
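A child process finds the token file the same way the task does: through the inherited environment variable (conventionally HADOOP_TOKEN_FILE_LOCATION). The sketch below shows only that discovery step; in real Hadoop the file is a compact binary serialization of a Credentials object, not text lines, and the helper name here is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

// Hedged sketch: locate the token file via the environment variable
// set by the TaskTracker. Child processes inherit the variable, which
// is what lets a Streaming subprocess or a CLI command authenticate
// with the same tokens as the task that launched it.
public class TokenFileSketch {
    static List<String> loadTokenEntries(String location) throws IOException {
        if (location == null) {
            // Not running inside a task, or security is disabled.
            return Collections.emptyList();
        }
        // Illustration only: the real file format is binary.
        return Files.readAllLines(Paths.get(location));
    }

    public static void main(String[] args) throws IOException {
        String location = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
        System.out.println(loadTokenEntries(location).size() + " token entries loaded");
    }
}
```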
Lifecycle of a Token
A token has a current life, and a maximum renewable life (similar to Kerberos tickets). By default, tokens must be renewed once every 24 hours for up to 7 days. Tokens can also be cancelled explicitly.
The NameNode permits only designated renewers to do the renewal. For MapReduce jobs, the renewer is specified as the JobTracker. The JobTracker keeps track of tokens’ lifetimes and when they are about to expire, the JobTracker renews them with the respective NameNodes. When a job is done, JobTracker cancels the tokens associated with the job.
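The lifecycle rules above (a rolling current expiry, a hard maximum, a designated renewer, explicit cancellation) can be modeled in a few lines. This is a self-contained model, not Hadoop’s actual token classes; the 24-hour and 7-day figures are the defaults mentioned earlier.

```java
// Self-contained model of a delegation token's lifecycle: renewal
// pushes the expiry forward by the renew interval, but never past
// maxDate, and only the designated renewer may renew.
public class TokenLifecycleSketch {
    static final long RENEW_INTERVAL = 24L * 60 * 60 * 1000;      // 24 hours, the default
    static final long MAX_LIFETIME   = 7L * 24 * 60 * 60 * 1000;  // 7 days, the default

    final String renewer;   // e.g. the JobTracker for MapReduce jobs
    final long maxDate;     // absolute time beyond which renewal cannot extend the token
    long expiryDate;        // current expiry, moved forward on each renewal
    boolean cancelled = false;

    TokenLifecycleSketch(String renewer, long issueTime) {
        this.renewer = renewer;
        this.expiryDate = issueTime + RENEW_INTERVAL;
        this.maxDate = issueTime + MAX_LIFETIME;
    }

    long renew(String caller, long now) {
        if (cancelled) throw new IllegalStateException("token cancelled");
        if (!renewer.equals(caller)) throw new IllegalArgumentException("not the designated renewer");
        if (now > expiryDate) throw new IllegalStateException("token expired");
        expiryDate = Math.min(now + RENEW_INTERVAL, maxDate);
        return expiryDate;
    }

    void cancel() { cancelled = true; }

    public static void main(String[] args) {
        TokenLifecycleSketch token = new TokenLifecycleSketch("jobtracker", 0);
        // The JobTracker renews shortly before the 24-hour expiry...
        long newExpiry = token.renew("jobtracker", 23L * 60 * 60 * 1000);
        System.out.println(newExpiry < token.maxDate); // still under the 7-day cap
        // ...and cancels the token when the job completes.
        token.cancel();
    }
}
```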
Other Uses of Tokens
Tokens have uses beyond HDFS. The token abstractions are fairly open, so the token-related classes can easily be reused and overridden for other purposes. For example, a token selector can be defined on a per-token-type basis. A selector is associated with an RPC protocol via an annotation (as an example, ClientProtocol is annotated with DelegationTokenSelector). The selector usually has logic that looks at all the passed tokens and returns the one meant for the protocol/service in question. Every token contains information to indicate its intended service (IP-address:port, or some application-defined data appropriate to the context in which the token is used).
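The selector’s job reduces to matching on two fields carried in each token: its kind and its intended service. A minimal sketch, assuming a string kind and a "host:port" service (the names below are hypothetical; in Hadoop the analogous pieces are TokenSelector implementations bound to a protocol via annotation):

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch of a per-kind token selector: scan the caller's
// tokens for one whose kind and service both match the protocol and
// endpoint being contacted.
public class TokenSelectorSketch {
    record Token(String kind, String service) {}

    static Optional<Token> selectToken(String wantedKind, String wantedService, List<Token> tokens) {
        return tokens.stream()
                .filter(t -> t.kind().equals(wantedKind) && t.service().equals(wantedService))
                .findFirst();
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("HDFS_DELEGATION_TOKEN", "10.0.0.1:8020"),
                new Token("MR_DELEGATION_TOKEN", "10.0.0.5:8021"));
        // A selector for the HDFS client protocol ignores the MapReduce token.
        System.out.println(selectToken("HDFS_DELEGATION_TOKEN", "10.0.0.1:8020", tokens).isPresent());
    }
}
```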
One use of delegation tokens outside HDFS is the MapReduce delegation token. For jobs that in turn submit other jobs (for example, the launcher job in Oozie), the tasks of the first job need to talk securely with the JobTracker. Oozie uses MapReduce delegation tokens for this authentication. Before the launcher job is submitted, a request is made to the JobTracker for a MapReduce delegation token, and that token is added to the list of tokens passed to the JobTracker during the launcher job submission.
Another use case is that of HCatalog. MapReduce tasks that use the HCatalog service require tokens (issued by the HCatalog server during job submission) to talk securely with the HCatalog server.
Some Notes on Issuing Delegation Tokens
Delegation tokens should be issued over Kerberos-authenticated channels only. The problem this averts is the following: if a delegation token is compromised, the compromised token could otherwise be used to obtain more delegation tokens, and the malicious user could stay connected to the server for a long period of time.
For services such as Oozie that act on behalf of other users, additional security checks are in place in Apache Hadoop. An Oozie request for a delegation token is honored only when Oozie requests the token for a user belonging to a certain set of user groups. Furthermore, the Hadoop service checks whether the request came from a host that it trusts. Both restrictions are configurable in Apache Hadoop.
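Both restrictions are typically expressed as proxy-user properties in core-site.xml. A sketch, assuming the proxying service runs as the oozie user (the group and host values are placeholders):

```xml
<!-- core-site.xml: allow the oozie user to act on behalf of users in
     the listed groups, and only from the listed hosts -->
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>data-pipelines,analysts</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>oozie01.example.com</value>
</property>
```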
Conclusion
Delegation tokens are a fundamental building block of the Apache Hadoop security design and architecture. The delegation token abstractions defined in Apache Hadoop have proven robust and easily extensible to use cases such as HDFS, MapReduce, and HCatalog.
Use of delegation tokens in the core Hadoop infrastructure has both reduced Kerberos traffic significantly (thereby putting less load on the Kerberos infrastructure) and kept the performance overhead of MapReduce jobs under control: token-based authentication, which accounts for the majority of authentication in MapReduce jobs, is a much simpler two-party protocol than the three-party protocol found in Kerberos.
I should note that Kan Zhang, who was a part of the original Apache Hadoop security development team, was responsible for coming up with the delegation token concept.
Original article: http://hortonworks.com/blog/the-role-of-delegation-tokens-in-apache-hadoop-security/