Condor Manual - 1.3 - Exceptional Features

Source: Internet. Editor: 程序博客网. Time: 2024/04/30 00:51

1.3 Exceptional Features

Checkpoint and Migration.

Where programs can be linked with Condor libraries, users of Condor may be assured that their jobs will eventually complete, even in the ever-changing environment that Condor utilizes. When a machine running a job submitted to Condor becomes unavailable, the job can be checkpointed and then continue after migrating to another machine. Condor’s periodic checkpoint feature also checkpoints a job at intervals even when no migration takes place, so that the computation time accumulated by a job is not lost in a system failure such as the machine being shut down or crashing.
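The idea of periodic checkpointing can be illustrated with a small application-level sketch. This is only an analogy: Condor checkpoints the re-linked program transparently and needs no code like this in the job itself. The function names, the `fail_at` knob, and the file name `job.ckpt` are all hypothetical.

```python
import os
import pickle
import tempfile


def run_with_checkpoints(total_steps, ckpt_path, checkpoint_every=10, fail_at=None):
    """Sum the integers 0..total_steps-1, saving progress periodically.

    fail_at is a hypothetical knob that simulates the machine becoming
    unavailable just before that step runs.
    """
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "total": 0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated machine loss")
        state["total"] += state["step"]
        state["step"] += 1
        # Periodic checkpoint: persist progress even if no migration happens.
        if state["step"] % checkpoint_every == 0:
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp_path, ckpt_path)  # swap in the new checkpoint
    return state["total"]


def demo():
    """Fail partway through, then resume as if on another machine."""
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = os.path.join(tmp, "job.ckpt")
        try:
            run_with_checkpoints(100, ckpt, fail_at=57)
        except RuntimeError:
            pass  # only the work done after the last checkpoint is lost
        return run_with_checkpoints(100, ckpt)
```

In `demo()`, the second call picks up from the checkpoint written at step 50 rather than from step 0, and the final result equals that of an uninterrupted run.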

Remote System Calls.

Despite running jobs on remote machines, the Condor standard universe execution mode preserves the local execution environment via remote system calls. Users do not have to worry about making data files available to remote workstations, or even about obtaining a login account on remote workstations, before Condor executes their programs there. Under Condor the program behaves as if it were running as the user who submitted the job, on the workstation where it was originally submitted, no matter which machine it actually executes on.

No Changes Necessary to User’s Source Code.

No special programming is required to use Condor. Condor is able to run non-interactive programs. The checkpoint and migration of programs by Condor is transparent and automatic, as is the use of remote system calls. If these facilities are desired, the user need only re-link the program. The code is neither recompiled nor changed.

Pools of Machines can be Hooked Together.

Flocking is a feature of Condor that allows jobs submitted within one pool of Condor machines to execute on a second pool. The mechanism is flexible: it follows requests from the job submission, while allowing the second pool, or a subset of machines within the second pool, to set policies over the conditions under which jobs are executed.
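A rough way to picture flocking is a placement loop that tries the submitter's own pool first and then a second pool, with each pool applying its own acceptance policy. The sketch below is a toy model under that assumption, not Condor's negotiation protocol; `Pool`, `place_job`, and the policy functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Pool:
    name: str
    free_slots: int
    # Policy the pool applies to incoming jobs (hypothetical signature).
    accepts: Callable[[dict], bool]


def place_job(job: dict, pools: List[Pool]) -> Optional[str]:
    """Try pools in order (local pool first) and return the name of the
    first pool that has a free slot and whose policy accepts the job."""
    for pool in pools:
        if pool.free_slots > 0 and pool.accepts(job):
            pool.free_slots -= 1
            return pool.name
    return None  # no pool can run the job right now


def demo():
    """The local pool is full, so the job runs in the second pool."""
    local = Pool("local", 0, lambda job: True)
    # The second pool only accepts jobs from users it knows about.
    remote = Pool("remote", 2, lambda job: job["owner"] in {"alice", "bob"})
    return place_job({"owner": "alice"}, [local, remote])
```

The point of the design is that the second pool keeps control: a job from an owner outside its accepted set is simply not placed there, regardless of what the submission requests.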

Jobs can be Ordered.

The ordering of job execution required by dependencies among jobs in a set is easily handled. The set of jobs is specified using a directed acyclic graph, where each job is a node in the graph. Jobs are submitted to Condor following the dependencies given by the graph.
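The ordering described above amounts to a topological sort of the dependency graph. A minimal sketch using Python's standard `graphlib` module (Python 3.9 or later) follows; the job names A to D and the dictionary layout are made up for illustration and are not Condor's own input format.

```python
from graphlib import TopologicalSorter

# Each key maps a job to the set of jobs that must finish before it starts.
# Hypothetical diamond-shaped workflow: A, then B and C, then D.
DEPENDENCIES = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


def submission_waves(deps):
    """Group jobs into waves. Every job in a wave has all of its
    dependencies satisfied by earlier waves, so the jobs within one
    wave may be submitted together."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking successors
    return waves


print(submission_waves(DEPENDENCIES))  # [['A'], ['B', 'C'], ['D']]
```

The acyclic requirement matters: if two jobs depended on each other, no valid order would exist, which is why `prepare()` rejects cyclic graphs.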

Condor Enables Grid Computing.

As grid computing becomes a reality, Condor is already there. The glidein technique allows jobs submitted to Condor to be executed on grid machines in various locations worldwide. As the details of grid computing evolve, so do Condor’s capabilities, starting with Globus-controlled resources.

Sensitive to the Desires of Machine Owners.

The owner of a machine has complete priority over the use of the machine. An owner is generally happy to let others compute on the machine while it is idle, but wants it back promptly upon returning. The owner does not want to take special action to regain control. Condor handles this automatically.

ClassAds.

The ClassAd mechanism in Condor provides an extremely flexible, expressive framework for matching resource requests with resource offers. Users can easily state both job requirements and job preferences. For example, a user can require that a job run on a machine with 64 Mbytes of RAM, but state a preference for 128 Mbytes, if available. A workstation owner can state a preference that the workstation run jobs from a specified set of users. The owner can also require that no interactive workstation activity be detectable at certain hours before Condor may start a job. Job requirements and preferences, as well as resource availability constraints, can be described in terms of powerful expressions, which lets Condor adapt to nearly any desired policy.
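The 64/128 Mbyte example can be sketched as a two-sided match: a job and a machine pair up only if each satisfies the other's requirements, and the job's preference then ranks the surviving candidates. The code below is a simplified model in plain Python, not the ClassAd language itself; the dictionary keys (`memory_mb`, `requirements`, `rank`) and machine names are assumptions chosen for illustration, and a machine's own preference among users is left out.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]  # a toy stand-in for a ClassAd


def match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Return the machine the job prefers most among those where both
    the job's and the machine's requirements are satisfied."""
    candidates = [
        m for m in machines
        if job["requirements"](m) and m["requirements"](job)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["rank"](m))


def make_job(owner: str) -> Ad:
    return {
        "owner": owner,
        # Hard requirement: at least 64 Mbytes of RAM.
        "requirements": lambda m: m["memory_mb"] >= 64,
        # Preference: machines with 128 Mbytes or more rank higher.
        "rank": lambda m: 1 if m["memory_mb"] >= 128 else 0,
    }


MACHINES: List[Ad] = [
    {"name": "small", "memory_mb": 32, "requirements": lambda j: True},
    {"name": "medium", "memory_mb": 64, "requirements": lambda j: True},
    # This owner only admits jobs from a specified set of users.
    {"name": "large", "memory_mb": 128,
     "requirements": lambda j: j["owner"] in {"alice", "bob"}},
]
```

With these ads, a job from `alice` matches `large` (preferred and permitted), while a job from a user outside that machine's accepted set falls back to `medium`, which still meets the 64 Mbyte requirement.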
