mpirun mpd mpiexec


Original post: mpirun mpd mpiexec. Author: 枝叶飞扬

2. Create the configuration file and set it up

   Command 1: touch mpd.conf
   Command 2: chmod 600 mpd.conf
   Then put the following line in mpd.conf and save it:
   MPD_SECRETWORD=mr.chen
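
The two commands and the edit can be folded into one small script. This is only a sketch: `CONF_DIR` is a hypothetical variable introduced here (it defaults to the current directory), and the secret word is the one used above. Setting the umask first means the file is never readable by others, not even for a moment before chmod runs.

```shell
#!/bin/sh
# Sketch: create mpd.conf with owner-only permissions, then write the secret.
# CONF_DIR is a hypothetical variable introduced for this example.
CONF_DIR="${CONF_DIR:-.}"
conf="$CONF_DIR/mpd.conf"

# Create (or truncate) the file under a restrictive umask
( umask 077 && : > "$conf" )
# Also tighten the mode in case the file already existed with wider permissions
chmod 600 "$conf"
printf 'MPD_SECRETWORD=%s\n' 'mr.chen' > "$conf"

# Show the resulting permissions
ls -l "$conf"
```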

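3. Start the MPI service, then compile and run an MPI program
   3.1 Start the MPI environment: mpdboot
   3.2 Compile the MPI source (-o Hello names the output file): mpicc -o Hello Hello.c
   3.3 Run the resulting binary (-np 4 means use 4 processes): mpirun -np 4 ./Hello
   The output looks like this: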
   user@ubuntu:~/test_mpi_examples$ mpirun -np 4 ./Hello
Hello world! Processor 0 of 4 on ubuntu
Hello world! Processor 1 of 4 on ubuntu
Hello world! Processor 3 of 4 on ubuntu
Hello world! Processor 2 of 4 on ubuntu
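
Hello.c itself is not shown in this post. The script below writes a guess at it: a program that would print lines in the same format as the output above, using only standard MPI calls. It then builds the program if mpicc is on the PATH. The source is an assumption, not the original file.

```shell
#!/bin/sh
# Sketch: write a candidate Hello.c (an assumption, not the original file),
# then build it if MPICH is installed.
cat > Hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(name, &len);     /* name of the host */
    printf("Hello world! Processor %d of %d on %s\n", rank, size, name);
    MPI_Finalize();
    return 0;
}
EOF

if command -v mpicc >/dev/null 2>&1; then
    mpicc -o Hello Hello.c || echo "build failed" >&2
else
    echo "mpicc not found; Hello.c written but not built" >&2
fi
```

Each process prints as soon as it reaches printf, so the lines can arrive in any order. That is why rank 3 appears before rank 2 in the run above.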

4. Shut down the MPI service
   Command: mpdcleanup

6 Update the PATH environment variable

PATH="$PATH:/usr/local/mpich/bin"

7 Verify the environment variable

#which mpd

#which mpicc

#which mpiexec

#which mpirun

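All of the commands above should resolve to the bin subdirectory of the installation directory. If the installation directory is not shared over NFS, the bin subdirectory must also be copied to every other machine.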
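
The four checks can be run in one go, which is handy when repeating them on every machine. A minimal sketch, assuming the installation prefix from step 6; `MPICH_BIN` and `check_tools` are hypothetical names introduced here, not part of MPICH2.

```shell
#!/bin/sh
# Sketch: report where each MPICH2 tool resolves and flag any that are
# missing or outside the expected bin directory.
# MPICH_BIN and check_tools are hypothetical names used only in this example.
MPICH_BIN="${MPICH_BIN:-/usr/local/mpich/bin}"

check_tools() {
    status=0
    for tool in mpd mpicc mpiexec mpirun; do
        tool_path=$(command -v "$tool" 2>/dev/null)
        if [ -z "$tool_path" ]; then
            echo "$tool: not found in PATH"
            status=1
        elif [ "$(dirname "$tool_path")" != "$MPICH_BIN" ]; then
            echo "$tool: $tool_path (expected under $MPICH_BIN)"
            status=1
        else
            echo "$tool: $tool_path"
        fi
    done
    return $status
}

check_tools || echo "PATH needs attention on $(hostname)"
```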

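8 Edit /etc/mpd.conf so that it contains secretword=myword

#vi /etc/mpd.conf

Restrict the file so that only its owner can read and write it

#chmod 600 /etc/mpd.conf

A non-root user creates a .mpd.conf with the same content in the home directory

9 Create the host list file /root/mpd.hosts

#vi mpd.hosts

The file contains: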

station1

station3

station6

station8

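Usage

MPICH2 uses the mpd service to manage processes and mpiexec to run MPI programs.

MPD

Start the mpd service on a single machine

# mpd &

Inspect the mpd service

# mpdtrace    (lists host names)

or

#mpdtrace -l    (lists host names and port numbers)

Shut down mpd process management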

#mpdallexit

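Start the mpd service on a cluster

# mpdboot -n process-num -f mpd.hosts

This starts process-num mpd daemons (the count includes the local machine); mpd.hosts is the file created earlier.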
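
Rather than typing the count by hand, it can be derived from the hosts file. A minimal sketch: `count_hosts` is a hypothetical helper, not an MPICH2 command, and the script only prints the mpdboot line instead of running it. As far as I recall, mpdboot accepts at most one more than the number of hosts in the file, so the plain line count stays within the limit.

```shell
#!/bin/sh
# Sketch: derive the -n value from the hosts file.
# count_hosts is a hypothetical helper, not part of MPICH2.
count_hosts() {
    # Count the lines that are not blank
    grep -c -v '^[[:space:]]*$' "$1"
}

hosts_file="${1:-mpd.hosts}"
if [ -r "$hosts_file" ]; then
    n=$(count_hosts "$hosts_file")
    echo "mpdboot -n $n -f $hosts_file"
else
    echo "cannot read $hosts_file" >&2
fi
```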

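By default MPICH2 logs in to the other machines in the cluster over ssh; rsh can be used instead to start the mpd service on them.

The --rsh option selects ssh or rsh

#mpdboot --rsh=rsh -n process-num -f hostfile

or #mpdboot --rsh=ssh -n process-num -f hostfile

Shut down the mpd service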

#mpdallexit

mpiexec

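Use mpiexec to run an MPI job

#mpiexec -np 3 ./cpi

or mpiexec -machinefile filename -np 4 ./cpi

1. MPICH2 1.0.3 introduced mpd because, according to the MPICH developers, it separates the communication needed to start an MPI program from the computation. In MPICH1 a job was run directly with mpirun, which first had to communicate through tools such as rsh and only then start the processes; this hurt error diagnosis, startup speed and more. MPICH2 therefore splits the communication part out into mpd, which is written in Python and easy to read, and this addresses those problems.

2. MPICH2 recommends running jobs with mpiexec rather than mpirun, because mpiexec has many practical features that mpirun lacks, for example: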

mpiexec -n 1 -host loginnode master : -n 32 -host smp slave

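mpiexec can apply a different launch policy to each node. The command above starts one process on loginnode, running the executable master, and 32 processes on a machine named smp, running slave (evidently the host smp has 32 CPUs). The argument sets are separated by colons.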
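
When the colon-separated line gets long, the MPI standard's mpiexec description also allows the argument sets to go into a file, one per line, passed with -configfile. The sketch below writes such a file for the command above and shows that joining its lines with colons gives back the single-line form. `job.cfg` is a hypothetical file name, and nothing is launched here; check that your mpiexec accepts -configfile in this form before relying on it.

```shell
#!/bin/sh
# Sketch: the same two argument sets, one per line, in a config file.
# job.cfg is a hypothetical file name chosen for this example.
cat > job.cfg <<'EOF'
-n 1 -host loginnode master
-n 32 -host smp slave
EOF

# Joining the lines with " : " gives back the single-line form
printf 'mpiexec %s\n' "$(paste -s -d: job.cfg | sed 's/:/ : /g')"

# The file-based form of the same launch
echo "mpiexec -configfile job.cfg"
```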

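3. As the example above shows, mpiexec allows a lot of per-node customisation. -n, -path, -wdir, -host, -file and -configfile are all commonly used; -wdir, for instance, sets the working directory. mpiexec also supports customising environment variables, which is very practical: different processes can be given different environment variable lists. For example: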

mpiexec -n 1 -env FOO BAR a.out : -n 2 -env BAZZ FAZZ b.out

Here a.out gets an environment variable FOO (value BAR) and b.out gets an environment variable BAZZ (value FAZZ).

mpiexec -genv FOO BAR -n 2 a.out : -n 4 b.out

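With -genv (global env), the environment variable takes effect in all processes.

The -envall and -genvall options pass the full environment of the machine running mpiexec to the specified processes (or to all processes).

The -envnone and -genvnone options clear the environment of the specified processes (or of all processes). This gives a clean environment variable list to build on, for example: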

mpiexec -genvnone -env FOO BAR -n 50 a.out

Here the a.out processes have exactly one environment variable, FOO.

mpiexec -genvnone -envlist PATH,LD_SEARCH_PATH -n 50 a.out

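This example passes only PATH and LD_SEARCH_PATH from the machine running mpiexec to the a.out processes; every other environment variable is cleared.

This is very useful, especially for LD_LIBRARY_PATH, which may well need to be tailored per process.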
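
The "start from an empty environment, then add back only what is needed" idea can be tried locally without MPI. The standard env utility behaves much the same with its -i flag. This is only an analogy to make the behaviour concrete; no MPI processes are started.

```shell
#!/bin/sh
# Local analogy only: env -i runs a command with an empty environment plus
# the variables listed on its command line. No MPI processes are started.

# Comparable in spirit to: mpiexec -genvnone -env FOO BAR ...
# The inner env prints the environment it received: just FOO=BAR
/usr/bin/env -i FOO=BAR /usr/bin/env

# Comparable in spirit to: mpiexec -genvnone -envlist PATH ...
# Only PATH from the calling shell is carried over
/usr/bin/env -i PATH="$PATH" /usr/bin/env
```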

4. mpiexec has many other practical options, for example:

-l: provides rank labels for lines of stdout and stderr.

This helps a lot with debugging, because each line of stdout and stderr shows which process printed it.

-machinefile:

Compared with the machinefile in MPICH1, the machinefile in mpiexec is slightly enhanced. An example:

# comment line
hosta
hostb:2
hostc ifhn=hostc-gige
hostd:4 ifhn=hostd-gige

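Besides the machines themselves, the file can specify which network interface to use, which helps on clusters with several network segments. mpiexec assigns processes by going through this list in order.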
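
To make the line format concrete, the sketch below splits each entry of the sample file into host, process count and interface name. `parse_machinefile` and `machines.sample` are hypothetical names introduced here. Treating a bare host name as a count of 1 is an assumption of this example, and a "-" is printed where no ifhn is given.

```shell
#!/bin/sh
# Sketch: break each machinefile line into host, process count and interface.
# parse_machinefile is a hypothetical helper; a bare host is assumed to mean
# a count of 1.
parse_machinefile() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }        # skip comments and blank lines
        {
            n = split($1, hp, ":")            # "hostd:4" gives host and count
            count = (n > 1) ? hp[2] : 1
            ifhn = "-"
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ifhn=/) ifhn = substr($i, 6)
            printf "%s %s %s\n", hp[1], count, ifhn
        }
    ' "$1"
}

cat > machines.sample <<'EOF'
# comment line
hosta
hostb:2
hostc ifhn=hostc-gige
hostd:4 ifhn=hostd-gige
EOF

parse_machinefile machines.sample
```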

-s: can be used to direct the stdin of mpiexec to specific processes in a parallel job.

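This is another practical option. Input typed on the machine running mpiexec is forwarded to the specified processes.

mpiexec -s all -n 5 a.out # send stdin to all processes

mpiexec -s 4 -n 5 a.out # send stdin to the process with rank 4

mpiexec -s 1,3 -n 5 a.out # send stdin to ranks 1 and 3

mpiexec -s 0-3 -n 5 a.out # send stdin to ranks 0, 1, 2 and 3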
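
The four forms of the -s argument (all, a single rank, a comma list, a range) can be spelled out by expanding them into explicit rank numbers. The helper below is only an illustration of how the examples above read, not mpiexec's own parser. `expand_ranks` is a hypothetical name, and it relies on the seq utility being available.

```shell
#!/bin/sh
# Sketch: expand an -s style rank spec into one rank per line.
# expand_ranks is a hypothetical helper; $1 is the spec, $2 the value of -n.
expand_ranks() {
    spec=$1
    total=$2
    if [ "$spec" = "all" ]; then
        seq 0 $((total - 1))
        return
    fi
    # Split on commas, then treat each piece as either "lo" or "lo-hi"
    echo "$spec" | tr ',' '\n' | while IFS=- read -r lo hi; do
        seq "$lo" "${hi:-$lo}"
    done
}

expand_ranks all 5
expand_ranks 1,3 5
expand_ranks 0-3 5
```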

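mpd also provides other useful commands such as mpdsigjob, which sends a signal to processes. This is much better than MPICH1, where terminating a process meant hammering Ctrl+C, and it is one benefit of making the communication a separate module. mpdsigjob can deliver a given signal to one process or a batch of processes; SIGINT, for example, is equivalent to Ctrl+C.

5. MPICH2 supports debugging parallel programs with gdb, though I do not know how well. The example in the user guide shows that during a gdb session the debugger displays which processes the current line of code will execute in, and the z command switches into a single specified process. Setting breakpoints and single-stepping are also available.

6. Finally, some FAQ entries worth excerpting: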

(1) Q: What is the difference between the mpd and smpd process managers?

A: MPD is the default process manager for MPICH2 on Unix platforms. It is written in Python. SMPD is the primary process manager for MPICH2 on Windows. It is also used for running on a combination of Windows and Linux machines. It is written in C.

(2) Q: When I use the g95 Fortran compiler on a 64-bit platform, some of the tests fail

A: The g95 compiler incorrectly defines the default Fortran integer as a 64-bit integer while defining Fortran reals as 32-bit values (the Fortran standard requires that INTEGER and REAL be the same size). This was apparently done to allow a Fortran INTEGER to hold the value of a pointer, rather than requiring the programmer to select an INTEGER of a suitable KIND. To force the g95 compiler to correctly implement the Fortran standard, use the -i4 flag. For example, set the environment variable F90FLAGS before configuring MPICH2:

setenv F90FLAGS "-i4"

G95 users should note that there (at this writing) are two distributions of g95 for 64-bit Linux platforms. One uses 32-bit integers and reals (and conforms to the Fortran standard) and one uses 32-bit integers and 64-bit reals. We recommend using the one that conforms to the standard (note that the standard specifies the ratio of sizes, not the absolute sizes, so a Fortran 95 compiler that used 64 bits for both INTEGER and REAL would also conform to the Fortran standard. However, such a compiler would need to use 128 bits for DOUBLE PRECISION quantities).

(3) Q: How do I pass environment variables to the processes of my parallel program when using the mpd process manager?

A: By default, all the environment variables in the shell where mpiexec is run are passed to all processes of the application program. (The one exception is LD_LIBRARY_PATH when the mpd’s are being run as root.) This default can be overridden in many ways, and individual environment variables can be passed to specific processes using arguments to mpiexec.

Note: as described earlier, mpiexec can customise environment variables, but as this answer points out, LD_LIBRARY_PATH is an exception in an mpd ring started by root.

0 0