A problem encountered in MPI parallel computing!!! (Solved)

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 07:44
he@yuanhe:~/nfs_he$ mpirun -f nodes -n 3 ./example1
rank :0 ,source: -1 ,dest: 1
rank :2 ,source: 1 ,dest: 0
Fatal error in MPI_Send: Unknown error class, error stack:
MPI_Send(174)..............: MPI_Send(buf=0x7ffd4cc4db30, count=5, MPI_INT, dest=1, tag=5, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1832): Communication error with rank 1: Connection refused


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 18691 RUNNING AT yuanhe
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@centos] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:1@centos] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@centos] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@yuanhe] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@yuanhe] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@yuanhe] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion

[mpiexec@yuanhe] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion


example1.c


#include <stdio.h>
#include "mpi.h"
const int COUNT = 5;
int
main (int argc, char *argv[])
{
  MPI_Status status;
  int tag = 5, size, rank;
  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  MPI_Comm_size (MPI_COMM_WORLD, &size);
  int A[5], B[5], C[5];
  int i;
  /* A is the payload, B the receive buffer, C the element-wise sum. */
  for (i = 0; i < COUNT; i++)
    {
      A[i] = 2;
      B[i] = 0;
      C[i] = 0;
    }
  /* In C99 and later, (0 - 1) % size is -1, so rank 0 has no valid
     source. That is fine here because rank 0 never receives. */
  int source = (rank - 1) % size;
  int dest = (rank + 1) % size;
  printf ("rank :%d ,source: %d ,dest: %d\n", rank, source, dest);
  /* The last rank's dest wraps around to rank 0, which posts no
     receive, so the last rank must not send. */
  if (rank != size - 1)
    MPI_Send (A, COUNT, MPI_INT, dest, tag, MPI_COMM_WORLD);
  if (rank != 0)
    MPI_Recv (B, COUNT, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
  int sum = 0;
  for (i = 0; i < COUNT; i++)
    {
      C[i] = A[i] + B[i];
      sum += C[i];
    }
  /* Add up the per-rank sums on rank 0. */
  int ans = 0;
  MPI_Reduce (&sum, &ans, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0)
    printf ("the sum of numbers is %d\n", ans);
  MPI_Finalize ();
  return 0;
}

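Cause: the host names were not configured correctly. Because of this, the cluster did not meet the requirement quoted in this article: "every machine in the list must be able to SSH to every machine in the list, including itself (localhost), without entering a password." When a machine's own host name maps to 127.0.0.1, its MPI process most likely advertises a loopback address that the other machine cannot reach, which matches the "Connection refused" in the error above.

On machine yuanhe, both ssh yuanhe and ssh centos must log in without a password.

On machine centos, both ssh yuanhe and ssh centos must log in without a password as well.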
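The requirement above can be checked mechanically on each machine. The following is only a minimal sketch: check_passwordless_ssh is a hypothetical helper name, and the 5-second timeout is an arbitrary choice. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```shell
# Sketch: report which hosts accept a non-interactive (passwordless) SSH login.
# check_passwordless_ssh is a hypothetical helper name, not part of MPICH or OpenSSH.
check_passwordless_ssh() {
    local host failed=0
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a host that would
        # ask for a password shows up as a failure instead of hanging.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    # Exit status 0 only if every host accepted the login.
    return "$failed"
}
```

Running check_passwordless_ssh yuanhe centos once on yuanhe and once on centos covers all four combinations listed above.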

On host yuanhe:

sudo vim /etc/hosts

Comment out the line 127.0.0.1 yuanhe and keep the line 127.0.0.1 localhost.
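The same edit can be sketched without an editor. The helper below is hypothetical (comment_out_loopback_alias is not a standard tool) and assumes, as in this post, that the host name sits on its own 127.x.x.x line, separate from the localhost line; it comments out the whole matching line and prints the result to standard output instead of modifying the file.

```shell
# Sketch: print a copy of a hosts file in which every loopback (127.x.x.x)
# line that names the given host is commented out.
# comment_out_loopback_alias is a hypothetical helper name.
comment_out_loopback_alias() {
    # $1 = hosts file to read, $2 = host name that must not map to loopback
    awk -v host="$2" '
        /^[ \t]*127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break                    # rest of line is a comment
                if ($i == host) { print "#" $0; next }  # comment out the whole line
            }
        }
        { print }                                       # every other line unchanged
    ' "$1"
}
```

For example, comment_out_loopback_alias /etc/hosts yuanhe > hosts.new writes the edited copy to hosts.new, which can be reviewed before it replaces /etc/hosts.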

On host centos:

sudo vim /etc/hosts

Comment out the line 127.0.0.1 centos and keep the line 127.0.0.1 localhost.
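After editing both files, it may help to confirm that each machine's own name no longer resolves to a loopback address. This is a rough sketch that assumes a glibc-based Linux system where the getent command is available; resolves_to_loopback is a hypothetical helper name.

```shell
# Sketch: succeed (exit 0) if the given host name still resolves to loopback.
# resolves_to_loopback is a hypothetical helper; getent is the glibc lookup tool.
resolves_to_loopback() {
    local addr
    # getent hosts prints "ADDRESS  NAME [ALIASES...]"; keep the first address.
    addr=$(getent hosts "$1" | awk 'NR == 1 { print $1 }')
    case "$addr" in
        127.*|::1) return 0 ;;   # loopback address
        *)         return 1 ;;   # non-loopback address, or no answer at all
    esac
}
```

On each machine, resolves_to_loopback "$(hostname)" && echo "still loopback" prints a warning while the old mapping is still in effect and prints nothing once the name resolves to a non-loopback address.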


