NWChem 6.1.1 CCSD(T) parallel running
http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id887/NWChem_6.1.1_CCSD%28T%29_parallel_ru....html
Hi, I am trying to run NWChem 6.1.1 on a cluster. I compiled NWChem in my local user directory. Here are the environment variables I used to compile:

export NWCHEM_TOP="/home/diego/Software/NWchem/nwchem-6.1.1"
export TARGET=LINUX64
export LARGE_FILES=TRUE
export ENABLE_COMPONENT=yes
export TCGRSH=/usr/bin/ssh
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export LIB_DEFINES="-DDFLT_TOT_MEM=16777216"
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export IB_HOME=/usr
export IB_INCLUDE=$IB_HOME/include/infiniband
export IB_LIB=$IB_HOME/lib64
export IB_LIB_NAME="-libumad -libverbs -lpthread -lrt"
export ARMCI_NETWORK=OPENIB
export MKLROOT="/opt/intel/mkl"
export MKL_INCLUDE=$MKLROOT/include/intel64/ilp64
export BLAS_LIB="-L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm"
export BLASOPT="$BLAS_LIB"
export BLAS_SIZE=8
export SCALAPACK_SIZE=8
export SCALAPACK="-L$MKLROOT/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm"
export SCALAPACK_LIB="$SCALAPACK"
export USE_SCALAPACK=y
export MPI_HOME=/opt/intel/impi/4.0.3.008
export MPI_LOC=$MPI_HOME
export MPI_LIB=$MPI_LOC/lib64
export MPI_INCLUDE=$MPI_LOC/include64
export LIBMPI="-lmpigf -lmpigi -lmpi_ilp64 -lmpi"
export CXX=/opt/intel/bin/icpc
export CC=/opt/intel/bin/icc
export FC=/opt/intel/bin/ifort
export PYTHONPATH="/usr"
export PYTHONHOME="/usr"
export PYTHONVERSION="2.6"
export USE_PYTHON64=y
export PYTHONLIBTYPE=so
export MPICXX=$MPI_LOC/bin/mpiicpc
export MPICC=$MPI_LOC/bin/mpiicc
export MPIF77=$MPI_LOC/bin/mpiifort
Input file:
start
memory global 1000 mb heap 100 mb stack 600 mb
title "ZrB10 CCSD(T) single point"
echo
scratch_dir /scratch/users
charge -1
geometry units angstrom
Zr  0.00001 -0.00002  0.12043
B   2.46109  0.44546 -0.10200
B   2.25583 -1.07189 -0.09994
B   1.19305 -2.20969 -0.10354
B  -0.32926 -2.46629 -0.09796
B  -1.72755 -1.82109 -0.10493
B  -2.46111 -0.44543 -0.10198
B  -2.25583  1.07193 -0.09983
B  -1.19306  2.20972 -0.10337
B   0.32924  2.46632 -0.09779
B   1.72753  1.82112 -0.10485
end
scf
 DOUBLET; UHF
 THRESH 1.0e-10
 TOL2E 1.0e-8
 maxiter 200
end
tce
 ccsd(t)
 maxiter 200
 freeze atomic
end
basis
 Zr library def2-tzvp
 B library def2-tzvp
end
ecp
 Zr library def2-ecp
end
task tce energy
PBS submit file:
#!/bin/bash
#PBS -N ZrB10_UHF
#PBS -l nodes=10:ppn=16
#PBS -q C
BIN=/home/diego/Software/NWchem/nwchem-6.1.1/bin/LINUX64
source /opt/intel/impi/4.0.3.008/bin/mpivars.sh
source /home/diego/Software/NWchem/vars
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64:/opt/intel/impi/4.0.3/intel64/lib
#ulimit -s unlimited
#ulimit -d unlimited
#ulimit -l unlimited
#ulimit -n 32767
export ARMCI_DEFAULT_SHMMAX=8000
#export MA_USE_ARMCI_MEM=TRUE
cd $PBS_O_WORKDIR
NP=`(wc -l < $PBS_NODEFILE) | awk '{print $1}'`
cat $PBS_NODEFILE | sort | uniq > mpd.hosts
time mpirun -f mpd.hosts -np $NP $BIN/nwchem ZrB10.nw > ZrB10.log
exit 0
Memory per processor is 2 GB of RAM: 16 processors sharing 32 GB of RAM on each of the 10 nodes.
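As a sanity check on these numbers (my own arithmetic, not from the original post), the memory line of the input can be compared against the RAM available per core:

```shell
# Compare the NWChem per-process memory request against the RAM per core.
# heap/stack/global come from "memory global 1000 mb heap 100 mb stack 600 mb";
# 32 GB per node / 16 cores per node are the cluster figures quoted above.
heap=100; stack=600; global=1000           # MB, from the input's memory line
per_proc=$(( heap + stack + global ))      # total MB requested per process
per_core=$(( 32 * 1024 / 16 ))             # MB of RAM per core
echo "requested ${per_proc} MB per process, ${per_core} MB RAM per core"
# → requested 1700 MB per process, 2048 MB RAM per core
```

So the request fits in per-core RAM on paper, which points at the InfiniBand memory-registration layer rather than the total allocation.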
Other kernel settings:
kernel.shmmax = 68719476736
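That value can be checked with a standard sysctl query (a generic check, not part of the original post):

```shell
# kernel.shmmax caps the size of a single System V shared-memory segment.
# 68719476736 bytes is 64 GB, well above the ARMCI_DEFAULT_SHMMAX of 8000 MB
# set in the job script, so SysV shared memory is not the limit here.
sysctl -n kernel.shmmax
echo "$(( 68719476736 / 1024 / 1024 / 1024 )) GB"
```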
The error file shows:
Last System Error Message from Task 32:: Cannot allocate memory
(rank:32 hostname:node32 pid:27391):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_pin_contig_hndl():1142 cond:(memhdl->memhndl!=((void *)0))
Varying stack, heap, or global memory and ARMCI_DEFAULT_SHMMAX does not really change anything (if I set them too low, a different error occurs). Setting MA_USE_ARMCI_MEM=y/n has no effect.
ldd /home/diego/Software/NWchem/nwchem-6.1.1/bin/LINUX64/nwchem :
linux-vdso.so.1 => (0x00007ffff7ffe000)
libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x0000003f3aa00000)
libmkl_scalapack_ilp64.so => not found
libmkl_intel_ilp64.so => not found
libmkl_sequential.so => not found
libmkl_core.so => not found
libmkl_blacs_intelmpi_ilp64.so => not found
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003f39200000)
libm.so.6 => /lib64/libm.so.6 (0x0000003f38600000)
libmpigf.so.4 => not found
libmpi_ilp64.so.4 => not found
libmpi.so.4 => not found
libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x000000308aa00000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x000000308a600000)
librt.so.1 => /lib64/librt.so.1 (0x0000003f39a00000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000003f3c200000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003f38e00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003f38a00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ffff7dce000)
/lib64/ld-linux-x86-64.so.2 (0x0000003f38200000)
So what could be the reason for the failure? Any help would be appreciated.
Diego
Edoapra (Forum Admin)
12:34:55 PM PDT - Fri, Jul 19th 2013

Diego,
I have managed to get this input working on an Infiniband cluster using NWChem 6.3.
Here are some details of what I did on a run using 224 processors (16 processors on each of the 14 nodes):
1) Increased the global memory in the input line to 1.6 GB:
memory global 1600 mb heap 100 mb stack 600 mb
2) Set ARMCI_DEFAULT_SHMMAX=8192
3) You need to have the system administrators modify some of the kernel driver options for your Infiniband hardware.
Here are some webpages related to this very topic:
http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
http://community.mellanox.com/docs/DOC-1120
In my case, the cluster I am using has the following parameters for the mlx4_core driver (older hardware might require different settings, as mentioned in the two webpages above):
log_num_mtt=20
log_mtts_per_seg=4
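These two parameters set how much memory the HCA can register. A rough back-of-the-envelope check (my own arithmetic, following the formula in the Open MPI FAQ linked above; the 4 kB page size is an assumption):

```shell
# Registerable memory ≈ (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE.
# With log_num_mtt=20, log_mtts_per_seg=4 and 4 kB pages this gives 64 GB,
# comfortably covering the 32 GB of RAM in each node.
log_num_mtt=20
log_mtts_per_seg=4
page_size=4096   # assumed; confirm with `getconf PAGE_SIZE`
max_reg=$(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size ))
echo "max registerable memory: $(( max_reg / 1024 / 1024 / 1024 )) GB"
# → max registerable memory: 64 GB
```

With the mlx4_core defaults the registerable memory can be much smaller than node RAM, which is consistent with the `armci_pin_contig_hndl` pinning failure in the original error.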