memory problem in parallel running "ARMCI DASSERT fail"
Viewed 1309 times, with a total of 9 posts
- Psd (Member)
Clicked A Few Times
11:48:27 PM PDT - Thu, Nov 1st 2012
I did a series of coupled-cluster test calculations on FH2 using two nodes connected with InfiniBand, with 12 cores and 48 GB of memory per node.
For rather small basis sets and tasks, such as UCCSDT/avdz and UCCSD/vtz, everything is OK, but the problem occurs for tasks requiring more than 500 MB of memory.
The input file is:

start fh2
scratch_dir ./tmp
memory heap 300 mb stack 300 mb global 3000 mb
geometry units au
 H -0.466571969 0.000000000 -3.498280516
 H  0.624505061 0.000000000 -2.532671944
 F -0.008378972 0.000000000  0.319965748
end
basis noprint
 * library cc-pvdz # or aug-cc-pvdz or others
end
SCF
 semidirect
 DOUBLET
 UHF
 THRESH 1.0e-10
 TOL2E 1.0e-10
END
TCE
 SCF
 CCSD # or CCSDT or CCSDTQ
END
TASK TCE ENERGY
When I do UCCSD/avtz calculations, the Hartree-Fock part is OK, but the run terminates in the CC step as below:
 Memory Information
 ------------------
 Available GA space size is    9437161950 doubles
 Available MA space size is      78639421 doubles
 Maximum block size            36 doubles
 tile_dim =    35

 Block   Spin    Irrep     Size       Offset   Alpha
 ---------------------------------------------------
   1    alpha     a'      5 doubles       0       1
   2    alpha     a"      1 doubles       5       2
   3    beta      a'      4 doubles       6       3
   4    beta      a"      1 doubles      10       4
   5    alpha     a'     34 doubles      11       5
   6    alpha     a'     34 doubles      45       6
   7    alpha     a"     31 doubles      79       7
   8    beta      a'     34 doubles     110       8
   9    beta      a'     35 doubles     144       9
  10    beta      a"     31 doubles     179      10

 Global array virtual files algorithm will be used
 Parallel file system coherency ......... OK
 Integral file          = ./tmp/fh2.aoints.00
 Record size in doubles =  65536   No. of integs per rec  =  43688
 Max. records in memory =     15   Max. records in file   =   2287
 No. of bits per label  =      8   No. of bits per value  =     64
 #quartets = 1.396D+05  #integrals = 8.008D+06  #direct =  0.0%  #cached =100.0%
 File balance: exchanges=    12  moved=    15  time=   0.0
 Fock matrix recomputed
 1-e file size   =    12706
 1-e file name   = ./tmp/fh2.f1
 Cpu & wall time / sec     0.2     1.1
 tce_ao2e: fast2e=1
 half-transformed integrals in memory
 2-e (intermediate) file size =   279803475
 2-e (intermediate) file name = ./tmp/fh2.v2i
 Cpu & wall time / sec     1.8     2.3
 tce_mo2e: fast2e=1
 2-e integrals stored in memory
 2-e file size   =   119972997
 2-e file name   = ./tmp/fh2.v2
 Cpu & wall time / sec    10.0    10.5
 do_pt =  F
 do_lam_pt =  F
 do_cr_pt =  F
 do_lcr_pt =  F
 do_2t_pt =  F
 T1-number-of-tasks         6
 t1 file size   =      678
 t1 file name   = ./tmp/fh2.t1
 t1 file handle =     -998
 T2-number-of-boxes        38
 t2 file size   =   368230
 t2 file name   = ./tmp/fh2.t2
 t2 file handle =     -995
 CCSD iterations
 -----------------------------------------------------------------
 Iter    Residuum       Correlation     Cpu    Wall    V2*C2
 -----------------------------------------------------------------
0: error ival=4
(rank:0 hostname:compute-10-15.local pid:19142):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193 cond:(pdscr->status==IBV_WC_SUCCESS)
12: error ival=4
(rank:12 hostname:compute-10-1.local pid:9867):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193 cond:(pdscr->status==IBV_WC_SUCCESS)
rank 0 in job 8  i10-15_52820 caused collective abort of all ranks
 exit status of rank 0: killed by signal 9
For UCCSDTQ/vdz calculations, it terminated at the third iteration of CC:
 2-e file size   =   386290
 2-e file name   = ./tmp/fh2.v2
 Cpu & wall time / sec     0.4     0.4
 do_pt =  F
 do_lam_pt =  F
 do_cr_pt =  F
 do_lcr_pt =  F
 do_2t_pt =  F
 T1-number-of-tasks         6
 t1 file size   =      140
 t1 file name   = ./tmp/fh2.t1
 t1 file handle =     -998
 T2-number-of-boxes        38
 t2 file size   =    14660
 t2 file name   = ./tmp/fh2.t2
 t2 file handle =     -995
 t3 file size   =  1160539
 t3 file name   = ./tmp/fh2.t3
2: WARNING:armci_set_mem_offset: offset changed 794624 to 9244672
3: WARNING:armci_set_mem_offset: offset changed 0 to 8450048
6: WARNING:armci_set_mem_offset: offset changed 794624 to 8450048
8: WARNING:armci_set_mem_offset: offset changed 794624 to 8450048
13: WARNING:armci_set_mem_offset: offset changed 0 to -620834816
 t4 file size   = 78188214
 t4 file name   = ./tmp/fh2.t4
 CCSDTQ iterations
 --------------------------------------------------------
 Iter    Residuum       Correlation     Cpu    Wall
 --------------------------------------------------------
   1   0.2682660632262  -0.1813353786615    86.8    89.1
   2   0.0920127385001  -0.1943555090903    87.5    89.9
0: error ival=4
(rank:0 hostname:compute-10-15.local pid:19656):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193 cond:(pdscr->status==IBV_WC_SUCCESS)
12: error ival=4
(rank:12 hostname:compute-10-1.local pid:10202):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_call_data_server():2193 cond:(pdscr->status==IBV_WC_SUCCESS)
application called MPI_Abort(comm=0x84000003, 1) - process 0
rank 0 in job 13  i10-15_52820 caused collective abort of all ranks
 exit status of rank 0: killed by signal 9
If only one node is used, everything is also OK. It seems that the actually available memory becomes limited when running in parallel across multiple nodes.
I also checked the maximum shared memory, which is nearly 36GB:
$ cat /proc/sys/kernel/shmmax
37976435712
The compilation environment settings are:
setenv LARGE_FILES TRUE
setenv LIB_DEFINES "-DDFLT_TOT_MEM=16777216"
setenv NWCHEM_TOP /work2/nwchem-6.1.1
setenv NWCHEM_TARGET LINUX64
setenv ENABLE_COMPONENT yes
setenv TCGRSH /usr/bin/ssh
setenv USE_MPI "y"
setenv USE_MPIF "y"
setenv USE_MPIF4 "y"
setenv MPI_LOC /work2/intel/impi/4.1.0.024/intel64
setenv MPI_LIB ${MPI_LOC}/lib
setenv MPI_INCLUDE ${MPI_LOC}/include
setenv LIBMPI "-lmpigf -lmpigi -lmpi_ilp64 -lmpi"
setenv IB_HOME /usr
setenv IB_INCLUDE $IB_HOME/include
setenv IB_LIB $IB_HOME/lib64
setenv IB_LIB_NAME "-libverbs -libumad -lpthread -lrt"
setenv ARMCI_NETWORK OPENIB
setenv PYTHONHOME /usr
setenv PYTHONVERSION 2.4
setenv USE_PYTHON64 "y"
setenv CCSDTQ yes
setenv CCSDTLR yes
setenv NWCHEM_MODULES "all python"
setenv MKLROOT /work1/soft/intel/mkl/10.1.2.024
setenv BLASOPT "-L${MKLROOT}/lib/em64t -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm"
setenv FC "ifort -i8 -I${MKLROOT}/include"
setenv CC "icc -DMKL_ILP64 -I${MKLROOT}/include"
setenv MSG_COMMS MPI
How can I deal with this error? Has anyone had a similar problem?
Any suggestions are welcome.
- Edoapra (Forum Admin, Forum Mod, bureaucrat, sysop)
Forum Vet
11:17:32 AM PDT - Fri, Nov 2nd 2012
Psd,
Your calculations are likely to be crashing while creating shared memory segments.
If you set the environment variable ARMCI_DEFAULT_SHMMAX to a value of 2048 (or larger),
you should be able to overcome this problem.
Please keep in mind that ARMCI_DEFAULT_SHMMAX (given in MB, converted to bytes) must be
less than or equal to the kernel parameter kernel.shmmax.
(Only root can change kernel.shmmax, therefore you might have to ask the system
administrator to do it.)
For example, if the value of kernel.shmmax is 4294967296 as in the example below,
ARMCI_DEFAULT_SHMMAX can be at most 4096 (4294967296=4096*1024*1024)
$ sysctl kernel.shmmax
kernel.shmmax = 4294967296
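The conversion above can be sketched in a few lines of shell (a minimal example; the 4294967296 value is taken from the sysctl output above, and the power-of-two rounding reflects the 2^N requirement ARMCI reports for this variable):

```shell
# Compute the largest power-of-two ARMCI_DEFAULT_SHMMAX (in MB) that
# still fits inside kernel.shmmax (which sysctl reports in bytes).
shmmax_bytes=4294967296                      # e.g. from: sysctl -n kernel.shmmax
limit_mb=$(( shmmax_bytes / 1024 / 1024 ))   # bytes -> MB
# Round down to a power of two, since ARMCI expects a 2^N value.
pow2=1
while [ $(( pow2 * 2 )) -le "$limit_mb" ]; do pow2=$(( pow2 * 2 )); done
echo "$pow2"                                 # prints 4096 for the value above
# Then, in csh/tcsh as used in this thread:
#   setenv ARMCI_DEFAULT_SHMMAX 4096
```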
Cheers, Edo
- Psd (Member)
Clicked A Few Times
7:20:33 AM PDT - Sat, Nov 3rd 2012
Edo,
Thanks for your advice, but this still does not work.
When I setenv ARMCI_DEFAULT_SHMMAX 36000, this warning appears:
incorrect ARMCI_DEFAULT_SHMMAX should be <1,8192>mb and 2^N Found=36000
and the output error is:
 2-e (intermediate) file size =   279803475
 2-e (intermediate) file name = ./tmp/fh2.v2i
14: WARNING:armci_set_mem_offset: offset changed 0 to 12124160
13: WARNING:armci_set_mem_offset: offset changed 0 to 12124160
18: WARNING:armci_set_mem_offset: offset changed 0 to 12124160
1: WARNING:armci_set_mem_offset: offset changed 0 to 12128256
2: WARNING:armci_set_mem_offset: offset changed 0 to 12124160
6: WARNING:armci_set_mem_offset: offset changed 0 to 12124160
25: WARNING:armci_set_mem_offset: offset changed 0 to 11993088
31: WARNING:armci_set_mem_offset: offset changed 67596288 to 79589376
(rank:24 hostname:compute-11-3.local pid:8753):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
Last System Error Message from Task 24:: Cannot allocate memory
(rank:12 hostname:compute-11-4.local pid:4225):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
Last System Error Message from Task 12:: Cannot allocate memory
(rank:0 hostname:compute-11-32.local pid:22892):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
Last System Error Message from Task 0:: Cannot allocate memory
application called MPI_Abort(comm=0x84000003, 1) - process 24
application called MPI_Abort(comm=0x84000003, 1) - process 12
application called MPI_Abort(comm=0x84000003, 1) - process 0
rank 24 in job 11  i11-32_41520 caused collective abort of all ranks
 exit status of rank 24: killed by signal 9
But even when I setenv ARMCI_DEFAULT_SHMMAX to a value <= 8192, it does not run!
As a result I still haven't found the bottleneck.
Quote:Edoapra Nov 2nd 11:17 am
Psd,
Your calculations are likely to be crashing while creating shared memory segments.
If you set the environment variable ARMCI_DEFAULT_SHMMAX to a value of 2048 (or larger),
you should be able to overcome this problem.
Please keep in mind that ARMCI_DEFAULT_SHMMAX (given in MB, converted to bytes) must be
less than or equal to the kernel parameter kernel.shmmax.
(Only root can change kernel.shmmax, therefore you might have to ask the system
administrator to do it.)
For example, if the value of kernel.shmmax is 4294967296 as in the example below,
ARMCI_DEFAULT_SHMMAX can be at most 4096 (4294967296=4096*1024*1024)
$ sysctl kernel.shmmax
kernel.shmmax = 4294967296
Cheers, Edo
- Edoapra (Forum Admin, Forum Mod, bureaucrat, sysop)
Forum Vet
10:54:54 AM PST - Mon, Nov 5th 2012
What is the error when you set ARMCI_DEFAULT_SHMMAX=8192?
Thanks, Edo
- Psd (Member)
Clicked A Few Times
5:48:45 PM PST - Mon, Nov 5th 2012
argument  1 = fh2.nw
(rank:12 hostname:compute-11-3.local pid:1523):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
Last System Error Message from Task 12:: Cannot allocate memory
application called MPI_Abort(comm=0x84000003, 1) - process 12
(rank:0 hostname:compute-11-32.local pid:4764):ARMCI DASSERT fail. ../../ga-5-1/armci/src/devices/openib/openib.c:armci_server_register_region():1124 cond:(memhdl->memhndl!=((void *)0))
Last System Error Message from Task 0:: Cannot allocate memory
rank 12 in job 2  i11-32_41208 caused collective abort of all ranks
 exit status of rank 12: killed by signal 9
[5:i11-32] unexpected disconnect completion event from [12:i11-3]
Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
internal ABORT - process 5
Thanks!
Quote:Edoapra Nov 5th 9:54 am
What is the error when you set ARMCI_DEFAULT_SHMMAX=8192 ?
Thanks, Edo
- Edoapra (Forum Admin, Forum Mod, bureaucrat, sysop)
Forum Vet
10:47:17 AM PST - Tue, Nov 6th 2012
Physical Memory
Psd,
How much physical memory is available on each node and how many processors are you using on each node?
Thanks, Edo
- Psd (Member)
Clicked A Few Times
7:18:13 PM PST - Tue, Nov 6th 2012
Hi Edo,
There is 48 GB of physical memory available on each node, and 12 processors are used on each node.
It is really strange that a medium-sized calculation that runs fine on a single node fails when using two or more nodes. I don't think this breakdown is caused by a lack of memory; maybe some tools such as GA are not well installed, or maybe some system services are not available. It seems that the host machine can only use up to 1 GB of remote memory, and GA does not sum the memory across nodes.
Quote:Edoapra Nov 6th 9:47 am
Psd,
How much physical memory is available on each node and how many processors are you using on each node?
Thanks, Edo
- Edoapra (Forum Admin, Forum Mod, bureaucrat, sysop)
Forum Vet
3:40:19 PM PST - Wed, Nov 7th 2012
Shared memory segments
Psd,
Did you check if there are shared memory segments still allocated on the nodes of your cluster?
You can do it by running the command
ipcs -a
The script ipcreset can be used both to display and to clean up existing shared memory segments.
You can find it in
$NWCHEM_TOP/src/tools/ga-5-1/global/testing/ipcreset
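For reference, a minimal dry-run sketch of what such a cleanup does, i.e. turning `ipcs -m` output into `ipcrm` commands (the sample output below is made up for illustration; `ipcreset` itself is the supported tool):

```shell
# Parse `ipcs -m`-style output and print the ipcrm commands that would
# remove each shared memory segment (dry run; sample output hard-coded
# so the sketch is self-contained).
sample_ipcs_output='key        shmid      owner      perms      bytes      nattch
0x00000000 98304      psd        600        8388608    0
0x00000000 131073     psd        600        4194304    0'
cmds=$(echo "$sample_ipcs_output" | awk 'NR>1 {print "ipcrm -m " $2}')
echo "$cmds"
# Real usage: pipe `ipcs -m` through the same awk filter and run the
# printed commands on each node, or simply use the ipcreset script above.
```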
Cheers, Edo
- Psd (Member)
Clicked A Few Times
12:21:14 AM PST - Thu, Nov 15th 2012
Parallel efficiency of CC tasks
Hi Edoapra!
Thanks for your help. We have found the bottleneck and fixed it, and it works well now. The problem was caused by the default amount of memory InfiniBand can register, which by default was limited to 4 GB.
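On Mellanox mlx4 hardware this registered-memory ceiling is commonly controlled by the mlx4_core module parameters log_num_mtt and log_mtts_per_seg, with the limit roughly PAGE_SIZE * 2^log_num_mtt * 2^log_mtts_per_seg. A sketch of the arithmetic (the parameter values here are illustrative assumptions, not read from the thread's cluster; the real values live under /sys/module/mlx4_core/parameters/):

```shell
# Estimate the maximum IB-registerable memory from mlx4 module parameters
# (illustrative values; read the real ones from the compute nodes).
page_size=4096        # bytes, typical x86_64 page size
log_num_mtt=20        # assumed value: 2^20 MTT entries
log_mtts_per_seg=0    # assumed value: 2^0 entries per segment
max_reg_bytes=$(( page_size * (1 << log_num_mtt) * (1 << log_mtts_per_seg) ))
echo $(( max_reg_bytes / 1024 / 1024 / 1024 ))   # prints 4 (GB), matching the limit above
# Raising the limit requires root, e.g. in /etc/modprobe.d/mlx4_core.conf:
#   options mlx4_core log_num_mtt=24 log_mtts_per_seg=3
```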
Besides, I performed some CC calculations and found the parallel efficiency unsatisfactory. For the task below, if I use 16 cores on 1 node, each iteration takes 759.6 s of CPU time, but when I use 128 cores on 8 nodes, it takes 1326 s to complete one iteration. Is this normal?
Thanks!
Jun Chen
2012/11/15
The input file is:

start fh2
permanent_dir .
scratch_dir ./tmp
memory heap 500 mb stack 500 mb global 9000 mb
geometry units au
 H -0.466571969 0.000000000 -3.498280516
 H  0.624505061 0.000000000 -2.532671944
 F -0.008378972 0.000000000  0.319965748
# symmetry c1
end
basis noprint
 * library aug-cc-pvqz
end
SCF
 semidirect
 DOUBLET
 RHF
 THRESH 1.0d-8
 TOL2E 1.0d-8
END
TCE
 SCF
 CCSDT
 THRESH 1.0d-5
 FREEZE atomic
 DIIS 5
END
TASK TCE ENERGY
Quote:Edoapra Nov 7th 2:40 pm
Psd,
Did you check if there are shared memory segments still allocated on the nodes of your cluster?
You can do it by running the command
ipcs -a
The scripts ipcreset can be used both to display and cleanup existing shared memory segments.
You can find it in
$NWCHEM_TOP/src/tools/ga-5-1/global/testing/ipcreset
Cheers, Edo
- Karol (Forum Admin, Forum Mod, NWChem Developer, bureaucrat, sysop)
Clicked A Few Times
12:52:53 PM PST - Fri, Nov 16th 2012
Hi,
I am not surprised that your CCSDT/CCSDTQ jobs are not running (or perhaps not scaling properly).
Please look at the tilesizes you are using. For unoccupied orbitals the max. tilesize is 35, which poses a huge demand on local memory and additionally provides really poor granularity.
For the CCSDT part the local memory demand is proportional to tilesize^6, so please set the tilesize parameter to 15 for CCSDT.
For the CCSDTQ part the local memory demand is proportional to tilesize^8, so be even more conservative with the tilesize in these runs. I guess tilesize 8 should be fine.
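The scaling described above can be put into rough numbers (a quick sketch counting a single tilesize^6 or tilesize^8 array of 8-byte doubles, ignoring constant factors and the number of such arrays held at once):

```shell
# Rough per-block local memory: tilesize^6 (CCSDT-like) or tilesize^8
# (CCSDTQ-like) doubles, 8 bytes each, converted to MB.
t6_mb() { echo $(( $1 ** 6 * 8 / 1024 / 1024 )); }
t8_mb() { echo $(( $1 ** 8 * 8 / 1024 / 1024 )); }
echo "tilesize 35, ^6: $(t6_mb 35) MB"   # 14024 MB per block
echo "tilesize 15, ^6: $(t6_mb 15) MB"   # 86 MB per block
echo "tilesize  8, ^8: $(t8_mb 8) MB"    # 128 MB per block
```

This makes clear why tilesize 35 cannot fit in local memory at the T3 level while 15 easily can.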
Please also modify the memory settings. Something like this should work:
memory heap 100 mb stack 1200 mb global 2500 mb
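Putting the two suggestions together, the relevant input lines might look like this (a sketch only; tilesize is the TCE directive controlling block size, set to 15 per the CCSDT advice above, and the rest of the input is unchanged from the poster's file):

```
memory heap 100 mb stack 1200 mb global 2500 mb
TCE
 SCF
 CCSDT
 tilesize 15
 THRESH 1.0d-5
 FREEZE atomic
 DIIS 5
END
```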
Best,
Karol