HPC GPU Node:


https://hpc.oit.uci.edu/gpu





NVIDIA Corporation has graciously donated four (4) of their high-end Tesla M2090 GPU cards to the HPC Cluster at UCI for your research needs.

Each NVIDIA Tesla M2090 card has the following attributes:
Peak double precision floating point performance    665 Gigaflops
Peak single precision floating point performance    1331 Gigaflops
Memory bandwidth (ECC off)                          177 GBytes/sec
Memory size (GDDR5)                                 6 GigaBytes
CUDA cores                                          512


The GPU node (compute-1-14) has dual Intel Xeon DP E5645 2.4GHz CPUs with 12MB cache (24 cores) and 96GB of DDR3 1333MHz main memory.

There are a total of 2,048 CUDA cores with the 4 Tesla M2090 NVIDIA cards.

When requesting GPU resources, please try to request 6 Intel cores for each GPU card you request.  Since the node has 24 Intel cores and 4 GPU cards, the division comes out to 6 Intel cores per GPU card.

There is no fixed ratio of cores to GPU cards; it all depends on the program you are running.  If you can run with 2 Intel cores and 2 GPU cards, then use those numbers.
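The suggested default can be worked out with a little shell arithmetic. This is only a sketch: the function name cores_for_gpus is made up for illustration, and the 24-core, 4-card figures are the ones quoted above.

```shell
# Node layout as described in the text: 24 Intel cores, 4 GPU cards.
NODE_CORES=24
NODE_GPUS=4

# Print the suggested number of cores for a given number of GPU cards.
cores_for_gpus() {
    gpus=$1
    echo $(( gpus * NODE_CORES / NODE_GPUS ))
}

cores_for_gpus 2
```

Treat the result as a starting point only; request fewer cores if your program does not need them.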


A sample CUDA job script is available at:
~demo/hello-cuda.sh

    $ cat ~demo/hello-cuda.sh

    #$ -q gpu
Requesting the GPU queue.

    #$ -l gpu=1
Requesting 1 GPU card out of the 4 available GPU cards.

    #$ -pe gpu-node-cores 6
Run with the Parallel Environment "gpu-node-cores", requesting 6 node cores.
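Put together, a complete job script could look roughly like the sketch below. The rest of ~demo/hello-cuda.sh is not reproduced here, so everything beyond the three directives is an assumption: the job name hello-gpu, the -cwd option, and the body of the script are illustrative only. NSLOTS is the variable Grid Engine sets to the number of slots granted to the job.

```shell
#!/bin/bash
# Directives taken from the text above.
#$ -q gpu
#$ -l gpu=1
#$ -pe gpu-node-cores 6
# Assumed extras: a job name, and running from the submit directory.
#$ -N hello-gpu
#$ -cwd

# NSLOTS is set by Grid Engine; fall back to 1 when the script
# is run outside the scheduler.
slots="${NSLOTS:-1}"
echo "Job running on $(uname -n) with ${slots} slot(s)"

# Load the CUDA environment if the module command is available.
if command -v module >/dev/null 2>&1; then
    module load nvidia-cuda/5.0 || echo "could not load nvidia-cuda/5.0" >&2
fi

# Show the GPUs on the node, if nvidia-smi is on the PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || echo "nvidia-smi failed" >&2
fi
```

The guards around module and nvidia-smi are there only so the sketch also runs harmlessly on a machine without them.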

Let's run a CUDA hello world example:

$ mkdir cuda
$ cd cuda
$ cp ~demo/hello-cuda.sh  .
$ qsub hello-cuda.sh
$ qstat


Check the directory for the output "out" file and other files the script created.



How many GPUs are available now?

As mentioned above, the GPU node compute-1-14 has 4 GPU cards.  To see how many GPUs are currently available, use:

$ qhost -F gpu -h compute-1-14
HOSTNAME           NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
--------------------------------------------------------------------------------
compute-1-14        24    2   12   24  0.69   94.6G    1.8G   94.4G     0.0
    Host Resource(s):      hc:gpu=4.000000

In this example, GPU compute node compute-1-14 has 4 GPUs available.
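If you want the number in a script, the hc:gpu value can be pulled out of the qhost output. A minimal sketch, assuming the output format shown above; the function name gpus_available is made up for illustration.

```shell
# Read `qhost -F gpu` output on stdin and print the hc:gpu value
# as a whole number (4.000000 becomes 4).
gpus_available() {
    awk -F'hc:gpu=' '/hc:gpu=/ { printf "%d\n", $2 }'
}

# Example with the resource line from the output shown in the text.
printf '%s\n' '    Host Resource(s):      hc:gpu=4.000000' | gpus_available
```

On the cluster this would be used as: qhost -F gpu -h compute-1-14 | gpus_available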



CUDA-Compilers

The CUDA compiler, debugger, and libraries are available with:

    module load  nvidia-cuda/5.0
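Once the module is loaded, a small test program can be compiled with nvcc. The sketch below is not the demo from ~demo; the file name hello.cu and its contents are made up for illustration. The -arch=sm_20 flag is an assumption based on the Tesla M2090 being a compute capability 2.0 card, which is the minimum needed for printf inside a kernel.

```shell
# Write a tiny CUDA program to hello.cu (hypothetical example file).
cat > hello.cu <<'EOF'
#include <cstdio>

// Each GPU thread prints its own index.
__global__ void hello(void)
{
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main(void)
{
    hello<<<1, 4>>>();          // 1 block of 4 threads
    cudaDeviceSynchronize();    // wait for the kernel and flush its output
    return 0;
}
EOF

# Compile only if nvcc is on the PATH (i.e. the CUDA module is loaded).
if command -v nvcc >/dev/null 2>&1; then
    nvcc -arch=sm_20 -o hello hello.cu \
        || echo "nvcc failed; check the -arch value for your toolkit" >&2
fi
```

The resulting hello binary needs a GPU to run, so start it from a job submitted to the gpu queue.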



CUDA Documentation:

On the HPC cluster, you can get additional help files at /data/apps/cuda/doc  or by clicking on this link.

The SDK CUDA Toolkit has been installed in /data/apps/cuda/NVIDIA_GPU_Computing_SDK

CUDA SDK Toolkit Documentation is also available from this link.




NVIDIA-SMI

To display the GPU information, you can use the qrsh command as follows:

qrsh -q gpu nvidia-smi  

Fri Apr 19 10:10:01 2012       
+------------------------------------------------------+                       
| NVIDIA-SMI 3.295.41   Driver Version: 295.41         |                       
|-------------------------------+----------------------+----------------------+
| Nb.  Name                     | Bus Id        Disp.  | Volatile ECC SB / DB |
| Fan   Temp   Power Usage /Cap | Memory Usage         | GPU Util. Compute M. |
|===============================+======================+======================|
| 0.  Tesla M2090               | 0000:04:00.0  Off    |         0          0 |
|  N/A    N/A  P0    77W / 225W |   6%  330MB / 5375MB |   31%     Default    |
|-------------------------------+----------------------+----------------------|
| 1.  Tesla M2090               | 0000:05:00.0  Off    |         0          0 |
|  N/A    N/A  P12   29W / 225W |   0%   10MB / 5375MB |    0%     Default    |
|-------------------------------+----------------------+----------------------|
| 2.  Tesla M2090               | 0000:83:00.0  Off    |         0          0 |
|  N/A    N/A  P12   27W / 225W |   0%   10MB / 5375MB |    0%     Default    |
|-------------------------------+----------------------+----------------------|
| 3.  Tesla M2090               | 0000:84:00.0  Off    |         0          0 |
|  N/A    N/A  P12   28W / 225W |   0%   10MB / 5375MB |    0%     Default    |
|-------------------------------+----------------------+----------------------|
| Compute processes:                                               GPU Memory |
|  GPU  PID     Process name                                       Usage      |
|=============================================================================|
|  0.  13951    ...namd/NAMD_2.9b3_Linux-x86_64-multicore-CUDA/namd2   317MB  |
+-----------------------------------------------------------------------------+

In the display above, Tesla #0 is active and has a load of 31%.   All other Tesla cards are idle ( 0% utilization ).
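The same reading can be automated with a short awk filter. This is only a sketch: it assumes the exact table layout printed by the driver version shown above (other nvidia-smi versions format their output differently), and the function name gpu_utilization is made up for illustration.

```shell
# Read nvidia-smi output (layout as shown above) on stdin and print
# one "<gpu id> <utilization %>" pair per card.
gpu_utilization() {
    awk -F'|' '
        # Card line: "| 0.  Tesla M2090 | ... | ... |"  (5 fields)
        NF >= 5 && $2 ~ /^ *[0-9]+\. / {
            id = $2
            sub(/^ */, "", id)     # drop leading spaces
            sub(/\..*/, "", id)    # keep only the number before the dot
            next
        }
        # Following line: the third column holds "31%     Default"
        id != "" && NF >= 5 && $4 ~ /%/ {
            util = $4
            sub(/%.*/, "", util)
            gsub(/ /, "", util)
            print id, util
            id = ""
        }
    '
}
```

On the cluster this would be used as: qrsh -q gpu nvidia-smi | gpu_utilization. A card reporting 0 is idle.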

You can get additional help for nvidia-smi on compute-1-14 with:
    • nvidia-smi -h
    • man nvidia-smi



If you are familiar with using GPUs and would like to contribute to helping others learn how to use the GPU node, please let me know and I will post it on the HPC How To list.
