Learning Python, Cython, and PyCUDA

Source: Internet  Published: m2m data acquisition  Editor: 程序博客网  Time: 2024/06/06 08:35

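Goal

Move from CPU-based Cython code -> CPU+GPU code using Cython + PyCUDA


Motivation

The original program uses Python libraries such as PyFITS and Astropy. I plan to take the brute-force route and turn the CPU-parallel parts into GPU-parallel ones, which is why PyCUDA comes in. (This has probably made the code more complicated; if you know a simpler way, please leave a comment.)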


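Preparation

1. Skim the first 10 chapters of M. L. Hetland's "Beginning Python" (《Python基础教程》). (This took me one day.)

2. Set up the environment. (There are plenty of tutorials online.)

  • Note
    • When installing the NVIDIA CUDA Toolkit, check your graphics card model: NVIDIA GeForce GTX 300 and earlier are not supported.
    • On Windows, check the Visual Studio version as well: the NVIDIA CUDA Toolkit supports VS15 and earlier. (Linux is recommended.)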
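
Once everything is installed, a quick check like the one below can confirm that the pieces are visible from Python. This is only a sketch: the function name `check_environment` and the default lists of packages and tools are my own choices, not part of the original setup, and it relies on `importlib.util.find_spec`, so it assumes Python 3 even though the builds later in this post were done with Python 2.7.

```python
import importlib.util
import shutil


def check_environment(modules=("numpy", "Cython", "pycuda"),
                      tools=("nvcc", "gcc")):
    """Report which Python packages and command-line tools can be found.

    Returns a dict mapping each name to True or False. Nothing is imported
    or executed; the names are only looked up.
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[name] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print("{:<8} {}".format(name, "found" if found else "MISSING"))
```

A missing `nvcc` usually means the CUDA Toolkit's `bin` directory is not on `PATH`; PyCUDA's `SourceModule` needs it at run time to compile kernels.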


Testing Cython + PyCUDA

Cython

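1. Initial file layout

(These are the files I wrote by hand; the files generated by the build are listed later.)

/*
--testCuda/
 |
  --setup.py
  --test/
   |
    --constants.h
    --constants.pxd
    --test.pyx
    --__init__.py
*/

Note: a .pyx file holds Cython implementation code; a .pxd file holds declarations, much like a C header.

2. File contents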

  • constants.h

//constants.h
#ifndef CONSTANTS_H
#define CONSTANTS_H

#define PI (3.1415926535897932384626433832)
#define TWOPI (PI * 2.0)

#endif

  • constants.pxd

#!python
cdef extern from "./constants.h":
    long double PI
    long double TWOPI

  • test.pyx

#!python
# import python3 compat modules
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

# import std lib
import sys
import traceback

# import cython specifics
cimport cython
from cython.parallel import prange
from cython.operator cimport dereference as deref, preincrement as inc
from cpython cimport bool as python_bool
cimport openmp

# import C/C++ modules
from libc.math cimport exp, cos, sin, sqrt, asin, acos, atan2, fabs, fmod
from libcpp.vector cimport vector
from libcpp.pair cimport pair
from libcpp.set cimport set as cpp_set
from libcpp cimport bool
from libcpp.unordered_map cimport unordered_map

# import numpy/data types
import numpy as np
from numpy cimport (
    int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t,
    uint32_t, uint64_t, float32_t, float64_t
    )
cimport numpy as np

from .constants cimport PI, TWOPI

print ("test: Cython")
print (TWOPI)


  • __init__.py

from .test import *

  • setup.py

#!/usr/bin/env python
from setuptools import setup
from setuptools.extension import Extension
from Cython.Distutils import build_ext
import numpy
import platform
import os

EX_COMP_ARGS = []

# Copied as-is from an existing project; what does each argument mean?
TEST_EXT = Extension(
    'test.test',
    ['test/test.pyx'],
    extra_compile_args=['-fopenmp', '-O3', '-std=c++11'] + EX_COMP_ARGS,
    extra_link_args=['-fopenmp'],
    language='c++',
    include_dirs=[
        numpy.get_include(),
    ]
)

setup(
    name='test_Cython',
    packages=['test'],
    cmdclass={'build_ext': build_ext},
    ext_modules=[
        TEST_EXT,
    ]
)
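
The setup.py above imports `platform` and defines an empty `EX_COMP_ARGS`, but hard-codes GCC-style flags. One way the flags could be chosen per platform is sketched below. The helper name `openmp_build_flags` is hypothetical and not taken from the original project; the only facts it relies on are that GCC enables OpenMP with `-fopenmp` at both compile and link time, and that MSVC uses `/openmp` with no separate link flag.

```python
import platform


def openmp_build_flags(system=None):
    """Return (compile_args, link_args) for a platform.system() value.

    Hypothetical helper: pass the result to Extension(...) as
    extra_compile_args and extra_link_args.
    """
    system = system or platform.system()
    if system == "Windows":
        # MSVC spells the OpenMP switch /openmp and needs no link flag.
        return ["/openmp", "/O2"], []
    # GCC-style toolchains: the same flags the setup.py above hard-codes.
    return ["-fopenmp", "-O3", "-std=c++11"], ["-fopenmp"]


compile_args, link_args = openmp_build_flags()
print(compile_args, link_args)
```

On the meaning of the remaining arguments: `'test.test'` is the dotted import name of the compiled module, the list after it names the source files, `language='c++'` makes Cython emit C++ (required for the `libcpp` imports), and `include_dirs` adds the NumPy headers needed by `cimport numpy`.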


3. Build

$ python setup.py build_ext --inplace


4. Run

Start the Python interpreter:

$ python

Import the module:

>>> import test
test: Cython
6.28318530718
>>>
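
The printed value has only 12 significant digits even though constants.h spells out PI to about 29 digits. As far as I understand it, Cython converts the C value to an ordinary Python float when passing it to print, and Python 2 formats floats for printing with `%.12g`. The short check below reproduces that in plain Python; it uses `math.pi` in place of the header constant, which is an assumption of this sketch.

```python
import math

two_pi = 2.0 * math.pi

# Python 2 formats floats for print() with 12 significant digits.
short_form = "%.12g" % two_pi
print(short_form)        # 6.28318530718

# Full double precision, for comparison.
print(repr(two_pi))      # 6.283185307179586
```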


5. File layout after the build

/*
--testCuda/
 |
  --setup.py
  --test/
   |
    --constants.h
    --constants.pxd
    --test.pyx
    --test.cpp
    --test.so
    --__init__.py
    --__init__.pyc
  --build/
   |
    --temp.linux-x86_64-2.7
*/

PyCUDA

Add the PyCUDA sample to test.pyx:

#!python
# import python3 compat modules
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

# import std lib
import sys
import traceback

# import cython specifics
cimport cython
from cython.parallel import prange
from cython.operator cimport dereference as deref, preincrement as inc
from cpython cimport bool as python_bool
cimport openmp

# import C/C++ modules
from libc.math cimport exp, cos, sin, sqrt, asin, acos, atan2, fabs, fmod
from libcpp.vector cimport vector
from libcpp.pair cimport pair
from libcpp.set cimport set as cpp_set
from libcpp cimport bool
from libcpp.unordered_map cimport unordered_map

# import numpy/data types
import numpy as np
from numpy cimport (
    int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t,
    uint32_t, uint64_t, float32_t, float64_t
    )
cimport numpy as np

from .constants cimport PI, TWOPI

print ("test: Cython")
print (TWOPI)

# PyCUDA sample: double every element of a 4x4 array on the GPU
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule

# host array; the kernel works on single-precision floats
a = np.random.randn(4,4)
a = a.astype(np.float32)

# allocate device memory and copy the host array to it
a_gpu = cuda.mem_alloc(a.size * a.dtype.itemsize)
cuda.memcpy_htod(a_gpu, a)

mod = SourceModule("""
    __global__ void doublify(float *a)
    {
      int idx = threadIdx.x + threadIdx.y*4;
      a[idx] *= 2;
    }
    """)

# str() keeps the kernel name a byte string under unicode_literals
func = mod.get_function(str("doublify"))
# one block of 4x4 threads, one thread per array element
func(a_gpu, block=(4,4,1))

# copy the result back to the host
a_doubled = np.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)

print ("original array:")
print (a)
print ("doubled with kernel:")
print (a_doubled)
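
To see what the kernel's index arithmetic does without a GPU, the sketch below walks through the same 4x4 thread block on the CPU in plain Python. The function name `doublify_reference` is made up for this illustration, and unlike the kernel, which modifies device memory in place, it returns a new list.

```python
def doublify_reference(values, block=(4, 4, 1)):
    """Emulate the doublify kernel on the CPU for a single thread block.

    `values` is a flat list of 16 floats, i.e. a 4x4 array in row-major
    order, which is how the NumPy array is laid out in device memory.
    """
    out = list(values)
    block_x, block_y, _ = block
    for thread_y in range(block_y):
        for thread_x in range(block_x):
            # Same arithmetic as the kernel: threadIdx.x + threadIdx.y*4
            idx = thread_x + thread_y * 4
            out[idx] *= 2
    return out


flat = [float(i) for i in range(16)]
doubled = doublify_reference(flat)
print(doubled[:4])   # [0.0, 2.0, 4.0, 6.0]
```

On the GPU the two loops do not exist: each of the 16 threads runs the kernel body once with its own `threadIdx`, so every array element is touched by exactly one thread.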


Build and run again. The result:

>>> import test
test: Cython
6.28318530718
original array:
[... omitted]
doubled with kernel:
[... omitted]
>>>


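Summary:

PyCUDA can indeed be combined with Cython! I hope any experts passing by will leave a few concise pointers to help me understand this more deeply.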
