VTune: Using the amplxe-cl Command Line


References


http://software.intel.com/sites/products/documentation/hpc/amplifierxe/en-us/2011Update/lin/ug_docs/index.htm

amplxe-cl -collect hotspots -- ./driver /home/zxx/work_autumn_2011/matrices/rma10.mtx
Reading sparse matrix from file (/home/zxx/work_autumn_2011/matrices/rma10.mtx): done
Using 46835-by-46835 matrix with 2374001 nonzero values
------------------------------------------

####  Testing COO Kernels  ####
    creating coo_matrix:coo transform time elapsed 0.013690

do coo spmv time elapsed 5.434732 seconds
 
orignal do coo spmv time elapsed 5.429192 seconds
 
Using result path `/home/zxx/work_autumn_2011/all_format/r001hs'
Executing actions 75 % Generating a report                                     

Summary
-------

Elapsed Time:  11.312
CPU Time:      11.280
Executing actions 100 % done                      



amplxe-cl -report hotspots -result-dir r001hs

Using result path `/home/zxx/work_autumn_2011/all_format/r001hs'

Executing actions 75 % Generating a report                                     
Function                                       Module  CPU Time
__spmv_coo_serial_host_sse                     driver  5.420
__spmv_coo_serial_host<unsigned int, double>   driver  5.410
read_coo_matrix<unsigned int, double>          driver  0.350
test_coo_matrix_kernels<unsigned int, double>  driver  0.060
coo_to_csr<unsigned int, double>               driver  0.020
csr_to_coo<unsigned int, double>               driver  0.020

Executing actions 100 % done                                         
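The hotspots report above is plain text, so it can be post-processed with a short script. Below is a minimal Python sketch, assuming the report keeps the three-column layout shown (Function, Module, CPU Time). The helper names parse_hotspots and cpu_share are hypothetical and not part of VTune; the sample text is copied from the run above.

```python
from typing import List, Tuple

# Report text copied from the run above, including the progress lines.
SAMPLE_REPORT = """\
Using result path `/home/zxx/work_autumn_2011/all_format/r001hs'
Executing actions 75 % Generating a report
Function    Module    CPU Time
__spmv_coo_serial_host_sse    driver    5.420
__spmv_coo_serial_host<unsigned int, double>    driver    5.410
read_coo_matrix<unsigned int, double>    driver    0.350
test_coo_matrix_kernels<unsigned int, double>    driver    0.060
coo_to_csr<unsigned int, double>    driver    0.020
csr_to_coo<unsigned int, double>    driver    0.020
Executing actions 100 % done
"""


def parse_hotspots(text: str) -> List[Tuple[str, str, float]]:
    """Return (function, module, cpu_seconds) for every data row."""
    rows = []
    for line in text.splitlines():
        # Function names may contain spaces (template arguments),
        # so split from the right: the last two fields are fixed.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, module, cpu = parts
        try:
            seconds = float(cpu)
        except ValueError:
            continue  # header or progress line, not a data row
        rows.append((name.strip(), module, seconds))
    return rows


def cpu_share(rows: List[Tuple[str, str, float]]) -> List[Tuple[str, float]]:
    """Return (function, percent of total CPU time) for every row."""
    total = sum(seconds for _, _, seconds in rows)
    return [(name, 100.0 * seconds / total) for name, _, seconds in rows]


if __name__ == "__main__":
    for name, percent in cpu_share(parse_hotspots(SAMPLE_REPORT)):
        print(f"{percent:6.2f}%  {name}")
```

Splitting from the right is a deliberate choice here: it keeps names such as `read_coo_matrix<unsigned int, double>` in one piece without relying on the exact column separator.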


amplxe-cl -report summary -result-dir r001hs
Using result path `/home/zxx/work_autumn_2011/all_format/r001hs'
Executing actions 75 % Generating a report                                     

Summary
-------

Elapsed Time:  11.312
CPU Time:      11.280
Executing actions 100 % done               

This is the same summary that is printed at the end of the collect run.

This example runs the hardware event-based sampling collector for the sample application and displays the default summary report.

$ amplxe-cl -collect-with runsa -knob event-config=CPU_CLK_UNHALTED.CORE,CPU_CLK_UNHALTED.REF,INST_RETIRED.ANY -- /home/test/sample


Commonly used options

collect

collect-with

event-config

knob

$ amplxe-cl -collect-with runsa -knob event-config=CPU_CLK_UNHALTED.CORE,CPU_CLK_UNHALTED.REF,INST_RETIRED.ANY -- /home/test/sample
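When the event list gets long, it is easy to mistype the command (for example, dropping the `--` separator before the application). A minimal Python sketch that assembles the argument list is given below. It uses only the options that appear in this post (-collect-with runsa, -knob event-config, the optional `:sa=N` suffix, and `--`); build_runsa_command is a hypothetical helper name, and the sketch only builds the command, it does not run it.

```python
from typing import List, Optional, Sequence


def build_runsa_command(
    events: Sequence[str],
    app: Sequence[str],
    sample_after: Optional[int] = None,
) -> List[str]:
    """Build an amplxe-cl argument list for event-based sampling.

    events       -- hardware event names, e.g. "INST_RETIRED.ANY"
    app          -- the application and its arguments
    sample_after -- optional sa value appended to every event
    """
    if not events:
        raise ValueError("at least one event is required")
    if not app:
        raise ValueError("an application to profile is required")
    if sample_after is not None and sample_after <= 0:
        raise ValueError("sample_after must be positive")

    if sample_after is None:
        specs = list(events)
    else:
        specs = [f"{event}:sa={sample_after}" for event in events]

    return [
        "amplxe-cl",
        "-collect-with", "runsa",
        "-knob", "event-config=" + ",".join(specs),
        "--",  # everything after this belongs to the application
        *app,
    ]


if __name__ == "__main__":
    cmd = build_runsa_command(
        ["CPU_CLK_UNHALTED.CORE", "CPU_CLK_UNHALTED.REF", "INST_RETIRED.ANY"],
        ["/home/test/sample"],
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting problems.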


Viewing sample-after values is a special case. Currently, the only way to view the sample-after values is to display the results of a run with the default values using the 'sfdump' report type, e.g.:

$ amplxe-cl -report sfdump -result-dir r000rs



sudo amplxe-cl -collect-with runsa  -knob event-config=UOPS_EXECUTED.PORT2_CORE:sa=1000,UOPS_EXECUTED.PORT3_CORE:sa=1000,UOPS_EXECUTED.PORT4_CORE:sa=1000 -- ./driver


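In my experience, sa should be at least 1000; otherwise the machine tends to hang.

I tried sa=100 and sa=1, and the machine hung both times.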
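A likely explanation for the hangs: the collector takes one sample (an interrupt) each time the event has occurred sa times, so the sampling rate is roughly the event rate divided by sa. The Python sketch below does that arithmetic. The event rate of 1e9 per second is an assumed round number for illustration, not a measurement from this machine, and samples_per_second is a hypothetical helper name.

```python
def samples_per_second(events_per_second: float, sample_after: int) -> float:
    """Approximate sampling interrupts per second for one event."""
    if sample_after <= 0:
        raise ValueError("sample_after must be positive")
    return events_per_second / sample_after


# Assumption: the event fires about one billion times per second.
ASSUMED_EVENT_RATE = 1e9

if __name__ == "__main__":
    for sa in (1, 100, 1000, 2_000_000):
        rate = samples_per_second(ASSUMED_EVENT_RATE, sa)
        print(f"sa={sa:>9}: about {rate:,.0f} samples per second")
```

Under this assumption, sa=1 asks for an interrupt on nearly every event, which leaves the CPU almost no time for anything else.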

$ amplxe-cl -report hw-events -r r010runsa/

This report type works well for viewing the results of raw hardware events.


This option enables multiple runs to achieve more precise results for hardware event-based collections.

When disabled, the collector uses event multiplexing.
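Multiplexing lowers precision because each event is only counted during part of the run, and the total has to be estimated from that part. The Python sketch below shows the scaling commonly used for multiplexed counters (the Linux perf tools do it this way). Whether VTune uses exactly this formula is an assumption, and the function and parameter names are hypothetical.

```python
def scale_multiplexed_count(
    raw_count: float,
    time_enabled: float,
    time_running: float,
) -> float:
    """Estimate the full-run count of a multiplexed event.

    raw_count    -- events counted while the counter was scheduled
    time_enabled -- total time the event was requested (seconds)
    time_running -- time the counter was actually counting (seconds)
    """
    if time_running <= 0 or time_running > time_enabled:
        raise ValueError("need 0 < time_running <= time_enabled")
    # Extrapolate linearly; this is only exact if the event rate
    # stays constant over the whole run.
    return raw_count * time_enabled / time_running


if __name__ == "__main__":
    # Hypothetical numbers: counter active for 1 s out of a 3 s run.
    print(scale_multiplexed_count(1000, 3.0, 1.0))
```

If the program's behaviour changes between phases, the extrapolated value can be noticeably off, which is why separate runs per event group give more precise results.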

sudo amplxe-cl -collect-with runsa -knob event-config=UOPS_EXECUTED.PORT2_CORE,UOPS_EXECUTED.PORT3_CORE,UOPS_EXECUTED.PORT4_CORE -- ./driver

After using this option, I could not run the collection a second time.


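The measured results are not very accurate, which is frustrating.

I don't know why yet. I need to learn computer architecture and operating systems properly and track down the cause.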