Handling oprofile sample buffer overflows
What to do when sample buffer overflows are reported
OProfile is a system-wide profiler for Linux, capable of profiling all running code at low overhead.
As of OProfile release 0.9.6, when you run 'opcontrol --shutdown' after a profiling run, oprofile checks for lost samples and displays a warning message if any samples were dropped. The message recommends reviewing oprofiled.log for more information on the lost samples, and trying to eliminate or minimize them.
But how does one do that? And how does one interpret the information about lost samples in the log?
Oprofiled.log layout
Let's start out with the interpretation of the log information. The oprofiled.log is located at /var/lib/oprofile/samples/oprofiled.log, so you can view it with:

# cat /var/lib/oprofile/samples/oprofiled.log
At the end of a profiling run, a section titled "OProfile Statistics" is written to this log. The beginning (summary) part of this section includes several lines giving information about lost samples or events. For example:
oprofiled started Tue Jun 22 17:00:46 2010
kernel pointer size: 8
Tue Jun 22 17:01:21 2010
-- OProfile Statistics --
Nr. sample dumps: 4
Nr. non-backtrace samples: 4279
Nr. kernel samples: 3834
Nr. lost samples (no kernel/user): 0
Nr. lost kernel samples: 0
Nr. incomplete code structs: 0
Nr. samples lost due to sample file open failure: 0
Nr. samples lost due to no permanent mapping: 1
Nr. event lost due to buffer overflow: 0
Nr. samples lost due to no mapping: 11
Nr. backtraces skipped due to no file mapping: 0
Nr. samples lost due to no mm: 42
Following that is a "Statistics for cpu" section for each CPU. For example:

---- Statistics for cpu : 2
Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 0
Nr. backtrace aborted: 0
Nr. samples lost invalid pc: 0
---- Statistics for cpu : 1
Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 0
Nr. backtrace aborted: 0
Nr. samples lost invalid pc: 0
---- Statistics for cpu : 0
Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 67
Nr. backtrace aborted: 0
Nr. samples lost invalid pc: 0
oprofiled stopped Tue Jun 22 17:01:21 2010
Getting back to the summary part of the "OProfile Statistics" section:
The "Nr. non-backtrace samples" line shows the number of samples actually received.
- Recently discovered (Dec 2011) Bug Alert: The "Nr. non-backtrace samples" statistic up through OProfile 0.9.7 is not accurate. This value is less than the actual number of samples received.
- Looking at the "lost samples" lines, you'll almost always see some samples lost due to "no mapping" or "no mm". This can happen when samples are recorded by the oprofile kernel module for an application, but the app ends before those samples can be processed by the userspace oprofile daemon. There's not much the user can do about these. Proportionally, the number of such lost samples is usually very low, so they can generally be safely ignored.
- On the other hand, the user often can (and should) take action to eliminate or minimize the number of "events lost due to buffer overflows", especially if that number is high (say, anything over 1%) compared to the total "Nr. non-backtrace samples". The same is true for "Nr. samples lost cpu buffer overflow" in the per-cpu stats sections. OK, so what action(s) should be taken?
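As a quick way to apply that 1% rule of thumb, the summary lines can be scraped from oprofiled.log with a small script. This is a hypothetical helper, not part of OProfile; the statistic names are the ones shown in the log excerpt above.

```python
import re

def overflow_ratio(log_text):
    """Fraction of received samples lost to main-buffer overflow.

    Parses "Nr. ..." summary lines from an oprofiled.log excerpt.
    Hypothetical helper for illustration only.
    """
    def stat(name):
        m = re.search(r"Nr\. " + re.escape(name) + r": (\d+)", log_text)
        return int(m.group(1)) if m else 0

    received = stat("non-backtrace samples")
    lost = stat("event lost due to buffer overflow")
    return lost / received if received else 0.0

# Illustrative excerpt (made-up overflow count, for demonstration):
sample_log = """\
Nr. non-backtrace samples: 4279
Nr. event lost due to buffer overflow: 100
"""
print(f"{overflow_ratio(sample_log):.2%} lost to buffer overflow")  # 2.34%
```

Anything over the ~1% threshold suggests the tuning steps described in the next section.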
Reducing/Eliminating Events lost due to buffer overflow
Note: The "events lost due to buffer overflow" statistic is mis-named; it should be named "samples lost due to buffer overflow". The following text uses the phrasing that the tool provides, but keep this in mind.
Generally speaking, if you're seeing "events lost due to buffer overflow" in the summary part of the "OProfile Statistics" section of the oprofiled.log, the easiest thing to try is lowering your sampling interval (i.e., using a higher 'count' value for the event(s) you're profiling). But since OProfile is a statistical profiler, you need to keep in mind that the fewer the samples over a given period of time, the less accurate your profile data will be. It's difficult to give a rule of thumb on what constitutes a statistically valid profile. But as a rough example, consider that gprof typically collects 100 samples per second. This is fine for the low-hanging fruit, but may not be fine-grained enough for detailed performance investigations. So, for time-based or cycles-based profiles, you should try to do at least an order of magnitude better than gprof.
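The trade-off can be sanity-checked with simple arithmetic. The 3 GHz clock below is illustrative, not tied to any particular machine; on a cycles-type event, the expected per-CPU sample rate is just the clock frequency divided by the 'count' value:

```python
# Illustrative numbers, assuming a cycles-type event on a 3 GHz CPU.
clock_hz = 3_000_000_000

def samples_per_sec(count):
    # One sample is taken every 'count' occurrences of the event, so a
    # fully busy CPU produces clock_hz / count samples per second.
    return clock_hz / count

print(samples_per_sec(25_000))   # 120000.0 samples/sec per busy CPU

# gprof collects roughly 100 samples/sec; to do at least an order of
# magnitude better (>= 1000 samples/sec), 'count' must not exceed:
print(clock_hz // 1000)          # 3000000
```

So on such a system there is plenty of room to raise 'count' (lowering overhead and buffer pressure) while still staying well above gprof's resolution.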
Here is an example of how to change the 'count' value for the event PM_RUN_CYC_GRP1; the count value is the number that follows the colon:
$ opcontrol --event PM_RUN_CYC_GRP1:25000
You can check the new count value by issuing:
$ opcontrol --status
Daemon not running
Event 0: PM_RUN_CYC_GRP1:25000:0:1:1
Separate options: library
vmlinux file: /boot/vmlinux-2.6.32.12-0.7-ppc64
Image filter: none
Call-graph depth: 0
oprofile provides the ability to tailor the sizes of the key working buffers allocated.
$ opcontrol 2>&1 | grep buffer
   --buffer-size=num        kernel buffer size in sample units
   --buffer-watershed       kernel buffer watershed in sample units (2.6 only)
   --cpu-buffer-size=num    per-cpu buffer size in units (2.6 only)
   --note-table-size        kernel notes buffer size in notes units (2.4 only)
If, after lowering your sampling interval, you're still getting too many overflows, try adjusting the buffer size along with the buffer-watershed (which should be around 1/4 the value of buffer size). Once you have eliminated the main buffer overflows, then go after CPU buffer overflows (if any) by adjusting the cpu-buffer-size. The default size for the main event buffer is 131072, and the default size for the per-cpu buffers is 8192.
See the 'opcontrol' man page for details on setting these parameters. The man page uses "num samples" to indicate that the value you enter will initialize a buffer big enough to hold that many samples. The samples are unsigned longs, so each is 64 bits or 32 bits, depending on the bitness of the kernel (since these buffers are allocated by the kernel).
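A back-of-the-envelope sketch of the kernel memory those settings pin down (this only counts the unsigned-long slots themselves; any per-slot kernel overhead is ignored):

```python
# Each buffer slot holds one unsigned long: 8 bytes on a 64-bit kernel,
# 4 bytes on a 32-bit kernel.
def buffer_bytes(num_samples, kernel_bits=64):
    return num_samples * (kernel_bits // 8)

# --buffer-size=1000000 on a 64-bit kernel:
print(buffer_bytes(1_000_000))       # 8000000 bytes (~7.6 MiB)

# --cpu-buffer-size=16000 across 32 logical CPUs:
print(buffer_bytes(16_000) * 32)     # 4096000 bytes total
```

Even generous settings like these cost only a few megabytes of kernel memory, which is why enlarging the buffers is usually a cheap fix.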
Here is a quick example:
# Set the event (main) buffer size
$ opcontrol --buffer-size=1000000
# Set the buffer watershed
$ opcontrol --buffer-watershed=256000
# Set the cpu buffer size
$ opcontrol --cpu-buffer-size=16000
Run opcontrol --status again to verify the "non-default" settings.
$ opcontrol --status
Daemon not running
Event 0: PM_RUN_CYC_GRP1:1000000:0:1:1
Separate options: library
vmlinux file: /boot/vmlinux-2.6.32.12-0.7-ppc64
Image filter: none
Call-graph depth: 0
Buffer size: 1000000
CPU buffer watershed: 250000
CPU buffer size: 16000
To restore to the defaults, simply set each of the buffer values to zero (0).
$ opcontrol --buffer-size=0
$ opcontrol --buffer-watershed=0
$ opcontrol --cpu-buffer-size=0
Testing this on a POWER7 system
We did some testing on a POWER7 @ 3 GHz with 32 logical processors (CPUs), i.e., 8 cores with the default of SMT=4. The tables below show the results. You can see that, with a sampling interval of one sample every 25,000 PM_RUN_CYC events, we found no buffer size settings that prevented lost samples. But by adjusting the sampling interval and buffer parameters, we were able to collect profile data with very few (or no) lost samples.
Notes
- With a sampling interval of one sample for every 50,000 run cycles on a 3 GHz system, each fully busy CPU generates 3 GHz / 50,000 = 60,000 samples per second. The total possible number of samples per second is therefore 60,000 x num_cpus (in this case 32 CPUs), assuming all CPUs are 100% busy. Using that assumption, the nominal number of samples for this testing scenario is 1,920,000 per second.
- Without oprofile running, an example runtime for a single test application is 23 seconds; with the app running on all CPUs, the runtime is 53 seconds (we'll call this the "nominal runtime"). Thus, 32 apps should register 1,920,000 x 53 = 101,760,000 total samples over the collective runtime. This is a theoretical value, because when oprofile is running, the profiling overhead increases the apps' runtime.
- The data in the tables show that, for this type of system, oprofile is indeed capable of processing 1,920,000 samples per second with appropriate profiling parameters.
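The arithmetic in these notes can be replayed directly (the figures are the ones stated above):

```python
clock_hz = 3_000_000_000   # 3 GHz POWER7
interval = 50_000          # one sample per 50,000 PM_RUN_CYC events
num_cpus = 32
nominal_runtime_s = 53     # runtime with the app on all CPUs

per_cpu = clock_hz // interval        # samples/sec for one busy CPU
total_per_sec = per_cpu * num_cpus    # nominal samples/sec, all CPUs busy
total_samples = total_per_sec * nominal_runtime_s

print(per_cpu, total_per_sec, total_samples)  # 60000 1920000 101760000
```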
Sampling = 25,000 events
All profiling parameters in this case resulted in too many lost samples.
[Table: PM_RUN_CYC count, buffer overflow, per-cpu buffer, runtime (ratio of nominal)]
Sampling = 50,000 events
The last two configurations did the best here. The first lost nearly all the samples.

PM_RUN_CYC | buffer size | watershed | cpu buffer | buffer overflow | per-cpu buffer | runtime (ratio of nominal) | Result
50,000 | … | … | … | … | several thousand on others | 1.21 | Not good
50,000 | 1M | .25M | 16K | 800,000 | 4 | 1.22 | Not good
50,000 | 2M | .5M | 16K | 0 | 3 | 1.21 | Good
50,000 | 4M | 1M | 16K | 0 | 3 | 1.23 | Good
In this example, the main buffer size didn't make a difference between 2MB and 4MB, so for this workload the 2MB example would have been fine. For other workloads, you may see more events lost.
Sampling = 100,000 events
This set of profiling parameters uses even fewer resources and should provide as statistically valid a profile as a sampling interval of 50,000. Note that the oprofile overhead is also substantially less.
[Table: PM_RUN_CYC count, buffer overflow, per-cpu buffer, runtime (ratio of nominal)]