A weird segmentation fault is almost always caused by memory corruption or a stack buffer overrun
Source: Internet | Editor: 程序博客网 | Date: 2024/04/27 18:00
Weird SIGSEGV segmentation fault in std::string::assign() method from libstdc++.so.6
My program recently hit a weird segfault at run time. I would like to know whether anybody has seen this error before and how it can be fixed. Here is more info:
Basic info:
- CentOS 5.2, kernel version is 2.6.18
- g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50)
- CPU: Intel x86 family
- libstdc++.so.6.0.8
- My program will start multiple threads to process data. The segfault occurred in one of the threads.
- Though it's a multi-thread program, the segfault seemed to occur on a local std::string object. I'll show this in the code snippet later.
- The program is compiled with -g, -Wall and -fPIC, and without -O2 or other optimization options.
The core dump info:
    Core was generated by `./myprog'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x06f6d919 in __gnu_cxx::__exchange_and_add(int volatile*, int) () from /usr/lib/libstdc++.so.6
    (gdb) bt
    #0  0x06f6d919 in __gnu_cxx::__exchange_and_add(int volatile*, int) () from /usr/lib/libstdc++.so.6
    #1  0x06f507c3 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libstdc++.so.6
    #2  0x06f50834 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libstdc++.so.6
    #3  0x081402fc in Q_gdw::ProcessData (this=0xb2f79f60) at ../../../myprog/src/Q_gdw/Q_gdw.cpp:798
    #4  0x08117d3a in DataParser::Parse (this=0x8222720) at ../../../myprog/src/DataParser.cpp:367
    #5  0x08119160 in DataParser::run (this=0x8222720) at ../../../myprog/src/DataParser.cpp:338
    #6  0x080852ed in Utility::__dispatch (arg=0x8222720) at ../../../common/thread/Thread.cpp:603
    #7  0x0052c832 in start_thread () from /lib/libpthread.so.0
    #8  0x00ca845e in clone () from /lib/libc.so.6
Please note that the segfault begins within the basic_string::operator=().
The related code: (I've shown more code than might be needed; please ignore the coding style issues for now.)
    int Q_gdw::ProcessData()
    {
        char tmpTime[10+1] = {0};
        char A01Time[12+1] = {0};
        std::string tmpTimeStamp;

        // Get the timestamp from TP
        if((m_BackFrameBuff[11] & 0x80) >> 7)
        {
            for (i = 0; i < 12; i++)
            {
                A01Time[i] = (char)A15Result[i];
            }
            tmpTimeStamp = FormatTimeStamp(A01Time, 12); // Segfault occurs on this line
And here is the prototype of this FormatTimeStamp method:
std::string FormatTimeStamp(const char *time, int len)
I think such a string assignment is a very commonly used operation, so I just don't understand why a segfault could occur here.
What I have investigated:
I've searched the web for answers. I looked here. The reply says to try recompiling the program with the _GLIBCXX_FULLY_DYNAMIC_STRING macro defined. I tried, but the crash still happens.
I also looked here. It also says to recompile the program with _GLIBCXX_FULLY_DYNAMIC_STRING, but the author seems to be dealing with a different problem from mine, so I don't think his solution works for me.
Updated on 08/15/2011
Hi guys, here is the original code of FormatTimeStamp. I understand the code doesn't look very nice (too many magic numbers, for instance), but let's focus on the crash issue first.
    string Q_gdw::FormatTimeStamp(const char *time, int len)
    {
        string timeStamp;
        string tmpstring;

        if (time) // It is guaranteed that "time" is correctly zero-terminated, so don't worry about any overflow here.
            tmpstring = time;

        // Get the current time point.
        int year, month, day, hour, minute, second;
    #ifndef _WIN32
        struct timeval timeVal;
        struct tm *p;
        gettimeofday(&timeVal, NULL);
        p = localtime(&(timeVal.tv_sec));
        year = p->tm_year + 1900;
        month = p->tm_mon + 1;
        day = p->tm_mday;
        hour = p->tm_hour;
        minute = p->tm_min;
        second = p->tm_sec;
    #else
        SYSTEMTIME sys;
        GetLocalTime(&sys);
        year = sys.wYear;
        month = sys.wMonth;
        day = sys.wDay;
        hour = sys.wHour;
        minute = sys.wMinute;
        second = sys.wSecond;
    #endif

        if (0 == len)
        {
            // The "time" doesn't specify any time so we just use the current time
            char tmpTime[30];
            memset(tmpTime, 0, 30);
            sprintf(tmpTime, "%d-%d-%d %d:%d:%d.000", year, month, day, hour, minute, second);
            timeStamp = tmpTime;
        }
        else if (6 == len)
        {
            // The "time" specifies "day-month-year" with each being 2-digit.
            // For example: "150811" means "August 15th, 2011".
            timeStamp = "20";
            timeStamp = timeStamp + tmpstring.substr(4, 2) + "-" + tmpstring.substr(2, 2) + "-" +
                        tmpstring.substr(0, 2);
        }
        else if (8 == len)
        {
            // The "time" specifies "minute-hour-day-month" with each being 2-digit.
            // For example: "51151508" means "August 15th, 15:51".
            // As the year is not specified, the current year will be used.
            string strYear;
            stringstream sstream;
            sstream << year;
            sstream >> strYear;
            sstream.clear();

            timeStamp = strYear + "-" + tmpstring.substr(6, 2) + "-" + tmpstring.substr(4, 2) + " " +
                        tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ":00.000";
        }
        else if (10 == len)
        {
            // The "time" specifies "minute-hour-day-month-year" with each being 2-digit.
            // For example: "5115150811" means "August 15th, 2011, 15:51".
            timeStamp = "20";
            timeStamp = timeStamp + tmpstring.substr(8, 2) + "-" + tmpstring.substr(6, 2) + "-" +
                        tmpstring.substr(4, 2) + " " + tmpstring.substr(2, 2) + ":" +
                        tmpstring.substr(0, 2) + ":00.000";
        }
        else if (12 == len)
        {
            // The "time" specifies "second-minute-hour-day-month-year" with each being 2-digit.
            // For example: "305115150811" means "August 15th, 2011, 15:51:30".
            timeStamp = "20";
            timeStamp = timeStamp + tmpstring.substr(10, 2) + "-" + tmpstring.substr(8, 2) + "-" +
                        tmpstring.substr(6, 2) + " " + tmpstring.substr(4, 2) + ":" +
                        tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ".000";
        }

        return timeStamp;
    }
Updated on 08/19/2011
This problem has finally been tracked down and fixed. In fact, the FormatTimeStamp() function has nothing to do with the root cause. The segfault is caused by a write overflow of a local char buffer.
This problem can be reproduced with the following simpler program (please ignore the bad naming of some variables for now):
(Compiled with "g++ -Wall -g main.cpp")
    #include <cstdio>   // sprintf
    #include <cstring>  // memset
    #include <string>
    #include <iostream>

    void overflow_it(char * A15, char * A15Result)
    {
        int m;
        int t = 0,i = 0;
        char temp[3];

        for (m = 0; m < 6; m++)
        {
            t = ((*A15 & 0xf0) >> 4) *10 ;
            t += *A15 & 0x0f;
            A15 ++;

            std::cout << "m = " << m << "; t = " << t << "; i = " << i << std::endl;

            memset(temp, 0, sizeof(temp));
            sprintf((char *)temp, "%02d", t); // The buggy code: temp is not big enough when t is a 3-digit integer.
            A15Result[i++] = temp[0];
            A15Result[i++] = temp[1];
        }
    }

    int main(int argc, char * argv[])
    {
        std::string str;

        {
            char tpTime[6] = {0};
            char A15Result[12] = {0};

            // Initialize tpTime
            for(int i = 0; i < 6; i++)
                tpTime[i] = char(154); // 154 would result in a 3-digit t in overflow_it().

            overflow_it(tpTime, A15Result);

            str.assign(A15Result);
        }

        std::cout << "str says: " << str << std::endl;
        return 0;
    }
Here are two facts we should remember before going on:
1). My machine is an Intel x86 machine, so it uses the Little Endian rule. Therefore, for a variable "m" of int type whose value is, say, 10, its memory layout might look like this:
    Starting addr:
    0xbf89bebc: m(byte#1): 10
    0xbf89bebd: m(byte#2): 0
    0xbf89bebe: m(byte#3): 0
    0xbf89bebf: m(byte#4): 0
2). The program above runs within the main thread. When it comes to the overflow_it() function, the variable layout in the thread stack looks like this (only the important variables are shown):
    0xbfc609e9 : temp[0]
    0xbfc609ea : temp[1]
    0xbfc609eb : temp[2]
    0xbfc609ec : m(byte#1) <-- Note that m follows temp immediately. m(byte#1) happens to be the byte temp[3].
    0xbfc609ed : m(byte#2)
    0xbfc609ee : m(byte#3)
    0xbfc609ef : m(byte#4)
    0xbfc609f0 : t
    ...(3 bytes)
    0xbfc609f4 : i
    ...(3 bytes)
    ...(etc. etc. etc...)
    0xbfc60a26 : A15Result <-- Data would be written to this buffer in overflow_it()
    ...(11 bytes)
    0xbfc60a32 : tpTime
    ...(5 bytes)
    0xbfc60a38 : str <-- Note the str takes up 4 bytes. Its starting address is **16 bytes** behind A15Result.
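The first fact (the Little Endian layout of an int) can be checked directly. The following is only a minimal sketch: the helper names bytes_of() and is_little_endian() are made up for illustration and are not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy the object representation of an int into a vector of bytes,
// in increasing address order (index 0 is the lowest address).
std::vector<unsigned char> bytes_of(int value)
{
    std::vector<unsigned char> bytes(sizeof(value));
    std::memcpy(bytes.data(), &value, sizeof(value));
    return bytes;
}

// True when the least significant byte sits at the lowest address,
// which is the Little Endian layout described above.
bool is_little_endian()
{
    const std::vector<unsigned char> bytes = bytes_of(1);
    assert(!bytes.empty());
    return bytes.front() == 1;
}
```

On an x86 machine bytes_of(10) should start with the byte 10 followed by zero bytes, which is why overwriting only the first byte of m is enough to reset a small counter.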
My analysis:
1). m is a counter in overflow_it() whose value is incremented by 1 on each iteration of the for loop and is never supposed to exceed 6. Thus its value fits completely in m(byte#1) (remember it's Little Endian), which happens to be temp[3].
2). In the buggy line: when t is a 3-digit integer, such as 109, the sprintf() call results in a buffer overflow, because serializing the number 109 to the string "109" actually requires 4 bytes: '1', '0', '9' and a terminating '\0'. Because temp[] is allocated with only 3 bytes, the final '\0' is written to temp[3], which is exactly m(byte#1), the byte that stores m's value. As a result, m's value is reset to 0 every time.
3). The programmer's expectation, however, is that the for loop in overflow_it() executes only 6 times, with m being incremented by 1 each time. Because m is always reset to 0, the actual number of iterations is far more than 6.
4). Let's look at the variable i in overflow_it(): every time the loop body is executed, i's value is incremented by 2, and A15Result[i] is accessed. However, if you compile and run this program, you'll see that i finally adds up to 24, which means overflow_it() writes data to the bytes ranging from A15Result[0] to A15Result[23]. Note that the object str is only 16 bytes behind A15Result[0], thus overflow_it() has "swept through" str and destroyed its correct memory layout.
5). I think the correct use of std::string, as it is a non-POD data structure, depends on the instantiated std::string object having a correct internal state. But in this program, str's internal layout has been forcibly changed from outside. This should be why the assign() method call finally causes a segfault.
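One way to avoid the overflow described in the analysis is to let snprintf() report how much room the text needs and to refuse to write past the buffer. This is only a sketch of that idea, not the fix that was applied to the real program; the function names formatted_length() and format_two_digits() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Number of characters "%02d" produces for t, not counting the final '\0'.
// With a null buffer and size 0, snprintf() only computes the length.
int formatted_length(int t)
{
    return std::snprintf(nullptr, 0, "%02d", t);
}

// Hypothetical replacement for the sprintf() call: writes at most out_size
// bytes (including the '\0') and reports whether the whole text fitted.
bool format_two_digits(int t, char* out, std::size_t out_size)
{
    assert(out != nullptr && out_size > 0);
    const int needed = std::snprintf(out, out_size, "%02d", t);
    return needed >= 0 && static_cast<std::size_t>(needed) < out_size;
}
```

With a 3-byte temp, format_two_digits(109, temp, sizeof(temp)) returns false and leaves the truncated but terminated text "10" in temp, so the byte after the buffer (the low byte of m in the layout above) is never touched.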
Update on 08/26/2011
In my previous update on 08/19/2011, I said that the segfault was caused by a method call on a local std::string object whose memory layout had been broken and thus became a "destroyed" object. This is not an "always" true story. Consider the C++ program below:
    //C++
    #include <iostream>
    #include <string>

    class A {
    public:
        void Hello(const std::string& name)
        {
            std::cout << "hello " << name;
        }
    };

    int main(int argc, char** argv)
    {
        A* pa = NULL; //!!
        pa->Hello("world");
        return 0;
    }
The Hello() call would succeed with this compiler, although calling a member function through a null or invalid pointer is formally undefined behavior in C++. It would succeed even if you assigned an obviously bad pointer to pa. The reason is: according to the C++ object model, the non-virtual methods of a class don't reside within the memory layout of the object. The C++ compiler turns the A::Hello() method into something like, say, A_Hello_xxx(A * const this, ...), which could be a global function. Thus, as long as you don't operate on the "this" pointer, things could go pretty well.
This fact shows that a "bad" object is NOT the root cause that results in the SIGSEGV segfault. The assign() method is not virtual in std::string, thus the "bad" std::string object wouldn't cause the segfault. There must be some other reason that finally caused the segfault.
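The point about non-virtual methods can be sketched by writing the "lowered" function by hand. A_Hello() below is a hypothetical stand-in for what a compiler generates, not real compiler output, and Hello() returns the text here instead of printing it so the two spellings can be compared. The sketch deliberately uses a valid object, because calling through a null pointer remains undefined behavior.

```cpp
#include <cassert>
#include <string>

class A {
public:
    // Never reads a data member, so it never dereferences "this".
    std::string Hello(const std::string& name) const
    {
        return "hello " + name;
    }
};

// Hypothetical free function: the object is passed as an explicit
// pointer argument, roughly as "this" is passed to A::Hello().
std::string A_Hello(const A* self, const std::string& name)
{
    (void)self; // unused, just like "this" inside A::Hello()
    return "hello " + name;
}

// Both spellings produce the same text when given a valid object.
void check_equivalence()
{
    A a;
    assert(a.Hello("world") == A_Hello(&a, "world"));
}
```

Since self is never dereferenced, nothing in the generated code touches the object's memory, which is why the null-pointer call above tends to appear to work.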
I noticed that the segfault comes from the __gnu_cxx::__exchange_and_add() function, so I then looked into its source code in this web page:
    static inline _Atomic_word
    __exchange_and_add(volatile _Atomic_word* __mem, int __val)
    { return __sync_fetch_and_add(__mem, __val); }
The __exchange_and_add() finally calls the __sync_fetch_and_add(). According to this web page, the __sync_fetch_and_add() is a GCC builtin function whose behavior is like this:
    type __sync_fetch_and_add (type *ptr, type value, ...)
    {
        tmp = *ptr;
        *ptr op= value; // Here the "op=" means "+=" as this function is "_and_add".
        return tmp;
    }
There it is! The passed-in ptr pointer is dereferenced here. In the 08/19/2011 program, ptr is derived from the contents of the "bad" std::string object inside the assign() method (most likely it points at the reference counter reached through the object's overwritten internal pointer). It is the dereference at this point that actually caused the SIGSEGV segmentation fault.
We could test this with the following program:
    #include <bits/atomicity.h>

    int main(int argc, char * argv[])
    {
        __sync_fetch_and_add((_Atomic_word *)0, 10); // Would result in a segfault.
        return 0;
    }
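To illustrate how an overwritten string object can lead to a bad pointer inside __sync_fetch_and_add(), here is a toy model of a reference-counted string. It is a loudly simplified assumption, not the libstdc++ implementation: ToyRep, ToyString, toy_create() and toy_release() are invented names, and the real layout and counting convention may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Toy header stored immediately before the characters (NOT the libstdc++ type).
struct ToyRep {
    int refcount;
};

// Toy handle: like a reference-counted string, it holds a single pointer.
struct ToyString {
    char* data;

    // The header address is computed from data, so a corrupted data
    // pointer yields a wild header pointer.
    ToyRep* rep() const { return reinterpret_cast<ToyRep*>(data) - 1; }
};

// Allocates header and characters in one block, with one owner.
ToyString toy_create(const char* text)
{
    assert(text != nullptr);
    const std::size_t len = std::strlen(text);
    void* raw = std::malloc(sizeof(ToyRep) + len + 1);
    assert(raw != nullptr);
    ToyRep* rep = new (raw) ToyRep();
    rep->refcount = 1;
    ToyString s;
    s.data = reinterpret_cast<char*>(rep + 1);
    std::memcpy(s.data, text, len + 1);
    return s;
}

// Drops one reference and frees the block when the last owner lets go.
// Returns the count as it was before the decrement.
int toy_release(ToyString& s)
{
    ToyRep* rep = s.rep(); // derived from s.data
    // The counter is dereferenced inside the builtin; if s.data had been
    // overwritten, this is where the fault would surface.
    const int before = __sync_fetch_and_add(&rep->refcount, -1);
    if (before == 1) {
        std::free(rep);
        s.data = nullptr;
    }
    return before;
}
```

In this model the handle itself lives at a perfectly valid address; only the pointer stored inside it matters. That matches the observation that the crash shows up in the atomic helper rather than at the call to assign().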
2 Answers
There are two likely possibilities:
- some code before line 798 has corrupted the local tmpTimeStamp object
- the return value from FormatTimeStamp() was somehow bad.
The _GLIBCXX_FULLY_DYNAMIC_STRING is most likely a red herring and has nothing to do with the problem.
If you install the debuginfo package for libstdc++ (I don't know what it's called on CentOS), you'll be able to "see into" that code, and might be able to tell whether the left-hand-side (LHS) or the RHS of the assignment operator caused the problem.
If that's not possible, you'll have to debug this at the assembly level. Going into frame #2 and doing x/4x $ebp should give you previous ebp, caller address (0x081402fc), LHS (should match &tmpTimeStamp in frame #3), and RHS. Go from there, and good luck!
print *(std::string *)0x7fff74320 or some such. – Employed Russian Aug 15 '11 at 16:11

I guess there could be some problem inside the FormatTimeStamp function, but without source code it's hard to say anything. Try to check your program under Valgrind. Usually this helps to fix such sort of bugs.
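When a tool such as Valgrind is not at hand, a small write overrun like the one in this question can also be caught with guard bytes placed right after the buffer. The sketch below is a hypothetical debugging aid, not something used in the original program; GuardedBuffer and two_digit_write_is_safe() are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// N usable bytes followed by Guard marker bytes, all in one array, so a
// small overrun lands in the markers instead of a neighbouring variable.
template <std::size_t N, std::size_t Guard = 16>
class GuardedBuffer {
public:
    GuardedBuffer()
    {
        std::memset(storage_, kMarker, sizeof(storage_));
        std::memset(storage_, 0, N);
    }

    char* data() { return reinterpret_cast<char*>(storage_); }
    std::size_t size() const { return N; }

    // False as soon as any marker byte after the usable area has changed.
    bool intact() const
    {
        for (std::size_t k = N; k < N + Guard; ++k) {
            if (storage_[k] != kMarker) return false;
        }
        return true;
    }

private:
    enum { kMarker = 0xAB };
    unsigned char storage_[N + Guard];
};

// Replays the buggy sprintf() call against a guarded 3-byte buffer.
bool two_digit_write_is_safe(int t)
{
    assert(t >= 0);
    GuardedBuffer<3> temp;
    std::sprintf(temp.data(), "%02d", t);
    return temp.intact();
}
```

A 2-digit value leaves the markers untouched, while a 3-digit value writes its terminating '\0' into the first marker byte, so the overrun is reported instead of silently resetting a neighbouring counter.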