Reading the C Standard Library Source (VC9.0 Version): ctype.h

Source: Internet. Editor: 程序博客网. Time: 2024/05/17 07:43

The ANSI C document (C89, http://flash-gordon.me.uk/ansi.c.txt) says:

4.3 CHARACTER HANDLING <ctype.h>  

The header <ctype.h> declares several functions useful for testing and mapping characters./89/ In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

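This means that every function declared in ctype.h takes its argument as an int, and that the valid values are those representable as an unsigned char plus the EOF macro (usually implemented as -1). For any other int value the behavior is undefined.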
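The rule above matters in practice because plain char may be signed, so a byte above 0x7F can reach isalpha as a negative int. A minimal sketch of the usual precaution follows; count_alpha is a hypothetical helper name made up for this illustration, not part of any library.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the alphabetic characters in a NUL-terminated string.
   count_alpha is a hypothetical helper used only for this sketch. */
size_t count_alpha(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* plain char may be signed; converting to unsigned char first
           keeps the value passed to isalpha inside 0..UCHAR_MAX */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

Calling count_alpha("ab1c!") counts the three letters and skips the digit and the punctuation mark.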

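The C standard library specifies 13 character-handling functions, which fall into two groups: character testing functions and character case mapping functions.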

The table below covers the C99 character testing functions. There are 12 of them, one more than in the original standard: isblank. See http://www.open-std.org/JTC1/SC22/WG14/www/docs/C99RationaleV5.10.pdf, p. 118:

7.4.1.3 The isblank function

A new feature of C99: text processing applications often need to distinguish white space that can occur within lines from white space that separates lines (for example, see §6.10 regarding use of whitespace in the preprocessor). This distinction is also a property of POSIX locale definition files.

 

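(x: the function returns true for these characters; -: it returns false)

ASCII values  characters                                        iscntrl  isblank  isspace  isupper  islower  isalpha  isdigit  isxdigit isalnum  ispunct  isgraph  isprint
0x00 .. 0x08  NUL, (other control codes)                        x        -        -        -        -        -        -        -        -        -        -        -
0x09          tab ('\t')                                        x        x        x        -        -        -        -        -        -        -        -        -
0x0A .. 0x0D  (white-space control codes: '\f','\v','\n','\r')  x        -        x        -        -        -        -        -        -        -        -        -
0x0E .. 0x1F  (other control codes)                             x        -        -        -        -        -        -        -        -        -        -        -
0x20          space (' ')                                       -        x        x        -        -        -        -        -        -        -        -        x
0x21 .. 0x2F  !"#$%&'()*+,-./                                   -        -        -        -        -        -        -        -        -        x        x        x
0x30 .. 0x39  0123456789                                        -        -        -        -        -        -        x        x        x        -        x        x
0x3A .. 0x40  :;<=>?@                                           -        -        -        -        -        -        -        -        -        x        x        x
0x41 .. 0x46  ABCDEF                                            -        -        -        x        -        x        -        x        x        -        x        x
0x47 .. 0x5A  GHIJKLMNOPQRSTUVWXYZ                              -        -        -        x        -        x        -        -        x        -        x        x
0x5B .. 0x60  [\]^_`                                            -        -        -        -        -        -        -        -        -        x        x        x
0x61 .. 0x66  abcdef                                            -        -        -        -        x        x        -        x        x        -        x        x
0x67 .. 0x7A  ghijklmnopqrstuvwxyz                              -        -        -        -        x        x        -        -        x        -        x        x
0x7B .. 0x7E  {|}~                                              -        -        -        -        -        -        -        -        -        x        x        x
0x7F          (DEL)                                             x        -        -        -        -        -        -        -        -        -        -        -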

Table source: http://www.cplusplus.com/reference/cctype/
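The distinction the Rationale draws between white space inside a line and white space that separates lines can be seen in a small sketch. skip_blanks is a hypothetical helper name, and the code assumes a C99 library that provides isblank.

```c
#include <ctype.h>

/* skip_blanks is a hypothetical helper: it advances past spaces and tabs
   but stops at a line terminator, which isspace would also have skipped. */
const char *skip_blanks(const char *s)
{
    while (isblank((unsigned char)*s))
        s++;
    return s;
}
```

Given " \t\nx", the returned pointer stops at the '\n', because new-line is white space for isspace but not for isblank.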

The English text below describes the requirements for the character testing functions, quoted from the C89 standard.

4.3.1 Character testing functions  

The functions in this section return nonzero (true) if and only if the value of the argument c conforms to that in the description of the function.

4.3.1.1 The isalnum function

Synopsis

#include <ctype.h>

int isalnum(int c);

Description

The isalnum function tests for any character for which isalpha or isdigit is true.

 

4.3.1.2 The isalpha function

Synopsis

#include <ctype.h>

int isalpha(int c);

Description

The isalpha function tests for any character for which isupper or islower is true, or any of an implementation-defined set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the C locale, isalpha returns true only for the characters for which isupper or islower is true.

 

4.3.1.3 The iscntrl function

Synopsis

#include <ctype.h>

int iscntrl(int c);

Description

The iscntrl function tests for any control character.

 

4.3.1.4 The isdigit function

Synopsis

#include <ctype.h>

int isdigit(int c);

Description

The isdigit function tests for any decimal-digit character (as defined in $2.2.1).

 

4.3.1.5 The isgraph function

Synopsis

#include <ctype.h>

int isgraph(int c);

Description

The isgraph function tests for any printing character except space (' ').

 

4.3.1.6 The islower function

Synopsis

#include <ctype.h>

int islower(int c);

Description

The islower function tests for any lower-case letter or any of an implementation-defined set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the C locale, islower returns true only for the characters defined as lower-case letters (as defined in $2.2.1).

 

4.3.1.7 The isprint function

Synopsis

#include <ctype.h>

int isprint(int c);

Description

The isprint function tests for any printing character including space (' ').

 

4.3.1.8 The ispunct function

Synopsis

#include <ctype.h>

int ispunct(int c);

Description

The ispunct function tests for any printing character except space (' ') or a character for which isalnum is true.

 

4.3.1.9 The isspace function

Synopsis

#include <ctype.h>

int isspace(int c);

Description

The isspace function tests for the standard white-space characters or for any of an implementation-defined set of characters for which isalnum is false. The standard white-space characters are the following: space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). In the C locale, isspace returns true only for the standard white-space characters.

 

4.3.1.10 The isupper function

Synopsis

#include <ctype.h>

int isupper(int c);

Description

The isupper function tests for any upper-case letter or any of an implementation-defined set of characters for which none of iscntrl, isdigit, ispunct, or isspace is true. In the C locale, isupper returns true only for the characters defined as upper-case letters (as defined in $2.2.1).

 

4.3.1.11 The isxdigit function

Synopsis

#include <ctype.h>

int isxdigit(int c);

Description

The isxdigit function tests for any hexadecimal-digit character (as defined in $3.1.3.2).
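Several of the descriptions above are defined in terms of one another (isalnum from isalpha and isdigit, ispunct and isgraph from isprint). The following sketch checks those relationships against whatever library it is compiled with. check_ctype_relations is a hypothetical name, and the loop bound of 255 assumes an 8-bit unsigned char.

```c
#include <ctype.h>

/* Returns 1 if the relationships quoted from the standard hold for every
   value in 0..255 in the current locale, 0 otherwise.
   check_ctype_relations is a hypothetical name for this sketch. */
int check_ctype_relations(void)
{
    int c;
    for (c = 0; c <= 255; ++c) {
        int alnum = isalnum(c) != 0;
        int punct = ispunct(c) != 0;
        int graph = isgraph(c) != 0;
        int print = isprint(c) != 0;

        /* isalnum: any character for which isalpha or isdigit is true */
        if (alnum != (isalpha(c) || isdigit(c)))
            return 0;
        /* isgraph: any printing character except space */
        if (graph != (print && c != ' '))
            return 0;
        /* ispunct: printing, not space, not alphanumeric */
        if (punct != (print && c != ' ' && !alnum))
            return 0;
    }
    return 1;
}
```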

 

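I originally assumed that isalpha() would decide whether a character is an English letter with code along the lines of if(c>='A' && c<='Z'), but after reading the source I found that it is a table lookup. There is an array with one entry for every control and printable character; when the per-thread data is initialized, a pointer to it ends up in the thread-local data, and each of the functions above classifies a character by looking it up in that table. Below I first show the table (array) and explore how it is designed, then list some of the Windows API principles and system knowledge I picked up while tracing.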

VC/crt/src/ctype.c contains the following definition:

const unsigned short __newctype[384] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0,
        0,                      /* -1 EOF   */
        _CONTROL,               /* 00 (NUL) */
        _CONTROL,               /* 01 (SOH) */
        _CONTROL,               /* 02 (STX) */
        _CONTROL,               /* 03 (ETX) */
        _CONTROL,               /* 04 (EOT) */
        _CONTROL,               /* 05 (ENQ) */
        _CONTROL,               /* 06 (ACK) */
        _CONTROL,               /* 07 (BEL) */
        _CONTROL,               /* 08 (BS)  */
        _SPACE+_CONTROL,        /* 09 (HT)  */
        _SPACE+_CONTROL,        /* 0A (LF)  */
        _SPACE+_CONTROL,        /* 0B (VT)  */
        _SPACE+_CONTROL,        /* 0C (FF)  */
        _SPACE+_CONTROL,        /* 0D (CR)  */
        _CONTROL,               /* 0E (SI)  */
        _CONTROL,               /* 0F (SO)  */
        _CONTROL,               /* 10 (DLE) */
        _CONTROL,               /* 11 (DC1) */
        _CONTROL,               /* 12 (DC2) */
        _CONTROL,               /* 13 (DC3) */
        _CONTROL,               /* 14 (DC4) */
        _CONTROL,               /* 15 (NAK) */
        _CONTROL,               /* 16 (SYN) */
        _CONTROL,               /* 17 (ETB) */
        _CONTROL,               /* 18 (CAN) */
        _CONTROL,               /* 19 (EM)  */
        _CONTROL,               /* 1A (SUB) */
        _CONTROL,               /* 1B (ESC) */
        _CONTROL,               /* 1C (FS)  */
        _CONTROL,               /* 1D (GS)  */
        _CONTROL,               /* 1E (RS)  */
        _CONTROL,               /* 1F (US)  */
        _SPACE+_BLANK,          /* 20 SPACE */
        _PUNCT,                 /* 21 !     */
        _PUNCT,                 /* 22 "     */
        _PUNCT,                 /* 23 #     */
        _PUNCT,                 /* 24 $     */
        _PUNCT,                 /* 25 %     */
        _PUNCT,                 /* 26 &     */
        _PUNCT,                 /* 27 '     */
        _PUNCT,                 /* 28 (     */
        _PUNCT,                 /* 29 )     */
        _PUNCT,                 /* 2A *     */
        _PUNCT,                 /* 2B +     */
        _PUNCT,                 /* 2C ,     */
        _PUNCT,                 /* 2D -     */
        _PUNCT,                 /* 2E .     */
        _PUNCT,                 /* 2F /     */
        _DIGIT+_HEX,            /* 30 0     */
        _DIGIT+_HEX,            /* 31 1     */
        _DIGIT+_HEX,            /* 32 2     */
        _DIGIT+_HEX,            /* 33 3     */
        _DIGIT+_HEX,            /* 34 4     */
        _DIGIT+_HEX,            /* 35 5     */
        _DIGIT+_HEX,            /* 36 6     */
        _DIGIT+_HEX,            /* 37 7     */
        _DIGIT+_HEX,            /* 38 8     */
        _DIGIT+_HEX,            /* 39 9     */
        _PUNCT,                 /* 3A :     */
        _PUNCT,                 /* 3B ;     */
        _PUNCT,                 /* 3C <     */
        _PUNCT,                 /* 3D =     */
        _PUNCT,                 /* 3E >     */
        _PUNCT,                 /* 3F ?     */
        _PUNCT,                 /* 40 @     */
        _UPPER+_HEX,            /* 41 A     */
        _UPPER+_HEX,            /* 42 B     */
        _UPPER+_HEX,            /* 43 C     */
        _UPPER+_HEX,            /* 44 D     */
        _UPPER+_HEX,            /* 45 E     */
        _UPPER+_HEX,            /* 46 F     */
        _UPPER,                 /* 47 G     */
        _UPPER,                 /* 48 H     */
        _UPPER,                 /* 49 I     */
        _UPPER,                 /* 4A J     */
        _UPPER,                 /* 4B K     */
        _UPPER,                 /* 4C L     */
        _UPPER,                 /* 4D M     */
        _UPPER,                 /* 4E N     */
        _UPPER,                 /* 4F O     */
        _UPPER,                 /* 50 P     */
        _UPPER,                 /* 51 Q     */
        _UPPER,                 /* 52 R     */
        _UPPER,                 /* 53 S     */
        _UPPER,                 /* 54 T     */
        _UPPER,                 /* 55 U     */
        _UPPER,                 /* 56 V     */
        _UPPER,                 /* 57 W     */
        _UPPER,                 /* 58 X     */
        _UPPER,                 /* 59 Y     */
        _UPPER,                 /* 5A Z     */
        _PUNCT,                 /* 5B [     */
        _PUNCT,                 /* 5C \     */
        _PUNCT,                 /* 5D ]     */
        _PUNCT,                 /* 5E ^     */
        _PUNCT,                 /* 5F _     */
        _PUNCT,                 /* 60 `     */
        _LOWER+_HEX,            /* 61 a     */
        _LOWER+_HEX,            /* 62 b     */
        _LOWER+_HEX,            /* 63 c     */
        _LOWER+_HEX,            /* 64 d     */
        _LOWER+_HEX,            /* 65 e     */
        _LOWER+_HEX,            /* 66 f     */
        _LOWER,                 /* 67 g     */
        _LOWER,                 /* 68 h     */
        _LOWER,                 /* 69 i     */
        _LOWER,                 /* 6A j     */
        _LOWER,                 /* 6B k     */
        _LOWER,                 /* 6C l     */
        _LOWER,                 /* 6D m     */
        _LOWER,                 /* 6E n     */
        _LOWER,                 /* 6F o     */
        _LOWER,                 /* 70 p     */
        _LOWER,                 /* 71 q     */
        _LOWER,                 /* 72 r     */
        _LOWER,                 /* 73 s     */
        _LOWER,                 /* 74 t     */
        _LOWER,                 /* 75 u     */
        _LOWER,                 /* 76 v     */
        _LOWER,                 /* 77 w     */
        _LOWER,                 /* 78 x     */
        _LOWER,                 /* 79 y     */
        _LOWER,                 /* 7A z     */
        _PUNCT,                 /* 7B {     */
        _PUNCT,                 /* 7C |     */
        _PUNCT,                 /* 7D }     */
        _PUNCT,                 /* 7E ~     */
        _CONTROL,               /* 7F (DEL) */
        /* and the rest are 0... */
};

The macros are needed as well (from VC/include/ctype.h):

/* set bit masks for the possible character types */

#define _UPPER          0x1     /* upper case letter */
#define _LOWER          0x2     /* lower case letter */
#define _DIGIT          0x4     /* digit[0-9] */
#define _SPACE          0x8     /* tab, carriage return, newline, */
                                /* vertical tab or form feed */
#define _PUNCT          0x10    /* punctuation character */
#define _CONTROL        0x20    /* control character */
#define _BLANK          0x40    /* space char */
#define _HEX            0x80    /* hexadecimal digit */

#define _LEADBYTE       0x8000                  /* multibyte leadbyte */
#define _ALPHA          (0x0100|_UPPER|_LOWER)  /* alphabetic character */

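So how is it used? The first 128 elements of __newctype[] are all 0 (the last of them, at index 127, is the entry for EOF, i.e. -1), and the elements after that are annotated in the comments with the ASCII code they correspond to. In other words, __newctype+128 points at the element for the character '\0'. The VC implementation really does use the form __newctype+128, so index -1 from that pointer, (__newctype+128)[-1], is the entry for EOF. The _UPPER, _LOWER and other macros defined above clearly assign one bit per property, so to decide whether a character c is an English letter we only need to evaluate __newctype[128+(int)c] & _ALPHA, where _ALPHA is (0x0100|_UPPER|_LOWER). Note that the extra bit is 0x0100 and not 128: 128 (0x80) is _HEX, and mixing it into the mask would make the digits test as letters. The whole check is a single bitwise AND, and that is all there is to the table lookup. By the same reasoning the remaining test functions hardly need to be traced one by one, ha!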
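To make the lookup concrete, here is a reduced sketch of the same idea. Every name in it (MY_UPPER, my_table, my_isalpha and so on) is made up for the illustration, the table is filled at run time rather than with a static initializer as in the CRT, and it assumes an ASCII execution character set.

```c
/* Hypothetical bit masks, mirroring the layout of the CRT's macros. */
#define MY_UPPER 0x01
#define MY_LOWER 0x02
#define MY_DIGIT 0x04
#define MY_HEX   0x80
#define MY_ALPHA (0x0100 | MY_UPPER | MY_LOWER)

#define TABLE_OFFSET 128   /* slot of character code 0, as in __newctype+128 */

static unsigned short my_table[384];   /* zero-initialized */
static int my_table_ready = 0;

/* Fills in only the letter and digit entries; everything else stays 0. */
static void my_table_init(void)
{
    int c;
    for (c = '0'; c <= '9'; ++c) my_table[TABLE_OFFSET + c] = MY_DIGIT | MY_HEX;
    for (c = 'A'; c <= 'Z'; ++c) my_table[TABLE_OFFSET + c] = MY_UPPER;
    for (c = 'a'; c <= 'z'; ++c) my_table[TABLE_OFFSET + c] = MY_LOWER;
    for (c = 'A'; c <= 'F'; ++c) my_table[TABLE_OFFSET + c] |= MY_HEX;
    for (c = 'a'; c <= 'f'; ++c) my_table[TABLE_OFFSET + c] |= MY_HEX;
    my_table_ready = 1;
}

/* Both tests are one table read and one bitwise AND; c must be in -128..255. */
int my_isalpha(int c)
{
    if (!my_table_ready) my_table_init();
    return my_table[TABLE_OFFSET + c] & MY_ALPHA;
}

int my_isxdigit(int c)
{
    if (!my_table_ready) my_table_init();
    return my_table[TABLE_OFFSET + c] & MY_HEX;
}
```

With this table my_isalpha('5') is 0 while my_isxdigit('5') is nonzero, which is exactly why the mask for letters must not include the 0x80 bit.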

So where does this data live, and how did I trace my way to it? I learned a few new things along the way, which I go through step by step below.

Write the simplest possible program:

#include <ctype.h>

int main(void)
{
    int i = 64;
    int kk = isalpha(i);
    return 0;
}

This is the call stack I traced:

 msvcr90d.dll!__set_flsgetvalue()  Line 256    C
 msvcr90d.dll!_getptd_noexit()  Line 578 + 0xb bytes    C
 msvcr90d.dll!_getptd()  Line 641 + 0x5 bytes    C
 msvcr90d.dll!_LocaleUpdate::_LocaleUpdate(localeinfo_struct * plocinfo=0x00000000)  Line 264 + 0x5 bytes    C++
>msvcr90d.dll!_chvalidator_l(localeinfo_struct * plocinfo=0x00000000, int c=0x00000040, int mask=0x00000103)  Line 68    C++
 msvcr90d.dll!_chvalidator(int c=0x00000040, int mask=0x00000103)  Line 57 + 0xf bytes    C++
 msvcr90d.dll!isalpha(int c=0x00000040)  Line 69 + 0xe bytes    C++
>ConsoleApp.exe!main()  Line 11 + 0xc bytes    C

The first level (recall #define _ALPHA (0x0100|_UPPER|_LOWER) /* alphabetic character */ from the mask definitions above):

extern __inline int (__cdecl isalpha) (
        int c
        )
{
    if (__locale_changed == 0)
    {
        return __fast_ch_check(c, _ALPHA);
    }
    else
    {
        return (_isalpha_l)(c, NULL);
    }
}

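We go into __fast_ch_check. (I could not find anything that documents the meaning of __locale_changed, but I read the implementation of _isalpha_l and confirmed that it ends up in the same place as __fast_ch_check, so it can be ignored for this analysis.) __fast_ch_check is a macro: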

#ifdef _DEBUG
#define __fast_ch_check(a,b)       _chvalidator(a,b)
#else  /* _DEBUG */
#define __fast_ch_check(a,b)       (__initiallocinfo.pctype[(a)] & (b))
#endif  /* _DEBUG */

When debugging, the _DEBUG version is the one in use:

#if defined (_DEBUG)
extern "C" int __cdecl _chvalidator(
        int c,
        int mask
        )
{
        _ASSERTE((unsigned)(c + 1) <= 256);
        return _chvalidator_l(NULL, c, mask);
}

extern "C" int __cdecl _chvalidator_l(
        _locale_t plocinfo,
        int c,
        int mask
        )
{
    _LocaleUpdate _loc_update(plocinfo);

    _ASSERTE((unsigned)(c + 1) <= 256);
    if (c >= -1 && c <= 255)
    {
        return (_loc_update.GetLocaleT()->locinfo->pctype[c] & mask);
    }
    else
    {
        return (_loc_update.GetLocaleT()->locinfo->pctype[-1] & mask);
    }
}
#endif  /* defined (_DEBUG) */

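At this point mask is 0x103 (_ALPHA) and c is our input 0x40 (64). Note that this code is C++: it calls the member function GetLocaleT() of a class. If the input lies in the range -1 to 255, the return value is pctype[c] ANDed with mask; otherwise the EOF entry pctype[-1] is used instead.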
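The expression inside _ASSERTE checks both ends of the range -1..255 with a single comparison. The sketch below shows the idea; in_ctype_range is a hypothetical name, and the addition is done in unsigned arithmetic (a small variation on the CRT's expression) so that it stays well defined even for INT_MAX.

```c
/* in_ctype_range is a hypothetical helper name.
   Returns 1 when c is in -1..255, using one unsigned comparison. */
int in_ctype_range(int c)
{
    /* Adding 1 maps -1..255 onto 0..256. Any c below -1 wraps around to a
       very large unsigned value, so one comparison rejects both c < -1
       and c > 255. */
    return (unsigned)c + 1u <= 256u;
}
```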

The questions now are: what does the _LocaleUpdate class do, and what does the pctype array contain?


 

#ifdef __cplusplus
class _LocaleUpdate
{
    _locale_tstruct localeinfo;
    _ptiddata ptd;
    bool updated;
    public:
    _LocaleUpdate(_locale_t plocinfo)
        : updated(false)
    {
        if (plocinfo == NULL)
        {
            ptd = _getptd();
            localeinfo.locinfo = ptd->ptlocinfo;
            localeinfo.mbcinfo = ptd->ptmbcinfo;

            __UPDATE_LOCALE(ptd, localeinfo.locinfo);
            __UPDATE_MBCP(ptd, localeinfo.mbcinfo);
            if (!(ptd->_ownlocale & _PER_THREAD_LOCALE_BIT))
            {
                ptd->_ownlocale |= _PER_THREAD_LOCALE_BIT;
                updated = true;
            }
        }
        else
        {
            localeinfo=*plocinfo;
        }
    }
    ~_LocaleUpdate()
    {
        if (updated)
            ptd->_ownlocale = ptd->_ownlocale & ~_PER_THREAD_LOCALE_BIT;
    }
    _locale_t GetLocaleT()
    {
        return &localeinfo;
    }
};
#endif  /* __cplusplus */

It is clear that GetLocaleT returns &localeinfo, and that the pctype member of the locinfo structure inside localeinfo is the data we are looking for. That naturally leads to _getptd() (in tidtable.c):

_ptiddata __cdecl _getptd (
        void
        )
{
        _ptiddata ptd = _getptd_noexit();
        if (!ptd) {
            _amsg_exit(_RT_THREAD); /* write message and die */
        }
        return ptd;
}

_ptiddata __cdecl _getptd_noexit (
        void
        )
{
    _ptiddata ptd;
    DWORD   TL_LastError;

    TL_LastError = GetLastError();

#ifdef _M_IX86
    /*
     * Initialize FlsGetValue function pointer in TLS by calling __set_flsgetvalue()
     */
    if ( (ptd = (__set_flsgetvalue())(__flsindex)) == NULL ) {
#else  /* _M_IX86 */
    if ( (ptd = FLS_GETVALUE(__flsindex)) == NULL ) {
#endif  /* _M_IX86 */
        /*
         * no per-thread data structure for this thread. try to create
         * one.
         */
#ifdef _DEBUG
        extern void * __cdecl _calloc_dbg_impl(size_t, size_t, int, const char *, int, int *);
        if ((ptd = _calloc_dbg_impl(1, sizeof(struct _tiddata), _CRT_BLOCK, __FILE__, __LINE__, NULL)) != NULL) {
#else  /* _DEBUG */
        if ((ptd = _calloc_crt(1, sizeof(struct _tiddata))) != NULL) {
#endif  /* _DEBUG */
            if (FLS_SETVALUE(__flsindex, (LPVOID)ptd) ) {
                /*
                 * Initialize of per-thread data
                 */
                _initptd(ptd,NULL);

                ptd->_tid = GetCurrentThreadId();
                ptd->_thandle = (uintptr_t)(-1);
            }
            else {
                /*
                 * Return NULL to indicate failure
                 */
                _free_crt(ptd);
                ptd = NULL;
            }
        }
    }

    SetLastError(TL_LastError);

    return(ptd);
}

The only thing to note here is the __set_flsgetvalue function:

_CRTIMP PFLS_GETVALUE_FUNCTION __cdecl __set_flsgetvalue()
{
#ifdef _M_IX86
    PFLS_GETVALUE_FUNCTION flsGetValue = FLS_GETVALUE;
    if (!flsGetValue)
    {
        flsGetValue = _decode_pointer(gpFlsGetValue);
        TlsSetValue(__getvalueindex, flsGetValue);
    }
    return flsGetValue;
#else  /* _M_IX86 */
    return NULL;
#endif  /* _M_IX86 */
}

It relies on this macro definition:

#define FLS_GETVALUE    ((PFLS_GETVALUE_FUNCTION)TlsGetValue(__getvalueindex))

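With that we have reached the Windows API level: TlsGetValue, and the trace cannot go any deeper. __getvalueindex is a global variable (its value was 1 in my run). __set_flsgetvalue returns the function pointer that _mtinit (shown further down) stored in that TLS slot: FlsGetValue, or TlsGetValue when the fiber-local storage API is not available. So (__set_flsgetvalue())(__flsindex) amounts to FlsGetValue(__flsindex), or to TlsGetValue(__flsindex) in the fallback case.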

So what exactly does TlsGetValue do to hand back a structure that contains pctype? MSDN is the place to ask:

      http://msdn.microsoft.com/en-us/library/windows/desktop/ms686812(v=vs.85).aspx

      http://msdn.microsoft.com/en-us/library/windows/desktop/ms686749(v=vs.85).aspx


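Well, this touches on how Windows manages per-thread data. Put simply, a running thread needs a data area of its own; an array of pointers holds the pointers to that data, and each one is accessed through a TLS index. Coming back to our case, the data this thread uses is obtained with TlsGetValue(tlsindex), but where is it set? According to MSDN, it has to be set with TlsSetValue.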
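The "look up this thread's block, create it on first use" pattern of _getptd_noexit can be sketched in portable C11, with a _Thread_local pointer standing in for the TLS/FLS slot. This is only an analogy under that assumption, not the CRT's code: struct my_tiddata, my_slot and my_getptd_noexit are hypothetical names, and the block is never freed here (the CRT registers _freefls for that).

```c
#include <stdlib.h>

/* Hypothetical, much-reduced stand-in for the CRT's struct _tiddata. */
struct my_tiddata {
    unsigned long holdrand;          /* per-thread seed, as one example field */
    const unsigned short *pctype;    /* would point at the classification table */
};

/* One pointer per thread; plays the role of the slot that the CRT reads
   with FLS_GETVALUE(__flsindex). */
static _Thread_local struct my_tiddata *my_slot = NULL;

/* Analogue of _getptd_noexit: return this thread's block, creating it on
   first use. Returns NULL if the allocation fails. */
struct my_tiddata *my_getptd_noexit(void)
{
    if (my_slot == NULL) {
        struct my_tiddata *ptd = calloc(1, sizeof *ptd);
        if (ptd != NULL) {
            ptd->holdrand = 1UL;     /* mirrors ptd->_holdrand = 1L */
            my_slot = ptd;           /* mirrors FLS_SETVALUE(__flsindex, ptd) */
        }
    }
    return my_slot;
}
```

Calling my_getptd_noexit() twice on the same thread returns the same pointer; a second thread would get its own block.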

Also in tidtable.c there is the following initialization function (I found it by experience, then confirmed the guess with a breakpoint):

int __cdecl _mtinit (
        void
        )
{
        _ptiddata ptd;

#ifdef _M_IX86
        /*
         * Initialize fiber local storage function pointers.
         */

        HINSTANCE hKernel32 = _crt_wait_module_handle(_KERNEL32);
        if (hKernel32 == NULL) {
            _mtterm();
            return FALSE;       /* fail to load DLL */
        }

        gpFlsAlloc = (PFLS_ALLOC_FUNCTION)GetProcAddress(hKernel32,
                                                            "FlsAlloc");

        gpFlsGetValue = (PFLS_GETVALUE_FUNCTION)GetProcAddress(hKernel32,
                                                                "FlsGetValue");

        gpFlsSetValue = (PFLS_SETVALUE_FUNCTION)GetProcAddress(hKernel32,
                                                                "FlsSetValue");

        gpFlsFree = (PFLS_FREE_FUNCTION)GetProcAddress(hKernel32,
                                                        "FlsFree");
        if (!gpFlsAlloc || !gpFlsGetValue || !gpFlsSetValue || !gpFlsFree) {
            gpFlsAlloc = (PFLS_ALLOC_FUNCTION)__crtTlsAlloc;

            gpFlsGetValue = (PFLS_GETVALUE_FUNCTION)TlsGetValue;

            gpFlsSetValue = (PFLS_SETVALUE_FUNCTION)TlsSetValue;

            gpFlsFree = (PFLS_FREE_FUNCTION)TlsFree;
        }

        /*
         * Allocate and initialize a TLS index to store FlsGetValue pointer
         * so that the FLS_* macros can work transparently
         */

        if ( (__getvalueindex = TlsAlloc()) == TLS_OUT_OF_INDEXES ||
             !TlsSetValue(__getvalueindex, (LPVOID)gpFlsGetValue) ) {
            return FALSE;
        }
#endif  /* _M_IX86 */

        _init_pointers();       /* initialize global function pointers */

#ifdef _M_IX86
        /*
         * Encode the fiber local storage function pointers
         */

        gpFlsAlloc = (PFLS_ALLOC_FUNCTION) _encode_pointer(gpFlsAlloc);
        gpFlsGetValue = (PFLS_GETVALUE_FUNCTION) _encode_pointer(gpFlsGetValue);
        gpFlsSetValue = (PFLS_SETVALUE_FUNCTION) _encode_pointer(gpFlsSetValue);
        gpFlsFree = (PFLS_FREE_FUNCTION) _encode_pointer(gpFlsFree);
#endif  /* _M_IX86 */

        /*
         * Initialize the mthread lock data base
         */

        if ( !_mtinitlocks() ) {
            _mtterm();
            return FALSE;       /* fail to load DLL */
        }

        /*
         * Allocate a TLS index to maintain pointers to per-thread data
         */
        if ( (__flsindex = FLS_ALLOC(&_freefls)) == FLS_OUT_OF_INDEXES ) {
            _mtterm();
            return FALSE;       /* fail to load DLL */
        }

        /*
         * Create a per-thread data structure for this (i.e., the startup)
         * thread.
         */
        if ( ((ptd = _calloc_crt(1, sizeof(struct _tiddata))) == NULL) ||
             !FLS_SETVALUE(__flsindex, (LPVOID)ptd) )
        {
            _mtterm();
            return FALSE;       /* fail to load DLL */
        }

        /*
         * Initialize the per-thread data
         */

        _initptd(ptd,NULL);

        ptd->_tid = GetCurrentThreadId();
        ptd->_thandle = (uintptr_t)(-1);

        return TRUE;
}

The key point is _initptd():

_CRTIMP void __cdecl _initptd (
        _ptiddata ptd,
        pthreadlocinfo ptloci
        )
{
#ifdef _M_IX86
    HINSTANCE hKernel32 = _crt_wait_module_handle(_KERNEL32);
#endif  /* _M_IX86 */

    ptd->_pxcptacttab = (void *)_XcptActTab;
    ptd->_holdrand = 1L;

#ifdef _M_IX86
    if (hKernel32 != NULL)
    {
        // Initialize the function pointers in the ptd data
        ptd->_encode_ptr = GetProcAddress(hKernel32, _ENCODE_POINTER);
        ptd->_decode_ptr = GetProcAddress(hKernel32, _DECODE_POINTER);
    }
#endif  /* _M_IX86 */

    // It is necessary to always have GLOBAL_LOCALE_BIT set in perthread data
    // because when doing bitwise or, we won't get __UPDATE_LOCALE to work when
    // global per thread locale is set.
    ptd->_ownlocale = _GLOBAL_LOCALE_BIT;

    // Initialize _setloc_data. These are the only valuse that need to be
    // initialized.
    ptd->_setloc_data._cachein[0]='C';
    ptd->_setloc_data._cacheout[0]='C';
    ptd->ptmbcinfo = &__initialmbcinfo;

    _mlock(_MB_CP_LOCK);
    __try
    {
        InterlockedIncrement(&(ptd->ptmbcinfo->refcount));
    }
    __finally
    {
        _munlock(_MB_CP_LOCK);
    }

    // We need to make sure that ptd->ptlocinfo in never NULL, this saves us
    // perf counts when UPDATING locale.
    _mlock(_SETLOCALE_LOCK);
    __try {
        ptd->ptlocinfo = ptloci;
        /*
         * Note that and caller to _initptd could have passed __ptlocinfo, but
         * that will be a bug as between the call to _initptd and __addlocaleref
         * the global locale may have changed and ptloci may be pointing to invalid
         * memory. Thus if the wants to set the locale to global, NULL should
         * be passed.
         */
        if (ptd->ptlocinfo == NULL)
            ptd->ptlocinfo = __ptlocinfo;
        __addlocaleref(ptd->ptlocinfo);
    }
    __finally {
        _munlock(_SETLOCALE_LOCK);
    }
}

From the calls above we know that ptloci is passed in as 0 (NULL), so ptd->ptlocinfo = __ptlocinfo. Here is its definition:

pthreadlocinfo __ptlocinfo = &__initiallocinfo;

And then:

typedef struct threadlocaleinfostruct * pthreadlocinfo;
typedef struct threadlocaleinfostruct {
        int refcount;
        unsigned int lc_codepage;
        unsigned int lc_collate_cp;
        unsigned long lc_handle[6]; /* LCID */
        LC_ID lc_id[6];
        struct {
            char *locale;
            wchar_t *wlocale;
            int *refcount;
            int *wrefcount;
        } lc_category[6];
        int lc_clike;
        int mb_cur_max;
        int * lconv_intl_refcount;
        int * lconv_num_refcount;
        int * lconv_mon_refcount;
        struct lconv * lconv;
        int * ctype1_refcount;
        unsigned short * ctype1;
        const unsigned short * pctype;
        const unsigned char * pclmap;
        const unsigned char * pcumap;
        struct __lc_time_data * lc_time_curr;
} threadlocinfo;

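There is pctype. It is the same place I reached earlier by going straight to the definition; here I simply worked backwards step by step. As for the data, it can be read off directly: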

threadlocinfo __initiallocinfo = {
    1,                                        /* refcount                 */
    _CLOCALECP,                               /* lc_codepage              */
    _CLOCALECP,                               /* lc_collate_cp            */
    {   _CLOCALEHANDLE,                       /* lc_handle[_ALL]          */
        _CLOCALEHANDLE,                       /* lc_handle[_COLLATE]      */
        _CLOCALEHANDLE,                       /* lc_handle[_CTYPE]        */
        _CLOCALEHANDLE,                       /* lc_handle[_MONETARY]     */
        _CLOCALEHANDLE,                       /* lc_handle[_NUMERIC]      */
        _CLOCALEHANDLE                        /* lc_handle[_TIME]         */
    },
    {   {0, 0, 0},                            /* lc_id[LC_ALL]            */
        {0, 0, 0},                            /* lc_id[LC_COLLATE]        */
        {0, 0, 0},                            /* lc_id[LC_CTYPE]          */
        {0, 0, 0},                            /* lc_id[LC_MONETARY]       */
        {0, 0, 0},                            /* lc_id[LC_NUMERIC]        */
        {0, 0, 0}                             /* lc_id[LC_TIME]           */
    },
    {   {NULL, NULL, NULL, NULL},             /* lc_category[LC_ALL]      */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_COLLATE]  */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_CTYPE]    */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_MONETARY] */
        {__clocalestr, NULL, NULL, NULL},     /* lc_category[LC_NUMERIC]  */
        {__clocalestr, NULL, NULL, NULL}      /* lc_category[LC_TIME]     */
    },
    1,                                        /* lc_clike                 */
    1,                                        /* mb_cur_max               */
    NULL,                                     /* lconv_intl_refcount      */
    NULL,                                     /* lconv_num_refcount       */
    NULL,                                     /* lconv_mon_refcount       */
    &__lconv_c,                               /* lconv                    */
    NULL,                                     /* ctype1_refcount          */
    NULL,                                     /* ctype1                   */
    __newctype + 128,                         /* pctype                   */
    __newclmap + 128,                         /* pclmap                   */
    __newcumap + 128,                         /* pcumap                   */
    &__lc_time_c,                             /* lc_time_curr             */
};

Count through the structure members and pctype lines up exactly with __newctype+128. With that, everything is clear!
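Storing a pointer that is already offset by 128 is what allows pctype[c] to be indexed directly with c, including the negative index -1 for EOF. A small sketch of that layout follows; my_ctype_storage, my_pctype and my_flags are hypothetical names, and only two sample entries are filled in (using C99 designated initializers).

```c
/* Hypothetical miniature of the layout: 128 leading slots for negative
   indexes, followed by one slot per character code 0..255. */
static const unsigned short my_ctype_storage[384] = {
    [128 + 'A'] = 0x1,   /* upper-case flag for one sample character */
    [128 + 'a'] = 0x2    /* lower-case flag for one sample character */
};

/* Points at the slot for character code 0, like __newctype + 128. */
static const unsigned short *const my_pctype = my_ctype_storage + 128;

/* Valid for c in -128..255; my_pctype[-1] is the EOF slot. */
unsigned short my_flags(int c)
{
    return my_pctype[c];
}
```

Here my_pctype[-1] is the same element as my_ctype_storage[127], which matches the position of the "/* -1 EOF */" entry in __newctype.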

 

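Tracing is a way of dissecting the implementation; following it all the way down to the Windows API helps in understanding the Windows interfaces and how they work.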

 

The English text below describes the requirements for the character case mapping functions, quoted from the C89 standard.

4.3.2 Character case mapping functions

4.3.2.1 The tolower function

Synopsis

#include <ctype.h>

int tolower(int c);

Description

The tolower function converts an upper-case letter to the corresponding lower-case letter.

Returns

If the argument is an upper-case letter, the tolower function returns the corresponding lower-case letter if there is one; otherwise the argument is returned unchanged. In the C locale, tolower maps only the characters for which isupper is true to the corresponding characters for which islower is true.

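For Microsoft's implementation I looked at the debugger's mixed view, in which the C source lines are interleaved with the x86 instructions generated for them: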

extern "C" int __cdecl tolower (
        int c
        )
{
70195760  mov         edi,edi
70195762  push        ebp
70195763  mov         ebp,esp
70195765  push        ecx
    if (__locale_changed == 0)
70195766  cmp         dword ptr [___locale_changed (702362C8h)],0
7019576D  jne         tolower+33h (70195793h)
    {
        return __ascii_towlower(c);
7019576F  cmp         dword ptr [c],41h
70195773  jl          tolower+26h (70195786h)
70195775  cmp         dword ptr [c],5Ah
70195779  jg          tolower+26h (70195786h)
7019577B  mov         eax,dword ptr [c]
7019577E  add         eax,20h
70195781  mov         dword ptr [ebp-4],eax
70195784  jmp         tolower+2Ch (7019578Ch)
70195786  mov         ecx,dword ptr [c]
70195789  mov         dword ptr [ebp-4],ecx
7019578C  mov         eax,dword ptr [ebp-4]
7019578F  jmp         tolower+41h (701957A1h)
    }
    else
70195791  jmp         tolower+41h (701957A1h)
    {
        return _tolower_l(c, NULL);
70195793  push        0
70195795  mov         edx,dword ptr [c]
70195798  push        edx
70195799  call        _tolower_l (70195580h)
7019579E  add         esp,8
    }
}
701957A1  mov         esp,ebp
701957A3  pop         ebp
701957A4  ret
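Read back into C, the instructions on the fast path (compare with 41h and 5Ah, then add 20h) correspond to roughly the following. This is my reading of the listing rather than the CRT's macro text, my_ascii_tolower is a hypothetical name, and the constants assume ASCII.

```c
/* my_ascii_tolower is a hypothetical name; the CRT's own macro is the
   __ascii_towlower that appears in the listing. */
int my_ascii_tolower(int c)
{
    /* 0x41..0x5A are 'A'..'Z' in ASCII; adding 0x20 yields 'a'..'z' */
    if (c >= 0x41 && c <= 0x5A)
        return c + 0x20;
    /* everything else is returned unchanged */
    return c;
}
```

Only when the locale has been changed does the function take the slower _tolower_l path instead of this comparison.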


 

4.3.2.2 The toupper function

Synopsis

#include <ctype.h>

int toupper(int c);

Description

The toupper function converts a lower-case letter to the corresponding upper-case letter.

Returns

If the argument is a lower-case letter, the toupper function returns the corresponding upper-case letter if there is one; otherwise the argument is returned unchanged. In the C locale, toupper maps only the characters for which islower is true to the corresponding characters for which isupper is true.

 

 
