Studying the GNU libc Source: strlen
Source: Internet  Editor: 程序博客网  Time: 2024/06/08 08:49
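This started with a very small programming exercise: write the strlen function.
It should have been easy.
Here str is a character pointer:
    int cnt = 0;
    while (*str != '\0') {
        cnt++;
        str++;
    }
    return cnt;
It ran fine in VS2008, but it failed on the judging system.
The limits given by the problem were
1000ms 10000K
so this approach had exceeded the time limit.
Thinking of the length as a distance: besides counting, what other way is there?
The other way is to subtract two addresses. Subtracting two pointers gives the byte difference divided by the size of the pointed-to type, that is, the number of elements between them.
So I changed the code to the following: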
    char *base = str;
    while (*str++)
        ;
    return (str - base - 1);
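This finds the address of the end of the string directly, then subtracts the start from the end.
Compared with the first version, it does one fewer counter increment per iteration, and the longer the string, the bigger the saving.
After compiling and submitting, it met the time limit.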
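To make the comparison concrete, here are the two approaches as complete functions. This is only a sketch: the names strlen_count and strlen_diff are made up for illustration, and whether the second one is measurably faster depends on the compiler and its optimization level.

```c
#include <stddef.h>

/* Hypothetical name: counts characters with an explicit counter. */
size_t strlen_count(const char *str)
{
    size_t cnt = 0;
    while (*str != '\0') {
        cnt++;   /* one increment for the counter ... */
        str++;   /* ... and one for the pointer */
    }
    return cnt;
}

/* Hypothetical name: walks to the terminator, then subtracts the start. */
size_t strlen_diff(const char *str)
{
    const char *base = str;
    while (*str != '\0')
        str++;   /* only the pointer moves */
    /* Pointer subtraction yields a count of elements (here, chars). */
    return (size_t)(str - base);
}
```

Both functions should return the same value for any valid C string, for example 5 for "hello" and 0 for the empty string.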
With that in mind, I wanted to see how GNU libc writes this function.
It was quite a surprise. Here is the source first:
/* Copyright (C) 1991-2015 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Written by Torbjorn Granlund (tege@sics.se),
   with help from Dan Sahlin (dan@sics.se);
   commentary by Jim Blandy (jimb@ai.mit.edu).

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#include <string.h>
#include <stdlib.h>

#undef strlen

/* Return the length of the null-terminated string STR.  Scan for
   the null terminator quickly by testing four bytes at a time.  */
size_t
strlen (const char *str)
{
  const char *char_ptr;
  const unsigned long int *longword_ptr;
  unsigned long int longword, himagic, lomagic;

  /* Handle the first few characters by reading one character at a time.
     Do this until CHAR_PTR is aligned on a longword boundary.  */
  for (char_ptr = str; ((unsigned long int) char_ptr
                        & (sizeof (longword) - 1)) != 0;
       ++char_ptr)
    if (*char_ptr == '\0')
      return char_ptr - str;

  /* All these elucidatory comments refer to 4-byte longwords,
     but the theory applies equally well to 8-byte longwords.  */

  longword_ptr = (unsigned long int *) char_ptr;

  /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
     the "holes."  Note that there is a hole just to the left of
     each byte, with an extra at the end:

     bits:  01111110 11111110 11111110 11111111
     bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

     The 1-bits make sure that carries propagate to the next 0-bit.
     The 0-bits provide holes for carries to fall into.  */
  himagic = 0x80808080L;
  lomagic = 0x01010101L;
  if (sizeof (longword) > 4)
    {
      /* 64-bit version of the magic.  */
      /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
      himagic = ((himagic << 16) << 16) | himagic;
      lomagic = ((lomagic << 16) << 16) | lomagic;
    }
  if (sizeof (longword) > 8)
    abort ();

  /* Instead of the traditional loop which tests each character,
     we will test a longword at a time.  The tricky part is testing
     if *any of the four* bytes in the longword in question are zero.  */
  for (;;)
    {
      longword = *longword_ptr++;

      if (((longword - lomagic) & ~longword & himagic) != 0)
        {
          /* Which of the bytes was the zero?  If none of them were, it was
             a misfire; continue the search.  */

          const char *cp = (const char *) (longword_ptr - 1);

          if (cp[0] == 0)
            return cp - str;
          if (cp[1] == 0)
            return cp - str + 1;
          if (cp[2] == 0)
            return cp - str + 2;
          if (cp[3] == 0)
            return cp - str + 3;
          if (sizeof (longword) > 4)
            {
              if (cp[4] == 0)
                return cp - str + 4;
              if (cp[5] == 0)
                return cp - str + 5;
              if (cp[6] == 0)
                return cp - str + 6;
              if (cp[7] == 0)
                return cp - str + 7;
            }
        }
    }
}
libc_hidden_builtin_def (strlen)
Before reading on, a salute and thanks to the three of them:
This file is part of the GNU C Library.
Written by Torbjorn Granlund (tege@sics.se),
with help from Dan Sahlin (dan@sics.se);
commentary by Jim Blandy (jimb@ai.mit.edu).
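Bow down in awe. Tremble.
Word-sized memory access is a great thing: when the address is aligned, a 32-bit CPU can read four bytes in a single load, and that property can be put to use.
First code block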
  /* Handle the first few characters by reading one character at a time.
     Do this until CHAR_PTR is aligned on a longword boundary.  */
  for (char_ptr = str; ((unsigned long int) char_ptr
                        & (sizeof (longword) - 1)) != 0;
       ++char_ptr)
    if (*char_ptr == '\0')
      return char_ptr - str;
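This handles the unaligned leading part of the string.
What is alignment?
For 4-byte alignment, the address must be an integer multiple of 4.
The address should be 0, 4, 8, 12, and so on.
The low two bits of such addresses must be 0. In the Linux 0.12 kernel, Linus handles alignment the same way; for 4-byte alignment, for example,
addr & 0x3 is enough, so the approach here is consistent with that.
The if statement covers the case where the terminator shows up before an aligned address is reached; the length is then returned right away.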
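The mask test above can be tried out on plain integers. The two helpers below are a sketch with made-up names (is_aligned, bytes_to_alignment), and they assume the alignment is a power of two, which is what makes the align - 1 mask work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if addr is a multiple of align, else 0.
   Assumes align is a power of two, so align - 1 is a mask of low bits. */
int is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Hypothetical helper: how many bytes must be stepped over one at a
   time before addr reaches the next aligned address (0 if already there). */
size_t bytes_to_alignment(uintptr_t addr, size_t align)
{
    return (size_t)((align - (addr & (align - 1))) & (align - 1));
}
```

For 4-byte alignment, address 8 is already aligned, while address 5 needs three single-byte steps (5, 6, 7) before word-sized reads can start at 8. That is the work the first for loop in the source does.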
Second code block
  /* All these elucidatory comments refer to 4-byte longwords,
     but the theory applies equally well to 8-byte longwords.  */

  longword_ptr = (unsigned long int *) char_ptr;
Once aligned, the new aligned address is assigned to the pointer longword_ptr.
Third code block
  if (sizeof (longword) > 4)
    {
      /* 64-bit version of the magic.  */
      /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
      himagic = ((himagic << 16) << 16) | himagic;
      lomagic = ((lomagic << 16) << 16) | lomagic;
    }
  if (sizeof (longword) > 8)
    abort ();
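This handles machines where unsigned long is wider than 4 bytes: the two magic constants are widened to 64 bits, and the function aborts if a longword is wider than 8 bytes. I won't discuss it further here.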
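For readers who do want to see what the widening produces, here is a small sketch. The helper name widen_magic is made up, and it uses uint64_t instead of unsigned long so the result is the same on every platform. With a 64-bit type the shift could be written as a single << 32; the two 16-bit steps simply mirror the source, which must also compile cleanly when long has only 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: copies a 32-bit pattern into the upper half of
   a 64-bit word, using two 16-bit shifts as the glibc source does. */
uint64_t widen_magic(uint64_t magic32)
{
    return ((magic32 << 16) << 16) | magic32;
}
```

Applied to 0x80808080 and 0x01010101 this gives 0x8080808080808080 and 0x0101010101010101, that is, the same per-byte pattern repeated across eight bytes.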
Fourth code block
  /* Instead of the traditional loop which tests each character,
     we will test a longword at a time.  The tricky part is testing
     if *any of the four* bytes in the longword in question are zero.  */
  for (;;)
    {
      longword = *longword_ptr++;

      if (((longword - lomagic) & ~longword & himagic) != 0)
        {
          /* Which of the bytes was the zero?  If none of them were, it was
             a misfire; continue the search.  */

          const char *cp = (const char *) (longword_ptr - 1);
This is a for loop with an if test. The basic idea is still to read data and then test it, only four bytes at a time (on a 32-bit machine).
himagic = 0x80808080L; /* every byte is 1000_0000 */
lomagic = 0x01010101L; /* every byte is 0000_0001 */
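Look at the condition under which the if holds.
(1) (longword - lomagic)
subtracts one from every byte. A nonzero byte needs no borrow, whereas the terminator '\0' minus one borrows and becomes 0xFF, which has its high bit set.
So after the subtraction a byte has its high bit set in two cases: it was the terminator, or its own high bit was already set (values 0x81 to 0xFF).
(2) ~longword inverts the bits. Of the two cases above, only one still has the high bit set after inversion: the byte whose high bit was originally 0, that is, the terminator.
(3) himagic keeps only the high bit of each byte: 1 if it is set, 0 otherwise.
So (1) & (2) & (3) completes the terminator test: the result is nonzero if one of the four bytes is a terminator. A borrow out of a zero byte can also flag the byte just above it, but that only happens when a zero byte really is present, so the answer for the word as a whole stays correct.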
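The three steps can be checked on concrete values. The function below is a minimal sketch of the same test for a fixed 32-bit word; the name has_zero_byte is made up, and uint32_t stands in for the source's unsigned long so the arithmetic is the same everywhere. The comments trace one example word by hand.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if any of the four bytes of w is zero. */
int has_zero_byte(uint32_t w)
{
    const uint32_t lomagic = 0x01010101u;
    const uint32_t himagic = 0x80808080u;

    /* Worked example with w = 0x61620064 (bytes 'a', 'b', 0, 'd'):
         w - lomagic          = 0x6060FF63   step (1)
         ~w                   = 0x9E9DFF9B   step (2)
         (1) & (2)            = 0x0000FF03
         (1) & (2) & himagic  = 0x00008000   step (3), nonzero */
    return ((w - lomagic) & ~w & himagic) != 0;
}
```

Words such as 0x61626364 ("abcd"), 0x80808080 and 0xFFFFFFFF contain no zero byte and give 0, which shows that a byte with its high bit already set does not trigger the test.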
Fifth code block
          const char *cp = (const char *) (longword_ptr - 1);

          if (cp[0] == 0)
            return cp - str;
          if (cp[1] == 0)
            return cp - str + 1;
          if (cp[2] == 0)
            return cp - str + 2;
          if (cp[3] == 0)
            return cp - str + 3;
Once a terminator has been detected, the 4-byte pointer is converted back to a char pointer to check exactly which byte is the terminator. The rest is simple.
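The unrolled cp[0] to cp[3] checks amount to a short linear search inside one word. A loop-based sketch of the same idea follows; the name first_zero_index is made up, and glibc itself writes the checks out by hand rather than looping.

```c
#include <stddef.h>

/* Hypothetical helper: index of the first zero byte among the n bytes
   starting at cp, or -1 if none of them is zero. */
int first_zero_index(const char *cp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cp[i] == 0)
            return (int)i;
    return -1;
}
```

If the index found is i, the string length is cp - str + i, which matches the return statements in the source.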