A Brief Analysis of the String Class Source Code (Part 1, source lines 1~1904)
Source: Internet · Editor: 程序博客网 · Time: 2024/04/30 16:30
Class declaration:
public final class String implements java.io.Serializable, Comparable<String>, CharSequence
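The three interfaces String implements:
- java.io.Serializable: marks String as serializable (a marker interface with no methods to implement)
- Comparable<String>: defines the ordering between two String objects; the class must implement public int compareTo(T o), with T = String
- CharSequence: a read-only view of a sequence of characters; the class must implement:
  - int length(): returns the length of the sequence
  - char charAt(int index): returns the character at index index
  - CharSequence subSequence(int start, int end): returns the part of the sequence from start (inclusive) to end (exclusive)
  - public String toString(): returns the String form of the sequence (for String itself, that is simply the object itself)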
Fields:
/** The value is used for character storage. */
private final char value[];

/** The offset is the first index of the storage that is used. */
private final int offset;

/** The count is the number of characters in the String. */
private final int count;

/** Cache the hash code for the string */
private int hash; // Default to 0
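1. value: this shows that a String is, at heart, a character array, and that the array field is declared final.
Note that final only means the value reference cannot be reassigned. The elements inside the array can still be changed.
For String's value, however, the field is private and String offers no method that modifies the array, so in practice a String is immutable from the moment it is created.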
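A minimal sketch of the point about final, using a hypothetical class FinalArrayDemo (not part of the JDK). Its array field is final, yet its elements can still be overwritten.

```java
// Hypothetical class FinalArrayDemo, for illustration only: the array field
// is final, yet its elements can still be overwritten.
final class FinalArrayDemo {
    private final char[] value = {'a', 'b', 'c'};

    // Allowed: writing an element of a final array.
    void setFirst(char c) {
        value[0] = c;
    }

    // Not allowed, would fail to compile: value = new char[3];

    String contents() {
        return new String(value);
    }
}
```

After `setFirst('x')`, `contents()` returns "xbc". String avoids this only because it never exposes its value array and never writes to it after construction.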
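2. offset: the index in value of the first character that this string actually uses.
My reading: value is a character array of, say, length 100, and the string needs only 10 slots. Normally the characters would sit in value[0] to value[9]. If they sit elsewhere, with the earlier slots left unused, offset records the index of the first slot actually used. In the JDK 6 source this is exactly what substring() relies on. It calls the package-private constructor String(int offset, int count, char value[]), so the substring shares its parent's value array and only offset and count differ. (Starting with JDK 7u6, offset and count were removed and substring() copies the characters instead.) If anything here is wrong, please email me: 770486267@qq.com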
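The substring sharing described above can be sketched with a hypothetical model class, SliceString. It is not java.lang.String; it only copies the JDK 6 layout (value + offset + count) to show how two strings can share one array.

```java
import java.util.Arrays;

// Hypothetical model of the JDK 6 String layout (value + offset + count).
// It is NOT java.lang.String; it only shows how offset lets slices share one array.
final class SliceString {
    private final char[] value;
    private final int offset;
    private final int count;

    // Copies the caller's array, as String(char[]) does.
    SliceString(char[] source) {
        this(0, source.length, Arrays.copyOf(source, source.length));
    }

    // Sharing constructor, analogous to String(int offset, int count, char value[]).
    private SliceString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Like JDK 6 substring(): no copy, only a new offset and count.
    SliceString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(
                "begin " + beginIndex + ", end " + endIndex + ", length " + count);
        }
        return new SliceString(offset + beginIndex, endIndex - beginIndex, value);
    }

    int offset() { return offset; }
    int length() { return count; }
    int backingLength() { return value.length; }
    boolean sharesArrayWith(SliceString other) { return value == other.value; }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

The trade-off is memory. A 5-character slice of a large string keeps the whole parent array reachable. That "baggage" is why the String(String) constructor trims oversized arrays.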
3. count: the number of characters in the string, i.e. the length actually stored.
4. hash: the cached hash code, 0 by default (meaning it has not been computed yet).
5. One more field I had not seen before: serialPersistentFields, which is related to serialization (to be covered later).
/**
 * Class String is special cased within the Serialization Stream Protocol.
 *
 * A String instance is written initially into an ObjectOutputStream in the
 * following format:
 * <pre>
 *      <code>TC_STRING</code> (utf String)
 * </pre>
 * The String is written by method <code>DataOutput.writeUTF</code>.
 * A new handle is generated to refer to all future references to the
 * string instance within the stream.
 */
private static final ObjectStreamField[] serialPersistentFields =
    new ObjectStreamField[0];
Constructors:
Constructors whose parameters contain char[]
1. Construct an empty string:
/**
 * Initializes a newly created {@code String} object so that it represents
 * an empty character sequence.  Note that use of this constructor is
 * unnecessary since Strings are immutable.
 */
public String() {
    this.offset = 0;
    this.count = 0;
    this.value = new char[0];
}
2. Initialize from another string:
/**
 * Initializes a newly created {@code String} object so that it represents
 * the same sequence of characters as the argument; in other words, the
 * newly created string is a copy of the argument string. Unless an
 * explicit copy of {@code original} is needed, use of this constructor is
 * unnecessary since Strings are immutable.
 *
 * @param  original
 *         A {@code String}
 */
public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}
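Question: why can this code read original's private fields count and value directly?
Note: private access is scoped to the class, not to the object. Code inside String may read the private fields of any String instance, not just this one.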
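The same rule can be shown with a hypothetical class, Counter, invented for this example. One instance reads another instance's private field, just as String(String) reads original.count.

```java
// Hypothetical class Counter: Java checks private access per class, not per object.
final class Counter {
    private final int count;

    Counter(int count) {
        this.count = count;
    }

    // Reads other.count even though it is private. This is legal because the
    // code is inside Counter, just as String(String) reads original.count.
    boolean sameCount(Counter other) {
        return this.count == other.count;
    }
}
```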
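Explanation:
size is the length of the argument string, originalValue is the argument's value array, and v is the array that will actually back the new object.
Why the if statement? If originalValue.length is greater than size, the argument does not use its whole array. For example, a string produced by substring() shares a larger array with its parent, and its characters occupy only value[offset] through value[offset + size - 1]. In that case the constructor copies exactly that slice and drops the unused "baggage". If the array length equals size, the array holds exactly this string (offset is then 0), so the array can simply be shared.
About Arrays.copyOfRange: I will look at its source more closely when analyzing the Arrays class, so read it yourself for now. Worth mentioning here: the various Arrays copy methods all end up calling System.arraycopy(), which is a native method (see my earlier analysis of the System class source for details).
Arrays.copyOfRange(T[] original, int from, int to) copies original from index from up to index to into a new array. Index from is included and index to is excluded. (String uses the char[] overload, copyOfRange(char[] original, int from, int to).)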
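The branch in String(String) can be sketched as a standalone helper. The class TrimDemo and its method backingArrayFor are hypothetical names, not JDK methods. The helper decides whether the backing array can be shared or must be trimmed with the real Arrays.copyOfRange.

```java
import java.util.Arrays;

// Hypothetical helper (not a JDK method) that mirrors the branch in the
// JDK 6 String(String) constructor: share the backing array, or trim it
// with Arrays.copyOfRange.
final class TrimDemo {
    static char[] backingArrayFor(char[] originalValue, int off, int size) {
        if (originalValue.length > size) {
            // Copy only [off, off + size): 'from' is included, 'to' is excluded.
            return Arrays.copyOfRange(originalValue, off, off + size);
        }
        // The array holds exactly this string, so sharing it is safe.
        return originalValue;
    }
}
```

For example, `backingArrayFor("hello world".toCharArray(), 6, 5)` returns a new 5-element array holding "world", while an exactly sized array comes back as the same object.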
3. Initialize from a character array:
/**
 * Allocates a new {@code String} so that it represents the sequence of
 * characters currently contained in the character array argument. The
 * contents of the character array are copied; subsequent modification of
 * the character array does not affect the newly created string.
 *
 * @param  value
 *         The initial value of the string
 */
public String(char value[]) {
    int size = value.length;
    this.offset = 0;
    this.count = size;
    this.value = Arrays.copyOf(value, size);
}
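4. Initialize from part of a character array. The second argument is the index of the first character to use, and the third is the number of characters: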
/**
 * Allocates a new {@code String} that contains characters from a subarray
 * of the character array argument. The {@code offset} argument is the
 * index of the first character of the subarray and the {@code count}
 * argument specifies the length of the subarray. The contents of the
 * subarray are copied; subsequent modification of the character array does
 * not affect the newly created string.
 *
 * @param  value
 *         Array that is the source of characters
 *
 * @param  offset
 *         The initial offset
 *
 * @param  count
 *         The length
 *
 * @throws  IndexOutOfBoundsException
 *          If the {@code offset} and {@code count} arguments index
 *          characters outside the bounds of the {@code value} array
 */
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.offset = 0;
    this.count = count;
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
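The comment "offset or count might be near -1>>>1" (that is, near Integer.MAX_VALUE) explains why the last check is written as `offset > value.length - count` and not `offset + count > value.length`. A sketch comparing the two forms, using a hypothetical class BoundsCheckDemo. Both methods assume offset >= 0 and count >= 0 have already been checked, as the constructor does first.

```java
// Hypothetical helpers comparing two ways to write the final bounds check in
// String(char[], int, int). Both assume offset >= 0 and count >= 0 were
// already verified, as the JDK code does first.
final class BoundsCheckDemo {
    // Naive form: offset + count can overflow int and wrap to a negative value.
    static boolean naiveOutOfBounds(int length, int offset, int count) {
        return offset + count > length;
    }

    // JDK form: length - count cannot overflow when both are non-negative.
    static boolean safeOutOfBounds(int length, int offset, int count) {
        return offset > length - count;
    }
}
```

With length 10, offset 5 and count Integer.MAX_VALUE, the naive sum wraps to a negative number and wrongly reports "in bounds". The subtraction form correctly rejects the arguments.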
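5. Initialize from part of an array of Unicode code points. The elements are ints because a code point can be larger than a char. Code points below U+10000 become a single char, while supplementary code points (U+10000 to U+10FFFF) become a surrogate pair of two chars. That is why the loop below may have to grow the char[] as it goes.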
/**
 * Allocates a new {@code String} that contains characters from a subarray
 * of the Unicode code point array argument. The {@code offset} argument
 * is the index of the first code point of the subarray and the
 * {@code count} argument specifies the length of the subarray. The
 * contents of the subarray are converted to {@code char}s; subsequent
 * modification of the {@code int} array does not affect the newly created
 * string.
 *
 * @param  codePoints
 *         Array that is the source of Unicode code points
 *
 * @param  offset
 *         The initial offset
 *
 * @param  count
 *         The length
 *
 * @throws  IllegalArgumentException
 *          If any invalid Unicode code point is found in {@code
 *          codePoints}
 *
 * @throws  IndexOutOfBoundsException
 *          If the {@code offset} and {@code count} arguments index
 *          characters outside the bounds of the {@code codePoints} array
 *
 * @since  1.5
 */
public String(int[] codePoints, int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > codePoints.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }

    int expansion = 0;
    int margin = 1;
    char[] v = new char[count + margin];
    int x = offset;
    int j = 0;
    for (int i = 0; i < count; i++) {
        int c = codePoints[x++];
        if (c < 0) {
            throw new IllegalArgumentException();
        }
        if (margin <= 0 && (j+1) >= v.length) {
            if (expansion == 0) {
                expansion = (((-margin + 1) * count) << 10) / i;
                expansion >>= 10;
                if (expansion <= 0) {
                    expansion = 1;
                }
            } else {
                expansion *= 2;
            }
            int newLen = Math.min(v.length+expansion, count*2);
            margin = (newLen - v.length) - (count - i);
            v = Arrays.copyOf(v, newLen);
        }
        if (c < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            v[j++] = (char) c;
        } else if (c <= Character.MAX_CODE_POINT) {
            Character.toSurrogates(c, v, j);
            j += 2;
            margin--;
        } else {
            throw new IllegalArgumentException();
        }
    }
    this.offset = 0;
    this.value = v;
    this.count = j;
}
6. A package-private (not private) constructor that shares the given array instead of copying it, for speed:
// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}
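A note on constructor 5: although its first parameter is an int[], each element must be a valid code point. A negative value, or one above Character.MAX_CODE_POINT (0x10FFFF), throws IllegalArgumentException. In the ASCII range the code point equals the ASCII code, so an element of 49 produces the character "1".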
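For constructor 5, the growing-buffer loop can be contrasted with a simpler two-pass conversion. This is a sketch in a hypothetical helper class CodePoints, not the JDK 6 code. It assumes the caller has already checked offset and count. It uses only real Character methods: isValidCodePoint, charCount and toChars(int, char[], int).

```java
// A two-pass alternative to the growing-buffer loop in constructor 5, written
// as a hypothetical helper class CodePoints (not the JDK 6 code). Pass 1 sizes
// the char[] exactly; pass 2 fills it. Bounds checks on offset and count are
// omitted and assumed to have been done by the caller.
final class CodePoints {
    static char[] toUtf16(int[] codePoints, int offset, int count) {
        // Pass 1: validate each code point and count the chars it needs.
        int n = 0;
        for (int i = offset; i < offset + count; i++) {
            int c = codePoints[i];
            if (!Character.isValidCodePoint(c)) {
                throw new IllegalArgumentException(Integer.toString(c));
            }
            n += Character.charCount(c); // 1 for BMP, 2 for a surrogate pair
        }
        // Pass 2: allocate exactly n chars and write them.
        char[] v = new char[n];
        int j = 0;
        for (int i = offset; i < offset + count; i++) {
            j += Character.toChars(codePoints[i], v, j);
        }
        return v;
    }
}
```

The trade-off: the input is read twice, but the array is allocated once at its exact size, so no expansion heuristic is needed.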
Constructors whose parameters contain byte[]
First, a helper that checks whether the three arguments are valid:
private static void checkBounds(byte[] bytes, int offset, int length) {
    if (length < 0)
        throw new StringIndexOutOfBoundsException(length);
    if (offset < 0)
        throw new StringIndexOutOfBoundsException(offset);
    if (offset > bytes.length - length)
        throw new StringIndexOutOfBoundsException(offset + length);
}
7. Decode, using the named charset, the subarray of a byte[] that starts at index offset and has length length, and build a string from the result:
/**
 * Constructs a new {@code String} by decoding the specified subarray of
 * bytes using the specified charset.  The length of the new {@code String}
 * is a function of the charset, and hence may not be equal to the length
 * of the subarray.
 *
 * <p> The behavior of this constructor when the given bytes are not valid
 * in the given charset is unspecified.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @param  offset
 *         The index of the first byte to decode
 *
 * @param  length
 *         The number of bytes to decode
 *
 * @param  charsetName
 *         The name of a supported {@linkplain java.nio.charset.Charset
 *         charset}
 *
 * @throws  UnsupportedEncodingException
 *          If the named charset is not supported
 *
 * @throws  IndexOutOfBoundsException
 *          If the {@code offset} and {@code length} arguments index
 *          characters outside the bounds of the {@code bytes} array
 *
 * @since  JDK1.1
 */
public String(byte bytes[], int offset, int length, String charsetName)
    throws UnsupportedEncodingException
{
    if (charsetName == null)
        throw new NullPointerException("charsetName");
    checkBounds(bytes, offset, length);
    char[] v = StringCoding.decode(charsetName, bytes, offset, length);
    this.offset = 0;
    this.count = v.length;
    this.value = v;
}
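8. Nearly the same as the previous constructor, except that the charset is passed as a Charset object. As its Javadoc states, this version always replaces malformed input and unmappable characters with the charset's default replacement string. It also declares no UnsupportedEncodingException: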
/**
 * Constructs a new {@code String} by decoding the specified subarray of
 * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
 * The length of the new {@code String} is a function of the charset, and
 * hence may not be equal to the length of the subarray.
 *
 * <p> This method always replaces malformed-input and unmappable-character
 * sequences with this charset's default replacement string.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @param  offset
 *         The index of the first byte to decode
 *
 * @param  length
 *         The number of bytes to decode
 *
 * @param  charset
 *         The {@linkplain java.nio.charset.Charset charset} to be used to
 *         decode the {@code bytes}
 *
 * @throws  IndexOutOfBoundsException
 *          If the {@code offset} and {@code length} arguments index
 *          characters outside the bounds of the {@code bytes} array
 *
 * @since  1.6
 */
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    char[] v = StringCoding.decode(charset, bytes, offset, length);
    this.offset = 0;
    this.count = v.length;
    this.value = v;
}
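A small sketch of these two constructors in use, wrapped in a hypothetical class DecodeDemo. It calls the real String(byte[], int, int, Charset) with StandardCharsets.UTF_8 (Java 7+). The number of chars produced need not equal the number of bytes decoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper around the real String(byte[], int, int, Charset)
// constructor, decoding a subarray as UTF-8.
final class DecodeDemo {
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}
```

Given the bytes `{0x41, 0xE4, 0xB8, 0xAD, 0x42}`, decoding bytes 1 to 3 yields the single character U+4E2D, so 3 bytes become a 1-char string. An invalid byte such as 0xFF is replaced with U+FFFD rather than causing an exception. The charsetName overload, by contrast, throws UnsupportedEncodingException for an unknown charset name.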
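9. Convenience overloads. They decode the whole byte[] with a named charset or a Charset object, or decode a subarray with the platform's default charset: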
/**
 * Constructs a new {@code String} by decoding the specified array of bytes
 * using the specified {@linkplain java.nio.charset.Charset charset}.  The
 * length of the new {@code String} is a function of the charset, and hence
 * may not be equal to the length of the byte array.
 *
 * <p> The behavior of this constructor when the given bytes are not valid
 * in the given charset is unspecified.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @param  charsetName
 *         The name of a supported {@linkplain java.nio.charset.Charset
 *         charset}
 *
 * @throws  UnsupportedEncodingException
 *          If the named charset is not supported
 *
 * @since  JDK1.1
 */
public String(byte bytes[], String charsetName)
    throws UnsupportedEncodingException
{
    this(bytes, 0, bytes.length, charsetName);
}

/**
 * Constructs a new {@code String} by decoding the specified array of
 * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
 * The length of the new {@code String} is a function of the charset, and
 * hence may not be equal to the length of the byte array.
 *
 * <p> This method always replaces malformed-input and unmappable-character
 * sequences with this charset's default replacement string.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @param  charset
 *         The {@linkplain java.nio.charset.Charset charset} to be used to
 *         decode the {@code bytes}
 *
 * @since  1.6
 */
public String(byte bytes[], Charset charset) {
    this(bytes, 0, bytes.length, charset);
}

/**
 * Constructs a new {@code String} by decoding the specified subarray of
 * bytes using the platform's default charset.  The length of the new
 * {@code String} is a function of the charset, and hence may not be equal
 * to the length of the subarray.
 *
 * <p> The behavior of this constructor when the given bytes are not valid
 * in the default charset is unspecified.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @param  offset
 *         The index of the first byte to decode
 *
 * @param  length
 *         The number of bytes to decode
 *
 * @throws  IndexOutOfBoundsException
 *          If the {@code offset} and the {@code length} arguments index
 *          characters outside the bounds of the {@code bytes} array
 *
 * @since  JDK1.1
 */
public String(byte bytes[], int offset, int length) {
    checkBounds(bytes, offset, length);
    char[] v = StringCoding.decode(bytes, offset, length);
    this.offset = 0;
    this.count = v.length;
    this.value = v;
}
10. Decode the whole byte[] with the platform's default charset:
/**
 * Constructs a new {@code String} by decoding the specified array of bytes
 * using the platform's default charset.  The length of the new {@code
 * String} is a function of the charset, and hence may not be equal to the
 * length of the byte array.
 *
 * <p> The behavior of this constructor when the given bytes are not valid
 * in the default charset is unspecified.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @since  JDK1.1
 */
public String(byte bytes[]) {
    this(bytes, 0, bytes.length);
}
Initializing a string from a StringBuffer or a StringBuilder:
/**
 * Allocates a new string that contains the sequence of characters
 * currently contained in the string buffer argument. The contents of the
 * string buffer are copied; subsequent modification of the string buffer
 * does not affect the newly created string.
 *
 * @param  buffer
 *         A {@code StringBuffer}
 */
public String(StringBuffer buffer) {
    String result = buffer.toString();
    this.value = result.value;
    this.count = result.count;
    this.offset = result.offset;
}

/**
 * Allocates a new string that contains the sequence of characters
 * currently contained in the string builder argument. The contents of the
 * string builder are copied; subsequent modification of the string builder
 * does not affect the newly created string.
 *
 * <p> This constructor is provided to ease migration to {@code
 * StringBuilder}. Obtaining a string from a string builder via the {@code
 * toString} method is likely to run faster and is generally preferred.
 *
 * @param   builder
 *          A {@code StringBuilder}
 *
 * @since  1.5
 */
public String(StringBuilder builder) {
    String result = builder.toString();
    this.value = result.value;
    this.count = result.count;
    this.offset = result.offset;
}
Common methods:
1. length(): returns the length of the string
public int length() { return count; }
2. isEmpty(): checks whether the string is empty:
public boolean isEmpty() { return count == 0; }
3. charAt(int index): returns the character at index index
public char charAt(int index) {
    if ((index < 0) || (index >= count)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index + offset];
}
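4. getChars(char dst[], int dstBegin): copies the whole string into the character array dst, starting at position dstBegin (this overload has no access modifier, so it is package-private and used only inside java.lang)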
void getChars(char dst[], int dstBegin) { System.arraycopy(value, offset, dst, dstBegin, count); }
5. equals(Object anObject):
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}
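First it checks whether the two references point to the same object.
Then it checks whether anObject is a String. If so, it compares the lengths, then the characters one by one.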
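The two checks in equals() can be seen side by side in a small sketch. The class EqualsDemo is hypothetical; it only calls the real String methods.

```java
// Hypothetical demo of the checks in equals(): reference identity first,
// then a character-by-character comparison, but only against another String.
final class EqualsDemo {
    static boolean[] run() {
        String literal = "java";
        String copy = new String("java"); // new always creates a distinct object
        return new boolean[] {
            literal == copy,                          // false: different objects
            literal.equals(copy),                     // true: same characters
            literal.equals(new StringBuilder("java")) // false: not a String
        };
    }
}
```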
6. equalsIgnoreCase(String anotherString): compares two strings for equality, ignoring case
public boolean equalsIgnoreCase(String anotherString) {
    return (this == anotherString) ? true :
           (anotherString != null) && (anotherString.count == count) &&
           regionMatches(true, 0, anotherString, 0, count);
}
7. compareTo(String anotherString): defines how two strings are ordered
public int compareTo(String anotherString) {
    int len1 = count;
    int len2 = anotherString.count;
    int n = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;
    int i = offset;
    int j = anotherString.offset;

    if (i == j) {
        int k = i;
        int lim = n + i;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
    } else {
        while (n-- != 0) {
            char c1 = v1[i++];
            char c2 = v2[j++];
            if (c1 != c2) {
                return c1 - c2;
            }
        }
    }
    return len1 - len2;
}
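That is, it compares the first characters by their char (UTF-16 code unit) values. If they are equal it moves on to the second pair, and so on, and the result is the difference of the first pair that differs. If one string is a prefix of the other, the result is the difference in length.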
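A sketch of the same logic, written against charAt instead of value/offset, so it runs outside java.lang. The class CompareDemo is hypothetical. The two return statements are the two possible outcomes described above.

```java
// Hypothetical re-implementation of the compareTo logic above, using charAt
// instead of direct access to value/offset.
final class CompareDemo {
    static int compare(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int k = 0; k < n; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // the first differing char decides
            }
        }
        return a.length() - b.length(); // one string is a prefix of the other
    }
}
```

For example, "apple" vs "apricot" first differs at index 2, giving 'p' - 'r' = -2, and "ab" vs "abc" gives 2 - 3 = -1. Since uppercase letters have smaller values, "Z" sorts before "a".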
8. startsWith(String prefix) / endsWith(String suffix): checks whether the string starts with prefix / ends with suffix
public boolean startsWith(String prefix) {
    return startsWith(prefix, 0);
}

public boolean endsWith(String suffix) {
    return startsWith(suffix, count - suffix.count);
}
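Honestly, I prefer regular expressions, although for a plain prefix or suffix test startsWith/endsWith is simpler and needs no escaping of characters such as '.'.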
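The endsWith above is just startsWith called at offset count - suffix.count. A sketch comparing that trick with a regex version, in a hypothetical class SuffixDemo. It shows what the regex needs in order to behave the same: Pattern.quote so that '.' is literal, and Pattern.DOTALL so that '.*' also crosses line breaks.

```java
import java.util.regex.Pattern;

// Hypothetical demo: endsWith as startsWith at an offset, versus a regex.
final class SuffixDemo {
    // Same trick as the JDK 6 endsWith. A negative offset (suffix longer
    // than s) makes startsWith return false.
    static boolean endsWith(String s, String suffix) {
        return s.startsWith(suffix, s.length() - suffix.length());
    }

    // Regex equivalent: quote the suffix so it is matched literally, and use
    // DOTALL so ".*" also matches line terminators.
    static boolean endsWithRegex(String s, String suffix) {
        return Pattern.compile(".*" + Pattern.quote(suffix), Pattern.DOTALL)
                      .matcher(s)
                      .matches();
    }
}
```

Without Pattern.quote, the suffix ".txt" would also match "filetxt", because '.' matches any character.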
9. indexOf(String str): returns the index of the first occurrence of str, or -1 if it does not occur
public int indexOf(String str) { return indexOf(str, 0); }
Other methods:
Various get methods:
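1. getChars(int srcBegin, int srcEnd, char dst[], int dstBegin): copies the characters from srcBegin (inclusive) to srcEnd (exclusive) into the character array dst, starting at dstBegin.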
public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > count) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    System.arraycopy(value, offset + srcBegin, dst, dstBegin,
         srcEnd - srcBegin);
}
2. getBytes(String charsetName) / getBytes(Charset charset) / getBytes(): encodes the string into a byte array using the named charset / the given Charset / the platform's default charset
public byte[] getBytes(String charsetName)
    throws UnsupportedEncodingException
{
    if (charsetName == null) throw new NullPointerException();
    return StringCoding.encode(charsetName, value, offset, count);
}

public byte[] getBytes(Charset charset) {
    if (charset == null) throw new NullPointerException();
    return StringCoding.encode(charset, value, offset, count);
}

public byte[] getBytes() {
    return StringCoding.encode(value, offset, count);
}
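A sketch showing that the byte length depends on the charset, using a hypothetical class EncodeDemo and the real getBytes(Charset). The string is 'A' plus U+4E2D. It takes 1 + 3 bytes in UTF-8 and 2 + 2 bytes in UTF-16BE. In ISO-8859-1, the unmappable U+4E2D is replaced by that charset's replacement byte '?'.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same two-char string encodes to different byte
// arrays depending on the charset passed to getBytes(Charset).
final class EncodeDemo {
    static final String TEXT = "A\u4E2D"; // 'A' plus U+4E2D, a CJK ideograph

    static byte[] utf8() { return TEXT.getBytes(StandardCharsets.UTF_8); }       // 4 bytes
    static byte[] utf16be() { return TEXT.getBytes(StandardCharsets.UTF_16BE); } // 4 bytes
    static byte[] latin1() { return TEXT.getBytes(StandardCharsets.ISO_8859_1); } // 'A', '?'
}
```

Encoding with UTF-8 and decoding with UTF-8 round-trips the text. The ISO-8859-1 result cannot be decoded back to the original, because the '?' replacement loses information.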
3. contentEquals(CharSequence cs): checks whether the string contains the same characters as a character sequence
public boolean contentEquals(CharSequence cs) {
    if (count != cs.length())
        return false;
    // Argument is a StringBuffer, StringBuilder
    if (cs instanceof AbstractStringBuilder) {
        char v1[] = value;
        char v2[] = ((AbstractStringBuilder)cs).getValue();
        int i = offset;
        int j = 0;
        int n = count;
        while (n-- != 0) {
            if (v1[i++] != v2[j++])
                return false;
        }
    }
    // Argument is a String
    if (cs.equals(this))
        return true;
    // Argument is a generic CharSequence
    char v1[] = value;
    int i = offset;
    int j = 0;
    int n = count;
    while (n-- != 0) {
        if (v1[i++] != cs.charAt(j++))
            return false;
    }
    return true;
}
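The practical difference from equals() fits in a few lines, shown here with a hypothetical class ContentEqualsDemo. equals() only accepts another String, while contentEquals() compares against any CharSequence, such as a StringBuilder.

```java
// Hypothetical demo: equals() needs another String, while contentEquals()
// compares the characters of any CharSequence.
final class ContentEqualsDemo {
    static boolean[] run() {
        StringBuilder sb = new StringBuilder("abc");
        return new boolean[] {
            "abc".equals(sb),          // false: sb is not a String
            "abc".contentEquals(sb),   // true: same characters
            "abc".contentEquals("abd") // false: last char differs
        };
    }
}
```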
4. A nested class, a field and a method for case-insensitive comparison
public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                     = new CaseInsensitiveComparator();
private static class CaseInsensitiveComparator
                     implements Comparator<String>, java.io.Serializable {
    // use serialVersionUID from JDK 1.2.2 for interoperability
    private static final long serialVersionUID = 8575799808933029326L;

    public int compare(String s1, String s2) {
        int n1=s1.length(), n2=s2.length();
        for (int i1=0, i2=0; i1<n1 && i2<n2; i1++, i2++) {
            char c1 = s1.charAt(i1);
            char c2 = s2.charAt(i2);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }
}

/**
 * Compares two strings lexicographically, ignoring case
 * differences. This method returns an integer whose sign is that of
 * calling <code>compareTo</code> with normalized versions of the strings
 * where case differences have been eliminated by calling
 * <code>Character.toLowerCase(Character.toUpperCase(character))</code> on
 * each character.
 * <p>
 * Note that this method does <em>not</em> take locale into account,
 * and will result in an unsatisfactory ordering for certain locales.
 * The java.text package provides <em>collators</em> to allow
 * locale-sensitive ordering.
 *
 * @param   str   the <code>String</code> to be compared.
 * @return  a negative integer, zero, or a positive integer as the
 *          specified String is greater than, equal to, or less
 *          than this String, ignoring case considerations.
 * @see     java.text.Collator#compare(String, String)
 * @since   1.2
 */
public int compareToIgnoreCase(String str) {
    return CASE_INSENSITIVE_ORDER.compare(this, str);
}
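A sketch of where CASE_INSENSITIVE_ORDER matters in practice, which is sorting. The class CaseInsensitiveDemo and its sorted method are hypothetical; Collections.sort and String.CASE_INSENSITIVE_ORDER are the real APIs. Natural ordering puts uppercase letters first because 'C' (67) is smaller than 'a' (97).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo of String.CASE_INSENSITIVE_ORDER versus natural ordering.
final class CaseInsensitiveDemo {
    static List<String> sorted(List<String> input, boolean ignoreCase) {
        List<String> copy = new ArrayList<>(input); // leave the input untouched
        if (ignoreCase) {
            Collections.sort(copy, String.CASE_INSENSITIVE_ORDER);
        } else {
            Collections.sort(copy); // compareTo: 'C' (67) sorts before 'a' (97)
        }
        return copy;
    }
}
```

For ["banana", "Cherry", "apple"], natural ordering gives ["Cherry", "apple", "banana"], while the case-insensitive order gives ["apple", "banana", "Cherry"].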
5. startsWith(String prefix, int toffset): checks whether the part of the string beginning at index toffset starts with prefix
public boolean startsWith(String prefix, int toffset) {
    char ta[] = value;
    int to = offset + toffset;
    char pa[] = prefix.value;
    int po = prefix.offset;
    int pc = prefix.count;
    // Note: toffset might be near -1>>>1.
    if ((toffset < 0) || (toffset > count - pc)) {
        return false;
    }
    while (--pc >= 0) {
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}
6. hashCode() (formula: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]; computed on the first call and cached in hash)
public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}
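The loop evaluates the polynomial by Horner's rule, h = 31*h + c for each char, with int overflow simply wrapping around. A sketch in a hypothetical class HashDemo that computes the same value without the cache, for comparison with the real hashCode().

```java
// Hypothetical helper computing the same polynomial as String.hashCode()
// via Horner's rule: h = 31*h + c for each char, with int overflow wrapping.
final class HashDemo {
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

For "abc" this is (97*31 + 98)*31 + 99 = 96354. "Aa" and "BB" both hash to 2112, a classic collision. In this JDK 6 code, 0 doubles as "not computed yet", so a string whose hash really is 0 (such as "") recomputes it on every call.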
7. The core code behind indexOf and lastIndexOf
static int indexOf(char[] source, int sourceOffset, int sourceCount,
                   char[] target, int targetOffset, int targetCount,
                   int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }

    char first  = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first);
        }

        /* Found first character, now look at the rest of v2 */
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j] ==
                     target[k]; j++, k++);

            if (j == end) {
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1;
}

static int lastIndexOf(char[] source, int sourceOffset, int sourceCount,
                       char[] target, int targetOffset, int targetCount,
                       int fromIndex) {
    /*
     * Check arguments; return immediately where possible. For
     * consistency, don't check for null str.
     */
    int rightIndex = sourceCount - targetCount;
    if (fromIndex < 0) {
        return -1;
    }
    if (fromIndex > rightIndex) {
        fromIndex = rightIndex;
    }
    /* Empty string always matches. */
    if (targetCount == 0) {
        return fromIndex;
    }

    int strLastIndex = targetOffset + targetCount - 1;
    char strLastChar = target[strLastIndex];
    int min = sourceOffset + targetCount - 1;
    int i = min + fromIndex;

startSearchForLastChar:
    while (true) {
        while (i >= min && source[i] != strLastChar) {
            i--;
        }
        if (i < min) {
            return -1;
        }
        int j = i - 1;
        int start = j - (targetCount - 1);
        int k = strLastIndex - 1;

        while (j > start) {
            if (source[j--] != target[k--]) {
                i--;
                continue startSearchForLastChar;
            }
        }
        return start - sourceOffset + 1;
    }
}
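The indexOf helper above is a brute-force search: it scans for the first character of the target, then compares the rest. A simplified sketch of the same idea on String instead of raw arrays with offsets, in a hypothetical class NaiveSearch. It keeps the edge-case rules from the top of the JDK helper, so it can be checked against the real String.indexOf(String, int).

```java
// Hypothetical class NaiveSearch: the same brute-force search as the JDK 6
// indexOf helper, simplified to work on String with no offset parameters.
final class NaiveSearch {
    static int indexOf(String source, String target, int fromIndex) {
        int n = source.length();
        int m = target.length();
        if (fromIndex >= n) {
            return m == 0 ? n : -1; // only the empty target matches at the end
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (m == 0) {
            return fromIndex; // the empty string matches anywhere
        }
        char first = target.charAt(0);
        int max = n - m; // last position where target could still fit
        for (int i = fromIndex; i <= max; i++) {
            if (source.charAt(i) != first) {
                continue; // keep looking for the first character
            }
            int k = 1;
            while (k < m && source.charAt(i + k) == target.charAt(k)) {
                k++; // compare the rest of target
            }
            if (k == m) {
                return i;
            }
        }
        return -1;
    }
}
```

In the worst case this does about n*m comparisons, the same as the JDK 6 helper; it trades asymptotic speed for simplicity and no extra memory.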