Notes on String in JDK 1.8

Source: Internet. Editor: 程序博客网. Date: 2024/06/04 19:19

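Why is the String class in Java declared final?

Mainly for efficiency and security. String is one of the most heavily used classes; if it could be subclassed, overridable methods could degrade program performance, so String is declared final.

1. It prevents other classes from extending it. This is probably not the ultimate reason, but it can be counted as one for now.

String is a very low-level class that security-sensitive code depends on: class names, file paths, and network addresses are all passed around as String values. (In the JDK 1.8 source, nearly all of String is written in plain Java, and intern() is its only native method, so the common claim that many of its methods call operating-system APIs through native calls does not hold.) If such a class could be subclassed and its methods overridden, a malicious subclass could misbehave exactly where the platform trusts String the most, for example by reporting one value during a security check and a different one afterwards.
2. Almost all member fields of String are private final, which makes String an immutable class. This helps sharing and improves performance: string objects can be stored in the string constant pool and shared by every literal with the same value. If String objects were mutable, they could not be shared this way, because changing the object through one String variable would also change the value seen by every other variable that refers to the same shared object.
3. String is an immutable class: value[] is private final (the offset and count fields of earlier JDKs no longer exist in JDK 1.8), so access to a String object from multiple threads is safe. Many features of the Java language depend on String objects being immutable.
Looking back at the list, items 2 and 3 are not reasons for making the class final, but they are reasons for making String immutable.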
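The sharing described in item 2 can be observed directly. The following is a minimal sketch; the class name StringPoolDemo and its method names are made up for illustration and do not come from the JDK.

```java
// Illustrative sketch; StringPoolDemo and its methods are hypothetical names.
class StringPoolDemo {
    // Two identical literals refer to the same pooled object.
    static boolean literalsShareObject() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // new String(...) creates a distinct object with equal content.
    static boolean newStringSharesObject() {
        String a = "hello";
        String c = new String("hello");
        return a == c;
    }

    // intern() returns the pooled instance for the same content.
    static boolean internedSharesObject() {
        String a = "hello";
        String c = new String("hello");
        return a == c.intern();
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());   // true
        System.out.println(newStringSharesObject()); // false
        System.out.println(internedSharesObject());  // true
    }
}
```

Sharing one pooled object between many variables is only safe because none of them can change its contents.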

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

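A String stores its characters in this char array. The field is final, so the reference can never point to a different array; and because the array is private, is never handed out, and is never modified after construction, its contents cannot change either. Together these make String immutable.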

Constructors

(1) No-argument constructor

/**
 * Initializes a newly created {@code String} object so that it represents
 * an empty character sequence.  Note that use of this constructor is
 * unnecessary since Strings are immutable.
 */
public String() {
    this.value = new char[0];
}

Creates a string containing zero characters.
As the Javadoc itself points out, because String is immutable there is no need to use the no-argument constructor at all.

(2) String argument

/**
 * Initializes a newly created {@code String} object so that it represents
 * the same sequence of characters as the argument; in other words, the
 * newly created string is a copy of the argument string. Unless an
 * explicit copy of {@code original} is needed, use of this constructor is
 * unnecessary since Strings are immutable.
 *
 * @param  original
 *         A {@code String}
 */
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}

The constructor simply points value at the value array of original. Since both strings are immutable, sharing the array is safe, and the constructor is rarely needed. For example, String s1 = new String("s1s1"); String s2 = new String(s1); is pointless; it is better to assign the reference directly: s2 = s1;

(3) char[] argument

public String(char value[]) {
    this.value = Arrays.copyOf(value, value.length);
}

When a String is built from a char array, the contents are copied into value. It is a real copy, not just a second reference to the caller's array, so later changes to that array cannot affect the String.

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

The difference from the previous constructor is that only part of the char array is used to build the String: offset is the starting index and count is the number of characters to take.
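A small sketch of both char[] constructors may help here. The class and method names (CharArrayConstructorDemo and so on) are invented for this example.

```java
// Illustrative sketch; CharArrayConstructorDemo is a hypothetical name.
class CharArrayConstructorDemo {
    // The constructor copies the array, so mutating the source afterwards
    // does not change the String.
    static String copyThenMutateSource() {
        char[] chars = {'j', 'a', 'v', 'a'};
        String s = new String(chars);
        chars[0] = 'l'; // only the caller's array changes
        return s;
    }

    // offset = 1, count = 3: takes the characters at indexes 1, 2 and 3.
    static String slice() {
        char[] chars = {'s', 't', 'r', 'i', 'n', 'g'};
        return new String(chars, 1, 3);
    }

    // offset + count beyond the array length is rejected.
    static boolean rejectsBadRange() {
        try {
            new String(new char[] {'a', 'b'}, 1, 5);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(copyThenMutateSource()); // java
        System.out.println(slice());                // tri
        System.out.println(rejectsBadRange());      // true
    }
}
```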

(4) byte[]

A module that aims to be broadly applicable inevitably carries a pile of adapter code. Below is a series of constructors that build a String from a byte[] array. They differ mainly in whether a specific charset has to be given for decoding, which matters a great deal in web and network programming.

public String(byte bytes[], Charset charset) {
    this(bytes, 0, bytes.length, charset);
}

// Decodes length bytes starting at offset using the default charset
public String(byte bytes[], int offset, int length) {
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(bytes, offset, length);
}

public String(byte bytes[]) {
    this(bytes, 0, bytes.length);
}

// Charset, start position, and length are all specified
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value =  StringCoding.decode(charset, bytes, offset, length);
}
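To see why the charset argument matters, here is a minimal sketch that decodes the same bytes with a matching and a mismatched charset. ByteDecodingDemo and its members are hypothetical names; the sample text is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch; ByteDecodingDemo is a hypothetical name.
class ByteDecodingDemo {
    // Two CJK characters (U+4E2D, U+6587), each 3 bytes long in UTF-8.
    static final String ORIGINAL = "\u4e2d\u6587";

    static int utf8ByteCount() {
        return ORIGINAL.getBytes(StandardCharsets.UTF_8).length;
    }

    // Decoding with the charset used for encoding restores the text.
    static String roundTripUtf8() {
        byte[] bytes = ORIGINAL.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decoding with a different charset yields one char per byte: garbage.
    static String decodeAsLatin1() {
        byte[] bytes = ORIGINAL.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, 0, bytes.length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(utf8ByteCount());                  // 6
        System.out.println(roundTripUtf8().equals(ORIGINAL)); // true
        System.out.println(decodeAsLatin1().length());        // 6
    }
}
```

Passing the charset explicitly avoids depending on the platform default, which can differ between machines.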

(5) StringBuilder and StringBuffer arguments

// Copying the buffer is not an atomic operation, so the constructor
// still synchronizes on the buffer
public String(StringBuffer buffer) {
    synchronized(buffer) {
        this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
    }
}

public String(StringBuilder builder) {
    this.value = Arrays.copyOf(builder.getValue(), builder.length());
}

The effect is the same as calling toString(), which is the more customary choice.

Important methods

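(1) length()

public int length() {
    return value.length;
}

Returns the number of chars in the string, that is, the length of the value array.

(2) isEmpty()

public boolean isEmpty() {
    return value.length == 0;
}

Checks whether the string is empty, which only requires checking that the length of the value array is 0.

(3) charAt(int index)

public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}

Returns the character at the given zero-based index: a bounds check followed by a lookup in the value array.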

(4) getBytes()

public byte[] getBytes() {
    return StringCoding.encode(value, 0, value.length);
}

// Encodes with the named charset
public byte[] getBytes(String charsetName)
        throws UnsupportedEncodingException {
    if (charsetName == null) throw new NullPointerException();
    return StringCoding.encode(charsetName, value, 0, value.length);
}

public byte[] getBytes(Charset charset) {
    if (charset == null) throw new NullPointerException();
    return StringCoding.encode(charset, value, 0, value.length);
}

Converts the String into a byte array, using either the default charset or the one specified.

(5) equals()

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

equals is overridden: after an identity check and a length check, it compares the two strings character by character.

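(6) compareTo(String anotherString)

public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) {
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    }
    return len1 - len2;
}

Compares two strings. If the two character sequences are equal, the result is 0. Otherwise the strings are compared from index 0, and the result is the difference between the first pair of characters that differ. In the remaining case, where the shorter string is a prefix of the longer one, the result is the difference of their lengths.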
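The three cases can be checked with a few calls. This is a small sketch; CompareToDemo and its method names are made up for illustration.

```java
// Illustrative sketch; CompareToDemo is a hypothetical name.
class CompareToDemo {
    // First difference is at index 2: 'p' (112) minus 'r' (114).
    static int firstDifference() {
        return "apple".compareTo("apricot");
    }

    // "java" is a prefix of "javascript": result is 4 - 10.
    static int prefixCase() {
        return "java".compareTo("javascript");
    }

    // Identical character sequences compare as 0.
    static int equalCase() {
        return "java".compareTo("java");
    }

    public static void main(String[] args) {
        System.out.println(firstDifference()); // -2
        System.out.println(prefixCase());      // -6
        System.out.println(equalCase());       // 0
    }
}
```

Callers should normally rely only on the sign of the result, not on its exact value.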

(7) regionMatches(int toffset, String other, int ooffset, int len)

/* @param   toffset   the starting offset of the subregion in this string.
 * @param   other     the string argument.
 * @param   ooffset   the starting offset of the subregion in the string
 *                    argument.
 * @param   len       the number of characters to compare.
 * @return  {@code true} if the specified subregion of this string
 *          exactly matches the specified subregion of the string argument;
 *          {@code false} otherwise.
 */
public boolean regionMatches(int toffset, String other, int ooffset,
        int len) {
    char ta[] = value;
    int to = toffset;
    char pa[] = other.value;
    int po = ooffset;
    // Note: toffset, ooffset, or len might be near -1>>>1.
    if ((ooffset < 0) || (toffset < 0)
            || (toffset > (long)value.length - len)
            || (ooffset > (long)other.value.length - len)) {
        return false;
    }
    while (len-- > 0) {
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}

Checks whether a region of this string equals a region of another string, that is, whether the two strings match over a given range. Offsets that fall outside either string make the method return false rather than throw.

(8) equalsIgnoreCase(String anotherString)

public boolean equalsIgnoreCase(String anotherString) {
    return (this == anotherString) ? true
            : (anotherString != null)
            && (anotherString.value.length == value.length)
            && regionMatches(true, 0, anotherString, 0, value.length);
}

Checks whether two strings are equal when case is ignored. Most of the work is done by this overload of regionMatches:

public boolean regionMatches(boolean ignoreCase, int toffset,
        String other, int ooffset, int len) {
    char ta[] = value;
    int to = toffset;
    char pa[] = other.value;
    int po = ooffset;
    // Note: toffset, ooffset, or len might be near -1>>>1.
    if ((ooffset < 0) || (toffset < 0)
            || (toffset > (long)value.length - len)
            || (ooffset > (long)other.value.length - len)) {
        return false;
    }
    while (len-- > 0) {
        char c1 = ta[to++];
        char c2 = pa[po++];
        // Check for equality first: if the characters are identical,
        // the rest of the loop body can be skipped, which is faster
        if (c1 == c2) {
            continue;
        }
        if (ignoreCase) {
            // If characters don't match but case may be ignored,
            // try converting both characters to uppercase.
            // If the results match, then the comparison scan should
            // continue.
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            // Both converted to uppercase; if equal, move on
            if (u1 == u2) {
                continue;
            }
            // Unfortunately, conversion to uppercase does not work properly
            // for the Georgian alphabet, which has strange rules about case
            // conversion.  So we need to make one last check before
            // exiting.
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
        }
        return false;
    }
    return true;
}

The comparison itself is not hard to follow. The cheap checks (identity, null, length, exact character match) come first for efficiency, and the final lowercase comparison is there for correctness with alphabets such as Georgian.
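A few calls illustrate both regionMatches overloads and equalsIgnoreCase. This is only a sketch, and RegionMatchesDemo with its method names is invented for the example.

```java
// Illustrative sketch; RegionMatchesDemo is a hypothetical name.
class RegionMatchesDemo {
    static final String TEXT = "Hello World";

    // Compare 5 chars of TEXT starting at index 6 with "World" from index 0.
    static boolean exactRegion() {
        return TEXT.regionMatches(6, "World", 0, 5);
    }

    // Same region, different case: the case-sensitive overload fails.
    static boolean caseSensitiveMismatch() {
        return TEXT.regionMatches(6, "WORLD", 0, 5);
    }

    // ignoreCase = true makes the same comparison succeed.
    static boolean ignoreCaseRegion() {
        return TEXT.regionMatches(true, 6, "WORLD", 0, 5);
    }

    // A region that runs past the end returns false instead of throwing.
    static boolean outOfRangeRegion() {
        return "abc".regionMatches(2, "abc", 0, 3);
    }

    // equalsIgnoreCase accepts null and simply returns false.
    static boolean ignoreCaseWithNull() {
        return "Java".equalsIgnoreCase(null);
    }

    public static void main(String[] args) {
        System.out.println(exactRegion());                 // true
        System.out.println(caseSensitiveMismatch());       // false
        System.out.println(ignoreCaseRegion());            // true
        System.out.println(outOfRangeRegion());            // false
        System.out.println("Java".equalsIgnoreCase("JAVA")); // true
        System.out.println(ignoreCaseWithNull());          // false
    }
}
```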

(9) startsWith(String prefix, int toffset)

public boolean startsWith(String prefix, int toffset) {
    char ta[] = value;
    int to = toffset;
    char pa[] = prefix.value;
    int po = 0;
    int pc = prefix.value.length;
    // Note: toffset might be near -1>>>1.
    if ((toffset < 0) || (toffset > value.length - pc)) {
        return false;
    }
    while (--pc >= 0) {
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}

Checks whether the part of this string that begins at index toffset starts with prefix.

public boolean startsWith(String prefix) {
    return startsWith(prefix, 0);
}

Checks whether the String starts with the string prefix.

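(10) endsWith(String suffix)

public boolean endsWith(String suffix) {
    return startsWith(suffix, value.length - suffix.value.length);
}

Checks whether the String ends with suffix. It simply reuses startsWith: when suffix is longer than the string, the computed offset is negative and startsWith returns false.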
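The relationship between the three methods is easy to try out. A minimal sketch follows; PrefixSuffixDemo and its method names are hypothetical.

```java
// Illustrative sketch; PrefixSuffixDemo is a hypothetical name.
class PrefixSuffixDemo {
    static final String NAME = "report.pdf";

    static boolean startsWithReport() {
        return NAME.startsWith("report");
    }

    // Indexes 2..5 of "report.pdf" are 'p','o','r','t'.
    static boolean startsWithPortAtTwo() {
        return NAME.startsWith("port", 2);
    }

    static boolean endsWithPdf() {
        return NAME.endsWith(".pdf");
    }

    // A suffix longer than the string gives false, not an exception.
    static boolean endsWithLongerSuffix() {
        return "a".endsWith("a much longer suffix");
    }

    public static void main(String[] args) {
        System.out.println(startsWithReport());     // true
        System.out.println(startsWithPortAtTwo());  // true
        System.out.println(endsWithPdf());          // true
        System.out.println(endsWithLongerSuffix()); // false
    }
}
```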

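(11) indexOf(int ch)

// Locates the index of the first occurrence of ch;
// implemented by calling indexOf(int ch, int fromIndex)
public int indexOf(int ch) {
    return indexOf(ch, 0);
}

// Finds the first occurrence of ch in this string at or after fromIndex
public int indexOf(int ch, int fromIndex) {
    final int max = value.length;
    if (fromIndex < 0) {
        fromIndex = 0;
    } else if (fromIndex >= max) {
        // Note: fromIndex might be near -1>>>1.
        return -1;
    }

    if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
        // handle most cases here (ch is a BMP code point or a
        // negative value (invalid code point))
        final char[] value = this.value;
        for (int i = fromIndex; i < max; i++) {
            if (value[i] == ch) {
                return i;
            }
        }
        return -1;
    } else {
        return indexOfSupplementary(ch, fromIndex);
    }
}

In practice, however, the method is usually called like this:
    String s="abcdefg";
    int idx=s.indexOf('f');//idx=5
So we do not pass an int argument; we pass a char directly.
This relies on automatic promotion in implicit type conversion: when a value of a narrower numeric type is assigned to a variable of a wider type, the conversion happens automatically. Here the char is widened to int. The parameter is an int rather than a char so that supplementary code points (above U+FFFF) can be searched for as well, as the indexOfSupplementary branch shows.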
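The following sketch shows the char-to-int widening and the supplementary code point branch side by side. IndexOfCharDemo and its method names are invented for illustration; the emoji is written as a surrogate pair escape so the example does not depend on the file encoding.

```java
// Illustrative sketch; IndexOfCharDemo is a hypothetical name.
class IndexOfCharDemo {
    // The char literal 'f' is widened to the int 102 automatically.
    static int withCharLiteral() {
        return "abcdefg".indexOf('f');
    }

    // Passing the numeric code point directly finds the same position.
    static int withIntCodePoint() {
        return "abcdefg".indexOf(102);
    }

    // U+1F600 lies above U+FFFF and is stored as two chars (a surrogate
    // pair); indexOf returns the index of the first char of the pair.
    static int withSupplementaryCodePoint() {
        String s = "a\uD83D\uDE00b";
        return s.indexOf(0x1F600);
    }

    // A negative fromIndex is treated as 0.
    static int withNegativeFromIndex() {
        return "abcdefg".indexOf('a', -5);
    }

    public static void main(String[] args) {
        System.out.println(withCharLiteral());            // 5
        System.out.println(withIntCodePoint());           // 5
        System.out.println(withSupplementaryCodePoint()); // 1
        System.out.println(withNegativeFromIndex());      // 0
    }
}
```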

(12) lastIndexOf(int ch)

// Finds the last occurrence of ch in this string
public int lastIndexOf(int ch) {
    return lastIndexOf(ch, value.length - 1);
}

public int lastIndexOf(int ch, int fromIndex) {
    if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
        // handle most cases here (ch is a BMP code point or a
        // negative value (invalid code point))
        final char[] value = this.value;
        int i = Math.min(fromIndex, value.length - 1);
        for (; i >= 0; i--) {
            if (value[i] == ch) {
                return i;
            }
        }
        return -1;
    } else {
        return lastIndexOfSupplementary(ch, fromIndex);
    }
}

Return value: the index of the last occurrence of the character that is less than or equal to fromIndex in the character sequence represented by this object, or -1 if the character does not occur at or before that point.

(13) indexOf(String str)

public int indexOf(String str) {
    return indexOf(str, 0);
}

public int indexOf(String str, int fromIndex) {
    return indexOf(value, 0, value.length,
            str.value, 0, str.value.length, fromIndex);
}

Finds the position of the first occurrence of the substring str in this string.

The call ultimately ends up in the code below. It may look a little messy, but once the parameters are clear, the whole process is clear.

/* @param   source       the characters being searched.      // here, the value array
 * @param   sourceOffset offset of the source string.        // offset into the source string
 * @param   sourceCount  count of the source string.         // here, the length of the value array
 * @param   target       the characters being searched for.  // the string to search for
 * @param   targetOffset offset of the target string.        // offset into the target string
 * @param   targetCount  count of the target string.         // length of the target string
 * @param   fromIndex    the index to begin searching from.  // start position
 */
static int indexOf(char[] source, int sourceOffset, int sourceCount,
        char[] target, int targetOffset, int targetCount,
        int fromIndex) {
    if (fromIndex >= sourceCount) { // start index is past the end
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }

    char first = target[targetOffset]; // first character of the target
    // Largest index at which a match may still start, because at least
    // targetCount characters must remain from that point on
    int max = sourceOffset + (sourceCount - targetCount);

    // Core of the search: find the first character, then compare the
    // following ones, until the target matches completely or max is passed
    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first);
        }

        /* Found first character, now look at the rest of v2 */
        // Note that i only tracks the position of the first character: after
        // a partial match the search has to resume from the next position,
        // so a separate index j is used for the remaining characters
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j]
                    == target[k]; j++, k++);

            if (j == end) {
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1; // no match found
}

This search code is very well written: concise and clear. Reading this part alone makes studying the String source worthwhile.

(14) lastIndexOf(String str)

public int lastIndexOf(String str) {
    // fromIndex is the length of the value array: the match runs backwards,
    // so the search starts from the end of the string
    return lastIndexOf(str, value.length);
}

Finds the position of the last occurrence of the substring str in this string. It calls the following code:

/*Returns the index within this string of the last occurrence of the
 * specified substring, searching backward starting at the specified index.
 * <p>The returned index is the largest value <i>k</i> for which:
 * <blockquote><pre>
 * <i>k</i> {@code <=} fromIndex
 */
// The wording is convoluted. It means that the returned index must be less
// than or equal to fromIndex, and is the largest such index; in other words,
// fromIndex is the position at which the backward search begins.
public int lastIndexOf(String str, int fromIndex) {
    return lastIndexOf(value, 0, value.length,
            str.value, 0, str.value.length, fromIndex);
}

The method that is finally called is shown below. It is similar to the one above, except that the search runs from the end, so the target is also matched backwards, starting with its last character.

static int lastIndexOf(char[] source, int sourceOffset, int sourceCount,
        char[] target, int targetOffset, int targetCount,
        int fromIndex) {
    /*
     * Check arguments; return immediately where possible. For
     * consistency, don't check for null str.
     */
    // Largest index at which a match may start, like max above
    int rightIndex = sourceCount - targetCount;
    if (fromIndex < 0) {
        return -1;
    }
    if (fromIndex > rightIndex) {
        fromIndex = rightIndex;
    }
    /* Empty string always matches. */
    if (targetCount == 0) {
        return fromIndex;
    }

    // Index of the last character of the target
    int strLastIndex = targetOffset + targetCount - 1;
    // The last character of the target
    char strLastChar = target[strLastIndex];
    // Smallest source index that the target's last character can match
    int min = sourceOffset + targetCount - 1;
    // i is always the source index compared with the target's last character
    int i = min + fromIndex;

startSearchForLastChar:
    while (true) {
        while (i >= min && source[i] != strLastChar) {
            i--;
        }
        // Below min no match is possible any more
        if (i < min) {
            return -1;
        }
        int j = i - 1;
        int start = j - (targetCount - 1);
        int k = strLastIndex - 1;

        while (j > start) {
            if (source[j--] != target[k--]) {
                // Partial match whose earlier part fails: abandon it and
                // slide the whole window one position to the left
                i--;
                continue startSearchForLastChar; // jump to the outer while loop
            }
        }
        return start - sourceOffset + 1;
    }
}

This mirrors the indexOf method; the only difference is that the search runs backwards.

This is a good point to look at a few examples of substring matching.
    public static void main(String[] args){
        String s1="java java java";
        // Two indexOf cases
        System.out.println(s1.indexOf("java"));   // prints 0
        System.out.println(s1.indexOf("java",2)); // prints 5: first match starting at index >= 2
        System.out.println(s1.indexOf("java",9)); // prints 10: first match starting at index >= 9
    }

    public static void main(String[] args){
        String s1="java java java";
        // Now lastIndexOf
        System.out.println(s1.lastIndexOf("java"));   // prints 10
        System.out.println(s1.lastIndexOf("java",2)); // prints 0: result must be <= 2, searching leftwards from 2
        System.out.println(s1.lastIndexOf("java",9)); // prints 5: result must be <= 9, searching leftwards from 9
    }

(15) substring(int beginIndex)

public String substring(int beginIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    int subLen = value.length - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}

Note that the method is named substring, not subString.

Returns the substring from beginIndex to the end of the string. A new String object is created unless beginIndex is 0, in which case the string itself is returned. Because the String(char[], int, int) constructor copies the requested range, the new string does not share the original array.

/*
 * Returns a string that is a substring of this string. The
 * substring begins at the specified {@code beginIndex} and
 * extends to the character at index {@code endIndex - 1}.
 * Thus the length of the substring is {@code endIndex-beginIndex}.
 */
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}

Returns the substring from beginIndex up to endIndex. The character at endIndex is not included, which is why the length is endIndex - beginIndex.
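The half-open range is the part that most often causes off-by-one mistakes, so here is a short sketch. SubstringDemo and its method names are hypothetical.

```java
// Illustrative sketch; SubstringDemo is a hypothetical name.
class SubstringDemo {
    static final String TEXT = "hello world"; // length 11

    // From index 6 to the end.
    static String tail() {
        return TEXT.substring(6);
    }

    // Indexes 0..4; index 5 (the space) is excluded.
    static String head() {
        return TEXT.substring(0, 5);
    }

    // beginIndex == endIndex is legal and yields the empty string.
    static String empty() {
        return TEXT.substring(5, 5);
    }

    // endIndex < beginIndex is rejected.
    static boolean rejectsReversedRange() {
        try {
            TEXT.substring(3, 2);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(tail());                 // world
        System.out.println(head());                 // hello
        System.out.println(empty().isEmpty());      // true
        System.out.println(rejectsReversedRange()); // true
    }
}
```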

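(16) concat(String str)

public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    int len = value.length;
    char buf[] = Arrays.copyOf(value, len + otherLen);
    str.getChars(buf, len);
    return new String(buf, true);
}

Joins this String with str. For non-null operands the result has the same content as the + operator would give, but as the code shows, a new String object is created (unless str is empty), so be careful with == on String objects and always compare with equals().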
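The differences from the + operator are small but can be demonstrated. This is a minimal sketch; ConcatDemo and its method names are made up for the example.

```java
// Illustrative sketch; ConcatDemo is a hypothetical name.
class ConcatDemo {
    static String joined() {
        return "foo".concat("bar");
    }

    // The concatenated result is a new object, so == against a literal
    // with the same content is false while equals() is true.
    static boolean sameObjectAsLiteral() {
        return joined() == "foobar";
    }

    // The + operator turns a null reference into the text "null".
    static String plusWithNull() {
        String nothing = null;
        return "foo" + nothing;
    }

    // concat calls str.length() first, so null causes a NullPointerException.
    static boolean concatWithNullThrows() {
        try {
            "foo".concat(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(joined().equals("foobar")); // true
        System.out.println(sameObjectAsLiteral());     // false
        System.out.println(plusWithNull());            // foonull
        System.out.println(concatWithNullThrows());    // true
    }
}
```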

concat relies mainly on getChars(buf, len), and that getChars method is just a thin wrapper around an array copy:

/**
 * Copy characters from this string into dst starting at dstBegin.
 * This method doesn't perform any range checking.
 */
void getChars(char dst[], int dstBegin) {
    System.arraycopy(value, 0, dst, dstBegin, value.length);
}

There is also a public version of the method for external callers:

/*
 * Copies characters from this string into the destination character
 * array.
 * <p>
 * The first character to be copied is at index {@code srcBegin};
 * the last character to be copied is at index {@code srcEnd-1}
 * (thus the total number of characters to be copied is
 * {@code srcEnd-srcBegin}). The characters are copied into the
 * subarray of {@code dst} starting at index {@code dstBegin}
 * and ending at index:
 * <blockquote><pre>
 *     dstbegin + (srcEnd-srcBegin) - 1
 * </pre></blockquote>
 *
 * @param      srcBegin   index of the first character in the string
 *                        to copy.
 * @param      srcEnd     index after the last character in the string
 *                        to copy.
 * @param      dst        the destination array.
 * @param      dstBegin   the start offset in the destination array.
 * @exception IndexOutOfBoundsException If any of the following
 */
public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > value.length) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}

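(17) replace(char oldChar, char newChar)

public String replace(char oldChar, char newChar) {
    if (oldChar != newChar) {
        int len = value.length;
        int i = -1;
        char[] val = value; /* avoid getfield opcode */

        while (++i < len) {
            if (val[i] == oldChar) {
                break;
            }
        }
        // Because String is immutable, changing even a single character
        // means building another String object, and building that object
        // involves copying the characters again, which is comparatively costly
        if (i < len) {
            char buf[] = new char[len];
            // Copy the leading part, which contains no oldChar
            for (int j = 0; j < i; j++) {
                buf[j] = val[j];
            }
            while (i < len) {
                char c = val[i];
                buf[i] = (c == oldChar) ? newChar : c;
                i++;
            }
            return new String(buf, true);
        }
    }
    return this;
}

Replaces every oldChar in the string with newChar.
One question arises: why not copy and replace in a single pass? Why scan first to see whether oldChar occurs at all? Most likely to avoid allocating a new String: when oldChar is absent, the method simply returns the current object. Because String is immutable, returning the same object is safe; nothing done later through the returned reference can affect the original String.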
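The behaviour can be confirmed with a short sketch. ReplaceCharDemo and its method names are hypothetical; the comment about returning the same object describes the JDK 1.8 code shown above and is deliberately not asserted in the example.

```java
// Illustrative sketch; ReplaceCharDemo is a hypothetical name.
class ReplaceCharDemo {
    // Every occurrence is replaced, not only the first one.
    static String replaceAllOccurrences() {
        return "hello".replace('l', 'L');
    }

    // With no occurrence of oldChar the content is unchanged (in the
    // JDK 1.8 source above, the very same object is returned).
    static String replaceMissingChar() {
        return "hello".replace('z', 'x');
    }

    // The original string is never modified by replace.
    static String originalAfterReplace() {
        String original = "hello";
        original.replace('h', 'j'); // result deliberately ignored
        return original;
    }

    public static void main(String[] args) {
        System.out.println(replaceAllOccurrences()); // heLLo
        System.out.println(replaceMissingChar());    // hello
        System.out.println(originalAfterReplace());  // hello
    }
}
```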

(18) Methods related to regular-expression matching

public boolean matches(String regex) {
    return Pattern.matches(regex, this);
}
Checks whether the entire string matches the regular expression.

public String replaceFirst(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
}
Replaces the first match with the string replacement and returns the new string.

public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
Replaces every match with the string replacement and returns the new string.

public String[] split(String regex) {
    return split(regex, 0);
}
Splits the string around matches of the given expression and returns the pieces as a String array.
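Since all four methods take a regular expression rather than a literal string, a few calls are worth trying. This is a sketch only; RegexMethodsDemo and its method names are invented for illustration.

```java
// Illustrative sketch; RegexMethodsDemo is a hypothetical name.
class RegexMethodsDemo {
    // matches() must match the whole string, not just a part of it.
    static boolean wholeDateMatches() {
        return "2024-06-04".matches("\\d{4}-\\d{2}-\\d{2}");
    }

    static boolean partialMatchIsNotEnough() {
        return "abc123".matches("[a-z]+");
    }

    static String firstRunOfDigits() {
        return "a1b22c".replaceFirst("[0-9]+", "#");
    }

    static String everyRunOfDigits() {
        return "a1b22c".replaceAll("[0-9]+", "#");
    }

    // With limit 0, trailing empty strings are dropped from the result.
    static int piecesSplitOnComma() {
        return "a,b,,c,,".split(",").length;
    }

    // An unescaped "." is a regex that matches any character, so every
    // piece is empty and the result array is empty.
    static int piecesSplitOnUnescapedDot() {
        return "a.b.c".split(".").length;
    }

    static int piecesSplitOnEscapedDot() {
        return "a.b.c".split("\\.").length;
    }

    public static void main(String[] args) {
        System.out.println(wholeDateMatches());          // true
        System.out.println(partialMatchIsNotEnough());   // false
        System.out.println(firstRunOfDigits());          // a#b22c
        System.out.println(everyRunOfDigits());          // a#b#c
        System.out.println(piecesSplitOnComma());        // 4
        System.out.println(piecesSplitOnUnescapedDot()); // 0
        System.out.println(piecesSplitOnEscapedDot());   // 3
    }
}
```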

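(19) trim()

public String trim() {
    int len = value.length;
    int st = 0;
    char[] val = value;    /* avoid getfield opcode */

    while ((st < len) && (val[st] <= ' ')) {
        st++;
    }
    while ((st < len) && (val[len - 1] <= ' ')) {
        len--;
    }
    return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
}

trim() strips leading and trailing whitespace. More precisely, it strips every character whose code is less than or equal to ' ' (U+0020), which covers spaces, tabs, newlines, and other control characters. The implementation is simple: find the index of the first character to keep and the index just past the last character to keep,
then return the substring in between. Because substring is used and its end index is exclusive, the len variable has to be handled with care.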
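Because the test is "less than or equal to ' '", some characters that look like whitespace are not removed. A minimal sketch follows; TrimDemo and its method names are hypothetical.

```java
// Illustrative sketch; TrimDemo is a hypothetical name.
class TrimDemo {
    // Spaces, tabs and newlines are all <= ' ' and are stripped.
    static String asciiWhitespace() {
        return " \t hi \n".trim();
    }

    // Control characters such as U+0001 are also <= ' '.
    static String controlCharacter() {
        return "\u0001hi".trim();
    }

    // The ideographic space U+3000 is greater than ' ', so trim() keeps it.
    static int lengthWithIdeographicSpaces() {
        return "\u3000hi\u3000".trim().length();
    }

    public static void main(String[] args) {
        System.out.println(asciiWhitespace());             // hi
        System.out.println(controlCharacter());            // hi
        System.out.println(lengthWithIdeographicSpaces()); // 4
    }
}
```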

(20) toCharArray()

public char[] toCharArray() {
    // Cannot use Arrays.copyOf because of class initialization order issues
    char result[] = new char[value.length];
    System.arraycopy(value, 0, result, 0, value.length);
    return result;
}

Returns a char[] array, which amounts to making a fresh copy of the internal array and returning it.
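Returning a copy is what keeps the internal array out of reach of callers. A short sketch, with ToCharArrayDemo and its method names invented for the example:

```java
// Illustrative sketch; ToCharArrayDemo is a hypothetical name.
class ToCharArrayDemo {
    // Changing the returned array does not change the String.
    static String mutateReturnedArray() {
        String s = "java";
        char[] chars = s.toCharArray();
        chars[0] = 'l';
        return s;
    }

    // Each call hands out a different array object.
    static boolean returnsFreshArrayEachTime() {
        String s = "java";
        return s.toCharArray() != s.toCharArray();
    }

    public static void main(String[] args) {
        System.out.println(mutateReturnedArray());       // java
        System.out.println(returnsFreshArrayEachTime()); // true
    }
}
```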

0 0