Why doesn't String.indexOf in the JDK use KMP, Boyer-Moore, or another algorithm with lower time complexity?

Source: Internet. Editor: 程序博客网. Time: 2024/05/21 18:34

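I was working through problems on LeetCode today and hit a string-search problem, which reminded me of the KMP and Boyer-Moore algorithms I had read about before. KMP runs in O(n + m) time in the worst case, and Boyer-Moore and its relatives are typically about linear or better in practice.

Afterwards I looked at how the indexOf method of the JDK's String class is implemented, and found it surprising: it just uses brute force, the most basic approach, whose worst-case time complexity is O(n*m).

String.indexOf(String s) calls the following method: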

    /**
     * Code shared by String and StringBuffer to do searches. The
     * source is the character array being searched, and the target
     * is the string being searched for.
     *
     * @param   source       the characters being searched.
     * @param   sourceOffset offset of the source string.
     * @param   sourceCount  count of the source string.
     * @param   target       the characters being searched for.
     * @param   targetOffset offset of the target string.
     * @param   targetCount  count of the target string.
     * @param   fromIndex    the index to begin searching from.
     */
    static int indexOf(char[] source, int sourceOffset, int sourceCount,
            char[] target, int targetOffset, int targetCount,
            int fromIndex) {
        if (fromIndex >= sourceCount) {
            return (targetCount == 0 ? sourceCount : -1);
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (targetCount == 0) {
            return fromIndex;
        }

        char first = target[targetOffset];
        int max = sourceOffset + (sourceCount - targetCount);

        for (int i = sourceOffset + fromIndex; i <= max; i++) {
            /* Look for first character. */
            if (source[i] != first) {
                while (++i <= max && source[i] != first);
            }

            /* Found first character, now look at the rest of v2 */
            if (i <= max) {
                int j = i + 1;
                int end = j + targetCount - 1;
                for (int k = targetOffset + 1; j < end && source[j]
                        == target[k]; j++, k++);

                if (j == end) {
                    /* Found whole string. */
                    return i - sourceOffset;
                }
            }
        }
        return -1;
    }
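To get a feel for when the O(n*m) bound actually bites, here is a rough sketch that counts character comparisons in a simplified brute-force search. It is not the JDK code: NaiveCostSketch, countComparisons and repeat are names made up for this illustration, and the loop leaves out the offsets and the first-character scan of the method above.

```java
final class NaiveCostSketch {

    /** Hypothetical helper: builds a string of n copies of c (works on Java 8). */
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Simplified brute-force search that returns the number of character
     * comparisons performed, stopping at the first match if there is one.
     */
    static long countComparisons(String text, String pattern) {
        long comparisons = 0;
        int n = text.length();
        int m = pattern.length();
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m) {
                comparisons++;
                if (text.charAt(i + j) != pattern.charAt(j)) {
                    break; // mismatch: slide the pattern one position to the right
                }
                j++;
            }
            if (j == m) {
                return comparisons; // whole pattern matched at position i
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Ordinary input: almost every alignment fails on its first character.
        System.out.println(countComparisons("abcdefghij", "hij")); // 10

        // Adversarial input: every alignment matches 99 chars before failing.
        String text = repeat('a', 1000);
        String pattern = repeat('a', 99) + "b";
        System.out.println(countComparisons(text, pattern)); // 90100
    }
}
```

On the ordinary input the cost is about n comparisons. The quadratic behaviour only shows up on highly repetitive input like the second case, where each of the 901 alignments costs 100 comparisons.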

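I googled it and dug through StackOverflow:

It turns out the JDK authors reasoned that in most cases the strings involved are short, so the basic implementation may well be cheaper. KMP and Boyer-Moore both need a preprocessing pass to build auxiliary arrays, which costs some time and space, and for short-string searches that overhead can outweigh what the basic implementation spends. Besides, when searching very large texts, programmers usually reach for other specialized data structures that make lookup simpler. It is a bit like passing over quicksort in certain special cases: different situations call for different algorithms.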
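To make the preprocessing cost concrete, here is a minimal KMP sketch. It is a textbook-style version written for this post, not anything from the JDK, and the names KmpSketch, buildFailureTable and indexOf are just for illustration. Note that every search has to allocate and fill an int array as long as the pattern before it looks at a single character of the text.

```java
import java.util.Arrays;

final class KmpSketch {

    /**
     * Preprocessing step: fail[i] is the length of the longest proper prefix
     * of pattern[0..i] that is also a suffix of pattern[0..i].
     */
    static int[] buildFailureTable(String pattern) {
        int[] fail = new int[pattern.length()]; // extra O(m) space per search
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    /** Returns the index of the first occurrence of pattern in text, or -1. */
    static int indexOf(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] fail = buildFailureTable(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1]; // fall back without moving i backwards
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i - k + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildFailureTable("abcabd"))); // [0, 0, 0, 1, 2, 0]
        System.out.println(indexOf("abcabcabd", "abcabd"));               // 3
    }
}
```

For a pattern of a few characters, building the table can easily cost as much as the whole brute-force search would have, which fits the reasoning above.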

Reference:

http://stackoverflow.com/questions/19543547/why-jdks-string-indexof-does-not-use-kmp/

0 0