关于String里indexOf()的一些思考
这些天偶然翻起了算法书,看到KMP算法的时候,突然发现多年来似乎一直没有完全搞明白,于是恶补了一下。写了个程序,“基本”明白了fail函数和KMP的那些事~~~~
具体程序见下 1package test;
2/** *//**
3 * @author Jia Yu
4 * @date 2010-9-28
5 */
6public class StringMatch {
7
8 private int []f;
9 /** *//**
10 * KMP fail function to calculate f[]
11 * f[j] = k < j where k is the maximum of pat[0..k] == pat[j-k..j]
12 * = -1 otherwise
13 * @param pat
14 */
15 public void fail(String pat){
16 int lenP = pat.length();
17 f = new int[lenP];
18 f[0] = -1;
19 for(int j=1;j<lenP;j++){
20 int i = f[j-1];
21 while((pat.charAt(j)==pat.charAt(i+1))&&(i>=0)) i = f[i];
22 if(pat.charAt(j)!=pat.charAt(i+1)) f[j] = i + 1;
23 else f[j] = -1;
24 }
25 }
26
27 /** *//**
28 * Implementation of KMP algorithm.
29 * @param s string which is source string
30 * @param pat string pattern
31 * @return
32 */
33 public int kmp_find(String s,String pat){
34 int lenS,lenP;
35 lenS = s.length();
36 lenP = pat.length();
37 int i,j;
38 i=j=0;
39 while(i<lenS&&j<lenP){
40 if(s.charAt(i)==pat.charAt(j)){
41 i++;
42 j++;
43 }
44 else{
45 if(j==0) i++;
46 else j=f[j-1] +1;
47 }
48 }
49 if(j<lenP||lenP==0) return -1;
50 else return i-lenP;
51 }
52
53 /** *//**
54 * @param args
55 */
56 public static void main(String[] args) {
57 // TODO Auto-generated method stub
58 StringMatch ss = new StringMatch();
59 String s = "abcdedabc";
60 String pat = "dab";
61 ss.fail(pat);
62 System.out.println(ss.kmp_find(s, pat));
63 }
64
65}
66完事后忍不住想和String的indexOf比比性能。一直以为jdk 的src里String的indexOf是用Naïve string search algorithm实现的。没有任何技巧,当然也有好多人去拿这个说事,但是逛过一些论坛,记得人们只是自己去实现KMP,然后说KMP有多好,但是很少有拿出数据来比对的。所以今天,在投简历闲暇,还是抽出些时间,看了看String match相关的一些东西。写了一些代码,发现了一些东东~~~~
不多扯闲话了。首先进行了一个KMP和String indexOf的比较,看看结果吧。
preprocess using 7549ns
=====================================================
Cycles : 30000
String find pat in pos -1
Used Time is 87134ns
KMP find pat in pos -1
Used Time is 1071829ns
=====================================================
Cycles : 90000
String find pat in pos -1
Used Time is 150480ns
KMP find pat in pos -1
Used Time is 2277475ns
=====================================================
Cycles : 270000
String find pat in pos -1
Used Time is 380815ns
KMP find pat in pos -1
Used Time is 848257ns
=====================================================
Cycles : 810000
String find pat in pos 119457
Used Time is 509997ns
KMP find pat in pos 119457
Used Time is 1141992ns
=====================================================
Cycles : 2430000
String find pat in pos 459895
Used Time is 1845130ns
KMP find pat in pos 459895
Used Time is 4180643ns
测试代码如下:
1/** *//**
2 *
3 */
4package test;
5
6import java.util.Random;
7
8/** *//**
9 * @author Jia Yu
10 * @date 2010-9-28
11 */
12public class StringMatchTest2 {
13
14 private static String src;
15 private static String pat;
16 private static long cycles = 10000L;
17 private static Random rand = new Random(47);
18
19 public static void generateSource() {
20 StringBuilder sb = new StringBuilder();
21 int iter = (int) cycles;
22 for (int i = 0; i < iter; i++) {
23 sb.append((char) (a + (rand.nextInt(6))));
24 }
25 src = sb.toString();
26 }
27
28 public static void generatePattern() {
29 StringBuilder sb = new StringBuilder();
30 for (int i = 0; i < 7; i++) {
31 sb.append((char) (a + (rand.nextInt(6))));
32 }
33 pat = sb.toString();
34 }
35
36 /** *//**
37 * @param args
38 */
39 public static void main(String[] args) {
40 // TODO Auto-generated method stub
41 generatePattern();
42 StringMatch sm = new StringMatch();
43 long start, pre, dur;
44 start = System.nanoTime();
45 sm.fail(pat);
46 pre = System.nanoTime() - start;
47 System.out.println("
补充:软件开发 , Java ,