Java Web Crawler

Source: Internet  Editor: 程序博客网  Date: 2024/05/02 01:14

Approach:
1. Fetch the web page

URL url = new URL("http://www.runoob.com/java/java-regular-expressions.html");
BufferedReader read = new BufferedReader(new InputStreamReader(url.openStream()));
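The two lines above decode the page with the platform default charset and leave closing the reader to the caller. A minimal sketch of the same step, assuming the page is UTF-8 encoded and using try-with-resources to close the stream; the class name PageFetcher and the methods readLines and fetchLines are hypothetical names chosen for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PageFetcher {
    // Reads every line from the given source and closes it when done.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Opens the URL and decodes the response as UTF-8 (an assumption about the page).
    static List<String> fetchLines(String address) throws IOException {
        URL url = new URL(address);
        return readLines(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Offline demonstration; fetchLines(...) needs network access.
        List<String> lines = readLines(new StringReader("first line\nsecond line"));
        System.out.println(lines); // prints [first line, second line]
    }
}
```

Splitting the reading logic from the URL handling means the same method can be exercised against a local file or an in-memory string.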

2. Define the match pattern

String regex = "\\w+ \\w+ void \\w+";
Pattern p = Pattern.compile(regex);
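In this pattern, \w+ matches a run of letters, digits, or underscores, so the whole expression looks for two words, then the word void, then a name, each separated by a single space. A small sketch of what it accepts and rejects; PatternDemo and firstMatch are hypothetical names used only for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternDemo {
    // Same expression as in the article: two words, "void", then a name.
    static final Pattern VOID_METHOD = Pattern.compile("\\w+ \\w+ void \\w+");

    // Returns the first match in the line, or null when there is none.
    static String firstMatch(String line) {
        Matcher m = VOID_METHOD.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Two modifiers before "void": matches.
        System.out.println(firstMatch("    public static void main(String[] args) {"));
        // prints public static void main

        // Only one modifier before "void": no match.
        System.out.println(firstMatch("public void run()"));
        // prints null
    }
}
```

Because the pattern requires exactly two words before void, declarations such as public void run() are skipped.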

3. Collect the matches

Matcher m = p.matcher(str);
System.out.println(str);
while (m.find()) {
    list.add(m.group()); // add the match to the list
}
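Each call to find() advances to the next non-overlapping match, which is why the loop can pick up several matches from a single string. A self-contained sketch of that loop, with the hypothetical names MatchCollector and findAll standing in for the surrounding code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCollector {
    private static final Pattern VOID_METHOD = Pattern.compile("\\w+ \\w+ void \\w+");

    // Collects every non-overlapping match found in the text, in order.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = VOID_METHOD.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "public static void main(String[] a) {} private static void helper() {}";
        System.out.println(findAll(text));
        // prints [public static void main, private static void helper]
    }
}
```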

Here is the complete example:

package regex;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Pachong {
    public static void main(String[] args) {
        List<String> list = getMethod();
        for (String str : list) {
            System.out.println(str);
        }
    }

    public static List<String> getMethod() {
        List<String> list = new ArrayList<String>();
        BufferedReader read = null;
        try {
            // Fetch the resource
            URL url = new URL("http://www.runoob.com/java/java-regular-expressions.html");
            read = new BufferedReader(new InputStreamReader(url.openStream()));
            //read=new BufferedReader(new FileReader("F:/myeclips_workplace/ioTest/src/字符流/CharArrayTest.java"));
            // Match pattern
            String regex = "\\w+ \\w+ void \\w+";
            Pattern p = Pattern.compile(regex);
            // Collect the matches, one line at a time
            String str = null;
            while ((str = read.readLine()) != null) {
                Matcher m = p.matcher(str);
                while (m.find()) {
                    list.add(m.group()); // add the match to the list
                }
            }
        } catch (IOException e) {
            System.out.println("An I/O error occurred!");
        } finally {
            // Close the reader if it was opened
            if (read != null) {
                try {
                    read.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return list;
    }
}

Output:

public static void main
public static void main
public static void main
public static void main
public static void main
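The same match appears several times because the page contains several code samples with a main method. If only distinct matches are wanted, one option is to pass the list through a LinkedHashSet, which drops duplicates while keeping first-seen order. A minimal sketch; DistinctMatches and distinct are hypothetical names and not part of the example above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class DistinctMatches {
    // Removes duplicates while preserving the order of first occurrence.
    static List<String> distinct(List<String> matches) {
        return new ArrayList<>(new LinkedHashSet<>(matches));
    }

    public static void main(String[] args) {
        List<String> matches = Arrays.asList(
                "public static void main",
                "public static void main",
                "public static void main");
        System.out.println(distinct(matches));
        // prints [public static void main]
    }
}
```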