Java crawler: parsing XML
Source: Internet  Editor: 程序博客网  Time: 2024/06/07 08:30
package com.example.xml;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class jx {

    public static void main(String[] args) throws IOException {
        // Read the whole input file into one buffer.
        StringBuffer buf = new StringBuffer(2 * 1024 * 1024);
        String html = "C:/Users/Roronoa/Desktop/Fr/a.xml";
        readTxtFile(html, buf);
        String lv = buf.toString();

        List<String> a = getKey(lv);
        List<String> b = getString(lv);

        // Pair the i-th <key> with the i-th <string>. Stop at the shorter list so a
        // missing key cannot cause an IndexOutOfBoundsException.
        int n = Math.min(a.size(), b.size());
        try (PrintWriter pw = new PrintWriter(new FileWriter("C:/Users/Roronoa/Desktop/Fr/c.xml"))) {
            for (int i = 0; i < n; i++) {
                System.out.println(a.get(i));
                pw.write("<string name=\"" + a.get(i) + "\">");
                pw.write(b.get(i));
                pw.write("</string>");
                pw.write("\n");
            }
        }
    }

    // Appends the content of the file at filePath to buf, decoded as UTF-8.
    public static void readTxtFile(String filePath, StringBuffer buf) {
        String encoding = "UTF-8";
        File file = new File(filePath);
        if (file.isFile() && file.exists()) { // check that the file exists
            // Decode with an explicit encoding; try-with-resources closes the reader.
            try (BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), encoding))) {
                String lineTxt;
                // readLine() drops line terminators, so buf ends up holding one long line.
                while ((lineTxt = bufferedReader.readLine()) != null) {
                    // System.out.println(lineTxt);
                    buf.append(lineTxt);
                }
            } catch (Exception e) {
                System.out.println("!!!");
                e.printStackTrace();
            }
        } else {
            System.out.println("!!!");
        }
    }

    // public static String getLv(String html) throws IOException {
    //     String lv = html.replaceAll("\"<key>", "");
    //     return lv;
    // }

    // Returns the text of every <key>...</key> element, in order of appearance.
    public static List<String> getKey(String html) throws IOException {
        Pattern p = Pattern.compile("<key>(.+?)</key>");
        Matcher m = p.matcher(html);
        ArrayList<String> alist = new ArrayList<String>();
        while (m.find()) {
            alist.add(m.group(1));
            System.out.println(m.group(1));
        }
        return alist;
    }

    // Returns the text of every <string>...</string> element, in order of appearance.
    public static List<String> getString(String html) throws IOException {
        Pattern p = Pattern.compile("<string>(.+?)</string>");
        Matcher m = p.matcher(html);
        ArrayList<String> alist = new ArrayList<String>();
        while (m.find()) {
            alist.add(m.group(1));
            System.out.println(m.group(1));
        }
        return alist;
    }
}
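The regexes above pair keys and values purely by position, so a `<key>` followed by anything other than a `<string>` shifts every later pair. As a rough alternative sketch, the JDK's built-in DOM parser can pair each `<string>` with the `<key>` directly before it and escape the values on output. The class `KeyStringPairs` and its methods are hypothetical names, not part of the original post, and the sketch assumes that any external DTD referenced by the input can safely be ignored.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original post.
class KeyStringPairs {

    // Returns key -> value for every <string> element whose preceding element
    // (in document order) is a <key>. Insertion order is preserved.
    static Map<String, String> extractPairs(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Assumption: an external DTD is not needed, so resolve it to an empty
        // stream instead of fetching it.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Map<String, String> pairs = new LinkedHashMap<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        String pendingKey = null;
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String tag = e.getTagName();
            if (tag.equals("key")) {
                pendingKey = e.getTextContent();
            } else {
                if (tag.equals("string") && pendingKey != null) {
                    pairs.put(pendingKey, e.getTextContent());
                }
                pendingKey = null; // any non-key element ends the pending pair
            }
        }
        return pairs;
    }

    // Escapes the characters that are unsafe in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds one output line in the same shape the original program writes.
    static String toStringResource(String name, String value) {
        return "<string name=\"" + escape(name) + "\">" + escape(value) + "</string>";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<dict><key>title</key><string>Tom &amp; Jerry</string>"
                + "<key>count</key><integer>3</integer></dict>";
        for (Map.Entry<String, String> e : extractPairs(xml).entrySet()) {
            // prints: <string name="title">Tom &amp; Jerry</string>
            System.out.println(toStringResource(e.getKey(), e.getValue()));
        }
    }
}
```

Because the parser decodes entities such as `&amp;` while reading, values have to be escaped again when they are written out. The regex version skips this step only because it copies the raw text unchanged.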