使用多线程进行大文件的处理

来源:互联网 发布:led电子屏幕软件 编辑:程序博客网 时间:2024/05/22 00:53

场景:

有一批数据由于种种原因需要重新处理。总量大概在180W左右。由于数据库的限制,不太好直接直接查库处理,已经由DBA 直接导出了。全都是流水号。具体实现为拿流水号去接口中查询数据,把符合条件的流水号通过接口同步到其他系统里面。


最开始的做法很简单,通过读取文件流一行一行的获取流水号,通过流水号再进行其他的处理。由于是单线程程序,加上接口的响应慢,处理的数据特别慢。一晚上才跑了20W 左右的数据。单线程的代码先贴上来

public static void main(String[] args) {init();String filePath = PATH + "1119.txt";readTxtFile(filePath);}
public static void readTxtFile(String filePath) {String lineTxt = null;try {String encoding = "GBK";File file = new File(filePath);if (file.isFile() && file.exists()) { // 判断文件是否存在InputStreamReader read = new InputStreamReader(new FileInputStream(file), encoding);// 考虑到编码格式BufferedReader bufferedReader = new BufferedReader(read);while ((lineTxt = bufferedReader.readLine()) != null) {String billCode = lineTxt;List<Map<String, Object>> list = getTrace(billCode);if (list == null) {write(JsonUtil.toJSON(billCode), "请求超时1.txt");continue;}if (list != null && list.size() == 0) {write(JsonUtil.toJSON(billCode), "无物流信息2.txt");}for (Map<String, Object> map : list) {String str = map.get("traces") + "";if ("".equals(str)) {System.out.println("出现异常   billCode : " + lineTxt+ "  data :" + map.toString());continue;}List<CommonTrace> traces = JsonUtil.parseArray(str,CommonTrace.class);List<CommonTrace> rec = new ArrayList<CommonTrace>();for (CommonTrace trace : traces) {if ("签收".equals(trace.getScanType())) {// pushXuanmi(trace, strs[0]);rec.add(trace);}}if (rec.size() < 1) {write(billCode, "未签收单号.txt");}else{buildTaoBaoData(rec, billCode, billCode);}}}read.close();} else {System.out.println("找不到指定的文件");}} catch (Exception e) {System.out.println("读取文件内容出错," + lineTxt);e.printStackTrace();}}

所有开始考虑使用多线程进行处理。初步想法是一次性读取10条左右,再把每个流水丢给每个线程去处理。

比较重要的API 是 bufferedReader.lines().limit(30) 方法。会一次读取30条数据,等同于 bufferedReader.readLine() ;

public static void main(String[] args) {init();String filePath = PATH + "1119.txt";readTxtFile(filePath);}
public static void readTxtFile(String filePath) {String lineTxt = null;try {String encoding = "GBK";File file = new File(filePath);if (file.isFile() && file.exists()) { // 判断文件是否存在InputStreamReader read = new InputStreamReader(new FileInputStream(file), encoding);// 考虑到编码格式BufferedReader bufferedReader = new BufferedReader(read);Object[] o;while ((o = bufferedReader.lines().limit(30).toArray()) != null) {List<Object> billCodes = Arrays.asList(o);for(int i=0;i<30;i++){new Thread(new Task(billCodes.get(i)+"")).start();Thread.sleep(10);}}read.close();} else {System.out.println("找不到指定的文件");}} catch (Exception e) {System.out.println("读取文件内容出错," + lineTxt);e.printStackTrace();}}

class Task implements Runnable {private String billCode;@Overridepublic void run() {try {List<Map<String, Object>> list = TaoBaoTracePushertest2.getTrace(billCode);if (list == null) {TaoBaoTracePushertest2.write(JsonUtil.toJSON(billCode), "请求超时1.txt");}if (list != null && list.size() == 0) {TaoBaoTracePushertest2.write(JsonUtil.toJSON(billCode), "无物流信息2.txt");}for (Map<String, Object> map : list) {String str = map.get("traces") + "";if ("".equals(str)) {System.out.println("出现异常   billCode : " + billCode+ "  data :" + map.toString());continue;}List<CommonTrace> traces = JsonUtil.parseArray(str,CommonTrace.class);List<CommonTrace> rec = new ArrayList<CommonTrace>();for (CommonTrace trace : traces) {if ("签收".equals(trace.getScanType())) {rec.add(trace);}}if (rec.size() < 1) {TaoBaoTracePushertest2.write(billCode, "未签收单号.txt");} else {TaoBaoTracePushertest2.buildTaoBaoData(rec, billCode, billCode);}}} catch (Exception e) {// TODO Auto-generated catch blocke.printStackTrace();}}public String getBillCode() {return billCode;}public void setBillCode(String billCode) {this.billCode = billCode;}public Task(String billCode) {super();this.billCode = billCode;}}



忽略 中间的业务逻辑。。。