Java对Excel(03,07)进行上传、解析、验证、入库

来源:互联网 发布:仓管员用什么软件 编辑:程序博客网 时间:2024/06/02 06:10
原文地址:Java对Excel(03,07)进行上传、解析、验证、入库作者:吴俊采

本文主要记录最近的工作内容,使用java实现对Excel(03,07)的上传、解析、验证和入库(PostgreSQL)。

 

一。上传

 实际就是实现文件上传至服务器即可,但是方法有很多,首先要考虑采用何种方法实现上传:

 

1.参考文章:

Java开发过程中文件上传的各种方式全面总结  http://javacrazyer.iteye.com/blog/707705

主要有以下几种方法:

JSP+Servlet(或纯JSP);Struts2;Struts;FTP;ExtJs;Flex;

 

2.我采用的是jsp+Servlet的方式来实现的。需要先从apache官网下载fileupload.jar。

UploadFileServlet.java的主要代码:

[转载]Java对Excel(03,07)进行上传、解析、验证、入库

testfileupload.jsp的主要代码:

[转载]Java对Excel(03,07)进行上传、解析、验证、入库


3.需要注意的几个问题

(1)上传文件的大小限制

 设置 sfu.setFileSizeMax(1024*1024*10000);单位byte.表示近10G。

测试上传1.5G的文件没问题,2.9G以上的文件上传失败,无错误提示,而是“无法显示该页面”的错误:

[转载]Java对Excel(03,07)进行上传、解析、验证、入库
所以目前不清楚到底支持多大的文件上传。

 

二。解析

1.java解析excel通常有两种方法:(1)jxl(2)poi.

至于选用哪种那做得看具体的需求,我这次主要是要支持03,07的excel,目前jxl更新慢,尚不支持07,所以只能选用poi;本次只涉及到excel的读操作,没有写操作,jxl的写快于poi,但jxl的读慢于poi.

综合下来,选用poi。这个poi具体是啥意思我不太清楚,但不是常说的感兴趣点(point of interest)哈.

 

2.是否支持大文件解析

excel03最多只能有65536条记录,65536行*256列;excel07:1048576行*16384列。

excel07文件超过65536条记录,另存为xls时会报错:

[转载]Java对Excel(03,07)进行上传、解析、验证、入库

所以要支持大文件,必须得用excel07格式来操作。

 

需要考虑这点,因为这决定了是用DOM做,还是用SAX。

就1和2来看,java解析excel比起.net解析excel是麻烦了很多很多啊,.net解析excel主要有三种方法:com组件;oledb;openxml。其中使用oledb最常用也最简单。

 

3.jar包下载

[转载]Java对Excel(03,07)进行上传、解析、验证、入库

jxl;http://www.andykhan.com/jexcelapi/download.html,最新版本为JExcelApiv2.6.12

poi:http://poi.apache.org/download.html#POI-3.9,最新版本为poi-bin-3.9-20121203.zip

 

4.poi+SAX,支持excel07大文件解析(EventUserModel)

 这种方式应该是最高级别的,能满足大数据量需求,也不会造成oom错误的。我这次需要采用的就是这种实现方式。

 主要定义了三个类来实现:

(1)Excel2003Reader.java,操作03Excel。


package com.cbe.excel;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

importorg.apache.poi.hssf.eventusermodel.EventWorkbookBuilder.SheetRecordCollectingListener;
importorg.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
importorg.apache.poi.hssf.eventusermodel.MissingRecordAwareHSSFListener;
importorg.apache.poi.hssf.eventusermodel.dummyrecord.LastCellOfRowDummyRecord;
importorg.apache.poi.hssf.eventusermodel.dummyrecord.MissingCellDummyRecord;
import org.apache.poi.hssf.model.HSSFFormulaParser;
import org.apache.poi.hssf.record.BOFRecord;
import org.apache.poi.hssf.record.BlankRecord;
import org.apache.poi.hssf.record.BoolErrRecord;
import org.apache.poi.hssf.record.BoundSheetRecord;
import org.apache.poi.hssf.record.FormulaRecord;
import org.apache.poi.hssf.record.LabelRecord;
import org.apache.poi.hssf.record.LabelSSTRecord;
import org.apache.poi.hssf.record.NumberRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.SSTRecord;
import org.apache.poi.hssf.record.StringRecord;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;


public class Excel2003Reader implements HSSFListener {

 private int minColumns = -1;
 private POIFSFileSystem fs;
 private int lastRowNumber;
 private int lastColumnNumber;

 
 private boolean outputFormulaValues = true;

 
 private SheetRecordCollectingListenerworkbookBuildingListener;
 //excel2003工作薄
 private HSSFWorkbook stubWorkbook;

 // Records we pick up as we process
 private SSTRecord sstRecord;
 private FormatTrackingHSSFListenerformatListener;

 //表索引
 private int sheetIndex = -1;
 private BoundSheetRecord[] orderedBSRs;
 //@SuppressWarnings("unchecked")
 @SuppressWarnings("rawtypes")
 private ArrayList boundSheetRecords = newArrayList();

 // For handling formulas with stringresults
 private int nextRow;
 private int nextColumn;
 private boolean outputNextStringRecord;
 //当前行
 //private int curRow = 0;
 //存储行记录的容器
 privateList<String> rowlist = newArrayList<String>();
 
 privateList<List<String>>exceldata=newArrayList<List<String>>();
 @SuppressWarnings( "unused")
 private String sheetName;
 
 
  publicList<List<String>>getExcelData(){
    return exceldata;
    }
 
 
 public void process(String fileName) throwsIOException {
  this.fs = newPOIFSFileSystem(new FileInputStream(fileName));
  MissingRecordAwareHSSFListenerlistener = new MissingRecordAwareHSSFListener(
    this);
  formatListener = newFormatTrackingHSSFListener(listener);
  HSSFEventFactory factory = newHSSFEventFactory();
  HSSFRequest request = newHSSFRequest();
  if (outputFormulaValues){
   request.addListenerForAllRecords(formatListener);
  } else {
   workbookBuildingListener= new SheetRecordCollectingListener(
     formatListener);
   request.addListenerForAllRecords(workbookBuildingListener);
  }
  factory.processWorkbookEvents(request,fs);
 }
 
 
 @SuppressWarnings("unchecked")
 public void processRecord(Record record) {
    //List<String> rowlist = newArrayList<String>();
  int thisRow = -1;
  int thisColumn = -1;
  String thisStr = null;
  String value = null;
  switch (record.getSid()){
   caseBoundSheetRecord.sid:
    boundSheetRecords.add(record);
    break;
   caseBOFRecord.sid:
    BOFRecordbr = (BOFRecord) record;
    if(br.getType() == BOFRecord.TYPE_WORKSHEET) {
     //如果有需要,则建立子工作薄
     if(workbookBuildingListener != null&& stubWorkbook == null) {
      stubWorkbook= workbookBuildingListener
        .getStubHSSFWorkbook();
     }
     
     sheetIndex++;
     if(orderedBSRs == null) {
      orderedBSRs= BoundSheetRecord
        .orderByBofPosition(boundSheetRecords);
     }
     sheetName= orderedBSRs[sheetIndex].getSheetname();
    }
    break;
 
   caseSSTRecord.sid:
    sstRecord= (SSTRecord) record;
    break;
 
   caseBlankRecord.sid:
    BlankRecordbrec = (BlankRecord) record;
    thisRow= brec.getRow();
    thisColumn= brec.getColumn();
    thisStr= "";
    rowlist.add(thisColumn,thisStr);
    break;
   caseBoolErrRecord.sid: //单元格为布尔类型
    BoolErrRecordberec = (BoolErrRecord) record;
    thisRow= berec.getRow();
    thisColumn= berec.getColumn();
    thisStr= berec.getBooleanValue()+"";
    rowlist.add(thisColumn,thisStr);
    break;
 
   caseFormulaRecord.sid: //单元格为公式类型
    FormulaRecordfrec = (FormulaRecord) record;
    thisRow= frec.getRow();
    thisColumn= frec.getColumn();
    if(outputFormulaValues) {
     if(Double.isNaN(frec.getValue())) {
      //Formula result is a string
      //This is stored in the next record
      outputNextStringRecord= true;
      nextRow= frec.getRow();
      nextColumn= frec.getColumn();
     }else {
      thisStr= formatListener.formatNumberDateCell(frec);
     }
    }else {
     thisStr= '"' + HSSFFormulaParser.toFormulaString(stubWorkbook,
       frec.getParsedexpression_r())+ '"';
    }
    rowlist.add(thisColumn,thisStr);
    break;
   caseStringRecord.sid://单元格中公式的字符串
    if(outputNextStringRecord) {
     //String for formula
     StringRecordsrec = (StringRecord) record;
     thisStr= srec.getString();
     thisRow= nextRow;
     thisColumn= nextColumn;
     outputNextStringRecord= false;
    }
    break;
   caseLabelRecord.sid:
    LabelRecordlrec = (LabelRecord) record;
    //curRow= thisRow = lrec.getRow();
    thisColumn= lrec.getColumn();
    value= lrec.getValue().trim();
    value= value.equals("")?" ":value;
    rowlist.add(thisColumn,value);
    break;
   caseLabelSSTRecord.sid:  //单元格为字符串类型
    LabelSSTRecordlsrec = (LabelSSTRecord) record;
    //curRow= thisRow = lsrec.getRow();
    thisColumn= lsrec.getColumn();
    if(sstRecord == null) {
     rowlist.add(thisColumn," ");
    }else {
     value=  sstRecord
     .getString(lsrec.getSSTIndex()).toString().trim();
     value= value.equals("")?" ":value;
     rowlist.add(thisColumn,value);
    }
    break;
   caseNumberRecord.sid:  //单元格为数字类型
    NumberRecordnumrec = (NumberRecord) record;
    //curRow= thisRow = numrec.getRow();
    thisColumn= numrec.getColumn();
    value= formatListener.formatNumberDateCell(numrec).trim();
    value= value.equals("")?" ":value;
    //向容器加入列值
    rowlist.add(thisColumn,value);
    break;
   default:
    break;
  }

  // 遇到新行的操作
  if (thisRow != -1&& thisRow != lastRowNumber){
   lastColumnNumber= -1;
  }

  // 空值的操作
  if (record instanceofMissingCellDummyRecord) {
   MissingCellDummyRecordmc = (MissingCellDummyRecord) record;
   //curRow =thisRow = mc.getRow();
   thisColumn =mc.getColumn();
   rowlist.add(thisColumn,"");
  }

  // 更新行和列的值
  if (thisRow >-1)
   lastRowNumber= thisRow;
  if (thisColumn >-1)
   lastColumnNumber= thisColumn;

  // 行结束时的操作
  if (record instanceofLastCellOfRowDummyRecord) {
   if(minColumns > 0) {
    //列值重新置空
    if(lastColumnNumber == -1) {
     lastColumnNumber= 0;
    }
   }
   lastColumnNumber= -1;
    //每行结束时, 调用getRows() 方法
   //rowReader.getRows(sheetIndex,curRow,rowlist);
   
   List<String>rowlistcopy = newArrayList<String>();
   for (int i =0; i < rowlist.size(); i++) {
    rowlistcopy.add(rowlist.get(i));
   }
   exceldata.add(rowlistcopy);
   // 清空容器
   rowlist.clear();
   
  }
 }


}

(2)Excel2007ReaderNew.java,操作07excel,这个是核心,之所以有个New,是取代了之前的Excel2007Reader.java,解决有单元格为空的判定处理问题。

参考:http://gaosheng08.iteye.com/blog/624758


package com.cbe.excel;
importjava.io.InputStream;  
importjava.util.ArrayList;  
importjava.util.Iterator;  
import java.util.List;  
 
importorg.apache.poi.xssf.eventusermodel.XSSFReader;  
importorg.apache.poi.xssf.model.SharedStringsTable;  
importorg.apache.poi.xssf.usermodel.XSSFRichTextString;  
importorg.apache.poi.openxml4j.opc.OPCPackage;  
importorg.xml.sax.Attributes;  
importorg.xml.sax.InputSource;  
importorg.xml.sax.SAXException;  
importorg.xml.sax.XMLReader;  
importorg.xml.sax.helpers.DefaultHandler;  
importorg.xml.sax.helpers.XMLReaderFactory;  


public  classExcel2007ReaderNew   extendsDefaultHandler {

  private SharedStringsTablesst;  
    private StringlastContents;  
    private booleannextIsString;  
  
    @SuppressWarnings("unused")
  private int sheetIndex =-1;  
    private List<String> rowlist = newArrayList<String>();  
    private int curRow =0;    //当前行  
    private int curCol =0;    //当前列索引  
    private int preCol =0;    //上一列列索引  
    private int titleRow = 0;  //标题行,一般情况下为0  
    private int rowsize =0;   //列数  
       
    privateList<List<String>>exceldata=newArrayList<List<String>>();//整个excel数据
    publicList<List<String>>getExcelData(){
     return exceldata;
    }
    
    //excel记录行操作方法,以行索引和行元素列表为参数,对一行元素进行操作,元素为String类型  
 //  public abstract voidoptRows(int curRow, List<String>rowlist) throws SQLException;  
       
    //excel记录行操作方法,以sheet索引,行索引和行元素列表为参数,对sheet的一行元素进行操作,元素为String类型  
    // publicabstract void optRows(int sheetIndex,int curRow,List<String> rowlist) throwsSQLException;  
       
    //只遍历一个sheet,其中sheetId为要遍历的sheet索引,从1开始,1-3  
    public void processOneSheet(String filename,int sheetId) throwsException {  
        OPCPackage pkg =OPCPackage.open(filename);  
        XSSFReader r = newXSSFReader(pkg);  
        SharedStringsTable sst =r.getSharedStringsTable();  
           
        XMLReader parser =fetchSheetParser(sst);  
  
        // rId2 found by processing theWorkbook  
        // 根据 rId# 或 rSheet#查找sheet  
        InputStream sheet2 =r.getSheet("rId"+sheetId);  
        sheetIndex++;  
        InputSource sheetSource = newInputSource(sheet2);  
        parser.parse(sheetSource);  
        sheet2.close();  
        pkg.close();//add by lfc 20130710,解决文件使用后无法删除的问题
     
  
     
    public void process(String filename) throws Exception{  
        OPCPackage pkg =OPCPackage.open(filename);  
        XSSFReader r = newXSSFReader(pkg);  
        SharedStringsTable sst =r.getSharedStringsTable();  
  
        XMLReader parser =fetchSheetParser(sst);  
  
        Iterator<InputStream> sheets =r.getSheetsData();  
        while (sheets.hasNext()){  
            curRow = 0;  
            sheetIndex++;  
            InputStream sheet =sheets.next();  
            InputSource sheetSource = newInputSource(sheet);  
            parser.parse(sheetSource);  
            sheet.close();  
        }
        pkg.close();//add by lfc 20130710,解决文件使用后无法删除的问题
     
  
    public XMLReader fetchSheetParser(SharedStringsTablesst)  
            throws SAXException {  
        XMLReader parser =XMLReaderFactory  
                .createXMLReader("org.apache.xerces.parsers.SAXParser");  
        this.sst = sst;  
        parser.setContentHandler(this);  
        return parser;  
     
  
    public void startElement(String uri, String localName, Stringname,  
            Attributes attributes) throws SAXException{  
        // c =>单元格  
        if (name.equals("c")) {  
            // 如果下一个元素是 SST的索引,则将nextIsString标记为true  
            String cellType =attributes.getValue("t");  
            String rowStr =attributes.getValue("r");  
            curCol =this.getRowIndex(rowStr);  
            if (cellType != null &&cellType.equals("s")) {  
                nextIsString = true;  
            } else {  
                nextIsString = false;  
             
         
        // 置空  
        lastContents = "";  
     
  
    public void endElement(String uri, String localName, Stringname)  
            throws SAXException {  
        //根据SST的索引值的到单元格的真正要存储的字符串  
        //这时characters()方法可能会被调用多次  
        if (nextIsString) {  
            try {  
                int idx =Integer.parseInt(lastContents);  
                lastContents = newXSSFRichTextString(sst.getEntryAt(idx))  
                        .toString();  
            } catch (Exception e) {  
  
             
         
  
        // v =>单元格的值,如果单元格是字符串则v标签的值为该字符串在SST中的索引  
        //将单元格内容加入rowlist中,在这之前先去掉字符串前后的空白符  
        if (name.equals("v")) {  
            String value =lastContents.trim();  
            value = value.equals("")?"":value;  
            int cols = curCol-preCol;  
            if(cols>1){  
                for (int i = 0;i <cols-1;i++){  
                    rowlist.add(preCol,"");  
                 
             
            preCol = curCol;  
            rowlist.add(curCol-1,value);  
        }else {  
            //如果标签名称为 row ,这说明已到行尾,调用 optRows()方法  
            if (name.equals("row")) {  
                int tmpCols =rowlist.size();  
                if(curRow>this.titleRow&&tmpCols<this.rowsize){  
                    for (int i = 0;i <this.rowsize-tmpCols;i++){  
                        rowlist.add(rowlist.size(),"");  
                     
                 
                // add by lfc 20130710,注释掉
               
                if(curRow==this.titleRow){  
                    this.rowsize =rowlist.size();  
                }
                
                List<String> rowlistcopy = newArrayList<String>();
     for(int i = 0; i < rowlist.size(); i++) {
      rowlistcopy.add(rowlist.get(i));
     }
     exceldata.add(rowlistcopy);
     
                rowlist.clear();  
                curRow++;  
                curCol = 0;  
                preCol = 0;  
             
         
     
  
    public void characters(char[] ch, int start, intlength)  
            throws SAXException {  
        //得到单元格内容的值  
        lastContents += new String(ch, start,length);  
     
       
    //得到列索引,每一列c元素的r属性构成为字母加数字的形式,字母组合为列索引,数字组合为行索引,  
    //如AB45,表示为第(A-A+1)*26+(B-A+1)*26列,45行  
    public int getRowIndex(StringrowStr){  
        rowStr = rowStr.replaceAll("[^A-Z]","");  
        byte[] rowAbc =rowStr.getBytes();  
        int len = rowAbc.length;  
        float num = 0;  
        for (inti=0;i<len;i++){  
            num += (rowAbc[i]-'A'+1)*Math.pow(26,len-i-1);  
         
        return (int) num;  
     
  
    public int getTitleRow(){  
        return titleRow;  
     
  
    public void setTitleRow(int titleRow){  
        this.titleRow = titleRow;  
     
 
}

 

(3)ExcelReaderUtil.java

核心代码:


  public static final StringEXCEL03_EXTENSION = ".xls"; //excel2003扩展名
  public static final StringEXCEL07_EXTENSION = ".xlsx"; //excel2007扩展名
  
  
  public staticList<List<String>>readExcel(String fileName) throws Exception{
   List<List<String>>exceldata=newArrayList<List<String>>();
   if(fileName.endsWith(EXCEL03_EXTENSION)){// 处理excel2003文件
    Excel2003Readerexcel03 = new Excel2003Reader();
    excel03.process(fileName);
    exceldata=excel03.getExcelData();
     
   else if(fileName.endsWith(EXCEL07_EXTENSION)){// 处理excel2007文件
    //Excel2007Readerexcel07 = new Excel2007Reader();
    Excel2007ReaderNewexcel07 = new Excel2007ReaderNew();
    excel07.process(fileName);
    exceldata=excel07.getExcelData();
   }
   else {
    thrownew Exception("文件格式错误,fileName的扩展名只能是xls或xlsx!");
   }
   returnexceldata;
  }

 

5.poi+DOM,普通模式(UserModel)

 相对简单,但数据量大了报OOM错误,所以没有发展前景。

[转载]Java对Excel(03,07)进行上传、解析、验证、入库

ExcelUtil.java类,代码如下:

import java.io.FileInputStream;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

import com.cbe.entity.Index;



public class ExcelUtil {

 public staticList<Index> GetDistrictIndex(Stringfilename){
  List<Index>indexsList=newArrayList<Index>();
  try{  
   
           }
  catch (Exception ex) {
   ex.printStackTrace();
   }
  
  return indexsList;
  
           }

 
 public staticList<List<Object>>ReadExcel(String fileName) throws Exception{
  Workbook workbook=null;
  try {
  if (fileName.endsWith(".xlsx")|| fileName.endsWith(".xls")){
   //注:add bylfc,20130702,采用这种方式同时支持xls和xlsx,而不必分别用HSSF和XSSF来做
   FileInputStreaminputStream = new FileInputStream(fileName);
   workbook=WorkbookFactory.create(inputStream);   
  }
  }
  catch (Exception e) {
  e.printStackTrace();
  }
  returnanalyzeWorkbook(workbook);
 }
  
 
 
 public static List<List<Object>>analyzeWorkbook(Workbook workbook){
  
  Sheet sheet =workbook.getSheetAt(0);//第一个表单
  List<String>columnNames=newArrayList<String>();//列
  List<Object>row=newArrayList<Object>();//一行
  List<List<Object>>rows=newArrayList<List<Object>>();//所有行
  int rowCount =sheet.getLastRowNum();//行数
  intcolumnCount=sheet.getRow(0).getPhysicalNumberOfCells();//列数
  
  for (int i = 0; i<columnCount; i++) {
   Cellcell=sheet.getRow(0).getCell(i);
   columnNames.add(getCellValue(cell).toString());
  }
  
   for(intj=0;j<=rowCount;j++){  //第一行为列名,若只是取数据则从第二行开始
    Rowr=sheet.getRow(j);
            for(Cell cell : r){ 
             row.add(getCellValue(cell));
            }
            rows.add(row);
   }
   return rows;
 }
 

 public static Object getCellValue(Cellcell){
  int cellType =cell.getCellType();
  switch (cellType) {
            case Cell.CELL_TYPE_STRING://1
                return cell.getStringCellValue();
            case Cell.CELL_TYPE_NUMERIC://0
             if (DateUtil.isCellDateFormatted(cell)){//日期格式
              Date t =cell.getDateCellValue();
              SimpleDateFormat format=newSimpleDateFormat("yyyy-MM-dd");
              return format.format(t);
             }
             else{
              returncell.getNumericCellValue();
             }
            case Cell.CELL_TYPE_BOOLEAN://4
                return cell.getBooleanCellValue();
            case Cell.CELL_TYPE_FORMULA://2
            case Cell.CELL_TYPE_BLANK://3
            case Cell.CELL_TYPE_ERROR://5
            default:
             return cell.getStringCellValue();
  }
 }
 
}

 

 

附:

测试的时候可能要监控一下耗时什么的,可以使用MyEclipse自带的jvisualvm.exe工具(我没测过)。

路径:D:ProgramFilesMyEclipseCommonbinarycom.sun.java.jdk.win32.x86_1.6.0.013bin

 

[转载]Java对Excel(03,07)进行上传、解析、验证、入库

初次运行时会提示要做jdk的校准标准化(calibration),没细研究是个什么概念,直接点确定就好。

运行后的主界面:

[转载]Java对Excel(03,07)进行上传、解析、验证、入库

 

三。验证

 这个就根据自己的需求,对特定的excel模板做特定的检查,废话就不多说了。

验证处理还是比较费劲的,所以还是控制好源头“excel模板”,整理好数据,否则问题会很多的,本身excel07采用xml的解析方式就存在不少问题的。

如图,不同的格式会有不同的xml,这个在调试程序的时候我也费了不少周折

[转载]Java对Excel(03,07)进行上传、解析、验证、入库


[转载]Java对Excel(03,07)进行上传、解析、验证、入库

 

四。入库

 这个工作就交个dao和daoImp包下的接口和类来实现了,自己写。

0 0
原创粉丝点击