POI-处理大Excel文件(xlsx)

来源:互联网 发布:黑客数据交易平台 编辑:程序博客网 时间:2024/05/29 19:58

上次简述了使用poi读取大xls文件,这里说下读取xlsx格式的文件的方法

环境模拟

先准备一个大的excel文件(xlsx大小5M),再将jvm的heap缩小到100m(JVM 参数 -Xmx100m)用于模拟OOM
并使用参数在OOM时dump内存 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=d://dump.hprof

使用XSSF读取

在gradle中引入解析xlsx需要的jar包

    compile 'org.apache.poi:poi:3.15'    compile 'org.apache.poi:poi-ooxml:3.15'    compile 'xerces:xercesImpl:2.11.0'

之后读取xlsx文件

    public static void main(String [] args) throws IOException {        InputStream is = new FileInputStream("d://large.xlsx");        Workbook wb = new XSSFWorkbook(is);    }

运行之后

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space    at java.util.Arrays.copyOf(Arrays.java:3236)    at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:136)    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:56)    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:342)    at org.apache.poi.util.PackageHelper.open(PackageHelper.java:37)    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:285)    at blog.excel.Xlsx.main(Xlsx.java:17)    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)    at java.lang.reflect.Method.invoke(Method.java:483)    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

同样报出了OOM,原因也是处理xlsx时会将数据完全读入内存,导致内存溢出。

使用EventApi流式读取

POI也为xlsx提供了流式读取的方式,用于减小内存的使用

public class ExampleEventUserModel{    public void processOneSheet(String filename) throws Exception {        OPCPackage pkg = OPCPackage.open(filename);        XSSFReader r = new XSSFReader( pkg );        SharedStringsTable sst = r.getSharedStringsTable();        XMLReader parser = fetchSheetParser(sst);        // 获得第一个sheet        InputStream sheet2 = r.getSheet("rId1");        InputSource sheetSource = new InputSource(sheet2);        parser.parse(sheetSource);        sheet2.close();    }    public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {        XMLReader parser =                XMLReaderFactory.createXMLReader(                        "org.apache.xerces.parsers.SAXParser"                );        ContentHandler handler = new SheetHandler(sst);        parser.setContentHandler(handler);        return parser;    }    /**     * 处理sax的handler     */    private static class SheetHandler extends DefaultHandler {        private SharedStringsTable sst;        private String lastContents;        private boolean nextIsString;        private SheetHandler(SharedStringsTable sst) {            this.sst = sst;        }        //元素开始时的handler        public void startElement(String uri, String localName, String name,                                 Attributes attributes) throws SAXException {            // c => 单元格            if(name.equals("c")) {                System.out.print(attributes.getValue("r") + " - ");                // 获取单元格类型                String cellType = attributes.getValue("t");                if(cellType != null && cellType.equals("s")) {                    nextIsString = true;                } else {                    nextIsString = false;                }            }            lastContents = "";        }        //元素结束时的handler        public void endElement(String uri, String localName, String name)                throws SAXException {            if(nextIsString) {                int idx = Integer.parseInt(lastContents);                lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();                nextIsString = false;            }            // v => 单元格内容            if(name.equals("v")) {                System.out.println(lastContents);            }        }        //读取元素间内容时的handler        public void characters(char[] ch, int start, int length)                throws SAXException {            lastContents += new String(ch, start, length);        }    }    public static void main(String[] args) throws Exception {        ExampleEventUserModel example = new ExampleEventUserModel();        example.processOneSheet("d://large.xlsx");    }}

不足

同样的使用这种方法可以流式读取打的xlsx文件,但是只限于读取内部的数据,而且无法进行修改操作。


0 0
原创粉丝点击