The openxlsx package: reading and writing Excel data

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 01:53

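It makes reading, writing, and editing .xlsx files very convenient without requiring a Java setup, and the tables it outputs look more polished than what write.csv produces.

Excel files in other formats (not .xlsx) can be saved as .xlsx first and then used with this package.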


Description:

openxlsx simplifies the process of writing and styling Excel xlsx files from R and removes the dependency on Java.

In short, it simplifies working with xlsx files.

It can be used to read xlsx files, write data to xlsx files, and set the styles of xlsx files.


1. Reading data from an xlsx file

(1) The read.xlsx() function

Read data from an Excel file or Workbook object into a data.frame


Usage:

read.xlsx(xlsxFile, sheet = 1, startRow = 1, colNames = TRUE, rowNames = FALSE,
  detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE,
  rows = NULL, cols = NULL, check.names = FALSE, namedRegion = NULL,
  na.strings = "NA", fillMergedCells = FALSE)

Common parameters:

xlsxFile: An xlsx file, Workbook object or URL to xlsx file. In most cases this is the path to the .xlsx file.
sheet: The name or index of the sheet to read data from.
startRow: First row to begin looking for data. Empty rows at the top of a file are always skipped, regardless of the value of startRow.
colNames: If TRUE, the first row of data will be used as column names.
rowNames: If TRUE, the first column of data will be used as row names.
skipEmptyRows: If TRUE, empty rows are skipped; otherwise each empty row after the first row containing data is returned as a row of NAs.
skipEmptyCols: If TRUE, empty columns are skipped.
rows: A numeric vector specifying which rows in the Excel file to read. If NULL, all rows are read.
cols: A numeric vector specifying which columns in the Excel file to read. If NULL, all columns are read.

Return value: a data frame
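
The parameters above can be tried out with a small round trip. This is a minimal sketch, assuming the openxlsx package is installed; the data frame and the temporary file are made up for illustration only.

```r
library(openxlsx)

# Build a small throwaway workbook so the example is self-contained.
path <- tempfile(fileext = ".xlsx")
demo <- data.frame(id = 1:5, name = letters[1:5], score = c(90, 85, 70, 60, 95))
write.xlsx(demo, path)

# Read the whole first sheet; the first row becomes the column names.
all_rows <- read.xlsx(path, sheet = 1, colNames = TRUE)

# Read only Excel rows 1 to 3 (the header plus two data rows) and columns 1 and 3.
subset_rows <- read.xlsx(path, sheet = 1, rows = 1:3, cols = c(1, 3))

# Use the first column as row names instead of a regular column.
with_row_names <- read.xlsx(path, sheet = 1, rowNames = TRUE)
```

Note that rows and cols refer to positions in the Excel sheet, so the header row counts as row 1.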


(2) The readWorkbook() function

Read data from an Excel file or Workbook object into a data.frame


Usage:

readWorkbook(xlsxFile, sheet = 1, startRow = 1, colNames = TRUE, rowNames = FALSE,
  detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE,
  rows = NULL, cols = NULL, check.names = FALSE, namedRegion = NULL,
  na.strings = "NA", fillMergedCells = FALSE)


Parameters: same as read.xlsx


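Differences from read.xlsx():

In everyday use there appears to be no difference (I have not verified anything beyond that).

The source code shows that read.xlsx() is a generic function, while readWorkbook() simply calls read.xlsx() internally.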
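
A quick way to check this claim on your own installation is to compare the two results directly. The sketch below assumes openxlsx is installed and uses a made-up temporary file; it is a sanity check, not proof that the functions agree in every case.

```r
library(openxlsx)

path <- tempfile(fileext = ".xlsx")
write.xlsx(data.frame(x = 1:3, y = c("a", "b", "c")), path)

from_read_xlsx <- read.xlsx(path, sheet = 1)
from_read_workbook <- readWorkbook(path, sheet = 1)

# Both calls should return the same data frame for the same arguments.
same <- identical(from_read_xlsx, from_read_workbook)

# Either function also accepts a Workbook object that is already loaded in memory.
wb <- loadWorkbook(path)
from_wb <- readWorkbook(wb, sheet = 1)
```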


2. Writing data to an xlsx file

(1) Option 1: write.xlsx()

Write a data.frame or list of data.frames to an xlsx file

Usage:

write.xlsx(x, file, asTable = FALSE, ...)

Parameters:

x: object or a list of objects that can be handled by writeData to write to file
file: xlsx file name
asTable: write using writeDataTable as opposed to writeData
...: optional parameters to pass on to the underlying functions

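Common optional parameters:

firstRow = TRUE: freeze the first row

colWidths = "auto": set column widths automatically

If you need more of the optional parameters to control styling, write the data with Option 2 instead.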
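
As a minimal sketch of Option 1, assuming openxlsx is installed (the sales data frame and file paths are invented for illustration):

```r
library(openxlsx)

sales <- data.frame(
  region = c("North", "South", "East"),
  amount = c(1200.5, 980, 1430.25)
)
path <- tempfile(fileext = ".xlsx")

# firstRow and colWidths are passed through "..." to the underlying helpers.
write.xlsx(sales, path, asTable = FALSE, firstRow = TRUE, colWidths = "auto")

# A named list of data frames produces one sheet per list element.
path_multi <- tempfile(fileext = ".xlsx")
write.xlsx(list(sales = sales, cars = head(mtcars)), path_multi)
sheet_names <- getSheetNames(path_multi)
```

The one-call form is convenient for quick exports; anything that needs per-cell styling is easier with the workbook-based approach.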


(2) Option 2: createWorkbook() + addWorksheet() + writeData() or writeDataTable() + saveWorkbook()

Create the workbook - insert a sheet - write data to the sheet - save the workbook

(2.1) Create a new workbook: createWorkbook(creator = Sys.getenv("USERNAME"))

Parameter: creator: Creator of the workbook (your name). Defaults to login username

Return value: a Workbook object

(2.2) Insert a new sheet into the workbook:

addWorksheet(wb, sheetName, gridLines = TRUE, ...)

(2.3) Write data to the sheet

writeData(wb, sheet, x, startCol = 1, startRow = 1, xy = NULL, colNames = TRUE, rowNames = FALSE, headerStyle = NULL,...)

writeDataTable(wb, sheet, x, startCol = 1, startRow = 1, xy = NULL, colNames = TRUE, rowNames = FALSE, headerStyle = NULL,...)

Write to a worksheet and format as an Excel table.
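
To make the difference between the two writers concrete, here is a small sketch that assumes openxlsx is installed; the sheet names "plain" and "table" are arbitrary choices for the example.

```r
library(openxlsx)

wb <- createWorkbook()
addWorksheet(wb, "plain")
addWorksheet(wb, "table")

# writeData places the values starting at column 2, row 3 of the sheet.
writeData(wb, sheet = "plain", x = head(iris), startCol = 2, startRow = 3)

# writeDataTable writes the same data but formats it as an Excel table.
writeDataTable(wb, sheet = "table", x = head(iris))

sheets <- names(wb)
```

Nothing is written to disk at this point; the workbook only exists in memory until it is saved.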

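(2.4) Set styles

createStyle(): sets borders, font, font size, background colour, and so on. Note that to style several non-contiguous rows or columns, you need a loop that applies the style to each one in turn.

addStyle(): applies the style to a sheet

freezePane(): sets frozen panes

setColWidths(): sets column widths

(2.5) Save the workbook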

saveWorkbook(wb, file, overwrite = FALSE)
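
Putting the steps of Option 2 together, a minimal end-to-end sketch might look like the following. It assumes openxlsx is installed; the creator name, sheet name, and colours are placeholders chosen for the example.

```r
library(openxlsx)

# 1. Create the workbook.
wb <- createWorkbook(creator = "example")

# 2. Insert a sheet.
addWorksheet(wb, "data", gridLines = TRUE)

# 3. Write the data, with a bold, shaded header row.
header <- createStyle(textDecoration = "bold", fgFill = "#DDEBF7")
writeData(wb, "data", head(mtcars), rowNames = TRUE, headerStyle = header)

# 4. Styling: freeze the header row and fit the column widths.
#    head(mtcars) has 11 columns plus the row-name column, hence 1:12.
freezePane(wb, "data", firstRow = TRUE)
setColWidths(wb, "data", cols = 1:12, widths = "auto")

# 5. Save the workbook.
path <- tempfile(fileext = ".xlsx")
saveWorkbook(wb, path, overwrite = TRUE)
```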


3. Setting styles globally with options()

options("openxlsx.borderColour" = "black"): table border colour
options("openxlsx.borderStyle" = "thin"): table border style
options("openxlsx.dateFormat" = "mm/dd/yyyy"): date format
options("openxlsx.datetimeFormat" = "yyyy-mm-dd hh:mm:ss"): date-time format
options("openxlsx.numFmt" = NULL): number format
options("openxlsx.paperSize" = 9) ## A4
options("openxlsx.orientation" = "portrait") ## page orientation
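
Because these options are global, it is usually worth restoring the previous values once the file has been written. The sketch below assumes openxlsx is installed; the events data frame and the chosen date format are illustrative only.

```r
library(openxlsx)

# options() returns the previous values, so they can be restored afterwards.
old <- options(
  "openxlsx.borderColour" = "black",
  "openxlsx.borderStyle" = "thin",
  "openxlsx.dateFormat" = "yyyy-mm-dd"
)

events <- data.frame(day = as.Date("2024-01-01") + 0:2, n = c(3, 5, 8))
path <- tempfile(fileext = ".xlsx")

# Borders and date cells written here pick up the global settings.
write.xlsx(events, path, borders = "rows")

current_format <- getOption("openxlsx.dateFormat")
options(old)  # restore the previous settings
```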


4. Additional style settings

(1) Create a style object with createStyle()

createStyle(fontName = NULL, fontSize = NULL, fontColour = NULL,
  numFmt = "GENERAL", border = NULL,
  borderColour = getOption("openxlsx.borderColour", "black"),
  borderStyle = getOption("openxlsx.borderStyle", "thin"), bgFill = NULL,
  fgFill = NULL, halign = NULL, valign = NULL, textDecoration = NULL,
  wrapText = FALSE, textRotation = NULL, indent = NULL)

(2) Apply the style object to specific cells: addStyle(wb, sheet, style, rows, cols, gridExpand = FALSE, stack = FALSE)

(3) openXL(wb) ## opens a temp version, handy for previewing the result once the styles are set

(4) Freeze panes: freezePane(wb, sheet, firstActiveRow = NULL, firstActiveCol = NULL, firstRow = FALSE, firstCol = FALSE)

(5) Set column widths: setColWidths(wb, sheet, cols, widths = 8.43, hidden = rep(FALSE, length(cols)), ignoreMergedCells = FALSE)

For automatic column widths, set widths = "auto".
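
The styling calls in this section can be combined as in the sketch below, which assumes openxlsx is installed. The sheet name, colours, and the choice of columns 1, 3 and 5 are arbitrary; the loop follows the earlier note about styling non-contiguous columns one at a time.

```r
library(openxlsx)

wb <- createWorkbook()
addWorksheet(wb, "styled")
writeData(wb, "styled", head(iris))

highlight <- createStyle(fontColour = "#FFFFFF", fgFill = "#4F81BD",
                         halign = "center", textDecoration = "bold")

# Columns 1, 3 and 5 are not contiguous, so apply the style one column at a time.
# Rows 1:7 cover the header row plus the six data rows.
for (col in c(1, 3, 5)) {
  addStyle(wb, "styled", highlight, rows = 1:7, cols = col, gridExpand = TRUE)
}

freezePane(wb, "styled", firstRow = TRUE)               # keep the header visible
setColWidths(wb, "styled", cols = 1:5, widths = "auto") # fit widths to content

path <- tempfile(fileext = ".xlsx")
saveWorkbook(wb, path, overwrite = TRUE)
# In an interactive session, openXL(wb) opens a temporary copy for a visual check.
```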



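Unresolved issue:

With large data sets, calling write.xlsx() or another function that writes Excel files on a data frame succeeds the first time, but after closing and reopening the project the following error often appears: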

Error in is.nan(tmp) : default method not implemented for type 'list'
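
The cause is not confirmed here. One thing that can be checked, purely as an assumption, is whether the data frame contains list columns, since base R's is.nan() raises this exact message when it is given a list. The helper list_columns() below is hypothetical and not part of openxlsx.

```r
# Base R raises the same message whenever is.nan() receives a list.
msg <- tryCatch(is.nan(list(1, 2)), error = function(e) conditionMessage(e))

# Hypothetical helper: names of the columns that are stored as lists.
list_columns <- function(df) {
  names(df)[vapply(df, is.list, logical(1))]
}

df <- data.frame(id = 1:2)
df$tags <- list(c("a", "b"), "c")   # a list column, for demonstration
bad <- list_columns(df)
```

If such columns turn up, converting them to atomic vectors (for example by pasting the elements into a single string) before writing is one thing to try.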