How to Read a File Line by Line in Go, and How to Use bufio.Split()

Source: Internet | Published in: Script programming course | Editor: 程序博客网 | Date: 2024/06/08 19:30

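I recently started learning Go, and to learn it more deeply I completed a project: porting selpg, a Linux command-line utility written in C that reads lines from a file, to Go. The project links are:
Original project: https://www.ibm.com/developerworks/cn/linux/shell/clutil/index.html
Go implementation: https://github.com/kangbb/go-learning/tree/master/selpg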
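Once the project was done it seemed simple, but I ran into quite a few problems along the way, especially when reading and writing files. So I have summarized what I learned here.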

How to Open a File in Go

There are two main functions:

```go
import "os"

// 1
func Open(name string) (file *File, err error)

// 2
func OpenFile(name string, flag int, perm FileMode) (file *File, err error)
```

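The two functions are very similar. For plain reading the first one is enough, but note that os.Open opens the file read-only (it is equivalent to calling OpenFile with os.O_RDONLY), so it cannot be used for writing. The second one is more general and is the one I recommend, especially on Linux, where perm is required in some cases: it takes effect when the file is created (os.O_CREATE). name is the path and file name, flag is a combination of constants selecting the open options, and perm holds the file permission bits. For more details, see:
flag: https://go-zh.org/pkg/os/
perm: https://go-zh.org/pkg/os/#FileMode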

How to Read a File Line by Line in Go

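There are three main ways to read a file line by line in Go. The first two are relatively simple; the third is a little harder, but I find it more widely applicable and more flexible.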

Method 1:

```go
import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

func useNewReader(filename string) {
	count := 0
	fin, err := os.OpenFile(filename, os.O_RDONLY, 0)
	if err != nil {
		panic(err)
	}
	defer fin.Close()
	/* create a Reader */
	rd := bufio.NewReader(fin)
	/* read the file; stop on an error or at EOF */
	for {
		line, err := rd.ReadString('\n')
		/* on io.EOF, line may still hold a last line with no trailing '\n' */
		if len(line) != 0 {
			count++
			/* for command-line output, strip '\f' and the line ending */
			line = strings.Replace(line, "\f", "", -1)
			fmt.Printf("the line %d: %s\n", count, strings.TrimRight(line, "\r\n"))
		}
		if err != nil {
			/* io.EOF is the normal end; anything else is a real error */
			if err != io.EOF {
				fmt.Println("An error has happened:", err)
			}
			break
		}
	}
}
```

Method 2:

```go
import (
	"bufio"
	"fmt"
	"os"
)

func useNewScanner(filename string) {
	count := 0
	fin, err := os.OpenFile(filename, os.O_RDONLY, 0)
	if err != nil {
		panic(err)
	}
	defer fin.Close()
	sc := bufio.NewScanner(fin)
	/* by default the Scanner splits the input into lines at '\n' */
	for sc.Scan() {
		count++
		fmt.Printf("the line %d: %s\n", count, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Println("An error has happened:", err)
	}
}
```

Method 3:

```go
import (
	"bufio"
	"bytes"
	"fmt"
	"os"
)

var LineSplit = func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	/* no data left at EOF: stop scanning */
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	/* find the index of '\n'; the next line begins at i+1.
	   The token does not include the '\n'; i == 0 is an empty line. */
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		return i + 1, dropCR(data[0:i]), nil
	}
	/* at EOF, we have a final, non-terminated line */
	if atEOF {
		return len(data), dropCR(data), nil
	}
	/* request more data */
	return 0, nil, nil
}

func dropCR(data []byte) []byte {
	/* remove every '\f'; delete this block if you don't need it.
	   The guard keeps an empty line as a non-nil token, which Scan needs. */
	if bytes.IndexByte(data, '\f') >= 0 {
		data = bytes.ReplaceAll(data, []byte("\f"), nil)
	}
	/* drop a trailing '\r' (Windows line endings) */
	if len(data) > 0 && data[len(data)-1] == '\r' {
		return data[0 : len(data)-1]
	}
	return data
}

func useSplit(filename string) {
	count := 0
	fin, err := os.OpenFile(filename, os.O_RDONLY, 0)
	if err != nil {
		panic(err)
	}
	defer fin.Close()
	sc := bufio.NewScanner(fin)
	/* install the split function; the default is bufio.ScanLines */
	sc.Split(LineSplit)
	/* begin scanning */
	for sc.Scan() {
		count++
		fmt.Printf("the line %d: %s\n", count, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Println("An error has happened:", err)
	}
}
```
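A split function can be exercised on a string to see how it treats blank lines, "\r\n" endings and '\f'. The sketch below follows the same idea but is not the author's exact code: the names lineSplit, clean and splitAll are hypothetical, and the '\n' test uses i >= 0, as bufio.ScanLines does. With a test of i > 0, an empty line at the start of the remaining data (i == 0) would not match, the scanner would keep asking for more input, and the rest of the file would come back as a single token.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// lineSplit is a bufio.SplitFunc that ends a token at each '\n'.
func lineSplit(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil // no data left: stop
	}
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		return i + 1, clean(data[:i]), nil // consume the line and its '\n'
	}
	if atEOF {
		return len(data), clean(data), nil // final line without '\n'
	}
	return 0, nil, nil // ask the Scanner for more data
}

// clean removes every '\f' and a trailing '\r'. ReplaceAll is only called
// when a '\f' is present, so an empty line stays a non-nil empty slice;
// a nil token would make the Scanner skip the line.
func clean(b []byte) []byte {
	if bytes.IndexByte(b, '\f') >= 0 {
		b = bytes.ReplaceAll(b, []byte("\f"), nil)
	}
	return bytes.TrimSuffix(b, []byte("\r"))
}

// splitAll (hypothetical helper) runs lineSplit over a string.
func splitAll(s string) []string {
	sc := bufio.NewScanner(strings.NewReader(s))
	sc.Split(lineSplit)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	fmt.Printf("%q\n", splitAll("one\r\n\ntw\fo\nlast")) // ["one" "" "two" "last"]
}
```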

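Overall, the second method may look the simplest, since it needs the least code. In fact, the third method does the same thing as the second, just written differently (apart from also removing '\f'). A Scanner splits its input into lines by default (bufio.ScanLines), so the second method simply omits: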

```go
sc.Split(bufio.ScanLines)
```

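If you read the source carefully, you will notice that the LineSplit function in my third method is essentially bufio.ScanLines (together with its helper dropCR) from Go's bufio package, with '\f' removal added. Here is the link:
bufio.go: https://go-zh.org/src/bufio/bufio.go (ScanLines and dropCR themselves are defined in scan.go, in the same src/bufio directory)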

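I list it here to emphasize how Scanner.Split() and Scanner.Scan() work together to read a file, or to split a string and print the pieces. The same can also be done with the strings package, which offers a richer set of splitting functions for strings already in memory. One more point I want to stress: when you are unsure how to use a Go function, read the official documentation and the package source code.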

How to Read a File Page by Page in Go

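Now that reading by lines works, reading by pages is easy too. Pages here are separated by the form-feed character '\f', so only a small change to the code is needed.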
For the first method:

```go
page, err := rd.ReadString('\f')
```

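That is usually enough. Keep in mind how Reader.ReadString behaves, though: it keeps reading, past the size of its internal buffer if necessary, until it finds the delimiter '\f' or hits an error. If it reaches EOF without seeing '\f', it returns everything read so far together with io.EOF. So if the last page is not followed by '\f' and the loop simply breaks whenever err != nil, that final page is never printed. The error check therefore needs to print it first: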

```go
if err != nil {
	/* the last page may have no '\f' after it */
	if err == io.EOF && len(page) != 0 {
		count++
		fmt.Printf("the page %d:\n%s\n", count, page)
	}
	break
}
```

For the third method:

```go
/* same name as before, so useSplit works unchanged; it now splits at '\f' */
var LineSplit = func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	/* no data left at EOF: stop scanning */
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	/* find the index of '\f'; the next page begins at i+1.
	   The token does not include the '\f'. */
	if i := bytes.IndexByte(data, '\f'); i >= 0 {
		return i + 1, dropCR(data[0:i]), nil
	}
	/* at EOF, we have a final page with no trailing '\f' */
	if atEOF {
		return len(data), dropCR(data), nil
	}
	/* request more data */
	return 0, nil, nil
}

func dropCR(data []byte) []byte {
	/* remove any '\f'; delete this block if you don't need it
	   (tokens split at '\f' normally contain none) */
	if bytes.IndexByte(data, '\f') >= 0 {
		data = bytes.ReplaceAll(data, []byte("\f"), nil)
	}
	return data
}
```

That is my summary of what I learned; I hope it helps.

Source code download: https://github.com/kangbb/go-learning/tree/master/readfile
