Comparing Files with Python

Source: Internet | Published by: windows streaming media server | Editor: 程序博客网 | Time: 2024/06/04 19:30

The more I use Python, the more I find it well suited to writing everyday development tools.

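We regularly use comparison tools to compare directories or to see what changed between two text files. I recently discovered that the Python standard library already ships the algorithms behind these features. With a little work of your own, you can write a very practical comparison tool.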

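The module for comparing files and directories is called filecmp. The coolest part is a class called dircmp, which compares two directories directly and exposes the results through the following attributes (a and b are the two directories being compared).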

left_list

Files and subdirectories in a, filtered by hide and ignore.

right_list

Files and subdirectories in b, filtered by hide and ignore.

common

Files and subdirectories in both a and b.

left_only

Files and subdirectories only in a.

right_only

Files and subdirectories only in b.

common_dirs

Subdirectories in both a and b.

common_files

Files in both a and b.

common_funny

Names in both a and b, such that the type differs between the directories, or names for which os.stat() reports an error.

same_files

Files which are identical in both a and b.

diff_files

Files which are in both a and b, whose contents differ.

funny_files

Files which are in both a and b, but could not be compared.

subdirs

A dictionary mapping names in common_dirs to dircmp objects.
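As a rough illustration of the attributes listed above, here is a minimal sketch that builds two small temporary directories and reads a few dircmp attributes. The helper names (write, build_demo_dirs, summarize) and the file names are made up for this demo. Note that dircmp compares common files by their os.stat() signature first, so the changed file here deliberately has a different size.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Hypothetical helper for this demo: create a small text file."""
    with open(path, "w") as f:
        f.write(text)


def build_demo_dirs(root):
    """Create two directories, a and b, with a few known differences."""
    a = os.path.join(root, "a")
    b = os.path.join(root, "b")
    for d in (a, b):
        os.makedirs(os.path.join(d, "sub"))
    write(os.path.join(a, "same.txt"), "hello\n")
    write(os.path.join(b, "same.txt"), "hello\n")
    write(os.path.join(a, "changed.txt"), "old\n")
    write(os.path.join(b, "changed.txt"), "new text\n")  # different size
    write(os.path.join(a, "only_a.txt"), "a\n")
    write(os.path.join(b, "only_b.txt"), "b\n")
    return a, b


def summarize(a, b):
    """Collect a few dircmp attributes into a plain dict (sorted for stability)."""
    cmp = filecmp.dircmp(a, b)
    return {
        "left_only": sorted(cmp.left_only),
        "right_only": sorted(cmp.right_only),
        "common_dirs": sorted(cmp.common_dirs),
        "common_files": sorted(cmp.common_files),
        "same_files": sorted(cmp.same_files),
        "diff_files": sorted(cmp.diff_files),
        "subdirs": sorted(cmp.subdirs),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        a, b = build_demo_dirs(root)
        for key, value in summarize(a, b).items():
            print(key, value)
```

The attributes are computed lazily on first access, so creating the dircmp object itself is cheap.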

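In addition, dircmp provides several report methods (such as report_full_closure()) that can recurse into subdirectories and print a text report. I don't find this feature very useful unless its format happens to be exactly what you need. The code, however, is worth reading as a reference.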
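If the built-in report format does not fit, one option is to walk the subdirs dictionary yourself. The sketch below is only one possible approach, not the library's own report code; collect_differences, make_tree, and the labels "only in a", "only in b", and "differs" are names invented for this example.

```python
import filecmp
import os
import tempfile


def collect_differences(cmp, prefix=""):
    """Recursively walk a dircmp tree and return (kind, relative_path) tuples."""
    results = []
    for name in sorted(cmp.left_only):
        results.append(("only in a", os.path.join(prefix, name)))
    for name in sorted(cmp.right_only):
        results.append(("only in b", os.path.join(prefix, name)))
    for name in sorted(cmp.diff_files):
        results.append(("differs", os.path.join(prefix, name)))
    # subdirs maps each common subdirectory name to its own dircmp object
    for name in sorted(cmp.subdirs):
        sub = cmp.subdirs[name]
        results.extend(collect_differences(sub, os.path.join(prefix, name)))
    return results


def make_tree(root):
    """Hypothetical demo data: a/sub and b/sub with one changed and one extra file."""
    a = os.path.join(root, "a")
    b = os.path.join(root, "b")
    for d in (a, b):
        os.makedirs(os.path.join(d, "sub"))
    for path, text in [
        (os.path.join(a, "sub", "x.txt"), "1\n"),
        (os.path.join(b, "sub", "x.txt"), "22\n"),  # different size and content
        (os.path.join(a, "sub", "extra.txt"), "extra\n"),
    ]:
        with open(path, "w") as f:
            f.write(text)
    return a, b


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        a, b = make_tree(root)
        for kind, path in collect_differences(filecmp.dircmp(a, b)):
            print(kind, path)
```

Returning plain tuples rather than printing directly leaves the report format entirely up to the caller.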

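Another important module is difflib. Its core class is SequenceMatcher, the foundation of the module, which can compare any two sequences; lists and strings both work. Another class, Differ, compares the lines of two text files and produces a text report of the differences. Even cooler is HtmlDiff, which directly produces a comparison report in HTML. To me, the main value of the latter two classes is that they provide a framework: you can modify them to customize the report in your own format. If you have special requirements for the report, you can use SequenceMatcher directly.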
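To make the three classes concrete, here is a small sketch that runs each of them on the same pair of made-up line lists. The sample data and the function names (opcodes, text_report, html_report) are assumptions for illustration; note that the class is spelled HtmlDiff in the standard library.

```python
import difflib

# Made-up sample data: two versions of a short list of lines
OLD = ["apple", "banana", "cherry"]
NEW = ["apple", "blueberry", "cherry", "date"]


def opcodes(a, b):
    """Low-level view: (tag, i1, i2, j1, j2) tuples describing how to turn a into b."""
    return difflib.SequenceMatcher(None, a, b).get_opcodes()


def text_report(a, b):
    """Differ output: each line is prefixed with '  ', '- ', '+ ' or '? '."""
    return list(difflib.Differ().compare(a, b))


def html_report(a, b):
    """HtmlDiff output: a complete HTML page with a side-by-side table."""
    return difflib.HtmlDiff().make_file(a, b, "old", "new")


if __name__ == "__main__":
    for op in opcodes(OLD, NEW):
        print(op)
    print("\n".join(text_report(OLD, NEW)))
    print(len(html_report(OLD, NEW)), "characters of HTML")
    # SequenceMatcher also works on plain strings
    print(difflib.SequenceMatcher(None, "abcd", "bcde").ratio())  # 0.75
```

SequenceMatcher requires the sequence elements to be hashable, which is why lists of strings and plain strings both work.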

Here is a small example (written for Python 3):

import difflib


def reportSingleFile(srcfile, basefile, rpt):
    # rpt is not used in this excerpt
    with open(srcfile) as f:
        src = f.read().splitlines()
    with open(basefile) as f:
        base = f.read().splitlines()

    # The first argument marks blank lines as junk so they are ignored
    # when looking for matching blocks
    s = difflib.SequenceMatcher(lambda x: len(x.strip()) == 0, base, src)

    lstres = []
    for tag, i1, i2, j1, j2 in s.get_opcodes():
        # base[i1:i2] is the old range, src[j1:j2] is the new range
        print(tag, i1, i2, j1, j2)  # debug output: show each opcode
        if tag == 'equal':
            pass
        elif tag == 'delete':
            # opcode indices are 0-based, so add 1 to report line numbers
            lstres.append('DELETE (line: %d)' % (i1 + 1))
            lstres += base[i1:i2]
            lstres.append('')
        elif tag == 'insert':
            lstres.append('INSERT (line: %d)' % (j1 + 1))
            lstres += src[j1:j2]
            lstres.append('')
        elif tag == 'replace':
            lstres.append('REPLACE:')
            lstres.append('Before (line: %d)' % (i1 + 1))
            lstres += base[i1:i2]
            lstres.append('After (line: %d)' % (j1 + 1))
            lstres += src[j1:j2]
            lstres.append('')

    print('\n'.join(lstres))

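After years of C++, where you either write an algorithm yourself or go hunting for one online, using Python feels really different. Python's runtime performance is not high, but its development productivity is very high, which makes it well suited to small everyday tools.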