Python scripting: batch-comparing text files and finding differences by specific fields
Source: Internet | Editor: 程序博客网 | Time: 2024/06/06 02:28
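Sometimes a need like this comes up: there are two files, each holding table-style data, with some fields in common and some that differ, and the records that differ between the two files have to be identified and extracted.
Sample files
Format of each record in file A: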
03090000 00049993 9222100502392220106000000020000029000170124500019054 20170124 12:30:01622908347435512917 00049996
Format of each record in file B:
01006530 00096900 000480 0124174505 6228480478369552177 000000004066 000000000000 00000000000 0200 000000 5411 00000021 100504754110404 003081009289 00 000000 01030000 000000 00 071 000000000005 000000000000 D00000000001 1 000 6 0 0124174510 01030000 0 03 00000000000 00010111001
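File A contains a number of records, and so does file B. Some records in B carry an index number that does not appear in A, and those are the records to find. For example, the field 0124174510 corresponds to the last 12 characters of the field 9222100502392220106000000020000029000170124500019054 in A; the missing records are matched in batch by splitting the strings. In the script below, the key on the A side is the last 12 characters of the third whitespace-separated field, restricted to one merchant ID (characters 4 through 18 of that same field). On the B side the key is the field at index 13, and the merchant ID is the field at index 12.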
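The key extraction can be tried on the two sample records directly. This is a minimal sketch, assuming the field positions used by the script in the Code section; the helper names `a_key` and `b_key` are hypothetical and exist only for illustration.

```python
# Sample records copied from the article.
A_LINE = ("03090000 00049993 "
          "9222100502392220106000000020000029000170124500019054 "
          "20170124 12:30:01622908347435512917 00049996")
B_LINE = ("01006530 00096900 000480 0124174505 6228480478369552177 "
          "000000004066 000000000000 00000000000 0200 000000 5411 "
          "00000021 100504754110404 003081009289 00 000000 01030000 "
          "000000 00 071 000000000005 000000000000 D00000000001 1 000 "
          "6 0 0124174510 01030000 0 03 00000000000 00010111001")


def a_key(line):
    """Return (merchant_id, index) for a record of file A.

    Assumption: both values are packed into the third field.
    """
    combined = line.split()[2]
    return combined[4:19], combined[-12:]


def b_key(line):
    """Return (merchant_id, index) for a record of file B.

    Assumption: the values sit at field indexes 12 and 13.
    """
    fields = line.split()
    return fields[12], fields[13]


print(a_key(A_LINE))
print(b_key(B_LINE))
```

With these positions, the sample B record belongs to a different merchant ID than the sample A record, so the script would skip it rather than report it as missing.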
Code
# print() with a single string argument runs under both Python 2 and Python 3

# dates to be compared
dateArr = ["170124", "170125", "170130", "170206", "170211", "170228",
           "170304", "170309", "170314", "170321", "170325"]

# local paths: source data and result output (both must already exist)
src_dir = "./src_data"
res_dir = "./res_data"

# the exact merchant ID to be concerned
gMchtId = "100502392220106"

# read files and compare, then write as records
print("start to compare file...")
for dateStr in dateArr:
    print("comparing " + dateStr + " files")
    mic_file_name = "M_IC" + dateStr + "OTRAD100502392220106"
    acom_file_name = "no_chongzhengIND" + dateStr + "01ACOM"

    # define mic set at this date
    micIndexSet = set()

    # read mic file and create index keys
    print("reading " + dateStr + " mic file")
    with open(src_dir + '/' + mic_file_name, 'r') as micFileStream:
        # process file line by line
        for micLineStr in micFileStream:
            # slice strings
            micLineDataArray = micLineStr.split()
            # skip blank or truncated lines (a line read from a file
            # still carries its newline, so len(line) == 0 never fires)
            if len(micLineDataArray) < 3:
                continue
            combinedInfo = micLineDataArray[2]
            micMchtId = combinedInfo[4:19]
            # skip other merchant ids
            if micMchtId != gMchtId:
                continue
            # get query index and add it to the mic index set
            micIndex = combinedInfo[-12:]
            micIndexSet.add(micIndex)

    # list that collects the result lines
    resultLineStr = list()

    # read acom file and compare index keys
    print("reading " + dateStr + " acom file")
    with open(src_dir + '/' + acom_file_name, 'r') as acomFileStream:
        # process file line by line
        for acomLineStr in acomFileStream:
            acomLineDataArray = acomLineStr.split()
            # skip blank or truncated lines
            if len(acomLineDataArray) < 14:
                continue
            acomMchtId = acomLineDataArray[12]
            if acomMchtId != gMchtId:
                continue
            acomIndex = acomLineDataArray[13]
            # save the diffed lines
            if acomIndex not in micIndexSet:
                resultLineStr.append(acomLineStr)

    # write the result lines to file
    print("write " + dateStr + " result file")
    with open(res_dir + '/' + dateStr + "_result", 'w') as resultFileStream:
        for line in resultLineStr:
            # each line already ends with a newline; avoid doubling it
            resultFileStream.write(line.rstrip('\r\n') + '\n')
print("compare over")
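The heart of the script above is a set membership test: collect the keys seen in A, then keep every B record whose key is not in that set. The sketch below isolates that step so it can be exercised without any files. The function name `find_missing`, the helpers `make_a` and `make_b`, and the records they build are all made up for illustration; only the field positions are taken from the script.

```python
def find_missing(a_lines, b_lines, mcht_id):
    """Return the B lines for mcht_id whose index key is absent from A."""
    a_keys = set()
    for line in a_lines:
        fields = line.split()
        if len(fields) < 3:          # blank or truncated line
            continue
        combined = fields[2]
        if combined[4:19] == mcht_id:
            a_keys.add(combined[-12:])

    missing = []
    for line in b_lines:
        fields = line.split()
        if len(fields) < 14:         # blank or truncated line
            continue
        if fields[12] == mcht_id and fields[13] not in a_keys:
            missing.append(line)
    return missing


def make_a(mcht_id, key):
    """Build a synthetic A record (hypothetical data, same layout)."""
    return "03090000 00049993 9222" + mcht_id + "0" * 21 + key + " 20170124"


def make_b(mcht_id, key):
    """Build a synthetic B record with the ID at index 12, key at 13."""
    return " ".join(["0"] * 12 + [mcht_id, key, "00"])


MCHT = "100502392220106"
a_records = [make_a(MCHT, "000000000001"), make_a(MCHT, "000000000002"), ""]
b_records = [
    make_b(MCHT, "000000000001"),               # present in A
    make_b(MCHT, "000000000003"),               # missing from A
    make_b("999999999999999", "000000000004"),  # other merchant, ignored
]
missing = find_missing(a_records, b_records, MCHT)
print(len(missing))
```

A set makes each lookup roughly constant time, so the cost grows with the total number of lines rather than with the product of the two file sizes.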
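Screenshot
The file names are assembled in batch from the dates of the files in the folder, and the results are written to a separate folder. Python handles this kind of processing at a decent speed.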
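As an alternative to a hard-coded date list, the dates could be read from the file names found in the source folder. This is only a sketch under assumptions: the helper `discover_dates` is hypothetical, the date is assumed to be the six digits after `M_IC`, and a date is kept only when the matching ACOM file exists as well. The demo at the end uses a temporary folder with empty placeholder files.

```python
import os
import re
import shutil
import tempfile

# File-name pattern taken from the script; the 6-digit date is captured.
MIC_PATTERN = re.compile(r"^M_IC(\d{6})OTRAD100502392220106$")


def discover_dates(src_dir):
    """Return sorted dates that have both a mic file and an acom file."""
    dates = set()
    for name in os.listdir(src_dir):
        match = MIC_PATTERN.match(name)
        if not match:
            continue
        date_str = match.group(1)
        acom_name = "no_chongzhengIND" + date_str + "01ACOM"
        if os.path.isfile(os.path.join(src_dir, acom_name)):
            dates.add(date_str)
    return sorted(dates)


# Demo with empty placeholder files in a temporary folder.
demo_dir = tempfile.mkdtemp()
for name in ["M_IC170124OTRAD100502392220106",
             "no_chongzhengIND17012401ACOM",
             "M_IC170125OTRAD100502392220106",   # no acom partner
             "notes.txt"]:
    open(os.path.join(demo_dir, name), "w").close()

found = discover_dates(demo_dir)
shutil.rmtree(demo_dir)
print(found)
```

The returned list could then stand in for `dateArr` in the main loop.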