使用python编写数据检索脚本

来源：互联网发布：霍元甲歌词知乎编辑：程序博客网时间：2024/06/10 23:21

日常工作中，时常会遇到因数据错误而引起的bug，这个问题，有时候会很头疼，费尽心力排查原因，最后发现非逻辑性问题，仅是由一个数据填错而引发的，真是觉得耗费时间，一次两次还能接受，但是对于一个数据量很大的工程来说，让你多次排查类似的错误，真是一件体力活。为了不再干这件苦差事，遂决定用python写一个数据检索的脚本，把错误提前暴露出来，解放了自己也提高了团队的工作效率。

环境
1. python3.6
2. win7 64位

python语法及函数使用请参照官方文档
python官方文档
python基础教程

定义windows控制台下的标准输出句柄和颜色宏
设置文本颜色，自定义格式化输出
脚本名称：data_review.py

import codecsimport ctypesimport sys# 标准输出句柄STD_INPUT_HANDLE    = -10STD_OUTPUT_HANDLE   = -11STD_ERROR_HANDLE    = -12# 前景色FOREGROUND_BLACK        = 0x0FOREGROUND_BLUE         = 0x01FOREGROUND_GREEN        = 0x02FOREGROUND_RED          = 0x04FOREGROUND_YELLOW       = 0x0eFOREGROUND_INTENSITY    = 0x08# 背景色BACKGROUND_BLUE         = 0x10BACKGROUND_GREEN        = 0x20BACKGROUND_RED          = 0x40BACKGROUND_YELLOW       = 0xe0BACKGROUND_INTENSITY    = 0x80# get handlestd_out_handle = ctypes.windll.kernel32.GetStdHandle(STD_OUTPUT_HANDLE)def set_text_color(color, handle = std_out_handle):    ctypes.windll.kernel32.SetConsoleTextAttribute(handle, color)def reset_color():    set_text_color(FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE)def print_red_msg(msg):    set_text_color(FOREGROUND_RED | FOREGROUND_INTENSITY)    print(msg)    reset_color()def print_green_msg(msg):    set_text_color(FOREGROUND_GREEN | FOREGROUND_INTENSITY)    print(msg)    reset_color()def print_blue_msg(msg):    set_text_color(FOREGROUND_BLUE | FOREGROUND_INTENSITY)    print(msg)    reset_color()def print_yellow_msg(msg):    set_text_color(FOREGROUND_YELLOW | FOREGROUND_INTENSITY)    print(msg)    reset_color()

定义数据字典
以任务表为例

# 数据字典Quest_Dict              = {}

定义数据检索函数
这里需要留意codecs.open的使用，参考官方文档说明

codecs.open(filename, mode='r', encoding=None, errors='strict', buffering=1)Open an encoded file using the given mode and return an instance of StreamReaderWriter, providing transparent encoding/decoding. The default file mode is 'r', meaning to open the file in read mode.Note: Underlying encoded files are always opened in binary mode. No automatic conversion of '\n' is done on reading and writing. The mode argument may be any binary mode acceptable to the built-in open() function; the 'b' is automatically added.encoding specifies the encoding which is to be used for the file. Any encoding that encodes to and decodes from bytes is allowed, and the data types supported by the file methods depend on the codec used.errors may be given to define the error handling. It defaults to 'strict' which causes a ValueError to be raised in case an encoding error occurs.buffering has the same meaning as for the built-in open() function. It defaults to line buffered.

对于codecs.open所使用的文件编码格式参照官方文档 7.2.3. Standard Encodings 中所列出的。我们这里文件使用的带BOM的UTF8的编码格式。注意 utf_8 和 utf_8_sig 的区别。

7.2.7. encodings.utf_8_sig — UTF-8 codec with BOM signatureThis module implements a variant of the UTF-8 codec: On encoding a UTF-8 encoded BOM will be prepended to the UTF-8 encoded bytes. For the stateful encoder this is only done once (on the first write to the byte stream). For decoding an optional UTF-8 encoded BOM at the start of the data will be skipped.

如何检索数据

def data_review_func():    # start data review    print_green_msg('start data review ... ...')    # 检测结果    bRet = True    ##############################################################     # 建立Quest字典    ##############################################################     try:        fd = codecs.open('./data/Quest.csv','r','utf_8_sig')    except Exception as err:        print_red_msg(str(err))        return    try:        for line in fd:            szQuestID,szData = line.split(',',1)            key_ID = int(szQuestID)            if key_ID <= 0:                print_red_msg('Quest.csv: invaild index for -1')                continue            if key_ID in Quest_Dict:                print_yellow_msg('Quest.csv: {0} is repeat'.format(key_ID))            else:                Quest_Dict[key_ID] = szData    except Exception as err:        print_red_msg('Quest.csv: ' + str(err))        bRet = False    finally:        fd.close()# 在RandQuest.csv中检索任务ID的有效性    try:        fd = codecs.open('./data/RandQuest.csv','r','utf_8_sig')    except Exception as err:        print_red_msg(str(err))        return    try:        for line in fd:            szDataList = line.split(',',-1)            for index in range(4,20,2):                if int(szDataList[index]) <= 0:                    continue                if not int(szDataList[index]) in Quest_Dict:                    print_red_msg('RandQuest.csv: Quest ID {0} is not found in Quest.csv'.format(szDataList[index]))                    bRet = False    except Exception as err:        print_red_msg('RandQuest.csv: ' + str(err))        bRet = False    finally:        fd.close()    # check over    if bRet:        print_green_msg('everything is OK !')    else:        print_red_msg('there is something wrong !')    return

相关语法和函数说明参照
python官方文档
python基础教程

在data_review.py末尾添加执行函数

# 检查数据表data_review_func()

window环境可以新建一个bat脚本用于执行python命令，bat脚本内容如下

python data_review.py pause

双击data_review.bat即开始执行data_review.py脚本

数据有误的测试效果
第一次检测结果

修改之后的测试效果
第二次检测结果

ps: 当输出的log很多时，终端无法一次性显示完全时，可以用管道符一页一页显示，具体做法：修改data_review.bat

python data_review.py | morepause

输出结果按space键翻页但log的文本颜色会被重置 (⊙o⊙)…

也可以将输出的log重定向到文件中，如：

python data_review.py >data_review_ret.txt@echo offecho.echo 检查结果已保存在dat_review_ret.txt中echo.pause

阅读全文

0 0