Simplifying Work with Scripts


Please credit the source when reposting: http://blog.csdn.net/horkychen

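Today I created a new category: Productivity. I mainly want to collect my thinking and practice on improving work efficiency and on solving problems with technology. Google is said to have a dedicated productivity department that specializes in studying tools and methods, which suggests that although productivity topics are scattered, studying them systematically is bound to pay off.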

If part of your job involves pulling information from the web, such as finding out who still has how many bugs open, the following may give you some ideas.

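Web pages mostly get their data through HTTP requests. In another post, for example, I described using XPath to extract the data I wanted from a CSDN blog. This post covers the case where the server sends back JSON.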

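First, get an HTTP proxy or web sniffer (covered in another post) and find the request that fetches the data; the tool shows the contents of that request clearly. Below is a screenshot from Charles:

[Screenshot: a request captured in Charles]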

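In the captured request, the value after Host joined with the path after GET gives the complete request URL. Cookie is the data that must be sent back when the server requires authentication (it becomes visible once you have logged in normally; Firefox and Chrome both have add-ons for managing cookies). Note down the User-Agent as well, because some servers check this value and may return no data when given a UA they do not support.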
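As a minimal sketch of how those captured values fit together, the snippet below joins a Host and a GET path into a URL and attaches the Cookie and User-Agent headers using Python 3's urllib.request. The host, path, and cookie values are placeholders, and build_request is a hypothetical helper name, not something from the original post. Nothing is sent over the network here.

```python
import urllib.request


def build_request(host, path, cookie, user_agent, scheme="http"):
    """Combine the captured Host and GET path into a URL and attach headers."""
    url = "{}://{}{}".format(scheme, host, path)
    headers = {"Cookie": cookie, "User-Agent": user_agent}
    return urllib.request.Request(url, headers=headers)


# Placeholder values: substitute what your proxy tool actually shows.
req = build_request(
    host="example.invalid",
    path="/api/items?format=json",
    cookie="theme=gray; session=PLACEHOLDER",
    user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4)",
)

print(req.full_url)
# urllib stores header names in capitalized form, hence "User-agent".
print(req.get_header("User-agent"))
```

Passing the resulting object to urllib.request.urlopen would perform the actual request.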

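The whole data-processing flow is as follows:

1. Prepare a request carrying the required header data (Cookie and UA) and send it to the server.

2. Read the returned data.

3. Convert the data into a format the program can work with.

4. Analyze and process it.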
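Steps 2 to 4 can be tried offline with a rough sketch like the one below. It assumes the reply is a JSON list whose entries carry "owner", "hasitems", and "subitems" fields (the names used by the script in this post); the sample payload, the names in it, and the helper functions are made up for illustration, and an in-memory buffer stands in for the server response.

```python
import io
import json
from collections import Counter


def read_body(response, limit=500000):
    """Step 2: read at most `limit` bytes from a file-like response."""
    return response.read(limit)


def parse_json(raw):
    """Step 3: turn the raw bytes into Python lists and dicts."""
    return json.loads(raw.decode("utf-8"))


def iter_leaves(items):
    """Yield every entry without sub-items, descending recursively."""
    for item in items:
        if item.get("hasitems"):
            yield from iter_leaves(item["subitems"])
        else:
            yield item


def count_by_owner(items):
    """Step 4: tally leaf entries per owner."""
    return Counter(
        leaf["owner"] for leaf in iter_leaves(items) if "owner" in leaf
    )


# Made-up payload standing in for the server's reply (step 1 is skipped).
SAMPLE = b"""[
  {"id": 1, "hasitems": false, "owner": "alice"},
  {"id": 2, "hasitems": true, "subitems": [
    {"id": 3, "hasitems": false, "owner": "bob"},
    {"id": 4, "hasitems": false, "owner": "alice"}
  ]}
]"""

tally = count_by_owner(parse_json(read_body(io.BytesIO(SAMPLE))))
for owner, count in sorted(tally.items()):
    print(owner, ":", count)
```

Keeping each step in its own function makes it possible to swap the in-memory buffer for a real HTTP response without touching the parsing or counting code.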

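On Mac OS, you can use the method described in another post to write a script that runs the code and saves the result to a file, then open the result file with open, which is even more convenient. The code itself follows, and it is quite simple.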

#!/usr/bin/env python3
# coding: utf-8
import json
import urllib.request

outputKeys = ("id", "name", "description", "text")
# The member list is not shown in the original post: fill in your team's names.
members = []
collectedRes = {}
txtHeader = {"Origin": "http://xxxxx",
             "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5",
             "Cookie": "theme=gray; xxxxx=xxxxx"}
url = "http://xxxxxxxxxxxxxxxxx"

# printDetail is called below but not defined in the original post; this
# version is an assumption that prints the fields listed in outputKeys.
def printDetail(data):
    for key in outputKeys:
        print(key, ":", data.get(key, ""))

# Check whether the point was reported by one of our members.
# If so, print its details and count it.
def checkKey(data):
    if "owner" in data:
        name = data["owner"]
        if name in members:
            printDetail(data)  # Print out the detail information for reference
            collectedRes[name] = collectedRes.get(name, 0) + 1

# Walk the whole structure, descending into sub-items, and check each leaf.
def iterateDictionary(items):
    for item in items:
        if not item["hasitems"]:
            checkKey(item)
        else:
            iterateDictionary(item["subitems"])

# Print the summary result.
def printResult():
    # FIXME: people who share the same name are not told apart here
    print("\n\nSummary:")
    for name in collectedRes:
        print(name, " : ", collectedRes[name])

def getURLData():
    urlOpener = urllib.request.build_opener()
    request = urllib.request.Request(url, headers=txtHeader)
    # Keep the response in its own variable instead of overwriting the global url.
    response = urlOpener.open(request)
    page = response.read(500000)
    return page

# Main entry
if __name__ == "__main__":
    # 1. Send the request and read the data.
    page = getURLData()
    # 2. Convert the JSON data to Python objects.
    pointDict = json.loads(page)
    # 3. Check all data.
    iterateDictionary(pointDict)
    # 4. Print the summary result.
    printResult()
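The Mac OS tip mentioned before the script (save the result to a file, then open it) might look roughly like this. It is only a sketch: save_report and open_result are hypothetical helper names, the report text and file name are whatever you choose, and the only external command assumed is the open command that the post itself refers to.

```python
import subprocess
import sys
from pathlib import Path


def save_report(text, path):
    """Write the report text to `path` and return it as a Path."""
    path = Path(path)
    path.write_text(text, encoding="utf-8")
    return path


def open_result(path):
    """Open the file with the macOS `open` command.

    Returns False without doing anything on other systems.
    """
    if sys.platform != "darwin":
        return False
    subprocess.run(["open", str(path)], check=False)
    return True
```

For example, open_result(save_report("Summary:\nalice : 2\n", "bug_summary.txt")) would write the file and, on a Mac, show it in the default application for text files.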

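References:

List of Software Used in Development Work

Lazy People Can Use Automator to Improve Work Efficiency

Simplifying Work with Scripts

Programmers Should Learn to Be Lazy: Using Automation Correctly

How to Use Search Techniques to Become an Efficient Programmer

[Excerpts from Peopleware]: Productivity: Winning Battles and Losing Wars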