Screen Scraping and Other Utilities


Searching for a business address with the Google Maps API

This requires a third-party library:

pip install pygeocoder

from pygeocoder import Geocoder


def search_business(business_name):
    results = Geocoder.geocode(business_name)
    for result in results:
        print(result)


if __name__ == '__main__':
    business_name = 'google lnc., California'
    print('Searching {}...'.format(business_name))
    search_business(business_name)

OUTPUT

$ python3 search_business_addr.py
Searching google lnc., California...
1400 Crittenden Ln, Mountain View, CA 94043, USA

Searching for geographic coordinates with a Google Maps URL

This requires googlemaps, the Python Client for Google Maps Services.
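This recipe names the googlemaps client but shows no code. The following is a minimal sketch, assuming the googlemaps package is installed and you have a valid Google Maps API key. The helper names extract_coordinates and lookup_coordinates are hypothetical, and the sample data is hand-written in the shape of a geocoding response, not real output.

```python
def extract_coordinates(geocode_results):
    """Return (formatted_address, lat, lng) tuples from a geocoding result list."""
    coordinates = []
    for result in geocode_results:
        location = result['geometry']['location']
        coordinates.append(
            (result.get('formatted_address', ''), location['lat'], location['lng']))
    return coordinates


def lookup_coordinates(address, api_key):
    """Geocode an address over the network (needs a valid API key)."""
    import googlemaps  # imported lazily so extract_coordinates works without it
    client = googlemaps.Client(key=api_key)
    return extract_coordinates(client.geocode(address))


if __name__ == '__main__':
    # Hand-written sample in the shape of a geocoding response, not real output.
    sample = [{
        'formatted_address': 'Example Street 1, Example City',
        'geometry': {'location': {'lat': 1.5, 'lng': -2.5}},
    }]
    for address, lat, lng in extract_coordinates(sample):
        print('{}: ({}, {})'.format(address, lat, lng))
```

Keeping the parsing separate from the network call lets you check the parsing on sample data before spending API quota.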

Searching for articles on Wikipedia

This requires a third-party library:

pip install wikipedia

import wikipedia
import argparse


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description = 'Wikipedia search')
    parser.add_argument('--query', action = 'store', dest = 'query',
        required = True)
    given_args = parser.parse_args()
    search_term = given_args.query
    print('Search Wikipedia for', search_term)
    results = wikipedia.search(search_term)
    print('Listing {} search results...'.format(len(results)))
    for result in results:
        item = wikipedia.page(result)
        print('----- {} -----\n\t{}\n\t{}'.format(item.title,
            item.url, wikipedia.summary(result, sentences = 1)))
    print('----- End of search results -----')

OUTPUT

$ python3 search_article_in_wikipedia.py --query 'Islam'
Search Wikipedia for Islam
Listing 10 search results...
----- Islam -----
    https://en.wikipedia.org/wiki/Islam
    Islam () is an Abrahamic monotheistic religion teaching that there is only one incomparable God (Allah) and that Muhammad is the messenger of God.
----- Hujjat al-Islam -----
    https://en.wikipedia.org/wiki/Hujjat_al-Islam
    Hujjat al-Islam (from Arabic: حجة الإسلام‎‎ ḥujjatu l-Islām) (also Hojatoleslam) is an honorific title meaning "authority on Islam" or "proof of Islam".
----- Criticism of Islam -----
    https://en.wikipedia.org/wiki/Criticism_of_Islam
    Criticism of Islam has existed since its formative stages.
----- Islam in Iran -----
    https://en.wikipedia.org/wiki/Islam_in_Iran
    The Islamic conquest of Persia (637–651) led to the end of the Sasanian Empire and the eventual decline of the Zoroastrian religion in Persia.
----- Five Pillars of Islam -----
    https://en.wikipedia.org/wiki/Five_Pillars_of_Islam
    The Five Pillars of Islam (arkān al-Islām أركان الإسلام; also arkān al-dīn أركان الدين "pillars of the religion") are five basic acts in Islam, considered mandatory by believers and are the foundation of Muslim life.
----- Apostasy in Islam -----
    https://en.wikipedia.org/wiki/Apostasy_in_Islam
    Apostasy in Islam (Arabic: ردة‎‎ riddah or ارتداد irtidād) is commonly defined as the conscious abandonment of Islam by a Muslim in word or through deed.
----- Women in Islam -----
    https://en.wikipedia.org/wiki/Women_in_Islam
    The experiences of Muslim women vary widely between and within different societies.
----- Islam and violence -----
    https://en.wikipedia.org/wiki/Islam_and_violence
    Mainstream Islamic law stipulates detailed regulations for the use of violence, including the use of violence within the family or household, the use of corporal and capital punishment, as well as how, when and against whom to wage war.
----- Christianity and Islam -----
    https://en.wikipedia.org/wiki/Christianity_and_Islam
    Christianity and Islam are the largest religions in the world and share a historical and traditional connection, with some major theological differences.
----- Spread of Islam -----
    https://en.wikipedia.org/wiki/Spread_of_Islam
    The expansion of the Muslim Empire in the years following the Prophet Muhammad's death led to the creation of the caliphates, occupying a vast geographical area and conversion to Islam was boosted by missionary activities particularly those of Imams, who easily intermingled with local populace to propagate the religious teachings.
----- End of search results -----

Looking up stock quotes with Google

This requires a third-party library:

$ pip install googlefinance

#!/usr/bin/env python3
from googlefinance import getQuotes
# import json
import argparse


def get_quote(symbol):
    quote = getQuotes(symbol)[0]
    for (idx, detail) in quote.items():
        print('  {}: {}'.format(idx, detail))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description = 'stock quote search')
    parser.add_argument('--symbol', action = 'store', dest = 'symbol',
        required = True)
    given_args = parser.parse_args()
    symbol = given_args.symbol
    print('Searching stock quote for symbol:', symbol)
    get_quote(symbol)

OUTPUT

$ ./google_stock_quote.py --symbol goog
Searching stock quote for symbol: goog
  Index: NASDAQ
  StockSymbol: GOOG
  LastTradeTime: 10:11AM EDT
  LastTradeDateTime: 2017-08-10T10:11:24Z
  LastTradePrice: 913.22
  LastTradeDateTimeLong: Aug 10, 10:11AM EDT
  ID: 304466804484872
  LastTradeWithCurrency: 913.22

Searching for a source code repository on GitHub

#!/usr/bin/env python3
import requests
import argparse

SEARCH_URL_BASE = 'https://api.github.com/repos'


def search_repository(author, repo, search_for):
    url = '{}/{}/{}'.format(SEARCH_URL_BASE, author, repo)
    print('Searching Repo URL:', url)
    response = requests.get(url)
    # Default answer, also returned when the request fails
    result = 'No result found!'
    if response.status_code == requests.codes.ok:
        repo_info = response.json()
        print('Github repository info for:', repo)
        # Keep the value of the last field whose name contains the search term
        for key, value in repo_info.items():
            if search_for in key:
                result = value
    return result


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description = 'Github search')
    parser.add_argument('--author', '-a', action = 'store', dest = 'author',
        required = True)
    parser.add_argument('--repo', '-r', action = 'store', dest = 'repo',
        required = True)
    parser.add_argument('--search_for', '-s', action = 'store',
        dest = 'search_for', required = True)
    given_args = parser.parse_args()
    r = search_repository(given_args.author, given_args.repo,
        given_args.search_for)
    if isinstance(r, dict):
        print('Got result for {}...'.format(given_args.search_for))
        for key, value in r.items():
            print('{} ==> {}'.format(key, value))
    else:
        print('Got result for {}: {}'.format(given_args.search_for, r))

OUTPUT

$ ./search_code_github.py -a django -r django -s owner
Searching Repo URL: https://api.github.com/repos/django/django
Github repository info for: django
Got result for owner...
received_events_url ==> https://api.github.com/users/django/received_events
avatar_url ==> https://avatars2.githubusercontent.com/u/27804?v=4
type ==> Organization
gists_url ==> https://api.github.com/users/django/gists{/gist_id}
site_admin ==> False
subscriptions_url ==> https://api.github.com/users/django/subscriptions
url ==> https://api.github.com/users/django
gravatar_id ==>
html_url ==> https://github.com/django
following_url ==> https://api.github.com/users/django/following{/other_user}
repos_url ==> https://api.github.com/users/django/repos
login ==> django
starred_url ==> https://api.github.com/users/django/starred{/owner}{/repo}
followers_url ==> https://api.github.com/users/django/followers
events_url ==> https://api.github.com/users/django/events{/privacy}
id ==> 27804
organizations_url ==> https://api.github.com/users/django/orgs
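The search loop in the script keeps only the last field whose key contains the search term, so a broad term such as url would report a single, arbitrary match. A minimal sketch of an alternative that collects every match is shown here. The name find_matching_fields is hypothetical, and the sample dictionary is a hand-written subset of a repository record used only for illustration.

```python
def find_matching_fields(repo_info, search_for):
    """Return every top-level field whose key contains search_for."""
    return {key: value for key, value in repo_info.items()
            if search_for in key}


if __name__ == '__main__':
    # Hand-written sample, not a full API response.
    sample = {
        'name': 'django',
        'owner': {'login': 'django'},
        'html_url': 'https://github.com/django/django',
        'url': 'https://api.github.com/repos/django/django',
    }
    for key, value in find_matching_fields(sample, 'url').items():
        print('{} ==> {}'.format(key, value))
```

An empty dictionary then signals that nothing matched, which avoids mixing a sentinel string with real values.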

Reading the Gank.io feed

#!/usr/bin/env python3
from datetime import datetime
import feedparser

BASE_URL = 'http://gank.io/feed'


def read_feed(feed_url):
    try:
        data = feedparser.parse(feed_url)
    except Exception as e:
        print(e)
        # Nothing to list if the feed could not be parsed
        return
    for entry in data.entries:
        print(entry.title)
        print('({})'.format(entry.link))
        # print(entry.description)
        print('\n')


if __name__ == '__main__':
    print('----- Reading feed from gank.io ({})-----'.format(datetime.today()))
    read_feed(BASE_URL)
    print('----- End of Gank.io feed -----')

OUTPUT

$ ./read_gank_feed.py
----- Reading feed from gank.io (2017-08-11 09:44:30.011551)-----
今日力推:基于 GitHub Comment 实现的论坛评论系统 / 潘多拉播放器,做的超漂亮
(http://gank.io/2017/08/09?utm_medium=rss&utm_source=gank.io)


今日力推:超轻量级区块链实现 / 气泡风格的 SeekBar / swift版“Luban"
(http://gank.io/2017/08/08?utm_medium=rss&utm_source=gank.io)


今日力推:Android 动手实现 VR / Java 实现的 DHT 协议,其实就是 BitTorrent /Swift Gif 提取器和制作工具。
(http://gank.io/2017/08/03?utm_medium=rss&utm_source=gank.io)


今日力推:三个优秀的Android图表开源控件 / 按钮加载动画效果 Demo,简单易懂
(http://gank.io/2017/08/02?utm_medium=rss&utm_source=gank.io)


今日力推:全部干货
(http://gank.io/2017/08/01?utm_medium=rss&utm_source=gank.io)


今日力推:获取百度网盘(高速)下载链接的chrome插件 / Android 安全逆向:篡改你的位置信息 / iOS上的一个简单,实用的无限循环轮播图组件
(http://gank.io/2017/07/27?utm_medium=rss&utm_source=gank.io)


今日力推:Shadowsocks 流量嗅探 / 轻松学习正则表达式
(http://gank.io/2017/07/26?utm_medium=rss&utm_source=gank.io)


今日力推:HenCoder 绘制 3 练习项目 / 基于 Vue 2.x 和 GitHub Issue 实现的博客系统 / 弱密码 Wifi 破解思路导引
(http://gank.io/2017/07/25?utm_medium=rss&utm_source=gank.io)


今日力推:酷酷的 Android 刷新组件 / CotEditor 开源了 / macOS 全局快捷键实现 / Android 简洁优雅的文件选择器
(http://gank.io/2017/07/24?utm_medium=rss&utm_source=gank.io)


今日力推:一款非常漂亮的 Material Design 风格的音乐播放器 / 理解与设计自适应图标
(http://gank.io/2017/07/21?utm_medium=rss&utm_source=gank.io)


今日力推: Android 日历组件 / OC 实现的 Easing Function
(http://gank.io/2017/07/20?utm_medium=rss&utm_source=gank.io)


今日力推:漂亮的二选一按钮效果 / 仿Google Play商店沉侵式样式 / 浅谈 MVC、MVP 和 MVVM 架构模式
(http://gank.io/2017/07/19?utm_medium=rss&utm_source=gank.io)


今日力推:用 Android 实现一条小金鱼(超棒) / PhysicsBasedAnimation学习 / 通过 14 个小项目,上手 Swift macOS 开发
(http://gank.io/2017/07/18?utm_medium=rss&utm_source=gank.io)


今日力推:非常 Material Design 风格的 Dropdown 效果 / What's New in LLVM 9.
(http://gank.io/2017/07/17?utm_medium=rss&utm_source=gank.io)


R.I.P
(http://gank.io/2017/07/14?utm_medium=rss&utm_source=gank.io)


今日力推:我是如何逆向了星巴克 App 的 / 如何写一个优雅的 Android Launcher / 在线 Sketch PSD 转换 / 基于 Javascript 实现的 JVM 虚拟机
(http://gank.io/2017/07/13?utm_medium=rss&utm_source=gank.io)


[...]
----- End of Gank.io feed -----

Crawling links in a web page

#!/usr/bin/env python3
import argparse
import re
import requests

# URLs that have already been crawled
processed = []


def search_links(url, depth, search):
    url_is_processed = (url in processed)
    if url.startswith('http://') and (not url_is_processed):
        processed.append(url)
        # Split the URL into host and path
        url = host = url.replace('http://', '', 1)
        path = '/'
        urlparts = url.split('/')
        if len(urlparts) > 1:
            host = urlparts[0]
            path = url.replace(host, '', 1)
        print('Crawling URL path:{}{} '.format(host, path))
        r = requests.get('http://' + host + path)
        contents = r.text
        all_links = re.findall('href="(.*?)"', contents)
        if search in contents:
            print('Found {} at {}'.format(search, url))
        print('====> {}: processing {} links'.format(depth, len(all_links)))
        for href in all_links:
            # Turn site-relative links into absolute ones
            if (not href.startswith('//')) and href.startswith('/'):
                href = 'http://' + host + href
            if depth > 0:
                search_links(href, depth - 1, search)
    else:
        print('Skipping link: {} ...'.format(url))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description = 'Webpage link crawler')
    parser.add_argument('--url', '-u', action = 'store', dest = 'url',
        required = True)
    parser.add_argument('--query', '-q', action = 'store', dest = 'query',
        required = True)
    # type = int so that a depth given on the command line can be compared with 0
    parser.add_argument('--depth', '-d', action = 'store', dest = 'depth',
        type = int, default = 2)
    given_args = parser.parse_args()
    try:
        search_links(given_args.url, given_args.depth, given_args.query)
    except KeyboardInterrupt:
        print('Aborting search by user requests.')

OUTPUT

$ ./python_link_crawler.py -u http://python.org -q python
Crawling URL path:python.org/
Found python at python.org
====> 2: processing 221 links
Skipping link: //ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js ...
Crawling URL path:python.org/static/stylesheets/style.css
Found python at python.org/static/stylesheets/style.css
====> 1: processing 0 links
Crawling URL path:python.org/static/stylesheets/mq.css
Found python at python.org/static/stylesheets/mq.css
====> 1: processing 0 links
Crawling URL path:python.org/static/stylesheets/no-mq.css
Found python at python.org/static/stylesheets/no-mq.css
====> 1: processing 0 links
Crawling URL path:python.org/static/favicon.ico
====> 1: processing 0 links
Crawling URL path:python.org/static/apple-touch-icon-144x144-precomposed.png
====> 1: processing 0 links
Crawling URL path:python.org/static/apple-touch-icon-114x114-precomposed.png
====> 1: processing 0 links
Crawling URL path:python.org/static/apple-touch-icon-72x72-precomposed.png
====> 1: processing 0 links
Crawling URL path:python.org/static/apple-touch-icon-precomposed.png
====> 1: processing 0 links
Skipping link: http://python.org/static/apple-touch-icon-precomposed.png ...
Crawling URL path:python.org/static/humans.txt
[...]
^CAborting search by user requests.
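The crawler only rewrites links that begin with a single slash, so protocol-relative links such as //ajax.googleapis.com/... and page-relative links are skipped, as the output shows. One possible way to resolve them is urljoin from the standard library. This is a sketch, not part of the original script, and the name extract_links is hypothetical.

```python
import re
from urllib.parse import urljoin, urlparse

HREF_PATTERN = re.compile(r'href="(.*?)"')


def extract_links(page_url, html):
    """Return absolute http(s) links found in html, in order, without duplicates."""
    links = []
    for href in HREF_PATTERN.findall(html):
        # urljoin resolves '/path', '//host/path' and 'relative/path' against page_url
        absolute = urljoin(page_url, href)
        if urlparse(absolute).scheme in ('http', 'https') and absolute not in links:
            links.append(absolute)
    return links


if __name__ == '__main__':
    # Hand-written HTML fragment for illustration.
    html = ('<a href="/static/style.css"></a>'
            '<a href="//ajax.googleapis.com/x.js"></a>'
            '<a href="apps/"></a>'
            '<a href="mailto:someone@example.com"></a>')
    for link in extract_links('http://python.org/about/', html):
        print(link)
```

Filtering on the URL scheme drops mailto: and similar links before any request is made.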