Python: Some Crawling Examples
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 23:06
re
xpath
JSON parsing
using ChromeDriver
The simplest one:
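# encoding=utf-8
# Requires Python 3.
import re
import urllib.parse
import urllib.request


def youdao(keyword):
    # Build the lookup URL; quote() percent-encodes non-ASCII keywords.
    url = 'http://www.youdao.com/w/eng/' + urllib.parse.quote(keyword)
    page = urllib.request.urlopen(url).read().decode('utf-8')
    # Grab the contents of each translation container.
    find_result = re.findall(r'<div class="trans-container">(.*?)</div>', page, re.S | re.M)
    return_string = find_result[0].strip()
    # Strip any remaining HTML tags.
    return_string = re.sub('<(.*?)>', '', return_string).strip()
    # Frame the result with rows of asterisks as wide as the longest line.
    num = max(map(len, return_string.split('\n')))
    print('*' * num)
    print(return_string)
    print('*' * num)
    return '\n' + keyword + ' : ' + return_string + '\n'


youdao('你好')
youdao('hello')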
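The extraction step can be tried without a network call. The sketch below applies the same two regular expressions to a hard-coded HTML fragment. The fragment and the helper name `extract_translation` are made up for illustration and are not taken from the real youdao page.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page.
SAMPLE_PAGE = '''
<div class="trans-container">
  <ul>
    <li>int. hello</li>
    <li>n. greeting</li>
  </ul>
</div>
<div class="other">ignored</div>
'''


def extract_translation(page):
    """Return the tag-free text of the first trans-container div, or None."""
    # re.S lets '.' match newlines so the non-greedy group can span lines.
    blocks = re.findall(r'<div class="trans-container">(.*?)</div>', page, re.S)
    if not blocks:
        return None
    # Remove remaining tags, then trim each line and drop the empty ones.
    text = re.sub(r'<.*?>', '', blocks[0])
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    return '\n'.join(lines)


result = extract_translation(SAMPLE_PAGE)
print(result)
```

Note that the non-greedy pattern stops at the first `</div>`, so a container with nested `div` elements would be cut short. For pages like that an HTML parser is usually the safer choice.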
Demo2:
# Requires Python 3 and lxml.
import urllib.request
from lxml import etree

url = "http://www.dioenglish.com/home.php?mod=space&uid=114322&do=blog&id=55535"
xp = '//div[@id="blog_article"]'


def get(url, xp):
    t = urllib.request.urlopen(url).read()
    sele = etree.HTML(t)
    # content = sele.xpath('//div[@id="blog_article"]/p/span/font/text()')
    if xp[-2:] == '()':
        # The XPath ends in a function call such as text(), so it already returns strings.
        info = sele.xpath(xp)
    else:
        # Otherwise take the first matching element and flatten all of its nested text.
        content = sele.xpath(xp)
        info = str(content[0].xpath('string(.)'))
    return info
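The two branches of get() differ in what the XPath returns: an expression ending in text() yields a list of text nodes, while string(.) on an element flattens all nested text into one string. Here is a minimal sketch of that distinction. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml, which supports only a subset of XPath and needs well-formed markup. The sample markup and the helper names `direct_texts` and `flattened_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
SAMPLE = (
    '<html><body>'
    '<div id="blog_article"><p>Hello <b>crawler</b> world</p><p>Second</p></div>'
    '</body></html>'
)

root = ET.fromstring(SAMPLE)


def direct_texts(root, path):
    """Approximate path + '/text()': text nodes that are direct children."""
    out = []
    for el in root.findall(path):
        if el.text:
            out.append(el.text)
        # Text following a child element is stored in that child's tail.
        for child in el:
            if child.tail:
                out.append(child.tail)
    return out


def flattened_text(root, path):
    """Approximate string(.) on the first match: all nested text, joined."""
    el = root.find(path)
    return None if el is None else ''.join(el.itertext())


paragraphs = direct_texts(root, ".//div[@id='blog_article']/p")
whole = flattened_text(root, ".//div[@id='blog_article']")
print(paragraphs)
print(whole)
```

The direct text nodes miss the word nested inside `<b>`, while the flattened string keeps it. That is presumably why get() falls back to string(.) when the XPath selects an element.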