Getting Started with Python Web Crawlers

Source: Internet | Published by: python concat函数 | Editor: 程序博客网 | Time: 2024/05/17 01:55

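Writing a Simple Web Crawler in Python

Today I went through the Python tutorial on 菜鸟教程 and decided to write a crawler as a small exercise. It mainly involves basic syntax, regular expressions, and two modules: urllib and re.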

Crawler implementation

1. Fetching the web page

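    import re              # regular expressions, used in step 2
    import urllib.request  # load the module that opens URLs

    def getHtml(url):
        # Open the URL and return the raw page content (bytes).
        page = urllib.request.urlopen(url)
        html = page.read()
        return html

    html = getHtml("URL of the page to crawl")
    print(html)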
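The fetch step above fails with an exception if the URL is malformed or the server does not answer, and it returns bytes rather than text. Here is a minimal sketch of a more defensive variant. The name `fetch_html`, the `USER_AGENT` string, the 10-second timeout, and the UTF-8 fallback are all my own assumptions for illustration, not part of the original exercise.

```python
import urllib.request

# Hypothetical User-Agent string; some sites reject urllib's default one.
USER_AGENT = "Mozilla/5.0 (tutorial crawler)"


def fetch_html(url, timeout=10, fallback_encoding="utf-8"):
    """Return the page at url as text, or None if it could not be fetched."""
    try:
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request, timeout=timeout) as page:
            # Use the charset declared in the response headers when there is one.
            charset = page.headers.get_content_charset() or fallback_encoding
            return page.read().decode(charset, errors="replace")
    except (OSError, ValueError, LookupError) as exc:
        # OSError covers URLError, HTTPError and timeouts; ValueError covers
        # malformed URLs; LookupError covers an unknown charset name.
        print("Could not fetch %s: %s" % (url, exc))
        return None
```

Calling `fetch_html("URL of the page to crawl")` with a real address returns a str that can go straight into a regular expression, and a `None` result tells the caller to skip that page.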

2. Extracting the resources to download

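    import re
    import urllib.request

    def getImg(html):
        # Capture every src value ending in .jpg; the trailing space in the
        # pattern means a space must follow the closing quote.
        reg = r'src="(.*?\.jpg)" '
        imgre = re.compile(reg)
        imglist = re.findall(imgre, html)
        # Save the images in the current directory as 0.jpg, 1.jpg, ...
        x = 0
        for imgurl in imglist:
            urllib.request.urlretrieve(imgurl, '%s.jpg' % x)
            x += 1

    # page.read() returns bytes; decode first (assumes the page is UTF-8)
    html = getHtml("URL of the page to crawl").decode("utf-8", "ignore")
    getImg(html)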
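One thing the extraction step does not handle is relative image paths such as `/pics/a.jpg`, which `urlretrieve` cannot download on their own. The sketch below separates extraction from downloading and resolves each path against the page address with `urljoin`. The function names, the slightly stricter pattern, and the example.com sample are assumptions made for illustration only.

```python
import re
import urllib.request
from urllib.parse import urljoin

# Variation on the pattern above: [^"]+ keeps a match inside one attribute value.
IMG_PATTERN = re.compile(r'src="([^"]+?\.jpg)"')


def extract_image_urls(html, base_url):
    """Return absolute URLs for every .jpg src found in html (a str)."""
    return [urljoin(base_url, src) for src in IMG_PATTERN.findall(html)]


def download_images(image_urls):
    """Save each image in the current directory as 0.jpg, 1.jpg, ..."""
    for index, image_url in enumerate(image_urls):
        urllib.request.urlretrieve(image_url, "%s.jpg" % index)


# Hypothetical page fragment and base URL, for illustration only.
sample = '<img src="/pics/a.jpg" alt="a"> <img src="http://example.com/b.jpg">'
print(extract_image_urls(sample, "http://example.com/gallery/index.html"))
```

Keeping extraction separate from downloading makes it possible to check the regular expression against a saved page before touching the network.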

This article draws on 虫师's blog.
A good Python crawler tutorial

0 0

Python爬虫 | Python爬虫入门
python爬虫入门简单爬虫
Python爬虫入门
Python爬虫入门
Python爬虫入门基础
如何入门 Python 爬虫？
python 爬虫入门
如何入门 Python 爬虫？
Python 爬虫入门《上》
Python 爬虫入门《中》
Python爬虫入门《下》
python 爬虫入门
python爬虫入门
Python爬虫入门
Python 爬虫入门实例
爬虫入门：Python
如何入门 Python 爬虫？
[Python]爬虫入门
C++11新特性之 std::array container
NSNotificationCenter用法总结
SQL基础学习8
Java基础知识
JavaScript垃圾收集
Python爬虫入门
隐式图--HDU - 2717 Catch That Cow
HTTP集群之nginx+keepalived
android获取string.xml的值
linux 常用命令(ubuntu)
Linux之旅--Bash
自定义组合控件动态，静态设置属性的步骤
maven
编程之美-连连看游戏设计方法整理