Python Crawler (1)

Source: Internet | Published by: 电子数据取证产品 | Editor: 程序博客网 | Time: 2024/05/16 00:47

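A basic crawler does two things: it fetches a page's HTML, and it parses that HTML. Fetching mainly relies on the urlopen function in the urllib2 library (Python 2). urlopen returns a file-like object; call it response, and response.read() returns the page's HTML content. The HTML can then be parsed by passing it to BeautifulSoup, as in the following example: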

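import urllib2  # Python 2 only
from bs4 import BeautifulSoup

url = "https://www.baidu.com/"
response = urllib2.urlopen(url)  # returns a file-like object
html = response.read()  # raw HTML of the page
bs0 = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning
print(bs0.script)  # print the first <script> tag and its contents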
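The example above targets Python 2. In Python 3, urllib2 no longer exists and urlopen lives in urllib.request. The following is a minimal sketch of the fetch step under that assumption. The helper name fetch_html, the 10-second timeout, and the UTF-8 fallback are illustrative choices, not part of the original post.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Fetch a URL and return its body decoded to text.

    Hypothetical helper: the name, timeout, and UTF-8 fallback are
    assumptions made for this sketch.
    """
    # urlopen returns a file-like response object, as urllib2.urlopen did.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3, not str
        # Use the charset declared in Content-Type if there is one.
        charset = response.headers.get_content_charset() or "utf-8"
    # Replace undecodable bytes rather than failing on a bad page.
    return raw.decode(charset, errors="replace")


# Usage (requires network access):
# html = fetch_html("https://www.baidu.com/")
```

The returned string can be passed to BeautifulSoup exactly as html is in the original example.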

0 0
