Crawling AJAX dynamic web pages with Python 2.x and 3.x
Source: Internet. Editor: 程序博客网. Date: 2024/06/06 03:50
AJAX is short for Asynchronous JavaScript and XML. It uses the JavaScript XMLHttpRequest object to exchange data between the client's browser and the server in the background, so the page can be updated without being refreshed.
To crawl content created by AJAX, you need the URL that the AJAX call requests, and sometimes that URL is easy to identify directly. Take IE 11 as an example. First, press F12 to open the developer tools. Select the "Network" tab, click the button on the page that triggers the XMLHttpRequest, then look at the URL column to find the URL requested by the AJAX call.
However, sometimes the URL requested by the XMLHttpRequest cannot be used directly, for instance when the parameters are sent in the body of a POST request. In this case, we have to build the request manually:
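When the URL can be read straight from the Network tab, fetching it from Python can be sketched roughly as follows. This is a minimal Python 3 sketch, not a definitive implementation: the endpoint `http://example.com/ajax/article_list`, the `page` parameter, and the helper names `build_get_request` and `fetch_json` are hypothetical placeholders, and the response is assumed to be UTF-8 encoded JSON.

```python
import json
import urllib.parse
import urllib.request


def build_get_request(base_url, params, user_agent):
    """Build a GET request for an AJAX endpoint found in the Network tab."""
    query = urllib.parse.urlencode(params)
    url = base_url + "?" + query if query else base_url
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_json(request, timeout=10):
    """Send the request; assumes the server replies with UTF-8 encoded JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameter; replace with what the Network tab shows.
request = build_get_request(
    "http://example.com/ajax/article_list",
    {"page": 2},
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0)",
)
print(request.get_method(), request.full_url)
# articles = fetch_json(request)  # uncomment to actually send the request
```

Building the request separately from sending it makes it easy to check the final URL and headers before any traffic reaches the server.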
1. Identify the URL that is requested with the POST method.
2. Double-click that URL and copy the value of the "User-Agent" request header.
3. Select the "Request body" tab and copy the values.
4. Reproduce the request in Python:
Python 2.x
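    import json
    import urllib
    import urllib2

    url = 'http://www.huxiu.com/v2_action/article_list'
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0)'
    # Form fields copied from the "Request body" tab
    data = {'huxiu_hash_code': '63b69ec3342ee8c7e6ec4cab561482c9', 'page': 2, 'last_dateline': 1466664240}
    data = urllib.urlencode(data)

    # Passing data makes urllib2 send a POST request; the copied User-Agent goes in the headers
    request = urllib2.Request(url=url, data=data, headers={'User-Agent': user_agent})
    response = urllib2.urlopen(request)

    # Parse the JSON response
    result = json.loads(response.read())
    print result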
Python 3.x
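    import json
    import urllib.parse
    import urllib.request

    url = 'http://www.huxiu.com/v2_action/article_list'
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0)'
    # Form fields copied from the "Request body" tab
    data = {'huxiu_hash_code': '63b69ec3342ee8c7e6ec4cab561482c9', 'page': 2, 'last_dateline': 1466664240}
    # In Python 3 the POST body must be bytes
    data = urllib.parse.urlencode(data).encode('utf-8')

    # Passing data makes urllib send a POST request; the copied User-Agent goes in the headers
    request = urllib.request.Request(url, data=data, headers={'User-Agent': user_agent})
    response = urllib.request.urlopen(request)

    # Parse the JSON response
    result = json.loads(response.read().decode('utf-8'))
    print(response.status)  # HTTP status code
    print(result)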
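The request body copied in step 3 is typically a URL-encoded string, so instead of retyping it as a dictionary it can be parsed directly. The following Python 3 sketch illustrates the idea under a few assumptions: the string `raw_body` is an assumed reconstruction of the copied body using the values from this article, the helper names `parse_request_body` and `build_post_request` are hypothetical, and whether the server accepts a changed `page` value on its own has not been verified.

```python
import urllib.parse
import urllib.request


def parse_request_body(raw_body):
    """Turn a copied 'a=1&b=2' request body into a dict of strings."""
    return dict(urllib.parse.parse_qsl(raw_body, keep_blank_values=True))


def build_post_request(url, fields, user_agent):
    """Build a POST request whose body is the URL-encoded form fields."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body, headers={"User-Agent": user_agent})


# Assumed shape of the copied request body, using the values from the article.
raw_body = "huxiu_hash_code=63b69ec3342ee8c7e6ec4cab561482c9&page=2&last_dateline=1466664240"
fields = parse_request_body(raw_body)
fields["page"] = "3"  # change one field to ask for a different page

request = build_post_request(
    "http://www.huxiu.com/v2_action/article_list",
    fields,
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0)",
)
print(request.get_method())
print(request.data.decode("utf-8"))
# response = urllib.request.urlopen(request)  # uncomment to actually send the request
```

Keeping the fields in a dictionary makes it straightforward to loop over several pages by changing only the values that differ between requests.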