Python 2 urllib2: Simulating a Library Login with Redirects (Part 2)

Source: Internet | Editor: 程序博客网 | Time: 2024/06/08 12:14

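The previous posts covered two ways to log in to the library: with Python's requests library, and with urllib2, where a single POST request was enough to fetch the logged-in page. They also covered how to handle redirects in urllib2. Everything here is based on my own school's library system, so other sites will need their own adjustments.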

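This time we stop urllib2 from following the redirect and submit the cookie ourselves. The library login is again the example. In the earlier requests-based login, the browser issued only one POST, yet two requests actually took place: the POST, followed by a GET, because the server answered the POST with a redirect. Here we have to make both requests ourselves: first a POST, which gives us the cookie we need, and then a second request that carries that cookie and returns the logged-in page.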
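To see those two hidden requests without touching the real library server, here is a minimal Python 3 sketch (in Python 3, `urllib2` became `urllib.request` and `cookielib` became `http.cookiejar`). A throwaway local server, `FakeLibrary`, stands in for the site. It is hypothetical, as are its cookie value and form fields, and it only mimics the POST → 302 + `Set-Cookie` → GET pattern described above. With the default redirect handling plus a cookie jar, one `opener.open` call ends up on `/Default.aspx`, much like the browser does.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SEEN = []  # request lines observed by the fake server, in order


class FakeLibrary(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the library site: POST -> 302 + cookie, GET -> page."""

    def do_POST(self):
        SEEN.append("POST " + self.path)
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # consume the form
        self.send_response(302)
        self.send_header("Location", "/Default.aspx")
        self.send_header("Set-Cookie", "ASP.NET_SessionId=abc123; path=/; HttpOnly")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        SEEN.append("GET " + self.path)
        logged_in = "ASP.NET_SessionId=abc123" in self.headers.get("Cookie", "")
        body = b"welcome" if logged_in else b"please log in"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), FakeLibrary)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
form = urllib.parse.urlencode({"TextBox1": "user", "TextBox2": "secret"}).encode()
resp = opener.open(base + "/login.aspx", form)  # one call from our side...
final_url = resp.geturl()
page = resp.read()
resp.close()
server.shutdown()
server.server_close()

print(SEEN)       # ...but the server saw a POST and then a GET
print(final_url)
print(page)
```

urllib turns the redirected POST into a GET on its own, and the cookie captured from the 302 is replayed automatically. The rest of this post does the same two steps by hand.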

Here is the code for the first request, the POST:

```python
import urllib2
import cookielib

# Example: library login. The site needs two requests (302, then 200)
# before you reach your own page.
# Redirects are disabled so we can print the headers of the 302 response;
# the cookie they carry is what the redirect target expects.
url = "http://210.32.205.60/login.aspx"
header = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    # "Accept-Encoding" is left out: urllib2 does not decompress gzip bodies.
    "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
    "Connection": "keep-alive",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "210.32.205.60",
    "Referer": "http://210.32.205.60/login.aspx",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:54.0) Gecko/20100101 Firefox/54.0",
}
# Form body captured from the browser: ASP.NET hidden fields,
# plus TextBox1 (card number) and TextBox2 (password).
body = b'__VIEWSTATE=%2FwEPDwUILTE0Mjc2MjEPZBYCAgMPZBYCAg8PDxYCHgdWaXNpYmxlaGRkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYCBQxJbWFnZUJ1dHRvbjEFDEltYWdlQnV0dG9uMtIIHXEj%2BaKpKEuN3WBMS5An9yfKLqy76FI5Cs0ie1No&__VIEWSTATEGENERATOR=C2EE9ABB&__EVENTVALIDATION=%2FwEdAAbAeS%2BByzNg%2FXW9jIKItJSsl0eOxoEPS0IDqf0EHRx3vxEghZBVv0boc2NaC2%2FzVFQdp1z%2BnYWZ%2BpirZkxjR3dz6ZACrx5RZnllKSerU%2BIuKmLNV%2B2mZgnOAlNG5DVTg1uHvSo3x4u7p65TqmriJkDgirf2cB43UeZMqMyeVeS88Q%3D%3D&DropDownList1=0&TextBox1=2111512071&TextBox2=2111512071&ImageButton1.x=44&ImageButton1.y=12'


class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):
    """Alternative (not used below): pass 302 through, still raise on other errors."""
    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()
        # only add this line to stop 302 redirection.
        if code == 302:
            return response
        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response
    https_response = http_response


class NoRedirection(urllib2.HTTPErrorProcessor):
    """Return every response untouched, so a 302 is never followed."""
    def http_response(self, request, response):
        return response
    https_response = http_response


cj = cookielib.CookieJar()
# Passing a subclass of HTTPErrorProcessor replaces the default one.
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
request = urllib2.Request(url, body, header)  # actually send the headers above
response = opener.open(request)
if response.code == 302:
    redirection_target = response.headers['Location']
    print redirection_target
print response.geturl()
print response.info()
print response.headers   # headers, including the cookie we must send with the second (GET) request
print response.read()    # page returned by the POST
```
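The `NoRedirection` trick carries over to Python 3 almost unchanged: `urllib.request.HTTPErrorProcessor` still exists, and `build_opener` still drops a default handler when it is given a subclass of it. The sketch below runs it against a hypothetical local stand-in for `login.aspx` (the port, cookie value and form fields are made up) to show that the opener now hands back the 302 itself, with `Location` and `Set-Cookie` readable from its headers.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeLogin(BaseHTTPRequestHandler):
    """Hypothetical stand-in for login.aspx: answers every POST with 302 + cookie."""

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(302)
        self.send_header("Location", "/Default.aspx")
        self.send_header("Set-Cookie", "ASP.NET_SessionId=abc123; path=/; HttpOnly")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Same idea as the urllib2 version: return every response untouched."""

    def http_response(self, request, response):
        return response

    https_response = http_response


server = HTTPServer(("127.0.0.1", 0), FakeLogin)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/login.aspx" % server.server_port

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(NoRedirection, urllib.request.HTTPCookieProcessor(jar))
resp = opener.open(url, b"TextBox1=user&TextBox2=secret")  # bytes data => POST
status = resp.getcode()
location = resp.headers["Location"]
set_cookies = resp.headers.get_all("Set-Cookie")
resp.close()
server.shutdown()
server.server_close()

print(status, location)
print(set_cookies)
print([c.name for c in jar])
```

Because `HTTPCookieProcessor` is still in the handler chain, the cookie also lands in `jar` even though the redirect is never followed.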
The second request is a GET that carries the cookie:

```python
aspid = response.headers["Set-Cookie"]  # the cookie is here
url = "http://210.32.205.60/Default.aspx"
header = {
    "Accept": "image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, */*",
    "Referer": "http://210.32.205.60/login.aspx",
    "Accept-Language": "zh-CN",
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)",
    # "Accept-Encoding" is left out so r.read() returns plain HTML, not gzip bytes.
    "Host": "210.32.205.60",
    "Connection": "Keep-Alive",
    "Pragma": "no-cache",
    "Cookie": aspid,
}
request = urllib2.Request(url=url, headers=header)
r = urllib2.urlopen(request)
print r.read()  # the logged-in page
```
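`response.headers["Set-Cookie"]` returns the raw header text, attributes such as `path=/` and `HttpOnly` included, and those attributes are not part of a `Cookie` request header. A small Python 3 helper built on the standard `http.cookies.SimpleCookie` can keep only the `name=value` pairs. The function name `cookie_header` and the second cookie in the example are hypothetical.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values):
    """Turn raw Set-Cookie header values into a Cookie request header value.

    Only name=value pairs are kept; attributes such as path and HttpOnly
    describe the cookie and are not sent back to the server.
    """
    parsed = SimpleCookie()
    for value in set_cookie_values:
        parsed.load(value)
    return "; ".join("%s=%s" % (name, morsel.value) for name, morsel in parsed.items())


raw = [
    "ASP.NET_SessionId=abc123; path=/; HttpOnly",
    "theme=dark; path=/",  # hypothetical second cookie
]
print(cookie_header(raw))
```

In the original urllib2 code there is also a simpler route: calling `opener.open("http://210.32.205.60/Default.aspx")` reuses the opener that holds `cj`, so `HTTPCookieProcessor` should attach the captured cookie without any hand-built `Cookie` header.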