[Python] Web Scraping Overview (3): Scraping Through Web Forms and Login Windows

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 06:41

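  • Submitting forms with the Python Requests library
  • Submitting files and images
  • Handling logins, cookies, sessions, and HTTP basic authentication
    • Cookie
    • Session
    • HTTP basic access authentication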

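Submitting forms with the Python Requests library

import requests

# Keys must match the "name" attributes of the form's input fields
params = {"firstname": "Liu", "lastname": "Vi"}
r = requests.post("http://pythonscraping.com/files/processing.php", data=params)
print(r.text)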

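If you do not know the field names, their values, or the URL the form submits to, you can find them in the page source or in the Network tab of the browser's developer console.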
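
Reading the page source by hand can also be done in code. The following is a minimal sketch, using only the standard library, that lists each form's action, method, and named fields. The SAMPLE_HTML markup, the FormFieldParser class, and the find_forms helper are hypothetical names made up for illustration; the real page's markup may differ.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action, method, and named fields of every form in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # Start a new record; HTML forms default to GET when no method is given
            self.forms.append({
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
        elif tag in ("input", "select", "textarea") and self.forms:
            name = attrs.get("name")
            if name:  # inputs without a name are not submitted with the form
                self.forms[-1]["fields"][name] = attrs.get("value") or ""


# Hypothetical markup for illustration only (not copied from the real page)
SAMPLE_HTML = """
<form method="post" action="processing.php">
  First name: <input type="text" name="firstname">
  Last name: <input type="text" name="lastname">
  <input type="submit" value="Submit" id="submit">
</form>
"""


def find_forms(html):
    """Return a list of dicts describing the forms found in the HTML text."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.forms


forms = find_forms(SAMPLE_HTML)
print(forms)
```

The keys of "fields" are the names to use in the params dictionary, and "action" is the path to post to, resolved relative to the page URL.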

Submitting files and images

import requests

with open("1.jpg", "rb") as f:  # binary mode; the file is closed after the upload
    r = requests.post("http://pythonscraping.com/files/processing2.php", files={"uploadFile": f})
print(r.text)

Handling logins, cookies, sessions, and HTTP basic authentication

Cookie

import requests

params = {"username": "vi", "password": "password"}
# Log in; the server sets cookies on the response
r = requests.post("http://pythonscraping.com/pages/cookies/welcome.php", data=params)
print("Cookie is set to: ")
print(r.cookies.get_dict())
print("--------------------")
print("Going to profile page...")
# Send the login cookies back explicitly with the next request
r = requests.get("http://pythonscraping.com/pages/cookies/profile.php", cookies=r.cookies)
print(r.text)
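
To make the cookie round trip less opaque, here is a rough sketch of what happens underneath: the server sends Set-Cookie headers, and the client returns the name=value pairs in a Cookie header on later requests. It uses only the standard library. The header values and the two helper functions are hypothetical, chosen for illustration; the cookies the real site sets may differ.

```python
from http.cookies import SimpleCookie


def parse_set_cookie_headers(header_values):
    """Turn a list of Set-Cookie header values into a {name: value} dict."""
    jar = {}
    for value in header_values:
        cookie = SimpleCookie()
        cookie.load(value)
        for name, morsel in cookie.items():
            # Attributes such as Path are stored on the morsel, not as cookies
            jar[name] = morsel.value
    return jar


def build_cookie_header(cookies):
    """Build the Cookie request header value sent back to the server."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Hypothetical Set-Cookie values, for illustration only
received = parse_set_cookie_headers(["loggedin=1; Path=/", "username=vi; Path=/"])
print(received)
print(build_cookie_header(received))
```

Requests does this bookkeeping itself: r.cookies holds the parsed cookies, and passing cookies= builds the Cookie header.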

Session

import requests

# A Session stores cookies and sends them automatically on later requests
session = requests.Session()
params = {"username": "vi", "password": "password"}
s = session.post("http://pythonscraping.com/pages/cookies/welcome.php", data=params)
print("Cookie is set to: ")
print(s.cookies.get_dict())
print("-------------------")
print("Going to profile page...")
# No cookies argument is needed here; the session supplies them
s = session.get("http://pythonscraping.com/pages/cookies/profile.php")
print(s.text)
print(session.headers)
print("---------------")
print(session.cookies)

HTTP basic access authentication

import requests
from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth("vi", "password")
r = requests.post(url="http://pythonscraping.com/pages/auth/login.php", auth=auth)
print(r.text)
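
For reference, HTTP basic authentication sends an Authorization header containing the Base64 encoding of "username:password". The sketch below builds that header by hand with the standard library. The basic_auth_header helper is a hypothetical name used for illustration, not part of Requests.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header used by HTTP basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


header = basic_auth_header("vi", "password")
print(header)
```

Base64 is an encoding, not encryption, so anyone who can read the request can recover the password. Basic authentication should therefore only be used over HTTPS.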

Reference book:
《Python网络数据采集》 (Web Scraping with Python)
