Python crawler always gets a 404 error when accessing a web page

Source: Internet · Editor: 程序博客网 · Time: 2024/05/16 15:34


> As the title says: a likely cause is that the request carries no custom headers, so the site treats it as an illegal attack.

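You can debug the problem with the following code, which prints the status, headers, and body of the error response: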


    # Python 2 (urllib2, print statements)
    import urllib2
    import jsonpath
    import time

    # get city id
    request_city = urllib2.Request("http://www.xj.10086.cn/support/bussinesshall/")
    try:
        response = urllib2.urlopen(request_city, timeout=1000)
        print response.info()
        print response.read()
    except urllib2.HTTPError, e:
        # Dump everything the server sent back with the error
        print e.getcode()
        print e.reason
        print e.geturl()
        print "-------------------------"
        print e.info()
        print e.read()

Output:

    /usr/bin/python2.7 /home/jason/code/2018-8-29_meizu12/meizu_sell/main_spider.py
    404
    Not Found
    http://www.xj.10086.cn/support/bussinesshall/
    -------------------------
    Server: nginx/1.8.1
    Date: Wed, 30 Aug 2017 17:25:06 GMT
    Content-Type: text/html
    Content-Length: 1801
    Connection: close
    Powered-By-ChinaCache: MISS from 39100013g1
    Powered-By-ChinaCache: MISS from 39100013fA

    <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
    <html xmlns='http://www.w3.org/1999/xhtml'>
    <head>
    <meta http-equiv='Content-Type' content='text/html; charset=utf-8' />
    <title>提示</title>
    <style type='text/css'>
    * {margin:0px;padding:0px;cursor:default;font-size:12px;font-family:Tahoma;}
    html, body, .page {width:100%;height:100%;}
    .page {position:relative;min-width:500px;min-height:270px;display:table;overflow:hidden;}
    .container {*position:absolute;width:100%;top:50%; left:0px;display:table-cell;vertical-align:middle;}
    .main {position:relative;top:-50%;margin:0px auto;width:500px;height:270px;}
    .infobox {position:relative;width:100%;height:100%;}
    .infobox-shadow {position:absolute;top:8px;left:8px;z-index:1;width:100%;height:100%;background:#000;filter:Alpha(opacity=20);opacity:0.2;}
    .infobox-texts {position:absolute;top:0;left:0;z-index:1;width:100%;height:100%;background:#FFF;border:1px #444 solid;}
    .it-title {width:97%;height:32px;line-height:32px;margin:0px auto;font-size:17px;color:#000;font-weight:bold;border-bottom:1px #444 solid;overflow:hidden;}
    .it-memo {width:97%;height:250px;overflow:auto;line-height:25px;margin:10px auto;color:#444;}
    .it-memo p {font-size:15px;}
    </style>
    </head>
    <body>
    <div class='page'><div class='container'><div class='main'><div class='infobox'>
    <div class='infobox-shadow'></div>
    <div class='infobox-texts'>
    <div class='it-title'>请不要使用非法的URL地址访问</div>
    <div class='it-memo'>
    <p style='text-indent:12px;'>最可能的原因是:</p>
    <p style='text-indent:24px;'>• 您正在试图非法攻击。</p>
    <p style='text-indent:24px;'>• 您访问的URL地址不被允许。</p>
    <p style='text-indent:12px;margin-top:15px;'></p>
    </div>
    </div>
    </div></div></div></div>
    </body>
    </html>

    Process finished with exit code 0

The returned page is titled 提示 ("Notice"). Its message, 请不要使用非法的URL地址访问, means "Please do not access the site with an illegal URL", and the listed likely causes are "You are attempting an illegal attack" and "The URL you are accessing is not allowed".

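So the 404 does not mean the page is missing: the site is deliberately blocking the request. Adding headers to the request fixes it.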
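As a minimal sketch of that fix, the snippet below attaches browser-like headers to the request before sending it. It makes several assumptions that are not in the original post: it uses Python 3's `urllib.request` rather than the `urllib2` module shown above, the helper names `build_request` and `fetch` are hypothetical, and the header values in `DEFAULT_HEADERS` are only illustrative. Which headers this particular site actually checks has not been verified.

```python
import urllib.error
import urllib.request

# Illustrative browser-like headers (assumption: the exact values a given
# site accepts are unknown and may need adjusting).
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def build_request(url, headers=None):
    """Return a Request carrying DEFAULT_HEADERS, overridden by `headers`."""
    merged = dict(DEFAULT_HEADERS)
    if headers:
        merged.update(headers)
    return urllib.request.Request(url, headers=merged)


def fetch(url, timeout=10):
    """Send the request and return (status_code, body_bytes).

    HTTP error responses (such as 404) are returned rather than raised, so
    the body can still be inspected. Network-level failures
    (urllib.error.URLError) are not caught here.
    """
    request = build_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as e:
        return e.code, e.read()
```

Calling `fetch("http://www.xj.10086.cn/support/bussinesshall/")` would send the same request as the debugging code above, but with the extra headers attached. If the site still answers with the block page, comparing against the headers a real browser sends (visible in its developer tools) is a reasonable next step.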
