Problem log: urllib.error.HTTPError: HTTP Error 404: Not Found when testing a mapper written in Python

Source: Internet  Editor: 程序博客网  Time: 2024/05/22 15:53
hadoop@ub1401:~/python/pythonfile$ cat keyword.txt
sheep2
dog,3
firework 3
hadoop@ub1401:~/python/pythonfile$ cat keyword.txt | ./mappertest1-1.py
Traceback (most recent call last):
  File "./mappertest1-1.py", line 58, in <module>
    response = urllib.request.urlopen('https://www.bing.com/images/asvnc?q=' + urllib.parse.quote_plus(keyword) + '&async=content&first=' + str(current) + '&adlt=' + adlt)
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 501, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 684, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 507, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

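The script reads keywords from a text file, then searches for and downloads images for each keyword. When the keyword is assigned directly as a string, the search and download work.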

A mapper, however, should read its input line by line from sys.stdin, like this:

for line in sys.stdin:

    *************** (actual implementation)

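Written this way, the error above appears instead: opening the URL fails. The image download uses multiple threads, and for now I do not know where the problem lies.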
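One difference worth checking when moving from a hard-coded string to stdin: every line read from sys.stdin still carries its trailing newline, which urllib.parse.quote_plus would encode as %0A in the query. This is not necessarily the cause of the 404 here. The sketch below only shows one way to clean the input first; the helper name iter_keywords is hypothetical, and the sample text assumes keyword.txt holds one entry per line.

```python
import io


def iter_keywords(stream):
    """Yield one cleaned keyword per non-empty line of a text stream."""
    for line in stream:
        keyword = line.strip()  # drop the trailing newline and surrounding spaces
        if keyword:             # skip blank lines
            yield keyword


# Stand-in for sys.stdin, so the sketch runs without piped input.
sample = io.StringIO("sheep2\ndog,3\n\nfirework 3\n")
keywords = list(iter_keywords(sample))
print(keywords)  # ['sheep2', 'dog,3', 'firework 3']
```

In the real mapper the same helper would be called as iter_keywords(sys.stdin).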


It turned out to be a problem with the URL!!!

The original call was:

response = urllib.request.urlopen('https://www.bing.com/images/asvnc?q=' + urllib.parse.quote_plus(keyword) + '&async=content&first=' + str(current) + '&adlt=' + adlt)

After the change:

response = urllib.request.urlopen('https://cn.bing.com/images/async?q=' + urllib.parse.quote_plus(keyword) + '&async=content&first=' + str(current) + '&adlt=' + adlt)
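Since the fix came down to the exact URL string, it may help to build the URL in one place where it can be printed and compared. The following is a minimal sketch, not the original script: build_search_url is a hypothetical helper, the host and path defaults are copied from the corrected call above, and the adlt value "off" is only a placeholder because the real value is not shown.

```python
import urllib.parse


def build_search_url(keyword, first, adlt,
                     host="cn.bing.com", path="/images/async"):
    """Assemble the search URL; urlencode applies quote_plus to each value."""
    query = urllib.parse.urlencode({
        "q": keyword,
        "async": "content",
        "first": first,
        "adlt": adlt,
    })
    return "https://" + host + path + "?" + query


# "off" is an assumed placeholder for adlt.
url = build_search_url("firework 3", 0, "off")
print(url)
# https://cn.bing.com/images/async?q=firework+3&async=content&first=0&adlt=off
```

Printing the assembled URL before calling urllib.request.urlopen makes a wrong host or a misspelled path segment easier to spot.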

Preliminary explanation:

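When accessed directly, www.bing.com reports a connection error and is then forcibly redirected to cn.bing.com; I do not know why. The traceback is consistent with this: the request passes through http_error_302 (a redirect) before the 404 is raised. Note that the corrected call also differs in its path (images/async instead of images/asvnc), so the host may not be the only thing that changed.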

Visiting Bing's home page in a browser also goes straight to cn.bing.com. Clicking "switch to english" leads to http://global.bing.com/?FORM=HPCNEN&setmkt=en-us&setlang=en-us rather than www.bing.com.
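When a redirect is involved, it helps to see which URL actually produced the error. A minimal sketch, assuming the standard urllib.error.HTTPError attributes code, reason and url; the helper names describe_http_error and fetch are hypothetical. After a redirect, the reported URL should be that of the request that finally failed, not necessarily the one passed to urlopen.

```python
import sys
import urllib.error
import urllib.request


def describe_http_error(err):
    """Format the status code, reason and failing URL of an HTTPError."""
    return "HTTP {} {} for {}".format(err.code, err.reason, err.url)


def fetch(url, timeout=10):
    """Return the response body, or None if the server answered with an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # Report on stderr so the mapper's stdout stays clean for its output.
        print(describe_http_error(err), file=sys.stderr)
        return None
```

Writing the diagnostic to stderr rather than stdout matters in a streaming mapper, because everything on stdout is treated as mapper output.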
