The pyspider crawler framework cannot return a datetime object

Source: Internet  Editor: 程序博客网  Time: 2024/05/21 16:23
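For the past few days I have been working on scraping news sites with the pyspider framework, but for various reasons progress has been slow. The problem described below puzzled me for a long time. Here is the code that triggers it:

    import datetime

    raw_time = "YYYY年mm月dd日 HH:MM"  # placeholder for the scraped time string
    FORMAT = '%Y年%m月%d日 %H:%M'
    raw_time = datetime.datetime.strptime(raw_time, FORMAT)
    ...
    return {
        "content": "<p>" + content + "</p>",
        "title": title,
        "url": response.url,
        "time": raw_time,  # returns a datetime object
        "source": source,
    }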

At run time, however, the following error appears:

    [E 170517 15:01:59 result_worker:63] Object of type 'datetime' is not JSON serializable
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/pyspider/result/result_worker.py", line 54, in run
        self.on_result(task, result)
      File "/usr/local/lib/python3.6/site-packages/pyspider/result/result_worker.py", line 38, in on_result
        result=result
      File "/usr/local/lib/python3.6/site-packages/pyspider/database/sqlite/resultdb.py", line 58, in save
        return self._replace(tablename, **self._stringify(obj))
      File "/usr/local/lib/python3.6/site-packages/pyspider/database/sqlite/resultdb.py", line 44, in _stringify
        data['result'] = json.dumps(data['result'])
      File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 231, in dumps
        return _default_encoder.encode(obj)
      File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 199, in encode
        chunks = self.iterencode(o, _one_shot=True)
      File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 257, in iterencode
        return _iterencode(o, 0)
      File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 180, in default
        o.__class__.__name__)
    TypeError: Object of type 'datetime' is not JSON serializable

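In fact, similar code runs fine in my own environment outside pyspider, so the parsing code itself is clearly not the problem. The first frame of the traceback is at: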
      File "/usr/local/lib/python3.6/site-packages/pyspider/result/result_worker.py", line 54, in run
        self.on_result(task, result)
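result_worker.py is a file inside the framework itself. The later frames show where it fails: resultdb.py passes the returned result to json.dumps, and the standard json encoder cannot serialize a datetime object. So the returned dict has to contain only JSON-serializable values (strings, numbers, lists, dicts and the like), not arbitrary Python objects. The fix I tried was to convert the datetime object to a string before returning it: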
    raw_time = raw_time.strftime('%Y-%m-%d %H:%M')
Running it again produces no errors. ^v^
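The same failure can be reproduced without pyspider, because the error is raised by the standard json module. The sketch below is only an illustration under a few assumptions: the sample date string and the result dict are made up, and to_json_safe is a hypothetical helper of my own, not part of pyspider. It converts every datetime in a result to a string so that json.dumps accepts it.

```python
import datetime
import json

FORMAT = '%Y年%m月%d日 %H:%M'


def to_json_safe(value):
    """Hypothetical helper: recursively replace datetime objects with strings."""
    if isinstance(value, datetime.datetime):
        return value.strftime('%Y-%m-%d %H:%M')
    if isinstance(value, dict):
        return {key: to_json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(item) for item in value]
    return value


# Assumed sample input; a real crawler would take this from the page.
raw_time = datetime.datetime.strptime('2017年05月17日 15:01', FORMAT)
result = {"title": "example", "time": raw_time}

# Serializing the raw result fails in the same way as in the traceback.
try:
    json.dumps(result)
    error = None
except TypeError as exc:
    error = exc

# After conversion the result serializes cleanly.
safe = to_json_safe(result)
encoded = json.dumps(safe)
```

Calling such a helper on the dict just before the return statement avoids having to remember strftime for each field, at the cost of losing the datetime type on the consumer side.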