Python 3 Crawler Mini-Program: Scraping All Kinds of Weather Information (4)

Source: Internet | Published by: 网页游戏源码网 | Editor: 程序博客网 | Time: 2024/05/22 05:31

[Scraping data from dynamic pages]

Last time I covered scraping data from dynamic pages with a tool, but that approach felt fairly difficult and ran painfully slowly.

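So I looked around online, and many people said a dynamic page can be scraped by fetching the JSON data the site loads. I then went to the authority on meteorological data, the website of the National Meteorological Center (中央气象台, http://www.nmc.cn/), and started scraping there.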

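So how do you find this JSON data? With the packet-capture tool Fiddler running in the background, I opened the Beijing weather page (http://www.nmc.cn/publish/forecast/ABJ/beijing.html). The captured traffic is shown in the figure below: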

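It turned out there were quite a few JSON responses. I happened to open the second one, the request to /f/rest/real/54511?_=149..., and that was where the answer was.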

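All the data is right there: the first item is the data update time, the second is the city information, the third is the warning information (every field is 9999 when there is no warning), the fourth is the weather information, and the fifth is the wind information.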
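To make the shape of that response concrete, here is a minimal sketch that parses a sample payload offline. The key names are the ones used by the code later in this post; every value in SAMPLE is invented for illustration, and the warning block is left out because the post's code never reads it. The helper name summarize is hypothetical.

```python
import json

# Hypothetical sample payload: the key names match the ones this post's code
# reads, but every value below is invented for illustration only.
SAMPLE = """
{
  "publish_time": "2017-01-01 12:00",
  "station": {"city": "Beijing"},
  "weather": {"info": "Sunny", "temperature": 3.5, "humidity": 24.0},
  "wind": {"direct": "North", "speed": 2.1}
}
"""


def summarize(payload):
    """Flatten the nested JSON into the six values the post records."""
    data = json.loads(payload)
    return [
        data["publish_time"],                          # data update time
        data["weather"]["info"],                       # weather description
        data["wind"]["direct"],                        # wind direction
        str(data["wind"]["speed"]) + "m/s",            # wind speed
        str(data["weather"]["temperature"]) + "℃",     # temperature
        str(int(data["weather"]["humidity"])) + "%",   # relative humidity
    ]


print(summarize(SAMPLE))
```

Working against a saved sample like this makes it possible to check the field extraction without hitting the site on every run.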

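Experimenting with this URL showed that the part that changes is the 54511 in /f/rest/real/54511: the number changes with the city, and whatever follows the question mark does not seem to matter much. Pasting the full URL (http://www.nmc.cn/f/rest/real/54511) into the browser also returns the data. So that settles the URL; to get another city, just change the trailing number.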
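The URL pattern described here can be sketched as two small helpers. The names build_url and station_id_from are hypothetical, and the query value in the usage example is made up; the only parts taken from the post are the base path and the Beijing code 54511.

```python
from urllib.parse import urlsplit

# Base path observed in the captured traffic described in this post.
BASE_URL = "http://www.nmc.cn/f/rest/real/"


def build_url(station_id):
    """Build the request URL for one city by appending its numeric code."""
    return BASE_URL + str(station_id)


def station_id_from(url):
    """Recover the numeric code from a captured URL, ignoring any ?_=... suffix."""
    return urlsplit(url).path.rsplit("/", 1)[-1]


print(build_url("54511"))
# The query value below is a made-up placeholder, not the one from the capture.
print(station_id_from("http://www.nmc.cn/f/rest/real/54511?_=123"))
```

Dropping the query string relies on the observation above that the part after the question mark does not affect the response; if the site's behavior changes, that assumption would need rechecking.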

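The code is similar to the earlier installments: from the JSON it extracts the data update time, weather, wind direction, wind speed, temperature, and humidity, then writes them to a text file (a database would work too). Don't forget exception handling.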
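For the database option mentioned in passing, one possible sketch uses the standard-library sqlite3 module. The table name, column names, and the save_reading helper are all assumptions made for this example, and the reading values are invented; the post itself only writes to a text file.

```python
import sqlite3


def save_reading(conn, city, reading):
    """Store one six-value reading (same order as the post's list) for a city."""
    # Hypothetical schema: one text column per value the post records.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather ("
        "city TEXT, publish_time TEXT, info TEXT, wind_direct TEXT, "
        "wind_speed TEXT, temperature TEXT, humidity TEXT)"
    )
    # Parameter placeholders keep the values out of the SQL text itself.
    conn.execute(
        "INSERT INTO weather VALUES (?, ?, ?, ?, ?, ?, ?)",
        [city, *reading],
    )
    conn.commit()


# An in-memory database keeps the example self-contained; a file path
# would be used for real data.
conn = sqlite3.connect(":memory:")
save_reading(conn, "beijing",
             ["2017-01-01 12:00", "Sunny", "North", "2.1m/s", "3.5℃", "24%"])
rows = conn.execute("SELECT city, temperature FROM weather").fetchall()
print(rows)
```

Storing the numeric values without their unit suffixes would make later queries easier, but the sketch keeps the strings exactly as the post builds them.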

Here is the code:

# coding=utf-8
import json
import time
import traceback
import urllib.error
import urllib.request

# Pose as a browser
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gbk,utf-8,gb2312",
    "Accept-Language": "zh-CN,zh;q=0.8",
    "User-Agent": "Mozilla/5.0(Windows NT 10.0; WOW64) AppleWebKit/537.36(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36",
    "Connection": "keep-alive",
}
opener = urllib.request.build_opener()
opener.addheaders = list(headers.items())
# Install the opener globally
urllib.request.install_opener(opener)

# Station codes for Beijing, Tianjin, Shijiazhuang, Taiyuan, Jinan, Shenyang, Hohhot, Zhengzhou
city_id = {
    'beijing': '54511',
    'tianjin': '54517',
    'shijiazhuang': '53698',
    'taiyuan': '53772',
    'jinan': '54823',
    'shenyang': '54342',
    'huhehaote': '53463',
    'zhengzhou': '57083',
}


def getcityid(city):
    # Any name not in the table falls back to Zhengzhou
    return city_id.get(city, city_id['zhengzhou'])


def getweather(city):
    try:
        url = "http://www.nmc.cn/f/rest/real/" + getcityid(city)
        stdout = urllib.request.urlopen(url)
        weatherInfo = stdout.read().decode('utf-8')
        jsonData = json.loads(weatherInfo)
        weatherlist = []
        # Read the JSON data and append each value to the list
        szDate = jsonData["publish_time"]
        weatherlist.append(szDate)
        szCity = jsonData["station"]["city"]
        print("City: " + str(szCity))
        szWeather = jsonData["weather"]["info"]
        weatherlist.append(szWeather)
        szdirect = jsonData["wind"]["direct"]
        weatherlist.append(szdirect)
        szspeed = str(jsonData["wind"]["speed"]) + "m/s"
        weatherlist.append(szspeed)
        szTemp = str(jsonData["weather"]["temperature"]) + "℃"
        weatherlist.append(szTemp)
        szhumidity = str(int(jsonData["weather"]["humidity"])) + "%"
        weatherlist.append(szhumidity)
        print("Update time, weather, wind direction, wind speed, temperature, relative humidity:")
        print(weatherlist)
        writefiles_weather(city, weatherlist)
    except urllib.error.URLError as e:
        print("URLError while fetching weather data! Retrying in one minute...")
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
        time.sleep(60)
        # On failure, wait a while and run this part again
        getweather(city)
    except Exception as e:
        print("Exception while fetching weather data! Retrying in ten seconds...")
        print("Exception: " + str(e))
        traceback.print_exc()  # shows the line where the error occurred
        time.sleep(10)
        # On failure, wait a while and run this part again
        getweather(city)


def writefiles_weather(filename, weatherlist):
    try:
        # Append one line per reading: update time, weather, wind direction,
        # wind speed (m/s), temperature (℃), relative humidity (%).
        # Raw string so the backslashes in the Windows path are not treated as escapes.
        with open(r"D:\mydata\data_weather\data_weather_" + filename + ".txt", "a", errors="ignore") as f:
            for weather in weatherlist:
                f.write(str(weather))
                f.write(",")
            f.write("\n")
        print("This weather record has been appended to the file!")
    except Exception as e:
        print("Exception in the function that writes weather data to file! Skipping this part...")
        print("Exception: " + str(e))
        traceback.print_exc()  # shows the line where the error occurred


if __name__ == '__main__':
    while True:
        print("==========Starting work==========")
        for city in ["beijing", "tianjin", "shijiazhuang", "taiyuan",
                     "jinan", "shenyang", "huhehaote", "zhengzhou"]:
            getweather(city)
        # Rest for one hour
        print("[Resting...]")
        time.sleep(3600)
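The listing above retries by having getweather call itself, so a site that stays down would keep nesting calls indefinitely. One possible alternative, shown here as a sketch and not as the author's method, is a loop with a fixed number of attempts. The retry helper, its parameters, and the flaky demo function are all hypothetical; the sleep function is injectable so the example runs instantly.

```python
import time


def retry(func, attempts=3, delay=60, sleep=time.sleep):
    """Call func() up to `attempts` times (assumed >= 1), pausing between failures."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            print("attempt %d failed: %s" % (attempt, exc))
            if attempt < attempts:
                sleep(delay)  # wait before the next try
    # Every attempt failed: surface the last error to the caller.
    raise last_error


# Demo with a stand-in function that fails twice and then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise OSError("temporary failure")
    return "ok"


# A no-op sleep replaces the real delay for the demo.
result = retry(flaky, attempts=5, delay=60, sleep=lambda seconds: None)
print(result, len(calls))
```

With something like this, the fetching code could raise on failure and leave the waiting policy to the caller, so the hourly loop moves on to the next city once the attempts are used up.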

Output when run in PyCharm:

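[Summary] For someone as impatient as I am, fetching data this way finally gives me results right away, with no more waiting on pages one by one (which was also prone to exceptions)!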
