A custom geolocation lookup interface for the Censys database (Python version)

Source: Internet. Editor: 程序博客网. Time: 2024/06/10 15:31

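Our company's internal IP information database does not have enough coverage,
so IPs often cannot be found during log processing.
Someone recommended Censys as fairly authoritative,
but I could not find documentation and the interface is not very easy to use, so I wrote my own IP lookup interface.
First,
I looked around the official site. What is particular about Censys is that you have to register before you can use the API.
After registering you get a Secret and an API_ID, both of which are needed for queries.
I searched Baidu and found no useful tutorials, then read the official introduction and the source code,
and ended up with this first version: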

import censys.ipv4

Secret = ""    # API secret from your Censys account
API_ID = ""    # API ID from your Censys account

# ip is the address to look up, e.g. "8.8.8.8"
api = censys.ipv4.CensysIPv4(api_id=API_ID, api_secret=Secret)
res = api.view(ip)
geo = res['location']

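Later I found that this is not a dedicated geolocation database: for many IPs, view() returns no address at all.
The web page, however, does show geolocation for them, so after some thought I decided to request the page URL directly: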

import requests

url = "https://www.censys.io/ipv4/%s" % ip
res = requests.get(url, auth=(API_ID, Secret))   # HTTP Basic auth
s = res.content

urllib2 works too:

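import base64
import urllib2

# send the credentials as an HTTP Basic auth header,
# the same thing requests does with auth=(API_ID, Secret)
token = base64.b64encode("%s:%s" % (API_ID, Secret))
req = urllib2.Request(url)
req.add_header("Authorization", "Basic %s" % token)
response = urllib2.urlopen(req)
s = response.read()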
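
urllib2 only exists on Python 2. For anyone on Python 3, here is a minimal sketch of building the same kind of request with urllib.request. The helper name build_request is made up for illustration, and the function only builds the request without sending it, so the credentials can be checked offline.

```python
import base64
import urllib.request


def build_request(ip, api_id, secret):
    """Build (but do not send) a GET request with HTTP Basic credentials.

    build_request is a hypothetical helper name; the URL pattern is the one
    used in this post.
    """
    url = "https://www.censys.io/ipv4/%s" % ip
    raw = ("%s:%s" % (api_id, secret)).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Basic " + token)
    return req
```

Passing the returned object to urllib.request.urlopen(req) would then send it.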

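The two are much the same; I used the first.
Next comes parsing the HTML.
I looked around online and found a great tool, bs4 (BeautifulSoup).
After playing with it for a bit I found it very convenient, so here is the code: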

from bs4 import BeautifulSoup

soup = BeautifulSoup(s, "html5lib")

That parses the HTML. Combined with the structure of the page Censys returns, the geolocation can be pulled out:

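# the geolocation box is a <dl> with these two CSS classes
b = soup.find_all("dl", "dl-horizontal dl-hostbox")
if len(b) == 0:
    print("not found")
else:
    # each <dd> holds one value: city, province, country, lat/long, timezone
    geo = b[0].find_all('dd')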

Next, parse out each of the fields we need:

lat_long = geo[3].string.split(',')
# split on the last space only, so a multi-word country name stays whole
country = geo[2].string.rsplit(' ', 1)
json_data = {
    "ip":           ip,
    "latitude":     float(lat_long[0]),
    "country":      str(country[0]),
    "country_code": str(country[1][1:-1]),   # strip the surrounding parentheses
    "longitude":    float(lat_long[1]),
    "province":     str(geo[1].string),
    "city":         str(geo[0].string)
}
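
Country names can contain spaces, so the country cell needs a bit of care when it is split. Below is a small sketch of the two string conversions as standalone functions. The names parse_country and parse_lat_long are hypothetical, and the cell formats ("Name (CC)" and "lat, long") are assumed from the parsing code in this post rather than from any Censys documentation.

```python
def parse_country(text):
    """Split a cell like 'United States (US)' into (name, code).

    Assumes the code sits in parentheses at the end of the cell.
    Returns (text, 'unknown') when that pattern is not present.
    """
    text = text.strip()
    name, sep, code = text.rpartition(" (")
    if not sep or not code.endswith(")"):
        return text, "unknown"
    return name, code[:-1]


def parse_lat_long(text):
    """Turn a cell like '37.751, -97.822' into a (lat, long) float pair."""
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon
```

Keeping these as pure functions means they can be tried on plain strings without hitting the site at all.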

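That makes it usable.
But...
for some IPs the geolocation turns out to be incomplete, so I adjusted the parsing to match what actually comes back.
The idea is simple: if all five items are present in geo, take them directly; if some are missing, check which labels are there and fill in only those, using "unknow" for anything absent.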

try:
    # fast path: all five fields are present, in the usual order
    city = str(geo[0].string)
    province = str(geo[1].string)
    country = geo[2].string.rsplit(' ', 1)
    lat_long = geo[3].string.split(',')
    timezone = str(geo[4].string)
except Exception:
    print("did not get enough info at ip %s" % ip)
    # the <dt> labels say which fields are actually there
    labels = [dt.string for dt in b[0].find_all('dt')]
    city = "unknow"
    province = "unknow"
    country = ["unknow", "(unknow)"]   # parentheses are stripped below
    lat_long = [0.0, 0.0]
    # <dt> labels and <dd> values come in matching order
    for num, label in enumerate(labels):
        if label == "City":
            city = str(geo[num].string)
        elif label == "Province":
            province = str(geo[num].string)
        elif label == "Country":
            country = geo[num].string.rsplit(' ', 1)
        elif label == "Lat/Long":
            lat_long = geo[num].string.split(',')
json_data = {
    "ip":           ip,
    "latitude":     float(lat_long[0]),
    "country":      str(country[0]),
    "country_code": str(country[1][1:-1]),
    "longitude":    float(lat_long[1]),
    "province":     province,
    "city":         city
}
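
The same "fill in what is there, default the rest" idea can also be written by pairing each label with its value. This is only a sketch of an alternative, not what the code in this post does: pair_fields and FIELDS are hypothetical names, and it takes plain lists of strings so it can be tried without bs4.

```python
# the five labels the geolocation box is expected to contain
FIELDS = ("City", "Province", "Country", "Lat/Long", "Timezone")


def pair_fields(labels, values, default="unknow"):
    """Map each expected label to its value, or to default when missing.

    labels and values are the <dt> and <dd> texts in page order.
    """
    found = dict(zip(labels, values))
    return {name: found.get(name, default) for name in FIELDS}
```

With bs4, the two lists could come from something like [dt.get_text(strip=True) for dt in b[0].find_all('dt')] and the matching expression for 'dd'.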

That gets the complete geolocation parsed out.
Here is the complete API:

import censys.ipv4
import requests
from bs4 import BeautifulSoup


class censys_ip():
    debug = False
    Secret = ""
    API_ID = ""

    def __init__(self):
        self.api = censys.ipv4.CensysIPv4(api_id=self.API_ID, api_secret=self.Secret)

    def censys_html_search(self, ip):
        # fallback: scrape the geolocation box from the web page
        url = "https://www.censys.io/ipv4/%s" % ip
        res = requests.get(url, auth=(self.API_ID, self.Secret))
        soup = BeautifulSoup(res.content, "html5lib")
        b = soup.find_all("dl", "dl-horizontal dl-hostbox")
        if len(b) == 0:
            return {}
        geo = b[0].find_all('dd')
        try:
            # fast path: all five fields are present, in the usual order
            city = str(geo[0].string)
            province = str(geo[1].string)
            country = geo[2].string.rsplit(' ', 1)
            lat_long = geo[3].string.split(',')
            timezone = str(geo[4].string)
        except Exception:
            print("did not get enough info at ip %s" % ip)
            labels = [dt.string for dt in b[0].find_all('dt')]
            city = "unknow"
            province = "unknow"
            country = ["unknow", "(unknow)"]   # parentheses are stripped below
            lat_long = [0.0, 0.0]
            # <dt> labels and <dd> values come in matching order
            for num, label in enumerate(labels):
                if label == "City":
                    city = str(geo[num].string)
                elif label == "Province":
                    province = str(geo[num].string)
                elif label == "Country":
                    country = geo[num].string.rsplit(' ', 1)
                elif label == "Lat/Long":
                    lat_long = geo[num].string.split(',')
        json_data = {
            "ip":           ip,
            "latitude":     float(lat_long[0]),
            "country":      str(country[0]),
            "country_code": str(country[1][1:-1]),
            "longitude":    float(lat_long[1]),
            "province":     province,
            "city":         city
        }
        return json_data

    def search(self, ip):
        # try the API first, fall back to the web page if anything goes wrong
        try:
            res = self.api.view(ip)
            geo = res['location']
            json_data = {
                "ip":           ip,
                "latitude":     float(geo["latitude"]),
                "country":      geo["country"],
                "country_code": geo["country_code"],
                "longitude":    float(geo["longitude"]),
                "province":     geo["province"],
                "city":         geo["city"]
            }
            return json_data
        except Exception:
            return self.censys_html_search(ip)

    def get_geo(self, ip):
        json_data = self.search(ip)
        if len(json_data) == 0:
            print("can not find ip: %s" % ip)
            return -1
        print("get geo of ip: %s" % ip)
        print(json_data)
        return 1

    def main(self, ip_lst):
        lst = []   # IPs that could not be resolved
        for ip in ip_lst:
            print("========================")
            finish_num = self.get_geo(ip)
            if finish_num == -1:
                lst.append(ip)
        print(lst)


if __name__ == '__main__':
    ip_lst = ["8.8.8.8"]
    print("=================================start==========================================")
    a = censys_ip()
    a.main(ip_lst)
    print("=================================end==========================================")
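
The search method above is really a two-step chain: try the API, and fall back to the web page when that fails. Here is a rough sketch of that pattern on its own, assuming each lookup is just a callable that takes an IP and returns a dict. lookup_with_fallback is a hypothetical name and is not part of the censys library.

```python
def lookup_with_fallback(ip, sources):
    """Try each lookup callable in order; return the first non-empty result.

    sources is a list of functions taking an IP string and returning a dict.
    Returns {} when every source fails or comes back empty.
    """
    for source in sources:
        try:
            result = source(ip)
        except Exception as exc:  # a real client would catch narrower errors
            print("lookup via %s failed for %s: %s" % (source.__name__, ip, exc))
            continue
        if result:
            return result
    return {}
```

Written this way, adding a third source later only means appending one more function to the list.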

Anyway, it is good enough to get by with.

And in the end, they told me there is genip......

Oh well, I only wrote this for fun anyway.
