sqlmap Baidu crawler
Source: Internet · Editor: 程序博客网 · Date: 2024/05/18 20:32
Known issues:
Chinese text is not handled well (no special processing) — this is just a personal hobby project.
--------------------------------------------------------------------------
Anyone who has used sqlmap probably has their own way of finding vulnerable websites.
I had only just heard about this tool from a friend, and watching him dig through Baidu for URLs by hand every day looked exhausting,
so I wrote a Python script that scrapes URLs for him in bulk,
and the scraped URLs can then be handed to sqlmap for testing.
Tools: Python 2.7 (32-bit), lxml-3.1.1.win32-py2.7, pyquery-1.2.13, requests-2.10.0
#!/usr/bin/python
# coding=utf-8
import re

import requests
from pyquery import PyQuery as Pq


class BaiduSearchSpider(object):
    def __init__(self, searchText):
        self.url = "http://www.baidu.com/baidu?wd=%s&tn=monline_4_dg" % searchText
        self.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/600.5.17 (KHTML, like Gecko) Version/8.0.5 Safari/600.5.17"}
        self._page = None

    def rePage(self, url):
        # Point the spider at a new results page and drop the cached page.
        self.url = url
        self._page = None

    @property
    def page(self):
        if not self._page:
            r = requests.get(self.url, headers=self.headers)
            r.encoding = 'utf-8'
            self._page = Pq(r.text)
        return self._page

    @property
    def baiduURLs(self):
        # Each organic result sits in div.result.c-container; the link in
        # h3.t a is a Baidu redirect URL, not the real target.
        return [(site.attr('href'), site.text().encode('utf-8'))
                for site in self.page('div.result.c-container h3.t a').items()]

    @property
    def nextPageUrl(self):
        # div#page a.n holds the "previous page"/"next page" links.
        return [(site.attr('href'), site.text().encode('utf-8'))
                for site in self.page('div#page a.n').items()]

    @property
    def originalURLs(self):
        # Resolve each Baidu redirect link to the real target URL.
        tmpURLs = self.baiduURLs
        originalURLs = []
        for tmpurl in tmpURLs:
            tmpPage = requests.get(tmpurl[0], allow_redirects=False)
            if tmpPage.status_code == 200:
                # A 200 means Baidu served a meta-refresh page; pull the
                # target out of its URL='...' attribute.
                urlMatch = re.search(r"URL='(.*?)'",
                                     tmpPage.text.encode('utf-8'), re.S)
                if urlMatch:
                    originalURLs.append(urlMatch.group(1))
            elif tmpPage.status_code == 302:
                # A 302 carries the target in the Location header.
                originalURLs.append(tmpPage.headers.get('location'))
            else:
                print 'No URL found!!'
        return originalURLs


searchText = raw_input("Search query: ")
print searchText
bdsearch = BaiduSearchSpider(searchText)
count = 0
while count < 100:
    originalurls = bdsearch.originalURLs
    with open('recode.txt', 'a') as f:
        for urlStr in originalurls:
            f.write(urlStr + '\n')
    pagesUrl = bdsearch.nextPageUrl
    nextUrl = ''
    if len(pagesUrl) == 2:
        # Both "prev" and "next" links exist; the second one is "next".
        nextUrl = "http://www.baidu.com" + pagesUrl[1][0]
    elif count == 0:
        # The first results page only has a "next" link.
        nextUrl = "http://www.baidu.com" + pagesUrl[0][0]
    else:
        print "search end"
        break
    bdsearch.rePage(nextUrl)
    count = count + 1
    print "count = " + str(count)
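The trickiest part above is the 200-status branch: when Baidu does not answer the redirect link with a 302, it serves a small HTML page whose meta-refresh `URL='...'` attribute carries the real target, and the regex pulls that out. A minimal sketch of just that extraction, using a made-up HTML snippet in place of a live Baidu response:

```python
import re

# Hypothetical stand-in for the body Baidu returns with a 200 status:
# the real target sits inside a meta-refresh URL='...' attribute.
sample_body = (
    "<html><head>"
    "<noscript><meta http-equiv=\"refresh\" "
    "content=\"0;URL='http://www.example.com/news.php?id=42'\"></noscript>"
    "</head></html>"
)

match = re.search(r"URL='(.*?)'", sample_body, re.S)
if match:
    print(match.group(1))  # -> http://www.example.com/news.php?id=42
else:
    print('No URL found!!')
```

The non-greedy `(.*?)` stops at the first closing quote, so query strings with `&` and `=` survive intact.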
C:\Users\Administrator\Desktop\python\httptest\httptest>python getUrl.py
Search query: inurl:asp?id=
inurl:asp?id=
count = 1
count = 2
count = 3
count = 4
count = 5
count = 6
count = 7
count = 8
count = 9
count = 10
count = 11
count = 12
count = 13
count = 14
count = 15
count = 16
The generated recode.txt holds the search results:
http://www.kemflo.net/news.php?id=45
http://www.lxjx.cn/news.php?id=259
http://www.hnccgc.com/jcxx/get_news.php?id=10069
http://www.southsurvey.com/public/news.php?id=1120
http://www.7daysinn.cn/news.php?id=2421
http://www.hzfc.gov.cn/zwgk/zwgknews.php?id=214690
http://www.hwqh.com.cn/viewnews.php?id=43923
http://www.neweekly.com.cn/newsview.php?id=2905
http://www.hnccgc.com/jcxx/get_news.php?id=11648
http://xwzx.cqupt.edu.cn/xwzx.?news.php?id=26533
http://www.hnccgc.com/xwzx/get_news.php?id=13015
http://www.sxsfgl.gov.cn/news.php?id=1281&root_lanmu=52
http://www.ccmt.org.cn/shownews.php?id=15031
http://www.hnccgc.com/jcxx/get_news.php?id=13353
http://www.bjmtgnews.com/paper/news.php?id=5909
http://www.ltzxw.com/news.php?id=3767
http://www.f0580.com/news/news.php?id=5284
http://www.cwhweb.com/news.php?id=3488
http://www.ks-lxjy.com/news/news.php?id=7066
http://www.chinawalking.net.cn/newsite/readnews.php?id=2298
http://www.oebrand.cn/news.php?id=12080
http://www.boosoochina.com/news/shownews.php?id=1496&lang=cn
http://www.badmintoncn.com/news.php?id=18300
http://www.dcfever.com/news/readnews.php?id=8150
http://www.stat-nba.com/news.php?id=6
http://tuan.zjcheshi.com/news.php?id=65991
http://www.fwol.cn/shownews.php?id=25087
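From here the whole file can be handed to sqlmap in one go. Assuming a stock sqlmap install on the PATH, something like the following works (`-m` reads a list of targets from a file, `--batch` answers sqlmap's prompts with defaults so the scan runs unattended):

```shell
# Test every URL collected in recode.txt, non-interactively.
sqlmap -m recode.txt --batch
```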