Using Python to scrape my Baidu Jingyan (百度经验) article directory


Get the link to each article:

import re
import urllib.request

def getHtml(url):
    # Download the raw page bytes.
    page = urllib.request.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    # Capture the relative href of each article link (they end in .html).
    reg = r'<a href="([.*\S]*\.html)" title='
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    return imglist

url = "https://jingyan.baidu.com/user/npublic/?uid=d1b612bceb0dc22ba8ffe137&pn="
# The list is paged 7 articles at a time (pn=0, 7, 14, ...), 89 pages in total.
for i in range(0, 89*7, 7):
    i = str(i)
    a = url + i
    html = getHtml(a)
    html = html.decode('UTF-8')
    for i in getImg(html):
        print("https://jingyan.baidu.com" + i)
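If Baidu ever rejects requests carrying the default urllib User-Agent (whether it does depends on its current anti-scraping rules, so treat this as an assumption), getHtml can send a browser-like header instead. A minimal sketch; the header value is only an example:

import urllib.request

def getHtml(url):
    # Attach a browser-like User-Agent before opening the URL.
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0"}
    )
    page = urllib.request.urlopen(req)
    return page.read()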

Scrape the titles:

import re
import urllib.request

def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    # Capture the title attribute of each article link
    # (my article titles all end with a question mark).
    reg = r'title="([.*\S]*\?)" target='
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    return imglist

url = "https://jingyan.baidu.com/user/npublic/?uid=d1b612bceb0dc22ba8ffe137&pn="
for i in range(0, 89*7, 7):
    i = str(i)
    a = url + i
    html = getHtml(a)
    html = html.decode('UTF-8')
    for i in getImg(html):
        print(i)
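Since the title sits in the same <a> tag as the link, the two passes over each list page can be merged into one regex with two capture groups. This is a sketch that assumes the href and title attributes appear in that order on each tag, as the two patterns above suggest:

import re
import urllib.request

def getHtml(url):
    page = urllib.request.urlopen(url)
    return page.read()

# One regex, two capture groups: (relative link, title).
pair_re = re.compile(r'<a href="([.*\S]*\.html)" title="([.*\S]*\?)" target=')

url = "https://jingyan.baidu.com/user/npublic/?uid=d1b612bceb0dc22ba8ffe137&pn="
for pn in range(0, 89*7, 7):
    html = getHtml(url + str(pn)).decode('UTF-8')
    # findall returns (link, title) tuples when the pattern has two groups.
    for link, title in pair_re.findall(html):
        print(title, "https://jingyan.baidu.com" + link)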