Python: Following Links in HTML Using BeautifulSoup

Source: Internet  Editor: 程序博客网  Date: 2024/06/06 02:35

The program uses urllib to read the HTML from the data file below, extracts the href= values from the anchor tags, and picks the tag at a particular position, counting the first name in the list as position 1. It then follows that link, repeats the process a number of times, and reports the last name found.
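To make the "extract the href values and pick one by position" step concrete, here is a minimal offline sketch. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages; the class `AnchorCollector`, the function `link_at`, and the sample HTML are made up for illustration and are not part of the original program.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; href may be missing.
            self.hrefs.append(dict(attrs).get("href"))


def link_at(html, position):
    """Return the href of the anchor at a 1-based position."""
    parser = AnchorCollector()
    parser.feed(html)
    # Position 1 is the first name, so subtract 1 for the list index.
    return parser.hrefs[position - 1]


# Hypothetical sample page with three names.
SAMPLE = (
    '<ul>'
    '<li><a href="known_by_A.html">A</a></li>'
    '<li><a href="known_by_B.html">B</a></li>'
    '<li><a href="known_by_C.html">C</a></li>'
    '</ul>'
)

print(link_at(SAMPLE, 1))  # known_by_A.html
print(link_at(SAMPLE, 3))  # known_by_C.html
```

The only subtlety is the off-by-one: the assignment counts from 1, while Python lists count from 0.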

Find the link at position 18 (the first name is 1). Follow that link. Repeat this process 7 times. The answer is the last name that you retrieve.
Hint: The first character of the name of the last page that you will load is: M

HTML URL: http://python-data.dr-chuck.net/known_by_Cleo.html

Python source (Python 2):

import urllib
from bs4 import BeautifulSoup

url = raw_input('Enter - ')
count = int(raw_input('Enter count:'))
position = int(raw_input('Enter position:'))

for tag in xrange(count):
    # Download the current page and collect its anchor tags
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup.findAll('a')
    # Positions are 1-based, list indexes are 0-based
    url = tags[position-1].get('href', None)

# After the last hop, url points at the final page
print url
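The script above relies on Python 2 features (`raw_input`, `xrange`, `urllib.urlopen`, and the `print` statement). A rough Python 3 equivalent might look like the sketch below. It assumes the `bs4` package is installed; the function names `nth_href` and `follow_links` are hypothetical helpers introduced here, not names from the original post.

```python
import urllib.request

from bs4 import BeautifulSoup


def nth_href(html, position):
    """Return the href of the anchor at a 1-based position in the page."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("a")
    # Position 1 is the first anchor, so subtract 1 for the list index.
    return tags[position - 1].get("href")


def follow_links(url, count, position):
    """Follow the link at `position` `count` times and return the final URL."""
    for _ in range(count):
        # Download the current page, then hop to the chosen link.
        with urllib.request.urlopen(url) as response:
            html = response.read()
        url = nth_href(html, position)
    return url
```

Calling `follow_links(url, 7, 18)` with the starting URL given earlier would mirror the run in this post, provided the page is still reachable. Splitting the parsing into `nth_href` keeps that step testable on a plain HTML string without any network access.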

Output:
Enter - http://python-data.dr-chuck.net/known_by_Cleo.html
Enter count:7
Enter position:18
http://python-data.dr-chuck.net/known_by_Mirrin.html
