My Python Crawler (2): Buying a "Kidney 6" (iPhone 6)

Source: Internet  Editor: 程序博客网  Time: 2024/05/14 19:31

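Project:

Use Python to buy an iPhone 6 from Apple's Hong Kong online store (http://store.apple.com/hk-zh/buy-iphone/iphone6).

Task breakdown:

(1) So far the iPhone 6 has been in short supply, so the first step is to refresh the page repeatedly and check whether it is in stock.

(2) When it is in stock, send a text message to my phone.

(3) When it is in stock, place the order automatically.

Because the iPhone 6 is currently out of stock, the code below may come in two versions, one for the iPhone 5s and one for the iPhone 6.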

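Task (1):

This is really just a small web-scraping program. There are two key points:

(1) Refresh repeatedly. This needs the time module, specifically the time.sleep() function.

(2) The scraped content is Chinese, so it needs an encoding conversion during processing.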
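The two points above can be tried out without touching the real store. The following is only a minimal sketch in Python 3 (the programs in this post are Python 2): the names `poll`, `fetch`, `is_in_stock` and `max_rounds` are made up for illustration, the page is assumed to be UTF-8, and the out-of-stock text 暫無供應 is an invented stand-in.

```python
import time


def poll(fetch, is_in_stock, interval=120, max_rounds=None, sleep=time.sleep):
    """Call fetch() repeatedly until is_in_stock(text) is true.

    fetch returns the raw page bytes; they are decoded as UTF-8 before
    being checked. Returns the number of rounds used, or None if
    max_rounds ran out first.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        text = fetch().decode("utf-8")  # bytes -> str so Chinese text compares cleanly
        if is_in_stock(text):
            return rounds
        sleep(interval)  # wait before refreshing again
    return None


# Offline demonstration with canned pages instead of the real store.
pages = iter(["暫無供應".encode("utf-8"), "有現貨".encode("utf-8")])
waits = []  # records the sleep calls instead of actually sleeping
result = poll(lambda: next(pages), lambda t: "有現貨" in t,
              interval=120, max_rounds=5, sleep=waits.append)
print(result, waits)
```

Passing `sleep` in as a parameter is just a convenience so the loop can be exercised without waiting two minutes per round.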

The HTML to scrape should be: [screenshot not preserved]


The program is as follows:

    # -*- coding: utf-8 -*-
    # Python 2: poll the iPhone 6 page and report the stock status of each model.
    import re
    import time
    import urllib

    url = 'http://store.apple.com/hk-zh/buy-iphone/iphone6'
    # One match per model: screen size, colour, capacity, price, shipping quote.
    reg = '{"dimensionScreensize":"(.*?)","dimensionColor":"(.*?)","dimensionCapacity":"(.*?)","partNumber":".*?","price":"(.*?)","displayShippingQuote":"(.*?)".*?}'
    pattern = re.compile(reg)

    while 1:
        html = urllib.urlopen(url).read()
        mes = pattern.findall(html)
        for i in mes:
            print i[0]+' ',i[1]+' ',i[2]+' ','HK$'+i[3]+' ',i[4]
            # The page is UTF-8, so decode before comparing with a unicode literal.
            if i[4].decode('utf-8') == u'有現貨':
                print "In stock"
            else:
                print "Out of stock"
        print "Refreshing every 120 seconds"
        time.sleep(120)

The result is as follows: [screenshot not preserved]
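Since the screenshots are gone, here is a rough way to check the regular expression offline. This is a Python 3 sketch, not the original program: the field names come from the regex itself, but every value in `SAMPLE` (sizes, part numbers, prices, the out-of-stock text) is invented, and `parse_models` is a hypothetical helper.

```python
import re

# Same pattern as the iPhone 6 program, written as a raw string with the
# braces escaped so it is unambiguous in Python 3.
PATTERN = re.compile(
    r'\{"dimensionScreensize":"(.*?)","dimensionColor":"(.*?)",'
    r'"dimensionCapacity":"(.*?)","partNumber":".*?","price":"(.*?)",'
    r'"displayShippingQuote":"(.*?)".*?\}'
)

# Hypothetical page fragment: two model entries with invented values.
SAMPLE = (
    '{"dimensionScreensize":"4_7inch","dimensionColor":"silver",'
    '"dimensionCapacity":"16gb","partNumber":"XXXXX","price":"5,588.00",'
    '"displayShippingQuote":"有現貨","other":"x"}'
    '{"dimensionScreensize":"5_5inch","dimensionColor":"gold",'
    '"dimensionCapacity":"64gb","partNumber":"YYYYY","price":"7,188.00",'
    '"displayShippingQuote":"暫無供應","other":"y"}'
)


def parse_models(html):
    """Return one dict per model found in the page text."""
    keys = ("size", "color", "capacity", "price", "quote")
    return [dict(zip(keys, match)) for match in PATTERN.findall(html)]


models = parse_models(SAMPLE)
for m in models:
    status = "in stock" if m["quote"] == "有現貨" else "out of stock"
    print(m["size"], m["color"], m["capacity"], "HK$" + m["price"], status)
```

Naming the groups this way makes it harder to mix up the indices, which matters because the iPhone 5s pattern has a different group order.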


The iPhone 5s version is:

    # -*- coding: utf-8 -*-
    # Python 2: one-shot stock check for the iPhone 5s page.
    import re
    import urllib

    url = 'http://store.apple.com/hk-zh/buy-iphone/iphone5s'
    html = urllib.urlopen(url).read()
    # The 5s entries have no screen-size field; the groups are colour,
    # capacity, part number, price, shipping quote.
    reg = '{"dimensionColor":"(.*?)","dimensionCapacity":"(.*?)","partNumber":"(.*?)","price":"(.*?)","displayShippingQuote":"(.*?)".*?}'
    mes = re.compile(reg).findall(html)
    for i in mes:
        print i[0]+' ',i[1]+' ',i[2]+' ','HK$'+i[3]+' ',i[4]
        # Compare the decoded status directly; equality means in stock.
        if i[4].decode('utf-8') == u'有現貨':
            print "success"
        else:
            print "failure"


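Task (2):

This is really a matter of sending a text message from Python, and there are many ways to do it. For example:

(1) Use the Google Calendar API to create an event in a calendar, which then sends you a text message.

(2) Use an older phone + serial port + AT commands.

(3) Send an email to a QQ mailbox and get notified through WeChat.

(4) Buy an SMS modem and program it over the serial port.

(5) Use a China Mobile 139 mailbox.

These five methods come from Douban. The problem is that I don't know how to do any of them, but reportedly the simplest is the fifth: the China Mobile 139 mailbox. (The 139 mailbox can notify the phone by SMS when new mail arrives, so sending it an email works as a text message.)

This is when you should remember that things like Google and Baidu exist. I couldn't find much myself, perhaps because my search skills aren't great.

Anyway, the code is as follows: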

    # -*- coding: utf-8 -*-
    # Python 2: send a mail through the 139 SMTP server.
    import smtplib
    from email.header import Header
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    def sendmail(to_list, sub, con):
        """Send a plain-text mail to every address in to_list; return True on success."""
        mail_host = "smtp.139.com"
        mail_user = "phone-number@139.com"   # sender account
        mail_pass = "password"               # sender password

        me = mail_user

        msg = MIMEMultipart('related')
        msg['Subject'] = Header(sub, 'utf-8')
        msg['From'] = me
        msg['To'] = ", ".join(to_list)
        msg.preamble = 'This is a multi-part message in MIME format.'

        # The text body goes into an "alternative" part attached to the message.
        msgAlternative = MIMEMultipart('alternative')
        msgText = MIMEText(con, 'plain', 'utf-8')
        msgAlternative.attach(msgText)
        msg.attach(msgAlternative)

        try:
            s = smtplib.SMTP()
            s.connect(mail_host)
            s.login(mail_user, mail_pass)
            s.sendmail(me, to_list, msg.as_string())
            s.quit()
        except Exception, e:
            print e
            return False
        return True

    # to_list must be a list of addresses, even for a single recipient.
    if sendmail(["*****@139.com"], "hahaha", "hahaha"):
        print "success"
    else:
        print "failure"

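So far I haven't figured out the functions used here, but this code definitely runs. I'll study sending text messages from Python properly later. Somehow I get the feeling these waters run deep...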
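For anyone who finds the MIME calls as opaque as I do, here is a smaller sketch of the same idea. It is Python 3, not the Python 2 used in this post, and it relies on the standard `email.message.EmailMessage` and `smtplib.SMTP` classes. The helper names `build_message` and `send_message`, the `smtp_factory` parameter and the addresses are all made up for illustration, and whether the 139 server accepts a plain connection like this is an assumption I have not verified.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, to_list, subject, body):
    """Build a plain-text mail; non-ASCII headers are encoded automatically."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(to_list)
    msg.set_content(body)  # becomes a text/plain part
    return msg


def send_message(msg, host, user, password, smtp_factory=smtplib.SMTP):
    """Log in and send; return True on success, False on an SMTP or network error."""
    try:
        with smtp_factory(host) as server:
            server.login(user, password)
            server.send_message(msg)
    except (smtplib.SMTPException, OSError):
        return False
    return True


# Building the message needs no network, so it can be inspected first.
msg = build_message("sender@139.com", ["receiver@139.com"],
                    "iPhone6有现货", "16gb silver")
print(msg["To"], msg["Subject"])
```

Splitting building from sending means the message can be checked before any password is typed in; `send_message(msg, "smtp.139.com", user, password)` would then do the actual delivery.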

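Task (3):

Because the iPhone 6 is out of stock, I made an iPhone 5s version.

If you follow the approach from the previous crawler post, you'll find that everything HttpFox captures is a GET request; the POST I expected is nowhere to be seen. I tried other people's approaches and still found none. In the end it was IE + HttpWatch that showed me the POST sent at login.

That gave me the login URL and the POST form data to submit.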
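The login step boils down to a cookie-aware opener plus a form-encoded POST. Here is a minimal Python 3 sketch of just that part (this post's own code is Python 2). The login URL and User-Agent are the ones used in the ordering code in this post; the helper names `make_opener` and `build_login_request` are hypothetical, and the form here contains only two of the captured fields with placeholder values. Nothing is sent over the network.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "https://secure1.store.apple.com/hk-zh/sentryx/sign_in"
USER_AGENT = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"


def make_opener():
    """Return an opener that stores cookies, together with its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(form):
    """Encode the form fields into a POST request object (nothing is sent here)."""
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(LOGIN_URL, data=body,
                                  headers={"User-Agent": USER_AGENT})


opener, jar = make_opener()
request = build_login_request({"login-appleId": "account",
                               "login-password": "password"})
print(request.get_method(), request.full_url)
# opener.open(request) would send the POST and keep any session cookies in `jar`,
# so later requests made through the same opener stay logged in.
```

Supplying `data` is what turns the request into a POST; a request built without it goes out as a GET.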

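Later I monitored the process of adding the chosen model to the cart and found a GET request that contained all of my information. So simply opening the URL of that GET request is enough to place the order; no POST form needs to be submitted. The URL is: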

http://store.apple.com/hk-zh/buy-iphone/iphone5s?
ao.iphone5scasegrid_leather=none&
ao.applecareplus=none&
ao.iphone5sdock=none&
ao.lightning_usb_cable=none&
ao.add_5w_usb_power_adapter=none&
ao.lightning_30pin_adapter=none&
ao.lightning_30pin_02m=none&
ao.lightning_av=none&
ao.apple_tv=none&
ao.urbeats_inear=none&
ao.beats_solo2=none&
ao.iphone_printers=none&
add-to-cart=add-to-cart&
cppart=UNLOCKED%2FWW&
product=MF353ZP%2FA&
step=accessories&
dimensionCapacity=16gb&
dimensionColor=silver&
complete=true

I re-wrapped it here to make it easier to read.
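Rather than pasting the long URL by hand, the query string can be rebuilt from its parts. This is only a sketch in Python 3: the parameter names are copied from the captured URL, the part number is the one used in the ordering code in this post, and `cart_url` and `ACCESSORIES` are names I made up. Whether the store accepts other product, capacity or colour combinations in this form is an assumption.

```python
from urllib.parse import urlencode

BASE = "http://store.apple.com/hk-zh/buy-iphone/iphone5s"

# Accessory options seen in the captured URL; every one is declined ("none").
ACCESSORIES = [
    "iphone5scasegrid_leather", "applecareplus", "iphone5sdock",
    "lightning_usb_cable", "add_5w_usb_power_adapter",
    "lightning_30pin_adapter", "lightning_30pin_02m", "lightning_av",
    "apple_tv", "urbeats_inear", "beats_solo2", "iphone_printers",
]


def cart_url(product, capacity, color):
    """Build the add-to-cart URL for one model, keeping the captured parameter order."""
    params = [("ao." + name, "none") for name in ACCESSORIES]
    params += [
        ("add-to-cart", "add-to-cart"),
        ("cppart", "UNLOCKED/WW"),
        ("product", product),
        ("step", "accessories"),
        ("dimensionCapacity", capacity),
        ("dimensionColor", color),
        ("complete", "true"),
    ]
    # urlencode percent-encodes the "/" in the values as %2F.
    return BASE + "?" + urlencode(params)


url = cart_url("MF353ZP/A", "16gb", "silver")
print(url)
```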

So the ordering code can be written as follows:

    # -*- coding: utf-8 -*-
    # Python 2: log in with a cookie-aware opener, then open the add-to-cart URL.
    import cookielib
    import urllib
    import urllib2

    # The opener keeps the login cookies for the requests that follow.
    cookie = cookielib.CookieJar()
    handler = urllib2.HTTPCookieProcessor(cookie)
    opener = urllib2.build_opener(handler)

    # Form fields captured from the login POST; fill in your own account and password.
    logindata = {'login-appleId':'account',
                 'login-password':'password',
                 'fdcBroserData':'%7B%22U%22%3A%22Mozilla%2F5.0%20(compatible%3B%20MSIE%209.0%3B%20Windows%20NT%206.1%3B%20Trident%2F5.0%3B%20SLCC2%3B%20.NET%20CLR%202.0.50727%3B%20.NET%20CLR%203.5.30729%3B%20.NET%20CLR%203.0.30729%3B%20.NET4.0C%3B%20.NET4.0E)%22%2C%22L%22%3A%22zh-cn%22%2C%22Z%22%3A%22GMT%2B08%3A00%22%2C%22V%22%3A%221.0%22%7D',
                 '_a':'login.sign',
                 'c':'aHR0cDovL3N0b3JlLmFwcGxlLmNvbS9oay16aC9idXktaXBob25lL2lwaG9uZTVzfDFhb3MwNjFkZjU2NWNhNTI4YWZiNzY2N2Y5NTZhOWI2MmNjYjViODIxNDYz',
                 '_fid':'si',
                 'r':'SCDHYHP7CY4H9XK2H',
                 's':'aHR0cDovL3N0b3JlLmFwcGxlLmNvbS9oay16aC9idXktaXBob25lL2lwaG9uZTVzfDFhb3MwNjFkZjU2NWNhNTI4YWZiNzY2N2Y5NTZhOWI2MmNjYjViODIxNDYz',
                 't':'S99KKATD9FP9FHCP4'}
    login = urllib.urlencode(logindata)
    headers = {'User-Agent':'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'}
    request = urllib2.Request('https://secure1.store.apple.com/hk-zh/sentryx/sign_in',login,headers)
    opener.open(request)

    # Opening this URL with the logged-in session adds the phone to the cart.
    # It is sent as a GET, matching the captured request.
    cart_url = "http://store.apple.com/hk-zh/buy-iphone/iphone5s?ao.iphone5scasegrid_leather=none&ao.applecareplus=none&ao.iphone5sdock=none&ao.lightning_usb_cable=none&ao.add_5w_usb_power_adapter=none&ao.lightning_30pin_adapter=none&ao.lightning_30pin_02m=none&ao.lightning_av=none&ao.apple_tv=none&ao.urbeats_inear=none&ao.beats_solo2=none&ao.iphone_printers=none&add-to-cart=add-to-cart&cppart=UNLOCKED%2FWW&product=MF353ZP%2FA&step=accessories&complete=true"
    request2 = urllib2.Request(cart_url, headers=headers)
    opener.open(request2)
    print 'success'

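Likewise, once you have the URL the page jumps to when an iPhone 6 is added to the cart, you can write the iPhone 6 version. If any expert knows another way, please tell me; I still feel this approach isn't great.

Final implementation:

Putting the steps above together gives the following code: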

    # -*- coding: utf-8 -*-
    # Python 2. Save this file as UTF-8 so the Chinese literal matches the declared charset.
    import cookielib
    import re
    import smtplib
    import time
    import urllib
    import urllib2
    from email.header import Header
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    RECEIVER = "*****@139.com"   # 139 mailbox that receives the notification

    def sendmail(to_list, sub, con):
        """Send a plain-text mail to every address in to_list; return True on success."""
        mail_host = "smtp.139.com"
        mail_user = "phone-number@139.com"   # sender account
        mail_pass = "password"               # sender password

        me = mail_user

        msg = MIMEMultipart('related')
        msg['Subject'] = Header(sub, 'utf-8')
        msg['From'] = me
        msg['To'] = ", ".join(to_list)
        msg.preamble = 'This is a multi-part message in MIME format.'

        msgAlternative = MIMEMultipart('alternative')
        msgText = MIMEText(con, 'plain', 'utf-8')
        msgAlternative.attach(msgText)
        msg.attach(msgAlternative)

        try:
            s = smtplib.SMTP()
            s.connect(mail_host)
            s.login(mail_user, mail_pass)
            s.sendmail(me, to_list, msg.as_string())
            s.quit()
        except Exception, e:
            print e
            return False
        return True

    def order_iphone5s():
        """Log in with a cookie-aware opener, then open the add-to-cart URL."""
        cookie = cookielib.CookieJar()
        handler = urllib2.HTTPCookieProcessor(cookie)
        opener = urllib2.build_opener(handler)
        # Form fields captured from the login POST; fill in your own account and password.
        logindata = {'login-appleId':'account',
                     'login-password':'password',
                     'fdcBroserData':'%7B%22U%22%3A%22Mozilla%2F5.0%20(compatible%3B%20MSIE%209.0%3B%20Windows%20NT%206.1%3B%20Trident%2F5.0%3B%20SLCC2%3B%20.NET%20CLR%202.0.50727%3B%20.NET%20CLR%203.5.30729%3B%20.NET%20CLR%203.0.30729%3B%20.NET4.0C%3B%20.NET4.0E)%22%2C%22L%22%3A%22zh-cn%22%2C%22Z%22%3A%22GMT%2B08%3A00%22%2C%22V%22%3A%221.0%22%7D',
                     '_a':'login.sign',
                     'c':'aHR0cDovL3N0b3JlLmFwcGxlLmNvbS9oay16aC9idXktaXBob25lL2lwaG9uZTVzfDFhb3MwNjFkZjU2NWNhNTI4YWZiNzY2N2Y5NTZhOWI2MmNjYjViODIxNDYz',
                     '_fid':'si',
                     'r':'SCDHYHP7CY4H9XK2H',
                     's':'aHR0cDovL3N0b3JlLmFwcGxlLmNvbS9oay16aC9idXktaXBob25lL2lwaG9uZTVzfDFhb3MwNjFkZjU2NWNhNTI4YWZiNzY2N2Y5NTZhOWI2MmNjYjViODIxNDYz',
                     't':'S99KKATD9FP9FHCP4'}
        login = urllib.urlencode(logindata)
        headers = {'User-Agent':'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'}
        request = urllib2.Request('https://secure1.store.apple.com/hk-zh/sentryx/sign_in',login,headers)
        opener.open(request)
        # Sent as a GET, matching the captured add-to-cart request.
        cart_url = "http://store.apple.com/hk-zh/buy-iphone/iphone5s?ao.iphone5scasegrid_leather=none&ao.applecareplus=none&ao.iphone5sdock=none&ao.lightning_usb_cable=none&ao.add_5w_usb_power_adapter=none&ao.lightning_30pin_adapter=none&ao.lightning_30pin_02m=none&ao.lightning_av=none&ao.apple_tv=none&ao.urbeats_inear=none&ao.beats_solo2=none&ao.iphone_printers=none&add-to-cart=add-to-cart&cppart=UNLOCKED%2FWW&product=MF353ZP%2FA&step=accessories&complete=true"
        request2 = urllib2.Request(cart_url, headers=headers)
        opener.open(request2)

    url = 'http://store.apple.com/hk-zh/buy-iphone/iphone6'
    reg = '{"dimensionScreensize":"(.*?)","dimensionColor":"(.*?)","dimensionCapacity":"(.*?)","partNumber":".*?","price":"(.*?)","displayShippingQuote":"(.*?)".*?}'
    pattern = re.compile(reg)
    count1 = 0   # becomes 1 once the iPhone 5s has been ordered
    while 1:
        html = urllib.urlopen(url).read()
        mes = pattern.findall(html)
        count = 0   # number of in-stock notifications sent in this round
        for i in mes:
            print i[0]+' ',i[1]+' ',i[2]+' ','HK$'+i[3]+' ',i[4]
            if i[4].decode('utf-8') == u'有現貨':
                content = "Size: "+i[0]+"\tColour: "+i[1]+"\tCapacity: "+i[2]+"\tPrice: "+i[3]
                if sendmail([RECEIVER], "iPhone6 in stock", content):
                    print 'Mail sent'
                    count = count + 1
                # Ordering the iPhone 6 can't be written yet...
        if (count == 0) and (count1 == 0):
            print "iPhone6 out of stock, buying iPhone5s"
            order_iphone5s()
            print 'iPhone5s order placed'
            content = "iPhone5s success"
            sub = "iPhone5s"
            if sendmail([RECEIVER], sub, content):
                print 'Mail sent'
            count1 = 1
        print "Refreshing every 120 seconds"
        time.sleep(120)

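One thing to note: try to avoid Chinese characters in the mail subject sub and the body content. Converting them is a hassle and I have never fully figured it out. A likely cause is an encoding mismatch: the byte strings passed in must really be in the charset named in the mail calls ('utf-8' here), which is easy to get wrong when the source file is saved in another encoding such as cp936.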
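For what it's worth, here is a small experiment on how a Chinese subject ends up in a mail header. It is a Python 3 sketch using the standard `email.header` module, where strings are already Unicode, so it only illustrates the header encoding itself and not the Python 2 byte-string problem described above.

```python
from email.header import Header, decode_header, make_header

subject = "iPhone6有现货"

# RFC 2047 "encoded-word" form: pure ASCII, safe to put in a mail header.
encoded = Header(subject, "utf-8").encode()
print(encoded)

# Decoding the header again recovers the original text.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The point is that the header on the wire is always ASCII; the charset label inside it tells the mail client how to turn it back into Chinese.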

I won't post the results here. The program definitely runs; give it a try if you're interested.

