python scrapy re regular expressions

Source: Internet  Editor: 程序博客网  Time: 2024/06/04 00:51

A handy Unicode conversion tool: http://tool.chinaz.com/Tools/Unicode.aspx
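If you would rather stay in Python, the built-in `unicode_escape` codec can do the same conversion. This is a minimal sketch; the helper names `to_unicode_escape` and `from_unicode_escape` are hypothetical and do not come from the original post:

```python
def to_unicode_escape(text: str) -> str:
    """Turn non-ASCII characters into \\uXXXX escapes, e.g. for a regex pattern."""
    # Printable ASCII characters (other than the backslash) pass through unchanged.
    return text.encode("unicode_escape").decode("ascii")


def from_unicode_escape(escaped: str) -> str:
    """Reverse direction: decode \\uXXXX escapes back into characters."""
    return escaped.encode("ascii").decode("unicode_escape")


print(to_unicode_escape("转发"))             # \u8f6c\u53d1
print(from_unicode_escape(r"\u8f6c\u53d1"))  # 转发
```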

re regular expressions

import re

# `selector` is assumed to be the page's scrapy Selector (e.g. response.selector).
# Raw strings keep \[ and \d intact; in Python 3, re decodes the \uXXXX escapes itself.
# Extract "11" from "转发[11]" (reposts)
re.findall(r'\u8f6c\u53d1\[(\d+)\]', selector.xpath('//div[not(@class)]/span[1]/a/text()').extract_first())
# Extract "11" from "评论[11]" (comments)
re.findall(r'\u8bc4\u8bba\[(\d+)\]', selector.xpath('//div[not(@class)]/span[2]/a/text()').extract_first())
# Extract "11" from "赞[11]" (likes)
re.findall(r'\u8d5e\[(\d+)\]', selector.xpath('//div[not(@class)]/span[3]/a/text()').extract_first())
# Extract "11" from " 11/150" (\s matches the leading whitespace; "/" needs no escaping)
re.findall(r'\s(\d+)/', selector.xpath('//input[@type="submit"]/text()').extract_first())
# Extract "11" from "关注[11]" (following)
re.findall(r'\u5173\u6ce8\[(\d+)\]', selector.xpath('//div[@class="tip2"]/a[1]/text()').extract_first())
# Extract "11" from "粉丝[11]" (followers)
re.findall(r'\u7c89\u4e1d\[(\d+)\]', selector.xpath('//div[@class="tip2"]/a[2]/text()').extract_first())
# Extract "11" from "微博[11]" (posts)
re.findall(r'\u5fae\u535a\[(\d+)\]', selector.xpath('//div[@class="tip2"]/span[@class="tc"]/text()').extract_first())
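To check the patterns without fetching a page, you can run them against sample strings. This is a minimal sketch; the `SAMPLES` texts are made up to mimic the label text the XPaths select, and the real page text may differ:

```python
import re

# Hypothetical sample texts shaped like the labels the XPaths above select.
SAMPLES = {
    "repost": "转发[11]",
    "comment": "评论[11]",
    "like": "赞[11]",
    "following": "关注[11]",
    "followers": "粉丝[11]",
    "posts": "微博[11]",
    "page": " 11/150",
}

# The same patterns as above, as raw strings so re decodes the \uXXXX escapes.
PATTERNS = {
    "repost": r"\u8f6c\u53d1\[(\d+)\]",
    "comment": r"\u8bc4\u8bba\[(\d+)\]",
    "like": r"\u8d5e\[(\d+)\]",
    "following": r"\u5173\u6ce8\[(\d+)\]",
    "followers": r"\u7c89\u4e1d\[(\d+)\]",
    "posts": r"\u5fae\u535a\[(\d+)\]",
    "page": r"\s(\d+)/",
}

for key, pattern in PATTERNS.items():
    print(key, re.findall(pattern, SAMPLES[key]))  # each line ends with ['11']
```

Each pattern only matches its own label, so running the "repost" pattern on "评论[11]" returns an empty list.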

One thing to watch: re.findall() returns a list. Only take re.findall()[0] after confirming the list is non-empty, because indexing an empty list raises IndexError.
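Besides the `IndexError` on an empty list, `extract_first()` returns `None` when the XPath selects nothing, and passing `None` to `re.findall` raises `TypeError`. A small guard covers both cases. This is a minimal sketch; `first_match_int` is a hypothetical helper, not part of scrapy or `re`, and it assumes the pattern has exactly one capturing group:

```python
import re
from typing import Optional


def first_match_int(pattern: str, text: Optional[str],
                    default: Optional[int] = None) -> Optional[int]:
    """Return the first captured group of `pattern` in `text` as an int, else `default`."""
    if text is None:  # e.g. extract_first() found no matching node
        return default
    matches = re.findall(pattern, text)  # one capturing group -> list of strings
    if not matches:  # empty list: matches[0] would raise IndexError
        return default
    return int(matches[0])


print(first_match_int(r"\u8d5e\[(\d+)\]", "赞[11]"))   # 11
print(first_match_int(r"\u8d5e\[(\d+)\]", "赞[]"))     # None
print(first_match_int(r"\u8d5e\[(\d+)\]", None, 0))  # 0
```

In a spider it would wrap the calls above, e.g. `first_match_int(r'\u8d5e\[(\d+)\]', selector.xpath('//div[not(@class)]/span[3]/a/text()').extract_first())`.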

1 0