Extracting phone numbers and email addresses from text with Python regular expressions

Source: Internet  Editor: 程序博客网  Date: 2024/05/18 02:37

Code:

#! python3
# Extracts phone numbers and email addresses from the clipboard text.
import re

import pyperclip

phoneregex = re.compile(r'''
    (\d{3}|\(\d{3}\))?               # area code
    (\s|-|\.)?                       # separator
    (\d{3})                          # first 3 digits
    (\s|-|\.)                        # separator
    (\d{4})                          # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # extension
    ''', re.VERBOSE)

emailregex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+                # username
    @                                # @ symbol
    [a-zA-Z0-9.-]+                   # domain name
    (\.[a-zA-Z]{2,4})                # dot-something
    )''', re.VERBOSE)

text = str(pyperclip.paste())
matches = []

# Debug output: the raw list of group tuples returned by findall.
print(phoneregex.findall(text))
for groups in phoneregex.findall(text):
    print(groups)
    # groups[0], groups[2], groups[4]: area code, first 3 digits, last 4 digits
    phonenum = '-'.join([groups[0], groups[2], groups[4]])
    if groups[7] != '':  # groups[7] holds the extension digits
        phonenum += ' x' + groups[7]
    matches.append(phonenum)

for groups in emailregex.findall(text):
    matches.append(groups[0])  # groups[0] is the full address

if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('copied to clipboard:')
    print('\n'.join(matches))
else:
    print('no phone numbers or email addresses found.')

Output:

[('800', '-', '420', '-', '7240', '', '', ''), ('415', '-', '863', '-', '9900', '', '', ''), ('415', '-', '863', '-', '9950', '', '', '')]
('800', '-', '420', '-', '7240', '', '', '')
('415', '-', '863', '-', '9900', '', '', '')
('415', '-', '863', '-', '9950', '', '', '')
copied to clipboard:
800-420-7240
415-863-9900
415-863-9950
info@nostarch.com
media@nostarch.com
academic@nostarch.com
info@nostarch.com

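Notes:

In the book, the phone-number pattern has an opening parenthesis right after r''', so the whole pattern is one outer group and findall returns the entire match as the first element of each tuple. The outer parentheses around the extension part work the same way: findall first returns the text matched by the outer group, then the text matched by the two inner groups. The phone regex above omits that outermost parenthesis, so each tuple has 8 elements and the code reads groups[0], groups[2], groups[4] and groups[7].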
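The effect of the outer parenthesis on findall can be checked with a small sketch. The sample string and the names inner_only and with_outer are made up for illustration, and the patterns are simplified (the area code and first separator are required here), so this only shows how the group numbering works and is not the book's exact regex.

```python
import re

# Hypothetical sample text, not taken from the original post.
sample = "Call 415-863-9900 ext 42 or 800.420.7240."

# No outer group: one tuple element per pair of parentheses (8 in total).
inner_only = re.compile(
    r'(\d{3})(\s|-|\.)(\d{3})(\s|-|\.)(\d{4})(\s*(ext|x|ext\.)\s*(\d{2,5}))?'
)

# Same pattern wrapped in one extra pair of parentheses (9 groups):
# the first element of each tuple is now the entire match.
with_outer = re.compile(
    r'((\d{3})(\s|-|\.)(\d{3})(\s|-|\.)(\d{4})(\s*(ext|x|ext\.)\s*(\d{2,5}))?)'
)

inner_matches = inner_only.findall(sample)
outer_matches = with_outer.findall(sample)

for groups in inner_matches:
    print(len(groups), groups)
for groups in outer_matches:
    print(len(groups), groups)
```

With the outer group present, every index shifts by one, and an optional group that did not take part in the match (such as the extension) shows up as an empty string.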

