Python Learning - Project 1 - Instant Markup - 1

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 02:34

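I bought an introductory Python book but couldn't get through it. So I'm reading the projects at the back first and going back to study whenever I hit something I don't understand.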


Project name: Instant Markup


The basic idea is to take a plain-text file and mark it up into the format you want (in this project, HTML).


First comes the text to be processed. I couldn't find an electronic copy, so I typed one up myself:


```
Welcome to World Wide Spam, Inc.

These are the corporate web pages of *World Wide Spam*, Inc. We hope
you find your time on this site enjoyable, and that you will sample
many of our products.

A short history of the company

World Wide Spam was started in the summer of 2000. The business
concept was to ride the dot-com wave and to make money both through
bulk email and by selling canned meat online.

After receiving several complaints from customers who weren't
satisfied by their bulk email, World Wide Spam altered their profile
and focused 100% on canned goods. Today they rank as the world's
13,892nd online supplier of SPAM.

Destinations

From this page you may visit several of our interesting web pages:

    - What is SPAM? (WWW.baidu.com)
    - How do they make it? (WWW.baidu.com)
    - Why should I eat it? (WWWW.baidu.com)

How to get in touch with us

You can get in touch with us in *many* ways: by phone (123456789),
by email (dream_dog@163.com) or by visiting our customer feedback
page (wwww.baidu.com).
```

The first step is to split the file into paragraphs (blocks).

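A simple way to find the blocks is to collect every line you encounter until you reach a blank line, then return the lines collected so far. Those lines form one block. After that, start collecting again. There is no need to collect the blank lines, and empty blocks should never be returned. Also make sure the last line of the file is blank; otherwise the program never sees the end of the final block and drops it.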
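To see why the trailing blank line matters, here is a minimal sketch of the collect-until-blank-line idea *without* any sentinel. The function name `naive_blocks` is hypothetical and used only for this illustration. Because no blank line follows the final block, that block is collected but never returned:

```python
import io

def naive_blocks(file):
    """Hypothetical splitter: collect lines until a blank line, no sentinel."""
    block = []
    for line in file:
        if line.strip():            # non-blank line: keep collecting
            block.append(line)
        elif block:                 # blank line ends a non-empty block
            yield ' '.join(block).strip()
            block = []
    # The loop ends here without a final blank line, so whatever is
    # still in `block` is silently discarded.

text = "First block\n\nSecond block\nstill second\n"
result = list(naive_blocks(io.StringIO(text)))
print(result)  # ['First block']
```

The second block is lost. Appending one extra blank line at the end of the input solves this.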


Writing a block generator (saved as util.py, since the markup script imports it with `from util import *`):

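```python
def lines(file):
    # Yield every line of the file, then one extra blank line so that
    # the last block is always terminated
    for line in file:
        yield line
    yield '\n'

def blocks(file):
    # Collect non-blank lines; a blank line ends the current block
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            # Join the collected lines into one string and emit it
            yield ' '.join(block).strip()
            block = []
```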

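In this code, the lines generator simply appends one blank line to the end of the file, and the blocks generator implements the method described above.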
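A quick way to check the generators without creating a file is to wrap a string in `io.StringIO`, which can be iterated line by line just like an open file. The sketch below repeats the two generators so it runs on its own. The sample string is a shortened excerpt of the test text, and it deliberately has no trailing blank line:

```python
import io

def lines(file):
    """Yield every line, then one extra blank line as a sentinel."""
    for line in file:
        yield line
    yield '\n'

def blocks(file):
    """Yield blocks of text separated by blank lines."""
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ' '.join(block).strip()
            block = []

sample = (
    "Welcome to World Wide Spam, Inc.\n"
    "\n"
    "A short history of the company\n"
    "\n"
    "World Wide Spam was started\n"
    "in the summer of 2000.\n"
)
result = list(blocks(io.StringIO(sample)))
for b in result:
    print(repr(b))
# 'Welcome to World Wide Spam, Inc.'
# 'A short history of the company'
# 'World Wide Spam was started\n in the summer of 2000.'
```

The last block still comes out because of the sentinel added by `lines`. Inside a multi-line block, the line break is kept and is followed by the space from `' '.join`. A browser treats both as ordinary whitespace, so this does not change the rendered HTML.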


Adding some markup

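```python
import sys, re          # comma, not a period: two separate modules
from util import *      # lines() and blocks() from util.py

print('<html><head><title>...</title></head><body>')

title = True
for block in blocks(sys.stdin):
    # Turn *text* into <em>text</em>
    block = re.sub(r'\*(.+?)\*', r'<em>\1</em>', block)
    if title:
        # The first block is treated as the page heading
        print('<h1>')
        print(block)
        print('</h1>')
        title = False   # Python's boolean constant is False, not false
    else:
        print('<p>')
        print(block)
        print('</p>')

print('</body></html>')
```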
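The emphasis rule relies on the non-greedy quantifier `+?`. The short sketch below applies the same substitution to two sentences from the test text, then compares it with a greedy `.+` pattern. The greedy version is shown only for contrast and is not part of the script:

```python
import re

text = ("These are the corporate web pages of *World Wide Spam*, Inc. "
        "You can get in touch with us in *many* ways.")

# Non-greedy: .+? stops at the first closing asterisk
lazy = re.sub(r'\*(.+?)\*', r'<em>\1</em>', text)
# Greedy: .+ runs on to the last asterisk in the string
greedy = re.sub(r'\*(.+)\*', r'<em>\1</em>', text)

print(lazy)
# These are the corporate web pages of <em>World Wide Spam</em>, Inc. You can get in touch with us in <em>many</em> ways.
print(greedy)
# These are the corporate web pages of <em>World Wide Spam*, Inc. You can get in touch with us in *many</em> ways.
```

With the greedy pattern, everything from the first asterisk to the last one is wrapped in a single `<em>` element. That is why the script uses `.+?`.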



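Running this raised an error saying that my `sys` is not a package. The cause is the first line. I had typed `import sys. re` with a period instead of a comma, and Python reads that as `import sys.re`, meaning a submodule `re` inside a package named `sys`. Because `sys` is an ordinary module and not a package, the import fails (in Python 3.6+ with `ModuleNotFoundError: No module named 'sys.re'; 'sys' is not a package`). Writing `import sys, re` imports the two modules separately. The script also contained `title = false`, which would fail next with a `NameError`, because Python's boolean constant is `False`.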
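The sketch below reproduces the lookup behind that error and assumes Python 3.6 or later, where `ModuleNotFoundError` exists. It calls `importlib.import_module('sys.re')` instead of writing a literal `import sys.re`, so the failure can be caught and inspected rather than stopping the program:

```python
import importlib

# `import sys. re` is parsed as `import sys.re`: Python looks for a
# submodule `re` inside a package called `sys`.
message = None
try:
    importlib.import_module('sys.re')
except ModuleNotFoundError as exc:
    message = str(exc)
print(message)
# No module named 'sys.re'; 'sys' is not a package

# The intended statement imports two separate top-level modules.
import sys, re
# Packages have a __path__ attribute; the plain module sys does not.
print(hasattr(sys, '__path__'))  # False
```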



