Head First Python (Comprehending Data)


Coach Kelly needs your help

Read the data from each file into its own list: write a program that processes each file, builds a separate list for each athlete's times, and displays them on screen.

with open('james.txt') as jaf:
    data = jaf.readline()
james = data.strip().split(',')

with open('julie.txt') as juf:
    data = juf.readline()
julie = data.strip().split(',')

with open('mikey.txt') as mif:
    data = mif.readline()
mikey = data.strip().split(',')

with open('sarah.txt') as saf:
    data = saf.readline()
sarah = data.strip().split(',')

print(james)
print(julie)
print(mikey)
print(sarah)
There are two ways to sort

In-place sorting arranges the data in the order you specify, then replaces the original data with the sorted result; the original ordering is lost. For lists, the sort() method provides in-place sorting.

Copied sorting arranges the data in the order you specify, then returns a sorted copy of the original data. The original ordering is preserved; only the copy is sorted. The sorted() BIF supports copied sorting.
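To see the difference, here is a minimal sketch using a short, made-up list of times (not the coach's actual data):

times = ['2.34', '2.01', '2.22']   # made-up sample times

print(sorted(times))   # ['2.01', '2.22', '2.34'] -- a sorted copy is returned
print(times)           # ['2.34', '2.01', '2.22'] -- the original order is untouched

times.sort()           # sorts in place and returns None
print(times)           # ['2.01', '2.22', '2.34'] -- the original data is now sorted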

print(sorted(james))
print(sorted(julie))
print(sorted(mikey))
print(sorted(sarah))
Trouble with times

Python can sort strings. When it does, a hyphen sorts before a period, and a period sorts before a colon. All of these strings start with 2, so the next character in each string effectively acts as the grouping mechanism.

The inconsistent formats in the data cause the sort to come out wrong.

2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55
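A quick interpreter session with a few of the values above shows that character ordering in action:

>>> sorted(['2:58', '2.58', '2-25'])
['2-25', '2.58', '2:58']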

Write a sanitize() function that processes a time string, replacing any hyphen or colon with a period.

def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs
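A quick check in the interpreter confirms that all three input formats come out as a uniform "mins.secs" string:

>>> sanitize('2-25')
'2.25'
>>> sanitize('2:54')
'2.54'
>>> sanitize('2.18')
'2.18'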
Write the code to convert the existing data into sanitized versions.

clean_james = []
clean_julie = []
clean_mikey = []
clean_sarah = []

for each_t in james:
    clean_james.append(sanitize(each_t))
for each_t in julie:
    clean_julie.append(sanitize(each_t))
for each_t in mikey:
    clean_mikey.append(sanitize(each_t))
for each_t in sarah:
    clean_sarah.append(sanitize(each_t))

print(sorted(clean_james))
print(sorted(clean_julie))
print(sorted(clean_mikey))
print(sorted(clean_sarah))
By default, both the sort() method and the sorted() BIF sort data in ascending order. To sort in descending order, pass the argument reverse=True to sort() or sorted().
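For example, with a small made-up list of times:

>>> times = ['2.01', '2.22', '2.34']
>>> sorted(times, reverse=True)
['2.34', '2.22', '2.01']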

List comprehensions

Transforming one list into another involves four things:

  1. Create a new list to hold the transformed data.
  2. Iterate over each data item in the original list.
  3. Perform the transformation on each iteration.
  4. Append the transformed data to the new list.
clean_james = []
for each_t in james:
    clean_james.append(sanitize(each_t))
Reduced to a single line of code

clean_james=[sanitize(each_t) for each_t in james]
The new list comprehension code

print(sorted([sanitize(t) for t in james]))
print(sorted([sanitize(t) for t in julie]))
print(sorted([sanitize(t) for t in mikey]))
print(sorted([sanitize(t) for t in sarah]))
Accessing the first three items of a list

james[0:3]  # Use a list slice to access the items from index 0 up to (but not including) index 3.
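Slicing works on any list, not just the athletes' times; a small illustration:

>>> letters = ['a', 'b', 'c', 'd', 'e']
>>> letters[0:3]
['a', 'b', 'c']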
Removing duplicates with iteration

unique_james = []   # this code would need to be repeated four times, once per athlete
for each_t in james:
    if each_t not in unique_james:
        unique_james.append(each_t)
print(unique_james[0:3])
Removing duplicates with sets
print(sorted(set([sanitize(t) for t in james]))[0:3])
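The set() factory function silently discards duplicates as it builds the set, which is what makes this one-liner work; a small illustration with made-up times (sorted() turns the set back into an ordered list):

>>> sorted(set(['2.22', '2.34', '2.22', '2.01']))
['2.01', '2.22', '2.34']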
The improved code
import os

os.chdir('D:\\Python33\\HeadFirstPython\\chapter5')

def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

with open('james.txt') as jaf:
    data = jaf.readline()
james = data.strip().split(',')

with open('julie.txt') as juf:
    data = juf.readline()
julie = data.strip().split(',')

with open('mikey.txt') as mif:
    data = mif.readline()
mikey = data.strip().split(',')

with open('sarah.txt') as saf:
    data = saf.readline()
sarah = data.strip().split(',')

print(sorted(set([sanitize(t) for t in james]))[0:3])
print(sorted(set([sanitize(t) for t in julie]))[0:3])
print(sorted(set([sanitize(t) for t in mikey]))[0:3])
print(sorted(set([sanitize(t) for t in sarah]))[0:3])
There is some repeated file-handling code here; try extracting it into a small function, then call that function for each athlete's data file.

import os

os.chdir('D:\\Python33\\HeadFirstPython\\chapter5')

def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

def get_coach_data(filename):
    try:
        with open(filename) as f:
            data = f.readline()
            return data.strip().split(',')
    except IOError as err:
        print('File error: ' + str(err))
        return None

james = get_coach_data('james.txt')
julie = get_coach_data('julie.txt')
mikey = get_coach_data('mikey.txt')
sarah = get_coach_data('sarah.txt')

print(sorted(set([sanitize(t) for t in james]))[0:3])
print(sorted(set([sanitize(t) for t in julie]))[0:3])
print(sorted(set([sanitize(t) for t in mikey]))[0:3])
print(sorted(set([sanitize(t) for t in sarah]))[0:3])
Output:

>>>
['2.01', '2.22', '2.34']
['2.11', '2.23', '2.59']
['2.22', '2.38', '2.49']
['2.18', '2.25', '2.39']













