Python Basics: Data, Part 1

Source: Internet | Published by: em253编程实例 | Editor: 程序博客网 | Time: 2024/06/06 21:40

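Preface: when you learn a language, the most important thing is its data structures, no question about it. PHP, Java and Python look much the same on the surface; what differs is their spirit and way of thinking. I will skip the most basic material and go straight to the practical notes.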

Environment:
Python version: Python 2.7
Operating system: Windows 7 (64-bit) on the host machine, Ubuntu 16.04 in a virtual machine

I. Iteration

0. Checking whether an object is iterable

from collections import Iterable
isinstance('...', Iterable)   # True: a string is iterable
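To make the check concrete, here is a minimal sketch that applies it to a few built-in types. The helper name `is_iterable` is made up for illustration. The post targets Python 2.7; the `try`/`except` import is an addition so that the same snippet also runs on Python 3, where `Iterable` lives in `collections.abc`.

```python
# Iterable lives in collections.abc on Python 3; fall back for Python 2.7.
try:
    from collections.abc import Iterable
except ImportError:
    from collections import Iterable


def is_iterable(obj):
    """Return True if obj is recognised as iterable (hypothetical helper)."""
    return isinstance(obj, Iterable)


# Strings, lists and dicts are iterable; a plain integer is not.
checks = {
    'str': is_iterable('abc'),
    'list': is_iterable([1, 2, 3]),
    'dict': is_iterable({'a': 1}),
    'int': is_iterable(123),
}
```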

1. Iterating over a list:

# -*- coding:utf-8 -*-
shoplist = ['foods', 'games', 'video', 'music']
for x in shoplist:
    print x

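Readers coming from Java or JavaScript may be a little thrown here: in those languages, iterating over an array usually means looping an index from 0 up to the last position. Python's for loop is more concise and more abstract. Sometimes, though, you need the index while iterating, and for that Python provides enumerate: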

# -*- coding:utf-8 -*-
shoplist = ['foods', 'games', 'video', 'music']
for i, x in enumerate(shoplist):
    print i, x
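As a small extension of the enumerate idea, the sketch below uses its optional second argument to start counting at 1 instead of 0. The variable names `numbered` and `labels` are illustrative only, and the snippet avoids print so that it runs unchanged on Python 2.7 and Python 3.

```python
shoplist = ['foods', 'games', 'video', 'music']

# enumerate yields (index, item) pairs; the optional second argument
# sets the first index (the default is 0).
numbered = list(enumerate(shoplist, 1))

# Build "1. foods"-style labels from the pairs.
labels = ['%d. %s' % (i, x) for i, x in numbered]
```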

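2. Iterating over a tuple: the same as for a list; the difference is that a tuple's elements are fixed (it is immutable).
3. Iterating over a dict: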

# -*- coding:utf-8 -*-
shoplist = {'foods':10, 'games':5, 'video':6, 'music':2}
for x in shoplist:
    print x

Output:
video
foods
games
music

If you iterate over a dict the way you iterate over a list, the loop only visits the keys and the values are ignored.
(1) To get every key-value pair, use items() or iteritems():

# -*- coding:utf-8 -*-
shoplist = {'foods':10, 'games':5, 'video':6, 'music':2}
for x in shoplist.iteritems():
    print x

Output:
('video', 6)
('foods', 10)
('games', 5)
('music', 2)

(2) To get only the values, use itervalues():

# -*- coding:utf-8 -*-
shoplist = {'foods':10, 'games':5, 'video':6, 'music':2}
for x in shoplist.itervalues():
    print x

Output:
6
10
5
2
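The order of the output above is arbitrary, because a Python 2.7 dict does not keep its keys in any particular order. When a predictable order matters, one option is to sort the pairs first. The sketch below is an illustration, not part of the original notes; it uses items(), which exists in both Python 2 and Python 3 (iteritems() and itervalues() are Python 2 only).

```python
shoplist = {'foods': 10, 'games': 5, 'video': 6, 'music': 2}

# sorted() on the (key, value) pairs orders them by key.
pairs = sorted(shoplist.items())

# Unpack each (key, value) tuple directly in the loop header.
lines = []
for name, count in pairs:
    lines.append('%s=%d' % (name, count))

# values() gives just the values, e.g. for a total.
total = sum(shoplist.values())
```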

II. Generators

1. List comprehensions: essentially a shorthand for an iteration

[k for k, v in shoplist.iteritems() if v>5]

The general form is [expression for item in iterable if condition]; the expression may call any function.

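2. Generators: replace the '[' and ']' with '(' and ')'. A generator stores the algorithm for producing the items rather than the list itself, which saves memory: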

generater = (k for k, v in shoplist.iteritems() if v>5)
generater.next()   # Python 2 method; the built-in next(generater) also works
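To show what "stores the algorithm" means in practice, here is a minimal sketch that pulls items out of a generator one at a time. It is an illustration under a few assumptions: the pairs are sorted so the result is predictable, and the built-in next() replaces the Python 2 only .next() method so the snippet runs on both Python 2.7 and Python 3.

```python
shoplist = {'foods': 10, 'games': 5, 'video': 6, 'music': 2}

# The filter is not applied yet; items are produced one at a time on demand.
gen = (k for k, v in sorted(shoplist.items()) if v > 5)

first = next(gen)    # built-in next() works on Python 2.6+ and Python 3
second = next(gen)

# A generator can be consumed only once. When it is exhausted, next() raises
# StopIteration unless a default is passed as the second argument.
third = next(gen, None)
```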

III. Common keys (dict)

To find the keys that several similar dicts have in common:

# -*- coding:utf-8 -*-
# Find the keys common to several dicts
from random import randint, sample

totalkey = []
first  = {x:randint(1,4) for x in sample('abcdefgh', randint(4,8))}
second = {x:randint(1,4) for x in sample('abcdefgh', randint(4,8))}
third  = {x:randint(1,4) for x in sample('abcdefgh', randint(4,8))}
print first
print second
print third

# Method 1: iterate directly
for x in first:
    if x in second and x in third:
        totalkey.append(x)
print 'Method 1:', totalkey

# Method 2: use viewkeys() and take the intersection of the keys
# viewkeys() returns a set-like view. A set resembles a dict but differs from it:
# 1. it stores only keys; 2. keys are unique; 3. it supports set operations
print 'Method 2:', first.viewkeys() & second.viewkeys() & third.viewkeys()

# Method 3: use map, reduce and an anonymous lambda function
# map(function, list): like map in JavaScript, the function is a callback applied to each item;
# reduce(function, list): unlike map, it combines the first two items,
# then combines that result with the third, and so on to the end
print 'Method 3:', reduce(lambda x, y: x & y, map(dict.viewkeys, [first, second, third]))

Output:
{'a': 4, 'c': 4, 'e': 3, 'd': 2, 'g': 3, 'f': 3, 'h': 1}
{'a': 4, 'c': 3, 'b': 1, 'e': 3, 'd': 4, 'g': 4, 'f': 1, 'h': 1}
{'b': 4, 'e': 2, 'd': 1, 'f': 2}
Method 1: ['e', 'd', 'f']
Method 2: set(['e', 'd', 'f'])
Method 3: set(['e', 'd', 'f'])
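The reduce-based approach generalises to any number of dicts. The sketch below wraps it in a helper; the name `common_keys` is made up for illustration. It uses set(d) instead of viewkeys() (which exists only in Python 2.7) and imports reduce from functools, so it should run on both Python 2.7 and Python 3. The three dicts are the ones printed in the sample output above.

```python
from functools import reduce  # a built-in on Python 2, imported on Python 3


def common_keys(*dicts):
    """Return the set of keys present in every dict (hypothetical helper)."""
    if not dicts:
        return set()
    # set(d) collects the keys of d; & is set intersection.
    return reduce(lambda x, y: x & y, (set(d) for d in dicts))


first = {'a': 4, 'c': 4, 'e': 3, 'd': 2, 'g': 3, 'f': 3, 'h': 1}
second = {'a': 4, 'c': 3, 'b': 1, 'e': 3, 'd': 4, 'g': 4, 'f': 1, 'h': 1}
third = {'b': 4, 'e': 2, 'd': 1, 'f': 2}

shared = common_keys(first, second, third)
```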

IV. Filtering and sorting lists and dicts

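In everyday work we often need to filter out part of a collection according to some condition. The approaches to keep in mind are:

1. direct iteration;
2. filtering with a comprehension;
3. built-in functions.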

# -*- coding:utf-8 -*-
# Filtering lists and dicts
from random import randint

numtotal = []
numlist = [x for x in xrange(-5,10)]
print 'Original list:', numlist

# Method 1: iterate over the list directly (filter out the positive numbers)
for x in numlist:
    if x <= 0:
        numtotal.append(x)
print 'Method 1:', numtotal

# Method 2: use filter on the list (filter out the negative numbers)
print 'Method 2:', filter(lambda s:s>=0, numlist)

# Method 3: filter a dict with a dict comprehension
gards = {x:randint(80,100) for x in xrange(0,10)}
print 'Method 3:', {x:y for x,y in gards.iteritems() if y>=92}

# Method 4: filter a set with a set comprehension
setnum = set([x for x in xrange(1,10)])
print 'Method 4:', {x for x in setnum if x%3 == 0}

Output:
Original list: [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Method 1: [-5, -4, -3, -2, -1, 0]
Method 2: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Method 3: {0: 99, 9: 100, 2: 95}
Method 4: set([9, 3, 6])
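For comparison, here is a minimal sketch of the same filters written so that they behave identically on Python 2.7 and Python 3. The scores in `grades` are made-up fixed values (instead of random ones) so the result is reproducible.

```python
numlist = list(range(-5, 10))

# A list comprehension returns a list on both Python 2 and Python 3.
non_negative = [x for x in numlist if x >= 0]

# filter() returns a list on Python 2 but a lazy iterator on Python 3,
# so wrap it in list() when a list is needed.
non_positive = list(filter(lambda s: s <= 0, numlist))

# Dict comprehension over fixed, made-up scores.
grades = {'amy': 95, 'bob': 81, 'cat': 92, 'dan': 88}
top = {name: score for name, score in grades.items() if score >= 92}
```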

V. The art of naming

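When working with a list or tuple we inevitably end up addressing elements by index: 0, 1, 2, 3 and so on. Suppose we record a group of data such as:
tuplecon = ('linda', 'female', 'America', 37)
The fields carry meaning, yet in the program we have to write tuplecon[0], which is rather unfriendly to the reader.
In C, Java or PHP we could define a constant to give each index a meaningful name. In Python the options are: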

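# Method 1: define the index constants directly (indexes start at 0)
NAME = 0
SEX = 1
COUNTRY = 2
AGE = 3

# Method 2: enumeration by unpacking
NAME, SEX, COUNTRY, AGE = xrange(0, 4)

# Method 3: use a class-like object; namedtuple creates the class,
# and name, sex, country and age are its attributes
from collections import namedtuple
People = namedtuple('People', ['name', 'sex', 'country', 'age'])
lixiaolong = People('lixiaolong', 'male', 'America', 32)
print lixiaolong.age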
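A few more namedtuple operations may help show why it is convenient. The sketch below is an illustration only; it assumes the field order name, sex, country, age so that it matches the sample tuple, and it uses the documented `_make`, `_replace` and `_asdict` methods.

```python
from collections import namedtuple

People = namedtuple('People', ['name', 'sex', 'country', 'age'])

# _make builds an instance from an existing plain tuple.
tuplecon = ('linda', 'female', 'America', 37)
linda = People._make(tuplecon)

# The record still behaves like a tuple, so unpacking and indexing work.
name, sex, country, age = linda

# namedtuples are immutable; _replace returns a modified copy.
older = linda._replace(age=38)

# _asdict converts the record to a mapping keyed by field name.
as_dict = dict(linda._asdict())
```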

VI. Counting and sorting

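Suppose you are writing a crawler and scrape a report from The Wall Street Journal. You want to count how often words such as "entrepreneurship" and "economy" appear. How would you do it?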

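First, of course, fetch the whole article; then split it with the regular expression [\W]+; finally put the words into a dict and count them, or store them in MySQL or Access and run the statistics from there.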
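The split-and-count steps just described can be sketched as follows. The sample `text` is made up and stands in for the scraped article; it is English because splitting on \W+ relies on spaces and punctuation between words.

```python
import re

# Made-up sample text standing in for the scraped article.
text = "Startups drive the economy. The economy rewards startups, and startups grow."

# Split on runs of non-word characters; drop the empty string left at the end.
words = [w.lower() for w in re.split(r'\W+', text) if w]

# Count occurrences with a plain dict.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
```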

1. Counting with a dict, which is the approach recorded here:

# -*- coding:utf-8 -*-
# Counting with a list and a dict
from random import randint

# Build the list
listnum = [randint(0,10) for _ in xrange(40)]

# Build the dict (two equivalent ways)
dictnum = dict.fromkeys(listnum,0)
dictnum = {x:0 for x in listnum}

# Add 1 for each occurrence
for x in listnum:
    dictnum[x] += 1
print dictnum

Output:
{0: 5, 1: 3, 2: 3, 3: 1, 4: 2, 5: 5, 6: 3, 7: 3, 8: 3, 9: 6, 10: 6}

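2. Sorting with collections.Counter(): ordered from the highest count to the lowest.
There are of course many sorting algorithms, such as bubble sort and quicksort; here we let Python's standard library do the sorting: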

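# -*- coding:utf-8 -*-
# Counting with a list and a dict
from random import randint
from collections import Counter

# Build the list
s = [randint(0,10) for _ in xrange(40)]

# Build the dict
dictnum = {x:0 for x in s}

# Add 1 for each occurrence
for x in s:
    dictnum[x] += 1

# most_common() returns (item, count) pairs, highest count first
print Counter(dictnum).most_common()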
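Counter can also do the counting itself, which removes the manual loop. The sketch below uses a small fixed list (made up for illustration) so the result is reproducible.

```python
from collections import Counter

# Fixed sample data so the result is reproducible.
s = [3, 7, 7, 1, 3, 7, 9, 3, 7]

# Counter counts hashable items directly; no manual loop is needed.
counter = Counter(s)

# most_common(n) returns the n most frequent (item, count) pairs,
# highest count first.
top_two = counter.most_common(2)
```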

3. Sorting with sorted: first use zip to turn the dict into a list of (value, key) tuples

# -*- coding:utf-8 -*-
# Counting with a list and a dict
from random import randint

# Build the list
s = [randint(0,10) for _ in xrange(40)]

# Build the dict
dictnum = {x:0 for x in s}

# Add 1 for each occurrence
for x in s:
    dictnum[x] += 1
print dictnum.keys()
print dictnum.values()

# Put the count first in each tuple so that sorted() orders by count
print sorted(zip(dictnum.values(),dictnum.keys()))

Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[4, 1, 4, 5, 3, 3, 3, 6, 3, 4, 4]
[(1, 1), (3, 4), (3, 5), (3, 6), (3, 8), (4, 0), (4, 2), (4, 9), (4, 10), (5, 3), (6, 7)]
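An alternative to zipping values and keys is to pass a key function to sorted, which keeps each pair in (key, count) order. This is a sketch with made-up fixed counts, not the method used above; `by_count` and `ordered` are illustrative names.

```python
from operator import itemgetter

# Fixed, made-up counts: key -> number of occurrences.
dictnum = {0: 4, 1: 1, 2: 4, 3: 5, 7: 6}

# Sort (key, count) pairs by count, largest first; itemgetter(1) picks the count.
by_count = sorted(dictnum.items(), key=itemgetter(1), reverse=True)

# Break ties explicitly: count descending, then key ascending.
ordered = sorted(dictnum.items(), key=lambda kv: (-kv[1], kv[0]))
```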
