Dataquest Data Scientist Path: Study Notes (1)

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 13:46

Notes on the key points covered while studying the Data Scientist path on Dataquest.

Step 1: Introduction to Python

Opening a file

f = open("crime_rates.csv", "r")
g = f.read()

Splitting a string into a list

sample = "john,plastic,joe"
split_list = sample.split(",")

Replacing characters in a string

text = "Howdy,my,name"
text = text.replace(",", "")

Function

def clean_text(string_value, clean=False):
    # clean defaults to False; pass True when calling to strip the commas.
    cleaned_value = string_value
    if clean:
        cleaned_value = string_value.replace(",", "")
    return cleaned_value

sentence = "Howdy,james,bond!"
sentence = clean_text(sentence, True)

Converting a string to lowercase

words = "Michael JACKSON Thriller"
lower_words = words.lower()

Module
Use csv.reader from the csv module to read a CSV file.

import csv

f = open("my_data.csv")
csvreader = csv.reader(f)
my_data = list(csvreader)
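
The same read can also be written with a with block, which closes the file automatically. The following is only a sketch: the sample rows and the temporary file path are made up for illustration so that the snippet runs on its own.

```python
import csv
import os
import tempfile

# Hypothetical sample data, written to a temporary file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "my_data.csv")
with open(path, "w", newline="") as f:
    f.write("city,crime_rate\nAlbuquerque,749\nAnaheim,371\n")

# The with block closes the file when it ends, even if an error occurs inside it.
with open(path, newline="") as f:
    my_data = list(csv.reader(f))

header, rows = my_data[0], my_data[1:]
print(header)  # ['city', 'crime_rate']
print(rows)    # [['Albuquerque', '749'], ['Anaheim', '371']]
```

Note that csv.reader returns every value as a string, so numeric columns still need converting with int() or float().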

Class

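class Car():
    def __init__(self, name):
        self.name = name
        self.color = "black"
        self.make = "honda"
        self.model = "accord"

    def print_name(self):
        print(self.name)

# print_name is an instance method, so create a Car first and call it on that instance.
accord = Car("accord")
accord.print_name()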

Set
Removing duplicates

unique_animals = set(["Dog", "Cat", "Hippo", "Dog", "Cat", "Dog", "Dog", "Cat"])
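
As a small illustration of what the resulting set supports, the sketch below counts the unique values, tests membership, and sorts the set, since a set has no fixed order. The variable names other than unique_animals are chosen here for illustration.

```python
animals = ["Dog", "Cat", "Hippo", "Dog", "Cat", "Dog", "Dog", "Cat"]
unique_animals = set(animals)

unique_count = len(unique_animals)      # number of distinct values
has_hippo = "Hippo" in unique_animals   # membership test

# Sets are unordered, so sort them when a stable order is needed.
sorted_animals = sorted(unique_animals)

print(unique_count)    # 3
print(has_hippo)       # True
print(sorted_animals)  # ['Cat', 'Dog', 'Hippo']
```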

Try/Except Blocks

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in numbers:
    try:
        int('')
    except Exception:
        # If you do not want to print anything, use pass here to ignore the error.
        print("There was an error")
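
In practice it is often safer to catch only the exception you expect. The sketch below, using made-up input values, catches the ValueError that int() raises for text it cannot parse and stores None for those entries.

```python
# Hypothetical raw values; "" and "four" cannot be parsed as integers.
raw_values = ["1", "2", "", "four", "5"]

converted = []
for value in raw_values:
    try:
        converted.append(int(value))
    except ValueError:
        # int() raises ValueError for text that is not a valid integer.
        converted.append(None)

print(converted)  # [1, 2, None, None, 5]
```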

enumerate()
Append each element of door_count to the matching row of cars, that is, add a column to a list of lists.

door_count = [4, 4]
cars = [
    ["black", "honda", "accord"],
    ["red", "toyota", "corolla"]
]
for i, car in enumerate(cars):
    car.append(door_count[i])

List Comprehensions

animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
animal_lengths = [len(animal) for animal in animals]

items()

fruits = {"apple": 2, "orange": 5, "melon": 10}
for fruit, rating in fruits.items():
    print(rating)

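Regular Expressions
".": wildcard; matches any single character (except a newline by default).
"^": matches at the start of the string, e.g. "^abc" matches strings that begin with "abc".
"$": matches at the end of the string, e.g. "abc$" matches strings that end with "abc".
"|": alternation; matches either the pattern before it or the pattern after it, e.g. "cat|dog" matches any string containing "cat" or "dog", such as "catalog" or "hotdog".
"[]": matches any one of the characters inside the brackets, e.g. "[bcr]at" matches "bat", "cat", and "rat".
"[-]": a range inside brackets; "[0-9]" matches any single digit from 0 to 9, and "[h-y]" matches any single letter from h to y.
"{}": repetition; "{4}" means the preceding item repeats exactly 4 times, so "[0-9]{4}" can match a year.
"\": escape character, e.g. "\." matches the literal "." character.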

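import re

# regex, string and years_string below are placeholders for your own values.
# Search string for the pattern regex; returns a match object if found, otherwise None.
re.search(regex, string)
# Replace "yo" in "yo world" with "hello"; useful for normalizing strings.
re.sub("yo", "hello", "yo world")
# Find every substring from "1000" to "2999" in years_string.
re.findall("[1-2][0-9]{3}", years_string)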
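
The special characters listed above can be tried out with a short, runnable sketch. The sample strings are made up for illustration.

```python
import re

# Anchors: ^ matches at the start of the string, $ at the end.
starts_with_abc = re.search("^abc", "abcdef") is not None
ends_with_abc = re.search("abc$", "xyzabc") is not None

# Alternation: "cat|dog" matches wherever either "cat" or "dog" appears.
has_pet = re.search("cat|dog", "hotdog") is not None
no_pet = re.search("cat|dog", "bird") is not None

# Character set: exactly one of b, c, or r, followed by "at".
rhymes = re.findall("[bcr]at", "bat cat rat hat")

# Escaping: r"\." matches a literal period instead of any character.
periods = re.findall(r"\.", "a.b.c")

# Range plus repetition: a 1 or 2 followed by three more digits.
years_string = "1999, 2015 and 3010"
years = re.findall("[1-2][0-9]{3}", years_string)

print(starts_with_abc, ends_with_abc, has_pet, no_pet)  # True True True False
print(rhymes)   # ['bat', 'cat', 'rat']
print(periods)  # ['.', '.']
print(years)    # ['1999', '2015']
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting the backslash before the regex engine sees it.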

time.time()
Unix timestamp: the number of seconds that have elapsed since the epoch (00:00:00 UTC on January 1, 1970).

import time

current_time = time.time()  # get the current Unix timestamp

time.gmtime()
Converts a timestamp into a more readable time structure, expressed in UTC. Some of its attributes are:
tm_year: The year of the timestamp
tm_mon: The month of the timestamp (1-12)
tm_mday: The day in the month of the timestamp (1-31)
tm_hour: The hour of the timestamp (0-23)
tm_min: The minute of the timestamp (0-59)
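
A minimal sketch of reading these attributes. It uses a fixed timestamp rather than time.time() so that the printed values are reproducible.

```python
import time

# A fixed Unix timestamp, so the result does not depend on when the code runs.
timestamp = 1433213314.0

# gmtime() converts the timestamp to a struct_time expressed in UTC.
utc_time = time.gmtime(timestamp)

print(utc_time.tm_year)  # 2015
print(utc_time.tm_mon)   # 6
print(utc_time.tm_mday)  # 2
print(utc_time.tm_hour)  # 2
print(utc_time.tm_min)   # 48
```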
UTC
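Coordinated Universal Time, commonly equated with Greenwich Mean Time. It can be represented with datetime objects, which have the following attributes: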
year
month
day
hour
minute
second
microsecond

import datetime

# Note: now() returns local time; pass datetime.timezone.utc to now() to get UTC instead.
current_datetime = datetime.datetime.now()
current_year = current_datetime.year
current_month = current_datetime.month

timedelta
A class in the datetime module that represents a span of time. It accepts the following parameters:
weeks
days
hours
minutes
seconds
milliseconds
microseconds

diff = datetime.timedelta(weeks = 3, days = 2)
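
A timedelta becomes useful once it is added to or subtracted from a datetime. The sketch below assumes an arbitrary start date chosen for illustration.

```python
import datetime

# Arbitrary start date, chosen only for illustration.
start = datetime.datetime(year=2010, month=3, day=3)
diff = datetime.timedelta(weeks=3, days=2)

later = start + diff    # move forward by the span
earlier = start - diff  # move backward by the span

print(diff.days)  # 23 (weeks are stored as days)
print(later)      # 2010-03-26 00:00:00
print(earlier)    # 2010-02-08 00:00:00

# Subtracting two datetimes gives back a timedelta.
print(later - start == diff)  # True
```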

datetime.strftime()
A method of datetime objects that formats a time as a readable string in the layout you choose.

import datetime

march3 = datetime.datetime(year=2010, month=3, day=3)
pretty_march3 = march3.strftime("%b %d, %Y")

datetime.datetime.strptime()
A function that parses a string representing a time into a datetime instance.

march3 = datetime.datetime.strptime("Mar 03, 2010", "%b %d, %Y")
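
strptime() and strftime() are inverses when given the same format string. A small sketch of the round trip follows; it assumes the default English locale, because %b (abbreviated month name) is locale-dependent.

```python
import datetime

text = "Mar 03, 2010"
fmt = "%b %d, %Y"  # abbreviated month, zero-padded day, four-digit year

# Parse the string into a datetime, then format it in two different layouts.
parsed = datetime.datetime.strptime(text, fmt)
iso_style = parsed.strftime("%Y-%m-%d")
round_trip = parsed.strftime(fmt)

print(parsed.year, parsed.month, parsed.day)  # 2010 3 3
print(iso_style)                              # 2010-03-03
print(round_trip == text)                     # True
```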

datetime.datetime.fromtimestamp()
A function that converts a Unix timestamp into a datetime object.

datetime_object = datetime.datetime.fromtimestamp(1433213314.0)
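
Called with only a timestamp, fromtimestamp() returns the time in the machine's local timezone, so the result varies between machines. A sketch that passes the optional tz argument to get a reproducible UTC result:

```python
import datetime

timestamp = 1433213314.0

# With tz given, the result is a timezone-aware datetime in that zone (here UTC).
utc_datetime = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)

print(utc_datetime.isoformat())  # 2015-06-02T02:48:34+00:00

# timestamp() converts the datetime back to the original Unix timestamp.
print(utc_datetime.timestamp() == timestamp)  # True
```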
