Dataquest Data Scientist Path: Study Notes (1)

Notes summarizing the key points from the Data Scientist path on Dataquest.

Step 1: Introduction to Python

  • Opening a file
f = open("crime_rates.csv", "r")
g = f.read()  # read the entire file into a single string
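The same read can also be written with a with block, which closes the file automatically when the block ends. A minimal sketch follows; it writes its own small sample file into a temporary directory first, and the two rows in it are invented for illustration, not taken from the real crime_rates.csv.

```python
import os
import tempfile

# Create a small sample file so the sketch is self-contained.
# The rows are invented for illustration only.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "crime_rates.csv")
with open(path, "w") as f:
    f.write("Albuquerque,749\nAnaheim,371\n")

# The file is closed automatically when the with block ends.
with open(path, "r") as f:
    data = f.read()

rows = data.strip().split("\n")            # one string per line
table = [row.split(",") for row in rows]   # split each line into fields
print(table)
```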
  • Splitting a string into a list
sample = "john,plastic,joe"
split_list = sample.split(",")
  • Replacing characters in a string
text = "Howdy,my,name"
text = text.replace(",", "")
  • Function
# clean defaults to False; pass True when calling to enable the cleaning step.
def clean_text(string_value, clean=False):
    cleaned_value = string_value  # returned unchanged when clean is False
    if clean:
        cleaned_value = string_value.replace(",", "")
    return cleaned_value

sentence = "Howdy,james,bond!"
sentence = clean_text(sentence, True)
  • Converting a string to lowercase
words = "Michael JACKSON Thriller"
lower_words = words.lower()
  • Module
    Use reader from the csv module to read a CSV file.
import csv
f = open("my_data.csv")
csvreader = csv.reader(f)
my_data = list(csvreader)
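One reason to prefer csv.reader over a plain split(",") is that it handles quoted fields containing commas. The sketch below shows this on an in-memory stand-in for a file (io.StringIO); the sample row is made up for illustration.

```python
import csv
import io

# In-memory stand-in for a CSV file; the rows are invented sample data.
sample = io.StringIO('name,note\n"Smith, John","likes ""tea"""\n')

rows = list(csv.reader(sample))    # each row becomes a list of strings
header, data = rows[0], rows[1:]   # separate the header from the data rows

print(header)
print(data)
```

The comma inside "Smith, John" stays within one field, and the doubled quotes are read back as a single quote character.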
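  • Class
class Car():
    def __init__(self, name):
        self.name = name
        self.color = "black"
        self.make = "honda"
        self.model = "accord"

    def print_name(self):
        print(self.name)

# print_name is an instance method, so create an instance first
# (the name passed here is only an example).
car = Car("accord")
car.print_name()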
  • Set
    Removing duplicates
unique_animals = set(["Dog", "Cat", "Hippo", "Dog", "Cat", "Dog", "Dog", "Cat"])
  • Try/Except Blocks
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in numbers:
    try:
        int('')
    except Exception:
        # To ignore the error silently instead of printing, replace print with pass.
        print("There was an error")
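Catching a specific exception type is usually safer than a bare Exception, because unrelated bugs are not hidden. A minimal sketch, assuming a made-up list of raw strings that should become integers:

```python
raw_values = ["12", "", "7", "n/a"]  # made-up sample input

parsed = []
for value in raw_values:
    try:
        parsed.append(int(value))
    except ValueError:
        # int() raises ValueError for strings that are not valid integers.
        parsed.append(None)

print(parsed)  # [12, None, 7, None]
```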
  • enumerate()
    Append each element of door_count to the matching row of cars, i.e. add a column to a list of lists.
door_count = [4, 4]
cars = [
        ["black", "honda", "accord"],
        ["red", "toyota", "corolla"]
       ]
for i, car in enumerate(cars):
    car.append(door_count[i])
  • List Comprehensions
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
animal_lengths = [len(animal) for animal in animals]
  • items()
fruits = {"apple": 2, "orange": 5, "melon": 10}
for fruit, rating in fruits.items():
    print(rating)
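  • Regular Expressions
    ".": wildcard; matches any single character.
    "^": matches strings that start with the pattern after it, e.g. "^abc".
    "$": matches strings that end with the pattern before it, e.g. "abc$".
    "|": alternation; matches the pattern on either side, so "cat|dog" matches strings containing "cat" or "dog", such as "catalog" or "hotdog".
    "[]": matches any one of the characters inside the brackets, e.g. "[bcr]at" matches "bat", "cat", and "rat".
    "[-]": a range; "[0-9]" matches any single digit from 0 to 9, and "[h-y]" matches any single letter from h to y.
    "{}": repetition; "{4}" repeats the preceding element 4 times, so "[0-9]{4}" can match a year.
    "\": escape character, e.g. "\." matches a literal "." character.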
import re
# Search string for a match of the pattern regex; returns a match object if found, otherwise None.
re.search(regex, string)
# Replace "yo" in "yo world" with "hello"; useful for normalizing strings.
re.sub("yo", "hello", "yo world")
# Find every substring of years_string between "1000" and "2999".
re.findall("[1-2][0-9]{3}", years_string)
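The patterns above can be tried out directly. The sketch below is only an illustration; every sample string in it (including the contents of years_string) is invented.

```python
import re

# "^" and "$" anchor the pattern to the start and end of the string.
starts_with_abc = re.search("^abc", "abcdef") is not None
ends_with_abc = re.search("abc$", "xyzabc") is not None

# "|" is alternation: either side may match anywhere in the string.
words = ["catalog", "hotdog", "bird"]
cat_or_dog = [w for w in words if re.search("cat|dog", w)]

# "[bcr]at" accepts exactly one of b, c, or r before "at".
rhymes = re.findall("[bcr]at", "bat cat rat hat")

# "\." matches a literal dot, while a bare "." matches any character.
literal_dots = re.findall(r"\.", "a.b.c")

# "[1-2][0-9]{3}" finds four-digit numbers from 1000 to 2999.
years_string = "Founded in 1987, renamed in 2004, sold for 3500 dollars."
years = re.findall("[1-2][0-9]{3}", years_string)

# re.sub replaces every match of the pattern.
greeting = re.sub("yo", "hello", "yo world")

print(starts_with_abc, ends_with_abc)
print(cat_or_dog, rhymes, literal_dots, years, greeting)
```

Writing patterns that contain a backslash as raw strings (r"\.") avoids clashes with Python's own string escapes.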
  • time.time()
    Unix timestamp: the number of seconds elapsed since the epoch (January 1, 1970, 00:00:00 UTC).
import time
current_time = time.time()  # get the current Unix timestamp
  • time.gmtime()
    Converts a timestamp into a more readable time structure (expressed in UTC); some of its attributes are:
    tm_year: The year of the timestamp
    tm_mon: The month of the timestamp (1-12)
    tm_mday: The day in the month of the timestamp (1-31)
    tm_hour: The hour of the timestamp (0-23)
    tm_min: The minute of the timestamp (0-59)
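A short sketch of reading these attributes. It uses a fixed sample timestamp rather than the current time, so the result is the same on every run.

```python
import time

timestamp = 1433213314.0           # fixed sample timestamp
utc_time = time.gmtime(timestamp)  # struct_time expressed in UTC

print(utc_time.tm_year, utc_time.tm_mon, utc_time.tm_mday)  # 2015 6 2
print(utc_time.tm_hour, utc_time.tm_min)                    # 2 48
```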

  • UTC
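    Coordinated Universal Time, closely related to Greenwich Mean Time. Dates and times are represented with datetime, whose instances have the following attributes: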
    year
    month
    day
    hour
    minute
    second
    microsecond

import datetime
# now() returns local time; pass datetime.timezone.utc as the argument to get UTC.
current_datetime = datetime.datetime.now()
current_year = current_datetime.year
current_month = current_datetime.month
  • timedelta
    A class in datetime that represents a span of time; it accepts the following parameters:
    weeks
    days
    hours
    minutes
    seconds
    milliseconds
    microseconds
diff = datetime.timedelta(weeks = 3, days = 2)
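A timedelta becomes useful once it is added to or subtracted from a datetime. A minimal sketch, with an arbitrary start date chosen for illustration:

```python
import datetime

start = datetime.datetime(year=2010, month=3, day=3)  # arbitrary sample date
diff = datetime.timedelta(weeks=3, days=2)            # 23 days in total

later = start + diff     # 2010-03-26 00:00:00
earlier = start - diff   # 2010-02-08 00:00:00
gap = later - earlier    # subtracting two datetimes yields a timedelta

print(later, earlier, gap.days)
```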
  • datetime.strftime()
    A datetime method that formats a time as a readable string in the desired layout.
import datetime
march3 = datetime.datetime(year=2010, month=3, day=3)
pretty_march3 = march3.strftime("%b %d, %Y")
  • datetime.datetime.strptime()
    A function that converts a string representing a time into a datetime instance.
march3 = datetime.datetime.strptime("Mar 03, 2010", "%b %d, %Y")
  • datetime.datetime.fromtimestamp()
    A function that converts a Unix timestamp into a datetime object.
datetime_object = datetime.datetime.fromtimestamp(1433213314.0)
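The three conversions above fit together as a round trip. The sketch below assumes the default English month abbreviations for %b, and passes tz=datetime.timezone.utc to fromtimestamp() (an addition not in the notes) so that the result does not depend on the machine's local time zone.

```python
import datetime

# Parse a string, then format it back with the same format codes.
march3 = datetime.datetime.strptime("Mar 03, 2010", "%b %d, %Y")
pretty_march3 = march3.strftime("%b %d, %Y")

# fromtimestamp() uses the local time zone unless tz is given;
# passing UTC here keeps the result the same on every machine.
utc_dt = datetime.datetime.fromtimestamp(1433213314.0, tz=datetime.timezone.utc)
iso_text = utc_dt.strftime("%Y-%m-%d %H:%M:%S")

print(pretty_march3)  # Mar 03, 2010
print(iso_text)       # 2015-06-02 02:48:34
```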