Titanic Survival 1

Source: Internet | Editor: 程序博客网 | Date: 2024/03/29 20:08

# Python has a nice csv reader, which reads each line of a file into memory. You can
# read in each row and just append it to a list. From there, you can quickly turn the
# list into an array. The first thing to do is import the packages the script needs:
# numpy (for maths and arrays) and csv (for reading and writing csv files). To use
# something from these packages, call csv.[function] or np.[function].

# First, the imports:


import csv

import numpy as np


# Open the csv file as a Python object (binary mode 'rb' is what Python 2's csv module expects)
csv_file_object = csv.reader(open(r'D:\udacity P2/train.csv', 'rb'))
header = next(csv_file_object)  # next() just skips the first line, which is the header
data = []                       # create a variable called 'data'
for row in csv_file_object:     # run through each row in the csv file,
    data.append(row)            # adding each row to the data variable
data = np.array(data)           # then convert from a list to an array; each item is currently a string
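
# The read-then-convert pattern above can be tried without the real file. The sketch
# below is only an illustration: it assumes Python 3, and the three-row sample and the
# names SAMPLE_CSV, sample_header and sample_data are made up for this example.

```python
import csv
import io

import numpy as np

# Hypothetical sample using the first five train.csv columns (not real passenger data).
SAMPLE_CSV = (
    "PassengerId,Survived,Pclass,Name,Sex\n"
    "1,0,3,Passenger A,male\n"
    "2,1,1,Passenger B,female\n"
    "3,1,3,Passenger C,female\n"
)

reader = csv.reader(io.StringIO(SAMPLE_CSV))  # in-memory stand-in for the open file
sample_header = next(reader)                  # consume the header row
sample_data = np.array(list(reader))          # list of rows -> 2-D array of strings

print(sample_header)
print(sample_data.shape)       # (3, 5)
print(sample_data.dtype.kind)  # 'U': every cell is still a string
```

# Note that even the numeric-looking columns come back as strings, which is why the
# conversion step described next is needed.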



# Now, if you want to call a specific column of data, say the gender column, you can
# just type data[0::,4], remembering that "0::" means all rows (from start to end) and
# that Python starts indices from 0 (not 1). Be aware that the csv reader works with
# strings by default, so you need to convert to floats in order to do numerical
# calculations. For example, you can turn the Pclass variable into floats by using
# data[0::,2].astype(float). Using this, we can calculate the proportion of survivors
# on the Titanic.
# The size() function counts how many elements are in the array, and sum() (as you
# would expect) sums up the elements in the array.


number_passengers = np.size(data[0::,1].astype(float))
number_survived = np.sum(data[0::,1].astype(float))
proportion_survivors = number_survived / number_passengers
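
# As a quick check of the string-to-float conversion and the size()/sum() arithmetic,
# here is a minimal sketch on a made-up column. The four values and the names
# survived_column, n_passengers and n_survived are assumptions for illustration only.

```python
import numpy as np

# Hypothetical "Survived" column as the csv reader would deliver it: strings.
survived_column = np.array(["0", "1", "1", "0"])

survived_as_float = survived_column.astype(float)  # "0"/"1" -> 0.0/1.0
n_passengers = np.size(survived_as_float)          # number of elements: 4
n_survived = np.sum(survived_as_float)             # sum of the 0/1 flags: 2.0

print(n_survived / n_passengers)                   # 0.5
```

# Because survivors are coded as 1 and everyone else as 0, the sum is simply the
# number of survivors.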


# numpy has some lovely functions. For example, we can search the gender column, find
# where any elements equal "female" (and, for males, do not equal "female"), and then
# use this to determine the number of females and males that survived:


women_only_stats = data[0::,4] == "female"  # finds all the elements in the gender column that equal "female"
men_only_stats = data[0::,4] != "female"    # finds all the elements that do not equal "female" (i.e. male)


# We use these two new variables as masks on our original train data, so we can select
# only the women and only the men on board, then calculate the proportion of each
# group that survived.
# Using the index from above, we select the females and males separately:

women_onboard = data[women_only_stats,1].astype(float)
men_onboard = data[men_only_stats,1].astype(float)


# Then we find the proportion of each group that survived
proportion_women_survived = np.sum(women_onboard) / np.size(women_onboard)
proportion_men_survived = np.sum(men_onboard) / np.size(men_onboard)


# And then print it out
print('Proportion of women who survived is %s' % proportion_women_survived)
print('Proportion of men who survived is %s' % proportion_men_survived)
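
# The masking step can be seen in isolation on a tiny made-up array. This is a sketch
# under assumptions: the five rows and the names toy, is_female, female_survived and
# male_survived are invented for the example, and np.mean is used as a shorthand for
# sum divided by size.

```python
import numpy as np

# Hypothetical rows of [Survived, Sex], all strings as read from a csv file.
toy = np.array([
    ["1", "female"],
    ["0", "male"],
    ["1", "female"],
    ["0", "female"],
    ["1", "male"],
])

is_female = toy[:, 1] == "female"                  # boolean mask, one entry per row
female_survived = toy[is_female, 0].astype(float)  # Survived flags for women only
male_survived = toy[~is_female, 0].astype(float)   # ~ inverts the mask: men only

print(np.mean(female_survived))  # 2 of the 3 women survived
print(np.mean(male_survived))    # 1 of the 2 men survived
```

# Inverting the mask with ~ selects the same rows as comparing with != "female".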



# Now that I have my indication that women were much more likely to survive, I am done
# with the training set.
# Reading the test data and writing the gender model as a csv:
# As before, we need to read in the test file by opening one Python object to read and
# another to write. First, we read in the test.csv file and skip the header line:


test_file = open(r'D:\udacity P2/test.csv', 'rb')
test_file_object = csv.reader(test_file)
header = next(test_file_object)


# Now, let's open a pointer to a new file so we can write to it (this file does not
# exist yet). Call it something descriptive so that it is recognizable when we upload it:


prediction_file = open("genderbasedmodel.csv", "wb")  # "wb" opens the file for writing
prediction_file_object = csv.writer(prediction_file)


# We now want to read in the test file row by row, see if it is female or male, and
# write our survival prediction to the new file
prediction_file_object.writerow(["PassengerId", "Survived"])
for row in test_file_object:                            # for each row in test.csv
    if row[3] == 'female':                              # is it a female? if yes, then
        prediction_file_object.writerow([row[0], '1'])  # predict 1
    else:
        prediction_file_object.writerow([row[0], '0'])  # predict 0

test_file.close()

prediction_file.close()
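
# To see what the prediction file should look like without touching the disk, the same
# loop can be run against an in-memory buffer. This is a hedged sketch assuming
# Python 3; the two sample rows and the names test_rows, buffer and writer are made up
# for the example.

```python
import csv
import io

# Hypothetical rows in test.csv column order: PassengerId, Pclass, Name, Sex.
test_rows = [
    ["892", "3", "Passenger X", "male"],
    ["893", "3", "Passenger Y", "female"],
]

buffer = io.StringIO()                            # stands in for the output file
writer = csv.writer(buffer, lineterminator="\n")  # plain "\n" keeps the output easy to read
writer.writerow(["PassengerId", "Survived"])
for row in test_rows:
    # Gender-based model: predict 1 (survived) for women, 0 for everyone else.
    writer.writerow([row[0], "1" if row[3] == "female" else "0"])

print(buffer.getvalue())
```

# The output has a header line followed by one "PassengerId,Survived" pair per row.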

