Machine Learning in Action, Chapter 2: Functions Used in the k-Nearest Neighbors (KNN) Algorithm

Source: Internet | Editor: 程序博客网 | Date: 2024/05/23 07:25

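*Chapter 2 of Machine Learning in Action uses the KNN algorithm to improve the matches made by a dating website. This post summarizes the basic usage of the functions used in that implementation.*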

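1. The str.strip() method

Syntax: str.strip([chars])
Description: returns a new string with the specified leading and trailing characters removed.
Parameter: chars, the set of characters to remove from both ends of the string (defaults to whitespace, which includes spaces, tabs, and newlines).
Official documentation: Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.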

Example:

>>> comment_string = '#....... Section 3.2.1 Issue #32 .......'
>>> comment_string.strip('.#! ')
'Section 3.2.1 Issue #32'
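A short sketch of two points above, using only the built-in str API: with no argument, strip() removes all surrounding whitespace, including a trailing newline, which is handy when splitting the lines of a tab-separated data file; and chars is treated as a set of characters rather than as a prefix or suffix. The sample line is made up for illustration.

```python
# Made-up line from a tab-separated data file, including its trailing newline
line = "1200\t3.5\t0.25\tclassA\n"

# No argument: remove surrounding whitespace (here the trailing '\n'), then split
fields = line.strip().split("\t")
print(fields)  # ['1200', '3.5', '0.25', 'classA']

# chars is a set, not a prefix/suffix: any leading/trailing 'x' or 'y' is removed
trimmed = "xxhixy".strip("xy")
print(trimmed)  # hi
```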

2. ndarray.shape

Description: Tuple of array dimensions. For a matrix with n rows and m columns, shape will be (n, m).
For a two-dimensional array:

ndarray.shape[0]  # number of rows
ndarray.shape[1]  # number of columns
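A minimal NumPy sketch (assuming NumPy is installed and imported as np) of reading the row and column counts from shape:

```python
import numpy as np

# 3 rows, 2 columns
a = np.array([[1, 2],
              [3, 4],
              [5, 6]])

rows = a.shape[0]  # number of rows
cols = a.shape[1]  # number of columns
print(a.shape, rows, cols)  # (3, 2) 3 2
```

In KNN code, shape[0] on the training matrix is a convenient way to get the number of samples.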

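3. axis

The axis parameter comes up often. In short: with axis = i, NumPy operates along the direction in which the i-th index varies. For a two-dimensional array, therefore:

axis = 0  # index 0 (the row) varies: operate down each column, one result per column
axis = 1  # index 1 (the column) varies: operate across each row, one result per row

# A line from Listing 2-1:
>>> sqDistances = sqDiffMat.sum(axis=1)  # sum each row of sqDiffMat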
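To make the axis rule concrete, here is a small sketch on a 2x3 array with arbitrary values:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = a.sum(axis=0)  # row index varies: one sum per column
row_sums = a.sum(axis=1)  # column index varies: one sum per row
print(col_sums.tolist())  # [5, 7, 9]
print(row_sums.tolist())  # [6, 15]
```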

4. numpy.argsort(a, axis=-1, kind='quicksort', order=None)

Parameters:

a : array_like
    Array to sort.
axis : int or None, optional
    Axis along which to sort. The default is -1 (the last axis). If None, the flattened array is used.

Description: Returns the indices that would sort an array. That is, it orders the elements of a from smallest to largest and returns their indices, not the sorted values.
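A small sketch of how argsort is typically used in KNN: sort the distances once, then take the first k indices as the nearest neighbours. The distance values and k are made up for illustration.

```python
import numpy as np

distances = np.array([0.8, 0.1, 0.5, 0.3])

order = distances.argsort()        # indices that would sort the array ascending
print(order.tolist())              # [1, 3, 2, 0]
print(distances[order].tolist())   # [0.1, 0.3, 0.5, 0.8]

k = 2
nearest = order[:k]                # indices of the k smallest distances
print(nearest.tolist())            # [1, 3]
```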

5. numpy.tile(A, reps)

Parameters:

A : array_like
    The input array.
reps : array_like
    The number of repetitions of A along each axis.

功能：Construct an array by repeating A the number of times given by reps.
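The sketch below, with made-up points, shows what tile produces and how it can be used to compute the Euclidean distance from one point to every row of a data matrix, combining tile with the row-wise sum(axis=1) from section 3.

```python
import numpy as np

point = np.array([1.0, 2.0])
data = np.array([[1.0, 1.0],
                 [4.0, 6.0],
                 [0.0, 2.0]])

# Repeat point 3 times along axis 0 and once along axis 1 -> shape (3, 2)
tiled = np.tile(point, (data.shape[0], 1))
print(tiled.tolist())  # [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]

# Euclidean distance from point to every row of data
diff = tiled - data
distances = np.sqrt((diff ** 2).sum(axis=1))
print(distances.tolist())  # [1.0, 5.0, 1.0]
```

With NumPy broadcasting, point - data gives the same diff without calling tile; tile simply makes the repetition explicit.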

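6. range(stop), range(start, stop[, step])

The default step is 1, and stop itself is excluded.
Version difference: in Python 2, range returns a list; in Python 3, range returns a range object, a lazy immutable sequence that produces its values on demand, so wrap it in list() when an actual list is needed.

Example:

>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]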
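A few more range cases in Python 3, showing the exclusive stop, a custom and a negative step, and that a range object supports len() and indexing without building a list:

```python
r = range(2, 11, 3)        # start=2, stop=11 (excluded), step=3
print(r)                   # range(2, 11, 3): no list is built
print(list(r))             # [2, 5, 8]
print(len(r), r[1])        # 3 5: range supports len() and indexing

countdown = list(range(5, 0, -1))  # a negative step counts down
print(countdown)                   # [5, 4, 3, 2, 1]
```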

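7. dict.get(key[, default])

Description: Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError. In other words, it returns the value for the given key, or the default value (None unless specified) if the key is not in the dictionary.
Difference: dict['key'] works only for keys that exist; looking up a missing key raises a KeyError.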
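dict.get with a default is a common counting idiom. The sketch below tallies votes over a hypothetical list of neighbour labels, the same pattern a KNN classifier can use to count how many of the k nearest neighbours belong to each class:

```python
labels = ["A", "B", "A", "A", "B"]  # hypothetical labels of the k nearest neighbours

class_count = {}
for label in labels:
    # get() returns 0 for a label not seen yet, so no KeyError is raised
    class_count[label] = class_count.get(label, 0) + 1

print(class_count)              # {'A': 3, 'B': 2}
print(class_count.get("C"))     # None
print(class_count.get("C", 0))  # 0
# class_count["C"] would raise KeyError
```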

8. dict.items()

Description: Return a new view of the dictionary's items ((key, value) pairs). It yields the (key, value) pairs so they can be iterated over together.
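A small sketch of iterating over items() and of the fact that it returns a live view rather than a copy. The book's listing was written for Python 2 and calls iteritems(), which was removed in Python 3; items() is the Python 3 equivalent.

```python
class_count = {"A": 3, "B": 2}

# items() yields (key, value) pairs that can be unpacked in a loop
pairs = class_count.items()
for label, count in pairs:
    print(label, count)   # A 3, then B 2

# The view reflects later changes to the dictionary
class_count["C"] = 1
print(list(pairs))        # [('A', 3), ('B', 2), ('C', 1)]
```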

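9. sorted(iterable, *, key=None, reverse=False)

Description: Return a new sorted list from the items in iterable.
Parameters:

iterable: any iterable object
key: a one-argument function that extracts a comparison key from each element; defaults to None, in which case the elements are compared directly
reverse: a boolean; True sorts in descending order, False (the default) in ascending order

Difference: sort is a method of list, whereas sorted accepts any iterable. list.sort() sorts the existing list in place and returns None; sorted() returns a new list and leaves the original unchanged.
The official documentation also states: The built-in sorted() function is guaranteed to be stable. That is, elements that compare equal keep their original relative order.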
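A sketch contrasting sort and sorted and showing key, reverse, and stability; the records are made-up (label, count) pairs:

```python
nums = [3, 1, 2]
new_list = sorted(nums)     # returns a new list; nums is unchanged at this point
ret = nums.sort()           # sorts nums in place and returns None
print(new_list, nums, ret)  # [1, 2, 3] [1, 2, 3] None

records = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]

# key selects each record's count; ties keep their original order (stable)
by_count = sorted(records, key=lambda r: r[1])
print(by_count)       # [('a', 1), ('d', 1), ('b', 2), ('c', 2)]

# reverse=True gives descending order and is still stable
by_count_desc = sorted(records, key=lambda r: r[1], reverse=True)
print(by_count_desc)  # [('b', 2), ('c', 2), ('a', 1), ('d', 1)]

# sorted() accepts any iterable, e.g. a string or a dict (which yields its keys)
print(sorted("knn"), sorted({"y": 1, "x": 2}))  # ['k', 'n', 'n'] ['x', 'y']
```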

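10. operator.itemgetter()

Description: fetches the item at a given index (or key) of an object; the argument is that index.
Note: itemgetter does not return a value itself; it returns a callable. The value is obtained only when that callable is applied to an object.
Example:

>>> import operator
>>> a = [1, 2, 3]
>>> b = operator.itemgetter(1)
>>> b(a)
2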
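Putting several of the functions in this list together, here is a minimal KNN classifier sketch in the spirit of the chapter's Listing 2-1. The function name classify_knn and the tiny data set are hypothetical, and the code is written for Python 3 rather than copied from the book. The last step sorts the vote dictionary's (label, count) pairs with operator.itemgetter(1), that is, by count.

```python
import operator

import numpy as np


def classify_knn(in_x, data_set, labels, k):
    """Hypothetical k-NN sketch: majority vote among the k nearest rows."""
    n = data_set.shape[0]                          # number of training samples
    diff = np.tile(in_x, (n, 1)) - data_set        # difference to every sample
    distances = np.sqrt((diff ** 2).sum(axis=1))   # Euclidean distance per row
    nearest = distances.argsort()[:k]              # indices of the k closest rows

    class_count = {}
    for i in nearest:
        label = labels[i]
        class_count[label] = class_count.get(label, 0) + 1

    # Sort (label, count) pairs by count (index 1), largest first
    ranked = sorted(class_count.items(), key=operator.itemgetter(1), reverse=True)
    return ranked[0][0]


group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]
print(classify_knn(np.array([0.0, 0.0]), group, labels, 3))  # B
```

Because sorted() is stable, ties in the vote go to whichever label was counted first among the nearest neighbours.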

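11. read(), readline(), and readlines() in file I/O

① read([size]) reads up to size characters (bytes in binary mode) from the current file position; if size is omitted, it reads to the end of the file. It returns a string.
② readline() reads one line per call, so it uses little memory; it returns a string.
③ readlines() reads all lines of the file into a list, one line per element. It can use a lot of memory on large files.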
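A runnable sketch using io.StringIO, an in-memory text file from the standard library, in place of a real file opened with open(); the three methods behave the same way on a text file object:

```python
import io

text = "line one\nline two\nline three\n"
f = io.StringIO(text)        # in-memory text file; stands in for open(path)

first = f.read(4)            # up to 4 characters from the current position
rest_of_line = f.readline()  # rest of the current line, including '\n'
remaining = f.readlines()    # all remaining lines as a list
at_eof = f.read()            # '' once the end has been reached
print(repr(first), repr(rest_of_line), remaining, repr(at_eof))

# Iterating over the file object reads one line at a time (memory-friendly)
f.seek(0)
stripped = [line.strip() for line in f]
print(stripped)  # ['line one', 'line two', 'line three']
```

For large files, iterating over the file object directly, as in the last lines, avoids loading the whole file into memory the way read() or readlines() does.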
