Computing term frequency (tf) with csr_matrix

Source: Internet. Editor: 程序博客网. Date: 2024/05/22 04:30
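from scipy.sparse import csr_matrix


def tf(docs):
    """
    As an example of how to construct a CSR matrix incrementally, the
    following snippet builds a term-document matrix from texts.

    :type docs: List[List[str]]
    :param docs: tokenized documents, one list of terms per document
    :return: dense array with one row per document and one column per term
    """
    data = []
    indices = []
    indptr = [0]
    vocabulary = {}
    for doc in docs:
        for term in doc:
            # Store a 1 per occurrence; repeated (row, column) entries
            # are summed when the matrix is converted to a dense array.
            data.append(1)
            # Assign each new term the next free column index.
            indices.append(vocabulary.setdefault(term, len(vocabulary)))
        # Record where this document's row ends.
        indptr.append(len(indices))
    return csr_matrix((data, indices, indptr)).toarray()


# Read the corpus: one whitespace-tokenized document per line.
with open('/home/fhqplzj/IdeaProjects/DocumentClustering/target/data/ap') as f:
    corpus = [line.strip().split() for line in f]
print(tf(corpus))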
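To check the construction without the corpus file, the same technique can be tried on a tiny in-memory corpus. The following is a minimal sketch, not the original author's code: the function name `term_counts`, the sample documents, and the row normalization at the end are assumptions added for illustration. It passes `shape` explicitly and returns the vocabulary so the columns can be interpreted.

```python
from scipy.sparse import csr_matrix


def term_counts(docs):
    """Hypothetical helper: return (sparse count matrix, vocabulary)."""
    data, indices, indptr = [], [], [0]
    vocabulary = {}
    for doc in docs:
        for term in doc:
            # Column index of the term, assigned on first sight.
            indices.append(vocabulary.setdefault(term, len(vocabulary)))
            data.append(1)
        # End of the current document's row.
        indptr.append(len(indices))
    matrix = csr_matrix((data, indices, indptr),
                        shape=(len(docs), len(vocabulary)), dtype=int)
    # Merge repeated (row, column) entries into a single count.
    matrix.sum_duplicates()
    return matrix, vocabulary


# Sample documents (illustrative only).
docs = [["hello", "world", "hello"], ["goodbye", "cruel", "world"]]
matrix, vocabulary = term_counts(docs)
counts = matrix.toarray()
print(vocabulary)  # {'hello': 0, 'world': 1, 'goodbye': 2, 'cruel': 3}
print(counts)      # [[2 1 0 0]
                   #  [0 1 1 1]]

# Assumption: if relative frequencies are wanted instead of raw counts,
# divide each row by its total (every sample document is non-empty).
relative = counts / counts.sum(axis=1, keepdims=True)
```

Returning the sparse matrix rather than calling `toarray()` inside the function keeps memory use low for a large vocabulary; the dense conversion is done here only because the example is tiny.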
