The Road to NLP: A Few Small Language Utility Functions


Statistical Tools

# coding=utf-8

def lexical_diversity(tokens):
    """Return the average number of times each distinct token is used."""
    word_count = len(tokens)        # total number of tokens
    vocab_size = len(set(tokens))   # number of distinct tokens
    diversity_score = word_count / vocab_size  # true division in Python 3
    return diversity_score

my_text_data = "The problem of nearest neighbor search is one of major importance in a variety of applications such as image recognition, data compression, pattern recognition and classification, machine learning, document retrieval systems, statistics and data analysis. However, solving this problem in high dimensional spaces seems to be a very difficult task and there is no algorithm that performs significantly better than the standard brute-force search. This has lead to an increasing interest in a class of algorithms that perform approximate nearest neighbor searches, which have proven to be a good-enough approximation in most practical applications and in most cases, orders of magnitude faster that the algorithms performing the exact searches"

# Split on whitespace so that words are counted rather than characters.
words = my_text_data.split()

print(len(words))                # total number of words
print(len(set(words)))           # number of distinct words
print(lexical_diversity(words))  # words per distinct word
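The score depends heavily on how the text is turned into tokens: counting characters, raw whitespace-separated words, or normalized words gives very different numbers. Below is a minimal sketch, assuming a simple regex tokenizer that lowercases the text and drops punctuation. The names `tokenize` and `tokens_per_type` are hypothetical and not part of the original post.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words, keeping inner hyphens and apostrophes."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def tokens_per_type(tokens):
    """Average number of occurrences per distinct token (total tokens / distinct tokens)."""
    if not tokens:
        return 0.0  # assumption: treat empty input as zero diversity instead of raising
    return len(tokens) / len(set(tokens))


sample = "The cat saw the dog, and the dog saw the cat."
tokens = tokenize(sample)
print(len(tokens), len(set(tokens)), tokens_per_type(tokens))
```

With this normalization, "The" and "the" count as the same word, and "dog," is not treated as different from "dog", which a plain whitespace split would do.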
