Term frequency (TF)
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 20:20
/*
*michzel new java files
*
*Created on 2010-10-2
*
*Copyright 2010 Anchora info company. all rights reserved
*/
package TFIDF;

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.wltea.analyzer.IKSegmentation;
import org.wltea.analyzer.Lexeme;

// TextManager and WordsCounter are helper classes from this package (not shown here).
public class IKtf
{
    public static void main(String[] args)
    {
        // Input and output files are resolved against the working directory.
        String filepathfrom = System.getProperty("user.dir") + "/南宋生活顾问1.txt";
        String filepathto = System.getProperty("user.dir") + "/resulttest.txt";
        String text = TextManager.Read(filepathfrom);
        List<WordsCounter> wordsCountList = new ArrayList<WordsCounter>();
        List<String> wordsList = new ArrayList<String>();

        // Word segmentation
        System.out.println(text);
        IKSegmentation ikSeg = new IKSegmentation(new StringReader(text), false);
        try
        {
            // Collect every lexeme the segmenter produces, in order.
            Lexeme l = null;
            while ((l = ikSeg.next()) != null)
            {
                System.out.println(l);
                wordsList.add(l.getLexemeText());
                System.out.println(wordsList.size());
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        System.out.println("***************");
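        // Count how many times each word occurs
        for (String word : wordsList)
        {
            boolean match = false;
            // Linear scan for an existing counter for this word.
            for (int i = 0; i < wordsCountList.size(); i++)
            {
                if (word.equals(wordsCountList.get(i).text))
                {
                    wordsCountList.get(i).count++;
                    match = true;
                    break;
                }
            }
            // First occurrence: start a new counter at 1.
            if (!match)
            {
                wordsCountList.add(new WordsCounter(word, 1));
            }
        }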
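        // Write the counts to a text file
        StringBuilder resultString = new StringBuilder();
        for (WordsCounter wordCounter : wordsCountList)
        {
            // "\r\n" is the line break; the original "/r/n" wrote those four characters literally.
            resultString.append(wordCounter.text).append(":").append(wordCounter.count).append("\r\n");
            System.out.println(wordCounter.text + ":" + wordCounter.count);
            // Term frequency = occurrences of this word / total number of words
            double tf = (double) wordCounter.count / wordsList.size();
            System.out.println(tf);
        }
        TextManager.Write(filepathto, resultString.toString());
    }
}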
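
The counting loop above scans the whole counter list for every word, which gets slow on long texts. As a rough sketch only, the same count and the same tf = count / total-words calculation can be done with a map. The class name `TermFrequencySketch`, its two methods, and the sample tokens are hypothetical and not part of the original listing; the tokens stand in for the segmenter's output so the example runs without the IK library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original post.
class TermFrequencySketch {

    // Count occurrences of each token, keeping first-seen order.
    static Map<String, Integer> countTerms(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // tf(word) = count(word) / total number of tokens.
    static Map<String, Double> termFrequencies(List<String> tokens) {
        Map<String, Double> tf = new LinkedHashMap<>();
        if (tokens.isEmpty()) {
            return tf; // nothing to divide by on empty input
        }
        for (Map.Entry<String, Integer> e : countTerms(tokens).entrySet()) {
            tf.put(e.getKey(), (double) e.getValue() / tokens.size());
        }
        return tf;
    }

    public static void main(String[] args) {
        // Sample tokens standing in for the segmenter's output (assumed data).
        List<String> tokens = Arrays.asList("南宋", "生活", "顾问", "生活");
        StringBuilder result = new StringBuilder();
        for (Map.Entry<String, Double> e : termFrequencies(tokens).entrySet()) {
            result.append(e.getKey()).append(":").append(e.getValue()).append("\r\n");
        }
        System.out.print(result);
    }
}
```

With these four sample tokens, "生活" appears twice, so its term frequency is 0.5 and the other two words get 0.25 each. `LinkedHashMap` is used so the output keeps the order in which words first appear, as the list-based version does.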