Inverted Index and Term-Document Incidence Matrix


Web Information Processing, Assignment 3

PB10210016 Xu Bo (徐波)

Consider the following documents:
    Doc 1    new home sales top forecasts 
    Doc 2    home sales rise in july 
    Doc 3    increase in home sales in july 
    Doc 4    july new home sales rise
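(1) Draw the term-document incidence matrix for this document collection, treating every word as an index term.
(2) Draw the inverted index for this document collection, treating every word as an index term. Each term's entry must include its document frequency and its term frequency in each document.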

(1)

    term         doc1   doc2   doc3   doc4
    new            1      0      0      1
    home           1      1      1      1
    sales          1      1      1      1
    top            1      0      0      0
    forecasts      1      0      0      0
    rise           0      1      0      1
    in             0      1      1      0
    july           0      1      1      1
    increase       0      0      1      0
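The matrix above can be reproduced with a short script. This is only a minimal sketch: it assumes plain whitespace tokenization with no stemming or stop-word removal, and the names `docs` and `build_incidence_matrix` are illustrative, not part of the assignment.

```python
# The four documents from the exercise, keyed by document number.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}


def build_incidence_matrix(docs):
    """Return (terms, matrix); matrix[term] holds one 0/1 flag per document."""
    doc_ids = sorted(docs)
    # Assumption: tokens are separated by whitespace and used as-is.
    token_sets = {d: set(docs[d].split()) for d in doc_ids}
    # Collect terms in order of first appearance across the documents.
    terms = []
    for d in doc_ids:
        for token in docs[d].split():
            if token not in terms:
                terms.append(token)
    # A cell is 1 if the term occurs in the document at all, else 0.
    matrix = {t: [1 if t in token_sets[d] else 0 for d in doc_ids] for t in terms}
    return terms, matrix


terms, matrix = build_incidence_matrix(docs)
print(f"{'term':<10}", *(f"doc{d}" for d in sorted(docs)))
for term in terms:
    print(f"{term:<10}", *(f"{flag:>4}" for flag in matrix[term]))
```

Note that the matrix records only presence or absence, so "in" gets a 1 for doc3 even though it occurs twice there.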

(2)

    term         doc. freq.   postings: doc ID [term freq.]
    new          2            1 [freq.=1]; 4 [freq.=1]
    home         4            1 [freq.=1]; 2 [freq.=1]; 3 [freq.=1]; 4 [freq.=1]
    sales        4            1 [freq.=1]; 2 [freq.=1]; 3 [freq.=1]; 4 [freq.=1]
    top          1            1 [freq.=1]
    forecasts    1            1 [freq.=1]
    rise         2            2 [freq.=1]; 4 [freq.=1]
    in           2            2 [freq.=1]; 3 [freq.=2]
    july         3            2 [freq.=1]; 3 [freq.=1]; 4 [freq.=1]
    increase     1            3 [freq.=1]
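The inverted index can be built in a similar way. The sketch below again assumes whitespace tokenization; `build_inverted_index` is a hypothetical helper name. Document frequency is taken as the number of documents that contain the term, that is, the length of its postings list, and term frequency as the number of occurrences within one document.

```python
from collections import Counter

# The four documents from the exercise, keyed by document number.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}


def build_inverted_index(docs):
    """Map each term to (document frequency, [(doc_id, term frequency), ...])."""
    postings = {}
    # Visiting documents in ascending ID order keeps each postings list sorted.
    for doc_id in sorted(docs):
        counts = Counter(docs[doc_id].split())  # term frequency within this document
        for term, tf in counts.items():
            postings.setdefault(term, []).append((doc_id, tf))
    # Document frequency is the number of documents in the postings list.
    return {term: (len(plist), plist) for term, plist in postings.items()}


index = build_inverted_index(docs)
for term, (df, plist) in index.items():
    entries = "; ".join(f"{doc_id} [freq.={tf}]" for doc_id, tf in plist)
    print(f"{term:<10} {df}   {entries}")
```

For "in", this gives a document frequency of 2 (doc2 and doc3) with a term frequency of 2 in doc3; the total of 3 occurrences is the collection frequency, which is a different quantity.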

 

 
