zoj3013Word Segmenting (字典树+dp)
来源:互联网 发布:如何打开c语言 编辑:程序博客网 时间:2024/05/18 06:25
One key technology in Chinese search engine is Word Segmenting, which is more difficult than English Word Segmenting, as there is no space between words.
A traditional solution is Forward Maximum Match (FMM) or Backward Maximum Match(BMM). The FMM alogrithm goes forward from the begining of the passage, selecting the word with the maximum length as the current match word from the word library, and jump to the word's next position, of course, the selected word must match the current position. If there is no word matches the current position, simply go to next position. The BMM is similarly to the FMM, while the BMM goes backward from the end of the passage.
But the problem is that we can't guarantee a high accuracy, so that the performance of the search engine isn't very well. So, new strategies are brought in, such as semantic analysis, probability analysis and so on.
Here we just care about the strategy based on word library. A popular strategy is Minimum Word Fragment(MWF), that is, the one has the minimum total word fragment length among the segmentations of a passage is the best. The word fragment is defined to be the characters not matched.
Now you are invited to design an algorithm to implement the MWF strategy.
Input
The first line of each test case contains two integers M and N. M is the number of words in the word library and N is the number of passages to be processed. (1 <= M <= 12500, 1 <= N <= 100 ) The next M lines is the word library, with each line containing one word; While the next N lines is the passage, with each line containing one passage. For similarity, the words and passages contain only English letters 'a' to 'z' (no spaces in it) . The length of each word in the word library is no more than 8, and the length of each passage is no more than 4k, namely 4096 bytes.
Process to the end of input.
Output
For each test case, output a integer in the first line, which is the minimum total word fragment length. In the second line, output the corresponding segmentation. Use '#'s to separate words and word fragments. If there are multiple segmentations satisfy MWF, output anyone.
Sample Input
3 1 abc efg cdef abcdefg
Sample Output
1 abc#d#efg
Note
In the sample above, there are two segmentations, which are "abc#d#efg" and "ab#cdef#g". In the first segmentation letter 'd' is not matched, so the total word fragment length is 1, while in the second segmentation the letters 'a','b','g' are not matched, so the total word fragment length is 3. We output the first segmentation according to MWF.
注意:数组要开得大一些,我没有注意到询问的字符串长度一直SF。这题交了无数次后终于20MS AC了。不容易啊!!
解题:用字典树保存图书库.
#include<stdio.h>#include<string.h>#define inf 9999999typedef struct nnn{ bool flag; struct nnn *ch[26];}node;node *root;node *builde(){ node *p; p=new node; p->flag=0; for(int i=0;i<26;i++) p->ch[i]=NULL; return p;}void set(char str[]){ int len=strlen(str); node *p=root; for(int i=0;i<len;i++) { if(p->ch[str[i]-'a']==NULL) p->ch[str[i]-'a']=builde(); p=p->ch[str[i]-'a']; } p->flag=1;}char str[5000];int findstr(int st,int end){ node *p=root; for(int i=st;i<=end;i++) { if(p->ch[str[i]-'a']==NULL) return 0; p=p->ch[str[i]-'a']; } if(p->flag)return 1; return -1;}int main(){ int n,m,dp[5000],pre[5000],len,vist[5000]; while(scanf("%d%d",&m,&n)>0){ root=builde(); while(m--) { scanf("%s",str); set(str); } memset(vist,0,sizeof(vist)); memset(dp,9999999,sizeof(dp)); while(n--) { scanf("%s",str); len=strlen(str); pre[0]=dp[0]=0; for(int i=1;i<=len;i++) { if(dp[i]>dp[i-1]+1) dp[i]=dp[i-1]+1,pre[i]=i-1; for(int j=0;j+i<=len;j++) { int tem=findstr(i-1,i-1+j); if(tem==1) { if(dp[i-1]<dp[i+j]) { dp[i+j]=dp[i-1]; pre[i+j]=i-1; } } else if(tem==0) break; } } int j=pre[len]; while(j) { vist[j]=1;j=pre[j]; } printf("%d\n",dp[len]);dp[len]=inf; for(int i=0;i<len;i++) { if(vist[i]) printf("#"); printf("%c",str[i]); vist[i]=0; dp[i]=inf; } printf("\n"); } }}
- zoj3013Word Segmenting (字典树+dp)
- 字典树 + DP
- LA3942 字典树+ dp
- uva3942 字典树加dp
- NEU 1007 (字典树 DP)
- Find MaxXorSum 字典树+DP
- UVa 1401 字典树+DP
- UVALive 3942 字典树+dp
- EOJ-3261 字典树 + dp
- EOJ 3261 字典树+dp 字典树模版
- NEFU English Game 字符串 dp 字典树
- nefu 680(字典树DP)
- leetcode Word Break 字典树+dp
- ural 1002 Phone Numbers 字典树+dp
- UVA3942(字典树加DP)
- ural 1002 Phone Numbers 字典树+dp
- NUBT 1222 English Game 【字典树 + dp】
- ZSOI2012 捡金子 字典树+树形DP
- HDU 1281 棋盘游戏(二分图最大匹配:关键边)
- NSTimer知识点和倒计时的实现
- 关于Serializable知识点的整理(序列化)
- c++成员函数的存储
- Sublime text 2 支持编译C C++
- zoj3013Word Segmenting (字典树+dp)
- java实现对象的排序
- Windows Phone 7开发环境搭建
- Javascript URL中参数包含特殊字符的处理
- 算法排序之最简单最快的排序--桶排序(Bucket Sort)
- 一段幸福而又美丽的英语之旅
- MD5 SHA1 UUID 算法的C语言实现
- 利用NFS在主机上挂载开发板文件
- Excel的导入与导出