【HDU December 2015 School Contest, Problem H】【Simulation, STL map, STL set, stringstream】Study Words: extract the 10 most frequent not-yet-learned words from an article
Source: Internet  Editor: 程序博客网  Time: 2024/06/02 19:33
Study Words
Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 226 Accepted Submission(s): 80
Problem Description
Learning English is not easy, vocabulary troubles me a lot.
One day an idea came up to me: I download an article every day, choose the 10 most popular new words to study.
A word's popularity is calculated by the number of its occurrences.
Sometimes two or more words have the same number of occurrences; in that case the word with the smaller lexicographic order has the higher popularity.
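The popularity rule above (more occurrences first, ties broken by the smaller lexicographic order) can be sketched as a sort with a two-key comparator. This is only an illustration under stated assumptions: the helper name topWords and its parameters are hypothetical and do not come from the problem statement or from the solution further down.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: return up to k words, most frequent first;
// words with equal counts are ordered by smaller lexicographic order.
std::vector<std::string> topWords(const std::map<std::string, int>& freq, std::size_t k) {
    std::vector<std::pair<std::string, int>> items(freq.begin(), freq.end());
    std::sort(items.begin(), items.end(),
              [](const std::pair<std::string, int>& a,
                 const std::pair<std::string, int>& b) {
                  if (a.second != b.second) return a.second > b.second; // higher count first
                  return a.first < b.first;                             // then smaller word
              });
    std::vector<std::string> result;
    for (std::size_t i = 0; i < items.size() && i < k; ++i) {
        result.push_back(items[i].first);
    }
    return result;
}
```

Calling topWords(freq, 10) on a map of word counts gives the words to study; when the map holds fewer than 10 words, all of them are returned.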
Input
T in the first line is case number.
Each case has two parts.
<oldwords>
...
</oldwords>
<article>
...
</article>
Between <oldwords> and </oldwords> are some old words (no more than 10000) I have already learned, that is, I don't need to learn them any more.
Words between <oldwords> and </oldwords> contain letters ('a'~'z','A'~'Z') only, separated by blank characters (' ','\n' or '\t').
Between <article> and </article> is an article (contains fewer than 1000000 characters).
Only continuous letters ('a'~'z','A'~'Z') make up a word. Thus words like "don't" are regarded as two words "don" and "t", that's OK.
Treat uppercase as lowercase, so "Thanks" equals "thanks". No word will be longer than 100.
As the article is downloaded from the internet, it may contain some Chinese words, which I don't need to study.
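The word rules above (only runs of ASCII letters count, case is ignored, everything else including the bytes of Chinese text acts as a separator) can be sketched as a small tokenizer. The helper name extractWords is hypothetical; it is one possible reading of the rules, not the method used in the solution further down.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split text into lowercase words made of ASCII letters only.
// Any other byte (punctuation, digits, bytes of multi-byte Chinese characters)
// ends the current word.
std::vector<std::string> extractWords(const std::string& text) {
    std::vector<std::string> words;
    std::string cur;
    for (char c : text) {
        if (c >= 'A' && c <= 'Z') {
            cur.push_back(static_cast<char>(c - 'A' + 'a')); // fold to lowercase
        } else if (c >= 'a' && c <= 'z') {
            cur.push_back(c);
        } else if (!cur.empty()) {
            words.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) words.push_back(cur); // flush the last word
    return words;
}
```

With this sketch, "don't" yields the two words "don" and "t", and "Thanks" yields "thanks". The explicit range checks are used instead of isalpha so that the result does not depend on the locale or on whether char is signed.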
Output
For each case, output the top 10 new words I should study, one per line.
If there are fewer than 10 new words, output all of them.
Output a blank line after each case.
Sample Input
2
<oldwords>
how aRe you
</oldwords>
<article>
--How old are you?--Twenty.
</article>
<oldwords>
google cn huluobo net i
</oldwords>
<article>
文章内容:I love google,dropbox,firefox very much.Everyday I open my computer , open firefox , and enjoy surfing on the inter-net.But these days it's strange that searching "huluobo" is unavail-able.What's wrong with "huluobo"?
</article>
Sample Output
old
twenty

firefox
open
s
able
and
but
computer
days
dropbox
enjoy

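#include<stdio.h>
#include<iostream>
#include<sstream>
#include<algorithm>
#include<ctype.h>
#include<string.h>
#include<vector>
#include<set>
#include<map>
using namespace std;
int casenum,casei;
const int N=105;
char s[N];
char oldwords[]="</oldwords>";
char article[]="</article>";
set<string>sot;                // words already learned
map<string,int>mop;            // new word -> number of occurrences
map<string,int>::iterator it;
const int L=1e6+10;
char ss[L];                    // whole article, non-letters replaced by spaces
vector<pair<int,string> >b;
int main()
{
    scanf("%d",&casenum);
    for(casei=1;casei<=casenum;++casei)
    {
        sot.clear();mop.clear();
        // read the old words; the opening tag also lands in the set, which is harmless
        while(1)
        {
            scanf("%s",s);
            for(int i=0;s[i];++i)s[i]=tolower((unsigned char)s[i]);
            if(!strcmp(s,oldwords))break;
            sot.insert(s);
        }
        scanf("%s",s);getchar();   // skip "<article>" and the line break after it
        int l=0;
        while(1)
        {
            // fgets stands in for the original gets, which modern compilers no longer provide
            if(!fgets(ss+l,L-l,stdin))break;
            int len=strlen(ss+l);
            while(len>0&&(ss[l+len-1]=='\n'||ss[l+len-1]=='\r'))ss[l+--len]=0;
            if(!strcmp(ss+l,article))break;
            for(int i=l;ss[i];++i)
            {
                // the cast keeps isalpha/tolower defined for the negative bytes of Chinese text
                if(!isalpha((unsigned char)ss[i]))ss[i]=' ';
                else ss[i]=tolower((unsigned char)ss[i]);
            }
            l+=len;
            ss[l++]=' ';           // the line break separates words too
        }
        ss[l]=0;
        stringstream cinn(ss);
        while(cinn>>s)
        {
            if(sot.find(s)==sot.end())++mop[s];
        }
        // sort by (-count, word): higher count first, then smaller lexicographic order
        b.clear();
        for(it=mop.begin();it!=mop.end();++it)
        {
            b.push_back(make_pair(-it->second,it->first));
        }
        sort(b.begin(),b.end());
        for(int i=0;i<min(10,(int)b.size());++i)cout<<b[i].second<<endl;
        puts("");
    }
    return 0;
}
/*
【Trick && rant】
1. I misread this problem too = = If a line ends with an English letter, the line break has to be added in as a separator as well; otherwise that word gets glued to the first word of the next line. This silly mistake cost me an hour plus 4 penalties during the contest...
2. A Chinese character is characterized by two consecutive chars whose ASCII values are both negative.
3. I was afraid strcmp would be slow, so I hand-wrote a comparison function... (the version above simply uses strcmp)
4. The data for this problem must be very weak; it does not even contain tricky input like the following >_<
<oldwords>
</oldwords>
<article>
/article/article></article
</article>

【Problem】
Every test case has the following form:
<oldwords>
...
</oldwords>
<article>
...
</article>
Among the words in article, we have to find the 10 English words with the highest frequency.
Requirements:
1. The words in oldwords have already been learned and never need to be studied again.
2. Case is ignored.
3. Chinese characters are ignored.
4. English letters that are not contiguous form separate words; even don't is split into the two words don and t.
5. When frequencies are equal, the lexicographically smaller word comes first.
6. If there are fewer than 10 words, output however many there are, ordered by the two keys (frequency, lexicographic order).
7. No word is longer than 100.
8. The article contains fewer than 1e6 characters.

【Type】
Simulation, STL-SET, STL-MAP

【Analysis】
My approach is as follows:
1. Convert everything to lowercase first.
2. Store all words that must be excluded in a SET.
3. Extract all the words. There are convenient tricks for the implementation, for example:
(1) stringstream cinn(s)
(2) scanf(%[^])
4. Record the frequency of every word in a MAP.
5. Sort all words in the MAP by (frequency, lexicographic order) and output the first 10.

【Time complexity && optimization】
O(1e6 log(1e6))
Accepted in 0 ms, which shows the data really is weak = =
*/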