How do you count the occurrences of every word in the input?

Source: Internet. Published by: nao机器人编程. Editor: 程序博客网. Date: 2024/06/05 05:10

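Problem: count the occurrences of all the words in some input. Because the list of words is not known in advance, we cannot conveniently sort it and use a binary search. We also cannot do a linear search for each word as it arrives to see whether it has already been seen: the running time would grow quadratically with the number of input words, which is too slow. So how should the data be organized to cope efficiently with a list of arbitrary words?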

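One solution is to keep the set of words sorted at all times by placing each word into its proper position as it is read. This should not be done by shifting words in a linear array, though, because that also takes too long. Instead, we use a binary tree: each node holds one word, its count, a left subtree containing only lexicographically smaller words, and a right subtree containing only larger ones.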

Here is the complete program:

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>   /* malloc; replaces the non-standard <malloc.h> */

#define MAXWORD 100

struct tnode {
    char *word;            /* points to the text of the word */
    int count;             /* number of occurrences */
    struct tnode *left;    /* left child */
    struct tnode *right;   /* right child */
};

struct tnode *addtree(struct tnode *, char *);
void treeprint(struct tnode *);
int getword(char *, int);
int getch(void);
void ungetch(int);

int main(void)
{
    struct tnode *root;
    char word[MAXWORD];

    root = NULL;
    while (getword(word, MAXWORD) != EOF)
        if (isalpha(word[0]))
            root = addtree(root, word);
    treeprint(root);

    return 0;
}

struct tnode *talloc(void);
char *strdup1(char *);

/* addtree: add a node with w, at or below p */
struct tnode *addtree(struct tnode *p, char *w)
{
    int cond;

    if (p == NULL) {
        p = talloc();
        p->word = strdup(w);   /* POSIX strdup; strdup1 below is an equivalent */
        p->count = 1;
        p->left = NULL;
        p->right = NULL;
    } else if (cond = strcmp(w, p->word) == 0)   /* precedence bug: see the bug fix section below */
        p->count++;
    else if (cond < 0)
        p->left = addtree(p->left, w);
    else
        p->right = addtree(p->right, w);
    return p;
}

/* treeprint: in-order print of tree p */
void treeprint(struct tnode *p)
{
    if (p != NULL) {
        treeprint(p->left);
        printf("%4d %s\n", p->count, p->word);
        treeprint(p->right);
    }
}

/* talloc: make a tnode */
struct tnode *talloc(void)
{
    return (struct tnode *) malloc(sizeof(struct tnode));
}

/* strdup1: make a duplicate of s */
char *strdup1(char *s)
{
    char *p;

    p = (char *) malloc(strlen(s) + 1);   /* +1 for '\0'; was sizeof(strlen(s) + 1) */
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* getword: get the next word or character from the input */
int getword(char *word, int lim)
{
    int c;
    char *w = word;

    while (isspace(c = getch()))
        ;
    if (c != EOF)
        *w++ = c;
    if (!isalpha(c)) {
        *w = '\0';
        return c;
    }
    for (; --lim > 0; w++) {
        if (!isalnum(*w = getch())) {
            ungetch(*w);
            break;
        }
    }
    *w = '\0';
    return word[0];
}

#define BUFSIZE 100

char buf[BUFSIZE];   /* buffer for ungetch */
int bufp = 0;        /* next free position in buf */

int getch(void)
{
    return (bufp > 0) ? buf[--bufp] : getchar();
}

void ungetch(int c)
{
    if (bufp >= BUFSIZE)
        printf("ERROR: too many characters\n");
    else
        buf[bufp++] = c;
}

Program output:

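Additional notes:

1. In the screenshot, the second line, "now is the……", is the input. Ending input on the command line requires a key combination (typically Ctrl+D on Unix-like terminals, or Ctrl+Z followed by Enter in the Windows console); see the article: 命令行结束输入 (ending input on the command line).

2. For more about this program, see Section 6.5 of "The C Programming Language".

3. While writing the program myself, I ran into error: conflicting types for 'XXXX'. This is a common error; for a fix, see the article 错误解决办法 (resolving the error).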

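Bug fix for the program in this article (November 4, 2013):

After finishing the program, one thing kept bothering me. In the program above, this function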

void treeprint(struct tnode *p)
{
    if (p != NULL) {
        treeprint(p->left);
        printf("%4d %s\n", p->count, p->word);
        treeprint(p->right);
    }
}

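is an in-order traversal of a binary tree (for background on traversals, see the article: 二叉树的前序遍历、中序遍历及后序遍历, on pre-order, in-order, and post-order traversal), so the words should be printed in sorted order. Yet the output of the program above listed them in the order they were entered. It took me a long time to find the mistake, which is in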

struct tnode *addtree(struct tnode *p, char *w)
{
    int cond;

    if (p == NULL) {
        p = talloc();
        p->word = strdup(w);
        p->count = 1;
        p->left = NULL;
        p->right = NULL;
    } else if (cond = strcmp(w, p->word) == 0)
        p->count++;
    else if (cond < 0)
        p->left = addtree(p->left, w);
    else
        p->right = addtree(p->right, w);
    return p;
}

The problem in this code is in the first else if statement:
(cond = strcmp(w, p->word) == 0)
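By C's operator precedence rules, == binds more tightly than =, so the expression is parsed as cond = (strcmp(w, p->word) == 0), and that changes what the program does. With the input above, after now has been entered and is comes next, the test strcmp(w, p->word) == 0 is false, so cond is set to 0. The count is not incremented, the cond < 0 branch can never be taken, and p->right = addtree(p->right, w); runs instead. In other words, every new word is inserted on the right side of the tree, so the tree degenerates into a chain and the in-order traversal prints the words in input order. Quite a coincidence: if a program ever needed exactly this behavior, this would be one way to get it, haha.
So what should the correct program look like?
It is very simple: just add a pair of parentheses.
That is: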
((cond = strcmp(w, p->word)) == 0)
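After making this change, the program produces the following output:

Good, the problem is now solved. The output in the screenshot above is exactly what we expected, O(∩_∩)O~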