【Leetcode Shell】Word Frequency
Source: Internet  Editor: 程序博客网  Date: 2024/06/06 10:02
Problem:
Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
- words.txt contains only lowercase characters and space ' ' characters.
- Each word must consist of lowercase characters only.
- Words are separated by one or more whitespace characters.
For example, assume that words.txt has the following content:
the day is sunny the the
the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
Note:
Don't worry about handling ties, it is guaranteed that each word's frequency count is unique.
First attempt:
# Read from the file words.txt and output the word frequency list to stdout.
sed 's/ /\n/g' words.txt | sort | uniq -c | sort -r | awk '{print $2 " " $1}'
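Idea:
(1) use sed to turn each space into a newline -> (2) sort the result with sort -> (3) count how often each word occurs with uniq -c -> (4) reverse the order with sort -r -> (5) rearrange the columns with awk
This reported an error.
Cause of the error:
Runs of spaces (and tabs) were not taken into account. sed replaces every single space with a newline, so two words separated by several spaces leave empty lines between them, and those empty lines are then sorted and counted as if they were a word.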
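The failure described above can be reproduced on a tiny input. The following is a minimal sketch, assuming GNU sed (which accepts \n in the replacement); the sample string and the variable names are made up for illustration.

```shell
# Two words separated by three spaces (hypothetical sample input).
sample='the   day'

# Each space becomes its own newline, so three spaces leave two empty lines
# between the words. \n in the replacement is a GNU sed feature.
split=$(printf '%s\n' "$sample" | sed 's/ /\n/g')
printf '%s\n' "$split"

# Count the empty lines that uniq -c would later treat as a "word".
empty_count=$(printf '%s\n' "$split" | grep -c '^$')
echo "empty lines: $empty_count"
```

With this input the last line printed is "empty lines: 2", which is exactly the extra entry that breaks the expected output.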
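Second attempt:
# Read from the file words.txt and output the word frequency list to stdout.
sed 's/ /\n/g' words.txt | sed '/^\s*$/d' | sort | uniq -c | sort -r | awk '{print $2 " " $1}'
Since the input for this problem contains only spaces and no tabs, a step that deletes the blank lines is added between (1) and (2).
This still reported an error.
Cause of the error:
"can 13" should have come first in the output but ended up last, which means the sort command is at fault.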
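The ordering problem can be seen on three made-up "count word" lines. This is only a sketch: the leading padding that uniq -c adds is left out, and LC_ALL=C is set so that the text comparison is predictable (with the padding in place, the result of a plain text sort can depend on the locale).

```shell
# Hypothetical "count word" lines; the padding printed by uniq -c is omitted.
counts='13 can
2 day
9 is'

# sort -r compares the lines as text, so "9..." sorts above "13...".
lexical=$(printf '%s\n' "$counts" | LC_ALL=C sort -r | awk '{print $2}' | paste -sd ' ' -)

# sort -rn compares the leading number, so 13 comes first.
numeric=$(printf '%s\n' "$counts" | LC_ALL=C sort -rn | awk '{print $2}' | paste -sd ' ' -)

echo "sort -r : $lexical"
echo "sort -rn: $numeric"
```

Here sort -r yields "is day can" while sort -rn yields "can is day", which matches the symptom of "can 13" landing at the bottom.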
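Third attempt:
# Read from the file words.txt and output the word frequency list to stdout.
sed 's/ /\n/g' words.txt | sed '/^\s*$/d' | sort | uniq -c | sort -rn | awk '{print $2 " " $1}'
Accepted
The sort command was indeed the problem. sort -r on its own compares the lines as text; adding the -n option makes sort compare the leading count numerically (note that uniq -c prints each line as: count word), so the earlier misordering no longer occurs.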
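To check the accepted pipeline without creating words.txt, it can be wrapped in a function that reads standard input. This is a sketch under two assumptions: word_freq is a hypothetical helper name, and sed is GNU sed (for \n in the replacement and \s in the pattern).

```shell
# word_freq is a hypothetical helper; it reads text on stdin instead of words.txt.
word_freq() {
  sed 's/ /\n/g' |            # one word (or empty string) per line
    sed '/^\s*$/d' |          # drop the blank lines
    sort |                    # make identical words adjacent
    uniq -c |                 # prefix each word with its count
    sort -rn |                # highest count first
    awk '{print $2 " " $1}'   # print "word count"
}

# Sample content from the problem statement.
result=$(printf 'the day is sunny the the\nthe sunny is is\n' | word_freq)
printf '%s\n' "$result"
```

For the sample from the problem statement this prints the four lines the 4, is 3, sunny 2, day 1.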
Key points for this problem:
1. sed conversion
(1) Turn spaces into newlines: sed 's/ /\n/g'
(2) Delete blank lines: sed '/^\s*$/d'
Alternatives are awk NF, awk '!/^$/', or tr -s '\n'.
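The four ways of removing blank lines listed above can be compared side by side. The input below is a made-up example of what sed 's/ /\n/g' leaves behind; the variable names are illustrative only.

```shell
# Hypothetical input: words with blank lines between them.
lines=$(printf 'the\n\n\nday\n\nis\n')

with_sed=$(printf '%s\n' "$lines" | sed '/^\s*$/d')   # \s is a GNU sed extension
with_awk_nf=$(printf '%s\n' "$lines" | awk 'NF')      # keep lines with at least one field
with_awk_re=$(printf '%s\n' "$lines" | awk '!/^$/')   # keep lines that are not empty
with_tr=$(printf '%s\n' "$lines" | tr -s '\n')        # squeeze runs of newlines into one

printf '%s\n' "$with_sed"
```

All four give the, day, is on this input. They are not fully interchangeable in general: awk '!/^$/' keeps lines that contain only spaces, and tr -s '\n' only squeezes newlines, so a blank first line would survive as one empty line.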
2. sort
(1) sort -r sorts in reverse order
-r, --reverse reverse the result of comparisons
--sort=WORD sort according to WORD:
general-numeric -g, human-numeric -h, month -M,
numeric -n, random -R, version -V
(2) sort -n sorts by the numeric value at the start of each line; the help text reads: "compare according to string numerical value"
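3. uniq: detecting duplicates
Running uniq --help shows the following note:
Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use `sort -u' without `uniq'.
So uniq only detects adjacent duplicates, which is why we run sort before uniq: sorting makes identical words adjacent so that uniq -c can count them. sort -u also removes duplicates, but it does not report counts, so it cannot replace sort | uniq -c for this problem.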
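The adjacency requirement is easy to see on four made-up words. A small sketch (the variable names are illustrative, and the counts are reformatted as count:word only to keep the output on one line):

```shell
# Hypothetical word list in which "the" appears three times, not all adjacent.
words=$(printf '%s\n' the day the the)

# Without sorting, the first "the" is counted separately from the last two.
unsorted=$(printf '%s\n' "$words" | uniq -c | awk '{print $1 ":" $2}' | paste -sd ' ' -)

# After sorting, all three occurrences are adjacent and counted together.
sorted=$(printf '%s\n' "$words" | sort | uniq -c | awk '{print $1 ":" $2}' | paste -sd ' ' -)

# sort -u removes the duplicates but carries no counts.
deduped=$(printf '%s\n' "$words" | sort -u | paste -sd ' ' -)

echo "uniq -c alone : $unsorted"
echo "sort | uniq -c: $sorted"
echo "sort -u       : $deduped"
```

The three results are "1:the 1:day 2:the", "1:day 3:the" and "day the".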
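4. awk formatting
The output of sort -rn has the form: count word. The problem asks for word count, so we use awk (whose default field separator is whitespace) to swap the first and second columns.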
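The column swap can be tried on two hand-written lines in the padded layout that uniq -c typically prints. This is only an illustration; the input lines are made up.

```shell
# Hypothetical lines in "count word" form with leading padding.
swapped=$(printf '      4 the\n      3 is\n' | awk '{print $2 " " $1}')
printf '%s\n' "$swapped"
```

Because awk's default field splitting skips leading blanks, $1 is the count and $2 is the word, and the output is the 4 followed by is 3.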
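Extension:
How should the script be written if the file also contains tabs?
# Read from the file words.txt and output the word frequency list to stdout.
sed 's/[ \t]/\n/g' words.txt | sed '/^\s*$/d' | sort | uniq -c | sort -rn | awk '{print $2 " " $1}'
The only change is that a tab is treated as a separator too: each space or tab is turned into a newline, and the blank lines are deleted as before. Simply deleting the tabs (for example with s/\t*//g) would glue together two words that are separated only by a tab. Also note that when several sed commands have to run in one invocation, each one is passed with the -e option.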
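As an alternative to sed for mixed spaces and tabs, awk's default field splitting can do the word splitting. This is a swapped-in technique rather than the approach used above, and word_freq_ws is a hypothetical helper name; the sketch reads from stdin so that it can be tried without a words.txt file.

```shell
# word_freq_ws is a hypothetical helper; it reads text on stdin.
word_freq_ws() {
  # By default awk splits fields on runs of spaces and tabs, so printing
  # every field yields one word per line with no empty lines.
  awk '{ for (i = 1; i <= NF; i++) print $i }' |
    sort | uniq -c | sort -rn |
    awk '{print $2 " " $1}'
}

# Made-up input mixing tabs, double spaces and a leading tab.
ws_result=$(printf 'a\tb  a\n\tb a\n' | word_freq_ws)
printf '%s\n' "$ws_result"
```

On this input the output is a 3 followed by b 2. Since the splitting step never produces empty lines, no separate blank-line removal is needed.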