NLP01 - Chinese word clouds with Python's wordcloud: a small example
Source: Internet | Editor: 程序博客网 | Time: 2024/05/29 17:10
The image above was generated from the following poem:
"When You Are Old" by William Butler Yeats

When you are old and grey and full of sleep,
And nodding by the fire, take down this book,
And slowly read, and dream of the soft look
Your eyes had once, and of their shadows deep;

How many loved your moments of glad grace,
And loved your beauty with love false or true,
But one man loved the pilgrim soul in you,
And loved the sorrows of your changing face;

And bending down beside the glowing bars,
Murmur, a little sadly, how love fled
And paced upon the mountains overhead
And hid his face amid a crowd of stars.
Summary: this post only covers installing wordcloud and running a quick demo, which may help beginners get started.
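1. Installation
There are many ways to build a word cloud, but in my opinion Python's wordcloud package is the most capable; for example, it lets you shape the cloud with a custom image.
Official site: https://amueller.github.io/word_cloud/
GitHub: https://github.com/amueller/word_cloud
Install: pip install wordcloud
Or download a prebuilt package from http://www.lfd.uci.edu/~gohlke/pythonlibs/#wordcloud and install it.
The Chinese word segmentation in the code below also needs jieba: pip install jieba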
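Before running the demo, it can help to confirm that every module the demo imports is actually available. The following is a minimal sketch that uses only the standard library. The module list is an assumption based on this post's code, plus numpy and PIL, which wordcloud itself depends on. The names REQUIRED_MODULES and missing_modules are hypothetical helpers.

```python
import importlib.util

# Assumed list of modules used in this post; numpy and PIL (Pillow) come in as wordcloud dependencies
REQUIRED_MODULES = ["wordcloud", "jieba", "matplotlib", "numpy", "PIL"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

Note that a module's import name can differ from its pip package name. For example, PIL is installed with `pip install Pillow`.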
2. The API
The most important class in the API is WordCloud:
class wordcloud.WordCloud(font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=0.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode='RGB', relative_scaling=0.5, regexp=None, collocations=True, colormap=None, normalize_plurals=True)

- font_path : string. Font path to the font that will be used (OTF or TTF). Defaults to DroidSansMono path on a Linux machine. If you are on another OS or don't have this font, you need to adjust this path. [On Windows 7 this must be changed to a font that contains Chinese glyphs; otherwise the Chinese words come out garbled.]
- width : int (default=400). Width of the canvas.
- height : int (default=200). Height of the canvas.
- prefer_horizontal : float (default=0.90). The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn't fit. (There is currently no built-in way to get only vertical words.)
- mask : nd-array or None (default=None). If not None, gives a binary mask on where to draw words. In that case width and height are ignored and the shape of the mask is used instead. Pure white entries (#FF or #FFFFFF) are "masked out"; all other entries are free to draw on.
- scale : float (default=1). Scaling between computation and drawing. For large word-cloud images, using scale instead of larger canvas size is significantly faster, but might lead to a coarser fit for the words.
- min_font_size : int (default=4). Smallest font size to use. Will stop when there is no more room in this size.
- font_step : int (default=1). Step size for the font. font_step > 1 might speed up computation but give a worse fit.
- max_words : number (default=200). The maximum number of words displayed.
- stopwords : set of strings or None. The words that will be eliminated. If None, the built-in STOPWORDS list will be used.
- background_color : color value (default="black"). Background color for the word cloud image.
- max_font_size : int or None (default=None). Maximum font size for the largest word. If None, height of the image is used.
- mode : string (default="RGB"). Transparent background will be generated when mode is "RGBA" and background_color is None.
- relative_scaling : float (default=.5). Importance of relative word frequencies for font-size. With relative_scaling=0, only word-ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good.
- color_func : callable, default=None. Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word. Overwrites "colormap". See colormap for specifying a matplotlib colormap instead.
- regexp : string or None (optional). Regular expression to split the input text into tokens in process_text. If None is specified, r"\w[\w']+" is used.
- collocations : bool, default=True. Whether to include collocations (bigrams) of two words.
- colormap : string or matplotlib colormap, default="viridis". Matplotlib colormap to randomly draw colors from for each word. Ignored if "color_func" is specified.
- normalize_plurals : bool, default=True. Whether to remove trailing 's' from words. If True and a word appears with and without a trailing 's', the one with trailing 's' is removed and its counts are added to the version without trailing 's', unless the word ends with 'ss'.
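The regexp parameter explains why Chinese text has to be segmented and joined with spaces before it reaches WordCloud. The sketch below applies the default pattern quoted above to Chinese text using only the re module. It reproduces the tokenization step only, not the rest of process_text (stopwords, plurals, collocations). Newer wordcloud releases may pick a different default pattern, so check the version you have installed. The names DEFAULT_REGEXP and tokenize are hypothetical.

```python
import re

# Default token pattern quoted in the API description above (used when regexp=None)
DEFAULT_REGEXP = r"\w[\w']+"


def tokenize(text, pattern=DEFAULT_REGEXP):
    """Split text into tokens the way the default regexp would."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    # Unsegmented Chinese: \w matches CJK characters, so the whole run becomes one token
    print(tokenize("我爱北京天安门"))
    # Space-separated words (as " ".join(jieba.cut(...)) produces): one token per word,
    # but single-character words such as 我 and 爱 are dropped because the pattern needs 2+ characters
    print(tokenize("我 爱 北京 天安门"))
```

This is why the code joins jieba's output with spaces. Without segmentation, each whole sentence would count as a single "word".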
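One way to read the relative_scaling description is as a linear blend between rank-only sizing (0) and sizing proportional to frequency (1). The function below is a simplified sketch of that reading. It assumes each word's font size is derived from the size of the next more frequent word. It is not wordcloud's actual code, which also rounds the result and shrinks the font when a word does not fit. The function name relative_size is hypothetical.

```python
def relative_size(freq, prev_freq, prev_size, relative_scaling=0.5):
    """Font size for a word, given the previous (more frequent) word's frequency and size.

    relative_scaling=0 keeps the previous size, so only the rank matters;
    relative_scaling=1 scales the size in proportion to the frequency ratio.
    """
    ratio = freq / prev_freq
    return prev_size * (relative_scaling * ratio + (1 - relative_scaling))


if __name__ == "__main__":
    # A word half as frequent as a previous word drawn at size 80
    for rs in (0.0, 0.5, 1.0):
        print(rs, relative_size(5, 10, 80, rs))
```

With relative_scaling=1, a word half as frequent gets half the size, which matches the documented behavior. The default of 0.5 lands in between.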
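According to the description above, color_func must be a callable that takes word, font_size, position, orientation, font_path and random_state and returns a PIL color. Here is a minimal sketch that draws every word in a random light grey. The function name grey_color_func and the 60-100% lightness range are illustrative choices. It returns an "hsl(...)" string, one of the color formats Pillow understands.

```python
import random


def grey_color_func(word, font_size, position, orientation,
                    font_path=None, random_state=None, **kwargs):
    """Return a random light-grey PIL color string for each word."""
    rng = random_state if random_state is not None else random.Random()
    lightness = rng.randint(60, 100)  # percent, inclusive on both ends
    return f"hsl(0, 0%, {lightness}%)"


if __name__ == "__main__":
    print(grey_color_func("抽筋", 20, (0, 0), None, random_state=random.Random(0)))
```

It would be passed as WordCloud(font_path="simsun.ttc", color_func=grey_color_func). It can also recolor an existing cloud via word_cloud.recolor(color_func=grey_color_func). A dark background_color keeps the grey words readable.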
3. Image
The mask image file is named mask_png.png.
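wordcloud treats pure-white pixels as masked out and draws words everywhere else. In a grayscale mask that means the value 255; in a color mask it means 255 in all of R, G and B. A PNG with a transparent background can therefore misbehave: its transparent pixels may be stored as black, so the background is not masked at all. The sketch below assumes the image has been loaded as an RGBA uint8 numpy array. It flattens transparency onto white and computes the masked-out region under the rule above. Both function names are hypothetical helpers, not part of wordcloud.

```python
import numpy as np


def flatten_alpha_to_white(rgba):
    """Composite an RGBA uint8 array onto a white background and return RGB uint8."""
    rgb = rgba[..., :3].astype(np.float64)
    alpha = rgba[..., 3:4].astype(np.float64) / 255.0
    out = rgb * alpha + 255.0 * (1.0 - alpha)
    return np.round(out).astype(np.uint8)


def masked_out(mask):
    """Boolean array that is True where no words will be drawn (pure white pixels)."""
    if mask.ndim == 2:
        return mask == 255
    return np.all(mask[..., :3] == 255, axis=-1)


if __name__ == "__main__":
    # One fully transparent pixel and one opaque black pixel
    rgba = np.array([[[0, 0, 0, 0], [0, 0, 0, 255]]], dtype=np.uint8)
    print(masked_out(flatten_alpha_to_white(rgba)))
```

With Pillow, the mask could be prepared as flatten_alpha_to_white(np.array(Image.open("mask_png.png").convert("RGBA"))) and then passed as mask=... to WordCloud.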
4. Chinese test document
Title: 脚抽筋怎么办 ("What to do about leg cramps")
URL: http://health.china.com/html/jiankang/jijiuzhinan/richangjijiu/201603/26-328450.html
5. Code
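```python
# -*- coding: utf-8 -*-
import jieba
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud


def doWordcloud():
    # Read the article text as UTF-8; the with-block closes the file afterwards
    with open('test.txt', 'r', encoding='UTF-8') as f:
        comment_text = f.read()
    # jieba segments the Chinese text into words; join them with spaces
    # so that WordCloud can split the text into tokens
    cut_text = " ".join(jieba.cut(comment_text))
    # scipy.misc.imread has been removed from SciPy, so load the mask with Pillow
    color_mask = np.array(Image.open("mask_png.png"))
    cloud = WordCloud(
        # A font with Chinese glyphs is required, otherwise the text is garbled.
        # On Windows 7, installed fonts can be found in C:\Windows\Fonts
        font_path="simsun.ttc",
        mask=color_mask,  # with a mask, width/height are ignored
        max_words=200,
        max_font_size=80,
        width=1000,
        height=1000
    )
    word_cloud = cloud.generate(cut_text)  # generate the word cloud
    # word_cloud.to_file("pic.jpg")  # save the image
    plt.imshow(word_cloud)
    plt.axis('off')
    plt.show()


if __name__ == '__main__':
    doWordcloud()
```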
Notes: test.txt contains the text of the article 脚抽筋怎么办;
mask_png.png is the picture of the little girl shown above.
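An alternative to passing the space-joined text to generate() is to count the jieba tokens yourself. The counts can then be handed to WordCloud.generate_from_frequencies, which gives direct control over which words appear. Below is a minimal sketch of the counting step using only the standard library. The stopword set is a small hypothetical example; a real project would usually load a full Chinese stopword list from a file. The function name word_frequencies is also hypothetical.

```python
from collections import Counter

# Hypothetical stopword set for illustration only
STOPWORDS = {"的", "了", "是", "和", "在", "怎么办"}


def word_frequencies(tokens, stopwords=STOPWORDS, min_len=2):
    """Count tokens, skipping stopwords, whitespace, punctuation and short tokens."""
    counts = Counter()
    for token in tokens:
        token = token.strip()
        if len(token) < min_len or token in stopwords:
            continue
        if not any(ch.isalnum() for ch in token):  # pure punctuation such as "……"
            continue
        counts[token] += 1
    return dict(counts)


if __name__ == "__main__":
    tokens = ["脚", "抽筋", "怎么办", "，", "抽筋", "的", "时候", " ", "……", "按摩"]
    print(word_frequencies(tokens))
```

In the code above, this would replace the generate call: freqs = word_frequencies(jieba.cut(comment_text)), followed by cloud.generate_from_frequencies(freqs). The min_len=2 default mirrors the default regexp, which also drops single-character words.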
6. Result
[Author: happyprince; http://blog.csdn.net/ld326/article/details/78341147]