Scraping and Analyzing Weibo Comments with the rwda Package

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 01:38
Tools: RStudio, R (3.3.4)
# Install devtools, then install rwda from GitHub
install.packages("devtools")
library(devtools)
devtools::install_github("sjhfx/rwda")

library(RCurl)
library(bitops)
library(rjson)
library(jiebaR)
library(jiebaRD)
library(wordcloud2)
library(vioplot)
library(sm)
library(curl)
library(plyr)
library(rwda)

# You can get an access token at http://open.weibo.com/tools/console
access_token <- "your access_token here"

# The mid of the post. To find it:
#   1. Click the timestamp under the user name on the specific post to open its detail page.
#   2. Right-click and view the page source.
#   3. Search for "mid="; the string of digits that follows is the value needed.
weiboid <- "/the mid of post"

# Weibo limits how many comments an individual developer can fetch in one run,
# to roughly 2000 (50 per page). This version of the package's get_comments
# function therefore lets you set the start page and end page yourself.
# get_fields() is a helper from the rwda package.
get_com <- function(access_token, weiboid, spage, epage) {
  df_comments <- data.frame()  # accumulates the comments from every page
  page <- spage
  while (page <= epage) {
    url <- paste("https://api.weibo.com/2/comments/show.json?id=",
                 weiboid, "&page=", page, "&access_token=", access_token,
                 sep = "")
    # Note: ssl.verifypeer = FALSE turns off SSL certificate verification
    comments <- fromJSON(getURL(url, .opts = list(ssl.verifypeer = FALSE)))$comments
    print(paste("Page", page, ": ", length(comments), "comments"))
    df_comments <- rbind(df_comments, ldply(comments, get_fields, .id = NULL))
    page <- page + 1
    Sys.sleep(1)  # pause one second between requests
  }
  return(df_comments)
}

dat <- get_com(access_token, weiboid, spage = 40, epage = 80)

getwd()
# write.csv produces a comma-separated file that matches the .csv extension
write.csv(dat, file = "C:/Users/andy/Documents/weibo_comments.csv", row.names = FALSE)

## Subset the rows that contain keywords
head(dat)
# Replace "keywords" with your own search term; the original pattern "\keywords"
# is an invalid escape sequence in an R string and fails to parse
n <- grep("keywords", dat$text)  # indices of the rows that contain the keyword
dat_key <- dat[n, ]

## Create a word cloud
weibo_cloud(dat, stopwords = c("心心", "回复", "哈哈", "/doge"))
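The keyword-filtering step can be tried offline, without an access token, on a small stand-in data frame. This is only a minimal sketch: the rows and the `id` column are invented for illustration, and only the `text` column name is taken from the script above. It uses base R's `grepl()` with `fixed = TRUE`, which treats the search term as a literal string rather than a regular expression.

```r
# Toy stand-in for the data frame returned by get_com(); the rows are
# made up for illustration, only the column name `text` mirrors the script.
dat <- data.frame(
  id   = 1:4,
  text = c("keywords here", "nothing relevant", "more keywords", "reply /doge"),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row. fixed = TRUE matches the pattern
# literally, so characters such as "/" or "." need no escaping.
has_key <- grepl("keywords", dat$text, fixed = TRUE)
dat_key <- dat[has_key, ]

print(nrow(dat_key))  # number of matching rows
print(dat_key$id)     # ids of the matching rows
```

A logical index from `grepl()` behaves the same as the integer index from `grep()` for subsetting, and it is easier to combine several keywords with `|` or `&`.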
References:
  • R微博数据分析 (Weibo data analysis with R)
  • 用R爬取微博评论数据 (Scraping Weibo comment data with R)