Translated by Big Data Digest (大数据文摘) volunteers

Source: Internet | Editor: 程序博客网 | Date: 2024/05/29 03:38


How a Troll-Spotting Algorithm Learned Its Anti-antisocial Trade


Trolls are the scourge of many an Internet site. These are people who deliberately engage in antisocial behavior by posting inflammatory or off-topic messages. At best, they are a frustrating annoyance; at worst, they can make people’s lives a misery.

So a way of spotting trolls early in their online careers and preventing their worst excesses would be a valuable tool.

Today, Justin Cheng at Stanford University in California and a few pals say they have created just such a tool by analyzing the behavior of trolls on several well-known websites and creating an algorithm that can accurately spot them after as few as 10 posts. They say their technique should be of high practical importance to the people who maintain online communities.

Cheng and co study three online news communities: the general news site CNN.com, the political news site Breitbart.com, and the computer gaming site IGN.com.

On each of these sites, they have a list of users who have been banned for antisocial behavior, over 10,000 of them in total. They also have all of the messages posted by these users throughout their period of online activity. “Such individuals are clear instances of antisocial users, and constitute ‘ground truth’ in our analyses,” say Cheng and co.

These guys set out to answer three different questions about antisocial users. First, whether they are antisocial throughout their community life or only towards the end. Second, whether the community’s reaction causes their behavior to become worse. And lastly, whether antisocial users can be accurately identified early on.

By comparing the messages posted by users who are ultimately banned against messages posted by users who are never banned, Cheng and co discover some clear differences. One measure they use is the readability of posts, as judged by a metric called the Automated Readability Index.
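The Automated Readability Index is a standard formula that estimates the U.S. school grade level needed to understand a text: ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43. The article does not say how Cheng and co split posts into words and sentences, so the minimal sketch below assumes simple regex rules for both; real implementations differ in these details.

```python
import re


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43.

    Higher scores mean the text needs a higher reading grade level.
    """
    # Assumption: a word is a run of letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        return 0.0
    # Assumption: each run of '.', '!' or '?' ends a sentence;
    # text without any terminator counts as a single sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    # Count only letters and digits, not apostrophes or punctuation.
    characters = sum(sum(ch.isalnum() for ch in word) for word in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / sentences - 21.43


print(automated_readability_index("The cat sat on the mat."))
```

Short words and short sentences push the score down, while long words and long sentences push it up, so a sentence such as "Notwithstanding considerable institutional opposition, legislators unanimously ratified comprehensive amendments." scores far higher than "The cat sat on the mat."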

This clearly shows that users who are later banned tend to write poorer quality posts to start off with. And not only that, the quality of their posts decreases with time.

And while communities initially appear forgiving and are therefore slow to ban antisocial users, they become less tolerant over time. “This results in an increased rate at which [posts from antisocial users] are deleted,” they say.

Interestingly, Cheng and co say that the differences between messages posted by people who are later banned and those who are not are so clear that it is relatively straightforward to spot them using a machine learning algorithm. “In fact, we only need to observe five to 10 user posts before a classifier is able to make a reliable prediction,” they boast.
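The article does not describe the classifier's model or features. As a purely hypothetical sketch of the general idea of predicting from a user's first few posts, the code below summarizes only the first `k` posts into two made-up features (an average per-post `quality` score in [0, 1] and the fraction of posts deleted by moderators) and fits a logistic regression with plain gradient descent. `Post`, `features`, `make_user` and the toy data are illustrative inventions, not Cheng and co's method or data.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    quality: float  # hypothetical per-post quality score in [0, 1], higher is better
    deleted: bool   # hypothetical flag: was the post removed by a moderator?


def features(posts: list[Post], k: int = 10) -> list[float]:
    """Summarize only a user's first k posts as [mean quality, fraction deleted]."""
    early = posts[:k]
    if not early:
        raise ValueError("need at least one post")
    mean_quality = sum(p.quality for p in early) / len(early)
    deleted_fraction = sum(p.deleted for p in early) / len(early)
    return [mean_quality, deleted_fraction]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: list[list[float]], y: list[int], lr: float = 1.0, epochs: int = 5000):
    """Fit logistic regression with batch gradient descent on the cross-entropy loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: list[float], b: float, x: list[float], threshold: float = 0.5) -> bool:
    """True means the model expects this user to be banned later."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


def make_user(quality: float, n_deleted: int, n_posts: int = 12) -> list[Post]:
    # Toy helper: every post has the same quality; the first n_deleted are deleted.
    return [Post(quality, i < n_deleted) for i in range(n_posts)]


# Synthetic toy data, not data from the study: 1 = later banned, 0 = never banned.
train_users = [make_user(0.2, 6), make_user(0.3, 4), make_user(0.25, 5),
               make_user(0.8, 0), make_user(0.7, 1), make_user(0.75, 0)]
labels = [1, 1, 1, 0, 0, 0]

X = [features(u, k=10) for u in train_users]
w, b = train_logistic(X, labels)
print(predict(w, b, features(make_user(0.22, 5))))  # a new user with troll-like early posts
```

Because `features` looks only at `posts[:k]`, anything a user writes after their tenth post has no effect on the prediction, which is what makes an early warning possible.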

That could turn out to be useful. Antisocial behavior is an increasingly severe problem that requires significant human input to detect and tackle. This process often means that antisocial users are allowed to operate for much longer than necessary. “Our methods can effectively identify antisocial users early in their community lives and alleviate some of this burden,” say Cheng and co.

Of course, care must be taken with any automated approach. One potential danger is needlessly banning users who are not antisocial but have been flagged as such by the algorithm. This false positive rate needs to be studied more carefully.
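The false positive rate is conventionally defined as FP / (FP + TN): of all users who should never be banned, the share the classifier flags anyway. A minimal sketch of that calculation, with a made-up toy example:

```python
def false_positive_rate(predicted: list[bool], banned: list[bool]) -> float:
    """FPR = FP / (FP + TN): the share of never-banned users the classifier flags."""
    if len(predicted) != len(banned):
        raise ValueError("predicted and banned must have the same length")
    false_pos = sum(p and not a for p, a in zip(predicted, banned))
    true_neg = sum(not p and not a for p, a in zip(predicted, banned))
    negatives = false_pos + true_neg
    return false_pos / negatives if negatives else 0.0


# Toy example: five users, only the first was actually banned.
flags = [True, True, False, False, True]
truth = [True, False, False, False, False]
print(false_positive_rate(flags, truth))  # 2 of 4 well-behaved users flagged -> 0.5
```

Scale is what makes this number matter: on a community with 10,000 well-behaved users, even a hypothetical 2% false positive rate would wrongly flag about 200 of them.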

Nevertheless, the work of moderators on sites that allow user comments could soon be made significantly easier thanks to Cheng and co’s approach.



