{"id":877,"title":"Guardians of Global Knowledge: How Automated Helpers Protect Wikipedia","link":"https:\/\/www.reframetech.de\/en\/2017\/11\/08\/waechter-des-weltwissens-wie-automaten-wikipedia-beschuetzen\/","date":"11\/08\/2017","date_unix":1510142580,"date_modified_unix":1641380815,"date_iso":"2017-11-08T12:03:00+00:00","content":"<p><em>Not only is the online encyclopedia Wikipedia an indispensable platform for open knowledge on the Internet, it is also a prime example of how people and algorithms work together hand in hand. After all, numerous bots assist the site\u2019s volunteer editors while also keeping \u201cvandals\u201d at bay.<\/em><\/p>\n<p><!--more--><\/p>\n<p>It\u2019s mid-October, early afternoon in the United States, and many students have just come home from school. That means it\u2019s rush hour for one of the most dedicated Wikipedia contributors: ClueBot NG, who functions as a gatekeeper. An unknown user enters the birthday of his 17-year-old girlfriend in the <strong><a href=\"https:\/\/en.wikipedia.org\/w\/index.php?title=April_27&amp;diff=prev&amp;oldid=805799492\" target=\"_blank\" aria-label=\"Opens in a new tab\" >list of historic events that have taken place on April 27<\/a><\/strong>. ClueBot NG deletes the entry and sends the user a warning. Another author using the pseudonym Cct04 posts a full-throated complaint \u2013 \u201cI DON\u2019T KNOW WHAT THIS IS\u201d \u2013 in the middle of an <strong><a href=\"https:\/\/en.wikipedia.org\/w\/index.php?title=HOME_Investment_Partnerships_Program&amp;diff=prev&amp;oldid=805800482\" target=\"_blank\" aria-label=\"Opens in a new tab\" >article about a government program that provides affordable housing<\/a><\/strong>. ClueBot NG deletes this vociferous comment. But this time he does more than just issue a warning. This is the third time this particular author has committed an act of vandalism, so ClueBot NG flags the user as a potential vandal. 
An administrator will now block Cct04 from contributing to the site.<\/p>\n<p><strong>Untiring fight against vandalism<\/strong><\/p>\n<p>ClueBot NG is not a human author, but a program that runs on the Wikimedia Foundation\u2019s servers \u2013 a so-called bot. That means something that the rest of the world is still only debating has long been reality at Wikipedia: automated \u201chelpers\u201d are deciding largely for themselves what humans can and cannot do. Who, for example, is allowed to add to the crowdsourced encyclopedia and who may not? What is appropriate content for an encyclopedia and what is not? ClueBot NG is one of 350 bots approved for use in the English-language version of Wikipedia. One in 10 of the 528 million editorial changes that have been made to the platform was carried out by a bot. There are also numerous semi-automated tools that make it possible for human editors to decide, with just a few clicks, how to deal with multiple entries. Without algorithmic decision-making, Wikipedia would look much different than it does today.<\/p>\n<p>ClueBot NG is one of the site\u2019s most active bots. On this October afternoon, the program will act up to 12 times per minute to fight vandals, warn trolls and thwart bored youngsters.<\/p>\n<p><strong>Spam filter for the world\u2019s knowledge<\/strong><\/p>\n<p>Without ClueBot NG\u2019s ceaseless efforts, human authors would have to remove unwanted entries from the online encyclopedia by hand. Vandalism is defined as any change that is malicious and not intended to contribute to Wikipedia in a serious way. This includes everything from deleting entries and trolling to meticulously plotted manipulations designed to influence a company\u2019s stock price. All too often authors find an article has been augmented with a picture of a penis and an obscene comment. 
Other additions are not considered vandalism: well-intentioned but poorly executed entries, for example, or when someone raises an issue about an article\u2019s content \u2013 unless the author has already been contacted and continues to ignore the stated rules.<\/p>\n<p>Yet bots can only intervene in the most obvious cases. Since the programs do not understand an article\u2019s context and cannot check the content behind an inserted link, their ability to combat vandalism is limited. A human reader must look at the text and decide whether the added information is truly appropriate.<\/p>\n<p>At the same time, bots have become largely indispensable. A <strong><a href=\"http:\/\/stuartgeiger.com\/wikisym13-cluebot.pdf\" target=\"_blank\" aria-label=\"Opens in a new tab\" >study by R. Stuart Geiger and Aaron Halfaker<\/a><\/strong> showed that when ClueBot NG is out of commission, content that is clearly vandalism is still deleted, but it takes twice as long and requires considerable human effort \u2013 time Wikipedians would rather spend writing articles.<\/p>\n<p><strong>Even algorithms make mistakes<\/strong><\/p>\n<p>ClueBot NG is guided by an artificial neural network. Its self-learning algorithms work the way a spam filter in an e-mail program does. ClueBot NG is constantly fed new data about which changes human Wikipedia authors consider vandalism and which are acceptable as content for the site\u2019s articles. The neural network then looks for similarities in the texts it examines, thereby generating rules for filtering content. The program uses those rules to calculate a value for every new Wikipedia entry, signaling the degree to which an edit corresponds to the known patterns of vandalism. Other parameters are also considered, such as a user\u2019s past contributions. If the value exceeds a certain threshold, the bot goes to work.<\/p>\n<p>Naturally, the program is not perfect. 
Immediately after Cct04 was blocked, ClueBot\u00a0NG also deleted an addition by a new user named Ariana, who noted that the actor Ellie Kemper was voicing a role in the children\u2019s television series <em>Sofia the First.<\/em> It was true \u2013 but it was also something that ClueBot NG could not have known.<\/p>\n<p>All that matters to the computer program is that a new user has made a change which looks like vandalism: It\u2019s brief, contains a lot of names and is not \u201cwikified\u201d \u2013 i.e. does not conform to the conventions used for Wikipedia texts. The bot always leaves a detailed technical explanation and a link on the user page that allows authors to report an incorrect deletion. But Ariana does not respond and does not add anything to the site over the next few days. It seems that Wikipedia has lost a new contributor.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-935\" src=\"https:\/\/www.reframetech.de\/en\/wp-content\/uploads\/sites\/23\/2017\/11\/Wikipedia-English.png\" alt=\"\" width=\"1024\" height=\"512\" srcset=\"https:\/\/www.reframetech.de\/wp-content\/uploads\/sites\/23\/2017\/11\/Wikipedia-English.png 1024w, https:\/\/www.reframetech.de\/wp-content\/uploads\/sites\/23\/2017\/11\/Wikipedia-English-768x384.png 768w, https:\/\/www.reframetech.de\/wp-content\/uploads\/sites\/23\/2017\/11\/Wikipedia-English-600x300.png 600w, https:\/\/www.reframetech.de\/wp-content\/uploads\/sites\/23\/2017\/11\/Wikipedia-English-780x390.png 780w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<p><em>Wikipedia is regularly exposed to vandalism attacks. 
It isn&#8217;t easy to distinguish well-intentioned but inexperienced users from those who deliberately ignore the rules or even aim to spread disinformation.<\/em><\/p>\n<p><strong>One of the open Internet\u2019s main pillars<\/strong><\/p>\n<p>\u201cIn 2003, the English-language Wikipedia community was still of the opinion that it would be better not to use bots. Today, however, bots are an everyday tool at Wikipedia,\u201d says Claudia M\u00fcller-Birn, a professor at Freie Universit\u00e4t Berlin who researches human-computer collaboration and has carried out several <strong><a href=\"https:\/\/www.clmb.de\/publications.html\" target=\"_blank\" aria-label=\"Opens in a new tab\" >studies on algorithms used by Wikipedia<\/a><\/strong>.<\/p>\n<p>Since Wikipedia\u2019s anarchic beginnings in 2001, the task of putting together a reliable encyclopedia has become increasingly complicated. On the one hand, the number of articles has grown, with the English version containing 5.5 million entries as of October 2017 and the German version 2.1 million. That means routine activities like fixing incorrect links or sorting lists can no longer be done by hand. On the other, the required tasks have become more complex. It is not so easy, for example, to add an accurate reference list to an article, and even creating info boxes requires a thorough introduction to Wikipedia\u2019s guidelines.<\/p>\n<p>As it has grown, Wikipedia has gained visibility. For instance, Google now regularly includes Wikipedia content in its search results, which means incorrect information \u2013 the news that someone who is still alive has died, for example \u2013 can spread within minutes if it is not immediately deleted from the site. 
This is where Wikipedia\u2019s good reputation collides with its open design.<\/p>\n<p><strong>Not just one more social network<\/strong><\/p>\n<p>In addition, the Wikipedia community has to deploy its contributors wisely: While there were more than 50,000 Wikipedia authors at work on the English site in 2007, today there are only about 30,000. Helpers such as bots are not only meant to reduce the authors\u2019 workload, they can also prevent human contributors from becoming frustrated. After all, anyone who has spent hours researching their hometown or a distant galaxy in order to create an encyclopedia-worthy article does not want the information they have provided ruined by nonsensical additions.<\/p>\n<iframe id=\"datawrapper-chart-HdJp9\" src=\"\/\/datawrapper.dwcdn.net\/HdJp9\/1\/\" scrolling=\"no\" frameborder=\"0\" allowtransparency=\"true\" allowfullscreen=\"allowfullscreen\" webkitallowfullscreen=\"webkitallowfullscreen\" mozallowfullscreen=\"mozallowfullscreen\" oallowfullscreen=\"oallowfullscreen\" msallowfullscreen=\"msallowfullscreen\" style=\"width: 0; min-width: 100% !important;\" height=\"586\"><\/iframe><script type=\"text\/javascript\">if(\"undefined\"==typeof window.datawrapper)window.datawrapper={};window.datawrapper[\"HdJp9\"]={},window.datawrapper[\"HdJp9\"].embedDeltas={\"100\":721,\"200\":667,\"300\":640,\"400\":613,\"500\":613,\"600\":586,\"700\":586,\"800\":586,\"900\":586,\"1000\":586},window.datawrapper[\"HdJp9\"].iframe=document.getElementById(\"datawrapper-chart-HdJp9\"),window.datawrapper[\"HdJp9\"].iframe.style.height=window.datawrapper[\"HdJp9\"].embedDeltas[Math.min(1e3,Math.max(100*Math.floor(window.datawrapper[\"HdJp9\"].iframe.offsetWidth\/100),100))]+\"px\",window.addEventListener(\"message\",function(a){if(\"undefined\"!=typeof a.data[\"datawrapper-height\"])for(var b in 
a.data[\"datawrapper-height\"])if(\"HdJp9\"==b)window.datawrapper[\"HdJp9\"].iframe.style.height=a.data[\"datawrapper-height\"][b]+\"px\"});<\/script>\n<p><em>In the early years of Wikipedia, it seemed as if human labor was available in almost endless supply. But while Wikipedia is continuously growing, the number of contributors is stagnating.<\/em><\/p>\n<p>The key point here is that Wikipedia isn\u2019t interested in becoming just one more social platform. The site does not want readers to have to decide for themselves which entries they can trust. Incorrect information doesn\u2019t just fade from sight as it would on a timeline, where entries gradually disappear into the past. Thus, Wikipedians do their best to ensure a consensus exists on what is true and correct.<\/p>\n<p>Wikipedians don\u2019t view the site as a finished encyclopedia, but as a \u201cproject for creating an encyclopedia\u201d \u2013 a project that is never supposed to be complete. In order to run a project that is never supposed to end, you need tenacious, untiring workers. ClueBot NG is one of them.<\/p>\n<p><strong><em>Wikipedians have learned to shape the social impact of algorithms in a positive way \u2013 for example by introducing registration procedures and rules for robots. More on that in <a href=\"https:\/\/www.reframetech.de\/en\/2017\/12\/06\/my-colleague-the-robot-how-people-and-automated-assistants-work-together-at-wikipedia\/\">Part 2<\/a> of our series. Subscribe to our <\/em><\/strong><a href=\"https:\/\/www.reframetech.de\/feed\/\" target=\"_blank\" aria-label=\"Opens in a new tab\" ><strong><em>RSS feed<\/em><\/strong><\/a><strong><em> or <\/em><\/strong><a href=\"https:\/\/www.reframetech.de\/newsletter\/\" target=\"_blank\" aria-label=\"Opens in a new tab\" ><strong><em>e-mail newsletter<\/em><\/strong><\/a> <strong><em>to find out when new posts are added to this blog. 
<\/em><\/strong><\/p>\n","excerpt":"<p>Not only is the online encyclopedia Wikipedia an indispensable platform for open knowledge on the Internet, it is also a [&hellip;]<\/p>\n","thumbnail":"https:\/\/www.reframetech.de\/wp-content\/uploads\/sites\/23\/2017\/11\/Wikibots1.jpg","thumbnailsquare":"https:\/\/www.reframetech.de\/wp-content\/uploads\/sites\/23\/2017\/11\/Wikibots1-370x370.jpg","authors":[{"id":648,"name":"Torsten Kleinz","link":"https:\/\/www.reframetech.de\/blogger\/torsten-kleinz\/"}],"categories":[{"id":2,"name":"Uncategorized","link":"https:\/\/www.reframetech.de\/en\/category\/uncategorized\/"}],"tags":[{"id":76,"name":"Bots","link":"https:\/\/www.reframetech.de\/en\/tag\/bots-en\/"},{"id":78,"name":"Wikipedia","link":"https:\/\/www.reframetech.de\/en\/tag\/wikipedia-en\/"}]}