STiki Help Document (v2.1)

STiki is a utility for the detection and reversion of vandalism on Wikipedia. STiki uses state-of-the-art detection methods to determine which edits should be shown to end users. If a displayed edit is vandalism, STiki then streamlines the reversion and warning process. Critically, STiki is a collaborative approach to reverting vandalism, not a user-centric one -- the list of edits to be inspected is consumed in a crowd-sourced fashion. Architecturally, STiki consists of two high-level components:
  • Backend processing that watches all changes to Wikipedia and calculates/fetches the probability that each is vandalism. Multiple techniques are applied to each edit -- some developed by STiki's authors, others by third parties. These scores are used to enqueue edits, such that those most likely to be vandalism are shown to end-users first. Backend work is done on STiki's servers and is likely not of interest to casual users.

  • A frontend GUI that presents likely-vandalism (found on the backend) to human users for classification. The interface is designed to enable the quick review and reversion of poor edits. Moreover, the classification process establishes a feedback loop to improve detection algorithms.

    This document concerns itself primarily with the user-facing GUI. Once installed, STiki requires only that: (1) an Internet connection is maintained (with outbound port 3306 [MySQL] open), and (2) the user has a Wikipedia account with sufficient permissions, having either: (a) the rollback permission, (b) more than 1000 edits to Wikipedia's article namespace, or (c) explicit access, as can be requested at STiki's talk page. This document proceeds by discussing the STiki components in greater depth.
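
    As an aside, a quick way to verify requirement (1) is a simple socket test against the MySQL port. Below is a minimal, self-contained sketch; the host name is a placeholder, not STiki's actual server address:

      import java.net.InetSocketAddress;
      import java.net.Socket;

      public class PortCheck {
          public static void main(String[] args) {
              // "stiki.example.org" is a placeholder; use the actual server host
              try (Socket s = new Socket()) {
                  s.connect(new InetSocketAddress("stiki.example.org", 3306), 5000);
                  System.out.println("Outbound port 3306 (MySQL) is open");
              } catch (Exception e) {
                  System.out.println("Could not reach port 3306: " + e.getMessage());
              }
          }
      }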

    In summary, the backend engine populates a number of edit queues -- one for each scoring technique -- determining which edits a human will classify. In particular, the metadata scoring technique and anti-link-spam technique, both developed by STiki's authors, are discussed in some detail. Having selected a queue to use, an end-user may use revision filters to further constrain the edits displayed.

    Edits are ultimately presented in the diff browser and their metadata in the edit properties panel. After inspection, a classification is submitted via the classification buttons. A login panel and reversion comment panel customize how STiki interacts with Wikipedia, and the last revert panel summarizes completed actions. Some discussion on the expected performance of STiki is also provided.

    Also included in this distribution is the Offline Review Tool (ORT). The ORT takes a user-provided set of edits, presents them in a reduced-functionality setting (no changes to Wikipedia are made), and writes binary classifications. The ORT should prove useful for corpus-building researchers.



    Revision Queues:

    The "Queue" menu allows a user to select which scoring system is the basis for the edits that will be displayed. Here we briefly describe the different queues available (some of which may be disabled for various reasons):
  • Cluebot-NG: Cluebot-NG (CBNG) is a bot which operates on Wikipedia, using an Artificial Neural Network (ANN) to score edits. The worst scoring edits are undone automatically. However, there are many edits that CBNG is quite confident are vandalism, but cannot revert due to a low false-positive tolerance, the one-revert-rule, or other constraints. These edits form the queue used by STiki. CBNG publishes scores to an IRC feed on which the STiki backend listens. Due to its expected performance, this is currently the default queue.

  • STiki (metadata): This scoring system uses the metadata associated with an edit to arrive at vandalism predictions. Metadata includes properties such as the time-of-day an edit was made, the geographical location from which it originated, and the page on which it was made. More detail about this technique is available in the STiki Scoring System section.

  • WikiTrust: The WikiTrust system of Adler et al. is built upon editor reputations calculated from content persistence. More details are available at their website. WikiTrust calculates scores on its own servers, which are accessible via a publicly available API.

  • Anti-link-spam: This queue targets a subset of the vandalism problem, external link spam. When selected, this queue makes minor changes to the functionality described in this document: (1) the "vandalism" classification button becomes a "spam" one, (2) the default revision comment is altered slightly, and (3) anti-spam warnings are issued on user talk pages instead of generalized "vandalism" ones. More details about the technique can be found in the Anti-link-spam Scoring section.

  • Meta (combination): When implemented, this queue will act as a higher-order meta-classifier. That is, it will combine the logic of all other scoring queues in an attempt to produce an even more accurate system.
    An additional option in the queue menu allows a user to view "recent usage stats". The resulting dialog shows usage statistics for the prior hour. Large amounts of STiki use in a short period of time may "exhaust" queues and temporarily decrease vandalism hit-rates. A user noticing a high density of recent classifications may wish to return to the tool at a later time or use one of the less popular queues.

    A final option in the queue menu allows one to "generate leaderboard". This locally generates a slightly simplified version of the public STiki leaderboard. It should prove useful for those looking to examine their intra-daily stats and relative performance (whereas the public version only updates once daily).

    ^ Return to Top ^



    STiki Scoring System:

    Here a particular scoring system is highlighted, one focused on metadata processing. Why this one? First, it was the original queue developed for the GUI. As a result, it shares a code-base (and name) with the STiki frontend. Second, the name "STiki" is derived from the metadata technique (Spatio-Temporal analysis over Wikipedia) -- though this acronymic meaning is downplayed when discussing the frontend. Lastly, the same people who wrote this document (and all the STiki code) are vain and promote their own system.

    The STiki queue ("metadata queue", hereafter) examines only 4 fields of an edit when scoring: (1) timestamp, (2) editor, (3) article, and (4) revision comment. These are used to calculate features pertaining to the editor's registration status, edit time-of-day, edit day-of-week, geographical origin, page history, category memberships, revision comment length, etc. These signals are given to a typical machine learning classifier (ADTree) to arrive at edit scores. A rigorous discussion of the technique can be found in:
    West, A.G., Kannan, S., and Lee, I. (2010). Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata. In EUROSEC '10: Proceedings of the Third European Workshop on System Security. Paris, France. (A preliminary version was published as UPENN-MS-CIS-10-05).
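
    To make the pipeline concrete, the following is a minimal sketch of how the four raw fields might be turned into a feature vector for a classifier. This is not STiki's actual code; all class and method names here are hypothetical, and the real system computes many more features:

      import java.time.ZonedDateTime;
      import java.util.Arrays;

      /** Hypothetical sketch: four raw metadata fields -> numeric feature vector. */
      public class MetadataFeatures {

          public static double[] extract(ZonedDateTime timestamp, String editor,
                                         String article, String comment) {
              // A bare IPv4 address as the user name implies an unregistered editor
              boolean anonymous = editor.matches("\\d{1,3}(\\.\\d{1,3}){3}");
              return new double[] {
                  anonymous ? 1.0 : 0.0,                // registration status
                  timestamp.getHour(),                  // edit time-of-day
                  timestamp.getDayOfWeek().getValue(),  // edit day-of-week
                  comment.length()                      // revision-comment length
                  // the real system adds geolocation, page history, categories, etc.
              };
          }

          public static void main(String[] args) {
              double[] f = extract(ZonedDateTime.now(), "192.0.2.1",
                                   "Example article", "fixed typo");
              System.out.println(Arrays.toString(f)); // vector handed to the classifier
          }
      }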

    That paper was an academic attempt to show that language properties were not necessary to detect Wikipedia vandalism. It succeeded in this regard, but since then the system has been relaxed for general-purpose use. For example, metadata processing now includes some simple NLP features. Moreover, there was the decision to integrate other revision queues in the GUI frontend.

    A public-facing API has been made available for the STiki/metadata scores. More can be learned about the software at STiki's website (including other citations appropriate for academic use).

    ^ Return to Top ^



    Anti-link-spam Scoring System:

    Another scoring system targets the subset of vandalism referred to as external link spam -- hyperlinks pointing off-wiki which are inappropriate in nature. Intuitively, these additions are interesting because the editor that adds them presumably stands to profit (monetarily or otherwise), making him/her a more motivated attacker than the immature ones seen in common vandalism.

    The scoring system operates by processing every external link addition to en.wiki. Features are derived from: (1) Wikipedia metadata: similar to those calculated by the metadata queue. (2) Landing site processing: obtaining the source of the URL added and processing it for commercial intention, SEO strategies, etc. (3) Third-party information: Alexa and the Google Safe-Browsing project are queried to glean additional information. The resulting feature set is handed off to a machine-learning algorithm (ADTree) to produce the spam probabilities/scores used to enqueue edits for GUI users. A more rigorous discussion of the technique can be found in:
    West, A.G., Agarwal, A., Baker, P., Exline, B., and Lee, I. (2011). Autonomous Detection of Link Spam in Purely Collaborative Environments. In WikiSym '11: Proceedings of the Seventh International Symposium on Wikis and Open Collaboration, Mountain View, CA, USA.
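
    To give a rough feel for how the three feature sources combine, consider the sketch below. It is hypothetical; every method name is invented, and the placeholder bodies stand in for real URL fetches and third-party queries:

      /** Hypothetical sketch: assembling the three link-spam feature sources. */
      public class LinkSpamFeatures {

          public static double[] features(String url, double[] wikiMetadata) {
              double[] landing = landingSiteFeatures(url);    // (2) landing site processing
              double[] thirdParty = thirdPartyFeatures(url);  // (3) third-party information
              // Concatenate all sources before handing off to the ADTree classifier
              double[] all = new double[wikiMetadata.length + landing.length + thirdParty.length];
              int i = 0;
              for (double[] part : new double[][] { wikiMetadata, landing, thirdParty })
                  for (double v : part)
                      all[i++] = v;
              return all;
          }

          // Placeholder bodies: real versions would fetch the URL's source and
          // query services such as Alexa or Google Safe-Browsing, respectively.
          private static double[] landingSiteFeatures(String url) { return new double[] { 0.0 }; }
          private static double[] thirdPartyFeatures(String url)  { return new double[] { 0.0 }; }
      }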

    The "anti-link-spam" queue is described here because it was written by STiki's authors. The source-code is separate from the core-STiki engine, but also available at the STiki website.

    ^ Return to Top ^



    Revision Filters:

    The various revision queues contain all edits made on Wikipedia, sorted by vandalism probability. If a STiki user wants to further constrain the edits they review, then local revision filters should be applied. These are accessible via the STiki menu-bar. The following filters are currently implemented:
  • Small Numerical Edits: When enabled, "minor numerical" edits will be shown. These are defined to be edits where only a single numerical value is changed, and the magnitude of that change is (0.5 < change < 2.0). Disabling this filter will prevent such edits -- which some users find monotonous and difficult to casually assess -- from being shown.

  • Edits by Privileged Users: When enabled, edits made by users with "sysop", "rollback", or "review" permissions will be shown. Disabling this option will cause the GUI to only show edits made by less privileged users.

  • Namespace-Zero (NS0): When enabled, this filter permits only edits that reside in NS0 -- which constitutes the encyclopedic content. Other namespaces are reserved for article "talk" pages, user pages, etc.

  • Anonymous User Edits: When enabled, this filter permits edits made by anonymous (IP) editors. The bulk of Wikipedia vandalism is caused by editors of this kind, and the presence of an IP address may enable additional analysis for some queues.

  • Registered User Edits: When enabled, this filter permits edits made by registered (named) editors to be displayed.

  • Only Most Recent on Page: When enabled, this filter permits only edits which are the most-recent edit on the article on which they were made (or part of a rollback chain ending at the most recent edit). Assume edit x is under inspection. If x is vandalism, and x+1 exists, the latter might have already undone the vandalism. Further, reverting x may also undo constructive content added by revision x+n. Enabling this filter simplifies the diff-presentation, and is required in the current version.

    Currently, a number of filters cannot be disabled. These are indicative of simplifications that will be relaxed in future versions. Further, a language is being developed which will allow users to author their own filters.
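
    Conceptually, each filter is just a predicate over a candidate edit: an edit is displayed only if every enabled filter accepts it. A minimal sketch follows (the types are hypothetical; STiki's internals differ):

      import java.util.List;
      import java.util.function.Predicate;

      /** Hypothetical sketch: revision filters as composable predicates. */
      public class FilterChain {
          record Edit(boolean privilegedAuthor, int namespace, boolean mostRecentOnPage) {}

          public static void main(String[] args) {
              List<Predicate<Edit>> enabled = List.of(
                  e -> e.namespace() == 0,        // Namespace-Zero (NS0)
                  e -> e.mostRecentOnPage(),      // Only Most Recent on Page
                  e -> !e.privilegedAuthor()      // "Edits by Privileged Users" disabled
              );
              Edit candidate = new Edit(false, 0, true);
              boolean display = enabled.stream().allMatch(p -> p.test(candidate));
              System.out.println(display ? "display edit" : "discard edit");
          }
      }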

    ^ Return to Top ^



    Diff Browser:

    If edit x has been popped off the selected revision queue and cleared all enabled revision filters, the 'diff' between edit x and the previous edit on the same article (edit x-1, hereafter) is rendered in the diff browser. It may be the case that x-1 was written by the same author as x, in which case we have a "rollback" situation. In this case the diff summarizes all changes by a single user, and any revert action will undo the same breadth of changes (thus we diff x against some x-n).
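
    To illustrate how the diff base x-n is chosen, consider the simplified sketch below (hypothetical types; the real implementation works against live revision histories):

      import java.util.List;

      /** Hypothetical sketch: picking the base revision x-n for the diff/revert. */
      public class RollbackBase {
          record Revision(long rid, String author) {}

          /**
           * history is ordered newest-first: history.get(0) is edit x. Walk back
           * while the author is unchanged; the first revision by a different
           * author is the version a rollback would restore (i.e., x-n).
           */
          static Revision diffBase(List<Revision> history) {
              String author = history.get(0).author();
              for (Revision r : history.subList(1, history.size()))
                  if (!r.author().equals(author))
                      return r;
              return null; // entire visible history is by one author
          }

          public static void main(String[] args) {
              List<Revision> h = List.of(new Revision(103, "A"), new Revision(102, "A"),
                                         new Revision(101, "B"));
              System.out.println("diff against RID " + diffBase(h).rid()); // 101
          }
      }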

    At the top-center of the diff browser is the name of the article on which x and x-n reside. If n>1 (a "rollback" situation rather than a simple revert), this fact is noted in purple font just beneath the article title. Then, the diff is visually displayed in a vertically divided, yet aligned, manner. The left-hand side shows relevant portions of x-n, and the right-hand side shows the corresponding portions of x.

    The presentation is intuitive. If content exists on the left side but the corresponding right side is empty, this is indicative of a text block which edit x deleted. Conversely, a populated right side with an empty left side shows a text block which edit x added. When text is modified or fine-granularity changes are made, there is content on both the left- and right-hand sides and the changed text is bold-faced red.

    Otherwise, the diff browser is unremarkable. The diff is purely text-based -- image/table/figure content is not rendered and will instead show only the `code-behind' (i.e., the markup language used to add the special content). An option in the "options" menu permits external hyperlinks/references (i.e., any text of the form "https://...") to be activated (made click-able). These links will open in the user's default web browser. While this may be helpful in detecting spamming and other undesirable linking behaviors, do note that external pages may contain offensive, explicit, or malicious content.

    ^ Return to Top ^



    Edit Properties (Metadata) Panel:

    The edit properties panel displays metadata information about the edit currently being displayed in the diff browser. The panel is straightforward, with no settings that can be manipulated by an end-user. It does, however, provide links to the page/diff/user-pages on Wikipedia. These should be used in borderline cases where more investigation is warranted.

    One functionality of note is the display of user permissions. The "permissions", "rights", or "groups" of the editor under inspection are output parenthetically next to the user name. Only relevant permissions are displayed, and these include: "autoconf" (autoconfirmed), "conf" (confirmed), "review" (reviewer), "rb" (rollbacker), and "admin" (sysop). If a registered user has none of these, then "no permissions" will be displayed. No information is displayed about IP editors (as they can have no such permissions).

    ^ Return to Top ^



    Classification Panel:

    Assuming an edit-diff (say, x against x-n) has been rendered into the diff browser, it is the responsibility of the user to determine the worth of revision x (or the entire user chain) and press the appropriately labeled classification button:
  • Vandalism/Spam: If clicked, the currently displayed edit(s) will be reverted/un-done on Wikipedia. This action constitutes an actual change to live Wikipedia content and should not be abused. Further, the classification is sent to the STiki backend to improve scoring systems. If another edit occurs on the page in question after the edit was displayed but before the button is pressed, then no change will be made to Wikipedia. After the button is pressed, the last revert panel will display whether the revert actually took place (as well as any warnings issued). The 4im button functions identically to the adjacent "vandalism/spam" one, but will hasten the warning hierarchy to issue a "final warning" (or AIV report), assuming that warning criteria are met.

  • Good-faith Revert: This is reserved for cases where an edit is clearly unconstructive, but does not constitute vandalism/spam. It is a "nice way" to undo damage from clueless or well-intending authors. The edit will be reverted/rolled-back on Wikipedia, but no warning will be issued.

  • Pass: In cases where the user is unsure of an edit's merit, this option should be chosen. The edit x is allowed to persist on Wikipedia, and no feedback is sent to the STiki backend. In particular, the edit in question may be presented/assigned to another STiki client. Further, edit x will never again be shown to the current user.

  • Innocent: Edits not displaying vandalism/spam or other issues are termed 'innocent.' If clicked, edit x is allowed to persist on Wikipedia, and feedback is sent to the STiki backend to improve future scoring attempts.

    When any classification button is pressed, the revision queue will be popped, the revision filters checked, and a new diff will be rendered in the diff browser. Thus, after initially tweaking the settings, a user should interact primarily with the classification panel.
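
    The effect of each classification can be summarized in code form. The sketch below is hypothetical -- the method names are invented stand-ins for the behaviors described above:

      /** Hypothetical sketch: the effect of each classification button. */
      public class Classify {
          enum Classification { VANDALISM_OR_SPAM, GOOD_FAITH_REVERT, PASS, INNOCENT }

          static void handle(Classification c) {
              switch (c) {
                  case VANDALISM_OR_SPAM -> { revert(); warn(); sendFeedback(true); }
                  case GOOD_FAITH_REVERT -> revert();            // revert, but no warning
                  case PASS              -> { }                  // edit persists; no feedback
                  case INNOCENT          -> sendFeedback(false); // feedback only
              }
              advanceToNextDiff(); // queue popped, filters checked, new diff rendered
          }

          private static void revert()                    { /* undo edit(s) on Wikipedia */ }
          private static void warn()                      { /* warn offender, per criteria */ }
          private static void sendFeedback(boolean isBad) { /* notify the STiki backend */ }
          private static void advanceToNextDiff()         { }
      }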

    To this end, hotkeys simplify user-interaction with the GUI. Simple presses of the i, p, g, or v/s key perform classification (innocent - pass - good faith - vandalism/spam), assuming the button panel has focus. Focus is obtained by performing a single mnemonic (ALT+[key]) or mouse-driven classification. Interaction with a non-classification GUI component will cause the panel to lose focus.

    Finally, the panel has a back button that allows users to revisit the previous edit displayed. Certain criteria disable the back button: (1) "Back" cannot be used after a reversion, and (2) A user may only go "back" a single edit. After a prior edit is re-classified, the diff-browser will re-show the edit displayed when "back" was pressed.

    ^ Return to Top ^



    Login Panel:

    As discussed in the classification button section, pressing the "vandalism/spam" button initiates a Wikipedia reversion. The login panel allows the STiki-user to control the Wikipedia-user to which this reversion (an edit) maps.

    A STiki user may choose to log in or edit anonymously (note that anonymous editing may be disabled in order to prevent abuse). Log-in credentials are those provided by Wikipedia. It is suggested that a user log in so his/her contributions may be persistently tracked. Login credentials are never stored by STiki; they are passed directly to the Wikipedia API. Credentials must be obtained directly from Wikipedia.

    The login panel also provides several watchlisting options. These should be self explanatory.

    ^ Return to Top ^



    Reversion Comment Panel:

    As discussed in the classification button section, pressing the "vandalism/spam" or "good-faith" revert buttons initiates a Wikipedia reversion. The login panel describes how the user associated with this reversion is set. Similarly, the reversion comment panel allows one to set the revision comment that is left with the reversion and whether the offending user should be 'warned.'

    Use of the panel is intuitive. Further, 'placeholders' allow more personalized comments to be left without the need to re-author the comment prior to each reversion. Currently implemented placeholders are:
  • #u# - User (guilty) who made the edit being reverted.
  • #p# - User whose previous version is being reverted back to.
  • #a# - Article on which the offending edit was placed.
  • #q# - Quantity of edits that will be reverted.
  • #s# - Plurality switch. Returns "s" if #q# > 1; "" otherwise.
  • #t# - Time (in seconds) which offending edit survived.
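
    Placeholder expansion amounts to simple string substitution. A minimal sketch follows (hypothetical code; STiki's actual implementation may differ):

      /** Hypothetical sketch: expanding reversion-comment placeholders. */
      public class CommentPlaceholders {
          public static String expand(String template, String guilty, String restoredTo,
                                      String article, int quantity, long secondsLived) {
              return template
                  .replace("#u#", guilty)                       // offending user
                  .replace("#p#", restoredTo)                   // user reverted back to
                  .replace("#a#", article)                      // article name
                  .replace("#q#", Integer.toString(quantity))   // number of edits undone
                  .replace("#s#", quantity > 1 ? "s" : "")      // plurality switch
                  .replace("#t#", Long.toString(secondsLived)); // survival time (seconds)
          }

          public static void main(String[] args) {
              System.out.println(expand(
                  "Reverted #q# edit#s# by #u# on #a# (survived #t# seconds)",
                  "198.51.100.7", "GoodEditor", "Example article", 2, 45));
              // Output: Reverted 2 edits by 198.51.100.7 on Example article (survived 45 seconds)
          }
      }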

    Additionally, this panel provides the opportunity to warn users. A warning is a textual message which discourages users from continuing their disruptive behavior. Warnings may only be left with "vandalism/spam" grade classifications. Warnings are issued according to the following criteria:
  • IP editors will only be warned if the vandalism is relatively recent (on the order of hours). Registered users will be warned regardless.
  • A warning will not be placed if the offending editor is blocked.
  • A warning is only placed if the reversion/rollback succeeds.
  • The warning left will be per template {{subst:uw-vandalism#}} for vandalism or {{subst:uw-spam#}} for spam, where # is the severity of the warning (1 to 4). A best attempt is made to determine what existing, temporally-relevant warnings exist on the page and elevate the severity by one level.
  • If a 4th-level vandalism/spam warning has been issued and ignored -- a warning will not be issued, but instead a post will be made to AIV (Administrator Intervention against Vandalism), likely resulting in the offending user being blocked.
  • An AIV post will not be made if the offending edit was made prior to the time when the user received their final associated warning (i.e., a level-4 warning).
  • If a non-vandalism 4im warning has recently been issued to the offending user, STiki will report the user to AIV. The posting will note the unusual nature of this request.
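
    To make the escalation step concrete, here is a simplified sketch. The code is hypothetical; the deployed logic also accounts for warning recency, blocks, and revert success, per the criteria above:

      /** Hypothetical sketch: choosing the next warning template or an AIV report. */
      public class WarningEscalation {

          /**
           * priorSeverity: severity (1-4) of existing, temporally-relevant warnings
           * on the offender's talk page, or 0 if there are none.
           */
          static String nextAction(boolean spam, int priorSeverity, boolean level4Ignored) {
              if (level4Ignored)
                  return "REPORT_TO_AIV"; // final warning already issued and ignored
              int severity = Math.min(priorSeverity + 1, 4); // elevate one level, cap at 4
              return "{{subst:uw-" + (spam ? "spam" : "vandalism") + severity + "}}";
          }

          public static void main(String[] args) {
              System.out.println(nextAction(false, 2, false)); // {{subst:uw-vandalism3}}
              System.out.println(nextAction(true, 4, true));   // REPORT_TO_AIV
          }
      }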

    The warning which is ultimately issued (if any) is output to the last revert panel, along with any necessary justification.

    ^ Return to Top ^



    Last Revert Panel:

    After a reversion has been attempted (via the "vandalism/spam" or "good-faith revert" classification buttons), the last revert panel displays the outcome of that reversion, as well as any warnings that were issued (per the option in the reversion comment panel).

    The information the panel displays is intuitive. However, it is important to note that the panel sometimes lags behind other components. Reverting an edit on Wikipedia is a non-trivial process which sometimes requires several seconds. So as not to delay the STiki user experience, the GUI proceeds without waiting for the reversion to complete, allowing it to execute in the background. The last revert panel is eventually updated when the revert finishes.
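
    This is the classic pattern of moving slow network work off the GUI's event thread. A minimal sketch using Java's SwingWorker follows; it is illustrative only and not necessarily how STiki structures its code:

      import javax.swing.JLabel;
      import javax.swing.SwingWorker;

      /** Hypothetical sketch: revert in the background, update the panel when done. */
      public class BackgroundRevert {
          static void revertAsync(long rid, JLabel lastRevertPanel) {
              new SwingWorker<String, Void>() {
                  @Override protected String doInBackground() throws Exception {
                      // Slow network call: issue the revert via the Wikipedia API.
                      Thread.sleep(2000); // stand-in for the real request
                      return "Reverted RID " + rid;
                  }
                  @Override protected void done() { // runs on the event-dispatch thread
                      try { lastRevertPanel.setText(get()); }
                      catch (Exception e) { lastRevertPanel.setText("Revert failed"); }
                  }
              }.execute(); // the GUI proceeds immediately; the panel updates later
          }
      }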

    Finally, the last revert panel provides links so that the contributions of guilty users may be examined. This may prove especially useful if you notice that a user has received a high-level warning or been reported to AIV -- in which case his/her prior edits are likely to contain additional instances of vandalism/spam (if not already reverted).

    ^ Return to Top ^



    STiki Performance:

    We now briefly contextualize STiki relative to other vandalism-fighting efforts and estimate the expected vandalism hit-rate (i.e., the percentage of time an edit shown on the STiki frontend is actually vandalism):
  • Other Vandalism-Fighting Tools: There are many tools in both literature and active use on Wikipedia that aim to curb vandalism. STiki does not intend to compete against these alternative systems, but to complement them. STiki's novelty lies in the fact it is an intelligent routing tool, directing humans to those edits needing inspection. In some ways, STiki is similar to the vandalism reversion GUI, Huggle. However, Huggle uses simple static rules to prioritize edits, whereas STiki uses dynamic machine-learning strategies. Further, STiki is written in Java and is platform independent.

  • Expected Performance: The performance of STiki is ultimately dependent on the revision queue being utilized. In some cases, offline/academic studies have measured the performance of these systems (for example, the STiki scoring system scored around a 50% hit-rate). In practice, expected performance is much harder to quantify when running in a live fashion. Competing (often autonomous) tools cut performance rates dramatically, since STiki users won't see vandalism that has already been reverted. Even so, STiki has proven particularly adept at finding embedded vandalism, that which escapes initial detection.

  • Improving Performance: STiki development is active -- this means not only changes to the frontend GUI, but also pending improvements to the backend scoring systems. Every STiki classification helps to refine how (some) scoring systems are trained. Further, backend modifications can generally occur seamlessly, without the need for GUI users to re-install/update.

  • Depth-of-Search: Realize that performance measures (both actual and user-perceived) depend on the extent to which the revision queues are "populated" (i.e., how full they are). While not formally defined, it is intuitive that a large user-base will progress deep into the queues. In such a circumstance, popped revisions will have a lower probability of being vandalism than if few users were occasionally using the tool. With greater depth-of-search come decreased detection-rates.

    Suggestions for improving STiki performance are welcomed. Feel free to contact the STiki author(s), whose contact information is available via the "About STiki" menu, or post discussion to STiki's talk page.

    ^ Return to Top ^



    Offline Review Tool (ORT):

    The STiki Offline Review Tool (ORT) is a reduced-functionality version of STiki that operates over user-provided edits. These edits need not pass any filters to be displayed. Accordingly, reversion and other "live" functionality is disabled. Binary classifications can be made (per whatever property the user chooses) and written to an output file. The ORT is accessible via the "File -> Launch ORT" menu of STiki. We now provide specifics about the tool:
  • Input File: When ORT is launched, a file selection dialog is displayed. The input file should be a text file containing solely one Wikipedia revision-ID (RID) per line. Input is unchecked, and behavior is undefined for input not meeting this specification.

  • ORT Operation: Operation occurs per the classification buttons -- mark and un-mark -- or their hot-keys (as bolded). This button ambiguity is intentional, allowing users to note whatever edit properties are of interest. Each classification advances the diff-browser to the next edit, until the input-file/queue has been exhausted. Due to threading considerations, presentation order (and thus, output order) may not match that of the input file. RIDs which do not exist or have been deleted from Wikipedia will be silently skipped.

  • Output File: Classifications are written to a CSV output file, with lines of the form "RID,CLASS" (CLASS=0 for 'un-marked' classifications, and CLASS=1 for 'marked' ones). The output file will have the same name and location as the input file, but the extension will be changed (or added) to *.marked. A sketch of producing and consuming these files appears at the end of this section.

  • Command-line Access: ORT does not need to be launched from within STiki. Command-line access is also made available, per [offline_review_driver.java] in the [executables] package. One argument is required: the input file.

    STiki-ORT can be used however one sees fit. However, it was designed to simplify corpus-building for Wiki researchers. For those looking to build upon the STiki framework, ORT may be a good place to start -- its complexity is far less than that of the full-blown STiki tool.
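
    Because the file formats are so simple, they are easy to produce and consume programmatically. The sketch below, which is hypothetical code independent of ORT itself, reads an input file and writes the corresponding *.marked output:

      import java.io.PrintWriter;
      import java.nio.file.Files;
      import java.nio.file.Path;

      /** Hypothetical sketch: consume an ORT input file; emit "RID,CLASS" lines. */
      public class OrtFiles {
          public static void main(String[] args) throws Exception {
              Path in = Path.of("rids.txt");      // one revision-ID per line
              Path out = Path.of("rids.marked");  // same name, *.marked extension
              try (PrintWriter w = new PrintWriter(Files.newBufferedWriter(out))) {
                  for (String rid : Files.readAllLines(in)) {
                      int cls = classify(rid);    // 0 = un-marked, 1 = marked
                      w.println(rid + "," + cls);
                  }
              }
          }
          private static int classify(String rid) { return 0; } // stand-in for human review
      }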

    ^ Return to Top ^