I got gamed on Stack Overflow


Silly me. I stumbled across a question on Stack Overflow that was right up my alley, as it was something I had worked on before: "Extract Relevant Tag/Keywords from Text block". The question, posted by user593778, indicated that they wanted to use PHP or JavaScript to lift relevant keywords from a block of text.

Awesome. So I started to answer the question, outlining an approach the poster could use to solve their problem. What I didn’t pay attention to was the fact that the user had a single point on Stack Overflow.

Stack Overflow works on a point system: it taps the lizard brain in most geeks by awarding points for answering questions, posting questions, upvoting, etc. You want a high score. You’re awarded badges for hitting different thresholds in the system. Points are the carrot and the stick (you lose points on downvotes) that get people to participate and behave themselves in the community.

Having 1 point means you just logged into the system. You haven’t upvoted any other answers or answered any other questions. You haven’t even completed your user profile. This is a red flag.

Almost immediately the poster started to request more and more detail, basically asking for an implementation. We went back and forth a bit until I finally posted a full code example for the problem, at which point the poster disappeared.

So, I did their work and didn’t even get tagged as “the answer”, which means I received no points. I got played.

Solution

My solution is somewhat naive, but it works (see Gist below). It takes the input text and breaks it into an array of words, which are then filtered against a giant list of stop words.

<?php
function stopWords($text, $stopwords) {

  // Trim line breaks and whitespace from the stop words and lowercase them
  $stopwords = array_map(function($x){
    return trim(strtolower($x));
  }, $stopwords);

  // Replace digits and all non-word chars with commas
  $pattern = '/[0-9\W]/';
  $text = preg_replace($pattern, ',', $text);

  // Create an array from $text
  $text_array = explode(",", $text);

  // Trim whitespace and lowercase the words in $text
  $text_array = array_map(function($x){
    return trim(strtolower($x));
  }, $text_array);

  // Keep only the terms that are not stop words.
  // Initialize first so an all-stop-word text still returns an array.
  $keywords = array();
  foreach ($text_array as $term) {
    if (!in_array($term, $stopwords)) {
      $keywords[] = $term;
    }
  }

  // Drop the empty strings left behind by consecutive separators
  return array_filter($keywords);
}

$stopwords = file('stop_words.txt');
$text = "Requirements - Working knowledge, on LAMP Environment using Linux, Apache 2, MySQL 5 and PHP 5, - Knowledge of Web 2.0 Standards - Comfortable with JSON - Hands on Experience on working with Frameworks, Zend, OOPs - Cross Browser Javascripting, JQuery etc. - Knowledge of Version Control Software such as sub-version will be preferable.";

print_r(stopWords($text, $stopwords));
?>

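You probably would not want to run this in real time on a long text: every word is checked against the full stop-word list with in_array(), so the cost grows with both the length of the text and the size of the list. You would probably schedule this as a background task.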
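If the linear stop-word scan is the bottleneck, one possible tweak is to store the stop words as array keys so that each check becomes a hash lookup. The following is a minimal sketch under that assumption, not the code from the original answer: extractKeywords() is a hypothetical name, and the short inline stop-word list stands in for stop_words.txt.

```php
<?php
// Sketch: same filtering idea, but stop words are stored as array keys so
// each lookup is a hash check instead of a linear in_array() scan.
// extractKeywords() is a hypothetical name, not part of the original Gist.
function extractKeywords($text, $stopwords) {
  // Normalize the stop words, then flip them into keys: ['on' => 0, 'and' => 1, ...]
  $lookup = array_flip(array_map(function ($x) {
    return trim(strtolower($x));
  }, $stopwords));

  // Split on runs of digits and non-word characters, dropping empty pieces
  $terms = preg_split('/[0-9\W]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

  $keywords = array();
  foreach ($terms as $term) {
    // isset() on an array key does not scan the whole list
    if (!isset($lookup[$term])) {
      $keywords[] = $term;
    }
  }
  return $keywords;
}

// Inline stop words for illustration only; the post loads them from a file
$stopwords = array("on", "and", "of", "with", "a");
print_r(extractKeywords("Working knowledge, on LAMP and PHP 5", $stopwords));
// keeps: working, knowledge, lamp, php
```

Unlike the version above, this sketch returns a re-indexed list with no gaps in its keys. Whether it is fast enough to run in real time would still depend on the size of the text, so treat it as a starting point rather than a measured improvement.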
