MTurk Really Adds Up

Sorry for the lack of updates, nonexistent audience. Moving cross-country put a damper on my ability to update my blog with any frequency.

Today’s issue: MTurk is kind of expensive.

About a month ago, a friend and I decided to hack together some Python to scrape Twitter for tweets containing certain keywords and use Bayesian classification to determine whether or not each tweet fit into certain (VERY SECRET) categories. Everything worked out until we needed to actually tag a set of tweets to use as a training set.
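For anyone wondering what "Bayesian classification" means here, this is a minimal sketch of a naive Bayes text classifier with add-one smoothing. It is not our actual code: the class name, the whitespace tokenizer, and the 1/3 labels in the usage lines are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial naive Bayes over whitespace-separated words (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of training tweets
        self.vocab = set()                       # every word seen during training

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.label_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Log prior: how common this label was in the training set.
            score = math.log(doc_count / total_docs)
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical usage with made-up training examples.
clf = NaiveBayes()
clf.train("cat litter boxes", 1)
clf.train("python bayesian classifier", 3)
guess = clf.classify("bayesian python")
```

The catch is that `train` needs labeled tweets to be useful, and that is exactly where we got stuck.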


RT @CaliforniaBelle: Never underestimate the power of prayer and remember to trust God's timing. #LifeLessons 

Interesting? 3 - Yes, 2 - Maybe, 1 - No: 1
RT @iiDepressed: What do you want for christmass? A gun, boxes of pills, loads of alcohol, razor blades and a rope please. 

Interesting? 3 - Yes, 2 - Maybe, 1 - No: 1
@tranceforge emmm sbnrny waktu aku dtg disanapun sdh punya logic flownya..tetapi knp sptny smuany lbh ke ... http://t.co/BuQOCdBk 

Interesting? 3 - Yes, 2 - Maybe, 1 - No: 1
RT @Orlandomendez7: #Escogido hoy Guzmán LF, Lugo 2B, Kieschnick RF, Gómez DH, Green 1B, Mesa CF, Castillo C, Tatis 3B, Florimón SS. Cab ... 

Interesting? 3 - Yes, 2 - Maybe, 1 - No: 1
Ever Clean Multiple Cat Premium Clumping Cat Litter | Litter Boxes For Cats http://t.co/YX0V9ZUG 

Interesting? 3 - Yes, 2 - Maybe, 1 - No: 1
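The prompts above come from a tagging loop roughly like the one below. This is a reconstruction, not the original script: the function name `tag_tweets` and the `ask`/`show` parameters are assumptions, added so the loop can be exercised without a keyboard.

```python
PROMPT = "Interesting? 3 - Yes, 2 - Maybe, 1 - No: "
VALID_ANSWERS = {"1", "2", "3"}


def tag_tweets(tweets, ask=input, show=print):
    """Show each tweet, ask for a 1-3 rating, and return (tweet, rating) pairs."""
    labeled = []
    for text in tweets:
        show(text)
        show("")
        answer = ask(PROMPT).strip()
        # Keep asking until the answer is one of the three allowed ratings.
        while answer not in VALID_ANSWERS:
            answer = ask(PROMPT).strip()
        labeled.append((text, int(answer)))
    return labeled
```

Calling `tag_tweets(list_of_tweet_texts)` with the defaults prompts on the terminal, one tweet at a time.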

Hand-classifying tweets is really, really boring. Faced with the prospect of having to manually tag tweets for the next five hours, I decided to finally check out Amazon’s Mechanical Turk. It’s a really nice service that’s exactly what I wanted and very easy to use. There’s only one problem:

MTurk Total

Two cents doesn’t seem like much, but when each task takes about a second to complete and there are tens of thousands of them, it really starts to add up. So much so that doing all of the tagging myself is much more cost-effective than paying MTurk and spending the time it would save me on real work.
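To put rough numbers on that, here is a back-of-the-envelope sketch. The 18,000 figure is my own assumption (five hours of tagging at one second per tweet), and any service fees on top of the two-cent reward are ignored, so treat the result as a lower bound on the MTurk bill.

```python
def mturk_cost_cents(num_tasks, reward_cents=2):
    """Total reward paid out, in cents, ignoring any service fees."""
    return num_tasks * reward_cents


def self_tagging_hours(num_tasks, seconds_per_task=1.0):
    """Hours it would take to do every task by hand."""
    return num_tasks * seconds_per_task / 3600


def break_even_hourly_rate(reward_cents=2, seconds_per_task=1.0):
    """Dollars per hour my time must be worth before outsourcing pays off."""
    return reward_cents * 3600 / seconds_per_task / 100


num_tweets = 18_000  # assumed: five hours at one second per tweet
cost_dollars = mturk_cost_cents(num_tweets) / 100
hours = self_tagging_hours(num_tweets)
rate = break_even_hourly_rate()
print(f"MTurk: ${cost_dollars:.2f}, by hand: {hours:.1f} h, break-even: ${rate:.2f}/h")
```

In other words, at two cents for a one-second task, paying MTurk only makes sense if the hours I get back are worth more than the break-even rate.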

Posted on 14 Oct 2012