MTurk Really Adds Up
Sorry for the lack of updates, nonexistent audience. Moving cross-country put a damper on my ability to update my blog with any frequency.
Today’s issue: MTurk is kind of expensive.
About a month ago, a friend and I decided to hack together some Python to scrape Twitter for tweets containing certain keywords and use Bayesian classification to determine whether each tweet fit into certain (VERY SECRET) categories. Everything worked until we actually had to tag a set of tweets to use as a training set, which looked like this:
RT @CaliforniaBelle: Never underestimate the power of prayer and remember to trust God's timing. #LifeLessons
Interesting? 3 - Yes, 2 - Maybe, 1 - No: 1
RT @iiDepressed: What do you want for christmass? A gun, boxes of pills, loads of alcohol, razor blades and a rope please.
Interesting? 3 - Yes, 2 - Maybe, 1 - No: 1
@tranceforge emmm sbnrny waktu aku dtg disanapun sdh punya logic flownya..tetapi knp sptny smuany lbh ke ... http://t.co/BuQOCdBk
Interesting? 3 - Yes, 2 - Maybe, 1 - No: 1
RT @Orlandomendez7: #Escogido hoy Guzmán LF, Lugo 2B, Kieschnick RF, Gómez DH, Green 1B, Mesa CF, Castillo C, Tatis 3B, Florimón SS. Cab ...
Interesting? 3 - Yes, 2 - Maybe, 1 - No: 1
Ever Clean Multiple Cat Premium Clumping Cat Litter | Litter Boxes For Cats http://t.co/YX0V9ZUG
Interesting? 3 - Yes, 2 - Maybe, 1 - No: 1
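A labelling session like the one above could be driven by a loop along these lines. This is a minimal sketch, not the actual script: the function name label_tweets and its ask/show parameters are made up for illustration, and only the prompt text is taken from the session shown.

```python
PROMPT = "Interesting? 3 - Yes, 2 - Maybe, 1 - No: "
VALID_LABELS = {"1", "2", "3"}


def label_tweets(tweets, ask=input, show=print):
    """Show each tweet, ask for a 1-3 rating, and return (tweet, label) pairs.

    `ask` and `show` are hypothetical hooks that default to the keyboard and
    the terminal; swapping them out makes the loop easy to test.
    """
    labelled = []
    for tweet in tweets:
        show(tweet)
        answer = ask(PROMPT).strip()
        # Keep asking until the answer is one of the three allowed ratings.
        while answer not in VALID_LABELS:
            answer = ask(PROMPT).strip()
        labelled.append((tweet, int(answer)))
    return labelled
```

Calling label_tweets(tweets) with the defaults reads each rating from the keyboard, one tweet at a time, which is exactly the part that gets old fast.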
Hand-classifying tweets is really, really boring. Faced with the prospect of having to manually tag tweets for the next five hours, I decided to finally check out Amazon’s Mechanical Turk. It’s a really nice service that’s exactly what I wanted and very easy to use. There’s only one problem:
Two cents per task doesn’t seem like much, but when each task takes about a second to complete and there are tens of thousands of them, it really starts to add up. So much so that doing all of the tagging myself is far more cost-effective than paying MTurk and spending the time it would save me on real work.
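To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The two cents and the one second come from above; the task count of 18,000 is an assumption (five hours of tagging at one tweet per second), the function names are made up for illustration, and any fees charged on top of the per-task price are ignored.

```python
PRICE_PER_TASK_USD = 0.02  # two cents per task
SECONDS_PER_TASK = 1.0     # roughly one second to tag a tweet


def mturk_cost(num_tasks, price_per_task=PRICE_PER_TASK_USD):
    """Total price of paying for every task, in dollars."""
    return num_tasks * price_per_task


def hours_to_self_label(num_tasks, seconds_per_task=SECONDS_PER_TASK):
    """Time it would take to do every task by hand, in hours."""
    return num_tasks * seconds_per_task / 3600


def implied_hourly_rate(price_per_task=PRICE_PER_TASK_USD,
                        seconds_per_task=SECONDS_PER_TASK):
    """Dollars per hour of hand-labelling that the per-task price works out to."""
    return price_per_task * 3600 / seconds_per_task


num_tasks = 18_000  # assumed: five hours at one tweet per second
print(f"MTurk cost: ${mturk_cost(num_tasks):.2f}")
print(f"Doing it myself: {hours_to_self_label(num_tasks):.1f} hours")
print(f"Implied rate: ${implied_hourly_rate():.2f} per hour")
```

Under those assumptions, the five hours of boredom I was trying to avoid would cost $360, or $72 for every hour of tagging I skip.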