Ticketmaster stops scalpers: wait and restart.

"CAPTCHAs"

You cannot trust the results of an online poll. Example: a Slashdot poll for the best grad school in the US; the poll was broken.

What about free email services? Google, Yahoo, MSN: people wrote programs to get millions of email accounts. A Yahoo account only allows 100 messages a day, so a spammer needs a million accounts. Solution: CAPTCHA.

The CAPTCHA stopped spammers. Then came CAPTCHA sweatshops, at $2.50 an hour in other countries. CAPTCHAs are generating jobs in underdeveloped countries.

The idea of a CAPTCHA is a test: humans can solve it, computers cannot. There are other ways of doing this besides CAPTCHAs. Examples:
Russian CAPTCHA: humans can do calculus.
India: humans can analyze circuits.
American: what is 1+1 =

Luis' next project is reCAPTCHA.

200 million CAPTCHAs are typed every day on the Internet, BUT at 10 seconds per CAPTCHA that is more than 500,000 hours per day wasted. Can we use this human effort for good? The human brain is doing amazing things. Is there a problem that humans can solve in a 10-second chunk that helps solve a real problem?

Google, Amazon, the Library of Congress, the Internet Archive: all scanning books. Take a digital photograph of every page in the book, then do OCR (optical character recognition) on the digitized image. But OCR isn't perfect. It makes mistakes, especially on older books: 30% mistakes per page on books that are 130+ years old.

OCR tells us which words it doesn't know, and we use those words as the basis for a CAPTCHA: we know from the OCR that the computer can't recognize the word, so the word works as part of a CAPTCHA. But how do we use a word we don't know the answer to as part of a CAPTCHA? Solution: show a known word together with the unknown word.

120,000 sites use reCAPTCHA, and it's free: Facebook, Twitter, Ticketmaster. reCAPTCHAs digitize 50,000,000 words/day. That would be 20,000 full-time employees. It's 4-5 million books a year!!!

But what about the random words? Juxtaposition of words: bad christian, damn liberal, ...
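The known-word/unknown-word pairing described above can be sketched in a few lines of Python. This is only an illustration of the idea in these notes, not the real reCAPTCHA code: the class name, method names, and the agreement threshold of 3 are all assumptions made for the example. The talk only says that a known word is shown with the unknown one; counting matching guesses before accepting a transcription is an assumed way of combining answers.

```python
from collections import Counter, defaultdict

# Hypothetical names throughout; this is not the real reCAPTCHA API.
AGREEMENT_THRESHOLD = 3  # assumed: matching guesses needed to accept a transcription


def normalize(text):
    """Compare answers case-insensitively, ignoring surrounding whitespace."""
    return text.strip().lower()


class WordPairChallenge:
    """Pairs a control word (answer known) with a word OCR could not read."""

    def __init__(self):
        # unknown word id -> Counter of normalized guesses from users who passed
        self.votes = defaultdict(Counter)

    def check(self, control_answer, unknown_id, typed_control, typed_unknown):
        """Return True if the user passes; record their guess for the unknown word."""
        if normalize(typed_control) != normalize(control_answer):
            # Failed the known word: treat as a bot and discard the other guess.
            return False
        self.votes[unknown_id][normalize(typed_unknown)] += 1
        return True

    def transcription(self, unknown_id):
        """Return the agreed transcription, or None if agreement is too low."""
        if not self.votes[unknown_id]:
            return None
        guess, count = self.votes[unknown_id].most_common(1)[0]
        return guess if count >= AGREEMENT_THRESHOLD else None
```

The user is graded only on the control word; the guess for the unknown word is kept only when the control word was typed correctly, which is what lets the unknown word ride along for free.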
How is this better, typing two words rather than one? Typing two English words takes the same amount of time as 6-8 random characters, so reCAPTCHAs don't "waste time".

Where do the words come from? Books published before 1923, to which copyright no longer applies. Also from the NY Times archive. The NY Times archive will be done in less than a year.

400,000,000 different people have done a reCAPTCHA: 6% of the world.

4chan v. TIME: the 4chan crowd made moot the winner. But the 4chan folks wanted more. They wanted the order of the results to spell a phrase. Once TIME added a reCAPTCHA, the order started getting messed up. moot was still winning, but the 4chan guys wanted to spell their phrase, and now a reCAPTCHA was in the way. How to get around the problem? The 4chan guys read about how reCAPTCHA works and came up with a solution. They thought they could fill the database with the word 'penis'. They couldn't get everything to be 'penis', but they did get the marblecake-the-game to be spelled.

What about reCAPTCHAs and sweatshops? Spammers are good people, they're helping to digitize books. Luis talked to a spammer. The spammer sent an email: 'could you help us find a captcha project'.
Mahiddin: we can type lots of captchas a day, how can we help you?
Mahiddin: we can give you a lot of product, we can type lots of words.
Luis: How do you deal with IP address blocking? The reCAPTCHA folks know if your IP address is typing 1000's of captchas a day. So 'how do you get around the IP address blocking'?
Mahiddin: told them how they did it.
So, instead of blocking the IP, they get: please type the entire paragraph. They stopped typing the paragraphs after a couple of weeks?

Will the project expand to other languages? Yes, 40 languages coming in six months.

=======

What about wasted cycles/time? Americans spend 1.9 seconds/day typing captchas. The average American spends 1.1 hours/day on electronic games. This is from a US Government poll. 9 billion hours of solitaire were played in 2003.
Empire State Building: 7 million human hours to build. Panama Canal: 20 million human hours, less than a day of people playing solitaire.

--

So, how can we help humankind? Get people to help with image labeling...

People play the ESP game and enter labels. The labels are accurate even when you don't want them to be. If we put these on a gaming site, we could solve the Google image-labeling problem.

What is the ESP game? A two-player online game. Object: type the same word as your partner. Where do the TABOO words come from? They are words that already matched for that image, so we get a new word. TABOO words also make the game more difficult, thus more fun.

Each player on average plays a lot. Some people play more than 20 hours a week. There's now a limit: you can't play for more than 10 hours straight from a .edu site. If 5000 people played the game at the same time, we could label all images. There are about 5000 people playing games on Yahoo and MSN.

How do we stop porn from coming up on the Google Image Labeler? First, look at the website and don't serve images from xxx.com. Attaching the label 'porn' to an image guarantees a human intervention, and in all likelihood the image won't get served again.

There is LOTS of information in the label about the person. The person who entered 'britney' was under 40, the person who entered 'hot' was a male. We can figure out your age with 85% probability and your gender with 95% probability. We need more than human labeling, we need context too.

People that play the game feel a strong affinity for their partner, even when playing with a partner for only 2.5 minutes. Why do people like the ESP game? Examples: from blogs.

--

Luis' new project: language translation.

Computers are getting better at translation, but they don't do it well. We want to get millions of people translating. We want to translate Wikipedia into other languages. English: 3 million articles. Spanish: 300,000 articles. How do we get 100 million people to help with language translation?
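Going back to the ESP game rules described above (two players, object is to type the same word, matched words become TABOO), a minimal Python sketch of one round might look like this. The function names, the strict turn-by-turn interleaving of guesses, and the choice to silently ignore taboo guesses are all assumptions made for the example; the notes do not say how the real game handles these details.

```python
def play_round(guesses_a, guesses_b, taboo):
    """Return the first word both players have typed, or None if they never match.

    guesses_a and guesses_b list each player's guesses in the order typed.
    Guesses are processed a, b, a, b, ... (an assumed simplification).
    """
    taboo = {w.lower() for w in taboo}
    seen_a, seen_b = set(), set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        for guesses, mine, theirs in ((guesses_a, seen_a, seen_b),
                                      (guesses_b, seen_b, seen_a)):
            if i >= len(guesses):
                continue  # this player has run out of guesses
            word = guesses[i].strip().lower()
            if word in taboo:
                continue  # taboo words cannot be used as a match
            if word in theirs:
                return word  # both players have now typed this word
            mine.add(word)
    return None


def label_image(rounds, taboo=()):
    """Play several rounds on one image; each agreed label becomes taboo."""
    labels = list(taboo)
    for guesses_a, guesses_b in rounds:
        label = play_round(guesses_a, guesses_b, labels)
        if label is not None:
            labels.append(label)
    return labels
```

Because every matched word is added to the taboo list, later pairs of players looking at the same image are pushed toward a new label, which is how one image collects several different labels over time.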
Are there enough people who know 2 languages well? Can we get people to translate who only know one language? Yes, but slower. Here's how: you know NO German, but you translate from German into English, without knowing German.

How can we incentivize people to do the language translation? Spent a year trying, but couldn't find a game. We're going to teach you German while you help us translate. We'll use the 1 hour a day spent learning something. There are one billion people today learning a foreign language. We'll get them to translate Wikipedia.

Monolingo is the site that will help you learn the language. Monolingo: Learn a language, translate the web. Coming in six months. We'll also make speech recognition better by having people speak the words.

Luis' motivation: what are humanity's large-scale achievements? Pyramids: 100,000 people laboring. The Panama Canal: 100,000 people. Apollo moon landing: 100,000 people. It's hard to coordinate more than 100,000 people. With the Internet we can get millions or hundreds of millions of people. That's Luis' goal.