pk.org: Computer Security/Lecture Notes

CAPTCHA - Study Guide

Paul Krzyzanowski – 2025-10-14

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.
It was designed to identify whether a user is human or an automated program.

It is not a method of authentication but a safeguard to keep automated software from abusing online systems—flooding comment sections, registering fake accounts, spamming, or scraping data at large scale.

The technique takes advantage of human perception: people recognize patterns and context better than early computer vision systems. CAPTCHAs use this difference by presenting visual or auditory tasks that humans can solve easily but machines initially could not.

Early Development

The first CAPTCHA appeared in the late 1990s when AltaVista added a text distortion test to stop automated URL submissions that were skewing search results.

In 2000, researchers at Carnegie Mellon University introduced the term CAPTCHA and created an early version, called EZ-Gimpy, that displayed a distorted English word. Around the same time, other groups developed similar systems that generated random text strings with background noise to confuse optical character recognition software.

These early CAPTCHAs worked because humans could still identify characters despite distortion, clutter, and missing information, while algorithms could not.
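Setting the image rendering aside, the protocol behind these early tests is simple: the server generates a random string (which would be shown to the user as a distorted image), remembers the answer, and later compares the user's submission. A minimal illustrative sketch, with function names of my own invention rather than from any particular CAPTCHA library:

```python
import secrets
import string

# Character set for the challenge text (would be rendered as a noisy image).
ALPHABET = string.ascii_uppercase + string.digits

def new_challenge(length=6):
    """Generate the text of a fresh challenge; the server stores this answer."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def check(expected, submitted):
    """Compare the stored answer against what the user typed, leniently
    ignoring case and surrounding whitespace, in constant time."""
    return secrets.compare_digest(expected.lower(), submitted.lower().strip())
```

The security rests entirely on the rendering step: the text itself is trivial for a machine to read, so the distortion and clutter are what made OCR fail.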

Why CAPTCHA Is Still Used

Even with improved security models, websites still need quick ways to detect automation. CAPTCHAs help maintain the integrity and usability of online systems by blocking automated account creation, comment spam, and large-scale scraping before they reach application logic.

While CAPTCHAs no longer stop every attack, they remain effective at filtering out basic automation.

Problems and Limitations

Over time, the weaknesses of CAPTCHA became apparent: challenges hard enough to stop bots also frustrate legitimate users, and distorted text or audio creates accessibility barriers.

The result is an arms race: stronger CAPTCHAs frustrate humans more but still fail against advanced bots.

reCAPTCHA and Its Evolution

In 2007, a group at CMU introduced reCAPTCHA, which made human effort useful by pairing two words: one known (for validation) and one unknown (from scanned books). Correct responses helped digitize historical texts.

After Google acquired reCAPTCHA in 2009, the system was used to transcribe newspaper archives, identify house numbers in Street View imagery, and perform other image-labeling tasks. By 2014, however, Google’s own neural networks could solve its distorted-text challenges with near-perfect accuracy, forcing a redesign.

Google introduced NoCAPTCHA reCAPTCHA (v2) in 2014, replacing puzzles with a simple checkbox labeled “I’m not a robot.” The system analyzed browser cookies, IP addresses, mouse movements, and click timing to determine if the user behaved like a human. If the result was uncertain, it presented one of several image-based fallback challenges:

  1. Classification (static grid): Identify which images in a 3×3 grid match a description, such as “select all images with bridges.”

  2. Classification (dynamic replacement): Like the first, but images are replaced after each click, continuing until none remain.

  3. Segmentation: Break an image into a 4×4 grid and select all parts relevant to the prompt, such as identifying all parts of a motorcycle.

These puzzles used real-world images from Google’s databases to improve object-labeling accuracy while providing a more challenging test for bots.
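On the server side, a site using reCAPTCHA v2 confirms the browser's answer by POSTing the response token to Google's siteverify endpoint and reading a JSON verdict. A rough sketch of that exchange using only the Python standard library (the actual network call and error handling are omitted):

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def build_verify_request(secret, token, remote_ip=None):
    """Build the POST request the siteverify endpoint expects:
    the site's secret key plus the token the widget gave the browser."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional extra signal
    data = urllib.parse.urlencode(fields).encode()
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")

def is_human(response_body):
    """Interpret the JSON verdict returned by siteverify."""
    verdict = json.loads(response_body)
    return bool(verdict.get("success"))
```

In production the request would be sent with `urllib.request.urlopen` (or an HTTP client library) and failures such as expired tokens would be handled via the `error-codes` field of the response.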

reCAPTCHA v3 removed user interaction entirely. It evaluates background behavior and generates a trust score between 0.0 and 1.0 that websites use to decide whether to grant access, require additional verification, or fall back to an interactive CAPTCHA. This approach reduced friction but raised privacy concerns about extensive tracking.
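A site consuming the v3 score typically checks the verdict's success flag, the expected action name, and the score against a site-chosen threshold. A hedged sketch of that policy (the 0.5 threshold and the three-way allow/challenge/reject outcome are illustrative choices, not part of the API):

```python
import json

def handle_v3_verdict(body, expected_action, threshold=0.5):
    """Map a reCAPTCHA v3 JSON verdict to a site policy decision.
    Returns "allow", "challenge" (fall back to an interactive CAPTCHA),
    or "reject"."""
    v = json.loads(body)
    # A failed verification or a mismatched action tag is rejected outright.
    if not v.get("success") or v.get("action") != expected_action:
        return "reject"
    # Otherwise the trust score (0.0 = likely bot, 1.0 = likely human)
    # decides between frictionless access and an interactive fallback.
    return "allow" if v.get("score", 0.0) >= threshold else "challenge"
```

The threshold is a per-site trade-off: lowering it reduces friction for humans but lets more sophisticated bots through.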

The AI Threat

By the 2020s, advances in AI had nearly eliminated the behavioral and perceptual distinctions between humans and automated agents.

In 2024, researchers at ETH Zürich used public AI software to defeat Google’s reCAPTCHA v2 with 100% accuracy by combining object-recognition models with simulated mouse motion.

A year later, OpenAI’s ChatGPT Agent was observed passing Cloudflare’s “Verify you are human” test through ordinary click and motion patterns, not puzzle solving. Modern AI can convincingly mimic human behavior, erasing the distinction that CAPTCHAs rely on.
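The motion side of such attacks is easy to approximate: instead of a straight, constant-speed line (a telltale bot signature), the cursor follows a curved, jittered path. The sketch below is purely illustrative of the idea and is not the ETH Zürich code:

```python
import random

def humanlike_path(start, end, steps=40):
    """Generate a cursor path from start to end that curves off the straight
    line (quadratic Bezier through a random control point) with small
    per-sample jitter -- the kind of motion behavioral detectors score."""
    (x0, y0), (x1, y1) = start, end
    # Random control point bows the trajectory away from a straight line.
    cx = (x0 + x1) / 2 + random.uniform(-80, 80)
    cy = (y0 + y1) / 2 + random.uniform(-80, 80)
    path = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        # Small Gaussian jitter mimics hand tremor.
        path.append((x + random.gauss(0, 1.5), y + random.gauss(0, 1.5)))
    return path
```

That a few lines like these defeat motion analysis illustrates why behavioral signals alone no longer separate humans from bots.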

New Directions and Abuses

Researchers have explored new ways to tell humans from machines. Alternatives such as email or SMS verification and timing analysis (tracking how long users take to complete forms) offer mild protection but can still be automated once their logic is known.
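Timing analysis is simple to sketch, and equally simple for a bot to defeat once the rule is known (a script can just wait). An illustrative check, where the two-second minimum is an arbitrary choice:

```python
MIN_FILL_SECONDS = 2.0  # arbitrary: humans rarely fill a form faster

def looks_automated(form_rendered_at, submitted_at):
    """Flag submissions that arrive implausibly soon after the form was
    served. Timestamps are in seconds (e.g., from time.time())."""
    return (submitted_at - form_rendered_at) < MIN_FILL_SECONDS
```

A bot that learns the threshold defeats this with a single `sleep()` call, which is exactly the limitation noted above.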

The Orb and Biometric Verification

As AI becomes indistinguishable from human users, some systems now look to physical identity instead of perception.

Tools for Humanity, a company associated with the Worldcoin project, introduced the Orb and Orb Mini: devices that scan the iris to create a unique cryptographic proof of personhood stored on the blockchain. This concept moves from a perception-based test to a cryptographic guarantee that a real human is behind an interaction.
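The core idea of a proof of personhood can be shown in greatly simplified form: derive a one-way, application-scoped identifier from the biometric so uniqueness can be enforced without storing or revealing the iris data. This is a toy sketch of the concept, not the actual Worldcoin protocol:

```python
import hashlib

def nullifier(iris_code, app_id):
    """One-way, app-scoped identifier: the same person in the same app
    always maps to the same value, but the raw biometric cannot be
    recovered from it, and different apps cannot link their users."""
    return hashlib.sha256(app_id + b"|" + iris_code).hexdigest()

seen = set()  # stands in for an on-chain registry of used nullifiers

def register(iris_code, app_id):
    """Admit each unique person at most once per application."""
    n = nullifier(iris_code, app_id)
    if n in seen:
        return False  # this person already registered with this app
    seen.add(n)
    return True
```

The real system adds considerably more machinery so that verification never exposes the biometric itself; the sketch only shows why a stable per-person identifier is the primitive that makes "one human, one account" enforceable.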

The Future of Human Verification

CAPTCHA worked by finding something humans did better than computers. That distinction is disappearing.
Future verification will likely depend on signals beyond perception tests: behavioral trust scores, cryptographic credentials, and verified proof of personhood.

The challenge has shifted from proving “I am human” to proving “I am a trustworthy participant.”