
Man or Machine? Can the Winograd Schema Replace the Turing Test?

The Turing test: does it ring a bell?

Well, if it helps, CAPTCHA has often been called a "reverse Turing test."

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a security measure based on challenge-response authentication, not just a bunch of boring, arbitrary challenges to test your vision. As the name suggests, it is a test to tell machines and humans apart, which may not seem all that pertinent at first. Yet CAPTCHA earns its keep by defending users against spam and automated password guessing: you complete a straightforward test to show that you are a human and not an algorithm trying to break into a password-protected account.


It's a matter of pattern recognition. CAPTCHA deliberately assembles patterns that are hard for algorithms to analyze. A human can figure them out, but a bot scanning the screen first has to recognize the underlying convention; without it, the bot has nothing to work from.

But the pressing question is whether there is any chance, even one in a million, that artificial intelligence will one day beat the CAPTCHA. And A.I., being the "omnipotent superior," has given a clear-cut answer by actually doing so.

"Earlier this year, a chatbot called Eugene Goostman' beat" a Turing Test for artificial intelligence as part of a contest organized by a U.K. university," claims

This does not imply that a machine has acquired the ability to "think." It does establish that a piece of software has become adept at fooling the person on the other end into believing they are conversing with a human rather than a machine.

See, here's the thing. The better the CAPTCHA algorithm gets, the more advanced the CAPTCHA-beating algorithms become. Google builds competing software to probe its own CAPTCHA algorithm, and whenever that software beats the CAPTCHA, the CAPTCHA code is tweaked.

This is all fine and dandy, but the "attacker" is bound to overtake the CAPTCHA permanently at some point.

"Let's say that we are talking about mouse movement. Human movement has particular trends." These trends are hard to mimic, but the closer the robots get to mimicking the direction, the less likely a captcha will be able to spot the difference. An exact match would be impossible for the captcha to spot by the rulebook.

Although we remain on the safe side, with the assurance that A.I. cannot think (at least for now), this revelation threatens users' cybersecurity and demands a better testing procedure. And we may already have one in place, almost as if someone expected CAPTCHA's Achilles' heel to be exploited sooner or later.

The Winograd schema is a test built on questions that we (considering none of us are hyper-intelligent algorithms, that is) would find easy to answer but would pose a severe challenge for a computer.

A bright light at the end of the tunnel doesn't mean we're already there: the Winograd schema may be the solution, but only until yet another loophole, lying in wait, is exploited one fine day.

The sizzling field of A.I. has progressed by leaps and bounds. Yet it is still quite far from achieving "artificial general intelligence," or AGI. One major reason is that we still don't have a universally accepted definition of AGI. "There is no such thing as artificial general intelligence because there is no such thing as general intelligence. Human intelligence is very specialized," said Meta's chief AI scientist Yann LeCun.

"Computing Machinery and Intelligence" was a paper by Alan Turing that presented the chaos of testing human-level A.I. in 1950. The father of modern computer science suggested the 'Imitation Game' with two contenders– a human and a computer. A judge must determine which of the two contestants is human and which is a machine. The judge would do this by asking a series of questions to the contestants. The game strived to identify if the computer is a good simulation of humans and is, therefore, intelligent. At the heart of the Turing test is the question: is intelligence the core component of the imitation game? Fundamentally it's a test of whether an A.I. program can fool a human.

American author Gary Marcus said, "the Turing test is not a reliable measure of intelligence because humans are susceptible, and machines can be evasive." In simple terms, the harsh truth is that humans are dumb. We fall for all kinds of schemes and maneuvers that a well-programmed A.I. can use to convince us that we're talking to a real person who can think.

If you think about it, the Turing test still holds considerable significance in today's tech-dominated world. However, doubts about its effectiveness remain, and they call for a better way to test artificial intelligence.

A new A.I. contest, sponsored in part by Nuance Communications, is offering a U.S. $25,000 prize to an A.I. that can successfully answer Winograd schemas, named after Terry Winograd, a professor of computer science at Stanford University.

Hector Levesque, a computer scientist at the University of Toronto, proposed the Winograd schema challenge in 2011. It was designed with one purpose in mind: to be a better alternative to the Turing test. The test consists of multiple-choice questions built on pairs of sentences whose intended meaning can be flipped by changing just one word. They generally involve ambiguous pronouns or possessives.
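To make the "flip one word" idea concrete, here is a minimal Python sketch of a single schema. The sentence is the well-known trophy-and-suitcase example from Levesque's work; the `WinogradSchema` class, its field names, and the `render` helper are hypothetical names invented for this illustration, not part of any official challenge format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    """One schema: a sentence template plus the word that flips its meaning.

    All names here are illustrative assumptions, not an official format.
    """
    template: str                 # sentence with a {word} slot
    pronoun: str                  # the ambiguous pronoun to resolve
    candidates: Tuple[str, str]   # the two possible referents
    answers: Dict[str, str]       # special word -> correct referent


TROPHY = WinogradSchema(
    template="The trophy doesn't fit in the brown suitcase because it is too {word}.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answers={"big": "the trophy", "small": "the suitcase"},
)


def render(schema: WinogradSchema) -> List[Tuple[str, str]]:
    """Return (sentence, correct referent) for each variant of the schema."""
    return [
        (schema.template.format(word=word), referent)
        for word, referent in schema.answers.items()
    ]


for sentence, referent in render(TROPHY):
    print(f"{sentence}  What does '{TROPHY.pronoun}' refer to? -> {referent}")
```

Swapping "big" for "small" changes nothing about the grammar, yet the correct referent moves from the trophy to the suitcase, which is exactly what makes the pair hard to solve by surface pattern alone.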

In the book 'The Myth of Artificial Intelligence,' A.I. researcher Erik J. Larson said, "linguistic puzzles that humans easily understand are still beyond the comprehension of computers. For example, even single sentence Winograd schemas trip up machines."

Levesque sets out two crucial criteria for a Winograd schema: it should be simple for humans to solve, and it shouldn't be Google-hackable, meaning a web search shouldn't be enough to give the answer away. He also explained how the Winograd schema test could be better than a Turing test. "A machine should be able to show us that it is thinking without having to pretend to be somebody," he wrote in his paper.

"Our W.S. challenge does not allow a subject to hide behind a smokescreen of verbal tricks, playfulness, or canned responses." And, unlike the Turing test, which is scored by a panel of human judges, a Winograd schema test's grading is completely non-subjective.

However, in 2022, the test's developers published a paper titled 'The Defeat of the Winograd Schema Challenge,' arguing that most of the Winograd Schema Challenge has been overcome. Similarly, a 2021 paper, 'WinoGrande: An Adversarial Winograd Schema Challenge at Scale', shows how neural language models have saturated benchmarks like the WSC, reaching over 90% accuracy. The researchers asked, "Have neural language models successfully acquired commonsense, or are we overestimating the true capabilities of machine commonsense?"

To get computers to recognize images, computer scientists usually use neural networks, which are computer systems made up of interconnected units called artificial neurons and trained to solve complex problems. Once again, whether this process counts as 'intelligence' is debatable. For example, when a computer is presented with a math problem, it solves it mechanically, without really comprehending what the symbols mean.

When intelligence itself becomes a question of whether a sufficiently complex technological system qualifies, the very idea of 'tests' for human-level A.I. becomes questionable in several respects, even though we have more than one such test to choose from.

At the end of the day, is it really a computer passing the test we should be afraid of or one that intentionally fails it?

Photo by Tara Winstead:

