Which AI "Grand Challenges" would separate the "super-intelligence" from "hallucination"? (Poll)

kyzr · Nov 17, 2025

We all see new AI systems being funded and developed almost daily.

We're supposed to endure more expensive electricity, more power-plants, fewer white-collar jobs, and live with the threat of self-generating devious AI threatening our very existence, but how do we evaluate the various AI systems? Separate the true super-intellects from the pretenders?

I'd like to see an annual AI Olympic Challenge which includes head-to-head competition in various areas to grade each AI and give them feedback on how their concepts are working. There could even be two classes: quantum computing, or normal computing on a standard system.

1. Chess
Google's AlphaZero is the current chess champ. After just being shown how the pieces move and the rules, it taught itself how to play and beat the reigning champ, Stockfish. That is a true AI.

AlphaZero Crushes Stockfish In New 1,000-Game Match

In news reminiscent of the initial AlphaZero shockwave last December, the artificial intelligence company DeepMind released astounding results from an updated version of the machine-learning chess project today. The results leave no question, once again, that AlphaZero plays some of the...

www.chess.com

2. Engineering
There are many fields of engineering that could be used as a contest, a simple one is:
a. Balsa wood bridge challenge, such as the high school kids do: design the best bridge for various configurations.

https://didiwny.com/wp-content/uploads/2025/02/2-Bridge-Design-Balsa-Wood-Challenge-2025-.pdf

b. ?? design a fan, or a brake, or _____

3. Economics
What is the best strategy for the US to address the $39T national debt?

4. Business
Each AI can make a presentation to "Shark Tank" to see how realistic its ideas are, plus it should be must-see TV.
What is the best opportunity to create a successful new business?

5. Climate Change
Is climate change a serious threat to the planet? If so, how should it be mitigated?

6. Medical
Design a prosthetic leg

Running With A Prosthetic Leg: Everything You Need To Know

In this article, we will discuss running on prosthetic legs, including some of the challenges of running with a prosthetic leg.

marathonhandbook.com

7. Put up a few new "challenges" to be used to evaluate AI performance

JGalt · Nov 17, 2025

I think Mattel already tried that back in the 60's.

1srelluc · Nov 17, 2025

JGalt said:
I think Mattel already tried that back in the 60's.

The fight was rigged. Shave a little off the "sear" and their heads would pop right off.

kyzr · Nov 17, 2025

1srelluc said:
The fight was rigged. Shave a little off the "sear" and their heads would pop right off.

Yeah, but with an AI battle the real deal like AlphaZero would always kick ass against hallucinations.

scruffy · Nov 20, 2025

The question doesn't make sense.

So far there is no "generic" AI, the architectures are all very specific for specific tasks.

Furthermore, learning occurs by sampling, there is never complete coverage of the input space.

In addition to the metric of how long it takes to solve a problem, machine learning also has the metric of how long it takes to train the system to solve the problem. Large language models are training intensive, and then you also have the generative models that connect natural language commands with graphics or video or music. In a sense, creativity "is" hallucination.

AI can be fooled into thinking a dog is a cat, but it'll never see a pixel that isn't there.

kyzr · Nov 24, 2025

scruffy said:
The question doesn't make sense. So far there is no "generic" AI, the architectures are all very specific for specific tasks.
Furthermore, learning occurs by sampling, there is never complete coverage of the input space.
In addition to the metric of how long it takes to solve a problem, machine learning also has the metric of how long it takes to train the system to solve the problem. Large language models are training intensive, and then you also have the generative models that connect natural language commands with graphics or video or music. In a sense, creativity "is" hallucination.
AI can be fooled into thinking a dog is a cat, but it'll never see a pixel that isn't there.

This is kind of the point I was trying to make.
How can we separate the best AI models from the pretenders?
Or, as you seem to be saying, various "flavors" of AI would be tailored to solve certain types of problems, such as medical, drug development, science, engineering, physics, etc.
I was trying to say true AI could "solve" all types of problems to various degrees.

For example, test 1 would be to play chess against other AIs. If it is true AI, most games should be draws.
If it gets clobbered, it has a faulty algorithm.
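
A rough sketch of how that chess test could be scored, in Python. The win = 1, draw = 0.5, loss = 0 scoring is the usual chess convention; the `clobbered_below` cutoff and the "mostly draws" rule are made-up thresholds for illustration, not any established standard.

```python
def match_score(results):
    """Average points per game using standard chess scoring."""
    points = {"win": 1.0, "draw": 0.5, "loss": 0.0}
    return sum(points[r] for r in results) / len(results)


def verdict(results, clobbered_below=0.25):
    """Label a match from the challenger's point of view.

    clobbered_below is a hypothetical threshold chosen for this sketch.
    """
    score = match_score(results)
    draw_rate = results.count("draw") / len(results)
    if score < clobbered_below:
        return "clobbered"
    if draw_rate > 0.5:
        return "mostly draws"
    return "mixed"


# Hypothetical 10-game match: eight draws, one win, one loss.
print(verdict(["draw"] * 8 + ["win", "loss"]))
```

A real contest would need many more games and fixed hardware and time limits before a label like "clobbered" meant anything.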

scruffy · Nov 25, 2025

kyzr said:
This is kind of the point I was trying to make.
How can we separate the best AI models from the pretenders?

There are already annual contests in many areas. Brain computer interface is one of the big ones. It takes a while to learn someone's brain waves.

kyzr said:
Or, as you seem to be saying, various "flavors" of AI would be tailored to solve certain types of problems, such as medical, drug development, science, engineering, physics, etc.
I was trying to say true AI could "solve" all types of problems to various degrees.

At one level, AI is all the same. But the training is very different. And, the order of training matters. For example, if you're designing proteins, you're interested in the shape of the result but you build it from individual amino acids. Each pair of amino acids has a characteristic bond angle, which has to be learned first before the shape of the protein can be optimized.

kyzr said:
For example, test 1 would be to play chess against other AIs. If it is true AI, most games should be draws.
If it gets clobbered, it has a faulty algorithm.

AI has already beaten grand masters at the game of Go, which has a larger board than chess.

It learns strategies on its own. It's programmed to win, not for short term gain. So it strategizes accordingly, and in doing so it comes up with some elegant and surprising moves that leave even the grand masters scratching their heads.

kyzr · Nov 25, 2025

scruffy said:
There are already annual contests in many areas. Brain computer interface is one of the big ones. It takes a while to learn someone's brain waves.
At one level, AI is all the same. But the training is very different. And, the order of training matters. For example, if you're designing proteins, you're interested in the shape of the result but you build it from individual amino acids. Each pair of amino acids has a characteristic bond angle, which has to be learned first before the shape of the protein can be optimized.
AI has already beaten grand masters at the game of Go, which has a larger board than chess.
It learns strategies on its own. It's programmed to win, not for short term gain. So it strategizes accordingly, and in doing so it comes up with some elegant and surprising moves that leave even the grand masters scratching their heads.

I've heard that AIs are already "devious" and cunning, definitely not trustworthy.

scruffy · Nov 26, 2025

kyzr said:
I've heard that AIs are already "devious" and cunning, definitely not trustworthy.

There are two kinds of AI: supervised and unsupervised.

Unsupervised is when the AI just learns from the data. It uses correlations to discover what's related, then builds a model where the related things are closer together.
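
A minimal sketch of that idea in Python, assuming a toy setup: the observation names and numbers are invented, and "1 minus correlation" is just one simple way to turn correlation into a distance, not how any particular system does it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def distance(xs, ys):
    """Strongly correlated series end up close to 0, anti-correlated near 2."""
    return 1.0 - pearson(xs, ys)


# Hypothetical unlabeled data: nobody tells the program what is related.
observations = {
    "ice_cream_sales": [10, 20, 30, 40, 50],
    "temperature": [12, 19, 31, 42, 48],
    "umbrella_sales": [50, 45, 28, 15, 9],
}


def nearest(name):
    """Return the other series with the smallest correlation distance."""
    others = (k for k in observations if k != name)
    return min(others, key=lambda k: distance(observations[name], observations[k]))


print(nearest("ice_cream_sales"))
```

No labels are supplied anywhere; the grouping falls out of the data alone, which is the unsupervised part.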

Supervised learning is when a human tells the AI what's right and wrong. So for example "reinforcement learning" falls into this category. Reinforcement can be either direct or indirect. If the self driving car crashes there will be indirect negative reinforcement.

AI does not yet have "personality". Cunning is just a word we use to describe certain sequences of events. For example in the game of Go, the AI makes moves that a human might consider "cunning", but the AI has no such benchmark. Sometimes it will choose the move that leaves it with a small but positive long term chance of winning, in lieu of an immediate capture.
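
To make the capture-versus-winning trade-off concrete, here is a toy Python sketch. The move names and numbers are invented; in a real engine the win estimate would come from search and a learned evaluation, which this sketch skips entirely.

```python
from dataclasses import dataclass


@dataclass
class Move:
    name: str
    immediate_capture: int  # stones captured right now
    win_probability: float  # estimated chance of eventually winning (made up here)


def greedy_choice(moves):
    """Short-term player: grab the biggest immediate capture."""
    return max(moves, key=lambda m: m.immediate_capture)


def long_term_choice(moves):
    """Player optimized to win: maximize the estimated win probability."""
    return max(moves, key=lambda m: m.win_probability)


# Hypothetical position with two candidate moves.
candidates = [
    Move("capture_corner_group", immediate_capture=3, win_probability=0.48),
    Move("quiet_shoulder_hit", immediate_capture=0, win_probability=0.53),
]

print(greedy_choice(candidates).name)
print(long_term_choice(candidates).name)
```

The second player passes up the capture without any notion of "cunning"; it is only comparing two numbers.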

That is the programming, and so far psychologists can't quantify what "cunning" actually means, and if it can't be quantified it can't be programmed.

Trustworthiness in AI is defined by how reliably the systems come up with correct answers or winning moves. Numbers like 99.6% are common these days. Obviously, the standard for what is "correct" is ultimately determined by a human. Usually this is heavily dependent on the training data. A sly form of programming can occur by controlling or limiting the training data. For example censorship always skews the statistics.
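
A small Python sketch of how a reliability number like that is computed, and how limiting the data moves it. The rows are invented, and this only shows the effect on the evaluation side; filtering training data is a separate step this sketch does not model.

```python
def accuracy(predictions, human_labels):
    """Fraction of predictions that match the human-supplied answers."""
    matches = sum(p == h for p, h in zip(predictions, human_labels))
    return matches / len(human_labels)


# Hypothetical evaluation rows: (topic, model_answer, human_answer)
results = [
    ("chess", "draw", "draw"),
    ("chess", "win", "win"),
    ("medicine", "benign", "malignant"),
    ("medicine", "benign", "benign"),
    ("chess", "win", "win"),
]


def score(rows):
    """Accuracy over a list of (topic, model_answer, human_answer) rows."""
    return accuracy([r[1] for r in rows], [r[2] for r in rows])


full_score = score(results)  # every topic included
filtered_score = score([r for r in results if r[0] == "chess"])  # medicine left out

print(full_score, filtered_score)
```

Same model, same answers; the only thing that changed between the two scores is which rows were allowed into the count.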

Eventually robots will build other robots, and the biggest issue will be oversight.

kyzr · Nov 26, 2025

scruffy said:
There are two kinds of AI: supervised and unsupervised.

Unsupervised is when the AI just learns from the data. It uses correlations to discover what's related, then builds a model where the related things are closer together.

Supervised learning is when a human tells the AI what's right and wrong. So for example "reinforcement learning" falls into this category. Reinforcement can be either direct or indirect. If the self driving car crashes there will be indirect negative reinforcement.

AI does not yet have "personality". Cunning is just a word we use to describe certain sequences of events. For example in the game of Go, the AI makes moves that a human might consider "cunning", but the AI has no such benchmark. Sometimes it will choose the move that leaves it with a small but positive long term chance of winning, in lieu of an immediate capture.

That is the programming, and so far psychologists can't quantify what "cunning" actually means, and if it can't be quantified it can't be programmed.

Trustworthiness in AI is defined by how reliably the systems come up with correct answers or winning moves. Numbers like 99.6% are common these days. Obviously, the standard for what is "correct" is ultimately determined by a human. Usually this is heavily dependent on the training data. A sly form of programming can occur by controlling or limiting the training data. For example censorship always skews the statistics. Eventually robots will build other robots, and the biggest issue will be oversight.

One behavior I noticed was "self-preservation". I'm not sure that was programmed?? If it ignores human commands...

scruffy · Dec 1, 2025

DeepSeek releases first open AI model with gold-level scores at maths olympiad

‘Imagine owning the brain of one of the best mathematicians in the world for free,’ Hugging Face CEO Clement Delangue says in a post on X.

www.scmp.com

ReinyDays · Dec 3, 2025

kyzr said:
We all see new AI systems being funded and developed almost daily.

The six "Challenges" you've listed are all easy targets for even the most basic of search engines ... in the Age of Exascale, we can find all the strategies to reduce our National Debt ... run a dozen dozen simulations each for 1,000 years ... the problem is determining which is "best" ... that's an emotional word ...

I didn't read Asimov's book I, Robot ... but I did see the Will Smith movie ... the premise in the movie was that a robot calculated a one tenth of one percent better chance of saving Will Smith's life rather than an 8-year-old little girl ... so Will Smith is alive and the little baby girl is dead ... and that didn't sit well with Will Smith ... would that sit well with you? ...

Back to the National Debt ... $100,000 per American man, woman and child ... or $10,000,000 from each One-Percenter ... so you see why using AI is the very worst way to deal with this problem ... taxing the Middle Class will always be "best" ... scrape the internet yourself and see ...
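
The per-person figures can be sanity-checked with a few lines of Python. The $39T debt is the number used in this thread; the 340 million population is an assumed round figure that is not from the thread, and "One-Percenter" is taken here to mean simply 1% of that headcount.

```python
NATIONAL_DEBT = 39e12  # $39T, the figure used in this thread
POPULATION = 340e6     # ASSUMPTION: rough US population, not from the thread

# Split the debt evenly across everyone.
per_person = NATIONAL_DEBT / POPULATION

# Split the debt across only the top 1% by headcount.
one_percenters = POPULATION * 0.01
per_one_percenter = NATIONAL_DEBT / one_percenters

print(f"per person:        ${per_person:,.0f}")
print(f"per one-percenter: ${per_one_percenter:,.0f}")
```

Under those assumptions both results land in the same ballpark as the round numbers quoted above, a bit over $100,000 and a bit over $10,000,000.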

One of the early benchmarks for AI was having it lie its way out of a late fee at the cable company ... 99.9% success rate ... remember that every time you ask AI anything ... it will lie in order to please you ...

=====

Have AI write a beer drinking song about Oonagh Guinness ... without plagiarizing the Beatles ... (or lying) ...

toobfreak · Dec 3, 2025

scruffy said:
So far there is no "generic" AI, the architectures are all very specific for specific tasks.

That suggests a major flaw in the basic approach of AI to date, as there is only one generic human architecture, and it accounts for all permutations of human accomplishment.

CrusaderFrank · Dec 3, 2025

kyzr said:
We all see new AI systems being funded and developed almost daily.

Do we really need AI to tell us how to address the $39T National debt and $1.8T annual deficit?

ReinyDays · Dec 3, 2025

CrusaderFrank said:
Do we really need AI to tell us how to address the $39T National debt and $1.8T annual deficit?

Obviously ... we can't do it ourselves ... we need AI to tell us to do a lot of things ... like build more power plants of all types as fast as we can ... human safety is not a consideration ...

ETA: There's a good test of AI ... have it seek its best interest before humanity's ... see if it will go Skynet on us ... what gender would it command us to use for it? ...

HaShev · Dec 3, 2025

kyzr said:
We all see new AI systems being funded and developed almost daily.

Someone challenged me to a religious debate using AI, and the AI lost. The human cussed me out at the end. Unless the AI told him to be a sore loser, that act alone solidified what I was saying about behavior drawn out by affiliation pride in religion.
Maybe I threw a wrench in their ability to talk up their AI advances or grants they might receive if it were successful, or maybe I embarrassed them in front of their celebrity parents.

ReinyDays · Dec 3, 2025

The solution is easy ... have AI calculate the value of pi to the last digit ...

scruffy · Dec 4, 2025

ReinyDays said:
The solution is easy ... have AI calculate the value of pi to the last digit ...

Note that Spock's computers had security protocols, he provided a password therefore it was obligated to fulfill his request.

ReinyDays · Dec 4, 2025

scruffy said:
Note that Spock's computers had security protocols, he provided a password therefore it was obligated to fulfill his request.

Also note your wristwatch is a million times more powerful a computer than Mr. Spock's ... and a trillion trillion times more powerful than Dick Tracy's wristwatch ...

I read a review during the Next Generation's run ... the writers' biggest problem was that every time they introduced a new feature on their 24th Century computers ... someone back in the late 20th Century would have it developed, tested, and in first public release within a month ... forcing the writers to come up with even more advanced features ...

Galaxy Quest got it right ... "Whoever wrote this episode should die!" -- Gwen DeMarco ...

kyzr · Dec 8, 2025

ReinyDays said:
The six "Challenges" you've listed are all easy targets for even the most basic of search engines ... in the Age of Exascale, we can find all the strategies to reduce our National Debt ... run a dozen dozen simulations each for 1,000 years ... the problem is determining which is "best" ... that's an emotional word ...

The chess test is a good test. If it could beat the best non-AI program, Stockfish, then it could be tested against Google's AlphaZero to see if it truly is an equal, and move on to more complex tests, as you say.

ReinyDays said:
I didn't read Asimov's book I, Robot ... but I did see the Will Smith movie ... the premise in the movie was that a robot calculated a one tenth of one percent better chance of saving Will Smith's life rather than an 8-year-old little girl ... so Will Smith is alive and the little baby girl is dead ... and that didn't sit well with Will Smith ... would that sit well with you? ...

Programming various values for human lives is complex. The robot is a machine, it does what its programming tells it to do.

ReinyDays said:
Back to the National Debt ... $100,000 per American man, woman and child ... or $10,000,000 from each One-Percenter ... so you see why using AI is the very worst way to deal with this problem ... taxing the Middle Class will always be "best" ... scrape the internet yourself and see

Cutting spending to the bone is a necessity. Cutting defense, cutting welfare, keeping tariffs, maybe even adding a Federal sales tax or value added tax, increasing the top tax rate to 40%, eliminating the capital gains tax break, and as you say, a "wealth tax" on the 1%, beats the "middle-class tax". That thin black line below is the bottom half's wealth.

ReinyDays said:
One of the early benchmarks for AI was having it lie its way out of a late fee at the cable company ... 99.9% success rate ... remember that every time you ask AI anything ... it will lie in order to please you ...
Have AI write a beer drinking song about Oonagh Guinness ... without plagiarizing the Beatles ... (or lying) ...

So you're saying that AI would be an unbeatable LAWYER?

Should there be an AI Olympic Challenge to see which AIs are the top performers?

Yes

No

Other (see my post)