“They think that intelligence is about noticing things are relevant (detecting patterns); in a complex world, intelligence consists in ignoring things that are irrelevant (avoiding false patterns)” -Nassim Taleb
I would contend that kids make the same mistake on the shepherd question because they're being trained to pattern-match just like the LLMs are. K-12 math education, on the whole, is stifling their creativity and their reasoning ability.
I’m using ChatGPT with some undergrad math problems. The number of times I’ve gotten blatantly contradictory answers - within the same question - is remarkably high. I’ve also convinced it to change its answer and agree with me on many problems, and I simply don’t know when I can or can’t trust it.
Hey Trefor, for future videos: it's OpenAI o1 and OpenAI o1-mini. GPT-4o is a separate line of models
I don't get why this is even a question. LLMs aren't doing reasoning AT ALL. We already know this. They are predicting what word comes next.
When ChatGPT came out publicly, I asked it to solve the Gaussian integral. The antiderivative can't be written in closed form, but the special case from -infinity to +infinity can. It went through the steps, which are not obvious, but then came up with 1/sqrt(pi). That's wrong, but close: the answer is just plain sqrt(pi). It seemed to be copying the steps from somewhere without actually understanding what it was doing, kind of like some random student copying somebody else's answers to cheat. Which makes sense for an LLM to do.
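For reference, the standard result (the one the model got upside down) is:

```latex
\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}
```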
I am surprised that the results didn't drop by more. To really test the model we should probably add a bit to each question: "Show your working".
There are various quadratics that LLMs are physically incapable of solving; I discovered a couple a while back by accident. No matter how much I corrected it, it got them wrong time after time
Tbh adding in extra clauses like that would confuse most humans too, including me; it just gets more confusing and easier to misread something. The study really needed to test humans too. They can't just claim that humans would likely be able to discard useless information without data to back it up.

Edit: Also, what is reasoning? If I look at the question in my head, am I not pattern-matching the words into a sentence and then calculating how it needs to be solved? If I have learnt how to solve these problems, am I not just applying an algorithm in my head to solve it? Really, for me AI needs to be able to learn to solve things it hasn't been taught how to solve and can't reference other things to solve.
On Wall Street the assumption is that true AGI is here: simply increasing the training data and the compute power will do the trick. Thus the justification for the unprecedented capital investments going on. This Apple study, among other things, makes you think the “singularity” is a long way off.
The issue with tweak 2 is that it's confusing and pushes the model's attention to its limit. Unless you set the temperature really low, like 0.2 or less, most models will be tanked by it, even 70B models. But if you distribute the info in a table or a list of premises/conditions, suddenly the model can perform just as well.
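A minimal sketch of what I mean, using an OpenAI-style chat API; the model name, the premise list, and the prompt wording are just illustrative placeholders, not from the video or the paper:

```python
# Sketch: low temperature plus the problem restated as a list of premises.
# Model name and prompt text are placeholders/assumptions, not from the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

premises = [
    "Olivia picks 44 apples on Friday.",
    "She picks 58 apples on Saturday.",
    "On Sunday she picks double the number she picked on Friday.",
    "Five of Sunday's apples are smaller than average (size does not change the count).",
    "Question: how many apples does Olivia have in total?",
]

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model choice
    temperature=0.2,       # the low temperature suggested above
    messages=[
        {"role": "system", "content": "Solve the word problem step by step. Ignore details that do not affect the count."},
        {"role": "user", "content": "\n".join(f"- {p}" for p in premises)},
    ],
)
print(response.choices[0].message.content)
```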
You can increase the score by asking o1 to assume there may be misleading language and to first determine which terms are irrelevant. As you know, hinting can have a significant impact, and the irrelevancy hint adds extra reasoning steps. Even without a new algorithm for negative pattern recognition, OpenAI could make these steps a built-in pre-requirement, but it would have some impact on response time and token pricing. We may be seeing some "layer scarcity" while OpenAI tries to figure out what to prioritize, because the model would become too slow and pricey if every possible layer were enabled right now.
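Roughly what I mean by the irrelevance hint, as a prompt prefix; the wording and the helper function are my own illustration, not from the paper:

```python
# Sketch of the "irrelevance hint" described above: ask the model to flag
# distractor clauses before solving anything. Wording is illustrative only.
IRRELEVANCE_HINT = (
    "Some statements in this problem may be misleading or irrelevant.\n"
    "Step 1: List every clause that does NOT affect the numeric answer.\n"
    "Step 2: Solve the problem using only the remaining clauses.\n"
    "Step 3: State the final answer on its own line."
)

def with_hint(problem: str) -> str:
    """Prepend the irrelevance hint to a word problem before sending it to a model."""
    return f"{IRRELEVANCE_HINT}\n\nProblem:\n{problem}"

print(with_hint("A shepherd has 125 sheep and 5 dogs. How old is the shepherd?"))
```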
I kind of disagree with the suggestion that this convo is pedantic. I think it's one of the most important problems with "AI" right now. We're already seeing lots of people put a lot of faith in this tech, and it seems really important to formally investigate what it's actually capable of intellectually.
I'm a junior double majoring in Math & CS because I want to be an AI researcher in the future. This is a topic I've been reflecting on. When I was a freshman, I used ChatGPT for some coding HWs, but I haven't used it for math HWs since, because it frequently gave me incorrect solutions. Every time a new model is released, I test it with my HWs, and it's still disappointing. Usually, I use it to look up concepts or interpretations in Math & CS, but mostly for elective courses. This is a limitation of LLMs that can’t be helped, and only when AI acquires strong math reasoning skills at a Ph.D. level will we enter the era of AGI. We need a new learning system beyond Deep Learning. What could that be?🤔 It’s intriguing!
Speaking of which, at 5:49, which one? The one you said or the one you displayed on screen?
Excellent video! You have a wonderful style of presentation!
This is actually very interesting; it sounds like the paper is saying these language models basically lack common sense. But I don't think this means that they're unable to mathematically reason.

To be less vague: imagine in another culture saying something like "5 apples are smaller than average" implies that you shouldn't include them in your count, because everybody knows small apples aren't desirable and it would be like counting apple cores as apples. To the large language model, the above may be the only rational conclusion. Like Dr. Bazett said, these models may have been trained on problems that only included relevant data. I think it's more of a hallmark of reasoning to try to figure out WHY data was included. Like the children doing nonsensical computation, it's just that the AI hasn't learned that we throw irrelevant stuff into tests as part of the test (which is something we've all learned to watch for).

As another example: I'm learning how to use Linux, and when stuff inevitably doesn't work, I have a hard time fixing it. This is because when I read through the logs, I don't really know what information is pertinent or not. Some logs are very explicit, and I can easily look up problems or just experiment and figure it out. My point being that you could say I'm just as incapable of reasoning, because purposefully adding irrelevant information to the logs dramatically decreases my problem-solving abilities (because they HAD to include it for a reason, RIGHT?!)
When I was learning on my own, I always added these kinds of changes to a problem so that I could figure out how to solve it properly, not just in the baby cases. It's too bad that with modern math you can't do that, because the techniques only work in special cases ;). Also, I tried to teach people like this and they thought I was messing with them, even though I know it helps you remember what's relevant and what's not.
They could have also tested removing clauses needed to solve the problem, to see if the AI would identify that the problem cannot be solved as stated.
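Something like this check, for example; the phrase list and the ablated problem are my own illustration, not from the Apple paper:

```python
# Sketch of the proposed test: drop a clause the answer depends on, then look
# for an "unsolvable" admission in the model's reply instead of a made-up number.
UNSOLVABLE_SIGNALS = ("cannot be determined", "not enough information", "missing information")

def flags_unsolvable(model_reply: str) -> bool:
    """True if the reply admits the problem is under-specified rather than inventing an answer."""
    reply = model_reply.lower()
    return any(signal in reply for signal in UNSOLVABLE_SIGNALS)

# Original: "Olivia picks 44 apples on Friday and 58 on Saturday. How many in total?"
ablated = "Olivia picks 44 apples on Friday and some more on Saturday. How many in total?"
print(flags_unsolvable("There is not enough information to give a total."))  # True
print(flags_unsolvable("Olivia has 102 apples in total."))                   # False
```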