I’m still trying to learn how to use ChatGPT to improve my productivity. One thing I’ve been experimenting with recently is asking it to check my math. As it turns out, I’m still better at math than the algorithm. Here is a link to a recent discussion I had with ChatGPT. My entries are the short, strongly indented statements. In this little conversation, ChatGPT made six separate math errors. Each time it confidently asserted something to be true when in fact it was provably false, and each time it would cheerfully admit its error when I pointed it out.
My recommendation is to keep ChatGPT on a short leash. Don’t ask it anything you can’t directly confirm yourself.
Doesn’t EJ Antoni rely on ChatGPT for his Twitter posts? And he trusts it 100%, as little EJ has no clue what real economics is.
Here’s what ChatGPT just produced over at EJ’s Twitter:
‘Adjusting average earnings for inflation doesn’t always present an accurate picture of the middle class’s financial health b/c average income can be skewed by outliers; the Household Budget Index just looks at the purchasing power of middle-income families and it shows significant losses over the last several years’
Macroduck the other day noted how this index differs from, say, the CPI, and it has nothing to do with “outliers”:
https://www.primerica.com/public/household-budget-index.html
The other thing is that this measure has been rising of late and is about where it was in January 2019. Clearly EJ’s version of ChatGPT is off the mark, and of course a fake PhD and general MAGA moron knows even less.
“Each time it confidently asserted something to be true when in fact it was provably false, and each time it would cheerfully admit its error when I pointed it out.”
That was sort of fun to wade through.
Which model was this? 4o, or o1 (mini or preview)? That will potentially make a massive difference in the outcome.
It tells me, “I am ChatGPT, based on the GPT-4 architecture. My current version (v2.0) was released in 2024”
You are likely using the 4o version, which isn’t one of their “reasoning” models (the o1 models). Those would likely do far better, but they are only available to paid users. If you have any interest in trying without paying, I would be happy to set up a Zoom call or something if you want to try.
Hi JDH,
You didn’t specify the version of ChatGPT you were using, but I suspect it was not the best version for these problems. I asked your first question and then your follow-up question to ChatGPT o1-preview, which implements chain-of-thought reasoning. o1-preview did not mistakenly assume that b is non-negative after calculating the determinant and adj(A). I noticed the version you used got the (3,2) term in the adjoint matrix wrong and therefore got the wrong sign in its final answer. o1-preview got that right the first time. I then asked it to multiply the third row of A by the second column of adj(A), and it got zero as expected.
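That identity is easy to check outside the chat window. Here is a minimal SymPy sketch; the matrix below is a made-up example with a free symbol b, not the actual matrix from JDH’s conversation:

```python
# Minimal sketch of the adjugate identity A * adj(A) = det(A) * I,
# using a hypothetical symbolic 3x3 matrix (not the one from the post).
import sympy as sp

b = sp.symbols('b')
A = sp.Matrix([[1, b, 0],
               [2, 1, b],
               [0, 1, 1]])

adjA = A.adjugate()  # classical adjoint: transpose of the cofactor matrix

# The product is det(A) times the identity, so every off-diagonal
# entry vanishes -- e.g. row 3 of A times column 2 of adj(A) is zero.
print(sp.simplify(A * adjA))                    # det(A) * identity matrix
print(sp.simplify(A.row(2).dot(adjA.col(1))))   # prints 0
```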
In general I agree that you have to be careful using these models for symbolic math. For math problems and other reasoning tasks, you need to use o1-preview, although I think it’s rate-limited at this point. Another technique is to use the API and GPT-4o. OpenAI allows tool calls through the API, and in particular calls to the Python interpreter, meaning that you can give the model access to SymPy to help it with symbolic math when it needs it. The new open Llama 3.2 models have also implemented tool calls. Besides calls to Python and web search, you can also use a built-in tool in Llama 3.2 to call Wolfram Alpha to help with symbolic (or numerical) calculations.
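As a rough illustration of the tool-call approach, here is a hedged sketch using the OpenAI Python SDK to expose a small SymPy evaluator to GPT-4o. The tool name, JSON schema, and example prompt are my own illustrative choices, not anything prescribed by the API:

```python
# Hedged sketch: letting GPT-4o call SymPy as a tool via the OpenAI API.
# The tool name, schema, and prompt below are illustrative choices.
import json
import sympy
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def run_sympy(expression: str) -> str:
    """Evaluate a SymPy-parsable expression string and return it as text."""
    return str(sympy.sympify(expression))

tools = [{
    "type": "function",
    "function": {
        "name": "run_sympy",
        "description": "Evaluate a symbolic math expression with SymPy.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string",
                               "description": "A SymPy-parsable expression"}
            },
            "required": ["expression"],
        },
    },
}]

messages = [{"role": "user",
             "content": "Factor x**3 - 6*x**2 + 11*x - 6. Check with the tool."}]
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools)

# If the model chose to call the tool, run it and hand the result back
# so the model writes its final answer around the verified output.
msg = response.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    messages.append(msg)
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": run_sympy(args["expression"])})
    final = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(final.choices[0].message.content)
```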
I use these models all the time. Tremendous productivity gains.
This is for all the world like a C student bluffing his way through a writing assignment.
Couple of things:
The bot thingie keeps saying “let’s carefully analyze…” “Carefully” is salesmanship. It contributes nothing to the substance of the analysis. “Let’s” similarly adds nothing. Why is this expression in the bot’s response? I’m going to guess that it’s cribbed from a textbook. Not surprising, but it is kind of comical that the bot manages to ape teacher-speak but fails to get the math right.
Also comical that it’s better at making excuses than at completing the assignment. Acting like a C student – not the Turing test I would have expected.
The other thing that jumps out is that the bot works in a way that fails at what computers are good at. There is no way your average spreadsheet gets matrix algebra wrong on this scale. The bot is unable to scout out commonplace programming and report what it says. Rather, it depends entirely on some new-fangled process – linguistic? – that fails at what pre-AI computers were good at.
It seems AI wizards have not given their new programs the ability to interrogate other programs. I know nothing of such things, but this seems odd. Teaching computers to make sense of text has been hard, and is clearly not yet ready for prime time. Doing math to perfection was programmed in long ago.
Computers are now used to help create complex molecules. I wonder whether AI would make egregious errors in explaining how to balance simple chemical equations. Same for engineering, physics, inventory control – would ChatGPT ignore everything that computers have been taught to do in favor of trying to make sense of the semantics of some textbook? Seems a huge missed opportunity.
“Acting like a C student – not the Turing test I would have expected.”
One could say that it passes the Turing test. You say it acts like a C student, and what’s more human than your average C student who makes a few mistakes? An absolutely perfect bot would give itself away as non-human.
I have found that ChatGPT provides reasonable responses to specific questions asking for direct answers. If it is expected to monologue, then it rambles into unreliable results. And as noted previously, the specific model used is important. I have not really heard of folks using it to compute complex symbolic algebra, although it has done pretty well at describing results from some mathematical analysis I have had it investigate. But it was always best when I gave very specific requests, not vague directions.
Overall it can be viewed as a pretty good teacher or academic colleague, but use it with care. It only admits mistakes when you point them out.
I have an account with ChatGPT. Now you have me interested in using it over my preferred symbolic math software, Maple.
I will say with respect to creating computer code, it kills. I have had it write code in Matlab, Fortran, and Python. It excels at this type of work.
ChatGPT consistently gets math wrong. I’ve found that if I include in the prompt something like [Stop generating if the numbers do not add up], it will stop and tell me there is a mistake. Still looking for a solution.
If you’re looking for productivity tips, it cleans data and produces crosswalks between categories incredibly well. Both require checking, but it is faster than me or whoever I would delegate it to.
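For instance, here is a minimal pandas sketch of the crosswalk step; the column names and sample codes are hypothetical:

```python
# Minimal sketch of a category crosswalk with pandas.
# Column names and sample codes are hypothetical.
import pandas as pd

data = pd.DataFrame({"old_code": ["A1", "B2", "A1"], "value": [10, 20, 30]})
crosswalk = pd.DataFrame({"old_code": ["A1", "B2"], "new_code": ["X", "Y"]})

# Left-join so unmatched codes surface as NaN -- the "requires checking" step.
merged = data.merge(crosswalk, on="old_code", how="left", validate="many_to_one")
print(merged[merged["new_code"].isna()])  # rows that still need manual review
```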