Chatting about math with ChatGPT

I’m still trying to learn how to use ChatGPT to improve my productivity. One thing I’ve been experimenting with recently is asking it to check my math. As it turns out, I’m still better at math than the algorithm. Here is a link to a recent discussion I had with ChatGPT; my entries are the short, strongly indented statements. In this little conversation, ChatGPT made six separate math errors. Each time it confidently asserted something to be true when in fact it was provably false, and each time it cheerfully admitted its error when I pointed it out.

My recommendation is to keep ChatGPT on a short leash. Don’t ask it anything you can’t directly confirm yourself.

8 thoughts on “Chatting about math with ChatGPT”

  1. pgl

    Doesn’t EJ Antoni rely on ChatGPT for his Twitter posts? And he trusts it 100%, as little EJ has no clue what real economics is.

  2. pgl

    Here’s what ChatGPT just produced over at EJ’s Twitter:

    ‘Adjusting average earnings for inflation doesn’t always present an accurate picture of the middle class’s financial health b/c average income can be skewed by outliers; the Household Budget Index just looks at the purchasing power of middle-income families and it shows significant losses over the last several years’

    Macroduck the other day noted how this index differs from, say, the CPI, and that the difference has nothing to do with “outliers”:

    https://www.primerica.com/public/household-budget-index.html

    The other thing is that this measure has been rising of late and is about where it was in January 2019. Clearly EJ’s version of ChatGPT is off the mark, and of course a fake PhD and general MAGA moron knows even less.

  3. pgl

    “Each time it confidently asserted something to be true when in fact it was provably false, and each time it would cheerfully admit its error when I pointed it out.”

    That was sort of fun to wade through.

    1. James_Hamilton Post author

      It tells me, “I am ChatGPT, based on the GPT-4 architecture. My current version (v2.0) was released in 2024.”

      1. KenS

        You are likely using the 4o version, which isn’t one of their “reasoning” models (the o1 models). Those would likely do far better, but they are only available to paid users. If you have any interest in trying one without paying, I would be happy to set up a Zoom call or something if you want to try.

  4. Rick Stryker

    Hi JDH,

    You didn’t specify the version of ChatGPT you were using, but I suspect it was not the best version for these problems. I asked your first question and then the follow-up question to ChatGPT o1-preview, which implements chain-of-thought reasoning. Preview did not mistakenly assume that b is non-negative after calculating the determinant and adj(A). I noticed the version you used got the (3,2) term in the adjoint matrix wrong and therefore got the wrong sign in its final answer. Preview got that right the first time. I then asked preview to multiply the third row of A by the second column of adj(A), and it got zero as expected.
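
    For anyone who wants to check that last identity, here is a minimal SymPy sketch; a generic symbolic 3×3 matrix stands in for JDH’s A (which isn’t reproduced here), and the point is that A*adj(A) = det(A)*I forces every off-diagonal row-by-column product to be zero:

        import sympy as sp

        # Generic symbolic 3x3 matrix as a stand-in for the A in the conversation.
        A = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f'a{i}{j}'))

        adjA = A.adjugate()             # adj(A): transpose of the cofactor matrix
        prod = sp.expand(A * adjA)      # should equal det(A) * I

        print(prod[2, 1])                           # row 3 of A times column 2 of adj(A): 0
        print(sp.simplify(prod[0, 0] - A.det()))    # each diagonal entry equals det(A): 0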

    In general I agree that you have to be careful using these models for symbolic math. For math problems and other reasoning tasks, you need to use preview, although I think it’s rate-limited at this point. Another technique is to use the API and GPT-4o. ChatGPT allows tool calls through the API, and in particular calls to the Python interpreter, meaning that you can give the models access to SymPy to help with symbolic math when they need it (a sketch follows below). The new open Llama 3.2 models have also implemented tool calls. Besides being able to call Python and web search, you can also use a built-in tool in Llama 3.2 to call Wolfram Alpha for symbolic (or numerical) calculations.
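
    To make the tool-call route concrete, here is a minimal sketch. The helper name simplify_expression and its wiring are my own illustration, not an OpenAI built-in; the tools schema itself is the standard chat-completions format:

        import json
        import sympy as sp
        from openai import OpenAI

        client = OpenAI()

        # Advertise one function the model may call when it needs exact algebra.
        # "simplify_expression" is a hypothetical helper, not an OpenAI built-in.
        tools = [{
            "type": "function",
            "function": {
                "name": "simplify_expression",
                "description": "Exactly simplify a symbolic expression with SymPy.",
                "parameters": {
                    "type": "object",
                    "properties": {"expr": {"type": "string"}},
                    "required": ["expr"],
                },
            },
        }]

        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Simplify (x**2 - 1)/(x - 1)."}],
            tools=tools,
        )

        # If the model chose to call the tool, run SymPy on its arguments.
        # (A full loop would send the result back to the model as a "tool" message.)
        calls = resp.choices[0].message.tool_calls
        if calls:
            expr = json.loads(calls[0].function.arguments)["expr"]
            print(sp.simplify(sp.sympify(expr)))   # x + 1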

    I use these models all the time. Tremendous productivity gains.

  5. Macroduck

    This is for all the world like a C student bluffing his way through a writing assignment.

    Couple of things:

    The bot thingie keeps saying “let’s carefully analyze…”. “Carefully” is salesmanship; it contributes nothing to the substance of the analysis. “Let’s” similarly adds nothing. Why is this expression in the bot’s response? I’m going to guess that it’s cribbed from a textbook. Not surprising, but it is kind of comical that the bot manages to ape teacher-speak but fails to get the math right.

    Also comical that it’s better at making excuses than at completing the assignment. Acting like a C student – not the Turing test I would have expected.

    The other thing that jumps out is that the bot works in a way that fails at what computers are good at. There is no way your average spreadsheet gets matrix algebra wrong on this scale. The bot is unable to scout out commonplace programming and report what it says. Rather, it depends entirely on some new-fangled process – linguistic? – that fails at what pre-AI computers were good at.

    It seems AI wizards have not given their new programs the ability to interrogate other programs. I know nothing of such things, but this seems odd. Teaching computers to make sense of text has been hard, and is clearly not yet ready for prime time. Doing math to perfection was programmed in long ago.

    Computers are now used to help create complex molecules. I wonder whether AI would make egregious errors in explaining how to balance simple chemical equations. Same for engineering, physics, inventory control – would ChatGPT ignore everything that computers have been taught to do in favor of trying to make sense of the semantics of some textbook? Seems a huge missed opportunity.
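
    For what it’s worth, balancing a simple chemical equation really is a small, exact linear-algebra problem of the kind that was programmed in long ago. A minimal SymPy sketch, using CH4 + O2 -> CO2 + H2O purely as an illustration:

        import sympy as sp

        # Columns: CH4, O2, CO2, H2O. Rows count each element, with the
        # products entered negatively, so a balanced equation means M*x = 0.
        M = sp.Matrix([
            [1, 0, -1,  0],   # carbon
            [4, 0,  0, -2],   # hydrogen
            [0, 2, -2, -1],   # oxygen
        ])

        x = M.nullspace()[0]                  # exact rational solution
        x = x * sp.ilcm(*[e.q for e in x])    # clear denominators
        print(list(x))                        # [1, 2, 1, 2]: CH4 + 2 O2 -> CO2 + 2 H2O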

