ChatGPT Is Getting Dumber at Basic Math
Recently released research reveals a fundamental challenge of developing artificial intelligence: ChatGPT has become worse at performing certain basic math operations. From a report: The researchers at Stanford University and the University of California, Berkeley said the deterioration is an example of a phenomenon known to AI developers as drift, where attempts to improve one part of the enormously complex AI models make other parts of the models perform worse.
[...] Thus far, they have tested two versions of ChatGPT: version 3.5, available free online to anyone, and version 4.0, available via a premium subscription. The results aren't entirely promising. They gave the chatbot a basic task: identify whether a particular number is a prime number. This is the sort of math problem that is complicated for people but simple for computers.
Is 17,077 prime? Is 17,947 prime? Unless you are a savant, you can't work this out in your head, but it is easy for a computer to evaluate. A computer can just brute-force the problem -- try dividing by two, three, five, and so on, and see if anything divides evenly. To track performance, the researchers fed ChatGPT 1,000 different numbers. In March, the premium GPT-4 correctly identified whether the number was prime for 84% of them. (Pretty mediocre performance for a computer, frankly.) By June its success rate had dropped to 51%. Across eight different tasks, GPT-4 became worse at six of them. GPT-3.5 improved on six measures but remained worse than its advanced sibling at most of the tasks.
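The brute-force check the article describes takes only a few lines of code. A minimal Python sketch (trial division, checking candidate divisors only up to the square root of the number rather than literally every value):

```python
def is_prime(n: int) -> bool:
    """Trial division: test divisors 2, 3, 5, ... up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2  # even numbers other than 2 are composite
    d = 3
    while d * d <= n:  # no divisor can exceed sqrt(n)
        if n % d == 0:
            return False
        d += 2  # skip even candidates
    return True

print(is_prime(17077))  # True
print(is_prime(17947))  # False -- 17947 = 131 * 137
```

This answers both of the article's example questions instantly: 17,077 is prime, while 17,947 factors as 131 × 137, which is exactly why the task is trivial for a conventional program yet apparently unreliable for a language model.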