ChatGPT’s latest model may be a regression in performance

by Pelican Press

According to a new report from Artificial Analysis, OpenAI’s flagship large language model for ChatGPT, GPT-4o, has significantly regressed in recent weeks, putting the state-of-the-art model’s performance on par with the far smaller, and notably less capable, GPT-4o-mini model.

This analysis comes less than 24 hours after the company announced an upgrade for the GPT-4o model. “The model’s creative writing ability has leveled up–more natural, engaging, and tailored writing to improve relevance & readability,” OpenAI wrote on X. “It’s also better at working with uploaded files, providing deeper insights & more thorough responses.” Whether those claims hold up is now in doubt.

“We have completed running our independent evals on OpenAI’s GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o,” Artificial Analysis announced via an X post on Thursday, noting that the model’s Artificial Analysis Quality Index decreased from 77 to 71 (and is now equal to that of GPT-4o mini).

What’s more, GPT-4o’s score on the GPQA Diamond benchmark decreased from 51% to 39%, while its score on the MATH benchmark decreased from 78% to 69%.

At the same time, the researchers found that the model’s response speed had more than doubled, from around 80 output tokens per second to roughly 180 tokens/s. “We have generally observed significantly faster speeds on launch day for OpenAI models (likely due to OpenAI provisioning capacity ahead of adoption), but previously have not seen a 2x speed difference,” the researchers wrote.
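To put the reported figures side by side, the short Python sketch below computes the absolute change and the after/before ratio for each number quoted above. The values are only those reported by Artificial Analysis (the speed figures are approximate), and the `reported` dictionary and `summarize` helper are illustrative names, not part of any Artificial Analysis tooling.

```python
# Figures as quoted from Artificial Analysis: (August value, November value).
# The speed values are approximate ("around 80", "roughly 180").
reported = {
    "Quality Index": (77, 71),
    "GPQA Diamond (%)": (51, 39),
    "MATH (%)": (78, 69),
    "Output speed (tokens/s)": (80, 180),
}


def summarize(before, after):
    """Return the absolute change and the after/before ratio."""
    return after - before, after / before


for name, (before, after) in reported.items():
    delta, ratio = summarize(before, after)
    print(f"{name}: {before} -> {after} ({delta:+d}, {ratio:.2f}x)")
```

On these numbers, the benchmark scores fall by 6 to 12 points while output speed rises by a factor of about 2.25, which is the combination the researchers point to.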

Wait – is the new GPT-4o a smaller and less intelligent model?

We have completed running our independent evals on OpenAI’s GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o.

GPT-4o (Nov) vs GPT-4o (Aug):
➤… pic.twitter.com/gjY2pBFuUv

- Artificial Analysis (@ArtificialAnlys), November 21, 2024

“Based on this data, we conclude that it is likely that OpenAI’s Nov 20th GPT-4o model is a smaller model than the August release,” they continued. “Given that OpenAI has not cut prices for the Nov 20th version, we recommend that developers do not shift workloads away from the August version without careful testing.”

GPT-4o was first released in May 2024 to surpass the existing GPT-3.5 and GPT-4 models. GPT-4o offers state-of-the-art benchmark results in voice, multilingual, and vision tasks, according to OpenAI, making it ideal for advanced applications like real-time translation and conversational AI.







