Are New GPTs Actually Better? A Data-Driven Investigation

ChatGPT really took the world by storm with its impressive language proficiency. Now, with the release of GPT-4 and GPT-4 Turbo, there are promises of even greater heights, and most recently we have custom GPTs. With AI progressing at such an insane rate, it's sometimes worth stopping to analyse and compare the options, and to understand which model or type should be used in which circumstance.

Before we begin, for those who don't know: custom GPTs were released by OpenAI on 6 November 2023. A custom GPT can be designed to perform a distinct task, such as providing information, generating content, answering queries, performing data analysis, or executing particular actions, based on predefined instructions. This saves time by removing the need to give detailed instructions before every meaningful task. Importantly, custom GPTs also offer a friendly, no-code way to tailor your own chatbot with your own knowledge files: up to 20 files, each at most 512 MB.

The question I want to answer is: Are these flashy custom GPT models genuinely better in meaningful ways?

The Study and Its Findings

Recently, researchers put GPT iterations to the test in an unusual experiment in an education setting. They built a customised GPT-based virtual assistant to act as a statistics professor for business students at the Universidad Pontificia Comillas, and primed it with course material and related resources: specifically, two books authored by the professor running the course, along with an R programming practices book. By comparing its answers to real student questions against those of GPT-4 Turbo, they assessed whether specialisation actually improves performance over raw large language model capacity.

As you'd expect, the researchers found that the custom GPT, tailored and primed with their own data, adopted a friendlier conversational tone and outperformed GPT-4 Turbo on questions specific to the course. This is probably its largest benefit over the standard model. For general explanations and accuracy, however, it showed no significant improvement. Even on questions about R, for which the researchers had supplied their own material, the difference was minimal, most likely because GPT-4 Turbo already has ample data on the R programming language in its own training set.

Figure: Bar plot of the scores obtained by BSVP (the custom GPT) and ChatGPT-4 Turbo in each of the three dimensions analysed.

As the paper concludes, specialised customisation offers some novelty but minimal advantage over raw GPT-4 Turbo for the core task.

Interpreting The Evidence

Tailoring did provide a narrow specificity that generic models lack, but broader competence depended more on the underlying model's scale and training than on custom tweaks. This uneven pattern is to be expected from complex systems that are still far from general intelligence.

Key Takeaways

GPTs undeniably have a role, and that role lies in true customisation with your own data to create hyper-focused chatbots that give more accurate responses in specific settings: an education setting like the one in this study, in-house chatbots built on bespoke system or CRM instructions for staff to query, or even company IT departments handling basic tickets from staff. Thousands upon thousands of GPTs have been created in a short period of time. The majority are poorly written and give very poor responses. However, a well-written, useful GPT can be an absolute gem and can really boost your productivity.

Full credit to Eduardo C. Garrido-Merchán, Jose L. Arroyo-Barrigüete, Francisco Borrás-Pala, Leandro Escobar-Torres, Carlos Martínez de Ibarreta, Jose María Ortiz-Lozano and Antonio Rua-Vieites, "Real Customization or Just Marketing: Are Customized Versions of Chat GPT Useful?", arXiv preprint arXiv:2312.03728 (2023). The paper is available on arXiv.