Technology

Forget DeepSeek. Large language models are still getting cheaper

As recently as 2022, building a large language model (LLM) was a feat at the cutting edge of artificial-intelligence (AI) engineering. Three years on, experts are harder to impress. To stand out in a crowded market, an AI lab needs not just to build a high-quality model, but to build it cheaply.

In December DeepSeek, a Chinese firm, made headlines by cutting the dollar cost of training a frontier model from $61.6m (the cost of Llama 3.1, an LLM made by Meta, a technology company) to just $6m. Then, in a preprint posted online in February, researchers at Stanford University and the University of Washington claimed to have trained their s1 LLM for just $6. Put another way, DeepSeek took 2.7m hours of computing time to train; s1 took just under seven hours.

The figures are eye-popping, but the comparison is not exactly like-for-like. Where DeepSeek's v3 chatbot was trained from scratch (accusations of data theft from OpenAI, an American rival, and its peers notwithstanding), s1 is instead a "fine-tune" of the pre-existing Qwen2.5 LLM, produced by Alibaba, China's other top-tier AI lab. Before s1's training began, in other words, the model could already write, answer questions and produce code.

Such piggybacking can produce savings, but it cannot cut costs into the single digits on its own. To do that, the American team had to break free of the dominant paradigm in AI research, in which the way to improve a model is to scale up the amount of data and computing power available for its training. They hypothesised instead that a smaller amount of data, of high enough quality, could do the job just as well. To test that proposition, they gathered a selection of 59,000 questions, covering everything from standardised English tests to graduate-level problems in probability, with the intention of whittling them down to the most effective training set possible.

To work out how to do that, the questions alone are not enough; answers are needed too. So the team asked another AI model, Google's Gemini, to answer the questions using what is known as a reasoning approach, in which the model's "thought process" is shared alongside the answer. That gave them three datasets with which to train s1: the 59,000 questions; the accompanying answers; and the "chains of thought" used to connect the two.
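The distillation step described above can be sketched in a few lines. This is a minimal illustration, not the researchers' actual pipeline: the `teacher` here is a stand-in function, where in practice it would be an API call to a reasoning model such as Gemini that returns both a reasoning trace and a final answer.

```python
def distil(questions, teacher):
    """Build the three parallel datasets: questions, chains of thought, answers.

    `teacher` is any callable mapping a question to a
    (chain_of_thought, answer) pair -- here, a placeholder for a real
    reasoning-model API call.
    """
    chains, answers = [], []
    for q in questions:
        chain, answer = teacher(q)
        chains.append(chain)
        answers.append(answer)
    return questions, chains, answers

# Toy stand-in teacher, for illustration only.
def toy_teacher(question):
    return f"Let me think about: {question}", "42"

qs, cots, ans = distil(["What is 6 x 7?"], toy_teacher)
```

The three returned lists stay aligned by index, so each question keeps its own chain of thought and answer for later fine-tuning.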

They then threw almost all of it away. As s1 was based on Alibaba's Qwen AI, anything that model could already solve was unnecessary. Anything poorly formatted was also tossed, as was anything that Google's model had solved without needing to think very hard. If a given problem did not add to the overall diversity of the training set, it was out too. The end result was a streamlined 1,000 questions, which the researchers proved could train a model to perform just as well as one trained on all 59,000, and for a fraction of the cost.
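The winnowing process has a simple shape: apply each filter in turn and stop once enough examples survive. The sketch below is an assumption about how such a pipeline might be organised; the filter predicates (`base_model_solves`, `is_well_formatted`, `teacher_thought_hard`, `adds_diversity`) are hypothetical names standing in for the real checks.

```python
def select_training_set(examples, base_model_solves, is_well_formatted,
                        teacher_thought_hard, adds_diversity, target=1000):
    """Filter distilled examples down to a small, high-quality training set."""
    kept = []
    for ex in examples:
        if base_model_solves(ex):         # base model already solves it: nothing to learn
            continue
        if not is_well_formatted(ex):     # malformed questions are discarded
            continue
        if not teacher_thought_hard(ex):  # too easy even for the teacher to reason over
            continue
        if not adds_diversity(ex, kept):  # redundant with what is already kept
            continue
        kept.append(ex)
        if len(kept) == target:
            break
    return kept
```

Ordering the filters from cheapest to most expensive would matter at real scale, since each stage spares the later ones work.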

Such tricks abound. Like all reasoning models, s1 "thinks" before answering, working through the problem before announcing it has finished and presenting its final answer. But many reasoning models give better answers if they are allowed to think for longer, an approach called "test-time compute". And so the researchers hit upon the simplest possible way to get the model to carry on reasoning: when it announces that it has finished thinking, just delete that message and add the word "Wait" instead.

The trick works, too. Thinking for four times as long allows the model to score more than 20 percentage points higher on mathematics tests as well as scientific ones. Being forced to think for 16 times as long takes the model from being unable to earn a single mark on a hard maths exam to getting a score of 60%. Thinking harder is more expensive, of course, and inference costs rise with each additional "wait". But with training available so cheaply, the added expense may be worth it.

The researchers say their new model already beats OpenAI's first effort in the space, September's o1-preview, on measures of maths ability. The efficiency drive is the new frontier.


© 2025, The Economist Newspaper Limited. All rights reserved. From The Economist, published under licence. The original content can be found on www.economist.com

