Technology

How Did DeepSeek Build Its AI With Less Money?

Last month, US financial markets tumbled after a Chinese start-up called DeepSeek said it had built one of the world’s most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.

AI companies typically train their chatbots using supercomputers packed with 16,000 specialized chips or more. But DeepSeek said it needed only about 2,000.

As DeepSeek’s engineers detailed in a research paper published just after Christmas, the start-up used several technological tricks to significantly cut the cost of building its system. Its engineers needed only about $6 million in raw computing power, a fraction of what Meta spent building its latest AI technology.

What exactly did DeepSeek do? Here is a guide.

Leading AI technologies are based on what scientists call neural networks: mathematical systems that learn their skills by analyzing huge amounts of data.

The most powerful systems spend months analyzing just about all the English text on the internet, as well as many images, sounds and other multimedia. That requires enormous amounts of computing power.

About 15 years ago, AI researchers realized that specialized computer chips called graphics processing units, or GPUs, were an effective way of doing this kind of data analysis. Companies like the Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also had a knack for running the math that powered neural networks.

As companies packed more GPUs into their computer data centers, their AI systems could analyze more data.
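
Why GPUs in particular? The math inside a neural network is mostly giant matrix multiplications, exactly the kind of parallel arithmetic a GPU is built for. Here is a minimal sketch of the idea (my illustration, not the article’s), using PyTorch:

```python
import torch

# Two large matrices, of the kind a neural network multiplies constantly.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# On a CPU, this multiplication is spread across a handful of cores...
c_cpu = a @ b

# ...while a GPU performs the same math across thousands of tiny cores at once.
if torch.cuda.is_available():
    c_gpu = (a.cuda() @ b.cuda()).cpu()
```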

But the best GPUs cost around $40,000 each, and they need huge amounts of electricity. Sending data between chips can use more electrical power than running the chips themselves.

So how did DeepSeek cut costs? It did many things. Most notably, it embraced a method called “mixture of experts.”

Companies usually created a single neural network that learned all the patterns in all the data on the internet. This was expensive, because it required enormous amounts of data to travel between GPU chips.

If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.

With the mixture-of-experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller “expert” systems. Each expert could focus on its particular field.

Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller “expert” systems with a “generalist” system.

The experts still needed to trade some information with one another, and the generalist, which had a decent but not detailed understanding of each subject, could help coordinate the interactions between the experts.

It is a bit like an editor overseeing a newsroom filled with specialist reporters.
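
In code, the core idea can be sketched in a few dozen lines. The toy below is a minimal sketch with invented names and sizes, not DeepSeek’s actual architecture, which is far larger and more sophisticated. A small gating network plays the “generalist,” picking which two of eight “experts” handle each input, so most of the network sits idle on any given token:

```python
import torch
import torch.nn as nn

class TinyMixtureOfExperts(nn.Module):
    """Toy mixture of experts: each input is routed to its top-2 experts."""

    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small, independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        # The gate plays the "generalist": it scores which experts fit each input.
        self.gate = nn.Linear(dim, n_experts)
        self.top_k = top_k

    def forward(self, x):
        scores = self.gate(x)                             # (batch, n_experts)
        weights, picks = scores.topk(self.top_k, dim=-1)  # best experts per input
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # Only the chosen experts run; the others sit idle, saving compute.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picks[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMixtureOfExperts()
y = moe(torch.randn(16, 64))  # each input activates only 2 of the 8 experts
```

The savings come from the routing: with two of eight experts active per input, roughly three-quarters of the expert parameters do no work on any given input.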

So far, so good. But that was not the only thing DeepSeek did. It also mastered a simple trick involving decimals that anyone who remembers their elementary school math class can understand.

Remember your math teacher explaining the concept of pi. Pi, also denoted as π, is a number that never ends: 3.14159265358979…

You can use π to do useful calculations, such as determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you get a pretty good estimate of a circle’s circumference.
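
A quick worked example of that trade-off (my numbers, not the article’s):

```python
import math

radius = 10
exact = 2 * math.pi * radius  # 62.83185307179586
rough = 2 * 3.14 * radius     # 62.8

# The truncated version is off by less than 0.06 percent,
# plenty close for most everyday uses.
print(exact, rough, abs(exact - rough) / exact)
```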

DeepSeek did something similar, but on a much larger scale, to train its AI technology.

The math that allows a neural network to identify patterns in text is really just multiplication: lots and lots and lots of multiplication. We are talking months of multiplication across thousands of computer chips.

Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory, half the space. In essence, it lopped several decimals off each number.

This meant that each calculation was less accurate. But that did not matter. The calculations were accurate enough to produce a remarkably powerful neural network.
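
Here is a minimal sketch of what that squeezing can look like, assuming simple integer quantization (DeepSeek’s actual low-precision format is more sophisticated):

```python
import numpy as np

weights = np.random.randn(1024).astype(np.float16)  # 16-bit values

# Map every value onto the 256 levels an 8-bit integer can represent.
w = weights.astype(np.float32)
scale = np.abs(w).max() / 127.0
quantized = np.round(w / scale).astype(np.int8)  # 8 bits each: half the memory

# Undoing the mapping reveals a small rounding error,
# the "lopped off" decimals.
recovered = quantized * scale
print(np.abs(w - recovered).max())  # tiny compared with the values themselves
```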

Well, DeepSeek added yet another trick.

After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem, making a key calculation that would help decide how the neural network would operate, it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. It made the answer more precise.
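
Under the same simple integer-quantization assumption as the sketch above, the pattern looks like this: keep the inputs at 8 bits, but widen every product and the running total to 32 bits so no digits are lost where they matter most.

```python
import numpy as np

# 8-bit inputs, as in the quantization sketch above.
a = np.random.randint(-127, 128, size=1024, dtype=np.int8)
b = np.random.randint(-127, 128, size=1024, dtype=np.int8)

# Multiplying in 8 bits would overflow (127 * 127 does not fit in one byte),
# so each product and the running sum are widened to 32 bits instead.
dot = np.dot(a.astype(np.int32), b.astype(np.int32))
print(dot)  # exact, because the 32-bit accumulator kept every digit
```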

And they were not done. DeepSeek’s engineers showed in their paper that they were also very good at writing the highly complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of these chips.

Few people have that kind of skill. But serious AI labs have the talented engineers needed to match what DeepSeek has done.

Some AI labs may already be using at least some of the same tricks. Companies like OpenAI do not always reveal what they are doing behind closed doors.

But others were clearly surprised by DeepSeek’s work. Doing what the start-up did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars, if not billions, in electrical power.

In other words, it requires huge amounts of risk.

“You have to put a lot of money on the line to try new things, and often, they fail,” said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle who previously worked as an AI researcher at Meta.

“That is why we don’t see much innovation: people are afraid to lose many millions just to try something that doesn’t work,” he added.

Many pundits pointed out that DeepSeek’s $6 million covered only what the start-up spent when training the final version of the system. In their paper, DeepSeek’s engineers said they had spent additional funds on research and experimentation before the final training run. But the same is true of any cutting-edge AI project.

DeepSeek experimented, and it paid off. Now, because the Chinese start-up has shared its methods with other AI researchers, its technological tricks are poised to significantly reduce the cost of building AI.
