

Source: Information Information Network | Editor: Entertainment | Time: 2025-06-26 03:52:00

DeepSeek has released a new paper, with co-founder Liang Wenfeng credited as a contributor, detailing how its latest large language model, DeepSeek-V3, achieves efficient training and inference using only 2,048 H800 GPUs, significantly fewer than the tens of thousands typically required. The team attributes this efficiency to four key innovations: memory optimization through multi-head latent attention (MLA), computational savings via a Mixture-of-Experts (MoE) design with FP8 precision, communication improvements using a multi-plane network topology, and faster inference through multi-token prediction (MTP). With MLA, KV cache memory usage is cut to just 70 KB per token, as little as one-seventh that of competing models. The MoE architecture activates only 37 billion of the model’s 671 billion parameters per forward pass, reducing training costs by 90% compared to dense models. FP8 training further halves compute and memory usage, with minimal accuracy tradeoff. Beyond the model, the paper also outlines five future directions for AI hardware design, advocating tighter integration between software and hardware to address memory, compute, and networking bottlenecks. [36Kr, in Chinese]
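To put the quoted figures in perspective, the short Python sketch below works through the arithmetic. Only the 37 billion, 671 billion, and 70 KB numbers come from the summary above; the function names, the 32,768-token context length, and the choice of 1 KB = 1,024 bytes are illustrative assumptions, not details from the paper.

```python
# Figures quoted in the summary above.
ACTIVE_PARAMS = 37e9          # parameters activated per forward pass
TOTAL_PARAMS = 671e9          # total model parameters
KV_CACHE_KB_PER_TOKEN = 70    # KV cache size per token with MLA


def active_fraction(active: float = ACTIVE_PARAMS,
                    total: float = TOTAL_PARAMS) -> float:
    """Share of the model's parameters used in one forward pass."""
    return active / total


def kv_cache_gib(num_tokens: int,
                 kb_per_token: float = KV_CACHE_KB_PER_TOKEN) -> float:
    """KV cache size in GiB for a sequence of num_tokens tokens.

    Assumes 1 KB = 1,024 bytes (an assumption; the summary does not say).
    """
    return num_tokens * kb_per_token * 1024 / 1024**3


if __name__ == "__main__":
    # Hypothetical context length, chosen only for illustration.
    context_tokens = 32_768
    print(f"Active parameters: {active_fraction():.1%} of the total")
    print(f"KV cache at {context_tokens} tokens: "
          f"{kv_cache_gib(context_tokens):.2f} GiB")
    # A model needing seven times as much cache per token (the upper
    # end of the comparison in the summary) over the same context:
    print(f"At 7x per token: "
          f"{kv_cache_gib(context_tokens, 7 * KV_CACHE_KB_PER_TOKEN):.2f} GiB")
```

Under these assumptions, roughly 5.5% of the parameters are active per pass, and a 32,768-token context needs a little over 2 GiB of KV cache.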




Copyright © 2025 Information Information Network
