DeepSeek has released a new paper, with co-founder Liang Wenfeng credited as a contributor, detailing how its latest large language model, DeepSeek-V3, achieves efficient training and inference using only 2,048 H800 GPUs, significantly fewer than the tens of thousands typically required. The team attributes this efficiency to four key innovations: memory optimization through multi-head latent attention (MLA), computational savings via a Mixture-of-Experts (MoE) design with FP8 precision, communication improvements using a multi-plane network topology, and faster inference through multi-token prediction (MTP). With MLA, KV cache memory usage is cut to just 70KB per token, as little as one-seventh that of competing models. The MoE architecture activates only 37 billion of the model’s 671 billion parameters per forward pass, reducing training costs by 90% compared to dense models. FP8 training further halves compute and memory usage, with minimal accuracy tradeoff. Beyond the model, the paper also outlines five future directions for AI hardware design, advocating tighter integration between software and hardware to address memory, compute, and networking bottlenecks. [36Kr, in Chinese]
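To put the quoted figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers reported above (37 billion active of 671 billion total parameters, 70KB of KV cache per token). The function names, the 32,768-token context length, and the assumption that 1KB equals 1,000 bytes are illustrative choices, not details from the paper.

```python
# Figures quoted in the article.
TOTAL_PARAMS = 671e9            # total model parameters
ACTIVE_PARAMS = 37e9            # parameters activated per forward pass
KV_BYTES_PER_TOKEN = 70 * 1000  # 70KB per token (assumes 1KB = 1,000 bytes)


def active_fraction(active: float, total: float) -> float:
    """Share of parameters that take part in a single forward pass."""
    return active / total


def kv_cache_bytes(num_tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> int:
    """KV cache size for one sequence of num_tokens tokens."""
    return num_tokens * bytes_per_token


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Active parameters per forward pass: {frac:.1%}")

    context_length = 32_768  # hypothetical context length, for illustration only
    size_gb = kv_cache_bytes(context_length) / 1e9
    print(f"KV cache for {context_length} tokens: {size_gb:.2f} GB")
```

Under these assumptions, about 5.5% of the parameters are active on each forward pass, and the KV cache for a single sequence grows linearly with the number of tokens.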