
ChatGPT and Local Inference: Running Neural Engines on CPUs and GPUs

Link to the official ChatGPT app in the App Store, since there are many clones.

I wonder if there will be a point where they also use the Neural Engines in iPhones and iPads. At 175 billion parameters (175B), GPT-3.5 is still far too big to run locally, but the smaller 7B, 13B and 30B models run very fast. There is llama.cpp, which runs LLMs locally on CPUs and GPUs; inference on Metal GPUs has recently been made to work, and further improvements are being worked on hard.
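As a rough illustration of how this looks in practice, here is a minimal sketch using the llama-cpp-python bindings around llama.cpp. The model path, prompt and generation settings are assumptions; n_gpu_layers=-1 asks the library to offload all layers to the GPU (Metal on Apple Silicon) when the build supports it.

# Hedged sketch: run a locally quantized model via llama-cpp-python
# (pip install llama-cpp-python). The model file path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-13b.q4_0.bin",  # hypothetical local 13B quantized model
    n_ctx=2048,        # context window
    n_gpu_layers=-1,   # offload all layers to the GPU (Metal) if available
)

out = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(out["choices"][0]["text"])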

The biggest limitation will be working memory. A 13B model needs at least 8 GB, and a 30B model 16 GB (after which little else can run). For now this will therefore be reserved for iPads, since even the iPhone Pros only have 6 GB of RAM.
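Those figures follow roughly from the size of the quantized weights alone. A back-of-the-envelope estimate, assuming 4-bit quantization (the overhead for the KV cache and runtime is not included and varies per setup):

# Rough size of the quantized weights, assuming 4-bit quantization.
def quantized_weight_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for p in (7, 13, 30, 65):
    print(f"{p}B ~ {quantized_weight_gb(p):.1f} GB of weights")

A 4-bit 13B model comes to roughly 6.5 GB of weights and a 30B model to roughly 15 GB, so 8 GB and 16 GB of RAM leave little headroom for the KV cache and the rest of the system, in line with the figures above.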

The new Macs with 96, 128 or even 192 GB of unified memory can already run much larger models, such as LLaMA 65B in all its variants, Falcon 40B and the new 104B InternLM.

The r/LocalLLaMA subreddit also has a lot on this.

[Comment edited by Balance on 8 June 2023 18:05]

2023-06-08 15:42:29
