
ChatGPT and Local Inference: Running Neural Engines on CPUs and GPUs

Link to the official ChatGPT app in the App Store, since there are many clones.

I wonder if there will be a point where they also use the Neural Engines in iPhones and iPads. At 175 billion parameters (175B), GPT-3.5 is still far too big to run locally, but the smaller 7B, 13B and 30B models run very fast. There is llama.cpp, which runs LLMs locally on CPUs and GPUs; inference on Metal GPUs has recently been made to work, and further improvements are being worked on hard.
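As a rough illustration of how this looks in practice, here is a minimal sketch using the llama-cpp-python bindings around llama.cpp. The model path, prompt and generation settings are assumptions; n_gpu_layers=-1 asks the library to offload all layers to the GPU (Metal on Apple Silicon) when the build supports it.

# Hedged sketch: run a locally quantized model via llama-cpp-python
# (pip install llama-cpp-python). The model file path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-13b.q4_0.bin",  # hypothetical local 13B quantized model
    n_ctx=2048,        # context window
    n_gpu_layers=-1,   # offload all layers to the GPU (Metal) if available
)

out = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(out["choices"][0]["text"])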

The biggest limitation will be working memory. A 13B model needs at least 8 GB, and a 30B model 16 GB (after which little else can run). For now this will therefore be reserved for iPads, since even the iPhone Pros only have 6 GB of RAM.
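Those figures follow roughly from the size of the quantized weights alone. A back-of-the-envelope estimate, assuming 4-bit quantization (the overhead for the KV cache and runtime is not included and varies per setup):

# Rough size of the quantized weights, assuming 4-bit quantization.
def quantized_weight_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for p in (7, 13, 30, 65):
    print(f"{p}B ~ {quantized_weight_gb(p):.1f} GB of weights")

A 4-bit 13B model comes to roughly 6.5 GB of weights and a 30B model to roughly 15 GB, so 8 GB and 16 GB of RAM leave little headroom for the KV cache and the rest of the system, in line with the figures above.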

The new Macs with 96, 128 or even 192 GB of unified memory can already run much larger models, such as LLaMA 65B in all its variants, Falcon 40B and the new 104B InternLM.

The r/LocalLLaMA subreddit also has a lot on this.

[Comment edited by Balance on 8 June 2023 18:05]

2023-06-08 15:42:29
