Biren Technology BR100, China’s first general-purpose GPU chip with 77 billion transistors, debuts abroad – Hardware – cnBeta.COM

On August 9, Biren Technology, a domestic technology innovation company, officially released the BR100 series of general-purpose computing GPUs, claiming first place in China in terms of computing power, with several metrics comparable or even superior to leading international products. On August 22, local time, the first day of the 34th Hot Chips conference, the general-purpose GPUs NVIDIA Hopper, AMD Instinct MI200 and Intel Ponte Vecchio flexed their muscles one after another, and Biren Technology appeared alongside them with its BR100.


At the conference, Hongzhou, co-founder and CTO of Biren Technology, and Xu Lingjie, co-founder and president of Biren Technology, gave a talk titled “Biren BR100 GPGPU: Accelerating Datacenter Scale AI Computing”, presenting the BR100's features and the first details of its chip architecture to a professional audience from around the world.

According to the presentation, as a GPGPU chip primarily used to accelerate general-purpose computing at data center scale, the BR100 has very high compute density. Single-board 16-bit floating-point computing power reaches the PFLOPS level, and the chip offers high-speed on-chip and off-chip interconnect bandwidth.

The BR100 adopts a 7nm process, a chiplet design and CoWoS 2.5D packaging technology. Delivered as an OAM module, it can form a full 8-board point-to-point interconnect topology on a standard UBB motherboard.
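To make the "full 8-board point-to-point" claim concrete, the short Python sketch below counts the direct connections in a fully connected group of eight boards. It assumes that "full" means every board links directly to every other board. The function name `full_mesh_links` is hypothetical and is not part of any Biren tooling.

```python
from itertools import combinations


def full_mesh_links(num_boards):
    """Return every unordered pair of boards in a fully connected topology."""
    return list(combinations(range(num_boards), 2))


NUM_BOARDS = 8  # from the article: 8 OAM boards on one UBB motherboard

links = full_mesh_links(NUM_BOARDS)
peers_per_board = NUM_BOARDS - 1  # each board talks directly to all the others

print(f"direct links in the mesh: {len(links)}")
print(f"direct peers per board:   {peers_per_board}")
```

Under that assumption, no traffic between two boards has to pass through a third board, at the cost of each board dedicating a link to every one of its peers.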

To support this computing power, the BR100 is equipped with over 300MB of on-chip cache for temporary storage and reuse of data, plus 64GB of HBM2E high-speed memory.

Its main compute unit consists of a large number of general-purpose stream processors, with compute dedicated to tensor acceleration coupled to the general-purpose processing, and it uses a 2.5D GEMM architecture.

At the level of its original architecture, Biren Technology offers a set of advanced data-flow features based on the computational characteristics of general-purpose workloads such as deep learning, including the C-Warp cooperative concurrency mode, TDA tensor data access, NUMA/UMA storage access modes, near-memory computing, and so on. These features are the key to the BR100's ability to reach a world-leading level in computing power and energy efficiency.

In addition, Biren Technology also released a new TF32+ data type with higher precision than the TF32 data type.
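The article does not give the bit layout of TF32+, so the following Python sketch only illustrates the general idea that more mantissa bits mean a smaller representation error. TF32 as defined by NVIDIA keeps 10 explicit mantissa bits of a 32-bit float. The 15-bit width used for comparison is purely an assumption for illustration, and `truncate_mantissa` is a hypothetical helper that truncates rather than rounds and ignores special values such as NaN.

```python
import struct

TF32_MANTISSA_BITS = 10            # NVIDIA TF32: 1 sign, 8 exponent, 10 mantissa bits
ASSUMED_WIDER_MANTISSA_BITS = 15   # assumption for illustration, not from the article


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def truncate_mantissa(x, mantissa_bits):
    """Keep only the top `mantissa_bits` of the 23 float32 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - mantissa_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF  # zero the low-order mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


x32 = to_float32(1.0 / 3.0)
err_tf32 = abs(x32 - truncate_mantissa(x32, TF32_MANTISSA_BITS))
err_wider = abs(x32 - truncate_mantissa(x32, ASSUMED_WIDER_MANTISSA_BITS))

print(f"error with {TF32_MANTISSA_BITS} mantissa bits: {err_tf32:.3e}")
print(f"error with {ASSUMED_WIDER_MANTISSA_BITS} mantissa bits: {err_wider:.3e}")
```

Whatever the real TF32+ layout is, the same comparison applies: each extra mantissa bit roughly halves the worst-case truncation error.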

In terms of software, Biren Technology has also launched the BIRENSUPA™ software stack. Its main programming model provides a C/C++ programming interface and a runtime API, and its style is similar to mainstream GPGPU development languages and programming paradigms.

It allows developers to program and develop on the BR100 very easily, substantially reducing the workload of code migration and enabling a seamless move from mainstream programming environments to the BIRENSUPA platform.

According to the published figures, the BR100 integrates up to 77 billion transistors, a scale comparable to the number of nerve cells in the human brain. That is very close to the NVIDIA GH100 with its 80 billion transistors, and the BR100 series chips were brought up successfully on the first attempt!

In terms of performance, peak INT8 integer throughput is 2048 TOPS (2048 trillion operations per second), BF16 floating-point throughput is 1024 TFLOPS (1024 trillion operations per second), TF32+ floating-point throughput is 512 TFLOPS (512 trillion operations per second), and FP32 single-precision floating-point throughput is 256 TFLOPS (256 trillion operations per second).
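The four peak figures follow a simple pattern: each narrower data type doubles the throughput of the next wider one. The Python sketch below, using only the numbers quoted in the article, computes each figure relative to FP32 and converts it to peta-operations per second. The helper name `relative_to` is hypothetical.

```python
# Peak throughput in trillions of operations per second, as quoted in the article.
peak_tera_ops = {"INT8": 2048, "BF16": 1024, "TF32+": 512, "FP32": 256}


def relative_to(baseline, table):
    """Express every entry of `table` as a multiple of the baseline entry."""
    return {name: value / table[baseline] for name, value in table.items()}


ratios = relative_to("FP32", peak_tera_ops)

for name, value in peak_tera_ops.items():
    peta = value / 1000  # 1 peta = 1000 tera
    print(f"{name:6s} {value:5d} tera-ops/s = {peta:.3f} peta-ops/s, "
          f"{ratios[name]:.0f}x the FP32 figure")
```

The BF16 entry works out to 1.024 peta-operations per second, which is consistent with the claim that 16-bit floating-point performance reaches the PFLOPS level.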

In addition, its external I/O bandwidth reaches 2.3TB/s, it supports 64 encoding channels and 512 decoding channels, and it also supports the PCIe 5.0 and CXL interconnect protocols.
