LLM Colosseum: Small Model Wins in Street Fighter Battle - Overseas Experiment Ranks 14 Large Language Models

There are currently many large language models (LLMs) available online. For AI chatbots, more training data generally means a more capable model. Fighting games, however, turn out to be a different story. Recently, a developer overseas combined LLMs with Street Fighter, tested 14 large language models, and found that the winners were all small models.



This open-source project, called LLM Colosseum, was developed by Stan Girard and Quivr Brain. According to its description, the game runs in an emulator and lets LLMs control the in-game characters and fight each other (the character is limited to Ken). Anyone can download and install the project to test it themselves.

Amazon employee Banjo Obayomi recently published an article on the results of using this project to test 14 LLMs. It also explains in detail how an LLM controls its character in Street Fighter. The LLM continuously reads the current game state, such as character positions, health, and score. This data is translated into a prompt that also lists the actions available and recommended strategies, making it easier for the LLM to understand and act on.
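To make the state-to-prompt step more concrete, here is a minimal sketch of how raw game values could be turned into a text prompt. It is not LLM Colosseum's actual code: the `GameState` fields, the `MOVES` list, the distance threshold, and `build_prompt` are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical snapshot of values read from the emulator on each step.
# Field names are illustrative, not LLM Colosseum's real ones.
@dataclass
class GameState:
    own_x: int
    opponent_x: int
    own_health: int
    opponent_health: int
    own_score: int

# Hypothetical move list; the real project defines its own set.
MOVES = ["Move Closer", "Move Away", "Fireball", "Dragon Punch", "Low Kick"]

def build_prompt(state: GameState) -> str:
    """Turn the raw game state into a text prompt the LLM can reason about."""
    distance = abs(state.opponent_x - state.own_x)
    # Illustrative strategy hint; the 100-pixel threshold is an assumption.
    if distance > 100:
        hint = "You are far from your opponent: close the gap or use a ranged attack."
    else:
        hint = "You are close to your opponent: attack."
    lines = [
        "You are Ken in a Street Fighter match.",
        f"Your health: {state.own_health}. Opponent health: {state.opponent_health}.",
        f"Distance to opponent: {distance}. Your score: {state.own_score}.",
        hint,
        "Choose your next moves from: " + ", ".join(MOVES) + ".",
    ]
    return "\n".join(lines)

example = GameState(own_x=100, opponent_x=250, own_health=160,
                    opponent_health=120, own_score=3000)
print(build_prompt(example))
```

Putting the legal moves directly into the prompt keeps the model's answer constrained to actions the game can actually execute.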


After receiving the prompt, the LLM analyzes the current game state and decides on its next moves. These are converted into game commands and executed in the emulator: for example, moving closer, moving away, Hadouken (fireball), and Shoryuken (dragon punch).
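The reverse step, from the model's free-text reply to emulator input, can be sketched the same way. This is an illustrative assumption, not the project's real interface: `MOVE_TO_BUTTONS`, the button names, `parse_moves`, and `to_buttons` are hypothetical, and the motions assume Ken is facing right.

```python
import re

# Hypothetical mapping from move names to button sequences (Ken facing right).
# Names and button codes are illustrative only.
MOVE_TO_BUTTONS = {
    "move closer": ["RIGHT"],
    "move away": ["LEFT"],
    "fireball": ["DOWN", "DOWN_RIGHT", "RIGHT", "PUNCH"],      # Hadouken motion
    "dragon punch": ["RIGHT", "DOWN", "DOWN_RIGHT", "PUNCH"],  # Shoryuken motion
}

def parse_moves(reply: str) -> list[str]:
    """Extract known move names from the LLM's reply, in the order they appear."""
    pattern = "|".join(re.escape(name) for name in MOVE_TO_BUTTONS)
    return [m.group(0).lower() for m in re.finditer(pattern, reply, flags=re.IGNORECASE)]

def to_buttons(moves: list[str]) -> list[str]:
    """Flatten the chosen moves into one button sequence for the emulator."""
    buttons: list[str] = []
    for move in moves:
        buttons.extend(MOVE_TO_BUTTONS[move])
    return buttons

reply = "I will Move Closer, then throw a Fireball!"
moves = parse_moves(reply)
print(moves)
print(to_buttons(moves))
```

In this sketch, any text that does not match a known move is simply ignored, which is one simple way to cope with a model that invents moves that do not exist.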

In a video shared by Matthew Berman, a well-known YouTuber, you can watch a fairly complete match. On the left is the MISTRAL SMALL model and on the right is the MISTRAL MEDIUM model. The two models fight fairly smoothly, but a few details stand out. Neither model seems to use any defensive moves; they only move and attack. Against a human player, the human would almost certainly win easily.

In any case, this was a battle between LLMs, and MISTRAL SMALL won in the end: the smaller model beat the larger one. This shows that, unlike AI chat, fighting games reward speed and reaction time above all, and small LLMs usually have lower latency and respond faster.
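A quick back-of-the-envelope calculation shows why latency matters so much. Assuming, for simplicity, that the model makes one decision at a time and each call takes a fixed time, the number of decisions that fit into a stretch of play is just the window divided by the latency. The function name and the latency figures below are hypothetical and are not measured values from the experiment.

```python
def decisions_per_window(window_s: float, latency_s: float) -> int:
    """Number of complete LLM decisions that fit in a time window,
    assuming one call at a time and a fixed per-call latency."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return int(window_s // latency_s)

# Illustrative latencies only; these are not measurements.
for name, latency in [("smaller model", 0.5), ("larger model", 2.0)]:
    print(name, decisions_per_window(10.0, latency))
```

With these made-up numbers, the faster model gets four times as many chances to act in the same ten seconds, which matches the intuition that reaction speed can outweigh reasoning depth in a real-time game.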

In the second half of the video, Matthew Berman walks through the steps for installing the LLM Colosseum project, which is worth watching if you want to try it yourself.

Among the 14 large language models Banjo Obayomi tested, claude_3_haiku was the overall winner across a total of 314 matches. He also found that smaller models have lower latency, react faster, and perform more moves per match, so it is no surprise that Anthropic's Claude finished at the top.

However, although LLMs are very smart, they are not without shortcomings. Unusual situations sometimes arise, such as "hallucinations" and "refusing to play". Each LLM also has its own play style: some favor aggressive attacks, others prefer a more defensive, counterattacking approach, and some even spam, sending the same move over and over.

Source: Banjo Obayomi
