A large-model board game "experience officer" is here! Not only can it quickly deliver evaluations and suggestions, it can also simulate how the same game is experienced by different types of players.
Recently, a research team from Shanda AI Research Tokyo, the Shanghai Innovation Academy, Nankai University, and the Shanghai Artificial Intelligence Laboratory jointly proposed MeepleLM, the first virtual playtesting model that can simulate the perspective of real players and provide constructive criticism grounded in dynamic game experience.
To keep AI evaluations from feeling "ungrounded," the research team built a dedicated dataset containing 1,727 structured board game rulebooks and 150,000 real player reviews, establishing a mapping from "objective rules" to "subjective experience."
Building on this foundation, the team introduced the classic MDA (Mechanics-Dynamics-Aesthetics) game design theory as the core of the model's reasoning, enabling it to go beyond static text and deduce the dynamic interactions that arise during play. The team further extracted five typical player profiles from the review data, allowing the AI to internalize specific preferences and simulate the "a thousand players, a thousand experiences" feel of real communities.
Experiments show that MeepleLM significantly outperforms general-purpose models such as GPT-5.1 and Gemini3-Pro in how accurately it reproduces player word-of-mouth and rating distributions.
The "blind box" dilemma in board game design
The board game industry is growing rapidly, but its design process still faces significant challenges. Unlike video games, the board game experience relies heavily on social interaction between players and on the emergent effects of rules.
Traditional design processes rely heavily on human playtesting, which is not only time-consuming and labor-intensive but also struggles to cover the preferences of every type of player. While existing general-purpose large language models (LLMs) can understand text, they often lack a deep understanding of how game mechanics translate into emotional experiences. The suggestions they generate are usually vague generalities or simple restatements of the rules, and they fail to offer deep insights from different player perspectives.
To break this deadlock, the research team proposed MeepleLM, a virtual playtester that not only understands the rules but can also "simulate human nature."
Teaching AI to think like a designer
MeepleLM's core breakthrough is that it treats evaluation not as a simple text generation task, but as a cognitive chain from objective rules to subjective experience.
1. High-quality professional datasets
Using a stratified sampling strategy, the team first selected 1,727 representative games spanning different complexity levels and release years, and converted their unstructured PDF rulebooks into structured documents. The result is a dataset containing 1,727 structured rulebooks and 150,000 high-quality reviews.
Meanwhile, for the raw pool of 1.8 million reviews, the team designed an automated processing pipeline comprising hard filtering, MDA scoring, and semantic dimension recognition. Ultimately, only about 8% of the corpus, the portion that can be deeply tied to "game mechanics" and "dynamic experience," was retained, ensuring the model learns genuine experiential insights.
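The three-stage pipeline can be pictured roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the keyword lists, thresholds, and function names are all assumptions, and a real pipeline would use an LLM or embedding model for MDA scoring rather than keyword matching.

```python
import re

# Illustrative vocabulary standing in for a learned MDA scorer.
MECHANIC_TERMS = {"worker placement", "deck building", "drafting", "engine"}
DYNAMIC_TERMS = {"downtime", "interaction", "tempo", "snowball", "kingmaking"}

def hard_filter(review: str) -> bool:
    """Stage 1: drop reviews that are too short or are pure score dumps."""
    text = review.strip()
    return len(text.split()) >= 20 and not re.fullmatch(r"[\d\s./]+", text)

def mda_score(review: str) -> float:
    """Stage 2: crude MDA relevance score -- fraction of mechanic/dynamic
    vocabulary that appears in the review."""
    lower = review.lower()
    vocab = MECHANIC_TERMS | DYNAMIC_TERMS
    return sum(term in lower for term in vocab) / len(vocab)

def select_corpus(reviews: list[str], threshold: float = 0.1) -> list[str]:
    """Stage 3: keep only reviews passing both stages; on the real 1.8M
    corpus this kind of filtering retained roughly 8% of the data."""
    return [r for r in reviews if hard_filter(r) and mda_score(r) >= threshold]
```

A review like "9/10" fails the hard filter, while a long review discussing downtime or worker placement survives the MDA stage.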
2. MDA Cognitive Chain (Chain-of-Thought)
To help the model understand the causes of "fun," MeepleLM introduces the classic game design framework MDA (Mechanics-Dynamics-Aesthetics) as a thought process:
Mechanics: What are the rules of the game? (The What)
Dynamics: What interactions occur when the rules are executed? (The How)
Aesthetics: What emotional experience do those interactions bring to players? (The Feel)
Through this explicit reasoning path, the model no longer guesses; it logically derives the experience from the rules.
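One way to picture this explicit reasoning path is as a structured trace that is filled in stage by stage and then rendered into the evaluation prompt. This is a hypothetical sketch: the field contents and prompt wording are illustrative, not the paper's actual template.

```python
from dataclasses import dataclass

@dataclass
class MDATrace:
    """Explicit Mechanics -> Dynamics -> Aesthetics reasoning trace."""
    mechanics: str   # The What: rules extracted from the rulebook
    dynamics: str    # The How: interactions those rules produce in play
    aesthetics: str  # The Feel: the emotional experience that results

    def to_prompt(self) -> str:
        return (
            f"Mechanics (what the rules are): {self.mechanics}\n"
            f"Dynamics (what happens in play): {self.dynamics}\n"
            f"Aesthetics (how it feels): {self.aesthetics}\n"
            "Based on this chain, write the player's evaluation."
        )

trace = MDATrace(
    mechanics="Hidden roles are dealt, then secretly swapped in a night phase.",
    dynamics="Players bluff and cross-examine each other under a strict timer.",
    aesthetics="Tense social deduction punctuated by explosive table talk.",
)
```

Forcing the model to emit the trace before the verdict is what turns "guessing" into a checkable derivation.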
3. Five Player Profiles
"One man's meat is another man's poison." Different players react drastically to the same mechanic. Through cluster analysis, the research team identified five typical data-driven player profiles:
The System Purist: seeks tight balance and logical elegance, and despises randomness.
The Efficiency Essentialist: prioritizes a smooth play flow and dislikes fiddly operations.
The Narrative Architect: seeks immersive storytelling and a sense of presence, with mechanics serving the theme.
The Social Lubricator: plays for social interaction, and enjoys banter and table talk.
The Thrill Seeker: chases the thrill of high risk and high reward, and relishes rolling dice.
MeepleLM is able to “role-play” these specific profiles, thereby providing diverse feedback with specific preferences.
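Conditioning generation on a profile can be sketched as a simple role-play prompt builder. The preference strings below are paraphrased from the profile descriptions above; the prompt format itself is a hypothetical illustration, not MeepleLM's actual conditioning mechanism.

```python
# The five data-driven profiles, reduced to one-line preference summaries.
PROFILES = {
    "System Purist": "seeks tight balance and logic, and despises randomness",
    "Efficiency Essentialist": "wants a smooth flow and dislikes fiddly upkeep",
    "Narrative Architect": "wants immersion, with mechanics serving the theme",
    "Social Lubricator": "plays for banter and table interaction",
    "Thrill Seeker": "loves high risk, high reward, and rolling dice",
}

def persona_prompt(profile: str, game: str) -> str:
    """Build a role-play system prompt for the chosen player profile."""
    return (
        f"You are a board-game player who {PROFILES[profile]}. "
        f"Review '{game}' strictly from this perspective."
    )
```

Swapping the profile while holding the game fixed is what yields the diverse, preference-specific feedback described above.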
Virtual reviewers who understand players better
To verify the results, the research team conducted extensive testing on 207 games (including new releases in 2024-2025).
1. Macro-level score alignment:
General-purpose models (such as GPT-5.1) often behave like agreeable "nice guys," tending to give safe scores of 7-10. MeepleLM overcomes this positive bias: it not only identifies strengths but also keenly captures the fatal flaws that make players quit, accurately reflecting the polarized rating patterns of the real community.
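Score alignment of this kind can be quantified by comparing the model's rating distribution to the real one. The article does not specify the metric, so as one common choice the sketch below uses the 1-D Wasserstein (earth-mover's) distance, which for equal-sized samples reduces to the mean absolute difference of sorted values; the example scores are invented.

```python
def wasserstein_1d(a: list[float], b: list[float]) -> float:
    """Earth-mover's distance between two equal-sized 1-D samples:
    mean absolute difference of the sorted values."""
    if len(a) != len(b):
        raise ValueError("samples must be the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

real = [2, 3, 8, 9, 9, 10]   # polarized community ratings (illustrative)
nice = [7, 7, 8, 8, 9, 9]    # a "nice guy" model's safe 7-10 scores
```

A model that collapses everything into the 7-10 band sits far from a polarized community distribution, and the distance makes that gap measurable.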
2. Micro-level evaluation quality:
In generated commentary, MeepleLM balances factual accuracy with diversity of perspective. As shown in Figure 6, for reviews of *One Night Ultimate Werewolf*, Qwen3-8B adopts a generic, overwrought sentimental tone ("tragic drama"), while GPT-5.1 sounds like a detached journalist ("social media savvy lubricant"). MeepleLM, by contrast, authentically captures the distinct voice of each persona.
The model can seamlessly switch to community slang (e.g., "alpha player") in social contexts and switch to technical commentary (e.g., "variant rules") when facing purists, proving that it is not just retrieving knowledge, but truly simulating the player's perspective.
3. Practical value:
The team extracted real opinions from historical reviews and semantically matched them against the simulated reviews generated by each model. MeepleLM achieved the highest Op-Rec score, demonstrating its practical value in predicting market feedback and surfacing diverse player opinions.
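An Op-Rec-style metric can be sketched as opinion recall: the fraction of real player opinions that find a match among the simulated reviews. The paper's exact matching method is not given here, so plain token-overlap (Jaccard) similarity stands in for the embedding-based semantic matching a real pipeline would use; the 0.3 threshold and the sample opinions are illustrative assumptions.

```python
def similar(a: str, b: str, threshold: float = 0.3) -> bool:
    """Jaccard token overlap as a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) >= threshold

def opinion_recall(real_ops: list[str], simulated: list[str]) -> float:
    """Share of real opinions recovered by at least one simulated review."""
    hits = sum(any(similar(r, s) for s in simulated) for r in real_ops)
    return hits / len(real_ops)
```

A higher recall means the virtual playtester reproduces more of the opinions a real community would actually voice.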
In a blind A/B test with 10 players of different types, MeepleLM significantly outperformed GPT-5.1 on dimensions such as authenticity and decision confidence. More than 70% of participants preferred MeepleLM as a reference for purchasing decisions, saying its output "didn't feel like marketing copy" and was better at identifying potential design flaws.
A New Paradigm for Evaluating Interactive Systems
By connecting static rules with dynamic experiences, MeepleLM establishes a new paradigm for automated virtual testing of general interactive systems:
It can accelerate design iteration by anticipating market feedback and help players make personalized choices. This paves the way for "experience-aware" human-computer collaboration, in which models gradually evolve from purely functional tools into empathetic partners that understand the subjective feelings of their audience.
Paper Title:
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
Paper link:
https://arxiv.org/abs/2601.07251
Project link:
https://github.com/leroy9472/MeepleLM
First author:
Zizhen Li (Shanda AI Research Tokyo/Nankai University)
Corresponding author:
Kaipeng Zhang (Shanda AI Research Tokyo)
This article is from the WeChat public account "Quantum Bit" , authored by the MeepleLM team, and published with authorization from 36Kr.





