This is a Plain English Papers summary of a research paper called SocioVerse: LLM Agents Simulate 10M Real Users for Realistic Social Behavior.
Creating Virtual Societies: How SocioVerse Models Social Behavior with LLM Agents
Social scientists traditionally study human behavior through surveys, interviews, and observations, methods that can be costly and limited in scope and that can raise ethical concerns. Social simulation offers an alternative approach, using computational agents to model how people act in various contexts. Recent advances in large language models (LLMs) have dramatically enhanced these simulations, but challenges remain in aligning them with real-world environments, users, interactions, and behaviors.
The SocioVerse framework addresses four key alignment challenges in social simulation using the Ukraine issue as an example.
To address these challenges, researchers developed SocioVerse, a comprehensive framework for social simulation powered by LLM agents and built on a pool of 10 million real-world users. The system tackles four critical alignment questions:
- How to keep the simulated environment synchronized with the real world?
- How to make simulated agents precisely match target users?
- How to create consistent interaction mechanisms across different scenarios?
- How to ensure behavioral patterns match real-world groups?
The SocioVerse Framework: A Complete Social Simulation Pipeline
SocioVerse operates through four integrated components that work together to create realistic simulations of social dynamics.
The SocioVerse framework consists of four main components that work together to create realistic social simulations.
Creating a Dynamic Social Environment
The social environment component injects real-world context into simulations, improving realism and agent decision-making. It incorporates three types of information:
- Social Structure: Demographic data, cultural norms, and collective behavior patterns that help agents act in alignment with typical characteristics of their assigned profiles
- Social Dynamics: Time-sensitive content like news events and policy changes, maintained in an updated event database with timestamps
- Personalized Context: Individual information feeds based on social networks and interests, pushing relevant content to agents
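One way to picture the timestamped event database is as a store that only reveals events dated on or before the simulated time, filtered by an agent's interests. The sketch below is illustrative and not taken from the paper: the `Event` class, the `visible_events` function, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """A hypothetical time-stamped entry in the social-dynamics store."""
    day: date
    topic: str
    summary: str


def visible_events(events, as_of, interests):
    """Return events dated on or before `as_of` whose topic is in `interests`,
    newest first, so an agent never sees news from its own future."""
    matching = [e for e in events if e.day <= as_of and e.topic in interests]
    return sorted(matching, key=lambda e: e.day, reverse=True)


# Placeholder entries, invented for illustration only.
events = [
    Event(date(2024, 1, 5), "economy", "placeholder policy announcement"),
    Event(date(2024, 2, 1), "technology", "placeholder product launch"),
    Event(date(2024, 3, 9), "economy", "placeholder market report"),
]

feed = visible_events(
    events, as_of=date(2024, 2, 15), interests={"economy", "technology"}
)
print([e.summary for e in feed])
```

Filtering by date keeps the simulated environment from leaking information the real users could not have had at that moment.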
Building a Diverse User Pool with 10 Million Real Users
The user engine aligns simulated agents with real-world users, drawing from a massive pool of authentic digital footprints:
Source | # Users | # Posts |
---|---|---|
X | 1,006,517 | 30,195,510 |
Rednote | 9,158,404 | 40,963,735 |
Statistical summary of the 10M user pool showing the number of users and posts from different platforms.
This engine includes a comprehensive demographic annotation system that labels users across 15 dimensions including age, gender, occupation, income, education, and political views. The process combines multiple LLMs as initial annotators with human verification to ensure accurate labeling.
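The summary does not say how the labels from several LLM annotators are combined, so the following is only a sketch under an assumed rule: take the majority label and send ties or weak agreement to a human checker. The function name `aggregate_label` and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter


def aggregate_label(votes, min_agreement=2):
    """Return (label, needs_human_review) for one user attribute.

    `votes` holds the labels proposed by several annotator models. If the
    most common label ties with another label, or has fewer than
    `min_agreement` votes, the item is flagged for human verification.
    This majority rule is an assumption, not the paper's stated procedure.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes).most_common()
    top_label, top_count = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_count
    return top_label, tied or top_count < min_agreement


# Two of three annotators agree, so the label is accepted automatically.
print(aggregate_label(["18-29", "18-29", "30-44"]))
# No agreement at all, so the item is flagged for a human.
print(aggregate_label(["student", "teacher", "engineer"]))
```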
Designing Flexible Scenario Templates
The scenario engine creates various simulation structures based on specific task requirements. It offers four archetypal templates:
- Questionnaire: One-to-many format for massive social investigations like election polls
- In-depth Interview: One-to-one structure for exploring motivations through multiple interaction rounds
- Behavior Experiment: Various formats for examining decision-making processes in controlled conditions
- Social Media Interaction: Many-to-many structure for analyzing dynamic exchanges in online settings
These templates, designed with AgentSociety's principles in mind, standardize simulation components for better extensibility across different social contexts.
Modeling Realistic Behavior Patterns
The behavior engine integrates all other components to predict individual behaviors. It employs two complementary approaches:
- Traditional Agent-Based Modeling: Rule-based and mathematical models that are computationally efficient for large populations
- LLM-powered Agents: Language models that generate realistic user content through both non-parametric prompting (general LLMs) and parametric training (specialized models for complex profiles)
This dual approach, inspired by agentic society research, enables credible behavior simulation across diverse contexts.
Real-World Applications: Testing SocioVerse in Three Diverse Domains
To validate its effectiveness, researchers implemented three distinct social simulation scenarios through the SocioVerse framework.
The SocioVerse framework was tested through three different scenarios spanning political, media, and economic domains.
Predicting U.S. Presidential Elections
This simulation analyzed methods for large-scale election prediction through the Electoral College framework. It modeled U.S. demographic diversity using Census Bureau and American National Election Studies data, incorporating 12 attributes including socioeconomic factors, geographic dimensions, and political preferences.
The researchers designed a comprehensive election questionnaire based on polls from media and research institutes, evaluating results through accuracy rate (proportion of correctly predicted states) and root mean square error (difference between simulated and actual vote shares).
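The two election metrics can be written down directly from their definitions. This is a minimal sketch with invented state names and vote shares; the helper names (`winner`, `state_accuracy`, `rmse`) and the two-candidate simplification are assumptions made for illustration.

```python
import math


def winner(share_for_x):
    """Two-candidate simplification: X wins a state with more than half the vote."""
    return "X" if share_for_x > 0.5 else "Y"


def state_accuracy(simulated, actual):
    """Proportion of states whose simulated winner matches the actual winner."""
    hits = sum(winner(simulated[s]) == winner(actual[s]) for s in actual)
    return hits / len(actual)


def rmse(simulated, actual):
    """Root mean square error between simulated and actual vote shares."""
    squared = [(simulated[s] - actual[s]) ** 2 for s in actual]
    return math.sqrt(sum(squared) / len(squared))


# Invented vote shares for candidate X in four hypothetical states.
simulated = {"A": 0.55, "B": 0.48, "C": 0.51, "D": 0.40}
actual = {"A": 0.52, "B": 0.51, "C": 0.53, "D": 0.42}

print(state_accuracy(simulated, actual))
print(round(rmse(simulated, actual), 4))
```

In this toy data, state B is called for the wrong candidate even though its vote share is off by only three points, which is why accuracy and RMSE are reported side by side.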
Analyzing Public Response to ChatGPT
This scenario examined how technology-interested users respond to breaking news about ChatGPT. The researchers identified potential audience and ground truth sets from the Rednote user pool, designing a cognitive questionnaire based on the ABC attitude model (Affect, Behavior, Cognition) with a 5-point Likert scale.
The questionnaire measured six dimensions: public cognition, perceived risks, perceived benefits, trust, fairness, and public acceptance. Results were evaluated through normalized RMSE (point-wise differences) and KL-divergence (distribution comparison).
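As a rough illustration of the two measures, the sketch below computes a normalized RMSE and a KL divergence over answer distributions. The summary does not give the exact normalization or the direction of the divergence, so dividing by the scale range and treating the real distribution as the reference are both assumptions, and the numbers are invented.

```python
import math


def nrmse(simulated, actual, value_range):
    """RMSE divided by `value_range` (assumed normalization, e.g. 4 for a 1-5 scale)."""
    mse = sum((s - a) ** 2 for s, a in zip(simulated, actual)) / len(actual)
    return math.sqrt(mse) / value_range


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions of equal length.
    `eps` guards against log(0) when the simulated side has an empty bin."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


# Invented answer distributions over a 5-point Likert scale.
real = [0.10, 0.20, 0.40, 0.20, 0.10]
sim = [0.05, 0.15, 0.50, 0.20, 0.10]

# Invented mean scores for three questionnaire dimensions.
real_means = [3.2, 2.8, 3.9]
sim_means = [3.0, 3.1, 3.7]

print(kl_divergence(real, sim))
print(nrmse(sim_means, real_means, value_range=4.0))
```

NRMSE compares answers point by point, while KL divergence compares the overall shape of the response distribution, so a model can do well on one and poorly on the other.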
Modeling Economic Behavior in China
This simulation followed a national economic survey methodology from China's National Bureau of Statistics. The researchers sampled nationwide agents proportionally by region and generated income distributions based on regional averages.
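Proportional sampling by region can be sketched as an allocation problem: split a fixed number of agents across regions according to population share. The region names and populations below are invented, and the largest-remainder rounding rule is an assumption rather than the paper's documented method.

```python
def allocate_agents(population_by_region, total_agents):
    """Split `total_agents` across regions in proportion to population,
    using largest-remainder rounding so the counts sum exactly."""
    total_pop = sum(population_by_region.values())
    exact = {
        r: total_agents * p / total_pop for r, p in population_by_region.items()
    }
    # Start from the floor of each exact quota.
    counts = {r: int(x) for r, x in exact.items()}
    leftover = total_agents - sum(counts.values())
    # Hand the remaining agents to the regions with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda r: exact[r] - counts[r], reverse=True)
    for r in by_remainder[:leftover]:
        counts[r] += 1
    return counts


# Hypothetical regions with made-up populations (in millions).
populations = {"North": 50, "South": 30, "West": 20}
allocation = allocate_agents(populations, total_agents=16000)
print(allocation)
```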
The questionnaire covered eight spending categories (food, clothing, housing, daily necessities, transportation, education, healthcare, and others), with results evaluated against official statistics through NRMSE and KL-divergence measures.
Results: SocioVerse Achieves Realistic Social Simulations
The researchers tested SocioVerse using multiple powerful LLMs, including Llama-3-70b, Qwen2.5-72b, DeepSeek models, and GPT-4o variants. Each scenario was configured with different parameters:
Scenario | # Agents | # Demographics | Type | Sampling | Source | Language | # Questions | Ground truth |
---|---|---|---|---|---|---|---|---|
PresElectPredict | 331,836 | 12 | label | IPF | X | EN | 49 | real world |
BreakNewsFeed | 20,000 | 7 | label | IDS | Rednote | ZH | 18 | calculated |
NatEconSurvey | 16,000 | 9 | label+number | IDS | Rednote | ZH | 17 | real world |
Detailed settings of the three simulation scenarios. IPF and IDS denote iterative proportional fitting and identical distribution sampling, respectively.
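Iterative proportional fitting rescales a table of counts until its margins match known population totals. The sketch below shows the idea in two dimensions only; the election scenario uses 12 attributes, and the seed table and target margins here are invented.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Rescale `seed` (a list of rows) until row and column sums match the targets.
    Assumes both target lists add up to the same grand total."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Row step: scale each row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Column step: scale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Stop once the row sums still match after the column step.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table


# Invented margins: two age groups (rows) by two education levels (columns).
fitted = ipf(seed=[[1.0, 1.0], [1.0, 1.0]], row_targets=[60, 40], col_targets=[70, 30])
print([[round(v, 2) for v in row] for row in fitted])
```

Agents can then be drawn in proportion to the fitted cells, so the sampled population matches every known margin at once.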
The overall performance across all scenarios demonstrated SocioVerse's capabilities:
Model | PresElectPredict Overall Acc ↑ | PresElectPredict Overall RMSE ↓ | PresElectPredict Battleground Acc ↑ | PresElectPredict Battleground RMSE ↓ | BreakNewsFeed KL-Div ↓ | BreakNewsFeed RMSE ↓ | NatEconSurvey Overall KL-Div ↓ | NatEconSurvey Overall RMSE ↓ | NatEconSurvey Developed-region KL-Div ↓ | NatEconSurvey Developed-region RMSE ↓ |
---|---|---|---|---|---|---|---|---|---|---|
Llama3-70b | 0.843 | 0.064 | 0.733 | 0.045 | 0.668 | 0.199 | 0.016 | 0.026 | 0.013 | 0.025 |
Qwen2.5-72b | 0.922 | 0.037 | 0.800 | 0.031 | 0.113 | 0.059 | 0.066 | 0.048 | 0.043 | 0.039 |
DeepSeek-R1-671b | \ | \ | 0.670 | 0.065 | 0.383 | 0.082 | 0.059 | 0.045 | 0.045 | 0.036 |
DeepSeek-V3 | 0.922 | 0.046 | 0.867 | 0.041 | 0.263 | 0.072 | 0.035 | 0.036 | 0.023 | 0.030 |
GPT-4o-mini | \ | \ | 0.800 | 0.039 | 0.195 | 0.114 | 0.046 | 0.045 | 0.030 | 0.036 |
GPT-4o | \ | \ | \ | \ | 0.196 | 0.055 | 0.062 | 0.051 | 0.036 | 0.038 |
Overall results of the three scenarios, where subset Battleground indicates battleground states in the U.S. presidential election and subset Developed-Region indicates top-10 developed regions in China in terms of GDP.
Key Findings from Presidential Election Predictions
In the election scenario, Qwen2.5-72b and DeepSeek-V3 achieved the highest accuracy (92.2%), correctly predicting over 90% of state voting results. The simulation was particularly effective at replicating real-world election outcomes through the winner-takes-all rule.
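The winner-takes-all rule turns per-state vote shares into electoral vote totals. The sketch below uses invented states, shares, and vote counts, and it simplifies by treating every state as winner-takes-all, whereas Maine and Nebraska actually split their electoral votes by district. The function name `electoral_totals` is hypothetical.

```python
def electoral_totals(state_shares, electoral_votes):
    """Give each state's electoral votes to the candidate with the top simulated share."""
    totals = {}
    for state, shares in state_shares.items():
        state_winner = max(shares, key=shares.get)
        totals[state_winner] = totals.get(state_winner, 0) + electoral_votes[state]
    return totals


# Invented simulated vote shares and electoral vote counts.
state_shares = {
    "State A": {"X": 0.52, "Y": 0.48},
    "State B": {"X": 0.45, "Y": 0.55},
    "State C": {"X": 0.51, "Y": 0.49},
}
electoral_votes = {"State A": 10, "State B": 20, "State C": 5}

totals = electoral_totals(state_shares, electoral_votes)
print(totals)
```

Because a narrow lead in a state delivers all of its votes, small errors in simulated shares matter most in close states, which is why the battleground subset is reported separately.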
Ablation studies showed that prior distribution and real-world knowledge significantly improved prediction accuracy, especially in battleground states:
Model | Acc ↑ | RMSE ↓ |
---|---|---|
Llama3-70b | 0.733 | 0.045 |
- w/o Knowledge | 0.533 | 0.051 |
- w/o Knowledge & Prior Distribution | 0.600 | 0.386 |
Qwen2.5-72b | 0.800 | 0.031 |
- w/o Knowledge | 0.800 | 0.033 |
- w/o Knowledge & Prior Distribution | 0.600 | 0.370 |
GPT-4o-mini | 0.800 | 0.039 |
- w/o Knowledge | 0.800 | 0.052 |
- w/o Knowledge & Prior Distribution | 0.667 | 0.323 |
Results showing how prior distribution and real-world knowledge enhance prediction accuracy in battleground states.
Insights from Breaking News Response Analysis
The breaking news scenario revealed how accurately different models captured public attitudes toward ChatGPT.
Comparison of simulated and real user responses to ChatGPT across six attitudinal dimensions.
Most models produced responses consistent with ground truth users, though Llama3-70b showed larger gaps. An interesting pattern emerged: all models generated more conservative simulated results than the real responses, highlighting potential biases in public opinion simulation.
Economic Simulation Performance
In the national economic survey, all models showed strong alignment with real-world statistics, with Llama3-70b demonstrating superior performance. Models performed better in developed regions than overall, suggesting they more accurately capture economic behavior in wealthier areas.
Detailed results across spending categories revealed interesting patterns:
Item | Llama3-70b | Qwen2.5-72b | GPT-4o-mini | GPT-4o | DeepSeek-R1 |
---|---|---|---|---|---|
Daily | 0.007 | 0.009 | 0.006 | 0.010 | 0.009 |
Clothing | 0.012 | 0.015 | 0.019 | 0.015 | 0.015 |
Transportation_Communication | 0.016 | 0.020 | 0.027 | 0.023 | 0.017 |
Education_Entertainment | 0.018 | 0.022 | 0.024 | 0.017 | 0.022 |
Medical | 0.023 | 0.062 | 0.041 | 0.057 | 0.060 |
Food | 0.037 | 0.031 | 0.031 | 0.040 | 0.032 |
Household | 0.052 | 0.110 | 0.107 | 0.120 | 0.102 |
Others | 0.008 | 0.008 | 0.010 | 0.005 | 0.009 |
Detailed results on the national economic survey simulation reported in NRMSE, where lower values indicate better performance.
Setting aside the catch-all Others category, every model had its lowest error on daily necessities (the Daily row) and its highest on housing (the Household row), suggesting that LLMs simulate some spending categories more reliably than others.
Bridging AI and Social Science: Future Directions
SocioVerse represents a significant advancement in social simulation powered by LLM agents, demonstrating that state-of-the-art language models can effectively simulate human responses in complex social contexts. The research identified several key patterns:
- Incorporating demographic distributions and users' historical experiences significantly improves simulation accuracy
- Under consistent measurement protocols, LLMs produce broadly similar simulations of human attitudes, though model-specific biases remain
- LLMs perform better in simple daily scenarios than in complex situations requiring contextual knowledge
The current implementation represents only part of the full SocioVerse vision. Future work could enhance each module for better collaboration:
- Refining the social environment to inject more up-to-date knowledge
- Expanding the scenario engine beyond surveys to interviews and free interactions
- Optimizing LLMs to better accommodate minority groups and individuals with special needs
- Developing autonomous planning modules to improve simulation credibility
Beyond technical improvements, SocioVerse offers social scientists a cost-effective tool for conducting experiments with minimal setup. This bridge between AI systems and traditional research could help analyze psychological and sociological theories, predict social impacts of policy changes, and explore long-term trends in virtual societies—ultimately creating a realistic mapping for understanding our complex social world.