This is a Plain English Papers summary of a research paper called SocioVerse: LLM Agents Simulate 10M Real Users for Realistic Social Behavior.

Creating Virtual Societies: How SocioVerse Models Social Behavior with LLM Agents

Social scientists traditionally study human behavior through surveys, interviews, and observations. These methods can be costly and limited in scope, and they can raise ethical concerns. Social simulation offers an alternative approach, using computational agents to model how people act in various contexts. Recent advances in large language models (LLMs) have dramatically enhanced these simulations, but challenges remain in aligning them with real-world environments, users, interactions, and behaviors.

Illustration of SocioVerse applied to the Ukraine issue, showing how the framework handles the four alignment challenges: environment, user, scenario, and behavior.

To address these challenges, researchers developed SocioVerse, a comprehensive framework for social simulation powered by LLM agents and built on a pool of 10 million real-world users. The system tackles four alignment questions:

  1. How to keep the simulated environment synchronized with the real world?
  2. How to make simulated agents precisely match target users?
  3. How to create consistent interaction mechanisms across different scenarios?
  4. How to ensure behavioral patterns match real-world groups?

The SocioVerse Framework: A Complete Social Simulation Pipeline

SocioVerse operates through four integrated components that work together to create realistic simulations of social dynamics.

Illustration of the SocioVerse framework and its four components. The social environment provides up-to-date context for the simulation. During a simulation, the behavior engine takes the simulation setting from the scenario engine, user profiles from the user engine, and social information from the social environment, then generates results according to the query.

Creating a Dynamic Social Environment

The social environment component injects real-world context into simulations, improving realism and agent decision-making. It incorporates three types of information:

  • Social Structure: Demographic data, cultural norms, and collective behavior patterns that help agents act in alignment with typical characteristics of their assigned profiles
  • Social Dynamics: Time-sensitive content like news events and policy changes, maintained in an updated event database with timestamps
  • Personalized Context: Individual information feeds based on social networks and interests, pushing relevant content to agents
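One way to picture the timestamped event database is as a store that the simulation queries for events an agent could plausibly have seen by a given moment. The sketch below is illustrative only: the `Event` class, the `visible_events` function, and the sample events are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One entry in a hypothetical timestamped event store."""
    timestamp: datetime
    topic: str
    text: str


def visible_events(events, now, interests, limit=3):
    """Return the newest events dated at or before `now` that match the agent's interests."""
    known = [e for e in events if e.timestamp <= now and e.topic in interests]
    known.sort(key=lambda e: e.timestamp, reverse=True)
    return known[:limit]


# Made-up events, for illustration only.
events = [
    Event(datetime(2024, 1, 5), "economy", "Central bank holds rates"),
    Event(datetime(2024, 2, 1), "tech", "New chatbot released"),
    Event(datetime(2024, 3, 9), "tech", "Chatbot adds voice mode"),
    Event(datetime(2024, 4, 2), "tech", "Regulator opens AI inquiry"),
]

feed = visible_events(events, now=datetime(2024, 3, 15), interests={"tech"})
print([e.text for e in feed])  # ['Chatbot adds voice mode', 'New chatbot released']
```

Filtering on the timestamp keeps an agent from reacting to news that had not yet happened at the simulated date, and filtering on interests stands in for the personalized feed.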

Building a Diverse User Pool with 10 Million Real Users

The user engine aligns simulated agents with real-world users, drawing from a massive pool of authentic digital footprints:

Source   # Users    # Posts
X        1,006,517  30,195,510
Rednote  9,158,404  40,963,735

Statistical summary of the 10M user pool showing the number of users and posts from different platforms.

This engine includes a comprehensive demographic annotation system that labels users across 15 dimensions including age, gender, occupation, income, education, and political views. The process combines multiple LLMs as initial annotators with human verification to ensure accurate labeling.
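A minimal sketch of how several LLM annotators and a human check could be combined, assuming a simple majority vote with a review threshold. The summary does not describe the actual aggregation rule, so `consensus_label`, the `min_agreement` threshold, and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.6):
    """Return (label, needs_review) for one user on one demographic dimension.

    `votes` holds one label per annotator. The most common label wins, and the
    item is flagged for human verification when agreement is below the threshold.
    """
    label, count = Counter(votes).most_common(1)[0]
    needs_review = count / len(votes) < min_agreement
    return label, needs_review


# Hypothetical age-bracket votes from three annotator models.
print(consensus_label(["25-34", "25-34", "35-44"]))  # ('25-34', False)
print(consensus_label(["18-24", "25-34", "35-44"]))  # ('18-24', True)
```

Routing only low-agreement items to people keeps the human effort focused on the labels the models disagree about.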

Designing Flexible Scenario Templates

The scenario engine creates various simulation structures based on specific task requirements. It offers four archetypal templates:

  • Questionnaire: One-to-many format for massive social investigations like election polls
  • In-depth Interview: One-to-one structure for exploring motivations through multiple interaction rounds
  • Behavior Experiment: Various formats for examining decision-making processes in controlled conditions
  • Social Media Interaction: Many-to-many structure for analyzing dynamic exchanges in online settings

These templates, designed with AgentSociety's principles in mind, standardize simulation components for better extensibility across different social contexts.

Modeling Realistic Behavior Patterns

The behavior engine integrates all other components to predict individual behaviors. It employs two complementary approaches:

  • Traditional Agent-Based Modeling: Rule-based and mathematical models that are computationally efficient for large populations
  • LLM-powered Agents: Language models that generate realistic user content through both non-parametric prompting (general LLMs) and parametric training (specialized models for complex profiles)

This dual approach, inspired by agentic society research, enables credible behavior simulation across diverse contexts.

Real-World Applications: Testing SocioVerse in Three Diverse Domains

To validate its effectiveness, researchers implemented three distinct social simulation scenarios through the SocioVerse framework.

Illustration of three scenarios representing (a) presidential election prediction, (b) breaking news feedback, and (c) national economic survey.
The SocioVerse framework was tested through three different scenarios spanning political, media, and economic domains.

Predicting U.S. Presidential Elections

This simulation analyzed methods for large-scale election prediction through the Electoral College framework. It modeled U.S. demographic diversity using Census Bureau and American National Election Studies data, incorporating 12 attributes including socioeconomic factors, geographic dimensions, and political preferences.

The researchers designed a comprehensive election questionnaire based on polls from media and research institutes, evaluating results through accuracy rate (proportion of correctly predicted states) and root mean square error (difference between simulated and actual vote shares).
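The two election metrics can be sketched as follows, assuming each state's result is summarized as one candidate's share of the two-party vote. The function names and the four sample states are hypothetical, and the numbers are made up for illustration.

```python
import math


def state_accuracy(sim_share, real_share):
    """Fraction of states where the simulated winner matches the real winner."""
    hits = sum((sim_share[s] > 0.5) == (real_share[s] > 0.5) for s in real_share)
    return hits / len(real_share)


def rmse(sim_share, real_share):
    """Root mean square error between simulated and real vote shares."""
    squared = [(sim_share[s] - real_share[s]) ** 2 for s in real_share]
    return math.sqrt(sum(squared) / len(squared))


# Made-up two-party vote shares for one candidate in four hypothetical states.
real = {"S1": 0.55, "S2": 0.48, "S3": 0.51, "S4": 0.40}
sim = {"S1": 0.58, "S2": 0.51, "S3": 0.53, "S4": 0.42}

print(state_accuracy(sim, real))  # 0.75 (S2 is called for the wrong side)
print(round(rmse(sim, real), 4))  # 0.0255
```

The example shows why both metrics are reported: S2 is off by only three points yet counts as a miss for accuracy, while RMSE captures how close the shares are regardless of who wins.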

Analyzing Public Response to ChatGPT

This scenario examined how technology-interested users respond to breaking news about ChatGPT. The researchers identified potential audience and ground truth sets from the Rednote user pool, designing a cognitive questionnaire based on the ABC attitude model (Affect, Behavior, Cognition) with a 5-point Likert scale.

The questionnaire measured six dimensions: public cognition, perceived risks, perceived benefits, trust, fairness, and public acceptance. Results were evaluated through normalized RMSE (point-wise differences) and KL-divergence (distribution comparison).
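Both metrics can be written in a few lines. The summary does not say how the RMSE is normalized or which direction the KL-divergence is taken in, so the sketch below assumes normalization by the width of the answer scale (4 for a 1-to-5 Likert scale) and computes KL(real || simulated). The small `eps` smoothing term is also an assumption, added to avoid taking the log of zero.

```python
import math


def nrmse(sim, real, value_range):
    """RMSE between paired values, divided by the width of the value scale."""
    mse = sum((s - r) ** 2 for s, r in zip(sim, real)) / len(real)
    return math.sqrt(mse) / value_range


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions over the same categories."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    total_p, total_q = sum(p), sum(q)
    return sum(
        (a / total_p) * math.log((a / total_p) / (b / total_q))
        for a, b in zip(p, q)
    )


# Made-up mean Likert scores on two dimensions (real vs. simulated).
print(round(nrmse([3.4, 3.6], [3.0, 4.0], value_range=4.0), 3))  # 0.1

# Made-up answer distributions over two response options.
print(round(kl_divergence([0.5, 0.5], [0.9, 0.1]), 4))  # 0.5108
```

NRMSE compares answers point by point, while KL-divergence compares the overall shape of the answer distribution, so a simulation can do well on one and poorly on the other.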

Modeling Economic Behavior in China

This simulation followed a national economic survey methodology from China's National Bureau of Statistics. The researchers sampled nationwide agents proportionally by region and generated income distributions based on regional averages.
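Proportional sampling by region can be sketched as an allocation problem: split a fixed number of agents across regions according to population share. The version below uses largest-remainder rounding so that the counts add up exactly. The `allocate` function, the rounding rule, and the region figures are illustrative assumptions, not details from the paper.

```python
def allocate(total_agents, population):
    """Split `total_agents` across regions in proportion to population."""
    total_pop = sum(population.values())
    exact = {r: total_agents * p / total_pop for r, p in population.items()}
    counts = {r: int(x) for r, x in exact.items()}  # round down first
    leftover = total_agents - sum(counts.values())
    # Hand the remaining agents to the regions with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda r: exact[r] - counts[r], reverse=True)
    for region in by_remainder[:leftover]:
        counts[region] += 1
    return counts


# Made-up regional populations (arbitrary units).
print(allocate(16, {"north": 50, "south": 30, "west": 20}))
# {'north': 8, 'south': 5, 'west': 3}
```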

The questionnaire covered eight spending categories (food, clothing, housing, daily necessities, transportation, education, healthcare, and others), with results evaluated against official statistics through NRMSE and KL-divergence measures.

Results: SocioVerse Achieves Realistic Social Simulations

The researchers tested SocioVerse using multiple powerful LLMs, including Llama-3-70b, Qwen2.5-72b, DeepSeek models, and GPT-4o variants. Each scenario was configured with different parameters:

Scenario          # Agents  # Demographics  Type          Sampling  Source   Language  # Questions  Ground truth
PresElectPredict  331,836   12              label         IPF       X        EN        49           real world
BreakNewsFeed     20,000    7               label         IDS       Rednote  ZH        18           calculated
NatEconSurvey     16,000    9               label+number  IDS       Rednote  ZH        17           real world

Detailed settings of the three simulation scenarios, where IPF and IDS denote the iterative proportional fitting and identical distribution sampling methods.
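Iterative proportional fitting adjusts a seed table until its row and column totals match known marginals, which is how a synthetic population can be made to agree with census-style totals. The sketch below is the textbook two-dimensional version in plain Python. The election scenario uses 12 attributes, so this is a simplified illustration rather than the paper's implementation, and the seed table and targets are made up.

```python
def ipf(seed, row_targets, col_targets, iters=100, tol=1e-9):
    """Scale a positive 2-D seed table until its margins match the targets."""
    table = [row[:] for row in seed]
    for _ in range(iters):
        # Rescale each row to hit its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            table[i] = [v * target / row_sum for v in table[i]]
        # Rescale each column to hit its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Columns are exact after the step above, so check the rows.
        if all(abs(sum(r) - t) < tol for r, t in zip(table, row_targets)):
            break
    return table


# Made-up example: rows are two age groups, columns are two regions.
fitted = ipf(seed=[[2.0, 1.0], [1.0, 2.0]], row_targets=[60, 40], col_targets=[30, 70])
print([round(sum(row), 3) for row in fitted])                 # row totals
print([round(sum(r[j] for r in fitted), 3) for j in (0, 1)])  # column totals
```

The seed table carries the association between attributes, and the fitting step only rescales it, so the fitted table keeps the seed's interaction pattern while matching the target totals. The sketch assumes every seed cell is positive and that the row and column targets sum to the same total.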

The overall performance across all scenarios demonstrated SocioVerse's capabilities:

                  PresElectPredict                        BreakNewsFeed       NatEconSurvey
                  Overall             Battleground                            Overall             Developed-region
Model             Acc ↑     RMSE ↓    Acc ↑     RMSE ↓    KL-Div ↓  RMSE ↓    KL-Div ↓  RMSE ↓    KL-Div ↓  RMSE ↓
Llama3-70b        0.843     0.064     0.733     0.045     0.668     0.199     0.016     0.026     0.013     0.025
Qwen2.5-72b       0.922     0.037     0.800     0.031     0.113     0.059     0.066     0.048     0.043     0.039
DeepSeek-R1-671b  \         \         0.670     0.065     0.383     0.082     0.059     0.045     0.045     0.036
DeepSeek-V3       0.922     0.046     0.867     0.041     0.263     0.072     0.035     0.036     0.023     0.030
GPT-4o-mini       \         \         0.800     0.039     0.195     0.114     0.046     0.045     0.030     0.036
GPT-4o            \         \         \         \         0.196     0.055     0.062     0.051     0.036     0.038

Overall results of the three scenarios, where subset Battleground indicates battleground states in the U.S. presidential election and subset Developed-Region indicates top-10 developed regions in China in terms of GDP.

Key Findings from Presidential Election Predictions

In the election scenario, Qwen2.5-72b and DeepSeek-V3 achieved the highest overall accuracy (0.922), meaning the simulated winner matched the actual winner in more than 90% of states. Under the winner-takes-all rule, the simulation closely replicated real-world election outcomes.
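The winner-takes-all rule can be sketched as a simple aggregation: each state's electoral votes go entirely to the candidate with the largest simulated share. The state names, vote shares, and electoral vote counts below are made up, and `electoral_totals` is a hypothetical helper. The sketch also ignores the fact that Maine and Nebraska split some electoral votes by congressional district.

```python
def electoral_totals(vote_share, electoral_votes):
    """Award each state's electoral votes to the candidate with the top share."""
    totals = {}
    for state, shares in vote_share.items():
        winner = max(shares, key=shares.get)
        totals[winner] = totals.get(winner, 0) + electoral_votes[state]
    return totals


# Made-up simulated shares for two candidates in three hypothetical states.
vote_share = {
    "S1": {"A": 0.52, "B": 0.48},
    "S2": {"A": 0.45, "B": 0.55},
    "S3": {"A": 0.51, "B": 0.49},
}
electoral_votes = {"S1": 10, "S2": 6, "S3": 4}

print(electoral_totals(vote_share, electoral_votes))  # {'A': 14, 'B': 6}
```

Because a one-point lead captures all of a state's votes, small errors in close states can flip the outcome, which is why battleground states are evaluated separately.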

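Ablation studies on battleground states showed that both the prior distribution and real-world knowledge matter. Removing knowledge lowered Llama3-70b's accuracy from 0.733 to 0.533 while leaving the accuracy of the other two models unchanged. Removing the prior distribution as well raised RMSE by roughly an order of magnitude for every model, to between 0.32 and 0.39: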

Model                                   Acc ↑  RMSE ↓
Llama3-70b                              0.733  0.045
  - w/o Knowledge                       0.533  0.051
  - w/o Knowledge & Prior Distribution  0.600  0.386
Qwen2.5-72b                             0.800  0.031
  - w/o Knowledge                       0.800  0.033
  - w/o Knowledge & Prior Distribution  0.600  0.370
GPT-4o-mini                             0.800  0.039
  - w/o Knowledge                       0.800  0.052
  - w/o Knowledge & Prior Distribution  0.667  0.323

Results showing how prior distribution and real-world knowledge enhance prediction accuracy in battleground states.

Insights from Breaking News Response Analysis

The breaking news scenario revealed how accurately different models captured public attitudes toward ChatGPT.

Illustration of performance in the breaking news feedback simulation, comparing simulated and real user responses to ChatGPT. PC, PR, PB, TR, FA, and PA denote the six questionnaire dimensions (public cognition, perceived risks, perceived benefits, trust, fairness, and public acceptance), each rated on a 5-point Likert scale from 1 (totally disagree) to 5 (totally agree).

Most models produced responses consistent with ground truth users, though Llama3-70b showed larger gaps. An interesting pattern emerged: all models generated more conservative simulated results than the real responses, highlighting potential biases in public opinion simulation.

Economic Simulation Performance

In the national economic survey, all models showed strong alignment with real-world statistics, with Llama3-70b demonstrating superior performance. Models performed better in developed regions than overall, suggesting they more accurately capture economic behavior in wealthier areas.

Detailed results across spending categories revealed interesting patterns:

Item                          Llama3-70b  Qwen2.5-72b  GPT-4o-mini  GPT-4o  DeepSeek-R1
Daily                         0.007       0.009        0.006        0.010   0.009
Clothing                      0.012       0.015        0.019        0.015   0.015
Transportation_Communication  0.016       0.020        0.027        0.023   0.017
Education_Entertainment       0.018       0.022        0.024        0.017   0.022
Medical                       0.023       0.062        0.041        0.057   0.060
Food                          0.037       0.031        0.031        0.040   0.032
Household                     0.052       0.110        0.107        0.120   0.102
Others                        0.008       0.008        0.010        0.005   0.009

Detailed results on the national economic survey simulation reported in NRMSE, where lower values indicate better performance.

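Every model's error was highest on housing spending (the Household row). Among the named categories, error was lowest on daily necessities, although for Qwen2.5-72b and GPT-4o the residual Others category was lower still. This suggests that LLMs simulate routine everyday spending more reliably than housing costs.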

Bridging AI and Social Science: Future Directions

SocioVerse represents a significant advancement in social simulation powered by LLM agents, demonstrating that state-of-the-art language models can effectively simulate human responses in complex social contexts. The research identified several key patterns:

  1. Incorporating demographic distributions and users' historical experiences significantly improves simulation accuracy
  2. Under consistent measurement protocols, LLMs produce broadly similar simulations of human attitudes, though model-specific biases remain
  3. LLMs perform better in simple daily scenarios than in complex situations requiring contextual knowledge

The current implementation represents only part of the full SocioVerse vision. Future work could enhance each module for better collaboration:

  • Refining the social environment to inject more up-to-date knowledge
  • Expanding the scenario engine beyond surveys to interviews and free interactions
  • Optimizing LLMs to better accommodate minority groups and individuals with special needs
  • Developing autonomous planning modules to improve simulation credibility

Beyond technical improvements, SocioVerse offers social scientists a cost-effective tool for conducting experiments with minimal setup. This bridge between AI systems and traditional research could help analyze psychological and sociological theories, predict social impacts of policy changes, and explore long-term trends in virtual societies—ultimately creating a realistic mapping for understanding our complex social world.
