TL;DR

A preprint argues that large language models (LLMs) are being compared to an underspecified notion of “human” performance. The authors report that LLM responses look like outliers relative to large cross-cultural datasets and most closely match people from WEIRD (Western, Educated, Industrialized, Rich, Democratic) populations, with similarity declining sharply as populations diverge from that profile (r = -0.70).

What happened

A team of researchers published a preprint titled “Which Humans?” that examines how large language models (LLMs) compare to human psychological data across cultures. The paper, submitted September 22, 2023, and last edited June 20, 2024, reports that LLM outputs on psychological measures are outliers relative to large-scale cross-cultural datasets. The authors find that LLM performance on cognitive psychological tasks most closely resembles that of people from WEIRD societies, and that similarity declines rapidly as sampled human populations move away from WEIRD characteristics (reported correlation r = -0.70). The preprint raises scientific and ethical concerns about treating LLMs as representative of “human” behavior without accounting for global psychological diversity. The paper links to public data and a preregistration, is shared under a CC-BY 4.0 license, and discusses potential ways to mitigate WEIRD bias in future generative models.

Why it matters

  • Benchmarking LLMs against an undefined or narrow human baseline risks overstating their generality.
  • If model behavior primarily reflects WEIRD populations, tools built on those models may misrepresent or marginalize non-WEIRD perspectives.
  • Scientific claims about machine “human-likeness” may be biased if cross-cultural psychological variation is ignored.
  • Policy, ethics, and deployment decisions that assume universal human norms could produce unfair outcomes across diverse populations.

Key facts

  • Preprint title: “Which Humans?”; authors: Mohammad Atari, Mona J. Xue, Peter S. Park, Damián Blasi, Joseph Henrich.
  • Submitted to PsyArXiv on September 22, 2023; last edited June 20, 2024.
  • Main empirical claim: LLM responses are outliers compared with large-scale cross-cultural data.
  • Reported pattern: LLM responses most closely resemble those of WEIRD populations, with similarity declining as sampled populations diverge from WEIRD characteristics (correlation r = -0.70).
  • Authors highlight scientific and ethical issues from ignoring cross-cultural diversity in human and machine psychology.
  • The preprint links to public data and a public preregistration.
  • Document is available under a CC-BY Attribution 4.0 International license.
  • Preprint metrics on the hosting page show 42,775 views and 10,058 downloads.

What to watch next

  • Authors’ proposed mitigation strategies for WEIRD bias in future generative models (discussed in the preprint; details not confirmed in the source).
  • Which specific LLM architectures, training corpora, and psychological tasks were evaluated — not confirmed in the source.
  • Follow-up studies that test additional non-WEIRD populations and replicate the reported correlation — not confirmed in the source.
  • Whether the preprint undergoes peer review and what changes might follow — not confirmed in the source.

Quick glossary

  • Large language model (LLM): A neural network trained on large amounts of text to generate and analyze natural language; examples include transformer-based models.
  • WEIRD: An acronym for Western, Educated, Industrialized, Rich, and Democratic — used to describe many study samples that may not represent global diversity.
  • Cross-cultural data: Empirical measurements collected from multiple cultural, geographic, or demographic groups to assess variation across populations.
  • Preprint: A research manuscript shared publicly before formal peer review in a journal; intended to accelerate dissemination and feedback.
  • CC-BY 4.0: A Creative Commons license that allows sharing and adaptation of the work with attribution to the original authors.

Reader FAQ

Which specific LLMs did the study evaluate?
Not confirmed in the source.

What does the reported correlation r = -0.70 mean here?
It indicates a strong negative relationship: as sampled human populations diverge from WEIRD characteristics, similarity to LLM responses declines sharply.
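
For intuition, here is a minimal, purely illustrative sketch in Python with made-up numbers, assuming the reported figure is a Pearson correlation between a population’s cultural distance from typical WEIRD samples and how closely its responses match LLM outputs (the paper’s exact measures and data are not confirmed in the source):

    # Purely illustrative: hypothetical numbers, not the paper's data.
    # Assumes the reported r is a Pearson correlation between a population's
    # cultural distance from typical WEIRD samples and how closely its
    # responses match LLM outputs.
    import numpy as np

    # Hypothetical cultural-distance scores for six made-up populations (0 = very WEIRD).
    cultural_distance = np.array([0.1, 0.3, 0.4, 0.6, 0.8, 0.9])
    # Hypothetical similarity of each population's responses to the LLM's responses.
    similarity_to_llm = np.array([0.80, 0.55, 0.85, 0.45, 0.60, 0.35])

    # Pearson r for these made-up numbers comes out near -0.70: similarity
    # tends to fall as cultural distance from WEIRD samples grows.
    r = np.corrcoef(cultural_distance, similarity_to_llm)[0, 1]
    print(f"Pearson r = {r:.2f}")  # -> Pearson r = -0.70

In a Pearson correlation, r ranges from -1 to +1, so r = -0.70 indicates a fairly strong, though not perfect, negative linear association across the sampled populations.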

Do the authors report any conflicts of interest?
The preprint states that the authors have no conflicts of interest.

Is the study’s data and preregistration available?
The preprint page lists public data and a public preregistration linked from the manuscript.

Has this work been peer-reviewed?
Not confirmed in the source.

Sources

  • “Which Humans?” preprint page on PsyArXiv (5b26t_v1).
