TL;DR

In a January 2026 Substack post, Gary Marcus argues generative AI is underperforming: large language models remain untrustworthy, rely heavily on memorization, and deliver limited measurable economic value. Marcus cautions that continued scaling looks unlikely to fix these core problems and warns against reorganizing policy and industry expectations around optimistic promises.

What happened

Gary Marcus published a January 12, 2026, newsletter sampling recent reporting and studies that, in his view, show generative AI underperforming against claims made by its proponents. He highlights assertions that large language models (LLMs) remain unreliable, that much of their output reflects memorization rather than genuine capability, and that some prominent voices, including, Marcus says, Geoffrey Hinton, were on the wrong side of debates about these limitations. Marcus also cites analyses suggesting generative AI has produced little measurable economic benefit to date; in an update, he references the Remote Labor Index, as reported by the Washington Post, which estimated AI could perform roughly 2.5% of jobs. He warns that continued scaling of current architectures is unlikely to resolve these problems and argues it would be unwise to base economic or geopolitical strategy on optimistic, unproven expectations of rapid improvement.

Why it matters

  • Policy makers and businesses risk misallocating resources if they assume near-term breakthroughs will materialize.
  • Trust and reliability issues in LLMs constrain their safe deployment in high-stakes contexts.
  • If AI delivers limited quantifiable productivity gains, projected economic impacts and labor forecasts may need reassessment.
  • Relying on scaling alone could prolong investment in approaches that do not address core limitations.

Key facts

  • Author: Gary Marcus, Substack post dated January 12, 2026.
  • Central claim: Generative AI 'isn't going all that well,' as the newsletter's title puts it; the post samples recent news in support.
  • Marcus states LLMs remain untrustworthy and that a large fraction of their behavior is memorization.
  • Marcus says Geoffrey Hinton 'was on the wrong side' of an argument about memorization versus capability.
  • Marcus cites reporting and studies suggesting limited quantifiable value added by current models.
  • Update in the post references the Remote Labor Index finding, as reported by the Washington Post, that AI could perform about 2.5% of jobs.
  • Marcus argues that further scaling of current approaches is not likely to cure these problems.
  • The post drew engagement on Substack (likes, restacks, and comments) and generated discussion about overpromising by AI creators.

What to watch next

  • Further reporting and peer-reviewed studies on the degree to which LLM outputs reflect memorization versus generalization.
  • New analyses of economic impact and labor displacement metrics, including follow-ups to the Remote Labor Index finding.
  • Policy proposals or industry strategies that explicitly tie economic or geopolitical planning to near-term AI performance — not confirmed in the source.
  • Announcements from major AI labs about architectural changes aimed at addressing reasoning and reliability limitations — not confirmed in the source.

Quick glossary

  • Generative AI: A class of machine learning models that produce new content — text, images, audio or code — based on learned patterns from training data.
  • Large Language Model (LLM): A type of neural network trained on large text corpora to predict or generate language; examples include models that power many text-generation systems.
  • Memorization: In machine learning, when a model reproduces or closely echoes training data rather than producing novel, generalized solutions.
  • Scaling: The process of increasing model size, training data, or compute resources in the hope of improving performance.
  • Remote Labor Index: A metric referenced in the post (via reporting) used to estimate the share of jobs potentially automatable or performable by AI systems.

Reader FAQ

Does the post claim LLMs are completely useless?
No; the post argues LLMs are unreliable in key ways and deliver limited measurable value, but it does not claim they are entirely without use.

What share of jobs can AI perform, according to the source?
An update in the post cites a Washington Post report on the Remote Labor Index, which estimated AI could perform about 2.5% of jobs.

Will simply making models bigger solve these problems?
Marcus contends it will not: the post argues that continued scaling of current approaches is unlikely to cure the core issues of reliability and memorization.

Is it recommended to base national policy on current AI capabilities?
Marcus warns that orienting economic and geopolitical policy around current generative AI expectations would be a mistake.

Sources

  • Gary Marcus, "Let's be honest, Generative AI isn't going all that well," Substack, January 12, 2026.