TL;DR

Apple published SHARP, an open-source system that generates a metric 3D Gaussian scene representation from a single photo and can render nearby views in real time. The project includes code, a pretrained checkpoint, and a command-line interface; video rendering currently requires a CUDA GPU.

What happened

Apple released an open-source implementation of SHARP (Sharp Monocular View Synthesis in Less Than a Second), a neural method that infers the parameters of a 3D Gaussian representation from a single photograph. The model produces this 3D Gaussian splat (3DGS) representation in a single feedforward network pass, in under a second on a standard GPU, and the result can be rendered interactively to produce high-resolution nearby views. The representation is metric, so camera movements can be specified at real-world scale. The GitHub repository includes installation instructions, a CLI for prediction and rendering, and an option to download or supply a pretrained checkpoint. In the accompanying paper, the authors report substantial quality and speed gains: lower LPIPS and DISTS errors than prior work and a synthesis-time reduction of roughly three orders of magnitude. Video rendering from the predicted Gaussians requires a CUDA GPU; prediction itself runs on CPU, CUDA, and MPS.
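
For orientation, here is a minimal sketch of that two-step workflow as a thin Python wrapper around the CLI. The sharp predict and sharp render subcommands come from the repository, but the file paths and argument layout below are assumptions rather than the documented interface; check the README or the commands' built-in help before running anything.

    # Hypothetical two-step SHARP workflow: predict Gaussians, then render views.
    # The subcommand names are from the repo; paths and argument order are assumptions.
    import subprocess

    INPUT_IMAGE = "photo.jpg"   # single input photograph (placeholder path)
    OUTPUT_DIR = "out"          # where the predicted .ply Gaussians would land (placeholder)

    # Step 1: predict a 3D Gaussian splat representation from one image.
    # Prediction runs on CPU, CUDA, or MPS.
    subprocess.run(["sharp", "predict", INPUT_IMAGE, OUTPUT_DIR], check=True)

    # Step 2: render nearby views / a video from the predicted Gaussians.
    # Video rendering requires a CUDA GPU.
    subprocess.run(["sharp", "render", OUTPUT_DIR], check=True)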

Why it matters

  • Single-image 3D view synthesis becomes faster and more practical for real-time or interactive pipelines, since SHARP produces a renderable representation in under a second.
  • Metric 3D outputs allow consistent camera movements and potential integration with applications that require scale-aware scene representations.
  • Open-source release with a pretrained checkpoint and CLI lowers the barrier for researchers and developers to experiment and build on the method.

Key facts

  • Project name: SHARP (Sharp Monocular View Synthesis in Less Than a Second).
  • Source code and model released publicly on GitHub in the apple/ml-sharp repository.
  • Model infers parameters of a 3D Gaussian representation (3DGS) from a single image via a single feedforward pass.
  • Inference to produce the 3DGS is reported to take less than one second on a standard GPU.
  • The produced 3DGS can be rendered in real time to generate high-resolution nearby views.
  • The accompanying paper (arXiv:2512.10685) reports quantitative evaluations with 25–34% lower LPIPS and 21–43% lower DISTS scores than prior models.
  • Repository includes a CLI: sharp predict and sharp render; the pretrained checkpoint is auto-downloaded to ~/.cache/torch/hub/checkpoints/ or can be manually fetched from the provided URL.
  • Output format: 3D Gaussian splats saved as .ply files, compatible with public 3DGS renderers; the OpenCV coordinate convention is used (x right, y down, z forward), as illustrated in the conversion sketch after this list.
  • Rendering videos with the --render option requires a CUDA GPU; Gaussian prediction itself supports CPU, CUDA, and MPS.
  • Repository metadata shows ~5.4k stars, 351 forks, and two listed contributors in the source.
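
Because the outputs use the OpenCV convention, loading them into a renderer that expects the OpenGL/Blender convention (x right, y up, z toward the viewer) typically means flipping the y and z axes. The sketch below applies that flip to an array of Gaussian centers; the placeholder data, and the assumption that centers are stored as plain x/y/z positions, are illustrative only.

    # Illustrative axis flip from the OpenCV convention (x right, y down, z forward)
    # to the OpenGL convention (x right, y up, z backward). Placeholder data only;
    # real centers would come from the predicted .ply file.
    import numpy as np

    centers_cv = np.array([      # N x 3 Gaussian centers in OpenCV coordinates
        [0.5, 0.2, 2.0],         # half a meter right, slightly below, two meters ahead
        [-0.3, -0.1, 1.5],
    ])

    CV_TO_GL = np.diag([1.0, -1.0, -1.0])   # flip y and z
    centers_gl = centers_cv @ CV_TO_GL.T

    print(centers_gl)            # signs of y and z inverted; metric distances unchanged

A full conversion would also have to adjust the per-Gaussian rotations, and, as the repository notes, scenes may additionally need re-centering or scaling for a particular renderer.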

What to watch next

  • How the project's LICENSE and LICENSE_MODEL files affect commercial or research use — not confirmed in the source.
  • Broader compatibility and performance when integrating the 3DGS .ply outputs with third-party renderers; the repo notes users may need to scale or re-center scenes.
  • Planned updates, maintenance cadence, or additional checkpoints for different domains — not confirmed in the source.
  • Practical performance differences on non-CUDA platforms (MPS, CPUs) for real-world workloads — partially confirmed (prediction supports MPS/CPU) but detailed benchmarks are not provided in the source.

Quick glossary

  • Monocular view synthesis: The process of generating new views of a scene from a single input image.
  • 3D Gaussian splats (3DGS): A scene representation that models elements as 3D Gaussians which can be rendered to produce images from different viewpoints.
  • LPIPS: A perceptual similarity metric used to measure visual differences between images; lower values indicate a closer perceptual match (see the worked example after this glossary).
  • DISTS: A metric combining structural and texture similarity measures to evaluate perceptual image quality.
  • Zero-shot generalization: The ability of a model to perform on unseen datasets or tasks without additional fine-tuning on those specific data.
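
As a worked example of how to read the reported improvements (assuming, as is conventional, that the percentage reductions are given relative to the baseline's score):

    \[
      \mathrm{LPIPS}_{\mathrm{SHARP}} = (1 - r)\,\mathrm{LPIPS}_{\mathrm{baseline}},
      \qquad r = 0.25 \;\Rightarrow\; 0.20 \mapsto 0.15 .
    \]

The same reading applies to the DISTS figures; both metrics are error-like, so lower is better.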

Reader FAQ

Can I run SHARP on a CPU?
Yes — the repository states Gaussian prediction works on CPU, CUDA, and MPS; however, rendering videos with the --render option requires a CUDA GPU.
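
Since prediction supports all three PyTorch backends while video rendering needs CUDA, a caller would typically select the prediction device at runtime. A minimal sketch using standard PyTorch device queries (nothing here is SHARP-specific):

    # Pick the best available PyTorch device for Gaussian prediction.
    import torch

    if torch.cuda.is_available():
        device = torch.device("cuda")   # also the only backend that supports video rendering
    elif torch.backends.mps.is_available():
        device = torch.device("mps")    # Apple Silicon GPU; prediction only
    else:
        device = torch.device("cpu")    # slowest fallback, but supported

    print(f"Running Gaussian prediction on: {device}")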

Is a pretrained model checkpoint provided?
Yes — the CLI will download a checkpoint automatically to ~/.cache/torch/hub/checkpoints/, and the README includes a direct wget URL to the checkpoint file.

What output files does SHARP produce?
SHARP saves 3D Gaussian splats as .ply files (3DGS), intended for use with compatible 3DGS renderers; coordinate convention and scene centering are described in the repo.
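
To sanity-check an output, the file can be opened with a generic PLY reader such as the plyfile package. The x/y/z vertex fields below follow the common 3DGS layout; treat the exact schema of SHARP's files as an assumption and verify it against a real output.

    # Inspect a predicted .ply (hypothetical path) with the generic plyfile reader.
    # Field names assume the usual 3DGS vertex layout.
    import numpy as np
    from plyfile import PlyData

    ply = PlyData.read("out/scene.ply")      # placeholder output path
    vertex = ply["vertex"]

    centers = np.stack([vertex["x"], vertex["y"], vertex["z"]], axis=-1)
    print(f"{len(centers)} gaussians")
    print(f"scene extent: {centers.min(axis=0)} to {centers.max(axis=0)}")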

Are usage restrictions or licensing details specified?
The repository contains LICENSE and LICENSE_MODEL files; users are instructed to check those files for licensing terms.

Sources

  • apple/ml-sharp repository on GitHub (code, pretrained checkpoint, and CLI).
  • Lars Mescheder, Wei Dong, et al., "Sharp Monocular View Synthesis in Less Than a Second," arXiv:2512.10685.