TL;DR
Apple published SHARP, an open-source system that generates a metric 3D Gaussian scene representation from a single photo and can render nearby views in real time. The project includes code, a pretrained checkpoint, and a command-line interface; video rendering currently requires a CUDA GPU.
What happened
Apple released an open-source implementation of SHARP (Sharp Monocular View Synthesis in Less Than a Second), a neural method that infers the parameters of a 3D Gaussian representation from a single photograph. The model produces this 3D Gaussian splat (3DGS) representation in a single feedforward network pass, in under a second on a standard GPU, and the resulting representation can be rendered interactively to produce high-resolution nearby views. The representation is metric, so camera movements correspond to real-world distances. The repository on GitHub includes installation instructions, a CLI for prediction and rendering, and an option to download or supply a pretrained checkpoint. In their paper, the authors report substantial quality and speed gains: lower LPIPS and DISTS errors than prior work and a synthesis-time reduction of three orders of magnitude. Video rendering from predicted Gaussians requires a CUDA GPU; prediction itself runs on CPU, CUDA, and MPS.
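For orientation, the sketch below shows the parameter set that standard 3DGS representations attach to each Gaussian primitive. It follows the common 3DGS convention; SHARP's exact internal layout is not confirmed in the source.

```python
# Illustrative only: the standard per-primitive parameter set used by
# 3D Gaussian splatting (3DGS) representations like the one SHARP predicts.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    mean: np.ndarray       # (3,) center position in metric scene coordinates
    rotation: np.ndarray   # (4,) unit quaternion orienting the Gaussian
    scale: np.ndarray      # (3,) per-axis standard deviations
    opacity: float         # alpha in [0, 1], used for compositing
    sh_coeffs: np.ndarray  # (K, 3) spherical-harmonic color coefficients
```

A scene is simply a large collection of such primitives; rendering projects each Gaussian onto the image plane and alpha-composites them front to back, which is what makes real-time view synthesis possible.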
Why it matters
- Single-image 3D view synthesis becomes faster and more practical for real-time or interactive pipelines, since SHARP produces a renderable representation in under a second.
- Metric 3D outputs allow consistent camera movements and potential integration with applications that require scale-aware scene representations.
- Open-source release with a pretrained checkpoint and CLI lowers the barrier for researchers and developers to experiment and build on the method.
Key facts
- Project name: SHARP (Sharp Monocular View Synthesis in Less Than a Second).
- Source code and model released publicly on GitHub in the apple/ml-sharp repository.
- Model infers parameters of a 3D Gaussian representation (3DGS) from a single image via a single feedforward pass.
- Inference to produce the 3DGS is reported to take less than one second on a standard GPU.
- The produced 3DGS can be rendered in real time to generate high-resolution nearby views.
- The accompanying paper (arXiv:2512.10685) reports quantitative evaluations with 25–34% lower LPIPS and 21–43% lower DISTS versus prior models.
- Repository includes a CLI: sharp predict and sharp render; the pretrained checkpoint is auto-downloaded to ~/.cache/torch/hub/checkpoints/ or can be manually fetched from the provided URL.
- Output format: 3D Gaussian splats saved as .ply files, compatible with public 3DGS renderers; the OpenCV coordinate convention is used (x right, y down, z forward). A conversion sketch follows this list.
- Rendering videos with the --render option requires a CUDA GPU; Gaussian prediction itself supports CPU, CUDA, and MPS.
- Repository metadata shows ~5.4k stars, 351 forks, and two listed contributors in the source.
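Because the splats use the OpenCV camera convention while many viewers assume an OpenGL-style convention (y up, z toward the viewer), a common first integration step is an axis flip. Below is a minimal sketch, assuming the Gaussian centers have already been loaded into an (N, 3) NumPy array; rotations and covariances would need the same flip, which this sketch omits.

```python
import numpy as np

def opencv_to_opengl(points: np.ndarray) -> np.ndarray:
    """Convert (N, 3) points from OpenCV (x right, y down, z forward)
    to OpenGL (x right, y up, z toward the viewer) by negating y and z."""
    flipped = points.copy()
    flipped[:, 1] *= -1.0  # y down -> y up
    flipped[:, 2] *= -1.0  # z forward -> z backward
    return flipped
```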
What to watch next
- How the project's LICENSE and LICENSE_MODEL files affect commercial or research use — not confirmed in the source.
- Broader compatibility and performance when integrating the 3DGS .ply outputs with third-party renderers; the repo notes users may need to scale or re-center scenes.
- Planned updates, maintenance cadence, or additional checkpoints for different domains — not confirmed in the source.
- Practical performance differences on non-CUDA platforms (MPS, CPUs) for real-world workloads — partially confirmed (prediction supports MPS/CPU) but detailed benchmarks are not provided in the source.
Quick glossary
- Monocular view synthesis: The process of generating new views of a scene from a single input image.
- 3D Gaussian splats (3DGS): A scene representation that models elements as 3D Gaussians which can be rendered to produce images from different viewpoints.
- LPIPS: A perceptual similarity metric used to measure visual differences between images; lower values indicate a closer perceptual match. A usage sketch follows this glossary.
- DISTS: A metric combining structural and texture similarity measures to evaluate perceptual image quality.
- Zero-shot generalization: The ability of a model to perform on unseen datasets or tasks without additional fine-tuning on those specific data.
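As an illustration of how the LPIPS numbers above are computed, here is a hedged example using the third-party lpips package (not part of the SHARP repo); lower scores mean a synthesized view is perceptually closer to its ground-truth view.

```python
import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')  # AlexNet backbone is the common default
# Inputs must be (N, 3, H, W) tensors scaled to [-1, 1]; random
# placeholders stand in for a rendered view and its ground truth.
img0 = torch.rand(1, 3, 256, 256) * 2 - 1
img1 = torch.rand(1, 3, 256, 256) * 2 - 1
distance = loss_fn(img0, img1)
print(distance.item())  # identical images score 0.0
```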
Reader FAQ
Can I run SHARP on a CPU?
Yes. The repository states Gaussian prediction works on CPU, CUDA, and MPS; however, rendering videos with the --render option requires a CUDA GPU.
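A minimal device-selection sketch in PyTorch, mirroring the platform support described above; the function name is illustrative and not part of SHARP's API.

```python
import torch

def pick_device() -> torch.device:
    # Prefer CUDA, then Apple's MPS backend, then fall back to CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(f"Running Gaussian prediction on: {pick_device()}")
```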
Is a pretrained model checkpoint provided?
Yes — the CLI will download a checkpoint automatically to ~/.cache/torch/hub/checkpoints/, and the README includes a direct wget URL to the checkpoint file.
What output files does SHARP produce?
SHARP saves 3D Gaussian splats as .ply files (3DGS), intended for use with compatible 3DGS renderers; coordinate convention and scene centering are described in the repo.
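A hedged loading sketch using the third-party plyfile package; the vertex property names (x, y, z) follow the widely used 3DGS .ply layout and should be verified against SHARP's actual output. The file path is illustrative.

```python
from plyfile import PlyData
import numpy as np

ply = PlyData.read("scene.ply")  # hypothetical output file from sharp predict
verts = ply["vertex"]
centers = np.column_stack([verts["x"], verts["y"], verts["z"]])
print(f"{len(centers)} Gaussians, scene extent: "
      f"{centers.min(axis=0)} to {centers.max(axis=0)}")
```

Inspecting the extent this way can help decide whether a scene needs the scaling or re-centering the repo mentions before loading it into a third-party renderer.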
Are usage restrictions or licensing details specified?
The repository contains LICENSE and LICENSE_MODEL files; users are instructed to check those files for licensing terms.
Sources
- Apple releases open-source model that instantly turns 2D photos into 3D views
- Sharp Monocular View Synthesis in Less Than a Second
- Apple's new open-source model turns 2D photos into 3D …
- Apple's SHARP AI Converts 2D Photos to 3D in Under 1 Second