TL;DR

Apple published SHARP, an open-source system that generates a metric 3D Gaussian scene representation from a single photo and can render nearby views in real time. The project includes code, a pretrained checkpoint, and a command-line interface; video rendering currently requires a CUDA GPU.

What happened

Apple released an open-source implementation of SHARP (Sharp Monocular View Synthesis in Less Than a Second), a neural method that infers the parameters of a 3D Gaussian representation from a single photograph. The model produces this 3D Gaussian splat (3DGS) representation in a single feedforward network pass, in under a second on a standard GPU, and the result can be rendered interactively to produce high-resolution nearby views. The representation is metric, so camera movements can be specified at real-world scale. The GitHub repository includes installation instructions, a CLI for prediction and rendering, and an option to download or supply a pretrained checkpoint. In the accompanying paper, the authors report substantial quality and speed gains: lower LPIPS and DISTS errors than prior work and a synthesis-time reduction of roughly three orders of magnitude. Video rendering from the predicted Gaussians requires a CUDA GPU; prediction itself runs on CPU, CUDA, and MPS.
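
For orientation, here is a minimal sketch of that two-step workflow as a thin Python wrapper around the CLI. The sharp predict and sharp render subcommands come from the repository, but the file paths and argument layout below are assumptions rather than the documented interface; check the README or the commands' built-in help before running anything.

    # Hypothetical two-step SHARP workflow: predict Gaussians, then render views.
    # The subcommand names are from the repo; paths and argument order are assumptions.
    import subprocess

    INPUT_IMAGE = "photo.jpg"   # single input photograph (placeholder path)
    OUTPUT_DIR = "out"          # where the predicted .ply Gaussians would land (placeholder)

    # Step 1: predict a 3D Gaussian splat representation from one image.
    # Prediction runs on CPU, CUDA, or MPS.
    subprocess.run(["sharp", "predict", INPUT_IMAGE, OUTPUT_DIR], check=True)

    # Step 2: render nearby views / a video from the predicted Gaussians.
    # Video rendering requires a CUDA GPU.
    subprocess.run(["sharp", "render", OUTPUT_DIR], check=True)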

Why it matters

  • Single-image 3D view synthesis becomes faster and more practical for real-time or interactive pipelines, since SHARP produces a renderable representation in under a second.
  • Metric 3D outputs allow consistent camera movements and potential integration with applications that require scale-aware scene representations.
  • Open-source release with a pretrained checkpoint and CLI lowers the barrier for researchers and developers to experiment and build on the method.

Key facts

  • Project name: SHARP (Sharp Monocular View Synthesis in Less Than a Second).
  • Source code and model released publicly on GitHub in the apple/ml-sharp repository.
  • Model infers parameters of a 3D Gaussian representation (3DGS) from a single image via a single feedforward pass.
  • Inference to produce the 3DGS is reported to take less than one second on a standard GPU.
  • The produced 3DGS can be rendered in real time to generate high-resolution nearby views.
  • The accompanying paper (arXiv:2512.10685) reports quantitative evaluations with 25–34% lower LPIPS and 21–43% lower DISTS scores than prior models.
  • Repository includes a CLI: sharp predict and sharp render; the pretrained checkpoint is auto-downloaded to ~/.cache/torch/hub/checkpoints/ or can be manually fetched from the provided URL.
  • Output format: 3D Gaussian splats saved as .ply files, compatible with public 3DGS renderers; the OpenCV coordinate convention is used (x right, y down, z forward), as illustrated in the conversion sketch after this list.
  • Rendering videos with the --render option requires a CUDA GPU; Gaussian prediction itself supports CPU, CUDA, and MPS.
  • Repository metadata shows ~5.4k stars, 351 forks, and two listed contributors in the source.
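
Because the outputs use the OpenCV convention, loading them into a renderer that expects the OpenGL/Blender convention (x right, y up, z toward the viewer) typically means flipping the y and z axes. The sketch below applies that flip to an array of Gaussian centers; the placeholder data, and the assumption that centers are stored as plain x/y/z positions, are illustrative only.

    # Illustrative axis flip from the OpenCV convention (x right, y down, z forward)
    # to the OpenGL convention (x right, y up, z backward). Placeholder data only;
    # real centers would come from the predicted .ply file.
    import numpy as np

    centers_cv = np.array([      # N x 3 Gaussian centers in OpenCV coordinates
        [0.5, 0.2, 2.0],         # half a meter right, slightly below, two meters ahead
        [-0.3, -0.1, 1.5],
    ])

    CV_TO_GL = np.diag([1.0, -1.0, -1.0])   # flip y and z
    centers_gl = centers_cv @ CV_TO_GL.T

    print(centers_gl)            # signs of y and z inverted; metric distances unchanged

A full conversion would also have to adjust the per-Gaussian rotations, and, as the repository notes, scenes may additionally need re-centering or scaling for a particular renderer.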

What to watch next

  • How the project's LICENSE and LICENSE_MODEL files affect commercial or research use — not confirmed in the source.
  • Broader compatibility and performance when integrating the 3DGS .ply outputs with third-party renderers; the repo notes users may need to scale or re-center scenes.
  • Planned updates, maintenance cadence, or additional checkpoints for different domains — not confirmed in the source.
  • Practical performance differences on non-CUDA platforms (MPS, CPUs) for real-world workloads — partially confirmed (prediction supports MPS/CPU) but detailed benchmarks are not provided in the source.

Quick glossary

  • Monocular view synthesis: The process of generating new views of a scene from a single input image.
  • 3D Gaussian splats (3DGS): A scene representation that models elements as 3D Gaussians which can be rendered to produce images from different viewpoints.
  • LPIPS: A perceptual similarity metric used to measure visual differences between images; lower values indicate a closer perceptual match (see the worked example after this glossary).
  • DISTS: A metric combining structural and texture similarity measures to evaluate perceptual image quality.
  • Zero-shot generalization: The ability of a model to perform on unseen datasets or tasks without additional fine-tuning on those specific data.
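
As a worked example of how to read the reported improvements (assuming, as is conventional, that the percentage reductions are given relative to the baseline's score):

    \[
      \mathrm{LPIPS}_{\mathrm{SHARP}} = (1 - r)\,\mathrm{LPIPS}_{\mathrm{baseline}},
      \qquad r = 0.25 \;\Rightarrow\; 0.20 \mapsto 0.15 .
    \]

The same reading applies to the DISTS figures; both metrics are error-like, so lower is better.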

Reader FAQ

Can I run SHARP on a CPU?
Yes — the repository states Gaussian prediction works on CPU, CUDA, and MPS; however, rendering videos with the --render option requires a CUDA GPU.
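
Since prediction supports all three PyTorch backends while video rendering needs CUDA, a caller would typically select the prediction device at runtime. A minimal sketch using standard PyTorch device queries (nothing here is SHARP-specific):

    # Pick the best available PyTorch device for Gaussian prediction.
    import torch

    if torch.cuda.is_available():
        device = torch.device("cuda")   # also the only backend that supports video rendering
    elif torch.backends.mps.is_available():
        device = torch.device("mps")    # Apple Silicon GPU; prediction only
    else:
        device = torch.device("cpu")    # slowest fallback, but supported

    print(f"Running Gaussian prediction on: {device}")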

Is a pretrained model checkpoint provided?
Yes — the CLI will download a checkpoint automatically to ~/.cache/torch/hub/checkpoints/, and the README includes a direct wget URL to the checkpoint file.

What output files does SHARP produce?
SHARP saves 3D Gaussian splats as .ply files (3DGS), intended for use with compatible 3DGS renderers; coordinate convention and scene centering are described in the repo.
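
To sanity-check an output, the file can be opened with a generic PLY reader such as the plyfile package. The x/y/z vertex fields below follow the common 3DGS layout; treat the exact schema of SHARP's files as an assumption and verify it against a real output.

    # Inspect a predicted .ply (hypothetical path) with the generic plyfile reader.
    # Field names assume the usual 3DGS vertex layout.
    import numpy as np
    from plyfile import PlyData

    ply = PlyData.read("out/scene.ply")      # placeholder output path
    vertex = ply["vertex"]

    centers = np.stack([vertex["x"], vertex["y"], vertex["z"]], axis=-1)
    print(f"{len(centers)} gaussians")
    print(f"scene extent: {centers.min(axis=0)} to {centers.max(axis=0)}")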

Are usage restrictions or licensing details specified?
The repository contains LICENSE and LICENSE_MODEL files; users are instructed to check those files for licensing terms.

Sources

  • apple/ml-sharp repository on GitHub (code, pretrained checkpoint, and CLI).
  • Lars Mescheder, Wei Dong, et al., "Sharp Monocular View Synthesis in Less Than a Second," arXiv:2512.10685.