TL;DR

A public repository and accompanying paper present an algorithm and implementation for computing the product of a deep network's inverse Hessian with a vector. The method reformulates the linear system into a structured block-tridiagonal form and relies on a small partitioned-matrix library; the authors suggest the solver could serve as a preconditioner to accelerate stochastic gradient descent (SGD).

What happened

A researcher published code and a paper demonstrating an approach to compute H^{-1} times a vector for deep networks, i.e. to solve Hx = v efficiently. Rather than relying on the standard Hessian-vector-product technique popularized by Pearlmutter, the project augments the linear system with auxiliary variables, reorders it into a block-tridiagonal system, factors that system, and then solves it. According to the repository, these algebraic manipulations effectively resemble running propagation on an augmented version of the original network. The codebase includes an implementation function named hessian_inverse_product, a demo notebook (demo_hessian.ipynb), and a small library for manipulating nested, block-structured matrices (block_partitioned_matrices.py). The project is public on GitHub and is accompanied by a paper describing the full idea.
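The repository's hessian_inverse_product is not reproduced in the source, so the following is only a minimal sketch of the general idea the paragraph describes: once a system has block-tridiagonal structure, it can be factored and solved block by block with a forward-elimination and back-substitution sweep. The function name, block layout, and use of explicit per-block inverses are assumptions made for the sketch, not the repository's implementation.

```python
# Minimal sketch (not the repository's code): solving a block-tridiagonal
# system H x = v by a block forward-elimination / back-substitution sweep.
import numpy as np

def solve_block_tridiagonal(diag, lower, upper, rhs):
    """diag: n diagonal blocks (b, b); lower/upper: n-1 off-diagonal blocks (b, b);
    rhs: n right-hand-side blocks (b,). Returns the concatenated solution."""
    n = len(diag)
    d = [blk.copy() for blk in diag]
    r = [blk.copy() for blk in rhs]
    # Forward elimination: fold each sub-diagonal block into the next diagonal block.
    for i in range(1, n):
        factor = lower[i - 1] @ np.linalg.inv(d[i - 1])
        d[i] = d[i] - factor @ upper[i - 1]
        r[i] = r[i] - factor @ r[i - 1]
    # Back substitution: recover the blocks of x from the last one upward.
    x = [None] * n
    x[-1] = np.linalg.solve(d[-1], r[-1])
    for i in range(n - 2, -1, -1):
        x[i] = np.linalg.solve(d[i], r[i] - upper[i] @ x[i + 1])
    return np.concatenate(x)
```

For n blocks of width b this sweep costs on the order of n·b³ operations rather than the (n·b)³ of a dense solve, which is the kind of structural saving the tall-skinny framing appears to target.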

Why it matters

  • Computing H^{-1}·v efficiently could enable second-order-style preconditioning that reduces the number of iterations stochastic gradient descent needs (a sketch follows this list).
  • The approach aims to avoid the cubic parameter-scaling cost of naive Hessian inversion by exploiting structure in tall-skinny networks.
  • Providing both a paper and working code (including a demo and matrix utility library) helps reproducibility and adoption by researchers and practitioners.
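To make the first bullet concrete, here is a hypothetical sketch of how an H^{-1}·v routine could act as a preconditioner inside a gradient-descent loop; inverse_hessian_times is a stand-in callable, not the repository's API, and no speedup result is implied.

```python
# Hypothetical preconditioned-SGD step: move along H^{-1}·g instead of g.
import numpy as np

def preconditioned_sgd_step(params, grad, inverse_hessian_times, lr=1.0):
    """inverse_hessian_times(v) is assumed to return H^{-1} @ v."""
    direction = inverse_hessian_times(grad)  # curvature-corrected direction
    return params - lr * direction

# Plain SGD would use `params - lr * grad`; rescaling the step by local
# curvature is what second-order methods rely on to cut iteration counts.
```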

Key facts

  • The repository provides code and a paper that compute the product of the inverse Hessian with a vector for deep nets.
  • The work is contrasted with Pearlmutter’s Hessian-vector-product (HVP) method; here the inverse product (H^{-1}·v) is computed directly.
  • Solving Hx = v naively has cubic complexity in the number of parameters; the project targets a more efficient route by exploiting matrix structure.
  • The core trick is to augment the system with auxiliary variables, pivot to a block-tridiagonal system, factor that system, and then solve it.
  • The sequence of algebraic steps is said to resemble running propagation on an augmented network.
  • The codebase includes a demo notebook (demo_hessian.ipynb) and an implementation function named hessian_inverse_product.
  • A helper library, block_partitioned_matrices.py, supports hierarchical operations on partitioned and block-structured matrices and includes a tutorial (an illustrative block-matrix sketch follows this list).
  • Repository metadata: 6 stars, 1 watcher, 1 fork, and 5 contributors (as reported in the source).
  • Languages in the repo include Jupyter Notebook, Python and TeX (percentages listed in the source).
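The source names block_partitioned_matrices.py but does not show its API, so the snippet below is only an illustration of the underlying idea of operating on a matrix stored as nested sub-blocks; none of the names come from the library.

```python
# Illustration of block-partitioned arithmetic: a block-wise matrix-vector
# product computed without assembling the dense matrix first.
import numpy as np

rng = np.random.default_rng(0)
blocks = [[rng.standard_normal((2, 3)), rng.standard_normal((2, 2))],
          [rng.standard_normal((4, 3)), rng.standard_normal((4, 2))]]
v_parts = [rng.standard_normal(3), rng.standard_normal(2)]

# Each output block is the sum of its row's block-times-subvector products.
y_parts = [sum(blk @ v for blk, v in zip(row, v_parts)) for row in blocks]

# Sanity check against the dense equivalent.
dense = np.block(blocks)
assert np.allclose(np.concatenate(y_parts), dense @ np.concatenate(v_parts))
```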

What to watch next

  • Efforts to use the method as a preconditioner for stochastic gradient descent, which the authors suggest is possible.
  • Empirical performance and scaling on large, modern networks (not confirmed in the source).
  • Benchmarks comparing iteration counts and wall-clock time versus existing first- and second-order optimizers (not confirmed in the source).

Quick glossary

  • Hessian: The square matrix of second derivatives of a scalar-valued function with respect to model parameters; in optimization it captures local curvature.
  • Hessian-vector product (HVP): An operation that computes the product of a function's Hessian matrix with a vector without forming the full Hessian explicitly (a numerical sketch follows this glossary).
  • Preconditioner: A transformation applied to a linear system that improves the conditioning and accelerates iterative solvers.
  • Block-tridiagonal matrix: A matrix composed of blocks arranged so that nonzero blocks appear only on the main diagonal and the diagonals immediately above and below it.
  • Partitioned matrix: A matrix represented as an array of submatrices (blocks), enabling structured algebraic operations that exploit that block decomposition.
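As a companion to the HVP glossary entry, here is a minimal numerical sketch: H·v can be approximated by a finite difference of the gradient, Hv ≈ (∇f(θ + εv) − ∇f(θ))/ε. Pearlmutter's method obtains the product exactly by differentiating through the gradient; the toy quadratic and function names below are assumptions for the sketch.

```python
# Finite-difference HVP sketch for the glossary entry above.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad_f(theta):
    # Gradient of the toy objective f(theta) = 0.5 * theta^T A theta.
    return A @ theta

def hvp(grad_fn, theta, v, eps=1e-6):
    # Hv ~= (grad(theta + eps*v) - grad(theta)) / eps
    return (grad_fn(theta + eps * v) - grad_fn(theta)) / eps

theta = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
print(hvp(grad_f, theta, v))  # close to A @ v = [3.5, 4.5]
```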

Reader FAQ

What does this project provide?
An algorithm, paper and code to compute the inverse-Hessian times a vector, a demo notebook, and utilities for block-partitioned matrices.

Is this approach intended to speed up training?
The authors express the hope it can be used as a preconditioner to speed stochastic gradient descent, but concrete speedup results are not provided in the source.

Does the code require building the full Hessian?
The project avoids naive full-Hessian inversion by restructuring the problem into a block-tridiagonal system; details are in the paper and implementation.

Has this been tested on large modern models?
Not confirmed in the source.

Sources

  • GitHub repository and paper: "The Hessian of tall-skinny networks is easy to invert", a package showing how to multiply the inverse of the Hessian of a deep network with a vector.
