1. Background and motivation
Traditional bone registration relies on manually selecting optical markers and CT fiducials. This is slow and error-prone. The goal here is to recover a dense depth map of the surgical surface directly from images, so the depth map can be registered to a pre-operative CT scan automatically.
Commercial inspiration: MedicalTek's MonoStereo system in Taiwan converts 2D endoscope video into 3D in real time using a single-image inverse-square + Lambertian model plus learned priors. The classical multi-light version of the same idea, photometric stereo, recovers true surface normals algebraically from three or more images, then integrates them into a depth map. That is the technique implemented here.
2. Physical model
2.1 Lambertian reflection
A matte surface scatters incoming light equally in all directions. Its observed brightness depends only on the angle between the surface normal and the light direction:
I = ρ · L · cos(θ)
where ρ is albedo (surface reflectivity), L is light intensity, and θ is the angle between surface normal n and light direction L. This is Lambert's cosine law.
2.2 Inverse-square falloff
For a point light source, intensity at the surface drops with the square of distance because the same energy spreads over a sphere of area 4πr²:
I ∝ 1 / r²
2.3 Combined endoscope model (single light, coaxial)
When the light source is on the camera axis (e.g. endoscope tip), the two combine:
I(p) = ρ · L · cos(θ) / r²
Inverting this for r gives the single-image depth estimator. Useful but limited: it conflates ρ, cos(θ), and r² into a single brightness number.
2.4 Photometric stereo (Woodham 1980)
With three images of the same surface lit from three known different directions, the per-pixel equation system becomes:
I_k = ρ · (n · L_k) for k = 1, 2, 3
Stacking into matrix form:
[I₁] [L₁ᵀ]
[I₂] = [L₂ᵀ] · (ρn)
[I₃] [L₃ᵀ]
Let g = ρn. Then g = L⁻¹ · I, computed once per pixel by a single matrix multiply. From g:
- Albedo:
ρ = |g| - Surface normal:
n = g / |g|
Critical advantage over the single-image method: distance does not appear in these equations. Photometric stereo recovers the surface orientation directly, with no inverse-square assumption.
2.5 Recovering depth from normals
Surface normals give the gradient of the depth function, not depth itself. The relationship (with nz > 0 pointing toward the camera and z increasing away from camera):
dz/dx_image = +nx / nz
dz/dy_image = −ny / nz ← sign flip: image y points down, world Y points up
Integrate these gradients across the image to recover z(x, y) up to an additive constant. Sign convention bit us during development — see section 6.
2.6 Absolute scale anchoring (coaxial light)
Pure photometric stereo produces depth up to an additive offset. To get real centimeters, take one additional image with a coaxial light source (light on the camera axis) and use the inverse-square law at the brightest pixel:
r_near = sqrt(ρ · L / I_brightest)
This single measurement fixes the offset for the entire depth map.
3. Pipeline
The implementation has five stages:
- Render or capture three lit images of the same scene with three different known light directions.
- Solve photometric stereo per pixel:
g = L⁻¹ · I→ albedo, surface normal. - Integrate surface normals into a relative depth map (weighted least-squares Poisson solve).
- Calibrate absolute scale using a coaxial-light reference image (inverse-square at the brightest point).
- Visualize as a heatmap with numerical labels in cm.
4. Code structure (photometric_stereo_sphere.py)
4.1 make_sphere_scene(size, radius_cm, center_dist_cm, albedo, light_dirs)
Generates a synthetic test scene:
- A sphere of radius 1.5 cm centered at depth 4.0 cm from the camera
- 300×300 image, ~60 pixels per cm
- Three lit images using directional lights at [−1, 1, 1.5], [1, 1, 1.5], [0, −1, 1.5] (unit-normalized)
- Returns: list of RGB images, ground truth depth map, ground truth normals, foreground mask, light direction matrix
4.2 solve_photometric_stereo(images, light_dirs, mask)
The core Woodham step. Stack three luminance images into a (3, N) matrix where N = H × W. Compute L⁻¹ once (3×3 inverse), then G = L⁻¹ · I_stack recovers g = ρn for every pixel in a single matrix multiply. Split into albedo = |g| and normal = g / |g|.
4.3 integrate_normals_to_depth(normals, mask, cm_per_pixel)
Sparse weighted least-squares Poisson solver. For each masked pixel, write two equations:
nz · (z[y, x+1] − z[y, x]) = +nx · cm_per_pixel
nz · (z[y+1, x] − z[y, x]) = −ny · cm_per_pixel
The nz weighting downweights silhouette pixels where nz ≈ 0 and the slope estimate nx/nz would otherwise blow up. Anchor one pixel to 0 to make the system non-singular. Build as a sparse CSR matrix, solve with scipy.sparse.linalg.lsqr.
4.4 calibrate_scale_with_coaxial(z_relative, mask, coaxial_image, albedo, light_power_cm2)
Fixes the additive offset. Take the top 0.1% brightest pixels in the coaxial image (where cos θ ≈ 1, so they are the nearest points). Average their red-channel intensity → r_near = sqrt(ρL / I). Set the offset so the integrated depth at those bright pixels equals r_near. Also handles sign flip if photometric stereo returned a flipped-orientation solution.
4.5 render_coaxial(normals, mask, albedo_map, light_power_cm2, z_true, noise)
Helper that synthesizes the calibration image used by calibrate_scale_with_coaxial. For real hardware, this image comes from the actual scope.
4.6 Visualization helpers
annotate_sphere, labeled, and the figure-generating code in main() produce five output PNGs with depth labels, comparison plots, and a 3D surface rendering.
5. Light direction calibration
In the synthetic demo, light directions are hardcoded because we both generate the images and solve them. In a real rig you must measure them. Four standard methods:
5.1 Chrome sphere calibration (gold standard)
Place a shiny ball of known size in the scene. The specular highlight on the ball pinpoints the light source position. For a highlight at pixel (x, y) on a sphere with center (cx, cy) and radius r pixels:
nx = (x − cx) / r
ny = −(y − cy) / r ← image-y flip
nz = sqrt(1 − nx² − ny²)
L = 2(n · V)n − V ← mirror reflection of view direction V = [0, 0, 1]
Do this once per light. Re-calibrate whenever any light moves.
5.2 Known-geometry object
Image a calibration target whose shape you know. Solve for the L that best explains the observed brightness pattern. A 3-parameter nonlinear least-squares fit.
5.3 Direct physical measurement
If the lights are fixed in a rig (LED ring, scope tip), measure their positions in millimeters from a reference point and convert to camera coordinates. This is what surgical scopes effectively do — LED positions are known from manufacturing.
5.4 Uncalibrated photometric stereo
Solve for lights and surface jointly. Suffers from the generalized bas-relief ambiguity — multiple (lights, surface) pairs explain the same images. Needs additional constraints (integrability, prior on one normal, multiple objects). Active research area.
5.5 Geometric requirement
The three light directions must form a linearly independent set for L⁻¹ to exist. Lights all roughly above the object → L becomes near-singular → solver explodes on noise. Recommended configuration: roughly 120° apart in azimuth, all tilted 30–45° above the horizon.
6. Bugs encountered and fixed
6.1 Unit mismatch in integration (factor of ~60,000 error)
Symptom: Integrated depth span was 153 cm when the true span was 1.5 cm.
Cause: Wrote forward differences as z[y, x+1] − z[y, x] = −nx/nz (unitless slope) instead of = −nx/nz · cm_per_pixel. The integration accumulated slope × pixel-count, producing depth in “pixels” instead of cm.
Fix: Multiply gradient RHS by cm_per_pixel everywhere. For our 300-pixel image with 1.5 cm sphere radius spanning 0.3 × W pixels, cm_per_pixel = 1.5 / 90 ≈ 0.0167.
6.2 RGB vs luminance in coaxial calibration
Symptom: Coaxial calibration produced r_near = 3.09 cm when true depth at brightest pixel was 2.52 cm.
Cause: Computed luminance with weighted RGB sum (0.15R + 0.70G + 0.15B), but the tissue tint applied to the rendered image scaled G and B by 0.55. So luminance was 0.6175× the raw intensity. The calibration was applying sqrt(ρL / I_luminance) instead of sqrt(ρL / I_raw).
Fix: Use the red channel directly for coaxial calibration. R preserves the raw intensity in our tissue-tinted rendering.
6.3 Sign convention on y-axis (factor of 2 shape error)
Symptom: Even with true normals, integration produced span 3.45 cm vs true 1.5 cm — the shape was stretched.
Cause: Textbook photometric stereo writes dz/dx = −nx/nz, dz/dy = −ny/nz for a coordinate system where z increases toward the camera and y increases upward. Our convention has z increasing away from the camera and image y increasing downward. The two sign flips do not cancel — they affect different equations.
Fix: A sign sweep on true normals showed (+nx, −ny) produces the correct shape:
sx=+1, sy=+1: best slope=0.000, error after best-fit=0.296 cm ← wrong
sx=+1, sy=−1: best slope=0.995, error after best-fit=0.013 cm ← CORRECT
sx=−1, sy=+1: best slope=−0.995, error after best-fit=0.013 cm
sx=−1, sy=−1: best slope=0.000, error after best-fit=0.296 cm ← wrong
Settled on dz/dx_img = +nx/nz, dz/dy_img = −ny/nz.
6.4 Silhouette instability
Symptom: Near the sphere silhouette, nz → 0, so −nx/nz and −ny/nz blow up. Tried clamping nz with safe_nz = max(nz, 1e-3) — produced huge spurious gradients at the boundary that corrupted the entire integration.
Fix: Reformulate the gradient equations in their undivided form:
nz · (z[y, x+1] − z[y, x]) = +nx · cm_per_pixel
Now silhouette pixels (nz ≈ 0) contribute negligibly to the least-squares system, exactly proportional to how reliable they are. No clamping needed.
7. Results
Synthetic sphere, radius 1.5 cm, center at depth 4.0 cm:
| Metric | Value |
|---|---|
| True depth range | 2.500 – 4.000 cm (span 1.5 cm) |
| Estimated depth range | 2.418 – 3.987 cm |
| Mean absolute error | 0.082 cm (2.75% of mean depth) |
| Normal recovery error | 0.041 (vs unit vectors) |
Reference: the academic photometric-endoscopy paper from Batlle, Montiel & Tardós (arXiv:2204.09083) reports 7% mean error in vivo. Our 2.75% on a synthetic sphere with known lights is in the same ballpark, slightly better as expected because the scene is noise-free and the lights are perfectly known.
8. Limitations and next steps
8.1 What this implementation does well
- Recovers absolute-scale depth from three lit images plus a coaxial reference
- Robust to silhouette boundaries via weighted Poisson integration
- Handles arbitrary masks (works on any object outline, not just spheres)
- Real-units output (cm) via the coaxial calibration step
8.2 What it does NOT handle
- Specular highlights on wet/shiny surfaces — would need a detection-and-masking pre-step (the earlier single-image version does this; can be ported)
- Cast shadows — pixels in shadow violate
I = ρ · n · L. Standard fix: detect dark pixels per-image, exclude them from that light's equation, fall back to two-light solve where needed - Non-Lambertian materials — only the diffuse term is modeled
- Unknown light directions — current code assumes calibrated lights (see section 5)
- Moving lights or moving camera between frames — assumes the three exposures are temporally coregistered
8.3 Next milestones for Lumina Bone
- Hardware light calibration — implement chrome-sphere calibration routine (section 5.1) and integrate into capture workflow.
- Real image testing — capture three lit images of a bone phantom and run the pipeline end-to-end.
- CT registration — implement ICP between the photometric depth map and the pre-op CT surface.
- Shadow/specular handling — port the masking code from the single-image pipeline.
- Latency optimization — current sparse-solve takes a few seconds at 300×300; for real-time use, port to GPU (cuSPARSE) or use a multigrid Poisson solver.
9. Files in the repository
| File | Purpose |
|---|---|
photometric_depth.py | Single-image inverse-square + Lambertian pipeline. Handles specular masking, edge-aware smoothing, and stereo view synthesis. Input: one image. |
photometric_heatmap.py | Wrapper around the single-image pipeline that produces labeled heatmaps with cm values. |
photometric_stereo_sphere.py | Main file. Three-light photometric stereo on a synthetic sphere. Produces five output PNGs including the labeled depth heatmap and 3D surface. |
9.1 How to run
pip install numpy scipy matplotlib pillow
python photometric_stereo_sphere.py
Outputs five PNGs to ./outputs/ (path is configurable at the top of main()):
ps_01_inputs.png— the three lit input imagesps_02_normals.png— recovered vs ground-truth surface normalsps_03_depth_heatmap.png— depth heatmap with numerical labels in cmps_04_comparison.png— estimated vs ground-truth depth side by sideps_05_3d_surface.png— 3D surface plot of the reconstructed sphere
10. References
- Woodham, R. J. (1980). Photometric method for determining surface orientation from multiple images. Optical Engineering 19(1).
- Batlle, V. M., Montiel, J. M. M., & Tardós, J. D. (2022). Photometric single-view dense 3D reconstruction in endoscopy. arXiv:2204.09083.
- Frankot, R. T. & Chellappa, R. (1988). A method for enforcing integrability in shape from shading algorithms. IEEE PAMI 10(4).
- MedicalTek (MDTK) MonoStereo product documentation: medicaltek.biz