Stereo Reconstruction¶

Preliminaries¶

Epipolar Geometry¶

The correspondence of a point \(\bar{x}_1\) in the left image must lie on the epipolar line \(\tilde{l}_2\) in the right image.
This reduces correspondence search to a (much simpler) 1D problem. For VGA images (640×480): ~640 instead of ~300k hypotheses (a factor of 480 fewer).

Image Rectification¶

What if both cameras face exactly the same direction?

Image planes are co-planar ⇒ Epipoles at infinity, epipolar lines parallel.
Correspondence search runs along horizontal scanlines (simplifies implementation).
Let \(K_1 = K_2 = R = I\) and \(t = (t,0,0)^⊤\), so that \(\tilde{E} = [t]_\times R = [t]_\times\).
\(\bar{x}_2^⊤\tilde{E}\bar{x}_1=\bar{x}_2^⊤ \begin{bmatrix}0&0&0\\0&0&-t\\0&t&0\end{bmatrix}\bar{x}_1=ty_1-ty_2=0\)
Thus \(y_1=y_2\): corresponding points lie on the same image row.
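The derivation above can be checked numerically. The following is a minimal sketch in plain Python, assuming \(K_1 = K_2 = R = I\) as in the text; the helper names `skew` and `epipolar_residual` and the sample points are illustrative and not part of the notes.

```python
def skew(t):
    """Return the 3x3 cross-product matrix [t]_x as nested lists."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def epipolar_residual(x2, E, x1):
    """Compute x2^T E x1 for homogeneous 3-vectors x1 and x2."""
    Ex1 = [sum(E[r][c] * x1[c] for c in range(3)) for r in range(3)]
    return sum(x2[r] * Ex1[r] for r in range(3))


# Baseline of length t = 0.5 along the x-axis, R = I, so E = [t]_x.
E = skew((0.5, 0.0, 0.0))

# Two points on the same row (y1 = y2 = 2) satisfy the constraint.
same_row = epipolar_residual((3.0, 2.0, 1.0), E, (7.0, 2.0, 1.0))

# Points on different rows (y1 = 2, y2 = 4) give t*y1 - t*y2 != 0.
other_row = epipolar_residual((3.0, 4.0, 1.0), E, (7.0, 2.0, 1.0))
```

The residual equals \(t(y_1 - y_2)\), so it vanishes only when both points share a row, regardless of their x-coordinates.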

What if the images are not in the required setup?

There is a trick: we can rewarp the images through a rotation that maps both image planes to a common plane parallel to the baseline. This is called rectification.
Because this is a rotation around the camera center, the 3D structure does not need to be known.

How can we make epipolar lines horizontal?

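Step 1: Estimate \(\tilde{E}\), decompose into \(t\) and \(R\)

Step 2: Find \(R_{rect}\)

Choose \(\vec{OO'}=T=(T_x,T_y,T_z)^T\), the baseline between the two camera centers
\(e_1=\frac{T}{\|T\|}\)
\(e_2=\frac{1}{\sqrt{T_x^2+T_y^2}}(-T_y,T_x,0)^T=\frac{[(0,0,1)^T]_\times e_1}{\|[(0,0,1)^T]_\times e_1\|}\)
\(e_3=e_1\times e_2\), so that \(e_1, e_2, e_3\) form a right-handed orthonormal basis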

\(\Rightarrow R_{rect}=\begin{bmatrix}e_1^T\\e_2^T\\e_3^T\end{bmatrix}\)
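A minimal sketch of Step 2 in plain Python, assuming the baseline \(T\) is already known and is not parallel to the optical axis (otherwise \(e_2\) is undefined). The function names and the sample baseline are illustrative. The sketch takes \(e_3 = e_1 \times e_2\), which makes the result a proper rotation with determinant +1.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def rectifying_rotation(T):
    """Build R_rect with rows e1, e2, e3 from the baseline T.

    Fails with a division by zero if T is parallel to (0, 0, 1).
    """
    e1 = normalize(T)                            # new x-axis: along the baseline
    e2 = normalize(cross((0.0, 0.0, 1.0), e1))   # orthogonal to e1 and the optical axis
    e3 = cross(e1, e2)                           # completes the right-handed basis
    return [list(e1), list(e2), list(e3)]


def matvec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]


T = (1.0, 0.2, 0.1)           # illustrative baseline, not from the notes
R_rect = rectifying_rotation(T)
mapped = matvec(R_rect, T)    # the baseline expressed in the rectified frame
```

After the rotation, the baseline has the form \((\|T\|, 0, 0)^T\), which is exactly the horizontal setup assumed in the rectified case.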

Step 3: Adjust \(\tilde{x}_i\)

Warp pixels in the first image as follows: \(\tilde{x}_1'= KR_{rect}K_1^{-1}\bar{x}_1\)
Warp pixels in the second image as follows: \(\tilde{x}_2'=KRR_{rect}K_2^{-1}\bar{x}_2\)
NOTE: The two cameras have different coordinate systems, so the same rectifying rotation takes a different form in each. For the second camera it is \(R_{rect}'=RR_{rect}\).

\(K\) is a shared projection matrix that can be chosen arbitrarily (e.g., \(K = K_1\))
In practice, the inverse transformation is used for warping (i.e. query the source)
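The inverse (backward) warping mentioned above can be sketched as follows. This is an illustration only: it assumes the inverse homography `H_inv` (for the first image, the inverse of \(KR_{rect}K_1^{-1}\)) is already available as a 3x3 nested list, uses nearest-neighbor sampling instead of interpolation, and represents images as lists of rows. The function name is hypothetical.

```python
def inverse_warp(src, H_inv, fill=0):
    """Warp `src` by querying the source for every target pixel.

    For each target pixel (x, y), the source location is H_inv @ (x, y, 1).
    Target pixels that map outside the source keep the value `fill`.
    """
    h, w = len(src), len(src[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = H_inv[0][0] * x + H_inv[0][1] * y + H_inv[0][2]
            sy = H_inv[1][0] * x + H_inv[1][1] * y + H_inv[1][2]
            sw = H_inv[2][0] * x + H_inv[2][1] * y + H_inv[2][2]
            if sw == 0:
                continue  # point at infinity, no valid source pixel
            u, v = round(sx / sw), round(sy / sw)  # nearest source pixel
            if 0 <= u < w and 0 <= v < h:
                out[y][x] = src[v][u]
    return out


src = [[1, 2, 3],
       [4, 5, 6]]
# Inverse of a homography that shifts the image one pixel to the right.
H_inv = [[1, 0, -1],
         [0, 1, 0],
         [0, 0, 1]]
warped = inverse_warp(src, H_inv)
```

Looping over target pixels guarantees that every output pixel receives exactly one value, whereas forward warping can leave holes or write the same pixel twice.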

Disparity to Depth¶
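For a rectified pair, depth follows from disparity through the standard relation \(Z = fb/d\), where \(f\) is the focal length in pixels, \(b\) the baseline and \(d\) the disparity in pixels. A minimal sketch, with illustrative values for \(f\) and \(b\) that are assumptions and do not come from these notes:

```python
def disparity_to_depth(d, focal_px, baseline_m):
    """Depth Z = f * b / d for a rectified stereo pair.

    A disparity of zero (or less) corresponds to a point at infinity.
    """
    if d <= 0:
        return float("inf")
    return focal_px * baseline_m / d


# Illustrative camera: f = 700 px, b = 0.5 m, measured disparity 35 px.
depth = disparity_to_depth(35.0, focal_px=700.0, baseline_m=0.5)
```

Because depth is inversely proportional to disparity, a fixed disparity error of one pixel causes a much larger depth error for distant points than for nearby ones.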

Block Matching¶

Choose a disparity range \([0, D]\)
For all pixels \(\mathbf{x} = (x, y)\) compute the best disparity ⇒ winner-takes-all (WTA)
Do this for both images and apply a left-right consistency check to remove outliers
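The first two steps can be sketched in plain Python as below. This is an illustration under several assumptions: images are rectified and stored as lists of rows, the left image is the reference (a left pixel at column \(x\) matches the right pixel at column \(x - d\)), the matching cost is SSD, and border pixels whose window does not fit are marked with -1. The function names and the synthetic test images are not part of the notes.

```python
def ssd_cost(left, right, x, y, d, r):
    """SSD between the (2r+1)x(2r+1) window at (x, y) in `left`
    and the window at (x - d, y) in `right`."""
    cost = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            diff = left[y + dy][x + dx] - right[y + dy][x + dx - d]
            cost += diff * diff
    return cost


def block_matching(left, right, max_disp, r=1):
    """Winner-takes-all disparity map with the left image as reference."""
    h, w = len(left), len(left[0])
    disp = [[-1] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            best_d, best_cost = -1, float("inf")
            for d in range(max_disp + 1):
                if x - d - r < 0:   # window would leave the right image
                    break
                c = ssd_cost(left, right, x, y, d, r)
                if c < best_cost:   # WTA: keep the cheapest hypothesis
                    best_d, best_cost = d, c
            disp[y][x] = best_d
    return disp


# Synthetic pair: the left image is the right image shifted by 2 pixels.
right = [[x * x + 10 * y for x in range(10)] for y in range(3)]
left = [[right[y][x - 2] if x >= 2 else 0 for x in range(10)] for y in range(3)]
disp = block_matching(left, right, max_disp=4)
```

On this synthetic pair the interior of the middle row recovers the true disparity of 2. Running the same procedure with the right image as reference yields the second disparity map needed for the consistency check.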

Zero Normalized Cross-Correlation¶

https://martin-thoma.com/zero-mean-normalized-cross-correlation/
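A minimal sketch of the measure in plain Python, assuming the two patches are given as flat sequences of equal length; the function name `zncc` and the sample patches are illustrative. Subtracting the mean and dividing by the standard deviations makes the score invariant to affine brightness changes of either patch.

```python
import math


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches.

    Returns a value in [-1, 1]; returns 0.0 if either patch is constant.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a) *
                    sum((q - mean_b) ** 2 for q in b))
    return num / den if den > 0 else 0.0


patch = [1.0, 2.0, 3.0, 4.0]
brighter = [2.0 * p + 10.0 for p in patch]   # gain and offset change
inverted = [-p for p in patch]               # contrast inversion

score_same = zncc(patch, brighter)
score_inverted = zncc(patch, inverted)
```

Unlike a cost, ZNCC is a similarity: the best disparity is the one that maximizes it.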

Sum of squared differences (SSD)¶
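SSD sums the squared intensity differences between two patches, so lower values mean better matches. A minimal sketch, assuming patches are flat sequences of equal length; the function name and sample values are illustrative:

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


patch = [1, 2, 3, 4]
identical = ssd(patch, patch)
# A constant brightness offset of 10 is penalized on every pixel.
offset = ssd(patch, [p + 10 for p in patch])
```

The second value shows that SSD is sensitive to brightness differences between the two images, which a zero-mean normalized measure is designed to compensate for.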

Assumption Violations¶

Block matching assumes that all pixels inside the window are displaced by the same disparity \(d\)

This is called the fronto-parallel assumption, which is often invalid

Slanted surfaces deform perspectively when the viewpoint changes

Effect of Window Size

Small windows lead to matching ambiguities and noise in the disparity maps

Larger windows lead to smoother results, but loss of details and border bleeding

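Border Bleeding: with large windows, the disparity of a foreground object spills over its boundary into the neighboring background, so object outlines in the disparity map appear enlarged.

Left-Right Consistency Test: compute a disparity map for each image and keep a pixel only if the two maps agree. A disparity \(d\) at column \(x\) in the left map should be matched by (approximately) the same \(d\) at column \(x - d\) in the right map. Pixels that fail, typically in occluded regions, are discarded as outliers.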
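A left-right consistency test can be sketched as follows, assuming two disparity maps stored as lists of rows, with the convention that a left pixel at column \(x\) with disparity \(d\) matches the right pixel at column \(x - d\). The tolerance `tol`, the invalid marker -1 and the function name are assumptions made for this illustration.

```python
def left_right_check(disp_left, disp_right, tol=1, invalid=-1):
    """Keep a left disparity only if the right map agrees within `tol`."""
    h, w = len(disp_left), len(disp_left[0])
    out = [[invalid] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disp_left[y][x]
            if d < 0:
                continue                 # already marked invalid
            xr = x - d                   # matching column in the right image
            if 0 <= xr < w and abs(disp_right[y][xr] - d) <= tol:
                out[y][x] = d
    return out


disp_left = [[2, 2, 2, 3]]
disp_right = [[2, 2, 2, 2]]
checked = left_right_check(disp_left, disp_right, tol=0)
```

In this example the first two pixels map outside the right image and the last one disagrees with the right map, so only the third pixel survives.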

Siamese Networks¶

Training¶

Loss Function¶

Hinge Loss: \(L = \max(0, m + s_- - s_+)\)

The network is trained by minimizing a hinge loss.
The loss is computed over pairs of examples centered around the same image position, where one example belongs to the positive class and one to the negative class.
\(s_+\) is the output (similarity score) of the network for the positive example, \(s_-\) the output for the negative example.
The margin \(m\) is a positive real number.
The loss is zero when the similarity of the positive example is greater than the similarity of the negative example by at least the margin \(m\).
The paper sets the margin to 0.2 in its experiments.
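The hinge loss for a single pair can be written directly from the formula. A minimal sketch in plain Python, with the margin defaulting to the 0.2 quoted above; the function name and the sample scores are illustrative:

```python
def hinge_loss(s_pos, s_neg, margin=0.2):
    """Hinge loss L = max(0, m + s_neg - s_pos) for one pair of examples."""
    return max(0.0, margin + s_neg - s_pos)


# Positive score exceeds the negative score by more than the margin: no loss.
well_separated = hinge_loss(s_pos=0.9, s_neg=0.1)

# Positive score is higher, but only by 0.1: the pair is still penalized.
too_close = hinge_loss(s_pos=0.5, s_neg=0.4)
```

The second case shows why the margin matters: ranking the positive example first is not enough, it has to win by at least \(m\).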

Paper¶

https://www.jmlr.org/papers/volume17/15-535/15-535.pdf

Spatial Regularization¶

Add pairwise terms: smoothness between adjacent pixels in addition to the matching costs.
Potts: \(ψ_{smooth}(d, d') = [d \ne d']\)
Truncated \(\ell_1\): \(ψ_{smooth}(d,d') = \min(|d-d'|, τ)\)
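The two pairwise terms and their role in an energy can be sketched as follows. This is an illustration restricted to a single scanline: the weight `lam` between matching costs and smoothness, the cost table `unary` and all function names are assumptions, not part of the notes.

```python
def potts(d1, d2):
    """Potts model: constant penalty whenever neighboring disparities differ."""
    return 1 if d1 != d2 else 0


def truncated_l1(d1, d2, tau):
    """Truncated l1: penalty grows with the difference but is capped at tau."""
    return min(abs(d1 - d2), tau)


def scanline_energy(disps, unary, smooth, lam=1.0):
    """Matching costs plus lam times the pairwise terms of adjacent pixels.

    unary[i][d] is the matching cost of disparity d at pixel i.
    """
    data = sum(unary[i][d] for i, d in enumerate(disps))
    pairwise = sum(smooth(a, b) for a, b in zip(disps, disps[1:]))
    return data + lam * pairwise


# Three pixels, two disparity hypotheses each (illustrative costs).
unary = [[0, 1],
         [1, 0],
         [0, 1]]
wta_energy = scanline_energy([0, 1, 0], unary, potts)      # per-pixel minima
smooth_energy = scanline_energy([0, 0, 0], unary, potts)   # constant disparity
```

Here the winner-takes-all labeling has zero matching cost but pays for two disparity jumps, so the constant labeling ends up with the lower total energy.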
Paper

https://dash.harvard.edu/bitstream/handle/1/3637107/Mumford_StatRangeImage.pdf?sequence=3&isAllowed=y

End-to-End Learning¶

End-to-end learning is a technique in which a single model learns all the steps between the initial input and the final output. All parts of the model are trained simultaneously rather than sequentially.

DISPNET¶

GCNET¶

STEREO MIXTURE DENSITY NETWORKS (SMD-NETS)¶

Last updated: March 25, 2024 12:53:47
Created: November 11, 2023 00:20:53