Probabilistic Graphical Models¶
Structured Prediction¶
Spatial regularization¶
\(p(D)\propto e^{-\sum_i\phi_{data}(d_i)-\lambda\sum_{(i,j)\in S}\phi_{smooth}(d_i,d_j)}\)

Here \(i \sim j\) (i.e., \((i,j)\in S\)) denotes neighbouring pixels on a 4-connected grid.

\(\phi_{smooth}\) is a regularization term that encourages neighboring pixels to have similar disparities:

* \(\phi_{data}(d_i) = \min\left(\lvert I(x_i, y_i) - J(x_i - d_i, y_i)\rvert,\ \sigma\right)\)
* \(\phi_{smooth}(d_i,d_j) = \min\left(\lvert d_i - d_j\rvert,\ \tau\right)\)

where \(I\) and \(J\) are the image pair, and \(\sigma\) and \(\tau\) are truncation thresholds.
* Structured Prediction:
    * Probabilistic graphical models encode the local dependencies of the problem
    * Deep neural networks with image-based outputs (stereo, flow, semantics)
Markov Random Fields¶
Undirected graphical models (UGM)¶
* Pairwise (non-causal) relationships
* Can write down the model and score specific configurations of the graph, but there is no explicit way to generate samples
* Contingency constraints on node configurations
cliques¶
A clique is a fully connected subgraph of a graphical model: a group of nodes in which every pair of nodes is directly connected. Cliques are the building blocks of models such as Markov Random Fields and Conditional Random Fields.
potential¶
* A potential \(\phi(x)\) is a non-negative function of the variable \(x\)
* A joint potential \(\phi(x_1, x_2, \dots)\) is a non-negative function of a set of variables
Definition of an undirected graphical model¶
\(P(x_1,\dots,x_n)=\frac{1}{Z}\prod_{c\in C}\phi_c(x_c)\)

\(Z = \sum_{x_1,\dots,x_n}\prod_{c\in C}\phi_c(x_c)\)
Definition of a Markov Random Field¶
* For a set of variables \(X =\{x_1,\dots,x_M\}\), a Markov Random Field is defined as a product of potentials over the (maximal) cliques \(\{X_k\}_{k=1}^K\) of the undirected graph \(G\):

\(p(X)=\frac{1}{Z}\prod_{k=1}^K\phi_k(X_k)\)

* \(Z\) normalizes the distribution and is called the partition function
Examples:
Properties¶
Condition One
Marginalizing over c makes a and b dependent¶
Proof:
* Example: consider \(\sum_c\phi_1(a,c)\phi_2(b,c)\) with the indicator potentials \(\phi_1(a,c)=[a=c]\) and \(\phi_2(b,c)=[b=c]\):

| a | b | c | \(\phi_1(a,c)\) | \(\phi_2(b,c)\) | \(\phi_1(a,c)\cdot\phi_2(b,c)\) | \(\sum_c\) |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1 | 1 | |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1 | 1 | 0 | 0 | 0 | 0 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 0 | 1 | 0 | 1 | 0 | 0 | |
| 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 | |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 |

The marginal \(\sum_c\phi_1(a,c)\phi_2(b,c)=[a=b]\) cannot be written as a product \(g(a)h(b)\), so \(a\) and \(b\) become dependent.
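That marginalizing out \(c\) couples \(a\) and \(b\) can be verified numerically. A minimal sketch in plain Python, assuming the indicator potentials \(\phi_1(a,c)=[a=c]\) and \(\phi_2(b,c)=[b=c]\) read off from the table:

```python
# Indicator potentials: both tie their argument to c, which is what
# couples a and b once c is summed out.
def phi1(a, c):
    return 1.0 if a == c else 0.0

def phi2(b, c):
    return 1.0 if b == c else 0.0

# Marginalize out c for every joint state (a, b)
marg = {(a, b): sum(phi1(a, c) * phi2(b, c) for c in (0, 1))
        for a in (0, 1) for b in (0, 1)}

# If a and b were independent, marg would factorize as g(a) * h(b),
# which would force marg(0,0) * marg(1,1) == marg(0,1) * marg(1,0).
factorizes = marg[0, 0] * marg[1, 1] == marg[0, 1] * marg[1, 0]
print(marg, factorizes)  # factorizes is False -> a and b are dependent
```

Since \(\sum_c\phi_1\phi_2=[a=b]\), the cross-product check fails and the marginal does not factorize.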
Conditioning on c makes a and b independent¶
Proof:
Global Markov Property¶
Local Markov Property¶
* Markov blanket
Hammersley-Clifford Theorem¶
A probability distribution that has a strictly positive mass or density satisfies the Markov properties with respect to an undirected graph G if and only if it is a Gibbs random field,
i.e. its density can be factorized over the (maximal) cliques of the graph.
Factor Graphs¶
\(p(X) = \frac{1}{Z}\prod_{k=1}^K f_k(X_k)\)
Example¶
Belief Propagation¶
Inference in Chain Structured Factor Graphs¶
\(p(a, b, c, d) = \frac{1}{Z}f_1(a, b)f_2(b, c)f_3(c, d)f_4(d)\)
\(p(a,b,c) = \sum_{d}p(a,b,c,d)\)
\(\ \ \ \ \ \ \ \ \ \ \ \ \ \ = \frac{1}{Z}f_1(a,b)f_2(b,c)\underbrace{\sum_{d}f_3(c,d)f_4(d)}_{μ_{d→c}(c)}\)
\(p(a,b) = \sum_{c}p(a,b,c)\)
\(\ \ \ \ \ \ \ \ \ \ = \frac{1}{Z}f_1(a,b)\underbrace{\sum_{c}f_2(b,c)μ_{d→c}(c)}_{μ_{c→b}(b)}\)
\(\dots\)
* Belief Propagation assumes a singly-connected graph \(G = (V,E)\), which has \(|V|-1 = O(|V|)\) edges (in contrast to the \(|V|(|V|-1)/2 = O(|V|^2)\) edges of a fully connected graph)
* This simplifies the computation of any marginal distribution significantly
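The elimination steps above can be checked with a small script. This is a sketch, not course code: the chain factors are random positive tables over three states (an assumption for illustration), and the message-based marginal \(p(a)\) is compared against brute-force summation over the full joint:

```python
import itertools
import random

random.seed(0)
S = range(3)  # three states per variable (illustrative)

# random positive chain factors f1(a,b), f2(b,c), f3(c,d), f4(d)
f1 = {(a, b): random.random() + 0.1 for a in S for b in S}
f2 = {(b, c): random.random() + 0.1 for b in S for c in S}
f3 = {(c, d): random.random() + 0.1 for c in S for d in S}
f4 = {d: random.random() + 0.1 for d in S}

# brute force: sum the full joint, O(S^4) terms
joint = {k: f1[k[0], k[1]] * f2[k[1], k[2]] * f3[k[2], k[3]] * f4[k[3]]
         for k in itertools.product(S, S, S, S)}
Z = sum(joint.values())
p_a_brute = [sum(v for k, v in joint.items() if k[0] == a) / Z for a in S]

# message passing: eliminate d, then c, then b -- O(S^2) work per step
mu_dc = {c: sum(f3[c, d] * f4[d] for d in S) for c in S}      # mu_{d->c}(c)
mu_cb = {b: sum(f2[b, c] * mu_dc[c] for c in S) for b in S}   # mu_{c->b}(b)
mu_ba = {a: sum(f1[a, b] * mu_cb[b] for b in S) for a in S}   # mu_{b->a}(a)
p_a_bp = [mu_ba[a] / sum(mu_ba.values()) for a in S]

assert all(abs(x - y) < 1e-9 for x, y in zip(p_a_brute, p_a_bp))
```

Both routes give the same marginal, but the message-passing route touches only one pairwise table per elimination step.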
Inference in Tree Structured Factor Graphs¶
Factor-to-Variable Messages

Variable-to-Factor Messages
Sum-Product Algorithm¶
Belief Propagation:¶
* Algorithm to compute all messages efficiently
* Assumes that the graph is singly connected (chain, tree)
Algorithm:¶
* Initialization
* Variable-to-factor message
* Factor-to-variable message
* Repeat until all messages have been calculated
* Calculate the desired marginals from the messages
Log Representation¶
Max-Product Algorithm¶
* Example: Chain
\(\begin{align*}\max_{a,b,c,d} p(a,b,c,d)&= \max_{a,b,c,d} f_1(a,b)f_2(b,c)f_3(c,d)\\&=\max_{a,b,c} f_1(a,b)f_2(b,c)\underbrace{\max_{d} f_3(c,d)}_{\mu_{d\to c}(c)}\\&=\dots\\ &=\max_{a}\,\mu_{b\to a}(a)\end{align*}\)
Loopy Belief Propagation¶
* Messages are also well defined for loopy graphs!
* Simply apply them to loopy graphs as well
* We lose exactness (⇒ approximate inference)
* There is not even a guarantee of convergence [Yedidia et al. 2004]
* But it often works surprisingly well in practice
Summary¶
* Refer to the PPT
Examples¶
Example 1: Vehicle Localization¶
Max-Product Belief Propagation on chain-structured Markov Random Fields for Vehicle Localization
Let's consider an autonomous vehicle driving on a highway and tracking a vehicle in front in order to initiate an overtaking maneuver. Let \(x_t\in\{1,2,3\}\) denote the lane the vehicle in front is driving on at time \(t\in\{1,\dots,10\}\). Unfortunately, the sensor readings are noisy as depicted below.
Selecting the most likely lane at each time \(t\) independently (green) leads to wrong estimates for \(t\in\{3,7,10\}\). To solve this problem and recover the correct situation depicted below,
we can integrate prior knowledge and infer the most likely situation using max-product belief propagation. A sensible prior would favor staying on the same lane over changing one lane at a time, and that over changing two lanes at a time. This prior can be integrated via a pairwise, chain-structured Markov Random Field (also called a Hidden Markov Model, HMM) whose pairwise factors between adjacent frames modulate the transition likelihoods.
* Coding (refer to HW)
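A minimal sketch of this idea in plain Python. The observation likelihoods and the transition prior below are invented for illustration (they are not the lecture's numbers): per-frame argmax picks the wrong lane at the noisy frame \(t=3\), while max-product with back-pointers (Viterbi) recovers a consistent track:

```python
import math

lanes = [0, 1, 2]                 # lane indices for lanes 1-3
stay, one, two = 0.8, 0.15, 0.05  # prior: stay > one-lane change > two-lane change
trans = [[stay if i == j else one if abs(i - j) == 1 else two
          for j in lanes] for i in lanes]

# Hypothetical per-frame observation likelihoods over the 3 lanes; the
# sensor is noisy at t=3, where lane 2 looks most likely on its own.
obs = [[0.7, 0.2, 0.1]] * 2 + [[0.3, 0.5, 0.2]] + [[0.7, 0.2, 0.1]] * 7

# frame-independent estimate: argmax of each observation alone
independent = [max(lanes, key=lambda s: o[s]) for o in obs]

# max-product / Viterbi in the log domain, with back-pointers
delta = [math.log(obs[0][s]) for s in lanes]  # uniform initial lane prior
back = []
for o in obs[1:]:
    prev, delta, ptr = delta, [], []
    for s in lanes:
        r = max(lanes, key=lambda q: prev[q] + math.log(trans[q][s]))
        ptr.append(r)
        delta.append(prev[r] + math.log(trans[r][s]) + math.log(o[s]))
    back.append(ptr)
path = [max(lanes, key=lambda s: delta[s])]
for ptr in reversed(back):
    path.append(ptr[path[-1]])
path.reverse()

print(independent)  # the noisy frame t=3 is misclassified as lane 2
print(path)         # the smoothed MAP track stays on lane 1
```

On a chain, max-product with back-pointers is exact, so the recovered track is the true MAP assignment under this model.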
Example 2: Image Denoising¶
You are given a noisy binary image (\(10 \times 10\) pixels) which you want to denoise.
Make use of the Ising model for that purpose, where neighboring pixels are encouraged to take the same value: \(p(x_1,\dots,x_{100}) \propto \exp \left\{\sum_{i=1}^{100} \psi_i(x_i) + \sum_{i\sim j} \psi_{ij} (x_i,x_j) \right\}\)
Here, \(i\) is the pixel index and \(i\sim j\) are neighboring pixels on a 4-connected grid. The unary term \(\psi_i(x_i) = [x_i = o_i]\) models the observation at pixel \(i\), and the pairwise term is the Ising prior \(\psi_{ij}(x_i,x_j) = \alpha \cdot [x_i = x_j]\), where \(\alpha\) controls the strength of the interaction/smoothing.
Because this exercise has a large number of variables, we work with log factors to avoid numerical underflow.
Inputs:

* `num_vars`, `num_states`, `factors`, `msg_fv`, `msg_vf`, `ne_var`

Outputs:

* `max_marginals`: `num_vars` x `num_states` array of estimated max-marginals
* `map_est`: array comprising the estimated MAP state of each variable
Algorithm Pseudocode:

* For `N=30` iterations do:
    * Update all unary factor-to-variable messages: \(\lambda_{f\rightarrow x}(x) = f(x)\)
    * Update all pairwise factor-to-variable messages: \(\lambda_{f\rightarrow x}(x) = \max_y \left[f(x,y)+\lambda_{y\rightarrow f}(y)\right]\)
    * Update all variable-to-factor messages: \(\lambda_{x\rightarrow f}(x) = \sum_{g\in ne(x)\setminus \{f\}}\lambda_{g\rightarrow x}(x)\)
* Calculate max-marginals: \(\gamma_x(x) = \sum_{g\in ne(x)}\lambda_{g\rightarrow x}(x)\)
* Calculate the MAP solution: \(x^* = \underset{x}{\mathrm{argmax}} ~ \gamma_x(x)\)
* Code: refer to HW
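The pseudocode above can be turned into a small generic implementation. This is a hedged sketch in plain Python, not the homework's interface: the factor and message layout (dicts of log-potential lists, keyed by variable or variable pair) is invented here, but the updates follow the message equations above. The demo at the bottom runs it on a tiny 1x3 "image" with one flipped pixel:

```python
def max_product(num_vars, num_states, unary, pairwise, iters=30):
    """Max-product belief propagation with log factors.

    unary:    {var: [log-potential per state]}
    pairwise: {(i, j): [num_states][num_states] log-potential table}
    Returns (max_marginals, map_est) as in the pseudocode above.
    """
    factors = [("u", v) for v in unary] + [("p", e) for e in pairwise]
    scope = {f: [f[1]] if f[0] == "u" else list(f[1]) for f in factors}
    ne_var = {v: [f for f in factors if v in scope[f]] for v in range(num_vars)}
    msg_fv = {(f, v): [0.0] * num_states for f in factors for v in scope[f]}
    msg_vf = {(v, f): [0.0] * num_states for f in factors for v in scope[f]}

    for _ in range(iters):
        for f in factors:    # factor-to-variable updates
            if f[0] == "u":  # unary: lambda_{f->x}(x) = f(x)
                msg_fv[(f, f[1])] = list(unary[f[1]])
            else:            # pairwise: lambda_{f->x}(x) = max_y [f(x,y) + lambda_{y->f}(y)]
                i, j = f[1]
                P = pairwise[f[1]]
                msg_fv[(f, i)] = [max(P[xi][xj] + msg_vf[(j, f)][xj]
                                      for xj in range(num_states))
                                  for xi in range(num_states)]
                msg_fv[(f, j)] = [max(P[xi][xj] + msg_vf[(i, f)][xi]
                                      for xi in range(num_states))
                                  for xj in range(num_states)]
        for v in range(num_vars):  # variable-to-factor updates:
            for f in ne_var[v]:    # sum of incoming messages except from f
                msg_vf[(v, f)] = [sum(msg_fv[(g, v)][x] for g in ne_var[v] if g != f)
                                  for x in range(num_states)]

    # max-marginals gamma_x(x) and MAP estimate
    max_marginals = [[sum(msg_fv[(g, v)][x] for g in ne_var[v])
                      for x in range(num_states)] for v in range(num_vars)]
    map_est = [mm.index(max(mm)) for mm in max_marginals]
    return max_marginals, map_est


# tiny 1x3 "image" with a flipped middle pixel; alpha = 1 Ising smoothing
o = [0, 1, 0]
unary = {i: [1.0 if s == o[i] else 0.0 for s in range(2)] for i in range(3)}
ising = [[1.0, 0.0], [0.0, 1.0]]
pairwise = {(0, 1): ising, (1, 2): ising}
_, map_est = max_product(3, 2, unary, pairwise)
print(map_est)  # the noisy middle pixel is smoothed away
```

On this chain the algorithm is exact; on the loopy 10x10 grid of the exercise the same updates give an approximation.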
Applications of Graphical Models¶
Stereo Reconstruction¶
* Depth varies slowly, except at object discontinuities, which are sparse
* The data term (matching cost) \(f_{data}(d_i)\) can be computed directly, e.g., with a Siamese network (as in ex02)
* Add pairwise connections between neighboring pixels
* Minimize the resulting energy
* Use a truncated penalty so that large disparity jumps are not penalized excessively
* \(\lambda\) controls the trade-off between the data term and the smoothness term
Nonlocal Priors¶
* Even though we have introduced the smoothness regularizer, the matching assumptions can still be violated very strongly: such very local pairwise terms cannot deal with some cases (e.g., reflections)
* Remedy: add object semantics & 3D consistency
Summary¶

* Block matching suffers from ambiguities
* Choosing the window size is problematic (trade-off)
* Incorporating smoothness constraints can resolve some of the ambiguities and allows for choosing small windows (no bleeding artifacts)
* Can be formulated as MAP inference in a discrete MRF
* The MAP solution can be obtained using belief propagation, graph cuts, etc.
* Integrating recognition cues can further regularize the problem
MultiView Reconstruction¶
Representation

* Voxel
    * Voxel occupancy: whether a voxel is occupied by a solid entity. Occupancy is crucial in tasks such as 3D reconstruction, object detection, and SLAM (Simultaneous Localization and Mapping). It is commonly represented as a binary variable: a voxel is occupied when its value is 1 and empty otherwise.
    * Voxel appearance: the visual properties of a voxel, i.e., how it appears in images or voxel grids (color, texture, brightness, etc.).
Image Formation Process

* Actually quite simple: only the first occupied voxel along a ray appears in the image
Probabilistic Model

* Joint distribution: \(p(\mathbf{O},\mathbf{A})=\frac{1}{Z}\prod_{v\in\mathbf{V}}\varphi_v(o_v)\prod_{r\in\mathbf{R}}\psi_r(\mathbf{o}_r, \mathbf{a}_r)\)
* Unary potentials: \(\varphi_v(o_v) = \gamma^{o_v} (1-\gamma)^{1-o_v}\); most voxels are empty ⇒ \(\gamma < 0.5\)
* Ray potentials: the pixel observation \(I_r\) is known; if the appearance \(a_i^r\) of the first occupied voxel along the ray is similar to the corresponding pixel, \(\psi_r(\mathbf{o}_r,\mathbf{a}_r)\) is large
Depth Distribution for a Single Ray

* Let \(o_1=0,\ \dots,\ o_{k-1}=0,\ o_k=1\)
* The messages from and to the unary factors, \(\mu_{\varphi_i\to o_i}\), \(\mu_{o_i\to \varphi_i}\) and \(\mu_{o_i\to \psi_i}\), can be easily computed
* Occupancy messages (derivation: see paper):

\(\mu_{\psi\to o_1}(o_1=1)=\int_{a_1}\nu(a_1)\mu(a_1)\,da_1\)

\(\mu_{\psi\to o_1}(o_1=0)=\sum_{j=2}^N\mu(o_j=1)\prod_{k=2}^{j-1}\mu(o_k=0)\,\rho_j\)

\(\rho_j=\int_{a_j}\nu(a_j)\mu(a_j)\,da_j\)

* More general case: see paper
* Appearance messages: see paper
Bayes Optimal Depth Estimation

* Consider a single ray \(r\) in space
* Let \(d_k\) be the distance from the camera to voxel \(k\) along ray \(r\)
* Depth \(D\in\{d_1,\dots,d_N\}\): the distance to the closest occupied voxel
* The optimal depth estimate requires the marginal depth distribution \(p(D)\) along each ray:

\(p(D=d_i)\propto\mu(o_i=1)\prod_{j=1}^{i-1}\mu(o_j=0)\,\rho_i\) [derivation: see PPT & paper]
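The depth distribution is cheap to evaluate once per-voxel occupancy messages are available. A minimal sketch in plain Python; the occupancy beliefs \(\mu(o_i=1)\) and the appearance terms \(\rho_i\) below are made-up numbers, not actual message-passing output:

```python
# Illustrative per-voxel occupancy beliefs mu(o_i = 1) along one ray,
# ordered from the camera outwards, and appearance terms rho_i (set to 1).
mu_occ = [0.1, 0.2, 0.7, 0.9]
rho = [1.0, 1.0, 1.0, 1.0]

# p(D = d_i) ∝ mu(o_i = 1) * prod_{j < i} mu(o_j = 0) * rho_i:
# voxel i is the closest occupied voxel iff all voxels before it are empty.
p = []
for i, (m, r) in enumerate(zip(mu_occ, rho)):
    q = m * r
    for j in range(i):
        q *= 1.0 - mu_occ[j]
    p.append(q)
Z = sum(p)
p = [x / Z for x in p]

print(p)  # the mode sits at the third voxel
```

Note how the survival product \(\prod_{j<i}\mu(o_j=0)\) suppresses far voxels behind likely-occupied ones.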
Optical Flow¶
* Motion Field vs. Optical Flow

Motion field:

* 2D motion field representing the projection of the 3D motion of scene points onto the image plane
* Can be the result of camera motion or object motion (or both)

Optical flow:

* 2D velocity field describing the apparent motion in the image (i.e., the displacement of pixels that look "similar")
* Optical flow \(\ne\) motion field! Why?
Determining Optical Flow¶
* A single observation is not enough to determine the flow
* \(\lambda\) is positive (weight of the regularizer)

Solution: linearize the brightness constancy assumption via a first-order Taylor expansion:

\(f(x,y)\approx f(a,b)+ \frac{\partial f(a,b)}{\partial x} (x-a)+ \frac{\partial f(a,b)}{\partial y} (y-b)\)

Thus, we have \(I(x + u(x, y), y + v(x, y), t + 1) \approx I(x,y,t)+I_x(x,y,t)\,u(x,y)+I_y(x,y,t)\,v(x,y)+I_t(x,y,t)\)

\(E(u,v) \approx \iint \left[\left(I_x(x,y,t)\,u(x,y)+I_y(x,y,t)\,v(x,y)+I_t(x,y,t)\right)^2+\lambda\left(\|\nabla u(x,y)\|^2+\|\nabla v(x,y)\|^2\right)\right]dx\,dy\)

which leads to a corresponding discretized objective.
* The Horn-Schunck (HS) results are quite plausible already
* However, the flow is very smooth, i.e., to overcome ambiguities we need to set \(\lambda\) to a high value, which over-smooths flow discontinuities. Why?
    * We use a quadratic penalty for penalizing changes in the flow
    * This does not allow for discontinuities in the flow field
    * In other words, it penalizes large changes too much and causes over-smoothing
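The linearized brightness constancy term can be sanity-checked numerically. The sketch below swaps the global Horn-Schunck objective for a single local least-squares solve over one patch (a Lucas-Kanade-style estimate, which isolates the data term); the synthetic test image and the true flow are invented for this demo:

```python
import math

def I0(x, y):  # frame at time t: a smooth synthetic pattern
    return math.sin(0.3 * x) + math.cos(0.2 * y)

u_true, v_true = 0.2, -0.1
def I1(x, y):  # frame at t+1: the pattern displaced by the true flow
    return I0(x - u_true, y - v_true)

# Accumulate the normal equations of sum (Ix*u + Iy*v + It)^2 over a patch
A11 = A12 = A22 = b1 = b2 = 0.0
for x in range(2, 12):
    for y in range(2, 12):
        Ix = (I0(x + 1, y) - I0(x - 1, y)) / 2.0  # central differences
        Iy = (I0(x, y + 1) - I0(x, y - 1)) / 2.0
        It = I1(x, y) - I0(x, y)                  # temporal derivative
        A11 += Ix * Ix; A12 += Ix * Iy; A22 += Iy * Iy
        b1 -= Ix * It;  b2 -= Iy * It

# solve the 2x2 system for the constant flow (u, v) on this patch
det = A11 * A22 - A12 * A12
u_est = (A22 * b1 - A12 * b2) / det
v_est = (A11 * b2 - A12 * b1) / det
print(u_est, v_est)  # close to the true flow (0.2, -0.1)
```

The small residual error comes from the first-order Taylor approximation, which is exactly why the linearization only holds for small displacements.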
Robust Estimation of Optical Flow¶
* Gibbs energy: \(E(u,v) \approx \iint \left[\left(I_x(x,y,t)\,u(x,y)+I_y(x,y,t)\,v(x,y)+I_t(x,y,t)\right)^2+\lambda\left(\|\nabla u(x,y)\|^2+\|\nabla v(x,y)\|^2\right)\right]dx\,dy\)
* Both assumptions (brightness constancy and smoothness) are invalid in places, e.g., at discontinuities along object boundaries. Why?
* Gaussian distributions correspond to squared loss functions: modelling the data and smoothness terms with Gaussians amounts to quadratic penalties
* Squared loss functions are not robust to outliers! Outliers dominate the error being minimized and can pull the estimate far off
* Outliers occur at object boundaries (violation of the smoothness term/regularizer), where the flow field changes sharply
* Outliers also occur at specular highlights (violation of photoconsistency/the data term), where image brightness differs strongly from the surroundings
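The outlier sensitivity is easy to see in one dimension. A toy example with invented data and thresholds: when fitting a single value to observations containing one outlier, the squared loss drags the estimate toward the outlier, while a truncated squared loss (the robust penalty idea used here) effectively ignores it:

```python
data = [1.0, 1.1, 0.9, 1.0, 8.0]  # one gross outlier at 8.0

def squared(e):
    return e * e

def truncated(e, tau=1.0):
    # truncated quadratic: any outlier contributes at most the constant tau
    return min(e * e, tau)

# grid search over candidate estimates m in [0, 10]
candidates = [i / 100.0 for i in range(1001)]
best_sq = min(candidates, key=lambda m: sum(squared(m - x) for x in data))
best_tr = min(candidates, key=lambda m: sum(truncated(m - x) for x in data))

print(best_sq)  # pulled toward the outlier (the mean is 2.4)
print(best_tr)  # stays near the inlier cluster around 1.0
```

The same mechanism applies per-pixel in the flow energy: truncating the data and smoothness penalties stops boundary pixels and specular highlights from dominating the solution.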
Learning in Graphical Models¶
Conditional Random Fields¶
\(p(x_1,\dots,x_{100})= \frac{1}{Z}\exp\left\{\sum_i\psi_i(x_i)+\lambda\sum_{i\sim j}\psi_{ij}(x_i,x_j)\right\}\)

* How do we estimate the parameters, e.g., \(\lambda\)?
Parameter Estimation¶
* Refer to the PPT
Deep Structured Models¶
* Refer to the PPT
Created: November 11, 2023, 00:20:53