Coordinate-Based Networks
Implicit Neural Representations¶
What is a good output representation?¶
- Voxels: flat surfaces only in the canonical directions
- Points: no surface / connectivity information
- Meshes: typically require a class-specific template or fixed topology and can self-intersect
Occupancy Networks¶
- An occupancy network \(f_\theta(p, z)\) maps a 3D point \(p\) (conditioned on a latent shape code \(z\)) to an occupancy probability in \([0, 1]\)
- Equivalent to a neural network for binary classification, except that we are interested in the decision boundary, which implicitly represents the object's surface
- Also possible: signed distance field [Park et al., 2019]
\(L(\theta,\psi) = \sum_{j=1}^{K}\mathrm{BCE}\big(f_\theta(p_{ij},z_i),\,o_{ij}\big)+\mathrm{KL}\big[q_\psi\big(z\mid(p_{ij},o_{ij})_{j=1:K}\big)\,\big\|\,p_0(z)\big]\)
- BCE: Binary cross-entropy loss
- \(K\): Number of randomly sampled 3D points (K = 2048)
- \(q_\psi\): Encoder
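A minimal PyTorch sketch of this loss, assuming a simple MLP decoder \(f_\theta\) and a Gaussian encoder output (all names, shapes and layer sizes here are illustrative, not the original architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OccupancyDecoder(nn.Module):
    """f_theta: maps a 3D point p and a latent code z to an occupancy logit."""
    def __init__(self, z_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # logit; the decision boundary is the surface
        )

    def forward(self, p, z):
        # p: (B, K, 3) sampled points, z: (B, z_dim) latent shape code
        z = z.unsqueeze(1).expand(-1, p.shape[1], -1)
        return self.net(torch.cat([p, z], dim=-1)).squeeze(-1)  # (B, K) logits

def occupancy_loss(decoder, p, occ, z_mean, z_logstd):
    """BCE over K sampled points + KL(q_psi(z|.) || p_0(z)) with p_0 = N(0, I)."""
    z = z_mean + torch.randn_like(z_mean) * z_logstd.exp()       # reparameterised sample from q_psi
    bce = F.binary_cross_entropy_with_logits(decoder(p, z), occ, reduction='sum')
    kl = -0.5 * torch.sum(1 + 2 * z_logstd - z_mean ** 2 - (2 * z_logstd).exp())
    return (bce + kl) / p.shape[0]                               # average over the batch
```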
Representing Materials and Lighting¶
Conditional Surface Light Field

- Maps a surface point \(p\), viewing direction \(v\), and lighting condition \(l\) to an RGB colour value
- Given any light and view direction, we want to know the colour of the surface point
Representing Motion¶
- Extending Occupancy Networks to 4D is hard (curse of dimensionality)
- Represent shape at \(t = 0\) using a 3D Occupancy Network
- Represent motion by temporally and spatially continuous vector field
- Relationship between 3D trajectories and velocity \(v\) given by a (differentiable) ODE: \(\frac{\partial s(t)}{\partial t} = v(s(t),t)\) (see the integration sketch below)
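A minimal sketch of integrating such a velocity field to obtain trajectories, using forward Euler for illustration (a real implementation would use a proper ODE solver, e.g. `torchdiffeq`; `v_net` is an assumed network):

```python
import torch

def integrate_trajectory(v_net, s0, t_steps):
    """Forward-Euler integration of ds/dt = v(s, t).

    v_net:   network mapping (points (N, 3), time scalar) -> velocities (N, 3)
    s0:      (N, 3) points on the shape at t = 0
    t_steps: 1D tensor of increasing time values starting at 0
    Returns the trajectory of all points, shape (len(t_steps), N, 3).
    """
    traj = [s0]
    s = s0
    for t0, t1 in zip(t_steps[:-1], t_steps[1:]):
        s = s + (t1 - t0) * v_net(s, t0)   # one Euler step along the velocity field
        traj.append(s)
    return torch.stack(traj, dim=0)
```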
Representing Scenes¶
Problems with occupancy networks:
- Global latent code ⇒ no local information, overly smooth geometry
- Fully connected architecture ⇒ does not exploit translation equivariance
Implicit models work well for simple objects but poorly on complex scenes
Convolutional Occupancy Networks¶
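The key idea is to replace the single global latent code with local features produced by a convolutional encoder and queried at each 3D point. A minimal sketch of the query step, assuming a 2D feature plane and bilinear interpolation (names are illustrative):

```python
import torch
import torch.nn.functional as F

def query_local_features(feature_plane, points_xy):
    """Bilinearly interpolate local features at continuous query locations.

    feature_plane: (B, C, H, W) features produced by a convolutional encoder
    points_xy:     (B, N, 2) query coordinates normalised to [-1, 1]
    Returns (B, N, C) per-point features.
    """
    grid = points_xy.unsqueeze(2)                               # (B, N, 1, 2)
    feats = F.grid_sample(feature_plane, grid,
                          mode='bilinear', align_corners=True)  # (B, C, N, 1)
    return feats.squeeze(-1).transpose(1, 2)                    # (B, N, C)
```

A small fully connected decoder then maps the interpolated feature (plus the point coordinates) to occupancy, which restores local detail and exploits the translation equivariance of the convolutional encoder.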
Differentiable Volumetric Rendering¶
Learning from Images¶
- Given only RGB images
Forward Pass (Rendering)¶
Secant method: a finite-difference approximation of Newton's method, used to find the first point \(\hat{p}\) along each ray where the occupancy field crosses the threshold \(\tau\) (the surface intersection).
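A minimal sketch of this root-finding step along one ray: coarse sampling to bracket the first crossing of \(\tau\), then a few secant iterations (`f` is the assumed occupancy network; sampling ranges and counts are illustrative):

```python
import torch

def find_surface_point(f, ray_o, ray_d, tau=0.5, n_coarse=64, n_secant=8,
                       t_near=0.0, t_far=2.0):
    """Locate the first point along the ray where f crosses the threshold tau.

    f:     occupancy network, maps (N, 3) points to (N,) values in [0, 1]
    ray_o: (3,) ray origin, ray_d: (3,) unit ray direction
    Returns the estimated surface point, or None if the ray misses the object.
    """
    # 1) Coarse sampling to bracket the first sign change of f - tau
    ts = torch.linspace(t_near, t_far, n_coarse)
    vals = f(ray_o + ts[:, None] * ray_d) - tau
    sign_change = (vals[:-1] < 0) & (vals[1:] >= 0)
    if not sign_change.any():
        return None
    i = int(sign_change.float().argmax())
    t0, t1, f0, f1 = ts[i], ts[i + 1], vals[i], vals[i + 1]

    # 2) Secant iterations: approximate the derivative by a finite difference
    for _ in range(n_secant):
        t_new = t0 - f0 * (t1 - t0) / (f1 - f0)
        f_new = f((ray_o + t_new * ray_d).unsqueeze(0)).squeeze() - tau
        if f_new < 0:
            t0, f0 = t_new, f_new
        else:
            t1, f1 = t_new, f_new
    return ray_o + t1 * ray_d
```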
Backward Pass (Differentiation)¶
In the backward pass, we differentiate the rendering loss with respect to the network parameters \(\theta\):

- Image observation \(I\)
- Loss \(L(\hat{I},I) = \sum_u||\hat{I}_u-I_u||\)
- Gradient of the loss function:
\(\frac{\partial L}{\partial\theta} = \sum_u \frac{\partial L}{\partial\hat{I}_u}\cdot\frac{\partial\hat{I}_u}{\partial\theta}\)
\(\frac{\partial\hat{I}_u}{\partial\theta}=\frac{\partial t_{\theta}(\hat{p})}{\partial\theta} +\frac{\partial t_{\theta}(\hat{p})}{\partial\hat{p}}\cdot\frac{\partial\hat{p}}{\partial\theta}\)
- Note: \(\hat{p}\) (the surface intersection point) also depends on \(\theta\)!
- So we have to apply the total derivative!
- Differentiating \(f_{\theta}(\hat{p})=\tau\) yields:
\(\frac{\partial\hat{p}}{\partial\theta}=-w(\frac{\partial f_{\theta}(\hat{p})}{\partial\hat{p}}·w)^{-1}\frac{\partial f_{\theta}(\hat{p})}{\partial\theta}\)
Derivation:

- Note: \(\mathbf{w}\) (the ray direction) is a vector!
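A sketch of the derivation, assuming the ray parameterisation \(\hat{p} = r_0 + \hat{d}\,\mathbf{w}\) with ray origin \(r_0\), surface depth \(\hat{d}\) and ray direction \(\mathbf{w}\), where only \(\hat{d}\) depends on \(\theta\). Applying the total derivative to the surface condition \(f_{\theta}(\hat{p})=\tau\) gives

\(\frac{\partial f_{\theta}(\hat{p})}{\partial\theta} + \frac{\partial f_{\theta}(\hat{p})}{\partial\hat{p}}\cdot\frac{\partial\hat{p}}{\partial\theta} = 0, \qquad \frac{\partial\hat{p}}{\partial\theta} = \mathbf{w}\,\frac{\partial\hat{d}}{\partial\theta}\)

Since \(\frac{\partial f_{\theta}(\hat{p})}{\partial\hat{p}}\cdot\mathbf{w}\) is a scalar, we can solve for \(\frac{\partial\hat{d}}{\partial\theta}\) by simple division and substitute back, which yields the expression above.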
Neural Radiance Fields¶
Task: Given a set of images of a scene, render images from novel viewpoints

- One neural network per scene: the network is effectively "overfit" to a single scene!
NeRF Training¶
\(\min_{\theta}\sum_i\|\mathrm{render}_i(F_{\theta})-I_i\|^2\)
Shoot rays, render each ray to a pixel, and minimize the reconstruction error via backpropagation
- NeRF’s parameters are optimized on many different views of a single scene
From the Paper :
We encourage the representation to be multiview consistent by restricting the network to predict the volume density σ as a function of only the location x, while allowing the RGB color c to be predicted as a function of both location and viewing direction.
To accomplish this, the MLP \(F_Θ\) first processes the input 3D coordinate x with 8 fully-connected layers (using ReLU activations and 256 channels per layer), and outputs σ and a 256-dimensional feature vector.
This feature vector is then concatenated with the camera ray’s viewing direction and passed to one additional fully-connected layer (using a ReLU activation and 128 channels) that output the view-dependent RGB color.
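A minimal PyTorch sketch of the MLP described in the quote (positional encodings and the paper's skip connection are omitted; layer names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeRFMLP(nn.Module):
    """Density from position only; colour from position + viewing direction."""
    def __init__(self, pos_dim=3, dir_dim=3, hidden=256):
        super().__init__()
        layers = [nn.Linear(pos_dim, hidden)]
        for _ in range(7):                              # 8 fully-connected layers in total
            layers.append(nn.Linear(hidden, hidden))
        self.pts_layers = nn.ModuleList(layers)
        self.sigma_head = nn.Linear(hidden, 1)          # volume density (position only)
        self.feature_head = nn.Linear(hidden, hidden)   # 256-d feature vector
        self.view_layer = nn.Linear(hidden + dir_dim, 128)
        self.rgb_head = nn.Linear(128, 3)

    def forward(self, x, d):
        # x: (N, 3) positions, d: (N, 3) viewing directions
        h = x
        for layer in self.pts_layers:
            h = F.relu(layer(h))
        sigma = F.relu(self.sigma_head(h)).squeeze(-1)  # non-negative density
        feat = self.feature_head(h)
        h = F.relu(self.view_layer(torch.cat([feat, d], dim=-1)))
        rgb = torch.sigmoid(self.rgb_head(h))           # view-dependent RGB in [0, 1]
        return rgb, sigma
```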
Volume Rendering with Radiance Fields¶
From the Paper :

- The volume density \(\sigma(x)\) can be interpreted as the differential probability of a ray terminating at an infinitesimal particle at location \(x\).
- The expected color \(C(r)\) of camera ray \(r(t) = o + td\) with near and far bounds \(t_n\) and \(t_f\) is:

\(C(r) = \int_{t_n}^{t_f} T(t)\,\sigma(r(t))\,c(r(t),d)\,dt, \quad \text{where} \quad T(t) = \exp\left(-\int_{t_n}^{t}\sigma(r(s))\,ds\right)\)

- \(T(t)\): the accumulated transmittance along the ray, i.e. the probability that the ray travels from \(t_n\) to \(t\) without hitting any particle
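In practice the integral is estimated by quadrature over discrete samples along the ray. A minimal sketch following the NeRF quadrature rule (variable names are illustrative):

```python
import torch

def render_ray(rgb, sigma, t_vals):
    """Estimate the expected colour C(r) by quadrature over N samples along one ray.

    rgb:    (N, 3) predicted colours c at the sampled points
    sigma:  (N,)   predicted volume densities at the sampled points
    t_vals: (N,)   increasing sample distances along the ray
    """
    deltas = t_vals[1:] - t_vals[:-1]                                # distances between samples
    deltas = torch.cat([deltas, torch.full_like(deltas[:1], 1e10)])  # last interval extends to "infinity"
    alpha = 1.0 - torch.exp(-sigma * deltas)                         # opacity contributed by each segment
    # T_i: transmittance, the probability of reaching sample i without hitting anything
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:1]), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = trans * alpha                                          # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)                       # expected colour C(r)
```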
Optimizing a Neural Radiance Field¶
Positional encoding¶
- Having the network \(F_\Theta\) directly operate on \(xyz\theta\phi\) input coordinates results in renderings that perform poorly at representing high-frequency variation in color and geometry
- (Recap: Transformers)
- Pass low-dimensional coordinates through a fixed positional encoding or random Fourier features of varying frequencies controlled by \(L\) (see the sketch below)
- These features let networks learn high-freq. functions in low-dim. domains
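A minimal sketch of such a fixed positional encoding (NeRF uses \(L = 10\) for positions and \(L = 4\) for directions):

```python
import math
import torch

def positional_encoding(p, L=10):
    """gamma(p): map each coordinate to sin/cos features at frequencies 2^0 ... 2^(L-1).

    p: (..., D) low-dimensional coordinates (e.g. D = 3 for xyz, assumed in [-1, 1])
    Returns (..., 2 * L * D) high-frequency features.
    """
    freqs = 2.0 ** torch.arange(L, dtype=torch.float32) * math.pi       # (L,)
    angles = p[..., None] * freqs                                       # (..., D, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)     # (..., D, 2L)
    return enc.flatten(start_dim=-2)                                    # (..., 2*L*D)
```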
Generative Radiance Fields¶
- Refer to Slides
Created: February 26, 2024, 15:13:14