Geometric deep learning is the field of research that tries to apply the success of deep learning to non-Euclidean data.
Deep learning has proven successful across numerous modalities.
Timeline of deep learning architectures until 2017 - (Jure Leskovec, Stanford University)
Training dataset sizes over time. Left: language; right: vision.
Multi-Layer Perceptron (MLP): a deep neural network consisting of \(L\) layers.
Linear layer: \(\mathbf{g}^{(k)} = \xi \left( \mathbf{W}^{(k)} \mathbf{g}^{(k-1)} \right)\)
Activation: e.g. the rectified linear unit (ReLU), \(\xi(x) = \max \{ x, 0 \}\)
Parameters: the weights of all layers \(\mathbf{W}^{(1)}, \ldots, \mathbf{W}^{(L)}\) (including biases)
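As a minimal sketch (in PyTorch, with purely illustrative layer sizes), the recursion \(\mathbf{g}^{(k)} = \xi ( \mathbf{W}^{(k)} \mathbf{g}^{(k-1)} )\) can be written as:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Minimal MLP: g^(k) = ReLU(W^(k) g^(k-1)); biases are included in nn.Linear."""

    def __init__(self, sizes):
        # sizes = [f_in, hidden_1, ..., f_out]; the values used below are illustrative.
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(sizes[k - 1], sizes[k]) for k in range(1, len(sizes))]
        )

    def forward(self, g):
        for k, layer in enumerate(self.layers):
            g = layer(g)
            # Apply the ReLU activation xi(x) = max(x, 0) on all but the last layer.
            if k < len(self.layers) - 1:
                g = torch.relu(g)
        return g

# Example with made-up sizes: 3 input features, one hidden layer, 10 outputs.
mlp = MLP([3, 64, 10])
out = mlp(torch.randn(8, 3))  # batch of 8 samples
```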
Applying MLPs directly on the input data is usually too inefficient!
An RGB image of size \(512 \times 512\) leads to input size \(f_{\text{in}} = 512^2 \times 3 \approx 10^6\) nodes.
If \(f_{\text{out}} = f_{\text{in}}\), then a single-layer MLP would have \(\approx 10^{12}\) trainable parameters!
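This back-of-the-envelope count can be checked directly (a sketch; the exact number is about \(6 \times 10^{11}\), i.e. on the order of \(10^{12}\)):

```python
# Parameter count for a single fully connected layer applied to a
# flattened 512 x 512 RGB image, with f_out = f_in.
f_in = 512 * 512 * 3          # ~7.9e5 input nodes
params = f_in * f_in + f_in   # weight matrix + biases
print(f"{f_in:.1e} inputs, {params:.1e} parameters")  # ~7.9e+05, ~6.2e+11
```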
Need to exploit structure in the data!
Geometric deep learning is the field of research that tries to apply the success of deep learning to non-Euclidean data.
Success of deep learning: have a large-scale dataset, and apply the original MLP idea while adapting the linear and non-linear operations to the underlying structure of the data.
Euclidean data: the representation space of the data is \(\mathbb{R}^n\).
Non-Euclidean data: the representation space is non-flat. This can mean that:
Different representations of 3D shapes. Image from Silvia Sellan.
Comparison of dataset sizes
Left: different transformations of a human shape. Right: different discretizations of the same human shape.
Left: linear and geodesic paths in the shape space. Right: the human shape space is non-linear.
The human shape space can be represented by the manifold of immersions of a human template, equipped with a well-designed Riemannian metric.
By performing computations directly on these manifolds, we can beat complex deep learning architectures!
Comparison of Bare-ESA interpolation against deep learning baselines.
Geometric deep learning is the field of research that tries to apply the success of deep learning to non-Euclidean data.
Success of deep learning: have a large-scale dataset, and apply the original MLP idea while adapting the linear and non-linear operations to the underlying structure of the data.
Non-Euclidean data: the geometry of the data space is important! Moreover, dataset sizes are smaller (in medical imaging, some datasets contain only … a few hundred shapes or fewer).
Advantages
Limitations
Voxelize a surface into a \(\mathbb{R}^{d \times d \times d}\) grid.
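A minimal sketch of this voxelization step (NumPy; the point sampling of the surface and the resolution \(d\) are hypothetical):

```python
import numpy as np

def voxelize(points, d=32):
    """Turn 3D points sampled on a surface into a d x d x d occupancy grid."""
    # Normalize the points into the unit cube [0, 1)^3.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    normalized = (points - mins) / (maxs - mins + 1e-9)
    # Map each point to a voxel index and mark that voxel as occupied.
    idx = np.clip((normalized * d).astype(int), 0, d - 1)
    grid = np.zeros((d, d, d), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

# Example with a random (hypothetical) point cloud of 10,000 surface samples.
grid = voxelize(np.random.rand(10000, 3), d=32)
```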
A simple network (shared MLP).
Each point is processed independently! We learn a function from 3D coordinates to a label. There is no communication between points, hence no “shape awareness”.
Reshape the input to a matrix \(X\) of size \(3 \times N\). Fully connected layers (MLP) map \(X\) to a \(C\)-dimensional vector.
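A minimal sketch of the shared-MLP idea described above (PyTorch, with hypothetical layer widths): the same small MLP is applied to each point's 3D coordinates independently, so no information is exchanged between points.

```python
import torch
import torch.nn as nn

class SharedMLP(nn.Module):
    """Applies the same MLP to every point independently (no point-to-point communication)."""

    def __init__(self, num_classes=10):
        super().__init__()
        # Hypothetical widths; the MLP maps a single 3D coordinate to class scores.
        self.net = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, points):
        # points: (N, 3) -> per-point scores: (N, num_classes).
        # nn.Linear acts on the last dimension, i.e. on each point separately.
        return self.net(points)

scores = SharedMLP()(torch.randn(2048, 3))  # 2048 sample points
```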
Many works have tried to tackle this problem, including:
Current state-of-the-art: PointTransformer v3 (2024). However, research remains active in this area.
Morphable model of faces
Simple model, Besnier et al. 2023
Comparisons against FLAME morphable model
Reconstruction of unregistered faces
Advantages
Disadvantages
Robustness to parameterizations
PointNet is not robust to extreme cases of reparameterizations (Besnier et al. 2022)
LeNet-5 architecture, 1998
Convolutions exploit the structure of the data! Can we apply them to 3D?
-> This field has been very active. We will cover only two solutions.
Two cases:
Let \(x\) be a mesh vertex, \(R^d(x)\) be the d-ring, and \(R^d_j(x)\) denote the j-th element in the d-ring. The spiral patch operator is defined as: \[ S(x) = \{x,\, R^1_1(x),\, R^1_2(x), \ldots, R^h_{|R^h|}(x)\} \] The spiral convolution of a signal \(f\) with a filter \(g\) is defined by: \[ (f \ast g)(x) = \sum_{\ell=1}^{L} g_{\ell}\, f\!\left(S_{\ell}(x)\right) \]
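A minimal sketch of such a spiral convolution (PyTorch; the spiral indices \(S(x)\), of fixed length \(L\) per vertex, are assumed to be precomputed from the mesh connectivity):

```python
import torch
import torch.nn as nn

class SpiralConv(nn.Module):
    """(f * g)(x) = sum_l g_l f(S_l(x)), realized as one linear map over the
    concatenated features of the L vertices in each spiral."""

    def __init__(self, in_channels, out_channels, spiral_length):
        super().__init__()
        self.layer = nn.Linear(in_channels * spiral_length, out_channels)

    def forward(self, f, spirals):
        # f: (V, in_channels) vertex features; spirals: (V, L) precomputed spiral indices.
        V, L = spirals.shape
        patches = f[spirals]                        # (V, L, in_channels): f(S_l(x)) per vertex
        return self.layer(patches.reshape(V, -1))   # weighted sum with the learned filter g

# Example with hypothetical sizes: 100 vertices, 16 channels, spirals of length 9.
f = torch.randn(100, 16)
spirals = torch.randint(0, 100, (100, 9))  # placeholder indices; real ones come from the mesh
out = SpiralConv(16, 32, 9)(f, spirals)
```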
The pooling is done using mesh downsampling like in CNNs. The features are extracted in a hierarchical way!
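One common way to realize this pooling (a sketch; the sparse downsampling matrix is assumed to be precomputed from a mesh simplification of the template):

```python
import torch

def mesh_pool(f, D):
    # f: (V_fine, C) vertex features; D: sparse (V_coarse, V_fine) downsampling matrix.
    # Each coarse vertex aggregates the features of the fine vertices it came from.
    return torch.sparse.mm(D, f)

# Toy example: pool 4 fine vertices down to 2 coarse vertices by averaging pairs.
indices = torch.tensor([[0, 0, 1, 1], [0, 1, 2, 3]])
values = torch.tensor([0.5, 0.5, 0.5, 0.5])
D = torch.sparse_coo_tensor(indices, values, (2, 4))
pooled = mesh_pool(torch.randn(4, 8), D)   # (2, 8)
```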
Robustness to parameterizations
SpiralNet is not robust to reparameterizations (Besnier et al. 2022)
and more …
Advantages
Disadvantages
Idea: define a parameterized convolution and apply it to local patches (Geodesic CNNs).
Problem: the patch integration needs to be defined, which is costly.
Advantages
Disadvantages
Geometric deep learning is the field of research that tries to apply the success of deep learning to non-Euclidean data.
In the specific case of non-rigid shapes, it is used successfully for:
Overview of CAT3D
Overview of TRELLIS
Overview of Molmo
Transformers are able to (partly) learn the underlying structure of the data through the attention mechanism, so they can be applied in any context.
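For reference, a minimal sketch of the scaled dot-product attention mechanism mentioned here (NumPy, single head, no learned projections):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise similarities between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # each token aggregates the others

# Example with hypothetical sizes: 5 tokens of dimension 16.
X = np.random.randn(5, 16)
out = attention(X, X, X)  # self-attention: every token can attend to every other token
```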