torch_geometric.nn.conv.MeshCNNConv
- class MeshCNNConv(in_channels: int, out_channels: int, kernels: Optional[ModuleList] = None)[source]
Bases:
MessagePassing
The convolutional layer introduced by the paper “MeshCNN: A Network With An Edge”.
Recall that, given a set of categories \(C\), MeshCNN is a function that takes as its input a triangular mesh \(\mathcal{m} = (V, F) \in \mathbb{R}^{|V| \times 3} \times \{0,...,|V|-1\}^{3 \times |F|}\), and returns as its output a \(|C|\)-dimensional vector, whose \(i\) th component denotes the probability of the input mesh belonging to category \(c_i \in C\).
Let \(X^{(k)} \in \mathbb{R}^{|E| \times \text{Dim-Out}(k)}\) denote the output value of the prior (e.g. \(k\) th ) layer of our neural network. The \(i\) th row of \(X^{(k)}\) is a \(\text{Dim-Out}(k)\)-dimensional vector that represents the features computed by the \(k\) th layer for edge \(e_i\) of the input mesh \(\mathcal{m}\). Let \(A \in \{0, ..., |E|-1\}^{2 \times 4*|E|}\) denote the edge adjacency matrix of our input mesh \(\mathcal{m}\). The \(j\) th column of \(A\) returns a pair of indices \(k,l \in \{0,...,|E|-1\}\), which means that edge \(e_k\) is adjacent to edge \(e_l\) in our input mesh \(\mathcal{m}\). The definition of edge adjacency in a triangular mesh is illustrated in Figure 1. In a triangular mesh, each edge \(e_i\) is expected to be adjacent to exactly \(4\) neighboring edges, hence the number of columns of \(A\): \(4*|E|\). We write the neighborhood of edge \(e_i\) as \(\mathcal{N}(i) = (a(i), b(i), c(i), d(i))\) where
1. \(a(i)\) denotes the index of the first counter-clockwise edge of the face above \(e_i\).
2. \(b(i)\) denotes the index of the second counter-clockwise edge of the face above \(e_i\).
3. \(c(i)\) denotes the index of the first counter-clockwise edge of the face below \(e_i\).
4. \(d(i)\) denotes the index of the second counter-clockwise edge of the face below \(e_i\).
Figure 1: The neighbors of edge \(\mathbf{e_1}\) are \(\mathbf{e_2}, \mathbf{e_3}, \mathbf{e_4}\) and \(\mathbf{e_5}\), respectively. We write this as \(\mathcal{N}(1) = (a(1), b(1), c(1), d(1)) = (2, 3, 4, 5)\)
Because of this ordering constraint, MeshCNNConv requires that the columns of \(A\) be ordered in the following way:
\[\begin{split}&A[:,0] = (0, \text{The index of the "a" edge for edge } 0) \\ &A[:,1] = (0, \text{The index of the "b" edge for edge } 0) \\ &A[:,2] = (0, \text{The index of the "c" edge for edge } 0) \\ &A[:,3] = (0, \text{The index of the "d" edge for edge } 0) \\ \vdots \\ &A[:,4*|E|-4] = \bigl(|E|-1, a\bigl(|E|-1\bigr)\bigr) \\ &A[:,4*|E|-3] = \bigl(|E|-1, b\bigl(|E|-1\bigr)\bigr) \\ &A[:,4*|E|-2] = \bigl(|E|-1, c\bigl(|E|-1\bigr)\bigr) \\ &A[:,4*|E|-1] = \bigl(|E|-1, d\bigl(|E|-1\bigr)\bigr)\end{split}\]
Stated a bit more compactly, for every edge \(e_i\) in the input mesh, \(A\) should have the following entries (a short construction sketch is given after the summary below):
\[\begin{split}A[:, 4*i] &= (i, a(i)) \\ A[:, 4*i + 1] &= (i, b(i)) \\ A[:, 4*i + 2] &= (i, c(i)) \\ A[:, 4*i + 3] &= (i, d(i))\end{split}\]
To summarize so far, we have defined 3 things:
1. The activation of the prior (e.g. \(k\) th) layer, \(X^{(k)} \in \mathbb{R}^{|E| \times \text{Dim-Out}(k)}\)
2. The edge adjacency matrix \(A \in \{0,...,|E|-1\}^{2 \times 4*|E|}\), and the definition of edge adjacency.
3. The way the columns of \(A\) must be ordered.
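As a concrete illustration of the required column ordering, the following is a minimal sketch that assembles \(A\) from per-edge neighbor index lists. The build_edge_adjacency helper and the placeholder neighbor indices are purely illustrative (they are not part of this class, and the indices do not describe a geometrically valid mesh); in practice the a, b, c, d indices come from the mesh connectivity.

    import torch

    def build_edge_adjacency(a, b, c, d):
        # Assemble the (2, 4*|E|) edge adjacency tensor from per-edge
        # neighbor index lists, in the required column order.
        num_edges = len(a)
        src = torch.arange(num_edges).repeat_interleave(4)   # i, i, i, i, ...
        dst = torch.stack([torch.as_tensor(a), torch.as_tensor(b),
                           torch.as_tensor(c), torch.as_tensor(d)],
                          dim=1).reshape(-1)                  # a(i), b(i), c(i), d(i), ...
        return torch.stack([src, dst], dim=0)

    # Placeholder neighbor indices for |E| = 3 (illustration only).
    A = build_edge_adjacency(a=[1, 2, 0], b=[2, 0, 1], c=[2, 0, 1], d=[1, 2, 0])
    # A[:, 4*i : 4*i + 4] holds (i, a(i)), (i, b(i)), (i, c(i)), (i, d(i)).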
We are now finally able to define the MeshCNNConv class/layer. In the following definition, we assume MeshCNNConv is at the \(k+1\) th layer of our neural network. The MeshCNNConv layer is a function,
\[\text{MeshCNNConv}^{(k+1)}(X^{(k)}, A) = X^{(k+1)},\]
that, given the prior layer’s output \(X^{(k)} \in \mathbb{R}^{|E| \times \text{Dim-Out}(k)}\) and the edge adjacency matrix \(A\) of the input mesh (graph) \(\mathcal{m}\), returns a new edge feature tensor \(X^{(k+1)} \in \mathbb{R}^{|E| \times \text{Dim-Out}(k+1)}\), where the \(i\) th row of \(X^{(k+1)}\), denoted by \(x^{(k+1)}_i\), represents the \(\text{Dim-Out}(k+1)\)-dimensional feature vector of edge \(e_i\), and is defined as follows:
\[\begin{split}x^{(k+1)}_i &= W^{(k+1)}_0 x^{(k)}_i \\ &+ W^{(k+1)}_1 \bigl| x^{(k)}_{a(i)} - x^{(k)}_{c(i)} \bigr| \\ &+ W^{(k+1)}_2 \bigl( x^{(k)}_{a(i)} + x^{(k)}_{c(i)} \bigr) \\ &+ W^{(k+1)}_3 \bigl| x^{(k)}_{b(i)} - x^{(k)}_{d(i)} \bigr| \\ &+ W^{(k+1)}_4 \bigl( x^{(k)}_{b(i)} + x^{(k)}_{d(i)} \bigr).\end{split}\]
\(W_0^{(k+1)},W_1^{(k+1)},W_2^{(k+1)},W_3^{(k+1)}, W_4^{(k+1)} \in \mathbb{R}^{\text{Dim-Out}(k+1) \times \text{Dim-Out}(k)}\) are trainable linear functions (i.e. “the weights” of this layer). \(x^{(k)}_i\) is the \(\text{Dim-Out}(k)\)-dimensional feature vector of edge \(e_i\) computed by the prior (e.g. \(k\) th) layer. \(x^{(k)}_{a(i)}, x^{(k)}_{b(i)}, x^{(k)}_{c(i)}\), and \(x^{(k)}_{d(i)}\) are the \(\text{Dim-Out}(k)\)-dimensional feature vectors, computed in the \(k\) th layer, that are associated with the \(4\) neighboring edges of \(e_i\).
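For intuition, the update rule for a single edge can be written out directly as tensor operations. The following sketch uses random stand-ins for the five trainable weights and for the neighboring feature vectors; it only illustrates the arithmetic above, not the actual implementation of this layer.

    import torch

    dim_in, dim_out = 8, 16
    # Random stand-ins for the trainable weights W_0, ..., W_4.
    W = [torch.randn(dim_out, dim_in) for _ in range(5)]

    # Prior-layer features for edge e_i and its neighbors a(i), b(i), c(i), d(i).
    x_i = torch.randn(dim_in)
    x_a, x_b, x_c, x_d = (torch.randn(dim_in) for _ in range(4))

    x_i_next = (W[0] @ x_i
                + W[1] @ (x_a - x_c).abs()
                + W[2] @ (x_a + x_c)
                + W[3] @ (x_b - x_d).abs()
                + W[4] @ (x_b + x_d))  # shape: (dim_out,)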
- Parameters:
in_channels (int) – Corresponds to \(\text{Dim-Out}(k)\) in the above overview. This represents the output dimension of the prior layer. For the given input mesh \(\mathcal{m} = (V, F)\), the prior layer is expected to output a \(X \in \mathbb{R}^{|E| \times \textit{in_channels}}\) feature matrix. Assuming the instance of this class is situated at layer \(k+1\), we write that \(X^{(k)} \in \mathbb{R}^{|E| \times \textit{in_channels}}\).
out_channels (int) – Corresponds to \(\text{Dim-Out}(k+1)\) in the above overview. This represents the output dimension of this layer. Assuming the instance of this class is situated at layer \(k+1\), we write that \(X^{(k+1)} \in \mathbb{R}^{|E| \times \textit{out_channels}}\).
kernels (torch.nn.ModuleList, optional) – A list of length 5, where each element is a torch.nn.Module (i.e. a neural network) that MUST take as input a vector of dimension in_channels and return a vector of dimension out_channels. In particular, kernels[0] is \(W^{(k+1)}_0\) in the above overview (see MeshCNNConv), kernels[1] is \(W^{(k+1)}_1\), kernels[2] is \(W^{(k+1)}_2\), kernels[3] is \(W^{(k+1)}_3\), and kernels[4] is \(W^{(k+1)}_4\). This argument is optional; if it is omitted, each of the 5 elements of kernels will be a torch.nn.Linear layer configured to take in_channels-dimensional vectors as input and return out_channels-dimensional vectors.
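For example, the layer can be constructed with the default linear kernels, or with a custom torch.nn.ModuleList. The two-layer MLP kernels below are just one possible choice, shown as a sketch:

    import torch
    from torch_geometric.nn.conv import MeshCNNConv

    # Default kernels: each of the 5 kernels is a Linear(in_channels, out_channels).
    conv = MeshCNNConv(in_channels=8, out_channels=16)

    # Custom kernels: any 5 modules that map in_channels -> out_channels vectors.
    kernels = torch.nn.ModuleList([
        torch.nn.Sequential(
            torch.nn.Linear(8, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 16),
        )
        for _ in range(5)
    ])
    conv = MeshCNNConv(in_channels=8, out_channels=16, kernels=kernels)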
- Discussion:
The key difference that separates MeshCNNConv from a traditional message passing graph neural network is that MeshCNNConv requires the set of neighbors for a node, \(\mathcal{N}(u) = (v_1, v_2, ...)\), to be an ordered set (i.e. a tuple). In fact, MeshCNNConv goes further, requiring that \(\mathcal{N}(u)\) always return a set of size \(4\). This differs from most message passing graph neural networks, which assume that \(\mathcal{N}(u) = \{v_1, v_2, ...\}\) returns an unordered set of arbitrary size. This lends MeshCNNConv more expressive power, at the cost of no longer being permutation invariant to \(\mathbb{S}_4\). Put more plainly, in traditional message passing GNNs, the network is unable to distinguish one neighboring node from another. In contrast, in MeshCNNConv, each of the 4 neighbors has a “role”, either the “a”, “b”, “c”, or “d” neighbor. We encode this fact by requiring that \(\mathcal{N}\) return a 4-tuple, where the first component is the “a” neighbor, and so on.
To summarize this comparison, it may help to re-define MeshCNNConv in terms of \(\text{UPDATE}\) and \(\text{AGGREGATE}\) functions, which is a general way to define a traditional GNN layer. If we let \(x_i^{(k+1)}\) denote the output of a GNN layer for node \(i\) at layer \(k+1\), and let \(\mathcal{N}(i)\) denote the set of nodes adjacent to node \(i\), then we can describe the \(k+1\) th layer of a traditional GNN as
\[x_i^{(k+1)} = \text{UPDATE}^{(k+1)}\bigl(x^{(k)}_i, \text{AGGREGATE}^{(k+1)}\bigl(\mathcal{N}(i)\bigr)\bigr).\]
Here, \(\text{UPDATE}^{(k+1)}\) is a function of \(2\) \(\text{Dim-Out}(k)\)-dimensional vectors, and returns a \(\text{Dim-Out}(k+1)\)-dimensional vector. The \(\text{AGGREGATE}^{(k+1)}\) function is a function of the unordered set of nodes that are neighbors of node \(i\), as defined by \(\mathcal{N}(i)\). Usually the size of this set varies across different nodes \(i\), and one of the most basic examples of such a function is the “sum aggregation”, defined as \(\text{AGGREGATE}^{(k+1)}(\mathcal{N}(i)) = \sum_{j \in \mathcal{N}(i)} x^{(k)}_j\). See SumAggregation for more.
In contrast, while MeshCNNConv’s \(\text{UPDATE}\) function follows that of a traditional GNN, its \(\text{AGGREGATE}\) is a function of a tuple (i.e. an ordered set) of neighbors rather than an unordered set of neighbors. In particular, while the \(\text{UPDATE}\) function of MeshCNNConv for \(e_i\) is
\[x_i^{(k+1)} = \text{UPDATE}^{(k+1)}(x_i^{(k)}, s_i^{(k+1)}) = W_0^{(k+1)}x_i^{(k)} + s_i^{(k+1)},\]
MeshCNNConv’s \(\text{AGGREGATE}\) function is
\[\begin{split}s_i^{(k+1)} = \text{AGGREGATE}^{(k+1)}(A, B, C, D) &= W_1^{(k+1)}\bigl|A - C \bigr| \\ &+ W_2^{(k+1)}\bigl(A + C \bigr) \\ &+ W_3^{(k+1)}\bigl|B - D \bigr| \\ &+ W_4^{(k+1)}\bigl(B + D \bigr),\end{split}\]
where \(A=x_{a(i)}^{(k)}, B=x_{b(i)}^{(k)}, C=x_{c(i)}^{(k)},\) and \(D=x_{d(i)}^{(k)}\).
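Because the columns of \(A\) come in fixed groups of four, the “a”, “b”, “c”, and “d” neighbor features can be recovered by simple reshaping, which is what makes this role-aware aggregation possible. The sketch below uses random placeholder tensors and is not the layer’s actual message passing implementation; it only shows the idea:

    import torch

    num_edges, dim_in = 6, 8
    x = torch.randn(num_edges, dim_in)  # X^(k): one feature row per edge
    # Placeholder adjacency with the required (2, 4*|E|) shape and column order.
    edge_index = torch.randint(0, num_edges, (2, 4 * num_edges))

    # Column 4*i + j targets the j-th role (a, b, c, d) of edge i, so the target
    # row of edge_index can be viewed as a (|E|, 4) table of neighbor indices.
    nbr = edge_index[1].view(num_edges, 4)
    x_a, x_b, x_c, x_d = (x[nbr[:, j]] for j in range(4))  # each: (|E|, dim_in)

    # Symmetric combinations that feed the AGGREGATE step above.
    u1, u2 = (x_a - x_c).abs(), x_a + x_c
    u3, u4 = (x_b - x_d).abs(), x_b + x_d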
The \(i\) th row of \(V \in \mathbb{R}^{|V| \times 3}\) holds the cartesian \(xyz\) coordinates for node \(v_i\) in the mesh, and the \(j\) th column in \(F \in \{0,...,|V|-1\}^{3 \times |F|}\) holds the \(3\) indices \((k,l,m)\) that correspond to the \(3\) nodes \((v_k, v_l, v_m)\) that construct face \(j\) of the mesh.
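As a small illustration of this \((V, F)\) representation, a unit square split into two triangles could be encoded as follows (tensor names are illustrative):

    import torch

    V = torch.tensor([[0., 0., 0.],
                      [1., 0., 0.],
                      [1., 1., 0.],
                      [0., 1., 0.]])  # shape (|V|, 3): row i is the xyz position of v_i
    F = torch.tensor([[0, 0],
                      [1, 2],
                      [2, 3]])        # shape (3, |F|): column j lists the 3 nodes of face j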
- forward(x: Tensor, edge_index: Tensor)[source]
Forward pass.
- Parameters:
x (torch.Tensor) – \(X^{(k)} \in \mathbb{R}^{|E| \times \textit{in_channels}}\). The edge feature tensor returned by the prior (e.g. \(k\) th) layer. The tensor is of shape \(|E| \times \text{Dim-Out}(k)\), or equivalently, (|E|, self.in_channels).
edge_index (torch.Tensor) – \(A \in \{0,...,|E|-1\}^{2 \times 4*|E|}\). The edge adjacency tensor of the network’s input mesh \(\mathcal{m} = (V, F)\). The edge adjacency tensor MUST have the following form:
\[\begin{split}&A[:,0] = (0, \text{The index of the "a" edge for edge } 0) \\ &A[:,1] = (0, \text{The index of the "b" edge for edge } 0) \\ &A[:,2] = (0, \text{The index of the "c" edge for edge } 0) \\ &A[:,3] = (0, \text{The index of the "d" edge for edge } 0) \\ \vdots \\ &A[:,4*|E|-4] = \bigl(|E|-1, a\bigl(|E|-1\bigr)\bigr) \\ &A[:,4*|E|-3] = \bigl(|E|-1, b\bigl(|E|-1\bigr)\bigr) \\ &A[:,4*|E|-2] = \bigl(|E|-1, c\bigl(|E|-1\bigr)\bigr) \\ &A[:,4*|E|-1] = \bigl(|E|-1, d\bigl(|E|-1\bigr)\bigr)\end{split}\]
See MeshCNNConv for what “index of the ‘a’ (b, c, d) edge for edge \(i\)” means, and also for the general definition of edge adjacency in MeshCNN. These definitions are also provided in the paper itself.
- Returns:
\(X^{(k+1)} \in \mathbb{R}^{|E| \times \textit{out_channels}}\). The edge feature tensor for this (e.g. the \(k+1\) th) layer. The \(i\) th row of \(X^{(k+1)}\) is computed according to the formula
\[\begin{split}x^{(k+1)}_i &= W^{(k+1)}_0 x^{(k)}_i \\ &+ W^{(k+1)}_1 \bigl| x^{(k)}_{a(i)} - x^{(k)}_{c(i)} \bigr| \\ &+ W^{(k+1)}_2 \bigl( x^{(k)}_{a(i)} + x^{(k)}_{c(i)} \bigr) \\ &+ W^{(k+1)}_3 \bigl| x^{(k)}_{b(i)} - x^{(k)}_{d(i)} \bigr| \\ &+ W^{(k+1)}_4 \bigl( x^{(k)}_{b(i)} + x^{(k)}_{d(i)} \bigr),\end{split}\]where \(W_0^{(k+1)},W_1^{(k+1)}, W_2^{(k+1)},W_3^{(k+1)}, W_4^{(k+1)} \in \mathbb{R}^{\text{Dim-Out}(k+1) \times \text{Dim-Out}(k)}\) are the trainable linear functions (i.e. the trainable “weights”) of this layer, and \(x^{(k)}_{a(i)}, x^{(k)}_{b(i)}, x^{(k)}_{c(i)}\), \(x^{(k)}_{d(i)}\) are the \(\text{Dim-Out}(k)\)-dimensional edge feature vectors computed by the prior (\(k\) th) layer, that are associated with the \(4\) neighboring edges of \(e_i\).
- Return type: