torch_geometric.nn.attention.PolynormerAttention

class PolynormerAttention(channels: int, heads: int, head_channels: int = 64, beta: float = 0.9, qkv_bias: bool = False, qk_shared: bool = True, dropout: float = 0.0)

Bases: Module

The polynomial-expressive attention mechanism from the “Polynormer: Polynomial-Expressive Graph Transformer in Linear Time” paper.

Parameters:

channels (int) – Size of each input sample.
heads (int) – Number of parallel attention heads.
head_channels (int, optional) – Size of each attention head. (default: 64)
beta (float, optional) – Initial value of the Polynormer \(\beta\) coefficient. (default: 0.9)
qkv_bias (bool, optional) – If set to True, adds a learnable bias to the query, key, and value projections of the self-attention. (default: False)
qk_shared (bool, optional) – If set to True, the query and key projections share the same weights. (default: True)
dropout (float, optional) – Dropout probability of the final attention output. (default: 0.0)
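To make the role of each constructor argument concrete, here is a minimal plain-PyTorch sketch of the kind of linear-time global attention this class is built around. It is an illustration under stated assumptions, not PyG's source: the class name `LinearGlobalAttentionSketch` is hypothetical, and the sigmoid feature map, LayerNorm, ReLU output layer, and `(h + beta)` gating are assumed from the Polynormer design rather than documented on this page.

```python
from typing import Optional

import torch
import torch.nn.functional as F
from torch import Tensor, nn


class LinearGlobalAttentionSketch(nn.Module):
    """Hypothetical sketch of sigmoid-kernel linear attention (not PyG's code)."""

    def __init__(self, channels: int, heads: int, head_channels: int = 64,
                 beta: float = 0.9, qkv_bias: bool = False,
                 qk_shared: bool = True, dropout: float = 0.0) -> None:
        super().__init__()
        self.heads, self.head_channels = heads, head_channels
        self.beta = beta  # assumed: additive offset in the gating branch
        inner = heads * head_channels
        self.h_lin = nn.Linear(channels, inner)  # assumed gating branch
        self.k = nn.Linear(channels, inner, bias=qkv_bias)
        # With qk_shared=True the key projection is reused as the query.
        self.q = None if qk_shared else nn.Linear(channels, inner, bias=qkv_bias)
        self.v = nn.Linear(channels, inner, bias=qkv_bias)
        self.norm = nn.LayerNorm(inner)
        self.lin_out = nn.Linear(inner, inner)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: Tensor, mask: Optional[Tensor] = None) -> Tensor:
        B, N, _ = x.shape
        H, D = self.heads, self.head_channels
        # Non-negative (sigmoid) feature map for keys and queries.
        k = torch.sigmoid(self.k(x)).view(B, N, H, D)
        q = k if self.q is None else torch.sigmoid(self.q(x)).view(B, N, H, D)
        v = self.v(x).view(B, N, H, D)
        if mask is not None:
            # Padded nodes contribute nothing to the global key/value sums.
            m = mask.view(B, N, 1, 1).to(x.dtype)
            k, v = k * m, v * m
        # Kernel trick: aggregate keys and values once, O(N) instead of O(N^2).
        kv = torch.einsum('bnhd,bnhe->bhde', k, v)
        num = torch.einsum('bnhd,bhde->bnhe', q, kv)
        den = torch.einsum('bnhd,bhd->bnh', q, k.sum(dim=1)).unsqueeze(-1)
        out = (num / (den + 1e-6)).reshape(B, N, H * D)
        # Gate the normalized attention output with (h + beta).
        out = self.norm(out) * (self.h_lin(x) + self.beta)
        return self.dropout(F.relu(self.lin_out(out)))
```

In this sketch the output has `heads * head_channels` features per node, and because the key/value sums are computed once per graph, the cost grows linearly with the number of nodes rather than quadratically.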

forward(x: Tensor, mask: Optional[Tensor] = None) → Tensor

Forward pass.

Parameters:

x (torch.Tensor) – Node feature tensor \(\mathbf{X} \in \mathbb{R}^{B \times N \times F}\), with batch-size \(B\), (maximum) number of nodes \(N\) for each graph, and feature dimension \(F\).
mask (torch.Tensor, optional) – Mask matrix \(\mathbf{M} \in {\{ 0, 1 \}}^{B \times N}\) indicating the valid nodes for each graph. (default: None)
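The dense inputs \(\mathbf{X}\) and \(\mathbf{M}\) are typically built by padding a mini-batch of variable-sized graphs; in PyG, `torch_geometric.utils.to_dense_batch(x, batch)` returns exactly such an `(out, mask)` pair. The helper below is a hypothetical plain-PyTorch sketch of that padding step, assuming node features of shape `[num_nodes, F]` and a sorted `batch` vector assigning each node to its graph.

```python
from typing import Tuple

import torch
from torch import Tensor


def pad_node_features(x: Tensor, batch: Tensor) -> Tuple[Tensor, Tensor]:
    """Hypothetical helper: pack [num_nodes, F] into [B, N, F] plus a [B, N] mask."""
    num_graphs = int(batch.max()) + 1
    counts = torch.bincount(batch, minlength=num_graphs)
    max_nodes = int(counts.max())
    # Index of each node within its own graph (assumes batch is sorted).
    starts = torch.cumsum(counts, dim=0) - counts
    pos = torch.arange(batch.numel(), device=batch.device) - starts[batch]
    out = x.new_zeros(num_graphs, max_nodes, x.size(-1))
    mask = torch.zeros(num_graphs, max_nodes, dtype=torch.bool, device=x.device)
    out[batch, pos] = x   # scatter real nodes into their padded slots
    mask[batch, pos] = True  # mark those slots as valid
    return out, mask
```

For two graphs with three and two nodes, `out` has shape `[2, 3, F]` and the last slot of the second graph is zero-padded with `mask[1, 2] == False`.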

Return type:

Tensor

reset_parameters() → None

Resets all learnable parameters of the module.

Return type:

None
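A common PyTorch convention for `reset_parameters()` is to delegate to each learnable submodule, which lets the same instance be re-initialized between runs. The module below is a hypothetical illustration of that pattern, not the internals of `PolynormerAttention`.

```python
import torch
from torch import nn


class TinyAttentionBlock(nn.Module):
    """Hypothetical module illustrating the usual reset_parameters() pattern."""

    def __init__(self, channels: int) -> None:
        super().__init__()
        self.k = nn.Linear(channels, channels)
        self.v = nn.Linear(channels, channels)
        self.norm = nn.LayerNorm(channels)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Re-draw every learnable tensor by delegating to each submodule.
        self.k.reset_parameters()
        self.v.reset_parameters()
        self.norm.reset_parameters()
```

After calling `reset_parameters()`, the linear weights are re-sampled and the LayerNorm scale returns to ones, regardless of how training modified them.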