torch_geometric.nn.models.GPSE

class GPSE(dim_in: int = 20, dim_out: int = 51, dim_inner: int = 512, layer_type: str = 'resgatedgcnconv', layers_pre_mp: int = 1, layers_mp: int = 20, layers_post_mp: int = 2, num_node_targets: int = 51, num_graph_targets: int = 11, stage_type: str = 'skipsum', has_bn: bool = True, head_bn: bool = False, final_l2norm: bool = True, has_l2norm: bool = True, dropout: float = 0.2, has_act: bool = True, final_act: bool = True, act: str = 'relu', virtual_node: bool = True, multi_head_dim_inner: int = 32, graph_pooling: str = 'add', use_repr: bool = True, repr_type: str = 'no_post_mp', bernoulli_threshold: float = 0.5)[source]

Bases: Module

The Graph Positional and Structural Encoder (GPSE) model from the “Graph Positional and Structural Encoder” paper.

The GPSE model consists of (1) a deep GNN built from stacked message-passing layers, and (2) prediction heads that predict pre-computed positional and structural encodings (PSEs). When used on downstream datasets, the prediction heads are removed and the final fully-connected layer outputs serve as learned PSE embeddings.

GPSE also provides a static method from_pretrained() to load pre-trained GPSE models trained on a variety of molecular datasets.

from torch_geometric.datasets import ZINC
from torch_geometric.nn import GPSE, GPSENodeEncoder
from torch_geometric.transforms import AddGPSE
from torch_geometric.nn.models.gpse import precompute_GPSE

gpse_model = GPSE.from_pretrained('molpcba')

# Option 1: Precompute GPSE encodings in-place for a given dataset
dataset = ZINC(path, subset=True, split='train')
precompute_GPSE(gpse_model, dataset)

# Option 2: Use the GPSE model with AddGPSE as a pre_transform to save
# the encodings
dataset = ZINC(path, subset=True, split='train',
               pre_transform=AddGPSE(gpse_model, vn=True,
               rand_type='NormalSE'))

Both approaches append the generated encodings to the pestat_GPSE attribute of Data objects. To use the GPSE encodings for a downstream task, they typically need to be added to the x attribute of the Data objects. To do so, one can use the provided GPSENodeEncoder to map these encodings to a desired dimension before appending them to x.

Let’s say we have a graph dataset with 64 original node features, and we have generated GPSE encodings of dimension 32, i.e. data.pestat_GPSE has 32 channels per node. Additionally, we want to use a GNN with an inner dimension of 128. To do so, we can map the 32-dimensional GPSE encodings to a higher dimension of 64, and then append them to the x attribute of the Data objects to obtain a 128-dimensional node feature representation. GPSENodeEncoder handles both this mapping and the concatenation to x, the outputs of which can be used as input to a GNN:

encoder = GPSENodeEncoder(dim_emb=128, dim_pe_in=32, dim_pe_out=64,
                          expand_x=False)
gnn = GNN(...)

for batch in loader:
    x = encoder(batch.x, batch.pestat_GPSE)
    out = gnn(x, batch.edge_index)
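The dimension bookkeeping in the example above can be sketched in plain Python. This is a minimal illustration only; the helper function name is hypothetical and not part of the torch_geometric API:

```python
def gpse_concat_dim(dim_x: int, dim_pe_out: int, expand_x: bool,
                    dim_emb: int) -> int:
    """Hypothetical helper: compute the node feature width after
    GPSENodeEncoder maps the raw encodings to ``dim_pe_out`` channels
    and concatenates them to ``x``.

    When ``expand_x`` is False, the original features are kept as-is,
    so their width plus ``dim_pe_out`` must equal ``dim_emb``.
    When ``expand_x`` is True, ``x`` is first linearly expanded to
    ``dim_emb - dim_pe_out`` channels before concatenation.
    """
    if not expand_x and dim_x + dim_pe_out != dim_emb:
        raise ValueError(
            f"dim_x ({dim_x}) + dim_pe_out ({dim_pe_out}) must equal "
            f"dim_emb ({dim_emb}) when expand_x=False")
    return dim_emb

# The running example: 64 original features + 64 mapped PSE channels.
print(gpse_concat_dim(64, 64, expand_x=False, dim_emb=128))  # 128
```

This is why the example passes dim_emb=128 with expand_x=False: the 64 original features and the 64-dimensional mapped encodings together must fill the GNN's inner dimension exactly.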
Parameters:
  • dim_in (int, optional) – Input dimension. (default: 20)

  • dim_out (int, optional) – Output dimension. (default: 51)

  • dim_inner (int, optional) – Width of the encoder layers. (default: 512)

  • layer_type (str, optional) – Type of graph convolutional layer for message-passing. (default: resgatedgcnconv)

  • layers_pre_mp (int, optional) – Number of MLP layers before message-passing. (default: 1)

  • layers_mp (int, optional) – Number of layers for message-passing. (default: 20)

  • layers_post_mp (int, optional) – Number of MLP layers after message-passing. (default: 2)

  • num_node_targets (int, optional) – Number of individual PSEs used as node-level targets in pretraining GPSE. (default: 51)

  • num_graph_targets (int, optional) – Number of graph-level targets used in pretraining GPSE. (default: 11)

  • stage_type (str, optional) – The type of skip connection between message-passing layers. Possible values are: skipsum, skipconcat. Any other value disables skip connections. (default: skipsum)

  • has_bn (bool, optional) – Whether to apply batch normalization in the layer. (default: True)

  • head_bn (bool, optional) – Whether to apply batch normalization in the prediction head. (default: False)

  • final_l2norm (bool, optional) – Whether to apply L2 normalization to the outputs. (default: True)

  • has_l2norm (bool, optional) – Whether to apply L2 normalization after the layers. (default: True)

  • dropout (float, optional) – Dropout ratio at layer output. (default: 0.2)

  • has_act (bool, optional) – Whether to apply an activation after each layer. (default: True)

  • final_act (bool, optional) – Whether to apply activation after the layer stack. (default: True)

  • act (str, optional) – Activation to apply to layer output if has_act is True. (default: relu)

  • virtual_node (bool, optional) – Whether a virtual node is added to graphs in GPSE computation. (default: True)

  • multi_head_dim_inner (int, optional) – Width of MLPs for PSE target prediction heads. (default: 32)

  • graph_pooling (str, optional) – Type of graph pooling applied before post_mp. Options are add, max, mean. (default: add)

  • use_repr (bool, optional) – Whether to use the hidden representation of the final layer as GPSE encodings. (default: True)

  • repr_type (str, optional) – Type of representation to use. Options are no_post_mp, one_layer_before. (default: no_post_mp)

  • bernoulli_threshold (float, optional) – Threshold for Bernoulli sampling. (default: 0.5)
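As an aside, thresholding a predicted probability against a fixed cutoff, as bernoulli_threshold suggests, can be sketched as follows. This is a generic illustration of the thresholding pattern, not GPSE's internal implementation, and the function name is hypothetical:

```python
def threshold_predictions(probs, bernoulli_threshold=0.5):
    """Illustrative sketch: binarize predicted probabilities for
    Bernoulli-distributed targets by comparing each probability
    against a fixed threshold."""
    return [1 if p >= bernoulli_threshold else 0 for p in probs]

print(threshold_predictions([0.2, 0.7, 0.5]))  # [0, 1, 1]
```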

forward(batch)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_parameters()[source]
classmethod from_pretrained(name: str, root: str = 'GPSE_pretrained')[source]

Returns a GPSE model pre-trained on the specified dataset.

Parameters:
  • name (str) – The name of the dataset ("molpcba", "zinc", "pcqm4mv2", "geom", "chembl").

  • root (str, optional) – The root directory to save the pre-trained model. (default: "GPSE_pretrained")