torch_geometric.llm

LargeGraphIndexer

For a dataset that consists of multiple subgraphs that are assumed to be part of a much larger graph, collate the values into a large graph store to save resources.

RAGQueryLoader

Loader meant for making RAG queries from a remote backend.

class LargeGraphIndexer(nodes: Iterable[str], edges: Iterable[Tuple[str, str, str]], node_attr: Optional[Dict[str, List[Any]]] = None, edge_attr: Optional[Dict[str, List[Any]]] = None)[source]

For a dataset that consists of multiple subgraphs that are assumed to be part of a much larger graph, collate the values into a large graph store to save resources.

classmethod from_triplets(triplets: Iterable[Tuple[str, str, str]], pre_transform: Optional[Callable[[Tuple[str, str, str]], Tuple[str, str, str]]] = None) LargeGraphIndexer[source]

Generate a new index from a series of triplets that represent edge relations between nodes. Formatted like (source_node, edge, dest_node).

Parameters:
  • triplets (KnowledgeGraphLike) – Series of triplets representing knowledge graph relations. Example: [("cats", "eat", "dogs")]. Note: Please ensure triplets are unique.

  • pre_transform (Optional[Callable[[TripletLike], TripletLike]]) – Optional preprocessing function to apply to triplets. Defaults to None.

Returns:

Index of unique nodes and edges.

Return type:

LargeGraphIndexer
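
The indexing idea described above can be sketched in plain Python. This is only an illustration of collecting unique nodes and edges from (source_node, edge, dest_node) triplets, not the library's implementation; the names index_triplets and lowercase are hypothetical, and the sketch assumes first-seen ordering is acceptable.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Triplet = Tuple[str, str, str]


def index_triplets(
    triplets: Iterable[Triplet],
    pre_transform: Optional[Callable[[Triplet], Triplet]] = None,
) -> Tuple[List[str], List[Triplet]]:
    """Hypothetical helper: collect unique nodes and edges in first-seen order."""
    nodes = {}  # dicts keep insertion order, so they act as ordered sets
    edges = {}
    for triplet in triplets:
        if pre_transform is not None:
            triplet = pre_transform(triplet)
        src, rel, dst = triplet
        nodes.setdefault(src, None)
        nodes.setdefault(dst, None)
        edges.setdefault((src, rel, dst), None)
    return list(nodes), list(edges)


def lowercase(triplet: Triplet) -> Triplet:
    """Example pre_transform: normalize casing so "Cats" and "cats" coincide."""
    src, rel, dst = triplet
    return src.lower(), rel.lower(), dst.lower()


nodes, edges = index_triplets(
    [("Cats", "eat", "Dogs"), ("dogs", "chase", "cats")],
    pre_transform=lowercase,
)
```

Here the pre_transform step is what makes "Cats" and "cats" resolve to the same node, which is one reason a preprocessing hook is useful before indexing.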

classmethod collate(graphs: Iterable[LargeGraphIndexer]) LargeGraphIndexer[source]

Combines a series of large graph indexes into a single large graph index.

Parameters:

graphs (Iterable[LargeGraphIndexer]) – Indices to be combined.

Returns:

Single unique index for all nodes and edges in the input indices.

Return type:

LargeGraphIndexer
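
As a rough sketch of what combining indices involves, the snippet below merges several (nodes, edges) pairs while keeping each node and edge only once. The merge_indices helper and the tuple representation are assumptions made for illustration; the real class stores more state than this.

```python
from typing import Iterable, List, Tuple

Triplet = Tuple[str, str, str]
Index = Tuple[List[str], List[Triplet]]


def merge_indices(indices: Iterable[Index]) -> Index:
    """Hypothetical helper: union nodes and edges across indices, order-preserving."""
    merged_nodes = {}
    merged_edges = {}
    for nodes, edges in indices:
        # Updating with keys that already exist leaves their position unchanged.
        merged_nodes.update(dict.fromkeys(nodes))
        merged_edges.update(dict.fromkeys(edges))
    return list(merged_nodes), list(merged_edges)


first = (["cats", "dogs"], [("cats", "eat", "dogs")])
second = (["dogs", "birds"], [("dogs", "chase", "birds")])
all_nodes, all_edges = merge_indices([first, second])
```

The shared node "dogs" appears once in the merged result, which is the resource saving the class description refers to.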

get_unique_node_features(feature_name: str = 'pid') List[str][source]

Get all the unique values for a specific node attribute.

Parameters:

feature_name (str, optional) – Name of feature to get. Defaults to NODE_PID.

Returns:

List of unique values for the specified feature.

Return type:

List[str]

add_node_feature(new_feature_name: str, new_feature_vals: Union[Sequence[Any], Tensor], map_from_feature: str = 'pid') None[source]
Adds a new feature that corresponds to each unique node in the graph.

Parameters:
  • new_feature_name (str) – Name to call the new feature.

  • new_feature_vals (FeatureValueType) – Values to map for that new feature.

  • map_from_feature (str, optional) – Key of feature to map from. Size must match the number of feature values. Defaults to NODE_PID.

Return type:

None
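
One plausible reading of map_from_feature is sketched below: the new values line up with the unique values of the feature being mapped from, and are then broadcast to every node. This is an assumption about the behavior, written in plain Python with a hypothetical attach_feature helper and a dict standing in for the node attributes.

```python
from typing import Any, Dict, List, Sequence


def attach_feature(
    node_attr: Dict[str, List[Any]],
    new_feature_name: str,
    new_feature_vals: Sequence[Any],
    map_from_feature: str = "pid",
) -> None:
    """Hypothetical helper: map one value per unique key onto every node."""
    # Unique values of the source feature, in first-seen order.
    unique_keys = list(dict.fromkeys(node_attr[map_from_feature]))
    if len(unique_keys) != len(new_feature_vals):
        raise ValueError("Expected one new value per unique source feature value.")
    lookup = dict(zip(unique_keys, new_feature_vals))
    node_attr[new_feature_name] = [lookup[key] for key in node_attr[map_from_feature]]


node_attr = {
    "pid": ["cats", "dogs", "birds"],
    "kind": ["mammal", "mammal", "bird"],
}
# Two unique "kind" values, so exactly two new values are supplied.
attach_feature(node_attr, "legs", [4, 2], map_from_feature="kind")
```

Under this reading, the size requirement in the parameter description refers to the number of unique values of map_from_feature, not the number of nodes.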

get_node_features(feature_name: str = 'pid', pids: Optional[Iterable[str]] = None) List[Any][source]
Get node feature values for a given set of unique node ids.

Returned values are not necessarily unique.

Parameters:
  • feature_name (str, optional) – Name of feature to fetch. Defaults to NODE_PID.

  • pids (Optional[Iterable[str]], optional) – Node ids to fetch for. Defaults to None, which fetches all nodes.

Returns:

Node features corresponding to the specified ids.

Return type:

List[Any]

get_node_features_iter(feature_name: str = 'pid', pids: Optional[Iterable[str]] = None, index_only: bool = False) Iterator[Any][source]

Iterator version of get_node_features. If index_only is True, yields indices instead of values.

Return type:

Iterator[Any]
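
The difference between yielding values and yielding indices can be illustrated with a small generator. The sketch assumes that "indices" means each node's position in the index; iter_features and the dict layout are hypothetical stand-ins, not the library's internals.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


def iter_features(
    node_attr: Dict[str, List[Any]],
    feature_name: str = "pid",
    pids: Optional[Iterable[str]] = None,
    index_only: bool = False,
) -> Iterator[Any]:
    """Hypothetical helper: lazily yield feature values (or positions) per id."""
    position = {pid: i for i, pid in enumerate(node_attr["pid"])}
    if pids is None:
        pids = node_attr["pid"]  # None means "all nodes"
    values = node_attr[feature_name]
    for pid in pids:
        idx = position[pid]
        yield idx if index_only else values[idx]


node_attr = {"pid": ["cats", "dogs", "birds"], "legs": [4, 4, 2]}
requested = ["birds", "cats", "birds"]
leg_values = list(iter_features(node_attr, "legs", requested))
leg_indices = list(iter_features(node_attr, "legs", requested, index_only=True))
```

Because ids may repeat in the request, the yielded values are not necessarily unique, matching the note on get_node_features.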

get_unique_edge_features(feature_name: str = 'e_pid') List[str][source]

Get all the unique values for a specific edge attribute.

Parameters:

feature_name (str, optional) – Name of feature to get. Defaults to EDGE_PID.

Returns:

List of unique values for the specified feature.

Return type:

List[str]

add_edge_feature(new_feature_name: str, new_feature_vals: Union[Sequence[Any], Tensor], map_from_feature: str = 'e_pid') None[source]

Adds a new feature that corresponds to each unique edge in the graph.

Parameters:
  • new_feature_name (str) – Name to call the new feature.

  • new_feature_vals (FeatureValueType) – Values to map for that new feature.

  • map_from_feature (str, optional) – Key of feature to map from. Size must match the number of feature values. Defaults to EDGE_PID.

Return type:

None

get_edge_features(feature_name: str = 'e_pid', pids: Optional[Iterable[str]] = None) List[Any][source]
Get edge feature values for a given set of unique edge ids.

Returned values are not necessarily unique.

Parameters:
  • feature_name (str, optional) – Name of feature to fetch. Defaults to EDGE_PID.

  • pids (Optional[Iterable[str]], optional) – Edge ids to fetch for. Defaults to None, which fetches all edges.

Returns:

Edge features corresponding to the specified ids.

Return type:

List[Any]

get_edge_features_iter(feature_name: str = 'e_pid', pids: Optional[Iterable[Tuple[str, str, str]]] = None, index_only: bool = False) Iterator[Any][source]

Iterator version of get_edge_features. If index_only is True, yields indices instead of values.

Return type:

Iterator[Any]

to_data(node_feature_name: str, edge_feature_name: Optional[str] = None) Data[source]
Return a Data object containing all the specified node and edge features and the graph.

Parameters:
  • node_feature_name (str) – Feature to use for nodes

  • edge_feature_name (Optional[str], optional) – Feature to use for edges. Defaults to None.

Returns:

Data object containing the specified node and edge features and the graph.

Return type:

Data
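
PyG Data objects store connectivity as a COO edge_index with one row of source positions and one row of target positions. The sketch below shows how string triplets could be turned into that layout using plain lists instead of tensors; build_edge_index is a hypothetical helper, and how to_data actually assembles its output is not shown in this reference.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]


def build_edge_index(nodes: List[str], edges: List[Triplet]) -> List[List[int]]:
    """Hypothetical helper: map string endpoints to integer positions (COO layout)."""
    position = {pid: i for i, pid in enumerate(nodes)}
    sources = [position[src] for src, _, _ in edges]
    targets = [position[dst] for _, _, dst in edges]
    return [sources, targets]


nodes = ["cats", "dogs", "birds"]
edges = [
    ("cats", "eat", "dogs"),
    ("dogs", "chase", "birds"),
    ("birds", "avoid", "cats"),
]
edge_index = build_edge_index(nodes, edges)
```

Row i of the node feature matrix would then describe nodes[i], and column j of edge_index would describe edges[j].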

class RAGQueryLoader(graph_data: Tuple[RAGFeatureStore, RAGGraphStore], subgraph_filter: Optional[Callable[[Data, Any], Data]] = None, augment_query: bool = False, vector_retriever: Optional[VectorRetriever] = None, config: Optional[Dict[str, Any]] = None)[source]

Loader meant for making RAG queries from a remote backend.

property config

Get the config for the RAGQueryLoader.

query(query: Any) Data[source]

Retrieve a subgraph associated with the query with all its feature attributes.

Return type:

Data

Models

SentenceTransformer

A wrapper around a Sentence-Transformer from HuggingFace.

VisionTransformer

A wrapper around a Vision-Transformer from HuggingFace.

LLM

A wrapper around a Large Language Model (LLM) from HuggingFace.

LLMJudge

Uses NIMs to score a triple of (question, model_pred, correct_answer). This class is an adaptation of Gilberto's work for PyG.

TXT2KG

A class to convert text data into a Knowledge Graph (KG) format.

GRetriever

The G-Retriever model from the "G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering" paper.

MoleculeGPT

The MoleculeGPT model from the "MoleculeGPT: Instruction Following Large Language Models for Molecular Property Prediction" paper.

GLEM

This GNN+LM co-training model is based on GLEM from the "Learning on Large-scale Text-attributed Graphs via Variational Inference" paper.

ProteinMPNN

The ProteinMPNN model from the "Robust deep learning–based protein sequence design using ProteinMPNN" paper.

GITMol

The GITMol model from the "GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text" paper.

Utils

KNNRAGFeatureStore

A feature store that uses a KNN-based retrieval.

NeighborSamplingRAGGraphStore

Neighbor sampling based graph-store to store & retrieve graph data.

DocumentRetriever

Retrieve documents from a vector database.