torch_geometric.nn.attention.QFormer
- class QFormer(input_dim: int, hidden_dim: int, output_dim: int, num_heads: int, num_layers: int, dropout: float = 0.0, activation: Callable = ReLU())[source]
Bases:
Module
The Querying Transformer (Q-Former) from the “BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models” paper.
- Parameters:
input_dim (int) – The number of features in the input.
hidden_dim (int) – The hidden dimension of the feed-forward network (FFN) in each encoder layer.
output_dim (int) – The final output dimension.
num_heads (int) – The number of attention heads in each multi-head attention layer.
num_layers (int) – The number of sub-encoder-layers in the encoder.
dropout (float, optional) – The dropout probability in each encoder layer. (default: 0.0)
activation (Callable, optional) – The activation function in each encoder layer. (default: torch.nn.ReLU())
Note
This is a simplified version of the original Q-Former implementation.
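A minimal usage sketch, assuming the module's forward pass accepts a batched token sequence of shape [batch_size, seq_len, input_dim] and returns a tensor of shape [batch_size, seq_len, output_dim]; the hyperparameter values below are illustrative only:

```python
import torch

from torch_geometric.nn.attention import QFormer

# A toy batch: 2 sequences of 8 tokens, each with 16 input features.
x = torch.randn(2, 8, 16)

model = QFormer(
    input_dim=16,   # feature size of each input token
    hidden_dim=32,  # FFN hidden size inside each encoder layer
    output_dim=32,  # feature size of the projected output
    num_heads=4,    # should evenly divide input_dim for multi-head attention
    num_layers=2,   # number of stacked encoder layers
)

out = model(x)
print(out.shape)  # expected: torch.Size([2, 8, 32])
```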