TensorFlow API

Vector-input functions

class geometric_algebra_attention.tensorflow.VectorAttention(n_dim, *args, **kwargs)[source]

Calculates geometric product attention.

This layer computes a set of geometric products over all tuples of length rank, then sums over them using an attention mechanism to produce a permutation-covariant (reduce=False) or permutation-invariant (reduce=True) result.

VectorAttention calculates attention using geometric products of input vectors, whereas MultivectorAttention calculates attention over geometric products of input multivectors. Other arguments remain the same between the two classes.

The overall calculation proceeds as follows. First, geometric products \(p_{ijk...}\) are calculated for all tuples of the given rank from the input (multi-)vectors \(\vec{r}_i\). Rotation-invariant attributes of the geometric products, \(q_{ijk...}\), are calculated from each product. Summary representations of the tuple \(v_{ijk...}\) are computed using the given joining- and merging-functions \(\mathcal{J}\) and \(\mathcal{M}\), per-bond embeddings \(v_i\), and a value-generating function \(\mathcal{V}\). Attention weights are generated by a softmax over score logits produced by a score-generating function \(\mathcal{S}\), and the output value is generated using a simple sum as follows:

\[\begin{split}p_{ijk...} &= \vec{r}_i\vec{r}_j\vec{r}_k ... \\ q_{ijk...} &= \text{invariants}(p_{ijk...}) \\ v_{ijk...} &= \mathcal{J}(\mathcal{V}(q_{ijk...}), \mathcal{M}(v_i, v_j, v_k, ...)) \\ w_{ijk...} &= \operatorname*{\text{softmax}}\limits_{jk...}(\mathcal{S}(v_{ijk...})) \\ y_i &= \sum\limits_{jk...} w_{ijk...} v_{ijk...}\end{split}\]

Permutation equivariance. The attention weight softmax and sum can be performed either over the remaining indices \(jk...\) or over all indices \(ijk...\), according to whether a permutation-equivariant (reduce=False) or permutation-invariant (reduce=True) result is desired. Permutation-equivariant layers consume a point cloud and produce a point cloud’s worth of values, while permutation-invariant layers consume a point cloud and produce a single summary value over the entire point cloud.

Geometric product modes. Different sets of geometric products can be considered for the calculation of intermediates within the layer. When invariant_mode=’single’, only the final geometric product \(p_{ijk...}\) is used to calculate rotation-invariant features for the geometric representation of the tuple. For invariant_mode=’partial’, all intermediate calculations on the way to the final product are used and concatenated: \(\vec{r}_i, \vec{r}_i\vec{r}_j, \vec{r}_i\vec{r}_j\vec{r}_k, ...\). For invariant_mode=’full’, all combinations of geometric products for the tuple up to the given rank are used and concatenated: \(\vec{r}_i, \vec{r}_j, \vec{r}_i\vec{r}_j, ...\). While non-single modes can introduce some redundancy into the calculation, they may also simplify the functions the layer must learn.

Linear terms. The set of geometric product terms \(p_{ijk...}\) can also be augmented by linear combinations such as \(p_{i}-p_{j}\) or, for multivector-valued attention, \(p_{i}+p_{jk}\). For vector-valued attention, these linear terms are generated by learning, for each product, a binary softmax distribution over whether to incorporate the product or its additive inverse, applying the dual as necessary to produce the correct scalar/bivector or vector/trivector output according to the rank of the calculation. For multivector-valued attention, linear terms are generated by incorporating the product, the negative product, the product dual, and the negative product dual before combining. Much like the geometric product modes described above, these linear terms may introduce redundancy and increase the cost of calculation, but can also improve the expressivity of the network.

Parameters:
  • score_net – function producing logits for the attention mechanism

  • value_net – function producing values in the embedding dimension of the network

  • reduce – if True, produce a permutation-invariant result; otherwise, produce a permutation-covariant result

  • merge_fun – Function used to merge the input values of each tuple before being passed to join_fun: ‘mean’ (no parameters) or ‘concat’ (learned projection for each tuple position)

  • join_fun – Function used to join the representations of the rotation-invariant quantities (produced by value_net) and the tuple summary (produced by merge_fun): ‘mean’ (no parameters) or ‘concat’ (learned projection for each representation)

  • rank – Degree of correlations to consider. 2 for pairwise attention, 3 for triplet-wise attention, and so on. Memory and computational complexity scales as N**rank

  • invariant_mode – Type of rotation-invariant quantities to embed into the network. ‘single’ (use only the invariants of the final geometric product), ‘partial’ (use invariants for the intermediate steps to build the final geometric product), or ‘full’ (calculate all invariants that are possible when building the final geometric product)

  • include_normalized_products – If True, for whatever set of products that will be computed (for a given invariant_mode), also include the normalized multivector for each product

  • linear_mode – Type of geometric product terms to use to build linear combinations for the final attention mechanism: ‘partial’ (use invariants for the intermediate steps to build the final geometric product) or ‘full’ (use all invariants that are possible when building the final geometric product)

  • linear_terms – Number of linear terms to incorporate for the final attention mechanism

__call__(inputs, return_attention=False)

Evaluate the attention calculation for this layer.
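A minimal usage sketch (not taken from the library's own examples): it builds small Keras sub-networks for score_net and value_net and applies a rank-2, permutation-invariant VectorAttention layer to a random point cloud. The (positions, values) input ordering, the tensor shapes, and the sub-network architectures are illustrative assumptions.

    import tensorflow as tf
    from geometric_algebra_attention import tensorflow as gala

    D = 32  # embedding dimension (n_dim)

    # Illustrative sub-networks: score_net maps (..., D) -> (..., 1) logits,
    # value_net maps the invariant features to the embedding dimension D.
    score_net = tf.keras.Sequential([
        tf.keras.layers.Dense(D, activation='relu'),
        tf.keras.layers.Dense(1)])
    value_net = tf.keras.Sequential([
        tf.keras.layers.Dense(D, activation='relu'),
        tf.keras.layers.Dense(D)])

    attention = gala.VectorAttention(
        D, score_net, value_net,
        reduce=True,            # permutation-invariant summary of the cloud
        merge_fun='concat', join_fun='concat', rank=2)

    positions = tf.random.normal((8, 16, 3))  # (batch, N, 3) coordinates
    values = tf.random.normal((8, 16, D))     # (batch, N, D) per-point values

    # Assumed input ordering: (positions, values); see __call__ above.
    summary = attention((positions, values))  # rotation-invariant, shape (batch, D)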

class geometric_algebra_attention.tensorflow.Vector2Multivector[source]

Convert a 3D vector representation into a multivector representation.

Pads each input vector on the left with one zero (the scalar component) and on the right with four zeros (the three bivector components and the trivector component).
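A short sketch of the conversion, assuming the class behaves as a standard Keras layer and the output layout follows the padding described above (1 scalar + 3 vector + 3 bivector + 1 trivector = 8 components):

    import tensorflow as tf
    from geometric_algebra_attention import tensorflow as gala

    vectors = tf.random.normal((8, 16, 3))             # (batch, N, 3)
    multivectors = gala.Vector2Multivector()(vectors)   # expected shape (8, 16, 8)
    # Per the description above, each row is [0, x, y, z, 0, 0, 0, 0]:
    # one scalar zero, the vector components, then bivector/trivector zeros.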

class geometric_algebra_attention.tensorflow.Vector2VectorAttention(n_dim, score_net, value_net, scale_net, reduce=True, merge_fun='mean', join_fun='mean', rank=2, invariant_mode='single', covariant_mode='partial', include_normalized_products=False, linear_mode='partial', linear_terms=0, convex_covariants=False, **kwargs)[source]

Calculate rotation-covariant (vector-valued) geometric product attention.

This layer computes a set of geometric products over all tuples of length rank, then sums over them using an attention mechanism to produce a permutation-covariant (reduce=False) or permutation-invariant (reduce=True) result.

The resulting value is a (geometric) vector, and will rotate in accordance with the input vectors of the layer. The overall attention scheme is similar to VectorAttention with slight modifications, including a rescaling function \(\mathcal{R}\) and a set of geometric products \(p_n\) calculated according to the given covariant_mode with learned combination weights \(\alpha_n\); consult VectorAttention for arguments and a description of the geometric product modes.

\[\begin{split}p_{ijk...} &= \vec{r}_i\vec{r}_j\vec{r}_k ... \\ q_{ijk...} &= \text{invariants}(p_{ijk...}) \\ v_{ijk...} &= \mathcal{J}(\mathcal{V}(q_{ijk...}), \mathcal{M}(v_i, v_j, v_k, ...)) \\ w_{ijk...} &= \operatorname*{\text{softmax}}\limits_{jk...}(\mathcal{S}(v_{ijk...})) \\ r_i^\prime &= \sum\limits_{jk...} w_{ijk...} \mathcal{R}(v_{ijk...}) \sum\limits_{n \in ijk...} \alpha_n \text{vector}(p_n)\end{split}\]
Parameters:
  • score_net – function producing logits for the attention mechanism

  • value_net – function producing values in the embedding dimension of the network

  • scale_net – function producing a scalar rescaling value for the vectors produced by the network

  • reduce – if True, produce a permutation-invariant result; otherwise, produce a permutation-covariant result

  • merge_fun – Function used to merge the input values of each tuple before being passed to join_fun: ‘mean’ (no parameters) or ‘concat’ (learned projection for each tuple position)

  • join_fun – Function used to join the representations of the rotation-invariant quantities (produced by value_net) and the tuple summary (produced by merge_fun): ‘mean’ (no parameters) or ‘concat’ (learned projection for each representation)

  • rank – Degree of correlations to consider. 2 for pairwise attention, 3 for triplet-wise attention, and so on. Memory and computational complexity scales as N**rank

  • invariant_mode – Type of rotation-invariant quantities to embed into the network. ‘single’ (use only the invariants of the final geometric product), ‘partial’ (use invariants for the intermediate steps to build the final geometric product), or ‘full’ (calculate all invariants that are possible when building the final geometric product)

  • covariant_mode – Type of rotation-covariant quantities to use in the output calculation. ‘single’ (use only the vectors produced by the final geometric product), ‘partial’ (use all vectors for intermediate steps along the path of building the final geometric product), or ‘full’ (calculate the full set of vectors for the tuple)

  • include_normalized_products – If True, for whatever set of products that will be computed (for a given invariant_mode or covariant_mode), also include the normalized multivector for each product

  • convex_covariants – If True, use a convex combination of the available rotation-covariant inputs (including the origin (0, 0, 0)), rather than an arbitrary linear combination
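A usage sketch mirroring the VectorAttention example above, with an additional scale_net producing the scalar rescaling \(\mathcal{R}\); the input ordering, shapes, and sub-network architectures remain illustrative assumptions.

    import tensorflow as tf
    from geometric_algebra_attention import tensorflow as gala

    D = 32

    def make_mlp(out_dim):
        return tf.keras.Sequential([
            tf.keras.layers.Dense(D, activation='relu'),
            tf.keras.layers.Dense(out_dim)])

    attention = gala.Vector2VectorAttention(
        D, make_mlp(1), make_mlp(D), make_mlp(1),  # score_net, value_net, scale_net
        reduce=False,              # one output vector per input point
        merge_fun='concat', join_fun='concat', rank=2,
        invariant_mode='single', covariant_mode='partial')

    positions = tf.random.normal((8, 16, 3))
    values = tf.random.normal((8, 16, D))

    # Assumed (positions, values) ordering; the output is vector-valued and
    # rotates with the inputs.
    new_positions = attention((positions, values))   # expected shape (8, 16, 3)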

class geometric_algebra_attention.tensorflow.LabeledVectorAttention(n_dim, score_net, value_net, scale_net, reduce=True, merge_fun='mean', join_fun='mean', rank=2, invariant_mode='single', covariant_mode='partial', include_normalized_products=False, linear_mode='partial', linear_terms=0, convex_covariants=False, **kwargs)[source]

Use labels to translate one point cloud to another.

This layer calculates a new point cloud from a set of reference point cloud values and coordinates, and a query set of point cloud values. It produces one point for each query label (reduce=True) or, for each query label, one point cloud with a point corresponding to each reference point (reduce=False).

This layer augments the per-tuple representation with an additional single set of labeled point values \(c_l\). This type of calculation can be used to implement translation between two sets of point clouds, for example. The overall attention scheme is as follows; consult VectorAttention for elaboration on arguments.

\[\begin{split}p_{ijk...} &= \vec{r}_i\vec{r}_j\vec{r}_k ... \\ q_{ijk...} &= \text{invariants}(p_{ijk...}) \\ v_{l, ijk...} &= \mathcal{J}(c_l, \mathcal{V}(q_{ijk...}), \mathcal{M}(v_i, v_j, v_k, ...)) \\ w_{l, ijk...} &= \operatorname*{\text{softmax}}\limits_{ijk...}(\mathcal{S}(v_{l, ijk...})) \\ y_l &= \sum\limits_{ijk...} w_{l, ijk...} v_{l, ijk...}\end{split}\]
Parameters:
  • score_net – function producing logits for the attention mechanism

  • value_net – function producing values in the embedding dimension of the network

  • scale_net – function producing a scalar rescaling value for the vectors produced by the network

  • reduce – if True, produce a permutation-invariant result; otherwise, produce a permutation-covariant result

  • merge_fun – Function used to merge the input values of each tuple before being passed to join_fun: ‘mean’ (no parameters) or ‘concat’ (learned projection for each tuple position)

  • join_fun – Function used to join the representations of the rotation-invariant quantities (produced by value_net) and the tuple summary (produced by merge_fun): ‘mean’ (no parameters) or ‘concat’ (learned projection for each representation)

  • rank – Degree of correlations to consider. 2 for pairwise attention, 3 for triplet-wise attention, and so on. Memory and computational complexity scales as N**rank

  • invariant_mode – Type of rotation-invariant quantities to embed into the network. ‘single’ (use only the invariants of the final geometric product), ‘partial’ (use invariants for the intermediate steps to build the final geometric product), or ‘full’ (calculate all invariants that are possible when building the final geometric product)

  • covariant_mode – Type of rotation-covariant quantities to use in the output calculation. ‘single’ (use only the vectors produced by the final geometric product), ‘partial’ (use all vectors for intermediate steps along the path of building the final geometric product), or ‘full’ (calculate the full set of vectors for the tuple)

  • include_normalized_products – If True, for whatever set of products that will be computed (for a given invariant_mode), also include the normalized multivector for each product
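A sketch of label-driven translation between point clouds. The three-part input (query labels, reference coordinates, reference values) and its ordering are assumptions; check the library's examples before relying on them.

    import tensorflow as tf
    from geometric_algebra_attention import tensorflow as gala

    D = 32

    def make_mlp(out_dim):
        return tf.keras.Sequential([
            tf.keras.layers.Dense(D, activation='relu'),
            tf.keras.layers.Dense(out_dim)])

    attention = gala.LabeledVectorAttention(
        D, make_mlp(1), make_mlp(D), make_mlp(1), reduce=True, rank=2)

    ref_positions = tf.random.normal((8, 16, 3))  # reference cloud coordinates
    ref_values = tf.random.normal((8, 16, D))     # reference cloud values
    labels = tf.random.normal((8, 4, D))          # query label values c_l

    # Assumed input ordering: (labels, positions, values); with reduce=True the
    # result is one 3D point per query label.
    new_points = attention((labels, ref_positions, ref_values))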

class geometric_algebra_attention.tensorflow.TiedVectorAttention(n_dim, score_net, value_net, scale_net, reduce=True, merge_fun='mean', join_fun='mean', rank=2, invariant_mode='single', covariant_mode='partial', include_normalized_products=False, linear_mode='partial', linear_terms=0, convex_covariants=False, **kwargs)[source]

Simultaneously calculates rotation-covariant and -invariant geometric product attention.

This layer computes a set of geometric products over all tuples of length rank, then sums over them using an attention mechanism to produce a permutation-covariant (reduce=False) or permutation-invariant (reduce=True) result.

Instead of returning a single rotation-invariant result (as VectorAttention does) or rotation-equivariant result (as Vector2VectorAttention does), this layer returns both rotation-invariant and -equivariant results simultaneously. The learned attention weights are “tied” in the sense that the same attention weights are used to reduce both the set of rotation-invariant outputs and the set of rotation-equivariant outputs.

Parameters:
  • score_net – function producing logits for the attention mechanism

  • value_net – function producing values in the embedding dimension of the network

  • scale_net – function producing a scalar rescaling value for the vectors produced by the network

  • reduce – if True, produce a permutation-invariant result; otherwise, produce a permutation-covariant result

  • merge_fun – Function used to merge the input values of each tuple before being passed to join_fun: ‘mean’ (no parameters) or ‘concat’ (learned projection for each tuple position)

  • join_fun – Function used to join the representations of the rotation-invariant quantities (produced by value_net) and the tuple summary (produced by merge_fun): ‘mean’ (no parameters) or ‘concat’ (learned projection for each representation)

  • rank – Degree of correlations to consider. 2 for pairwise attention, 3 for triplet-wise attention, and so on. Memory and computational complexity scales as N**rank

  • invariant_mode – Type of rotation-invariant quantities to embed into the network. ‘single’ (use only the invariants of the final geometric product), ‘partial’ (use invariants for the intermediate steps to build the final geometric product), or ‘full’ (calculate all invariants that are possible when building the final geometric product)

  • covariant_mode – Type of rotation-covariant quantities to use in the output calculation. ‘single’ (use only the vectors produced by the final geometric product), ‘partial’ (use all vectors for intermediate steps along the path of building the final geometric product), or ‘full’ (calculate the full set of vectors for the tuple)

  • include_normalized_products – If True, for whatever set of products that will be computed (for a given invariant_mode or covariant_mode), also include the normalized multivector for each product

  • convex_covariants – If True, use a convex combination of the available rotation-covariant inputs (including the origin (0, 0, 0)), rather than an arbitrary linear combination
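A sketch of the tied calculation; both results are reduced with the same attention weights. The input ordering and the structure of the returned outputs are assumptions to verify against the library source.

    import tensorflow as tf
    from geometric_algebra_attention import tensorflow as gala

    D = 32

    def make_mlp(out_dim):
        return tf.keras.Sequential([
            tf.keras.layers.Dense(D, activation='relu'),
            tf.keras.layers.Dense(out_dim)])

    attention = gala.TiedVectorAttention(
        D, make_mlp(1), make_mlp(D), make_mlp(1), reduce=False, rank=2)

    positions = tf.random.normal((8, 16, 3))
    values = tf.random.normal((8, 16, D))

    # Assumed to contain both a rotation-equivariant (vector) result and a
    # rotation-invariant result, computed with shared attention weights.
    outputs = attention((positions, values))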

Multivector-input functions

class geometric_algebra_attention.tensorflow.MultivectorAttention(n_dim, *args, **kwargs)[source]

Calculates geometric product attention.

This layer computes a set of geometric products over all tuples of length rank, then sums over them using an attention mechanism to produce a permutation-covariant (reduce=False) or permutation-invariant (reduce=True) result.

VectorAttention calculates attention using geometric products of input vectors, whereas MultivectorAttention calculates attention over geometric products of input multivectors. Other arguments remain the same between the two classes.

The overall calculation proceeds as follows. First, geometric products \(p_{ijk...}\) are calculated for all tuples of the given rank from the input (multi-)vectors \(\vec{r}_i\). Rotation-invariant attributes of the geometric products, \(q_{ijk...}\), are calculated from each product. Summary representations of the tuple \(v_{ijk...}\) are computed using the given joining- and merging-functions \(\mathcal{J}\) and \(\mathcal{M}\), per-bond embeddings \(v_i\), and a value-generating function \(\mathcal{V}\). Attention weights are generated by a softmax over score logits produced by a score-generating function \(\mathcal{S}\), and the output value is generated using a simple sum as follows:

\[\begin{split}p_{ijk...} &= \vec{r}_i\vec{r}_j\vec{r}_k ... \\ q_{ijk...} &= \text{invariants}(p_{ijk...}) \\ v_{ijk...} &= \mathcal{J}(\mathcal{V}(q_{ijk...}), \mathcal{M}(v_i, v_j, v_k, ...)) \\ w_{ijk...} &= \operatorname*{\text{softmax}}\limits_{jk...}(\mathcal{S}(v_{ijk...})) \\ y_i &= \sum\limits_{jk...} w_{ijk...} v_{ijk...}\end{split}\]

Permutation equivariance. The attention weight softmax and sum can be performed either over the remaining indices \(jk...\) or over all indices \(ijk...\), according to whether a permutation-equivariant (reduce=False) or permutation-invariant (reduce=True) result is desired. Permutation-equivariant layers consume a point cloud and produce a point cloud’s worth of values, while permutation-invariant layers consume a point cloud and produce a single summary value over the entire point cloud.

Geometric product modes. Different sets of geometric products can be considered for the calculation of intermediates within the layer. When invariant_mode=’single’, only the final geometric product \(p_{ijk...}\) is used to calculate rotation-invariant features for the geometric representation of the tuple. For invariant_mode=’partial’, all intermediate calculations on the way to the final product are used and concatenated: \(\vec{r}_i, \vec{r}_i\vec{r}_j, \vec{r}_i\vec{r}_j\vec{r}_k, ...\). For invariant_mode=’full’, all combinations of geometric products for the tuple up to the given rank are used and concatenated: \(\vec{r}_i, \vec{r}_j, \vec{r}_i\vec{r}_j, ...\). While non-single modes can introduce some redundancy into the calculation, they may also simplify the functions the layer must learn.

Linear terms. The set of geometric product terms \(p_{ijk...}\) can also be augmented by linear combinations such as \(p_{i}-p_{j}\) or, for multivector-valued attention, \(p_{i}+p_{jk}\). For vector-valued attention, these linear terms are generated by learning, for each product, a binary softmax distribution over whether to incorporate the product or its additive inverse, applying the dual as necessary to produce the correct scalar/bivector or vector/trivector output according to the rank of the calculation. For multivector-valued attention, linear terms are generated by incorporating the product, the negative product, the product dual, and the negative product dual before combining. Much like the geometric product modes described above, these linear terms may introduce redundancy and increase the cost of calculation, but can also improve the expressivity of the network.

Parameters:
  • score_net – function producing logits for the attention mechanism

  • value_net – function producing values in the embedding dimension of the network

  • reduce – if True, produce a permutation-invariant result; otherwise, produce a permutation-covariant result

  • merge_fun – Function used to merge the input values of each tuple before being passed to join_fun: ‘mean’ (no parameters) or ‘concat’ (learned projection for each tuple position)

  • join_fun – Function used to join the representations of the rotation-invariant quantities (produced by value_net) and the tuple summary (produced by merge_fun): ‘mean’ (no parameters) or ‘concat’ (learned projection for each representation)

  • rank – Degree of correlations to consider. 2 for pairwise attention, 3 for triplet-wise attention, and so on. Memory and computational complexity scales as N**rank

  • invariant_mode – Type of rotation-invariant quantities to embed into the network. ‘single’ (use only the invariants of the final geometric product), ‘partial’ (use invariants for the intermediate steps to build the final geometric product), or ‘full’ (calculate all invariants that are possible when building the final geometric product)

  • include_normalized_products – If True, for whatever set of products that will be computed (for a given invariant_mode), also include the normalized multivector for each product

  • linear_mode – Type of geometric product terms to use to build linear combinations for the final attention mechanism: ‘partial’ (use invariants for the intermediate steps to build the final geometric product) or ‘full’ (use all invariants that are possible when building the final geometric product)

  • linear_terms – Number of linear terms to incorporate for the final attention mechanism
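A usage sketch: 3D coordinates are lifted to multivectors with Vector2Multivector (documented above) and fed to a permutation-invariant MultivectorAttention layer. The (multivectors, values) ordering mirrors the assumption made in the VectorAttention example.

    import tensorflow as tf
    from geometric_algebra_attention import tensorflow as gala

    D = 32
    score_net = tf.keras.Sequential([
        tf.keras.layers.Dense(D, activation='relu'),
        tf.keras.layers.Dense(1)])
    value_net = tf.keras.Sequential([
        tf.keras.layers.Dense(D, activation='relu'),
        tf.keras.layers.Dense(D)])

    positions = tf.random.normal((8, 16, 3))
    values = tf.random.normal((8, 16, D))
    multivectors = gala.Vector2Multivector()(positions)  # assumed shape (8, 16, 8)

    attention = gala.MultivectorAttention(
        D, score_net, value_net, reduce=True, rank=2, invariant_mode='partial')

    # Assumed (multivectors, values) input ordering.
    summary = attention((multivectors, values))  # rotation-invariant, (batch, D)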

class geometric_algebra_attention.tensorflow.Multivector2MultivectorAttention(n_dim, score_net, value_net, scale_net, reduce=True, merge_fun='mean', join_fun='mean', rank=2, invariant_mode='single', covariant_mode='partial', include_normalized_products=False, linear_mode='partial', linear_terms=0, convex_covariants=False, **kwargs)[source]

Calculate rotation-equivariant (multivector-valued) geometric product attention.

This layer computes a set of geometric products over all tuples of length rank, then sums over them using an attention mechanism to produce a permutation-covariant (reduce=False) or permutation-invariant (reduce=True) result.

The resulting value is a (geometric) multivector, and will rotate in accordance with the input multivectors of the layer. The overall attention scheme is similar to VectorAttention with slight modifications, including a rescaling function \(\mathcal{R}\) and a set of geometric products \(p_n\) calculated according to the given covariant_mode with learned combination weights \(\alpha_n\); consult VectorAttention for arguments and a description of the geometric product modes.

\[\begin{split}p_{ijk...} &= \vec{r}_i\vec{r}_j\vec{r}_k ... \\ q_{ijk...} &= \text{invariants}(p_{ijk...}) \\ v_{ijk...} &= \mathcal{J}(\mathcal{V}(q_{ijk...}), \mathcal{M}(v_i, v_j, v_k, ...)) \\ w_{ijk...} &= \operatorname*{\text{softmax}}\limits_{jk...}(\mathcal{S}(v_{ijk...})) \\ r_i^\prime &= \sum\limits_{jk...} w_{ijk...} \mathcal{R}(v_{ijk...}) \sum\limits_{n \in ijk...} \alpha_n (p_n)\end{split}\]
Parameters:
  • score_net – function producing logits for the attention mechanism

  • value_net – function producing values in the embedding dimension of the network

  • scale_net – function producing a scalar rescaling value for the multivectors produced by the network

  • reduce – if True, produce a permutation-invariant result; otherwise, produce a permutation-covariant result

  • merge_fun – Function used to merge the input values of each tuple before being passed to join_fun: ‘mean’ (no parameters) or ‘concat’ (learned projection for each tuple position)

  • join_fun – Function used to join the representations of the rotation-invariant quantities (produced by value_net) and the tuple summary (produced by merge_fun): ‘mean’ (no parameters) or ‘concat’ (learned projection for each representation)

  • rank – Degree of correlations to consider. 2 for pairwise attention, 3 for triplet-wise attention, and so on. Memory and computational complexity scales as N**rank

  • invariant_mode – Type of rotation-invariant quantities to embed into the network. ‘single’ (use only the invariants of the final geometric product), ‘partial’ (use invariants for the intermediate steps to build the final geometric product), or ‘full’ (calculate all invariants that are possible when building the final geometric product)

  • covariant_mode – Type of rotation-covariant quantities to use in the output calculation. ‘single’ (use only the multivectors produced by the final geometric product), ‘partial’ (use all multivectors for intermediate steps along the path of building the final geometric product), or ‘full’ (calculate the full set of multivectors for the tuple)

  • include_normalized_products – If True, for whatever set of products that will be computed (for a given invariant_mode or covariant_mode), also include the normalized multivector for each product

  • convex_covariants – If True, use a convex combination of the available rotation-covariant inputs, rather than an arbitrary linear combination

class geometric_algebra_attention.tensorflow.Multivector2Vector[source]

Convert a multivector representation into a 3D vector representation.

This class simply strips out the non-vector components of the result.
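A sketch of a round trip through the multivector-valued layers: vectors are lifted with Vector2Multivector, transformed with Multivector2MultivectorAttention, and the vector components are recovered with Multivector2Vector. Sub-network shapes and input ordering are assumptions, as in the earlier examples.

    import tensorflow as tf
    from geometric_algebra_attention import tensorflow as gala

    D = 32

    def make_mlp(out_dim):
        return tf.keras.Sequential([
            tf.keras.layers.Dense(D, activation='relu'),
            tf.keras.layers.Dense(out_dim)])

    positions = tf.random.normal((8, 16, 3))
    values = tf.random.normal((8, 16, D))
    multivectors = gala.Vector2Multivector()(positions)

    attention = gala.Multivector2MultivectorAttention(
        D, make_mlp(1), make_mlp(D), make_mlp(1),  # score_net, value_net, scale_net
        reduce=False, rank=2, covariant_mode='partial')

    # Assumed (multivectors, values) ordering; the output is one multivector
    # per input point and rotates with the inputs.
    new_multivectors = attention((multivectors, values))
    new_positions = gala.Multivector2Vector()(new_multivectors)  # keep vector part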

class geometric_algebra_attention.tensorflow.LabeledMultivectorAttention(n_dim, score_net, value_net, scale_net, reduce=True, merge_fun='mean', join_fun='mean', rank=2, invariant_mode='single', covariant_mode='partial', include_normalized_products=False, linear_mode='partial', linear_terms=0, convex_covariants=False, **kwargs)[source]

Use labels to translate one point cloud to another.

This layer calculates a new point cloud from a set of reference point cloud values and coordinates, and a query set of point cloud values. It produces one point for each query label (reduce=True) or, for each query label, one point cloud with a point corresponding to each reference point (reduce=False).

This layer augments the per-tuple representation with an additional single set of labeled point values \(c_l\). This type of calculation can be used to implement translation between two sets of point clouds, for example. The overall attention scheme is as follows; consult VectorAttention for elaboration on arguments.

\[\begin{split}p_{ijk...} &= \vec{r}_i\vec{r}_j\vec{r}_k ... \\ q_{ijk...} &= \text{invariants}(p_{ijk...}) \\ v_{l, ijk...} &= \mathcal{J}(c_l, \mathcal{V}(q_{ijk...}), \mathcal{M}(v_i, v_j, v_k, ...)) \\ w_{l, ijk...} &= \operatorname*{\text{softmax}}\limits_{ijk...}(\mathcal{S}(v_{l, ijk...})) \\ y_l &= \sum\limits_{ijk...} w_{l, ijk...} v_{l, ijk...}\end{split}\]
Parameters:
  • score_net – function producing logits for the attention mechanism

  • value_net – function producing values in the embedding dimension of the network

  • scale_net – function producing a scalar rescaling value for the vectors produced by the network

  • reduce – if True, produce a permutation-invariant result; otherwise, produce a permutation-covariant result

  • merge_fun – Function used to merge the input values of each tuple before being passed to join_fun: ‘mean’ (no parameters) or ‘concat’ (learned projection for each tuple position)

  • join_fun – Function used to join the representations of the rotation-invariant quantities (produced by value_net) and the tuple summary (produced by merge_fun): ‘mean’ (no parameters) or ‘concat’ (learned projection for each representation)

  • rank – Degree of correlations to consider. 2 for pairwise attention, 3 for triplet-wise attention, and so on. Memory and computational complexity scales as N**rank

  • invariant_mode – Type of rotation-invariant quantities to embed into the network. ‘single’ (use only the invariants of the final geometric product), ‘partial’ (use invariants for the intermediate steps to build the final geometric product), or ‘full’ (calculate all invariants that are possible when building the final geometric product)

  • covariant_mode – Type of rotation-covariant quantities to use in the output calculation. ‘single’ (use only the vectors produced by the final geometric product), ‘partial’ (use all vectors for intermediate steps along the path of building the final geometric product), or ‘full’ (calculate the full set of vectors for the tuple)

  • include_normalized_products – If True, for whatever set of products that will be computed (for a given invariant_mode), also include the normalized multivector for each product

class geometric_algebra_attention.tensorflow.TiedMultivectorAttention(n_dim, score_net, value_net, scale_net, reduce=True, merge_fun='mean', join_fun='mean', rank=2, invariant_mode='single', covariant_mode='partial', include_normalized_products=False, linear_mode='partial', linear_terms=0, convex_covariants=False, **kwargs)[source]

Simultaneously calculates rotation-covariant and -invariant geometric product attention.

This layer computes a set of geometric products over all tuples of length rank, then sums over them using an attention mechanism to produce a permutation-covariant (reduce=False) or permutation-invariant (reduce=True) result.

Instead of returning a single rotation-invariant result (as MultivectorAttention does) or rotation-equivariant result (as Multivector2MultivectorAttention does), this layer returns both rotation-invariant and -equivariant results simultaneously. The learned attention weights are “tied” in the sense that the same attention weights are used to reduce both the set of rotation-invariant outputs and the set of rotation-equivariant outputs.

Parameters:
  • score_net – function producing logits for the attention mechanism

  • value_net – function producing values in the embedding dimension of the network

  • scale_net – function producing a scalar rescaling value for the vectors produced by the network

  • reduce – if True, produce a permutation-invariant result; otherwise, produce a permutation-covariant result

  • merge_fun – Function used to merge the input values of each tuple before being passed to join_fun: ‘mean’ (no parameters) or ‘concat’ (learned projection for each tuple position)

  • join_fun – Function used to join the representations of the rotation-invariant quantities (produced by value_net) and the tuple summary (produced by merge_fun): ‘mean’ (no parameters) or ‘concat’ (learned projection for each representation)

  • rank – Degree of correlations to consider. 2 for pairwise attention, 3 for triplet-wise attention, and so on. Memory and computational complexity scales as N**rank

  • invariant_mode – Type of rotation-invariant quantities to embed into the network. ‘single’ (use only the invariants of the final geometric product), ‘partial’ (use invariants for the intermediate steps to build the final geometric product), or ‘full’ (calculate all invariants that are possible when building the final geometric product)

  • covariant_mode – Type of rotation-covariant quantities to use in the output calculation. ‘single’ (use only the vectors produced by the final geometric product), ‘partial’ (use all vectors for intermediate steps along the path of building the final geometric product), or ‘full’ (calculate the full set of vectors for the tuple)

  • include_normalized_products – If True, for whatever set of products that will be computed (for a given invariant_mode or covariant_mode), also include the normalized multivector for each product

  • convex_covariants – If True, use a convex combination of the available rotation-covariant inputs (including the origin (0, 0, 0)), rather than an arbitrary linear combination
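A brief sketch, analogous to the TiedVectorAttention example above but operating on multivector inputs; the same hedges about input ordering and return structure apply.

    import tensorflow as tf
    from geometric_algebra_attention import tensorflow as gala

    D = 32

    def make_mlp(out_dim):
        return tf.keras.Sequential([
            tf.keras.layers.Dense(D, activation='relu'),
            tf.keras.layers.Dense(out_dim)])

    attention = gala.TiedMultivectorAttention(
        D, make_mlp(1), make_mlp(D), make_mlp(1), reduce=False, rank=2)

    multivectors = gala.Vector2Multivector()(tf.random.normal((8, 16, 3)))
    values = tf.random.normal((8, 16, D))

    # Contains both rotation-equivariant and rotation-invariant results, reduced
    # with a shared set of attention weights (assumed input ordering as above).
    outputs = attention((multivectors, values))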