Fix self-attention for decoder-only models and add set_alignment_heads API
Self-attention was not returning attention weights for decoder-only models
(Generator) because the attention pointer was always nullptr in
TransformerDecoderLayer. The layer now passes the attention pointer to its
self-attention block when there is no encoder attention (the decoder-only case).
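For reference, a minimal Python sketch of the routing logic described above. This is not the actual implementation (the change is in the C++ TransformerDecoderLayer); all names here are hypothetical:

```python
from typing import Callable, Optional, Tuple
import numpy as np

AttentionFn = Callable[..., Tuple[np.ndarray, Optional[np.ndarray]]]

def decoder_layer_forward(
    x: np.ndarray,
    self_attention: AttentionFn,
    encoder_attention: Optional[AttentionFn] = None,
    memory: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, Optional[np.ndarray]]:
    """Sketch of the fixed routing: when there is no encoder attention
    (decoder-only case), the attention output now comes from
    self-attention instead of staying empty (nullptr in the C++ code)."""
    if encoder_attention is None:
        # Decoder-only: return the self-attention weights.
        return self_attention(x, return_attention=True)
    # Encoder-decoder: self-attention weights are discarded; the
    # returned weights come from cross-attention, as before.
    x, _ = self_attention(x, return_attention=False)
    return encoder_attention(x, memory, return_attention=True)
```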
Also adds set_alignment_heads() to the Generator Python API, allowing users to
select specific (layer, head) pairs instead of the default (last layer,
head 0). The attention from the selected heads is concatenated in the output
and can be reshaped to (num_heads, context_length).
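A short usage sketch. set_alignment_heads() and the (num_heads, context_length) layout are as described above; the model path and the flat attention vector are placeholders, since how the concatenated attention is retrieved from a generation result is not shown here:

```python
import numpy as np
import ctranslate2

generator = ctranslate2.Generator("model_dir")  # placeholder path

# Select two (layer, head) pairs instead of the default (last layer, head 0).
alignment_heads = [(2, 1), (3, 0)]
generator.set_alignment_heads(alignment_heads)

# The attention from the selected heads is concatenated in the output.
# Given that flat vector, it can be reshaped to (num_heads, context_length):
flat_attention = np.random.rand(len(alignment_heads) * 7)  # stand-in for one step
attention = flat_attention.reshape(len(alignment_heads), -1)  # (num_heads, context_length)
```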
Also fixes multi-head attention handling in decoding.cc to support
variable-rank attention tensors (rank 3 for per-head attention vs. rank 2 for
averaged attention).
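The decoding.cc fix itself is C++; below is a minimal NumPy sketch of the rank dispatch it describes (function name hypothetical):

```python
import numpy as np

def normalize_attention(attention: np.ndarray) -> np.ndarray:
    """Accept both attention layouts described above and return a
    consistent (num_heads, target_length, source_length) view."""
    if attention.ndim == 3:
        # Multi-head case: (num_heads, target_length, source_length).
        return attention
    if attention.ndim == 2:
        # Averaged case: (target_length, source_length); add a singleton
        # head dimension so downstream code sees a single layout.
        return attention[np.newaxis, :, :]
    raise ValueError(f"unexpected attention rank: {attention.ndim}")
```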