What mechanism allows the Transformer model to weigh the importance of different words in a sequence?
Answer options
A
Self-Attention Mechanism
B
Diffusion Model
C
Support Vector Machines
D
Decision Trees
Correct answer: Self-Attention Mechanism
Explanation
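The correct answer is the self-attention mechanism. Self-attention lets each token compute a relevance score against every other token in the sequence, converts those scores into weights with a softmax, and uses the weights to build a weighted combination of the token representations. Diffusion models, support vector machines, and decision trees are separate techniques and are not components of the Transformer architecture.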
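To make the idea of "weighing" concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under stated assumptions, not a full Transformer layer: the token embeddings are made-up toy values, and the learned query, key, and value projections are omitted, so the embeddings themselves stand in for all three. The function names `softmax`, `dot`, and `self_attention` are hypothetical helpers chosen for this example.

```python
import math


def softmax(row):
    """Turn a list of scores into non-negative weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    d_k = len(keys[0])
    weights = []
    for q in queries:
        # Score this token's query against every key, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of the value vectors.
    dim = len(values[0])
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Toy 2-dimensional embeddings for three tokens (assumed values).
tokens = ["the", "cat", "sat"]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Assumption: no learned projections, so X serves as queries, keys, and values.
outputs, weights = self_attention(X, X, X)

for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each printed row shows how one token distributes its attention across the whole sequence. In a real Transformer the queries, keys, and values come from separate learned linear projections of the embeddings, and several such attention heads run in parallel.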