How does Multi-head attention differ from standard attention?

Question

Accepted Answer

Correct answer: It allows the model to focus on multiple parts of the input, simultaneously. Multi-head attention runs multiple parallel attention mechanisms (heads), each attending to different representation subspaces, allowing the model to jointly attend to information from multiple positions simultaneously and capture diverse relationships. Options [2] and [3] form the complete answer: 'It allows the model to focus on multiple parts of the input simultaneously.'

How does Multi-head attention differ from standard attention?

Explanation

Related Accenture Generative AI questions

Practice more Accenture Generative AI questions

How does Multi-head attention differ from standard attention?

Answer options

Explanation

Related Accenture Generative AI questions

Practice more Accenture Generative AI questions