Decoder-only models like GPT are trained to:
Answer options
A
Predict masked tokens bidirectionally
B
Predict next-token in an autoregressive fashion
C
Compute exact likelihoods always
D
Implement HMMs
Correct answer: Predict next-token in an autoregressive fashion
Explanation
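The correct answer is: Predict next-token in an autoregressive fashion. Decoder-only models such as GPT use causal self-attention, so each position sees only the tokens before it, and training maximizes the probability of each next token given that prefix. Predicting masked tokens from bidirectional context is the objective of encoder models such as BERT. Exact likelihood computation is not the training goal, and hidden Markov models (HMMs) are a different class of sequence model.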
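As a rough illustration of the next-token objective, the sketch below builds the (prefix, target) training pairs, the causal attention mask, and the average negative log-likelihood loss for a toy sequence. The function names and the dictionary-based probability format are hypothetical, chosen only for this example; real implementations work on tensors of token IDs and logits.

```python
import math

def next_token_pairs(tokens):
    """Return (prefix, target) pairs: each target is the token that follows its prefix."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def next_token_loss(probs, targets):
    """Average negative log-likelihood of each target token.

    probs is a list of dicts mapping token -> predicted probability,
    one dict per prediction step (an assumed toy format).
    """
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

tokens = ["the", "cat", "sat", "down"]
pairs = next_token_pairs(tokens)   # three pairs for a four-token sequence
mask = causal_mask(len(tokens))    # lower-triangular visibility pattern

for prefix, target in pairs:
    print(prefix, "->", target)
```

Each pair asks the model to predict one token from everything to its left, and the mask enforces the same constraint inside attention: no position can look at a later token.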