1 comment

  • halflings 1 hour ago
    LLMs generate their output one token at a time. When you first learn this, it seems like a huge performance bottleneck, since we are used to highly parallelized systems.
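    To make the "one token at a time" point concrete, here is a minimal sketch of an autoregressive decoding loop. The `next_token` function is a toy stand-in I made up for a real model's forward pass; the point is only the shape of the loop, not the model.

    ```python
    import random

    def next_token(context):
        # Toy stand-in for an LLM forward pass. A real model runs a full
        # neural-network evaluation conditioned on every token so far.
        vocab = ["the", "cat", "sat", "on", "mat", "."]
        random.seed(len(context))  # deterministic, just for this sketch
        return random.choice(vocab)

    def generate(prompt, max_tokens=5):
        tokens = list(prompt)
        for _ in range(max_tokens):
            # Each new token depends on all previous ones, so generation
            # is inherently sequential: you cannot emit token N+1 before
            # token N exists.
            tokens.append(next_token(tokens))
        return tokens

    print(generate(["the"]))
    ```

    Each iteration must wait for the previous one to finish, which is exactly the serial dependency the parent comment is describing.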

    However, a large part of what makes LLMs feel so magical comes from this very bottleneck.