
The important thing here is that the larger model defines the top-k completions that are considered in the first place. To me, reverse stupidity would be letting the small model pick the top k.
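To make the distinction concrete, here is a rough sketch. It assumes the final choice among candidates is made by the gap between the large and small models' log-probabilities; that scoring rule is my assumption, not something spelled out above. The toy vocabulary, the probabilities, and the names `pick_token` and `candidates_from` are all made up for illustration.

```python
import math

# Hypothetical 4-token vocabulary with made-up next-token probabilities.
TOKENS = ["Paris", "Lyon", "the", "banana"]
LARGE = [math.log(p) for p in (0.70, 0.20, 0.08, 0.02)]  # larger model
SMALL = [math.log(p) for p in (0.10, 0.05, 0.50, 0.35)]  # smaller model


def top_k_indices(logprobs, k):
    """Indices of the k highest-scoring tokens, best first."""
    order = sorted(range(len(logprobs)), key=lambda i: logprobs[i], reverse=True)
    return order[:k]


def pick_token(large_logprobs, small_logprobs, k, candidates_from="large"):
    """Choose one token index from a top-k candidate set.

    candidates_from selects which model defines the candidate set.
    Assumed scoring rule: large log-prob minus small log-prob.
    """
    source = large_logprobs if candidates_from == "large" else small_logprobs
    candidates = top_k_indices(source, k)
    return max(candidates, key=lambda i: large_logprobs[i] - small_logprobs[i])


# Large model defines the candidates: only tokens it finds plausible compete.
print(TOKENS[pick_token(LARGE, SMALL, k=2, candidates_from="large")])  # Paris

# Small model defines the candidates: the winner is a token the large
# model itself rates at only 0.08.
print(TOKENS[pick_token(LARGE, SMALL, k=2, candidates_from="small")])  # the
```

In the first call the small model can only re-rank options the large model already considers plausible. In the second, the large model's favorite never makes it into the candidate set at all.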
