@heliophobicdude 12d
I think these are two very separate concepts.

What we are mostly seeing when it comes to fine-tuning is making a model promptable. Models like LLaMA or the original GPT-3 weren't promptable out of the box; they were made promptable by fine-tuning on demonstration data that looks like a prompt input paired with a prompt output.

See below [1]:

    {
      "instruction": "What would be the output of the following JavaScript snippet?",
      "input": "let area = 6 * 5;\nlet radius = area / 3.14;",
      "output": "The output of the JavaScript snippet is the radius, which is 1.91."
    }
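To make that concrete, here's a rough sketch of how a record like that gets flattened into a single training string for instruction fine-tuning. The Alpaca-style template is an assumption for illustration, not necessarily the exact one any given model used:

    # Sketch: turn one demonstration record into one training string.
    # The template below is an assumed Alpaca-style layout, for illustration only.
    record = {
        "instruction": "What would be the output of the following JavaScript snippet?",
        "input": "let area = 6 * 5;\nlet radius = area / 3.14;",
        "output": "The output of the JavaScript snippet is the radius, which is 1.91.",
    }

    TEMPLATE = (
        "Below is an instruction that describes a task, paired with an input.\n\n"
        "### Instruction:\n{instruction}\n\n"
        "### Input:\n{input}\n\n"
        "### Response:\n{output}"
    )

    training_text = TEMPLATE.format(**record)
    print(training_text)

The model is trained to predict the response portion, so later, at inference time, a prompt written in the same shape "just works".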

Prompt engineering is really just carefully designing the inputs to a prompt-ready model and figuring out which ones produce the best outputs.

I highly recommend skimming this RLHF article and looking for the parts where it talks about demonstration data. [2]

1: https://github.com/sahil280114/codealpaca/blob/master/data/c...

2: https://huyenchip.com/2023/05/02/rlhf.html

@tstrimple 13d
From what I've seen, it's when the context you pull in via embeddings gets too large for the token limit, or drives the cost up too much because you're always operating near the max token limit. In those cases it may be worth the upfront training cost and the slightly higher per-token cost to dramatically reduce the number of tokens in the average request. If you're building a higher-throughput solution, the difference in cost can be quite large.
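A back-of-envelope sketch of that tradeoff; every price and token count below is a made-up placeholder, not a real rate:

    # Hypothetical monthly cost: base model with a big retrieved-context
    # prompt vs. a fine-tuned model with a short prompt. All numbers are
    # placeholders for illustration only.
    requests_per_month = 1_000_000

    base_price_per_1k_tokens = 0.002   # assumed base-model rate
    ft_price_per_1k_tokens = 0.003     # assumed (slightly higher) fine-tuned rate
    ft_training_cost = 500.00          # assumed one-time training cost

    base_tokens_per_request = 7_500    # prompt stuffed with retrieved context
    ft_tokens_per_request = 800        # short prompt, knowledge baked into weights

    base_monthly = requests_per_month * base_tokens_per_request / 1000 * base_price_per_1k_tokens
    ft_monthly = requests_per_month * ft_tokens_per_request / 1000 * ft_price_per_1k_tokens

    print(f"context stuffing: ${base_monthly:,.0f}/month")
    print(f"fine-tuned:       ${ft_monthly:,.0f}/month + ${ft_training_cost:,.0f} one-time")

With these made-up numbers the fine-tuned setup pays back its training cost almost immediately; with low volume or short prompts the math flips the other way.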
@messe 13d
When you're starting to run into context limits.
@oddthink 12d
It's worth it whenever you have a reasonable amount of training data. You can get substantial quality improvements automatically. Unless you're doing some kind of prompt-optimization, prompt-tuning is a lot of random guessing and trial-and-error. It's also most necessary when you have a smaller base model, as opposed to one of the big ones.
@snovv_crash 12d
If you want to teach it, e.g., all of the text in your private training manuals and internal documentation, which wouldn't fit within the input token limit.
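For a rough sense of scale, a quick estimate like this shows how fast internal docs blow past a context window. The ~4-characters-per-token heuristic, the 8k limit, and the folder path are all assumptions:

    # Rough check of whether a pile of internal docs could ever fit in one prompt.
    import glob

    CONTEXT_LIMIT_TOKENS = 8_000   # assumed context window
    CHARS_PER_TOKEN = 4            # rough heuristic for English text

    total_chars = 0
    for path in glob.glob("internal_docs/**/*.txt", recursive=True):  # hypothetical corpus
        with open(path, encoding="utf-8") as f:
            total_chars += len(f.read())

    approx_tokens = total_chars // CHARS_PER_TOKEN
    print(f"~{approx_tokens:,} tokens of docs vs. a {CONTEXT_LIMIT_TOKENS:,}-token context window")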