@londons_explore 12d
Ideally you train a model right to begin with, and no fine tuning is necessary.

However, sometimes you can't do that. For example, perhaps you want your model to always talk like a pirate, but you don't have billions of words spoken like a pirate to train on.

So the next best thing is to train a model on all English text (which you have lots of), and then finetune on your smaller dataset of pirate speech.

Finetuning is simply more training, but with a different dataset and often a different learning rate.

Typically, finetuning uses far, far less data and compute, and can be done by an individual on a home PC, whereas training a large language model from scratch is in the $1M - $1B range.
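
In code, "just more training" looks roughly like this (a rough PyTorch sketch; load_pretrained_lm and PirateSpeechDataset are made-up placeholders, not a real API):

    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader

    # Placeholders invented for illustration; substitute whatever model and
    # dataset classes you actually use.
    model = load_pretrained_lm("base-english-lm")        # weights from the big pretraining run
    dataset = PirateSpeechDataset("pirate_corpus.txt")   # the small domain dataset
    loader = DataLoader(dataset, batch_size=8, shuffle=True)

    # Same loop as pretraining; only the data and the (much smaller)
    # learning rate change.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    model.train()
    for input_ids, labels in loader:
        optimizer.zero_grad()
        logits = model(input_ids)                        # (batch, seq, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
        loss.backward()
        optimizer.step()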

@[deleted] 12d
[deleted]
@Taek 12d
For "full fine tuning", mathematically there's no difference. Fine tuning is just extending the training on new data.

What you are suggesting is called "curriculum learning", and though to the best of my knowledge it hasn't been applied to LLMs yet, it has been shown to improve learning and cut training times in other areas of ML.
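
Roughly, curriculum learning just means presenting training data in order of increasing difficulty. A toy sketch (model, dataset, and train_step are assumed to exist; length-as-difficulty is one common but crude proxy):

    # Sort examples by a difficulty proxy (sequence length here) and widen
    # the visible slice of the curriculum as training progresses.
    num_epochs = 3
    curriculum = sorted(dataset, key=lambda ex: len(ex[0]))  # shorter = "easier"

    for epoch in range(num_epochs):
        # start with the easiest half, end with the full dataset
        fraction = 0.5 + 0.5 * epoch / max(1, num_epochs - 1)
        visible = curriculum[: int(len(curriculum) * fraction)]
        for input_ids, labels in visible:
            train_step(model, input_ids, labels)  # an ordinary gradient step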

@worldsayshi 12d
Yeah, since fine tuning seems to be so much cheaper than training, why hasn't OpenAI fine tuned ChatGPT on data past 2021?
@swalsh 12d
Not an expert, but my high level understanding is this: if a model is a set of inputs, some middle layers, and a set of outputs, fine tuning concentrates on only the output layers.

Useful for taking a generic model with a base level of knowledge and tuning it so the output is more useful for an application-specific use case.
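
A minimal PyTorch sketch of that freeze-everything-but-the-head idea (the model here is a made-up toy, not any real LLM's architecture):

    import torch
    import torch.nn as nn

    class ToyModel(nn.Module):
        """Stand-in for a pretrained network: frozen body + trainable output head."""
        def __init__(self, vocab_size=1000, hidden=128, num_classes=10):
            super().__init__()
            self.backbone = nn.Sequential(       # the pretrained "middle layers"
                nn.Embedding(vocab_size, hidden),
                nn.Linear(hidden, hidden),
                nn.ReLU(),
            )
            self.head = nn.Linear(hidden, num_classes)  # task-specific output layer

        def forward(self, x):
            return self.head(self.backbone(x))

    model = ToyModel()

    # Freeze the backbone so gradient updates touch only the output head.
    for param in model.backbone.parameters():
        param.requires_grad = False

    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )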