However, sometimes you can't do that. For example, perhaps you want your model to always talk like a pirate, but you don't have billions of words spoken like a pirate to train on.
So the next best thing is to train a model on all English text (which you have lots of), and then finetune it on your smaller dataset of pirate speech.
Finetuning is simply more training, but with a different dataset and often a different learning rate.
Typically, finetuning uses far less data and compute, and can be done by an individual on a home PC, whereas training a large language model from scratch costs somewhere in the $1M - $1B range.
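To make that concrete, here's a minimal sketch in PyTorch. The model, dataset, and learning rates are all made-up stand-ins; the point is just that finetuning is the same training loop, started from pretrained weights, usually with a smaller learning rate and a much smaller dataset:

```python
# Hypothetical sketch: finetuning = more training, new dataset, smaller lr.
# The model and data here are toy stand-ins, not a real LLM.
import torch
import torch.nn as nn

# Stand-in for a pretrained model (in practice, loaded from a checkpoint).
model = nn.Linear(16, 16)

# Pretraining might have used lr=1e-3; finetuning typically uses far less.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = nn.MSELoss()

# Tiny stand-in for the small pirate-speech dataset.
finetune_data = [(torch.randn(16), torch.randn(16)) for _ in range(8)]

model.train()
for epoch in range(3):
    for x, y in finetune_data:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```

In real use you'd swap the toy model for pretrained weights and the random tensors for your actual dataset, but the loop itself doesn't change.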
What you are suggesting is called "curriculum learning", and though to the best of my knowledge it hasn't yet been applied to LLMs, it has been shown to improve learning and reduce training times in other areas of ML.
Finetuning is useful for taking a generic model with a base level of knowledge and tuning it so its output is more useful for an application-specific use case.
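The curriculum-learning idea mentioned above can be sketched very simply: order the training examples from easy to hard before feeding them to the training loop. The difficulty metric here (sequence length) and the `train_step` placeholder are made-up stand-ins for illustration:

```python
# Hypothetical sketch of curriculum learning: present examples in order
# of increasing difficulty. Length-as-difficulty is a toy heuristic.
examples = [
    "the cat sat",
    "a longer sentence about cats",
    "hi",
    "an even longer and more complicated sentence about cats",
]

# Order the curriculum from "easy" (short) to "hard" (long).
curriculum = sorted(examples, key=len)

def train_step(example):
    # Placeholder for a real gradient update on this example.
    return len(example)

# Train on easy examples first, then progressively harder ones.
losses = [train_step(ex) for ex in curriculum]
```

In a real setup the difficulty score might come from model perplexity or a hand-designed staging of datasets, but the core idea is just this ordering.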