This is a very weird type of paper. They take a specific approach, then make arguments about a broad class of approaches that are under constant development. The finding that distilled LLMs must be more specialized than the giant LLMs that train them is unsurprising; nobody at this point expects a 13B-parameter model to match the accuracy of what may be a 1T-parameter model across the same broad range of tasks.
I don't know if it's bad news per se. It helps to know where to deploy a tool, its limitations, and where to focus to build something competitive or better.
They don't even use more methodical evaluations of fine-tuned LLMs; they use metrics that are specifically built to support a (false) contrarian conclusion in order to generate attention for their "paper."