Before that, we had the bundler's lifecycle, although it operates on a much smaller scale and often happens on a single machine. The lifecycle of a model is so large that it needs to be distributed.
Sure, we can feed as much data as we can into pretraining, but we are already running out of data, and a lot of the newer data is itself generated by models. What now? What if we JUST want models that a browser can support? What if we want models that can fit on a Raspberry Pi or inside robots? We will break the lifecycle down into training and serving in production, then go into how each stage influences orchestration tooling and machine design choices.
Behind a model's lifecycle during training
Oh, I have heard of the Dragon Book... it destroys everyone who opens it.
Compilers: Principles, Techniques, and Tools