Nvidia's 900 tons of GPU muscle bulks up server market, slims down wallets




Now that's what I call Big Iron


And they are simply unable to reliably release Linux drivers


Linux graphics you mean? The compute ones are fine.

It's not so much an issue of supporting old versions; it's that new versions of Linux break the drivers, since the kernel developers can't see what's inside them.


Almost 100% of their datacenter GPUs are used on Linux. Do you think they run without drivers?


They run with drivers, of course.

But such servers have no need to, for example, suspend and resume correctly. Or handle hot-plugging of displays correctly. Or install updates in a completely reliable manner. Or include the 32 bit support needed for Steam to work.


> unable

Correction: unwilling, not unable




Pardon me for the stupid question, but wouldn’t you get much better bang for the buck with high end consumer GPUs from Nvidia? I think you’d be limited to 16gb memory per card, but you could have a full 8 card system for less than the price of one H100. Is there really no way to partition the workload to run with 16gb memory per card?


The Nvidia driver license for consumer hardware doesn't allow you to virtualize GPUs.


It is not only about computation, but also about communication bandwidth and memory, especially when training large models. The top consumer GPUs, e.g. the 3090/4090, still cannot beat the H100 in this area even when a lot of techniques are applied.


> Is there really no way to partition the workload to run with 16gb memory per card?

It depends on the model architecture you are using. Once you cannot fit a single instance of your model on a single GPU, or at minimum on a single node, things start becoming very complicated. If you are lucky and you have a generic transformer model, you can just use DeepSpeed with their transformer kernel. But if you have another architecture, it will likely not be compatible with DeepSpeed or FairScale or any of the other scaling frameworks, and you will end up having to write your own CUDA kernels.

So the per-GPU RAM is quite important.
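To see why, here's a back-of-envelope memory estimate for mixed-precision Adam training (the standard ~16 bytes/parameter accounting: fp16 weights + fp16 grads + fp32 master copy + two fp32 Adam moments). The overhead factor is a rough guess for activations and fragmentation, not a measured number:

```python
def training_bytes_per_param():
    # fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
    # + Adam moments m and v in fp32 (4 + 4) = 16 bytes per parameter
    return 16

def fits_on_gpu(n_params, gpu_gb, overhead=1.2):
    """Rough check: can one full training replica fit on one card?
    overhead is a loose allowance for activations/fragmentation."""
    need = n_params * training_bytes_per_param() * overhead
    return need <= gpu_gb * 1e9

# A 7B-parameter model needs ~7e9 * 16 = 112 GB of optimizer state
# alone, so it doesn't fit on a 16 GB card, or even an 80 GB one,
# without sharding the state across GPUs (which is what ZeRO does).
print(fits_on_gpu(7_000_000_000, 16))  # False
print(fits_on_gpu(7_000_000_000, 80))  # False
print(fits_on_gpu(1_000_000_000, 80))  # True
```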


> but wouldn’t you get much better bang for the buck with high end consumer GPUs from Nvidia

It depends on the workload, but generally for large neural network model tasks, no, while for most other accelerated tasks, yes.

> Is there really no way to partition the workload to run with 16gb memory per card?

With regard to the neural networks in the news, you can partition the workloads; the 8xH100 machines are partitioned in the same way. Nonetheless, performance is very poor on the consumer cards, because they lack the very high-speed interconnect of the H100 that makes partitioning performant. Additionally, the amount of RAM on the H100 is matched to the capabilities of the chip itself (it sits at the peak of an inverted-U performance curve), which is why it is not possible to "just" add more RAM to these chips.
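A rough sketch of why the interconnect dominates: every data-parallel step has to all-reduce the gradients, and a ring all-reduce moves about 2*(n-1)/n of the gradient bytes over each link. The bandwidth figures below are ballpark assumptions (PCIe 4.0 x16 for consumer cards, a rough per-direction NVLink figure for H100), not spec-sheet guarantees:

```python
def allreduce_seconds(grad_bytes, link_bytes_per_s, n_gpus=8):
    """Lower-bound time for a ring all-reduce of grad_bytes
    across n_gpus, each link running at link_bytes_per_s."""
    return 2 * (n_gpus - 1) / n_gpus * grad_bytes / link_bytes_per_s

grad_bytes = 7e9 * 2      # fp16 gradients for a 7B-param model: ~14 GB

pcie4_x16 = 32e9          # ~32 GB/s, what consumer cards get
nvlink_h100 = 450e9       # rough NVLink figure for H100

print(allreduce_seconds(grad_bytes, pcie4_x16))    # ~0.77 s per step
print(allreduce_seconds(grad_bytes, nvlink_h100))  # ~0.05 s per step
```

On the PCIe numbers, the communication alone can cost most of a second per step, which quickly swamps the compute on a consumer card.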


The 24GB of VRAM on the 3090/4090 (instead of 80GB) is a deal breaker for a lot of stuff. Also, the 3090 only does 2-way NVLink (and the 4090 has no NVLink at all).

But a lot of people (including me) run 3090/4090 rigs at home, or rent them at runpod/vastai to save a buck.


Since you seem to know and I'm having a rough time finding good info, I'd love to put together a mid range 3090 box to mess around with LLMs. Do you have any pointers to a good build list?


There's an EULA that prohibits you from using consumer-grade GPUs in servers or renting them out (I can't remember the exact rule, but it's definitely not allowed for Amazon or others to use them).


Could you use them for your own purposes? Not to rent/resell?

At that price differential, on premises might make much more sense, provided you have decent utilization. Your breakeven is likely well under 10%.



"No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted."


Can I stick it in the corner of the office? Serious question.


Is your office a data center?


That's the question.


Haha so as long as you append some light proof of work to every task, you get them on a technicality?

Honestly though, who the fuck are they to tell you what to do with the hardware that you bought? Like telling people they are not allowed to drive their car on gravel roads or not being allowed to drink wine from a beer glass without paying for a special overpriced version or something.


It mostly just means you shouldn't expect them to provide any support for that use case. It isn't all that uncommon for smaller studios to just cluster together a couple of computers with 3-4 high-end gaming GPUs each as a render farm, for instance.


Usually it involves warranty issues. Although in this case it seems especially odd, since this clause was added to effectively force data centers to buy their more expensive hardware. It goes back a while (~2018), so you'd think they would have revised this policy.



You can't legally use their drivers, particularly CUDA, for consumer gear in a datacenter.


Very strange metric, considering the actual GPUs are probably below 1% of the shipped weight.


I'm glad they're measuring by weight instead of volume for more accuracy