The tiny corp raised $5.1M
Comments
Agree with this insight. One thing Nvidia got right was a focus on software. They introduced CUDA [1] back in 2007, when the full set of use cases for it wasn't at all obvious. Then their GPUs gained Tensor Cores, plus complementary software like TensorRT to take full advantage of them after the deep learning boom.
Right as Nvidia reported an insane earnings beat, too [2]. Would love more players in this space for sure.
[1] - https://en.wikipedia.org/wiki/CUDA [2] - https://www.cnbc.com/2023/05/24/nvidia-nvda-earnings-report-...
> One Humanity is 20,000 Tampas.
I'll never think of humanity the same way!
[0]: https://geohot.github.io/blog/jekyll/update/2023/04/26/a-per...
> I promise it’s better than the chip you taped out! It has 58B transistors on TSMC N5, and it’s like the 20th generation chip made by the company, 3rd in this series. Why are you so arrogant that you think you can make a better chip? And then, if no one uses this one, why would they use yours?
> So why does no one use it? The software is terrible!
> Forget all that software. The RDNA3 Instruction Set is well documented. The hardware is great. We are going to write our own software.
So why not just fix AMD accelerator support in pytorch? Both ROCm and pytorch are open source. Isn't the point of the OSS community to use the community to solve problems? Shouldn't this be the killer advantage over CUDA? Making a new library doesn't democratize access to the 123 (fp16-)TFLOP accelerator. Fix pytorch and suddenly all the existing code has access to these accelerators, and millions of people along with it. That puts significant pressure on Nvidia, because they can no longer corner the DL market. But it's a catch-22, because the DL market is already mostly Nvidia, so Nvidia takes priority. Isn't this EXACTLY where OSS is supposed to help? I get that Hotz wants to make money, and there's nothing wrong with that (it also complements his other company), but the arguments here read more like arguments for fixing ROCm, and specifically its pytorch integration.
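For what it's worth, the ROCm builds of PyTorch already expose AMD GPUs through the torch.cuda namespace, so "fixing pytorch" is largely about making that existing path reliable rather than adding a new one. A minimal sanity check (assuming a ROCm build of PyTorch and a supported AMD card) looks something like:

    import torch

    # ROCm builds of PyTorch route the HIP backend through torch.cuda,
    # so unmodified CUDA-targeting code can run on an AMD GPU.
    print(torch.version.hip)          # set on ROCm builds, None on CUDA builds
    print(torch.cuda.is_available())  # True if a supported AMD GPU is visible

    x = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
    y = x @ x                         # dispatched to rocBLAS/MIOpen under the hood
    print(y.float().sum().item())

The catch is that "supported" is doing a lot of work there: making this reliable on consumer RDNA3 cards is exactly the painful part.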
The mission is great, but AMD is in a much better position to compete with Nvidia. They caught up in the gaming market (mostly) but have a long way to go for scientific work (which is where Nvidia is shifting its focus). This is realistically the only way to drive GPU prices down. Intel tried their hand (including in supercomputers) but failed too. I have to think there's a reason, not obvious to most of us, why this keeps happening.
Note 1:
I will add that supercomputers like Frontier (current #1) do use AMD GPUs, and a lot of the hope has been that this will fund the optimization from two directions: 1) the DOE optimizing their own code, because that's the machine they have access to, and 2) AMD using the contract money to hire more devs. But this doesn't seem to be happening fast enough (I know some grad students working on ROCm).
Note 2:
There's a clear difference in how AMD and Nvidia measure TFLOPS. TechPowerUp shows AMD at 2-3x Nvidia, but real-world performance is similar. Either AMD is crazy underutilized or something is wrong. Does anyone know the answer?
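For what it's worth, part of the gap is probably just how the peak number is computed. A back-of-envelope sketch, assuming the published RX 7900 XTX specs (6144 shaders, ~2.5 GHz boost) and assuming the headline figure counts RDNA3's dual-issue capability:

    # Peak-FLOPS back-of-envelope (published specs, not measurements).
    shaders    = 6144   # RX 7900 XTX stream processors
    clock_ghz  = 2.5    # approximate boost clock
    fma        = 2      # a fused multiply-add counts as 2 FLOPs
    dual_issue = 2      # RDNA3 can dual-issue FP32 ops, if the compiler pairs them

    fp32_tflops = shaders * clock_ghz * fma * dual_issue / 1000  # ~61 TFLOPS
    fp16_tflops = fp32_tflops * 2                                # packed fp16 -> ~123 TFLOPS
    print(fp32_tflops, fp16_tflops)

If the compiler can't actually pair instructions, half of that theoretical number evaporates, which may explain some of the on-paper vs. delivered gap.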
For such a smart guy, locking yourself out of a ton of talent by requiring software developers to be on-site in 2023 seems...out of character, to put it politely.
(Rephrased, my original post was a bit too ad hominem and accumulating downvotes rapidly. I wanted to delete this entire comment but apparently HN no longer allows comments to be deleted.)
My prediction is that AMD is already working on this internally, except oriented around PyTorch rather than Hotz's tinygrad, which I doubt will get much traction.
For the past decade or so, AMD hasn't been able to create any good software for their hardware. They've made small improvements, but the competition, Nvidia, has kept improving already-good software.
It's gotten to the point where their software is the reason most people/companies don't use their products. The drivers for their consumer products are just as bad.
They are very competitive in hardware, but Nvidia dominates them in software, which makes companies buy Nvidia. No one wants to deal with the pain of AMD software.
AMD is a better company to work with than Nvidia, but it's not worth it when it comes to dealing with their software lol.
And then we get a computer that... how do I interact with it? Will it have its own OS? Some flavor of linux? Is the intent to work on it directly, or use it as an inference server, and talk over a network?
This project fits the pattern of his previous projects: he gets excited about the currently hot thing in tech, makes his own knockoff version, generates a ton of buzz in the tech press for it, and then it fizzles out because he doesn't have the resources or attention span to actually make something at that scale.
In 2016, Tesla and self-driving cars led to his comma one project ("I could build a better vision system than Tesla autopilot in 3 months"). In 2020, Ethereum got hot and so he created "cheapETH". In 2022 it was Elon's Twitter, which led him to "fixing Twitter search". And in 2023 it's NVIDIA.
I'd love to see an alternative to CUDA / NVIDIA so I hope this one breaks the pattern, but I'd be very, very careful before giving him a deposit.
> Unfortunately, this advantage is thrown away the minute you have something like CUDA in your stack. Once you are calling in to Turing complete kernels, you can no longer reason about their behavior. You fall back to caching, warp scheduling, and branch prediction.
> tinygrad is a simple framework with a PyTorch like frontend that will take you all the way to the hardware, without allowing terrible Turing completeness to creep in.
I like his thinking here, constraining the software to something less than Turing complete so as to minimize complexity and maximize performance. I hope this approach succeeds as he anticipates.
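A toy sketch of what that buys you (not tinygrad's actual internals, just an illustration of the idea): if the model is a fixed DAG of ops with every shape known up front and no data-dependent control flow, scheduling and memory planning can be done entirely ahead of time.

    # Toy sketch of a non-Turing-complete "program": the op sequence and all
    # shapes are fixed up front, so scheduling and buffer sizes are static.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Op:
        name: str       # e.g. "matmul", "relu"
        inputs: tuple   # indices of producing ops
        shape: tuple    # output shape, known before execution

    # A two-layer MLP as a static graph: no loops, no branches, no recursion.
    graph = [
        Op("input",  (),    (1, 784)),
        Op("matmul", (0,),  (1, 256)),   # x @ W1 (weights baked into the op)
        Op("relu",   (1,),  (1, 256)),
        Op("matmul", (2,),  (1, 10)),    # h @ W2
    ]

    # Everything needed to plan execution is known here, before any data flows.
    for i, op in enumerate(graph):
        print(i, op.name, op.shape)

Nothing in that "program" can loop or branch on data, which is what makes it possible to reason about its behavior exactly.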
He made a very good point about how this isn’t general purpose computing. The tensors and the layers are static. There’s an opportunity for a new type of optimization at the hardware level.
I don’t know much about Google’s TPUs, except that they use a fraction of the power used by a GPU.
For this experiment though, my sincere hope is that all the bugs are software only. Supporting argument - if they were hardware bugs, the buggy instructions would not have worked during gameplay.
This makes very little sense. Even if he were able to achieve his goals, consumer GPU hardware is bounded by network and memory, so it's a bad target to optimize. Fast device-to-device communication is only available on datacenter GPUs, and it's essential for training models like LLaMA, Stable Diffusion, etc. Amdahl's law strikes again.
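To put a number on the Amdahl's law point (illustrative figures only, not measurements): if even 10% of a training step is communication that a consumer interconnect can't hide, multi-GPU scaling flattens out fast.

    # Amdahl's law: with a fraction s of each step that doesn't parallelize
    # (e.g. device-to-device transfers over a slow interconnect),
    # speedup on n GPUs is capped at 1 / (s + (1 - s) / n).
    def amdahl_speedup(n_gpus: int, serial_fraction: float) -> float:
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_gpus)

    for n in (2, 4, 8):
        print(n, "GPUs ->", round(amdahl_speedup(n, 0.10), 2), "x")  # 1.82x, 3.08x, 4.71x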
Where is this number coming from? The number of spikes per second?
Edit: doing a quick search, it doesn’t seem like there’s a consensus on the order of magnitude of this. Here’s a summary of various estimates: https://aiimpacts.org/brain-performance-in-flops/
Even if a company adopts "hardware/software co-design", leadership might be hardware people, and they might not understand that there's a lot more to systems software engineering than the class they had in school or the web-and-app development their 5-year-old can do. That misunderstanding can show up in product concepts, resource allocation, scheduling, decisions on technical arguments, etc.
(Granted, the stereotypical software techbro image in popular culture probably doesn't help the respect situation.)
To be fair, the previous page has a bit more details on the hardware.
I can run LLaMA 65B GPTQ4b on my $2300 PC (built from used parts, 128GB RAM, dual RTX 3090 @ PCIe 4.0 x8 + NVLink), and according to the GPTQ paper (§) the model's quality will not suffer much at all from the quantization.
Just saying, open source is squeezing an amazing amount of LLM goodness out of commodity hardware.
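A rough back-of-envelope for why that fits (the overhead and KV-cache figures are my assumptions, not measurements):

    # Why LLaMA 65B at 4-bit fits across 2x 24 GB RTX 3090s (rough estimate).
    params      = 65e9
    weights_gb  = params * (4 / 8) / 1e9   # 4-bit weights -> ~32.5 GB
    overhead_gb = 0.10 * weights_gb        # assumed ~10% for GPTQ scales/zero-points
    kv_cache_gb = 2.5                      # assumed fp16 KV cache at a modest context length
    total_gb    = weights_gb + overhead_gb + kv_cache_gb
    print(round(total_gb, 1), "GB needed vs", 2 * 24, "GB of VRAM")  # ~38 GB vs 48 GB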