The tiny corp raised $5.1M



@coffeebeqn 5d
The way they make money is for AMD to buy them out if they’re successful
@freediver 5d
True entrepreneur. Having a vision and ignoring naysayers. Go George!
@tzhenghao 5d
> I think the only way to start an AI chip company is to start with the software. The computing in ML is not general purpose computing. 95% of models in use today (including LLMs and image generation) have all their compute and memory accesses statically computable.

Agree with this insight. One thing Nvidia got right was a focus on software. They introduced CUDA [1] back in 2007 when the full set of use cases for it didn't seem very obvious. Then their GPUs had Tensor cores, and more complementary software like TensorRT to take full advantage of them post deep learning boom.

Right as Nvidia reported insane earnings beat too [2]. Would love more players in this space for sure.

[1] - [2] -

@turnsout 5d
I clicked through to the previous blog post, to read more about the unit of a "person" of compute [0]. Definitely worth a read, if only for this quote:

> One Humanity is 20,000 Tampas.

I'll never think of humanity the same way!


@godelski 5d
> There’s a [Radeon RX 7900 XTX 24GB] already on the market. For $999, you get a 123 TFLOP card with 24 GB of 960 GB/s RAM. This is the best FLOPS per dollar today, and yet…nobody in ML uses it.

> I promise it’s better than the chip you taped out! It has 58B transistors on TSMC N5, and it’s like the 20th generation chip made by the company, 3rd in this series. Why are you so arrogant that you think you can make a better chip? And then, if no one uses this one, why would they use yours?

> So why does no one use it? The software is terrible!

> Forget all that software. The RDNA3 Instruction Set is well documented. The hardware is great. We are going to write our own software.

So why not just fix AMD accelerators in PyTorch? Both ROCm and PyTorch are open source. Isn't the point of the OSS community to use the community to solve problems? Shouldn't this be the killer advantage over CUDA? Making a new library doesn't democratize access to the 123 (fp16) TFLOP accelerator. Fix PyTorch and suddenly all the existing code has access to these accelerators, and millions of people get cheap compute. This then puts significant pressure on Nvidia, as they can't corner the DL market. But it's a catch-22, because the DL market is already mostly Nvidia, so Nvidia takes priority. Isn't this EXACTLY where OSS is supposed to help? I get that Hotz wants to make money, and there's nothing wrong with that (it also complements his other company), but the arguments here read more like arguments for fixing ROCm, and specifically the PyTorch implementation.

The mission is great, but AMD is in a much better position to compete with Nvidia. They caught up in the gaming market (mostly) but have a long way to go in scientific computing (which is where Nvidia is shifting its focus). This is realistically the only way to drive GPU prices down. Intel tried their hand (including in supercomputers) but failed too. I have to think there's a reason, not obvious to most of us, why this keeps happening.

Note 1:

I will add that supercomputers like Frontier (current #1) do use AMD GPUs, and a lot of the hope has been that this will fund the optimization from two directions: 1) the DOE optimizing their own code, because that's the machine they have access to, and 2) AMD using the contract money to hire more devs. But this doesn't seem to be happening fast enough (I know some grad students working on ROCm).

Note 2:

There's a clear difference in how AMD and Nvidia measure TFLOPS. TechPowerUp shows AMD at 2-3x Nvidia, but real-world performance is similar. Either AMD is crazy underutilized or something is off in the accounting. Does anyone know the answer?
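For what it's worth, the 123 TFLOPS headline quoted upthread falls out of counting RDNA3's dual-issue capability and packed FP16, which Nvidia's plain-FP32 headline numbers don't include. A back-of-envelope check in Python, using the commonly listed RX 7900 XTX specs (6144 shaders, ~2.5 GHz boost clock; my assumed numbers, not from the article):

```python
# Back-of-envelope peak-FLOPS check for the RX 7900 XTX (assumed specs).
shaders = 6144      # stream processors (assumed spec)
clock_hz = 2.5e9    # boost clock (assumed spec)

# Each FMA counts as 2 FLOPs.
fp32_tflops = shaders * clock_hz * 2 / 1e12   # ~30.7 "classic" FP32 TFLOPS
fp32_dual = fp32_tflops * 2                   # RDNA3 dual-issue: ~61.4
fp16_tflops = fp32_dual * 2                   # packed FP16: ~122.9

print(f"FP32: {fp32_tflops:.1f}, dual-issue FP32: {fp32_dual:.1f}, "
      f"FP16: {fp16_tflops:.1f} TFLOPS")
```

If one vendor's marketing number counts dual-issue and FP16 packing while the other's counts plain FP32, a 2-4x gap on paper with similar real-world performance is exactly what you'd expect.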

@mhh__ 5d
Good idea. I don't think George Hotz has the skill set to actually deliver on a lot of this stuff himself (specifically, I suspect replacing the compiler for the GPU is something he'll make a song and dance about with some simple prototype, then quietly scrap, because even for AI workloads it's still a very, very tricky problem), but he has the strength of vision to get and direct other people to do it for him.
@neom 5d
Seems like a much better mission for George Hotz to go on than single handedly trying to fix Twitter.
@csense 5d
I respect Geohot's reputation and this company looks amazing. I might be in the market to work there... except "No Remote."

For such a smart guy, locking yourself out of a ton of talent by requiring software developers to be on-site in 2023 seems...out of character, to put it politely.

(Rephrased, my original post was a bit too ad hominem and accumulating downvotes rapidly. I wanted to delete this entire comment but apparently HN no longer allows comments to be deleted.)

@thatguyknows 5d
I was always surprised at how AMD hasn't already thrown a bunch of money at this problem. Maybe they have and are just incompetent in this area.

My prediction is that AMD is already working on this internally, except oriented around PyTorch rather than Hotz's tinygrad, which I doubt will get much traction.

@Havoc 5d
Glad he is going ahead with this. Will make for many entertaining live streams no doubt
@dharma1 5d
So much untapped potential in AMD, and funny they keep failing at the software and geohot has to save them
@impulser_ 5d
Why wouldn't AMD throw a few million at this? Worst case they lose a small amount of money, but best case they finally get good software for their hardware.

For the past decade or so, they haven't been able to create any good software for their hardware. They've made small improvements, but the competition, Nvidia, has also kept improving its already good software.

It got to the point where their software is the reason most people/companies don't use their products. Their drivers for their consumer products are just as bad.

They are very competitive in hardware, but Nvidia dominates them in software, which makes companies buy Nvidia. No one wants to deal with the pain of AMD software.

AMD is a better company to work with than Nvidia, but it's not worth it when it comes to dealing with their software lol.

@daveed 5d
I don't want to cast any judgement, I just want to ask what the initial product is. The claim is that they sell computers, and there's a link to the tinybox. There's a $100 preorder for a $15k computer (I guess I'd have to pay the remaining $14.9k eventually?).

And then we get a computer that... how do I interact with it? Will it have its own OS? Some flavor of linux? Is the intent to work on it directly, or use it as an inference server, and talk over a network?

@thesausageking 5d
For background, "geohot" is George Hotz, a well-known hacker / tech personality[0]

This project fits the pattern of his previous projects: he gets excited about the currently hot thing in tech, makes his own knockoff version, generates a ton of buzz in the tech press for it, and then it fizzles out because he doesn't have the resources or attention span to actually make something at that scale.

In 2016, Tesla and self-driving cars led to his comma one project ("I could build a better vision system than Tesla autopilot in 3 months"). In 2020, Ethereum got hot and so he created "cheapETH". In 2022 it was Elon's Twitter, which led him to "fixing Twitter search". And in 2023 it's NVIDIA.

I'd love to see an alternative to CUDA / NVIDIA so I hope this one breaks the pattern, but I'd be very, very careful before giving him a deposit.


@SkyMarshal 5d
> I think the only way to start an AI chip company is to start with the software. The computing in ML is not general purpose computing. 95% of models in use today (including LLMs and image generation) have all their compute and memory accesses statically computable.

> Unfortunately, this advantage is thrown away the minute you have something like CUDA in your stack. Once you are calling in to Turing complete kernels, you can no longer reason about their behavior. You fall back to caching, warp scheduling, and branch prediction.

> tinygrad is a simple framework with a PyTorch like frontend that will take you all the way to the hardware, without allowing terrible Turing completeness to creep in.

I like his thinking here, constraining the software to something less than Turing complete so as to minimize complexity and maximize performance. I hope this approach succeeds as he anticipates.
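The "statically computable" claim can be made concrete with a toy example: if every op's shapes are fixed up front, total FLOPs and memory traffic are computable before anything runs, so a scheduler can plan every access ahead of time. A minimal sketch (my own illustration, not tinygrad's actual IR):

```python
# Toy static compute graph: every shape is known before execution, so
# FLOPs and memory traffic are computable without running the model.
# Illustrative sketch only, not tinygrad's real representation.

graph = [
    # (op, lhs shape, rhs shape)
    ("matmul", (1024, 4096), (4096, 4096)),
    ("matmul", (1024, 4096), (4096, 11008)),
]

def matmul_cost(a, b):
    m, k = a
    k2, n = b
    assert k == k2, "inner dimensions must match"
    flops = 2 * m * k * n                       # each multiply-accumulate = 2 FLOPs
    bytes_moved = 2 * (m * k + k * n + m * n)   # FP16: 2 bytes per element
    return flops, bytes_moved

total_flops = sum(matmul_cost(a, b)[0] for _, a, b in graph)
total_bytes = sum(matmul_cost(a, b)[1] for _, a, b in graph)
print(total_flops, total_bytes)
```

With no data-dependent control flow, nothing here requires caches, warp scheduling, or branch prediction to reason about; a Turing-complete kernel forfeits exactly this analyzability.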

@sheepscreek 5d
This is great news. I’ve oft wondered the same about AMD’s GPUs. NVIDIA’s got a clear monopoly.

He made a very good point about how this isn’t general purpose computing. The tensors and the layers are static. There’s an opportunity for a new type of optimization at the hardware level.

I don’t know much about Google’s TPUs, except that they use a fraction of the power used by a GPU.

For this experiment though, my sincere hope is that all the bugs are software only. Supporting argument - if they were hardware bugs, the buggy instructions would not have worked during gameplay.

@ipsum2 5d
I have some experience in this area: I've worked on machine learning frameworks, trained large models on datacenters, and have my own personal machine for tinkering around with.

This makes very little sense. Even if he were able to achieve his goals, consumer GPU hardware is bounded by network and memory, so it's a bad target to optimize. Fast device-to-device communication is only available on datacenter GPUs, and it's essential for training models like LLaMA, Stable Diffusion, etc. Amdahl's law strikes again.
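To make the Amdahl's-law point concrete: if some fraction of each training step is communication that doesn't speed up with more GPUs, scaling flattens fast. A toy calculation (the 80% parallel fraction is an arbitrary illustration, not a measurement of any real setup):

```python
# Amdahl's law: speedup with n workers when fraction p of the work parallelizes
# perfectly and the rest (e.g. serialized communication over slow PCIe) does not.
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Illustrative: if only 80% of a step scales, 8 GPUs are nowhere near 8x.
for n in (2, 4, 8):
    print(n, round(speedup(0.8, n), 2))
```

The asymptote is 1 / (1 - p): with p = 0.8, no number of GPUs ever exceeds 5x, which is why fast interconnects matter so much for large-model training.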

@Lapsa 5d
I find it peculiar. Just recently folks were tryharding to make everything and your kitchen sink Turing complete, and now it creeps in menacingly on its own
@smasher164 5d
Turing-completeness != un-optimizable! The entire fields of type systems and compilers exist to serve this endeavor. It's gotta be a meme at this point every time someone brings up the halting problem or Rice's theorem.
@fancyfredbot 5d
AMD only enabled their ROCm stack on consumer cards last month. This finally corrects a huge mistake: Nvidia made CUDA available on all their cards for free from the start and made it easy/cheap for people to get started. Of course, once they'd started, they stuck with it... I hope it's not too late to turn this around.
@agnosticmantis 5d
> The human brain has about 20 PFLOPS of compute.

Where is this number coming from? The number of spikes per second?

Edit: doing a quick search, it doesn’t seem like there’s a consensus on the order of magnitude of this. Here’s a summary of various estimates:
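For what it's worth, one common back-of-envelope route to a number in that ballpark (not a consensus figure, just arithmetic) is synapse count times average firing rate times a couple of ops per synaptic event:

```python
# Back-of-envelope brain "FLOPS" estimate. Every number here is a rough,
# contested assumption; the point is only that the arithmetic lands near
# the 20 PFLOPS figure quoted above.
synapses = 1e15       # often-cited upper-range synapse count (assumed)
rate_hz = 10          # average firing rate, very rough (assumed)
ops_per_event = 2     # multiply + add analogy per synaptic event (assumed)

pflops = synapses * rate_hz * ops_per_event / 1e15
print(pflops)  # 20.0
```

Shift any of those assumptions by an order of magnitude (estimates for each do vary that much) and the total moves with it, which is why the literature doesn't agree on the exponent.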

@sidcool 5d
George was on the list of my favourite programmers till a few months ago. Now it's just Jeff Dean, John Carmack and Karpathy
@neilv 5d
I don't know whether it's a factor in the alleged software quality issues he mentions, but it's not unusual for a company that thinks of itself as a hardware company to neither understand nor respect software enough.

Even with "hardware/software co-design", leadership might be hardware people, and they might not understand that there's tons more to systems software engineering than the class they had in school or the web & app development their 5-year-old can do. That misunderstanding can show up in product concepts, resource allocation, scheduling, decisions on technical arguments, etc.

(Granted, the stereotypical software techbro image in popular culture probably doesn't help the respect situation.)

@Tepix 5d
When you click on the Stripe link to preorder the tinybox, it is advertised as a box running LLaMA 65B FP16 for $15,000.

To be fair, the previous page has a bit more details on the hardware.

I can run LLaMA 65B GPTQ 4-bit on my $2300 PC (built from used parts: 128GB RAM, dual RTX 3090 @ PCIe 4.0 x8 + NVLink), and according to the GPTQ paper(§) model quality does not suffer much at all from the quantization.

Just saying, open source is squeezing an amazing amount of LLM goodness out of commodity hardware.
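The memory math behind that comparison, counting weights only and ignoring activation/KV-cache and quantization-metadata overhead (rough sketch):

```python
# Rough weight-memory footprint of a 65B-parameter model at two precisions.
# Weights only; real checkpoints carry some extra overhead (e.g. GPTQ
# scales/zero-points, activations, KV cache).
params = 65e9

fp16_gb = params * 2 / 1e9    # 2 bytes/param  -> 130 GB (tinybox-class VRAM)
int4_gb = params * 0.5 / 1e9  # 4 bits/param   -> 32.5 GB (fits 2x 24 GB 3090s)

print(f"FP16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
```

So the FP16 model needs a machine with well over 100 GB of fast memory, while the 4-bit quantized weights squeeze under the 48 GB of a dual-3090 build, which is the whole commodity-hardware point.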


@paulus-saulus 5d
I reached the limit of free articles. Impossible to browse and read the page.