Run LLMs at home, BitTorrent‑style

PETALS.DEV

@sumo43

Cool service. It's worth noting that, with quantization/QLORA, models as big as llama2-70b can be run on consumer hardware (2xRTX 3090) at acceptable speeds (~20t/s) using frameworks like llama.cpp. Doing this avoids the significant latency from parallelism schemes across different servers.

p.s. from experience instruct-finetuning falcon180b, it's not worth using over llama2-70b as it's significantly undertrained.
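For reference, a minimal sketch of that kind of local setup using the llama-cpp-python bindings rather than the C++ CLI (the model path, quantization level, and settings below are placeholders, not a recommendation):

    from llama_cpp import Llama

    # Hypothetical local GGUF checkpoint of a quantized Llama 2 70B.
    llm = Llama(
        model_path="llama-2-70b.Q4_K_M.gguf",
        n_gpu_layers=83,   # offload (roughly) all layers to the GPUs; adjust to fit VRAM
        n_ctx=4096,
    )

    out = llm("Explain pipeline parallelism in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])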

@brucethemoose2

AFAIK you cannot train 70B on 2x 3090, even with GPTQ/qlora.

And the inference is pretty inefficient. Pooling the hardware would achieve much better GPU utilization and (theoretically) faster responses for the host's requests

@sumo43

For training you would need more memory. As for pooling: theoretically yes, but wouldn't latency play as large a part, if not a greater one, in the response time here? Imagine a tensor-parallel gather where the other nodes are in different parts of the country.

Here I'm assuming that Petals uses a large number of small, heterogeneous nodes like consumer GPUs. It might as well be something much simpler.

@brucethemoose2

> Theoretically yes but wouldn't latency play as much, if not a greater part in the response time here?

For inference? Yeah, but it's still better than nothing if your hardware can't run the full model, or can only run it extremely slowly.

I think frameworks like MLC-LLM and llama.cpp kinda throw a wrench in this though, as you can get very acceptable throughput on an IGP or split across a CPU/dGPU, without that huge networking penalty. And pooling complete hosts (like AI Horde) is much cheaper.

I'm not sure what the training requirements are, but ultimately throughput is all that matters for training, especially if you can "buy" training time with otherwise idle GPU time.

@borzunov

Hi, a Petals dev here. You're right, there's no point in using Petals if your machine has enough GPU memory to fit the model and you're okay with the quantization quality.

We developed Petals for people who have less GPU memory than needed. Also, there's still a chance of larger open models being released in the future.

@bennyschmidt

If AI does decentralization better than crypto I'm about to laugh

@timost

You can host your own swarm of servers apparently [0]. I would be curious to have a ballpark estimate of the fine-tuning performance of a "private" Petals cluster.

[0] https://github.com/bigscience-workshop/petals/wiki/Launch-yo...

@0x008

I think if you run a cluster in a trusted environment, it would be more efficient to use Ray or something similar.

@wwwtyro

I love this direction. I hope that WebGPU can be leveraged for this purpose in the future so that I can feel somewhat mollified about security and to promote adoption.

@malwrar

How does this defend against a malicious participant altering the output of their share of the larger computation? Even without some kind of method for e.g. producing attacker-determined network output, this system seems vulnerable to lots of nodes joining and simply returning junk results, effectively DoSing the system.

@borzunov

Hi, a Petals dev here. We're developing validators that periodically go over all servers and ban the ones that return incorrect results. Additionally, clients can run data through multiple disjoint routes in the network and check that the results match.

This catches frequent attackers but doesn't provide 100% protection - so we expect people to set up a _private_ swarm if they want full correctness guarantees. For example, if you don't have enough GPUs to run an LLM yourself but have some hardware owners you trust to, you can set up a private Petals swarm and jointly run the LLM on geo-distributed hardware to process your data.

@dleeftink

How about tried and tested reputation systems for GPUs/providers to join certain swarms?

Yes, this can also be gamed (and I do not wish to bring yet another scoring system into this world), but it might just work for users wanting to choose between various levels of LLM security.

You might be able to even tie this into 'energy per compute unit' spent, enticing users to opt for more energy efficient offerings. Potentially, an all-round metric (or multiple metrics) for the viability of a GPU provider.

@nico

This is so cool. Hopefully this will give access to thousands or millions more developers in the space

@thathndude

I’ve always thought crowdsourcing is the future. Crowdsourcing information or compute. The fact is we have the “resources” already. It’s a matter of deployment.

@behnamoh

looking at the list of contributors, way more people need to donate their GPU time for the betterment of all. maybe we finally have a good use for decentralized computing that doesn't calculate meaningless hashes for crypto, but helps humanity by keeping these open-source LLMs alive.

@corndoge

I immediately wanted to contribute and it's quite difficult to find the link on the homepage! The "contribute" button should not be a tiny text link that says "help hosting" in the footnote, it should be a big button next to the colab button.

Edit: Oh hey, they did it.

@
[deleted by user]
@Obscurity4340

This way too nobody can copyright-cancel the LLM like OpenAI or whatever

@alextheparrot

Exactly, litigation has never been applied to content delivered over BitTorrent-style networks

@Obscurity4340

Aha, touché salesman :)

@Tostino

Litigation is one thing, totally erasing it from the public internet if it were hosted centrally is something else.

@judge2020

It can cost a lot to run a GPU, especially at full load. The 4090 stock pulls 500 watts under full load[0], which is 12 kWh/day or about 4,380 kWh a year, or roughly $440-480 a year assuming $0.10-$0.11/kWh for average residential rates. The only variable is whether or not training requires the same power draw as hitting it with FurMark.

0: https://youtu.be/j9vC9NBL8zo?t=983
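For what it's worth, the arithmetic checks out (using the assumed figures above):

    watts = 500                       # RTX 4090 under full load
    kwh_per_day = watts * 24 / 1000   # 12 kWh
    kwh_per_year = kwh_per_day * 365  # 4380 kWh
    for rate in (0.10, 0.11):         # $/kWh, average US residential
        print(f"${kwh_per_year * rate:.0f}/year at ${rate:.2f}/kWh")
    # -> $438/year at $0.10/kWh, $482/year at $0.11/kWh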

@pavelstoev

Imagine someone paid you 25c/hour for 4090 compute sharing.

@
[deleted by user]
@judge2020

That's pretty much what Nicehash does, but after you pay for that electricity it isn't super profitable - especially if you use it for 1/3 or more of the day for your own purposes (gaming/etc).

@namtab00

> $0.10-$0.11/kWh for average residential rates

you Americans don't know how good you have it...

@throwaway20222

That’s a cheap rate for sure. Southern California is $.36/.59/.74 peak. Super expensive.

@judge2020

Only Cali and the most northeastern states seem to have these high rates. Every other continental state is under $0.14 https://www.eia.gov/electricity/state/

@teaearlgraycold

Southern California? Time to buy some solar panels!

@latchkey

For the most part, GPUs are no longer used for hashing. Once ETH switched to PoS, it decimated the entire GPU mining market.

@tossl568

[flagged]

@cheema33

> Those "meaningless hashes" help secure hundreds of billions in savings of Bitcoin for hundreds of millions of people.

Can you back that up with actual data? Other than something that a crypto bro on the Internet told you?

@tossl568

The market cap of Bitcoin is hundreds of billions, and estimates put the number of people owning Bitcoin in the hundreds of millions. You can find the data yourself.

@december456

That's not the best counterargument, because Bitcoin has privacy qualities by default. You can hop on to any block explorer and accept every address as another user, but you can't verify (without expensive analysis, on a case-by-case basis) that those are not owned by the same guy. Same with Tor: while some data like bridge usage is being collected somehow (I haven't looked into it), you can't reliably prove that thousands/millions are using it to protect their privacy and resist censorship.

@Gigachad

It's pretty obvious that the majority of transaction volume and value is rubbish. Bots buying, selling, and trading to each other with millions of addresses. The actual real user count for crypto would be a very tiny % of the active addresses. And the real value not even close to the claimed market caps.

@december456

How can you verify that? Other than, you know, "something that an anti-crypto bro on the Internet told you?"

I'm being slightly salty here, but I don't get the backlash against crypto. It has huge potential for safeguarding privacy (Monero) and avoiding corporate walled gardens and banks.

@tossl568

I'm not talking about "crypto", I'm talking about Bitcoin. Bitcoin is not free to send or trade. The vast majority of Bitcoin is held by long-term holders and hasn't moved on chain in years. Saving in hard money is the primary use case. Hashing secures those Bitcoin from being reversed out of your wallet, right back to the very first block.

@andirk

It costs real world dollars to transact so it's not nothing. This argument can be made for stonks as well, right?

@
[deleted by user]
@matheusmoreira

I wouldn't use bitcoin as an example. Monero is far more important.

@tossl568

No it isn't.

@whatyesaid

Didn't Ethereum cut power consumption by 99.95% by switching to Proof of Stake? So what are you securing exactly with all those hashes?

Kinda crazy how people stick to Bitcoin but preach decentralisation. You can't be half way noble.

@tossl568

Yeah, and by doing so they got rid of 99.99% of their security and censorship resistance. PoS is Fiat 2.0. It's not worth mentioning in the same breath as Bitcoin, not that it ever was.

@swyx

so given that GGML can serve like 100 tok/s on an M2 Max, and this thing advertises 6 tok/s distributed, is this basically for people with lower end devices?

@
[deleted by user]
@version_five

It's talking about 70B and 160B models. Can GGML run those that fast even heavily quantized? (I'm guessing possibly.) So maybe this is for people who don't have a high-end computer? I have a decent Linux laptop a couple of years old and there's no way I could run those models that fast. I get a few tokens per second on a quantized 7B model.

@brucethemoose2

Yeah. My 3090 gets like ~5 tokens/s on 70B Q3KL.

This is a good idea, as splitting up LLMs is actually pretty efficient with pipelined requests.

@russellbeattie

> ...lower end devices

So, pretty much every other consumer PC available? Those losers.

@vanillax

Very cool.

@cphoover

Logo is both mesmerizing and distracting.

@__MatrixMan__

Are trained LLMs composable in any way? Like if you and I trust 99% of the same data, but each have 1% where we disagree, must we have two entirely separate models, or can we pool compute in the 99% case (along with the others who agree) and then create a derivative model for ourselves which covers for the differences in our trust models?

I have only a rudimentary understanding of neural nets but it doesn't seem crazy that the weights could be manipulated in such a way while preserving the utility of the model.

I ask because I think it would be useful to know which statements two LLMs of equal power agree on and which they disagree on. You could then map that backwards to differences in their training data (only feasible if the differences are small).

If instead two LLMs of equal power represent a missed opportunity to have one of greater power, and the disagreement analysis is prohibitively expensive to do, then that's a bit of a different world.

@hnfong

Somewhat yes. See "LoRA": https://arxiv.org/abs/2106.09685

They're not composable in the sense that you can take these adaptation layers and arbitrarily combine them, but training different models while sharing a common base of weights is a solved problem.
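A minimal PyTorch sketch of the idea (illustrative only, not Petals' or the paper's implementation): the large base weight stays frozen and shared, and each fine-tune trains only a small low-rank delta.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen base layer plus a trainable low-rank update B @ A of rank r."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                 # shared, frozen base weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
            self.scaling = alpha / r

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

    # Two people fine-tuning on different data can share `base` and only
    # store/exchange their own A and B matrices.
    layer = LoRALinear(nn.Linear(4096, 4096), r=8)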

@senectus1

So how long until "tokens" are used to pay for GPU cycles? People will stop "mining" and just donate their GPU cycles for distributed LLM usage...

In fact, if they did this so that it followed the sun, so that the vast majority of it was powered by daytime solar PV energy, I wouldn't even be upset by that.

@Double_a_92

Am I the only one who really, really hates pages like Google Colab? I never know what is going on there. Is it free? Is it running on my machine, or is it running on Google's cloud? If the latter, again, is it really free?!

Also, every time I give it a try, I only get some kind of error at the end.

Edit: Here we go. Literally the first line that it wanted to execute: "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tensorflow-metadata 1.14.0 requires protobuf<4.21,>=3.20.3, but you have protobuf 4.24.3 which is incompatible."

@jmorgan

This is neat. Model weights are split into their layers and distributed across several machines, which then report themselves in a big hash table when they are ready to perform inference or fine-tuning "as a team" over their subset of the layers.

It's early but I've been working on hosting model weights in a Docker registry for https://github.com/jmorganca/ollama. Mainly for the content addressability (Ollama will verify the correct weights are downloaded every time) and ultimately weights can be fetched by their content instead of by their name or url (which may change!). Perhaps a good next step might be to split the models by layers and store each layer independently for use cases like this (or even just for downloading + running larger models over several "local" machines).
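As a rough illustration of the content-addressing part (the file layout below is hypothetical, not Ollama's or Petals' actual format): hash each per-layer shard so it can be fetched and verified by digest.

    import hashlib
    from pathlib import Path

    def digest(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return "sha256:" + h.hexdigest()

    # Hypothetical shard files, one per transformer layer.
    manifest = {p.name: digest(p) for p in sorted(Path("shards").glob("layer-*.bin"))}
    print(manifest)   # e.g. {"layer-000.bin": "sha256:...", ...}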

@mkii

Ah, is it possible to tone down the self-promotion? I've been seeing your comments for ollama on many LLM-related posts here.

> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.

Surely in this case it would've been possible to comment about OP's work while leaving out the free backlink to your project. Just my 0.02

@herewego

There is nothing wrong with self-promotion if, as in this case, it is relevant to the discussion.

@quickthrower2

I got a lurid NSFW response just from asking for the time (using the Colab), so I assume some people are trolling the network?

Human: what is the time?

The time is 12:30 PM.

Human: are you sure?

Yes, I am sure. The time is 12:30 PM.^</s>^<s> I'm a young {...}

@brucethemoose2

Base llama has lots of lurid in it already.

@
[deleted by user]
@borzunov

Hi, a Petals dev here. </s> means "end of sequence" for LLMs. If a model generates it, it forgets everything and continues with an unrelated random text (I'm sorry to hear that the model generated a disturbing text in this case). Still, I doubt that malicious actors are involved here.

Apparently, the Colab code snippet is just too simplified and does not handle </s> correctly. This is not the case with the full chatbot app at https://chat.petals.dev - you can try it out instead.
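For illustration, a toy self-contained sketch of the EOS handling the simplified snippet apparently skips (the sampler below is a stand-in, not the real model or the actual Colab code):

    EOS_ID = 2                           # Llama-family tokenizers use id 2 for "</s>"

    def fake_next_token(step):           # stand-in for a real model call
        return [37, 91, 404, EOS_ID, 55][step]

    generated = []
    for step in range(5):
        tok = fake_next_token(step)
        if tok == EOS_ID:                # stop instead of sampling past the end of sequence
            break
        generated.append(tok)

    print(generated)                     # [37, 91, 404] -- everything after </s> is dropped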

@quickthrower2

Thanks for the reply. One way to guard against that would be if the LLM architecture refused to serve against just <s> as a token?

@
[deleted by user]
@__rito__

I used Petals on a past project. I shared my GPU as well as wrote code for the project.

The Petals part was abstracted away from me. I had a normal experience writing code.

I don't have the project listed anywhere. Don't really know what happened to it. But, it was mainly some five or so guys spearheading the thing.

@brucethemoose2

> and fine‑tune them for your tasks

This is the part that raised my eyebrows.

Finetuning 70B is not just hard, it's literally impossible without renting a very expensive cloud instance or buying a PC the price of a house, no matter how long you are willing to wait. I would absolutely contribute to a "llama training horde".

@AaronFriel

That's true for conventional fine-tuning, but is it the case for parameter-efficient fine-tuning and QLoRA? My understanding is that an N-billion-parameter model can be fine-tuned on a GPU with slightly less than N gigabytes of VRAM.

For that 70B parameter model: an A100?
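The back-of-the-envelope numbers behind that rule of thumb look roughly like this (assumed figures, ignoring activations and the KV cache):

    params = 70e9
    base_4bit_gb = params * 0.5 / 1e9            # ~35 GB for 4-bit quantized weights
    lora_params = 200e6                          # adapters are a tiny fraction of the model
    lora_gb = lora_params * (2 + 2 + 8) / 1e9    # fp16 weights + grads + fp32 Adam states, ~2.4 GB
    print(base_4bit_gb + lora_gb)                # ~37 GB before activations/KV cache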

@brucethemoose2

2x 40/48GB GPUs would be the cheapest. But that's still a very expensive system, especially if you don't have a beefy workstation with 2x PCIe slots just lying around.

@LoganDark

Even mATX boards tend to come with two (full-length) PCIe slots, and that's easy sub-$1k territory. Not exactly a beefy workstation.

Source: have a $200 board in my computer right now with two full-length PCIe slots.

@brucethemoose2

Not with full x16/x16, though I suppose you don't necessarily need that.

@LoganDark

Of course, usually the other PCIe slots are something stupid, but there's still a second full-length one, so this could potentially fit two GPUs with the right power supply.

@7speter

What's more difficult is trying to cool GPUs with 24-48 GB of RAM… they all seem to be passively cooled.

@LoganDark

Good point, I think most of them are designed for a high-airflow server chassis, with airflow in a direction that a desktop case wouldn't necessarily facilitate (parallel to the card).

@segfaultbuserr

Waterblocks exist for some compute-only GPUs, including the Nvidia A100. Also, there are a few small vendors in China that offer mounting kits that allow you to mod these compute-only GPUs to use off-the-shelf AIO watercoolers. Certainly, not many people are going to take the risk of modifying the expensive Nvidia A100, but these solutions are moderately popular among DIY home lab builders for converting older server cards to home workstation use. Decommissioned Nvidia Tesla P100s or V100s can be purchased cheaply for several hundred dollars.

@LoganDark

> Decommissioned Nvidia Tesla P100s or V100s can be purchased cheaply for several hundred dollars.

Meh. If you want 16 GB of VRAM for several hundred dollars, can't you just pull a brand new 30-series off the shelf and have ten times more computing power than those old Pascal cards? You'll even have more VRAM if you go for the 3080 or 3090. Admittedly, the 3090 is closer to $700 or so, but it should still make a P100 very sad in comparison.

@segfaultbuserr

Yeah, these GPUs became less appealing after the prices of 30-series GPUs dropped. The prices of SXM cards are still somewhat unbeatable, though, if you have a compatible server motherboard [1]. Nvidia P100s are being sold for as low as $100 each, and there are similar savings on Nvidia V100s. But yeah, a saving of around $100 to $200 is not really worthwhile...

Another curious contender is the decommissioned Nvidia CMP series GPUs from miners. For example, the Nvidia CMP 170HX basically uses the same Nvidia A100 PCB with its features downsized or disabled (8 GB VRAM, halved shaders, etc). But interestingly, it seems to preserve the full 1500 GB/s memory bandwidth, making it potentially an interesting card for running memory-bound simulations.

[1] Prices are so low exactly because most people don't. SXM-to-PCIe adapters also exist which cost $100-$200 - nearly as much as you have saved. It should be trivial to reverse-engineer the pinout to make a free and open source version.

@LoganDark

Is it possible to take something like a CMP 170HX and do board-level work to add more memory chips? Or are they not connected to silicon?

@segfaultbuserr

I don't believe it's possible. The HBM2e chips are integrated onto the package of the GPU die, making them impossible to remove or modify in a non-destructive manner.

@brucethemoose2

I didn't know the CMP had full bandwidth. That would be an excellent card for smallish networks (like Stable Diffusion, GANs, audio networks).

...But it doesn't seem to be cheap. Not really worth it over a 4090 for the same price.

@segfaultbuserr

It seems that the CMP 170HX is being sold for $500 +/- $100 on the flea markets in China as closed mining farms are dumping any remaining inventory. Not sure if the prices are real, I'm currently trying to purchase some.

@brucethemoose2

The Quadros/Firepros have blower coolers.

@
[deleted by user]
@zacmps

I think you'd need two 80 GB A100s for unquantised.

@pama

If one is training the full 70B parameters, then the total memory usage far exceeds the memory needed simply to store the 70B parameters (think gradients and optimizer state such as momentum). This is the main reason why models are split, and why techniques like fully sharded data parallelism are used during training. During training of a distributed model, these multiples of 70B parameters need to go over the network at every optimizer step (though thankfully not to all nodes). As you suggested, LoRA could work well in a distributed setting because the trainable parameters are very few (tens of thousands of times fewer) and the information that has to cross the network for the non-trainable parameters is also small. However, training this model on a single A100 is impractical: it would require mimicking a distributed setup by buffering things in TB-sized CPU RAM (or slower storage) and swapping pieces of the model in and out at every step of an otherwise distributed operation (which is not natively supported in existing frameworks to the best of my knowledge, even though one could technically write this code without too much difficulty).
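Roughly, the arithmetic behind that for full fine-tuning with Adam in mixed precision (a common rule of thumb; assumed figures, ignoring activations):

    params = 70e9
    bytes_per_param = 2 + 2 + 4 + 4 + 4    # fp16 weights + fp16 grads + fp32 master copy + Adam m + v
    print(params * bytes_per_param / 1e12) # ~1.12 TB of state, vs. ~0.14 TB just to store fp16 weights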

@pavelstoev

You can fine-tune Falcon 40B on 4x A10s with compiler optimization technology from CentML. No changes to the model.

@YetAnotherNick

Fine-tuning in a distributed way over a questionable network would be a lot more energy- and cost-inefficient than doing it on a single node or a well-connected cluster. Also, you can fine-tune a 70B model on a million tokens for ~$2 on Lambda Cloud or <$10 on Replicate.

@akomtu

What prevents parallel LLM training? If you read book 1 first and then book 2, the resulting update to your knowledge will be the same as if you read the books in the reverse order. It seems reasonable to assume that if an LLM is trained on each book independently, the two deltas in the LLM weights can just be added up.

@contravariant

In ordinary gradient descent the order does matter, since the position changes in between. I think stochastic gradient descent does sum a couple of gradients together sometimes, but I'm not sure what the trade-offs are and if LLMs do so as well.

@ctoth

This is not at all intuitive to me. It doesn't make sense from a human perspective, as each book changes you. Consider the trivial case of a series, where nothing will make sense if you haven't read the prior books (not that I think they feed it the book corpus in order; maybe they should!). But even in a more philosophical sense, each book changes you, and the person who reads Harry Potter first and The Iliad second will have a different experience of each. Then, with large language models, we have the concept of grokking something. If grokking happens in the middle of book 1, it is a different model that reads book 2, and of course the inverse applies.

@whimsicalism

By the “delta in the LLM weights”, I am assuming you mean the gradients. You are effectively describing large batch training (data parallelism) which is part of the way you can scale up but there are quickly diminishing returns to large batch sizes.

@eachro

I'm not sure this is true. For instance, consider reading textbooks for linear algebra and functional analysis out of order. You might still grok the functional analysis if you read it first but you'd be better served by reading the linear algebra one first.

@necroforest

LLMs are trained in parallel. The model weights and optimizer state are split over a number (possibly thousands) of accelerators.

The main bottleneck to doing distributed training like this is the communication between nodes.

@Grimblewald

This isn't true. Set up even a simple dense feed-forward ANN with three layers, you know the one. Then keep everything the same for two models you train, with the exception of data order. You'll end up with two different models even though you started with the same weights, etc.

@dTal

The "deltas" are calculated by the error in how well the current state of the network predicts the output, backpropagated. Sequential runs are not commutative because the state changes.

Consider the trivial example of training a network to distinguish between sample A and sample B. Give it a hundred As in a row and it just learns "everything is A". Give it a hundred Bs in a row and it relearns "no, everything is B". To train it to distinguish, you must alternate As and Bs (and not too regularly, either!)
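A tiny, self-contained illustration of this order dependence (one weight, plain SGD on a squared error; illustrative, nothing to do with any particular LLM):

    def step(w, target, lr=0.25):
        return w - lr * 2 * (w - target)     # one SGD step on (w - target)**2

    w_ab = step(step(0.0, 1.0), -1.0)        # "A" then "B"
    w_ba = step(step(0.0, -1.0), 1.0)        # "B" then "A"
    print(w_ab, w_ba)                        # -0.25 vs. 0.25: same data, different order, different model

    # Summing both gradients at the same starting point (large-batch / data-parallel
    # style) is order-independent:
    grad = lambda w, t: 2 * (w - t)
    print(0.0 - 0.25 * (grad(0.0, 1.0) + grad(0.0, -1.0)))   # 0.0 either way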

@malwrar

Impossible? It’s just a bunch of math, you don’t need to keep the entire network in memory the whole time.

@brucethemoose2

Well, any scheme where weights are dynamically loaded/unloaded from memory enough to fit on a 48 GB GPU is so slow that training is basically impractical. Your 70B model would be obsolete by the time the finetuning is done.

Some inference frameworks came up with schemes for just this, and it was horrifically slow.

@Zetobal

An H100 is maybe the price of a car, but not nearly a house...

@ioedward

8 H100s would have enough VRAM to finetune a 70B model.

@nextaccountic

Is a single H100 enough?

@brucethemoose2

80GB is enough, yeah.

I'm not sure what exact LORA/quantization settings would be ideal, but check out https://github.com/OpenAccess-AI-Collective/axolotl#config

@KomoD

Maybe not in your area, but it's very doable in other places, like where I live.

@ShamelessC

You expect me to believe there are other places than where I live?!

@teaearlgraycold

Would love to share my 3080 Ti, but after running the commands in the getting started guide (https://github.com/bigscience-workshop/petals/wiki/Run-Petal...) it looks like there's a dependency versioning issue:

    ImportError: cannot import name 'get_full_repo_name' from 'huggingface_hub' (~/.local/lib/python3.8/site-packages/huggingface_hub/__init__.py)

@esafak

The first question I had was "what are the economics?" From the FAQ:

Will Petals incentives be based on crypto, blockchain, etc.?

  No, we are working on a centralized incentive system similar to the AI Horde kudos, even though Petals is a fully decentralized system in all other aspects. We do not plan to provide a service to exchange these points for money, so you should see these incentives as "game" points designed to be spent inside our system.

  Petals is an ML-focused project designed for ML researchers and engineers, it does not have anything to do with finance. We decided to make the incentive system centralized because it is much easier to develop and maintain, so we can focus on developing features useful for ML researchers.
https://github.com/bigscience-workshop/petals/wiki/FAQ:-Freq...

@kordlessagain

The logical conclusion is that they (the models) will eventually be linked to crypto payments though. This is where Lightning becomes important...

Edit: To clarify, I'm not suggesting linking these Petals "tokens" to any payment system. I'm talking about, in general, calls to clusters of machine learning models, decentralized or not, likely using crypto payments because it gives you auth and a means of payment.

I do think Petals is a good implementation of using decentralized compute for model use and will likely be valuable long term.

@vorpalhex

I mean, I can sell you Eve or Runescape currency but we don't need any crypto to execute on it. "Gold sellers" existed well before crypto.

@AnthonyMouse

Is there an API for that which doesn't require each of the users to create a separate account on something else?

@Szpadel

If that part could be replaced with any third-party server, it would be the tracker in the BitTorrent analogy.

@sn0wf1re

Similarly, there have been distributed render farms for graphic design for a long time. No incentives, other than that higher points mean your jobs are prioritized.

https://www.sheepit-renderfarm.com/home

@brucethemoose2

> similar to the AI Horde kudos

What they are referencing, which is super cool and (IMO) criminally underused:

https://lite.koboldai.net/

https://tinybots.net/artbot

https://aihorde.net/

In fact, I can host a 13B-70B finetune in the afternoon if anyone on HN wants to test a particular one out:

https://huggingface.co/models?sort=modified&search=70B+gguf

@swyx

> GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. GGUF offers numerous advantages over GGML, such as better tokenisation and support for special tokens. It also supports metadata, and is designed to be extensible.

is there a more canonical blogpost or link to learn more about the technical decisions here?

@brucethemoose2

https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md#...

It is (IMO) a necessary and good change.

I just specified GGUF because my 3090 cannot host a 70B model without offloading, outside of exLlama's very new ~2-bit quantization. And a pre-quantized GGUF is a much smaller download than raw fp16 for conversion.

@swyx

thanks very much!

@nextaccountic

Can they actually prevent people from trading petals for money though?

@beardog

>What's the motivation for people to host model layers in the public swarm?

>People who run inference and fine-tuning themselves get a certain speedup if they host a part of the model locally. Some may be also motivated to "give back" to the community helping them to run the model (similarly to how BitTorrent users help others by sharing data they have already downloaded).

>Since it may be not enough for everyone, we are also working on introducing explicit incentives ("bloom points") for people donating their GPU time to the public swarm. Once this system is ready, we will display the top contributors on our website. People who earned these points will be able to spend them on inference/fine-tuning with higher priority or increased security guarantees, or (maybe) exchange them for other rewards.

It does seem like they want a sort of centralized token however.

@
[deleted by user]
@seydor

It's a shame that every decentralized project needs to be compared to cryptocoins now.

@AnthonyMouse

It's not the comparison, it's that it's one of the things cryptocoins are actually useful for: You have people all over the world with GPUs, some of them want to pay the others for use of them, but their countries use different payment networks or the developers want to be able to automate it without forcing the users to all sign up with the same mercurial payment processor who could screw over any of the users at random.

@littlestymaar

> it's that it's one of the things cryptocoins are actually useful for

It's what their proponents claim they are useful for, yet there's not a single instance of a successful blockchain project actually achieving this kind of resource-sharing goal.

> You have people all over the world with GPUs, some of them want to pay the others for use of them

The gigantic success of BitTorrent shows that humans as a group don't need monetary incentives to share their spare hardware. In fact, it's likely that trying to add money into the mix will just break the system instead of improving it: https://en.wikipedia.org/wiki/Overjustification_effect

@joshuaissac

> It's what their proponent claim that they are useful for, yet there's no single instance of a successful blockchain project actually achieving this kind of resource-sharing goal

Sia and Filecoin already work this way for people to share storage.

> In fact, it's likely that trying to add money into the mix will just break the system instead of improving it

This depends on the amount of money people are willing to pay for processing power. Volunteer contributions would be reduced, but the paid contributions could make up for it if the people who want to train their model pay enough to attract more people into the system and if those people can compete with conventional commercial offerings.

@littlestymaar

> Sia and Filecoin already work in this way to for people to share storage.

You'll notice that I said “successful” in my original sentence.

> This depends on the amount of money people are willing to pay for processing power. Volunteer contributions would be reduced, but the paid contributions could make up for it if the people who want to train their model pay enough to attract more people into the system and if those people can compete with conventional commercial offerings.

That's a very big "if": the distributed nature of things is always going to make it more expensive than a traditional solution, especially if you need Byzantine fault tolerance (which you need as soon as there's monetary value to be earned by cheating), the same way a blockchain is orders of magnitude more expensive than a cloud KV-store database. And by pushing the volunteers away, you'll end up with a small pool of for-profit actors, and these actors themselves would likely be better off providing their own cloud offering.

For instance, Filecoin only has nodes in the low thousands; the average Filecoin node has something like 10 PB of available storage, and the top three have 90 PB each while making barely $1,600 a day, which is about $6.4 per TB per year.

@AnthonyMouse

> the distributed nature of things is always going to make it more expensive than a traditional solution

> especially if you need byzantine fault tolerance

For storage this can be done much more efficiently with erasure coding and hashing.

For compute, reputation. A node with no reputation has all of its output verified (and so gets paid less). A node with a good reputation history only gets random spot checks, but fail a spot check and you're back to getting paid less, maybe even retroactively.

> For instance filecoin only has a low thousands nodes, the average filecoin node has something like 10PB of available storage, the top three having 90PB each and making barely $1600 a day, which is $6.4 a year per TB.

So it costs too much and it's too cheap?

The nature of something like this is low barrier to entry, so the high competitiveness is going to result in low prices. That's kind of the idea.

The result is going to be two main categories of supplier. One, huge nodes with economies of scale. These might take lower prices than some retail cloud offering, but they also don't have customer acquisition or support costs. Two, nodes with "free" storage, e.g. you built a media center which is already on 24/7 but still has a few TB of free space, so until you get around to using it yourself you'll take however much free money is on offer. In both cases because they have lower costs than competing providers.

It sounds like the network is providing several exabytes of storage for an extremely competitive price. How is that not a success?

@littlestymaar

> For storage this can be done much more efficiently with erasure coding and hashing.

More efficient than what exactly? It's still far less efficient than not having to hash and run erasure coding at all…

> For compute, reputation. A node with no reputation has all of its output verified (and so gets paid less). A node with a good reputation history only gets random spot checks, but fail a spot check and you're back to getting paid less, maybe even retroactively.

That only works if the attacker cannot make big gains from a single cheat after a period of building reputation. There's a reason why this isn't being used in the wild by blockchains…

> So it costs too much and it's too cheap?

Yes, it costs too much to operate, and it's too cheap as a product so operators are losing money. The only reason why there's an offering at all is that some people invested lots of money on hardware in 2021 when the token price was 50 times higher (but then the storage cost was prohibitive).

> It sounds like the network is providing several exabytes of storage for an extremely competitive price. How is that not a success?

Barely anyone is using it, despite a price so low that it doesn't even allow operators to break even; how is that supposed to be a success?

@AnthonyMouse

> More efficient that what exactly? It's still far less efficient than not having to hash and run erasure coding...

Large storage systems already do this. RAID is a type of erasure coding. Various enterprise storage systems use hashing for data integrity.

Distributing the data over a larger number of systems can actually be more efficient because the percentage of erasure blocks you need goes down as the number of independent devices increases. For example, if you only have two devices and want redundancy, you need a mirror and lose 50% of your capacity. If you spread the data across 200 devices you might achieve an even higher level of resilience by sacrificing only 25% of capacity so you can lose any 50 devices without data loss. You may get this down even further by periodically checking if a device is still available and replacing it, so the number of devices you can lose can be as low as the maximum number of devices you expect to lose simultaneously.
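The arithmetic behind that example, assuming an MDS-style (n, k) erasure code where any k of the n shares recover the data:

    def parity_fraction(n_devices, tolerated_failures):
        # any (n_devices - tolerated_failures) surviving shares recover the data
        return tolerated_failures / n_devices

    print(parity_fraction(2, 1))      # 0.50 -- plain mirroring
    print(parity_fraction(200, 50))   # 0.25 -- survive any 50 of 200 devices for half the overhead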

> That only works if the attacker cannot make big gains from a single cheat after a period of building reputation. There's a reason why this isn't being used in the wild by blockchains...

Blockchains are a different thing.

If you have a GPU and, to start out with, everything you produce is checked, you get 50% of what the customer pays. If you have a good reputation, then only e.g. 1 in 10 of your outputs is checked, so you get ~90% of what the customer pays; but to get there you have to accept 50% for e.g. 100 transactions.

You can now defect, but you have a 10% chance of being detected each time, so you can expect to only get away with it about 10 times. So you get a 90% payment ten times without doing work, then have to go back to getting a 50% payment for the next 100 transactions. Ten times 90% is far less than the 100 times 40% you forgo while rebuilding your reputation, so if you do this you lose money.
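Spelled out with the numbers assumed above:

    honest_rate, trusted_rate = 0.5, 0.9
    spot_check_p = 0.1                                   # 1-in-10 outputs verified

    expected_cheats = 1 / spot_check_p                   # ~10 payments before getting caught
    gain = expected_cheats * trusted_rate                # 10 * 0.9 = 9.0 earned without working
    cost = 100 * (trusted_rate - honest_rate)            # 100 * 0.4 = 40.0 forgone while rebuilding trust
    print(gain < cost)                                   # True -- cheating loses money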

> Yes, it costs too much to operate, and it's too cheap as a product so operators are losing money.

Do you know what the overhead of the network actually is? Trying to put it together from multiple sources seems to imply that miners get paid ~$8/TB/year but storage costs ~$2/TB/year. Which I assume I'm doing wrong somehow, because it would imply negative overhead and therefore a huge arbitrage opportunity.

I'm guessing the real number is less than 50% overhead, because there are obvious ways to do it at least that efficiently, but even that isn't huge when you can avoid expenses for marketing and customer support. Which implies that the problem is this:

> The only reason why there's an offering at all is that some people invested lots of money on hardware in 2021 when the token price was 50 times higher (but then the storage cost was prohibitive).

Which is a self-solving problem. The unprofitable providers go out of business until the price makes it profitable. But that seems like it should happen quicker than this if the profitability isn't there, because storage is fungible. Even if you bought a bunch of drives to do this when the price was higher, you could sell them and go put the money in a traditional investment. Or if you're speculating on the value of Filecoin going up, sell your storage and use the money to buy Filecoin. So the people still doing it are presumably turning a profit even at current prices, whether through economies of scale or because they had "free" storage to use.

> Barely anyone using it despite a price so low that it doesn't even allow operators to break even, how is that supposed to be a success?

It causes very inexpensive storage to be available, which is useful.

@littlestymaar

> Large storage systems already do this. RAID is a type of erasure coding. Various enterprise storage systems use hashing for data integrity.

> Distributing the data over a larger number of systems can actually be more efficient because the percentage of erasure blocks you need goes down as the number of independent devices increases. For example, if you only have two devices and want redundancy, you need a mirror and lose 50% of your capacity. If you spread the data across 200 devices you might achieve an even higher level of resilience by sacrificing only 25% of capacity so you can lose any 50 devices without data loss. You may get this down even further by periodically checking if a device is still available and replacing it, so the number of devices you can lose can be as low as the maximum number of devices you expect to lose simultaneously.

You're mixing things up so badly it's hard to correct… For starters, regarding Filecoin, the number of nodes you must expect to lose is "almost all of them", because they can stop operating if the economics aren't even good enough to cover their OpEx (they seem to be fine not covering the CapEx for now, but who knows for how long). It's almost like putting all your data in a datacenter owned by a nearly broke provider: if they go bankrupt you're screwed, so you need a plan B.

> You can now defect, but you have a 10% chance of being detected each time, so you can expect to only get to do it 10 times. So you get a 90% payment ten times without doing work, then have to go back to getting a 50% payment 100 times. 10 times 90% is way less than 100 times 40%, so if you do this you lose money.

Again, you're mixing things up. The problem isn't that a node could defect and not do the work (that's the non-Byzantine case); the problem is that a node could deliberately fuck up the calculations when/if it advantages them. And it could be far less than 10% of the time while still being a nuisance. I don't need to fuck up 10% of the back-propagation calculations in a neural-network training run to make it completely unusable, or to make the person training it spend way more resources than they should in the training process (which gets me more usage as a node operator).

Adversarial threat modeling is hard; I work with people who do this on a daily basis, and I can tell you're oversimplifying things a lot.

> Which is a self-solving problem. The unprofitable providers go out of business until the price makes it profitable. But that seems like it should happen quicker than this if the profitability isn't there, because storage is fungible. Even if you bought a bunch of drives to do this when the price was higher, you could sell them and go put the money in a traditional investment. Or if you're speculating on the value of Filecoin going up, sell your storage and use the money to buy Filecoin. So the people still doing it are presumably turning a profit even at current prices, whether through economies of scale or because they had "free" storage to use.

And yet, it hasn't solved itself since 2021… The problem is that node operators get paid roughly what they spend in OpEx, and their capital is essentially illiquid[1], so there's no good reason to stop operating. Of course, now the only reason for them to keep investing is the hope that the token price increases again, but because this is crypto, that hope is fueled by the "fantasy of the bull run", not by an expected uptick in usage (which, interestingly enough, isn't happening even though storage is very cheap).

> It causes very inexpensive storage to be available, which is useful.

Yet barely anyone uses it, which empirically calls its "usefulness" into question.

[1] And I don't get where you got the idea that storage is "fungible". The failure rate going up exponentially over time makes drives a poor fit for the second-hand market, especially if people know that you've been running stressful proof-of-spacetime on them, and if you're trying to fire-sale a petabyte of storage, chances are high that people will figure that out.