The physical process that powers a new type of generative AI

OK, so switching to an inverse-square (1/x^2) field, instead of heat-type diffusion, has some advantages. Interesting. There are lots of other functions to try; look at the history of activation functions: sigmoids, ramps, thresholds.
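To make the inverse-square idea concrete, here's a minimal numpy sketch of the intuition: treat each training point as a point charge and follow the resulting field direction. This is only an illustration of the 1/r^2-style falloff, not the article's exact formulation (which works in an augmented dimension and normalizes differently); `poisson_field` and `eps` are names I've made up for the sketch.

```python
import numpy as np

def poisson_field(x, data, eps=1e-8):
    """Unit direction of an empirical inverse-square-style field at x.

    Each row of `data` acts like a point charge; in d dimensions the
    Coulomb-like contribution falls off as 1/r^(d-1), which is what
    dividing the displacement by r**d gives (unit vector times 1/r^(d-1)).
    """
    diff = x - data                                   # (n, d) displacements
    r = np.linalg.norm(diff, axis=1, keepdims=True) + eps
    field = (diff / r**data.shape[1]).sum(axis=0)     # superpose all charges
    return field / (np.linalg.norm(field) + eps)      # direction only
```

Sampling then amounts to integrating along these field lines from far away back toward the data, analogous to reversing the noising process in heat-type diffusion.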


Does using a physical process as the model by which the network learns and infers create an opportunity to implement the network physically? In this case it seems like it could be an interesting way to build things like modulators and demodulators, amplifiers, noise filters, etc.


The model has little to do with the electric field and much more to do with the probabilistic mechanics that underlie observable phenomena. The fact that such phenomena are observable, stable, and repeatable means the probabilistic models underlying them are special. But the generative AI process has nothing to do with electricity or thermodynamics. It's cool that insights from physics are useful in statistics (a field with a rich history of such examples).


This reminds me of Boltzmann Machines:

I'm having trouble finding a good introduction, but if I remember correctly, BMs started from the Hebbian premise that neurons that fire together wire together. It can be easier to visualize that process than to think about gradient descent.
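The "fire together, wire together" intuition can be written as a one-line weight update. This is just the bare Hebbian step, not the full Boltzmann machine learning rule (which contrastively subtracts the same statistic collected while the model free-runs); `hebbian_step` is a name invented for this sketch.

```python
import numpy as np

def hebbian_step(w, v, lr=0.1):
    """One Hebbian update on a symmetric weight matrix.

    `v` is a vector of unit activations (0/1). The outer product
    raises w[i, j] exactly when units i and j are co-active, i.e.
    units that fire together get wired together.
    """
    w = w + lr * np.outer(v, v)
    np.fill_diagonal(w, 0.0)   # no self-connections, as in a Boltzmann machine
    return w
```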

Since all NNs can be reduced to a matrix taking an input vector and returning an output vector, I think we often get lost in the weeds arguing about the merits of various types. IMHO it's more useful to compare different classes of machine learning instead.
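For what it's worth, the "vector in, vector out" view is almost literal, with one caveat: the nonlinearities between layers are what keep the whole thing from collapsing into a single matrix. A minimal sketch (names are my own, not from any comment above):

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """A two-layer net as 'input vector -> output vector'.

    Without the ReLU in the middle, the composition would literally
    reduce to one matrix, W2 @ W1, plus a bias.
    """
    h = np.maximum(0.0, W1 @ x + b1)   # ReLU hidden layer
    return W2 @ h + b2
```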

Honestly, LLMs have strayed so far from what I thought was going to happen back in the late 90s that I'm still not sure how I feel about all of this. I think that our brains evolved more like an ant colony or mycelium in a forest, where countless small individuals form ever-more complex networks through mutual aid and evolution. What we think of as consciousness is already present in everything, and the network just happens to reach a level of complexity high enough for that source to interact with our 3D universe (like a radio antenna).

Computing clusters could have achieved that decades ago, but we went with SIMD video cards that can only run one algorithm per agent, not MIMD hardware that could evolve each agent individually and run, say, 100 billion agents simultaneously, like the neurons in our brain.

And the fact that AI is going in a SAAS/subscription direction really concerns me. I don't know if there's time now to try these other algorithms before 1 or 2 companies dominate the industry under some kind of New World Order. Maybe it's just me though hah.


I can picture a PKD novel about God being a subscription-based AI: perhaps "salvation" requires monthly "tithing" or else you get cut off from everlasting life (or the real grease under the hood: social connections and power). Perhaps you could subscribe to various differing AI gods and hedge your bets, as it were, through a costly polytheism. Perhaps the social power goes to those who donate the most... almost like a Scientology-type religion, where those who buy in get more compute power to enact their goals.

Apologies for the derail.


As WEF states it: Every product is a service waiting to happen.


Has anyone used any of the generative models mentioned in the article? I didn't see any images or direct comparisons of the outputs with current diffusion models.

@GaggiX You can see some results in the paper.


That’s beautiful.

Wishful thinking that quantum computing will find some kind of application in this…


Unfortunately, unlike in NLP, where swapping out decoding techniques frequently leads to far better generation (e.g. truncation sampling like top-p/top-k for text generation, beam search for seq2seq tasks), this is not necessarily the case for diffusion-based image generators.
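For readers unfamiliar with the NLP side, top-p (nucleus) filtering is simple enough to sketch in a few lines: keep the smallest set of tokens whose cumulative probability reaches p, then renormalize. This is a generic illustration, not any particular library's implementation; `top_p_filter` is a made-up name.

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Nucleus (top-p) filter over a probability vector.

    Sorts tokens by probability, keeps the smallest prefix whose
    cumulative mass reaches p, zeroes the rest, and renormalizes.
    """
    order = np.argsort(probs)[::-1]          # most likely first
    csum = np.cumsum(probs[order])
    cutoff = np.searchsorted(csum, p) + 1    # smallest prefix with mass >= p
    keep = order[:cutoff]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()
```

Sampling from the filtered distribution trims the unreliable low-probability tail, which is where the big quality wins in text generation come from; the point above is that diffusion samplers don't see gains of the same magnitude.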

Despite many methods being mathematically "superior" to the traditional Euler-style sampler, it remains the default setting for a reason in all the major UIs (Automatic1111, Comfy). I have a lot of fun playing with other samplers, especially ones that converge and thus allow large sampling steps, but the impact is not as massive here as it is in NLP. We may be overthinking this side of the pipeline and not thinking enough about other important things (e.g. mixing in other losses besides the regular diffusion loss for more control, similar to ControlNet).