Stable Diffusion 2.0

hardmaru 12d
stability.ai

Comments

rogers18445 12d
They apparently tried to combat NSFW generation by filtering NSFW images out of the training dataset.
minimaxir 12d
GitHub Repo: https://github.com/Stability-AI/stablediffusion

HuggingFace Space (currently overloaded unsurprisingly): https://huggingface.co/spaces/stabilityai/stable-diffusion

Doing a 2.0 release on a (US) 2-day holiday weekend is an interesting move.

It seems a tad more difficult to set up the model than the previous version.

LASR 12d
I am a solo dev working on a creative content creation app to leverage the latest developments in AI.

Demoing even v1 of Stable Diffusion to non-technical users blows them away completely.

Now that v2 is here, it’s clear we can’t build products fast enough to take advantage of it.

The general public is still blown away by autosuggest in mobile OS keyboards. Very few really know how far AI tech has evolved.

Huge market opportunity for folks wanting to ride the wave here.

This is exciting for me personally, since I can keep plugging newer and better versions of these models into my app and it keeps getting better.

Even some of the tech folks I demo my app to are simply amazed that I can manage to do this solo.

liuliu 12d
It seems the structure of the UNet hasn't changed other than the text-encoder input width (768 to 1024). The biggest change is the text encoder itself, switched from ViT-L/14 to ViT-H/14 and fine-tuned based on https://arxiv.org/pdf/2109.01903.pdf.

It seems the 768-v model, if used properly, can substantially speed up generation, but I'm not exactly sure yet. Switching to the 512-base model for my app next week looks straightforward.
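
A quick way to sanity-check the wider conditioning once the diffusers port is up (assuming it lands as "stabilityai/stable-diffusion-2"; I haven't verified this yet):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
)
print(pipe.unet.config.cross_attention_dim)   # expect 1024 for v2 (768 in v1.x)
print(pipe.text_encoder.config.hidden_size)   # width of the OpenCLIP ViT-H/14 text tower
```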

wyldfire 12d
This one's publicly downloadable? I think I must've missed 1.5. It had been postponed for a while (for good reasons discussed throughout threads here) and I didn't notice whether it had been released.
natch 12d
> greatly improves the quality of the generated images compared to earlier V1 releases

Not to look a gift horse in the mouth, but this line looks a lot like intentional ambiguity. I’m going to assume it does not improve quality much at all compared to later V1 releases.

wyldfire 12d
depth2img looks really interesting. I was thinking that someone should train an art model like SD on 3d models+textures. This isn't quite that but it seems like it gets some of that effect.
WatchDog 12d
Are the GPU memory requirements different for this release?

Is it now possible to generate higher resolution images with less memory?

satvikpendem 12d
Looks good. I've gotten bored with AI image generation lately, though, after using SD a lot over the past few months. I suppose that's the hedonic treadmill in action.
knicholes 12d
Does anyone have a good source for all sorts of prompts for image generation?
kmeisthax 12d
Is there a good explanation of how to train this from scratch with a custom dataset[0]?

I've been looking around the documentation on Huggingface, but all I could find was either how to train unconditional U-Nets[1], or how to use the pretrained Stable Diffusion model to process image prompts (which I already know how to do). Writing a training loop for CLIP manually wound up with me banging against all sorts of strange roadblocks and missing bits of documentation, and I still don't have it working. I'm pretty sure I also need some other trainables at some point, too.

[0] Specifically, Wikimedia Commons images in the PD-Art-100 category, because the images will be public domain in the US and the labels CC-BY-SA. This would rule out a lot of the complaints people have about living artists' work getting scraped into the machine; and probably satisfy Debian's ML guidelines.

[1] Which actually does work
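
For reference, the text-conditioned training step, as far as I've been able to piece it together from the diffusers building blocks, looks roughly like this. I'm not confident it's complete (no EMA, LR schedule, data plumbing, or multi-GPU handling), and the model id is just a placeholder for wherever the weights come from:

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "stabilityai/stable-diffusion-2"  # placeholder; could also be random init

# Frozen pieces: VAE and text encoder. Only the UNet gets trained here.
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").eval()
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").eval()
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def training_step(pixel_values, captions):
    """One step: encode image + caption, add noise, predict it, MSE loss."""
    with torch.no_grad():
        latents = vae.encode(pixel_values).latent_dist.sample() * 0.18215
        tokens = tokenizer(captions, padding="max_length", truncation=True,
                           max_length=tokenizer.model_max_length, return_tensors="pt")
        text_embeds = text_encoder(tokens.input_ids)[0]

    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    # The 512-base model predicts the added noise; the 768-v checkpoint uses
    # v-prediction, so the target would differ there.
    pred = unet(noisy_latents, timesteps, encoder_hidden_states=text_embeds).sample
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```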

1024core 12d
What's the bare minimum hardware required to generate images with this model? Can I do something with an 8GB 980? Probably not..? What about CPU only?
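
For what it's worth, the tricks people use with v1 via diffusers look roughly like this; no idea yet whether an 8 GB card handles the 768x768 v2 checkpoints, and old Maxwell cards run fp16 slowly even when it fits:

```python
import torch
from diffusers import StableDiffusionPipeline

# fp16 halves the weight memory; attention slicing trades a bit of speed for
# a much lower peak during the attention layers.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
)
pipe.enable_attention_slicing()
pipe = pipe.to("cuda")
image = pipe("a watercolor painting of a fox").images[0]

# CPU-only also runs (keep float32 there); expect minutes per image, not seconds.
```
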
88stacks 12d
Awesome, I’ve put Stable Diffusion on an API to train a model for anyone to use for free. I’m adding 2.0 to it as we speak! https://88stacks.com
vsskanth 12d
Can this transform an image into a vector illustration?
coldblues 12d
I just thought about this, so bear in mind that I don't know much about the technical implications, but:

Couldn't we train a very good model by distributing the dataset along with the computing power using something similar to folding@home?

prawn 12d
Hopefully related: If I'm a photographer wanting to improve the resolution of my content for printing, what's my current best bet for upscaling?

Is it realistic to make use of this on the command line, feeding it my own images? Or has someone wrapped it in an app or online service?
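
From skimming the announcement, the scripted route I'm hoping is realistic looks roughly like this, assuming the new 4x upscaler's diffusers port ends up as "stabilityai/stable-diffusion-x4-upscaler" (filenames are mine). One caveat: it's a generative upscaler, so it invents plausible detail rather than faithfully recovering it, which may or may not be acceptable for prints.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("scan_small.jpg").convert("RGB")   # placeholder filename
upscaled = pipe(prompt="a sharp, detailed photograph", image=low_res).images[0]
upscaled.save("scan_4x.png")
```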

sorenjan 12d
I've seen references to merging models together to be able to generate new kinds of imagery or styles; how does that work? I think you use Dreambooth to make specialized models, and I have a rough idea that it basically assigns a name to a vector in the latent space representing the thing you want to generate new imagery of, but can you make multiple models and blend them together?
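
From what I've gathered so far, the "merging" part is usually just a weighted average of the two checkpoints' weights, something like this sketch (paths are made up, and it presumably only makes sense when both models share an architecture; quality of the blend seems hit-and-miss):

```python
import torch

def merge_checkpoints(path_a, path_b, alpha=0.5, out_path="merged.ckpt"):
    # Plain weighted average of the two state dicts (the usual community
    # "checkpoint merge"); only float tensors get blended.
    a = torch.load(path_a, map_location="cpu")["state_dict"]
    b = torch.load(path_b, map_location="cpu")["state_dict"]
    merged = {}
    for k, v in a.items():
        if k in b and torch.is_floating_point(v):
            merged[k] = alpha * v + (1.0 - alpha) * b[k]
        else:
            merged[k] = v  # keep non-float buffers and A-only keys untouched
    torch.save({"state_dict": merged}, out_path)

merge_checkpoints("dreambooth_style_a.ckpt", "dreambooth_style_b.ckpt", alpha=0.5)
```
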
radu_floricica 12d
Speaking of business models for AI, and the fact that Stable Diffusion is anti-trained for porn. Somebody with an old terabyte image porn collection right now: "Hold my beer, my time has come!"
imran-khan 12d
To put things in perspective, the dataset it's trained on is ~240TB, and Stability has over 4,000 Nvidia A100s (each much faster than a 1080 Ti). Without those ingredients, you're highly unlikely to get a model that's worth using (it'll produce mostly useless outputs).

That argument also makes little sense when you consider that the model is a couple gigabytes itself, it can't memorize 240TB of data, so it "learned".

But if you want to create custom versions of SD, you can always try out dreambooth: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion; that one is actually feasible without spending millions of dollars on GPUs.

NKosmatos 12d
Wow, just wow!

Newbie question, why can’t someone just take a pre-trained model/network with all the settings/weights/whatever and run it on a different configuration (at a heavily reduced speed)?

Isn’t it like a Blender/3D Studio/AutoCAD file, where you can take the original 3D model and then render it using your own hardware? With my single GPU it will take days to raytrace a big scene, whereas someone with multiple higher-specced GPUs will need only a few minutes.

kennyloginz 12d
This reminds me a bit of the rash of Thomas Kinkade storefronts during the 90s.

Nothing personal against the work, I think it’s brilliant, and cheap. Just like a Kinkade.

s1k3s 12d
Is there any place where we can learn more about all these AI tools that keep popping up, without the marketing speak? Also, I see the words 'open' and 'open source', and yet they all require me to sign up to some service, join some beta program, buy credits, etc. Are they open source?
rahul_nyc 12d
Perfect, this is exciting for me personally since I’ve been using Stable Diffusion + Dreambooth to develop this service[1] for generating AI images of people.

One thing I’m wondering is what kinds of applications it can be used for. Maybe there will be new experiences in the fashion industry, like people training on their clothing designs and seeing how they look on people. Maybe they won’t need to hire models to do the modelling?

[1] - https://PicasaAI.com (Founder)

Scandiravian 12d
What's the potential of using this for image restoration? I've been looking into this recently as I've found a ton of old family photos that I'd like to digitize and repair some of the damage on.

There are a lot of tools available, but I haven't found anything where the result isn't just another kind of bad, so if the upscaling and inference in this model is good, it should in theory be possible to restore images by using the old photos as the seed, right?
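
The approach I was planning to test is plain img2img with the scan as the init image and a low strength, so the model mostly re-renders texture rather than re-composing the scene. Roughly like this (model id, filenames, and the strength value are guesses, and argument names may vary between diffusers versions):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")

scan = Image.open("old_family_photo.jpg").convert("RGB")  # placeholder filename
restored = pipe(
    prompt="a clean, undamaged vintage family photograph",
    image=scan,
    strength=0.3,        # low strength keeps it close to the original scan
    guidance_scale=7.0,
).images[0]
restored.save("restored.png")
```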

myrloc 12d
“Adoption” is a generous term for a graph of GitHub stars (referring to the first graph). There’s no denying Stable Diffusion has been gaining popularity, but I think it’s hard to say it’s really being adopted at the same rate it’s getting starred on GitHub.
ghaff 12d
I suspect that, if many of the people whining about copyrights in the context of generative AI got their way and made this usage a violation, they wouldn't be happy with the knock-on effects.
in3d 12d
In addition to removing NSFW images from the training set, this 2.0 release apparently also removed commercial artist styles and celebrities. While it should be possible to fine tune this model to create them anyway using DreamBooth or a similar approach, they clearly went for the safe route after taking some heat.
tormeh 12d
I can't see any progress on AMD/Intel GPU support :( Would love to see Vulkan or at least ROCm support. With SD1 you could follow some guides online to make it work, since PyTorch itself supports ROCm, but the state of non-Nvidia GPU support in the DL space is quite sad.
spcebar 12d
Well darn. This is an awesome leap, but I've spent the last few months making a card game using Stable Diffusion art and I guess now I need to go back and go over everything again. Congratulations to the SD team on another wonderful step forward!
acidburnNSA 12d
Awesome. I'm installing on Ubuntu 22.04 right now.

Ran into a few errors with the default instructions related to CUDA version mismatches with my Nvidia driver. Now I'm trying without conda at all. Made a venv. I upgraded to the latest driver Ubuntu provides and then downloaded and installed the appropriate CUDA from [1].

That got me farther. Then I ran into the fact that the xformers binaries from my earlier attempts are now incompatible with my current drivers and CUDA, so I'm rebuilding that one. I'm in the 30-minute compile, but did the `pip install ninja` as recommended by [2] and it's running on a few of my 32 threads now. Ope! Done in 5 mins. Test info from `python -m xformers.info` looks good.

Damn, still hitting CUDA out-of-memory issues. I knew I should have bought a bigger GPU back in 2017. Everyone says I have to downgrade PyTorch to 1.12.1 for this to not happen. But oh dang, that was compiled with a different CUDA, oh groan. Maybe I should get conda to work after all.

`torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 5.93 GiB total capacity; 5.62 GiB already allocated; 15.44 MiB free; 5.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF`

Guess I better go read those docs... to be continued.

[1] https://developer.nvidia.com/cuda-downloads?target_os=Linux&...

[2] https://github.com/facebookresearch/xformers
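
For anyone following along, the knob that error message points at looks like this from Python; it has to be in the environment before PyTorch sets up its CUDA allocator, the value is trial and error, and on a ~6 GiB card half-precision weights probably matter more anyway:

```python
import os
# Must be set before torch creates its CUDA allocator, so before the import.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # value: trial and error

import torch  # imported after the env var on purpose
```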

nessus42 9d
There's nothing in the release notes that says whether 2.0 can do hands without a 99% chance of producing deformed results.
nl 12d
Highlights:

768x768 native models (v1.x maxed out at 512x512)

a built-in 4x upscaler: "Combined with our text-to-image models, Stable Diffusion 2.0 can now generate images with resolutions of 2048x2048–or even higher."

Depth-to-Image Diffusion Model: "infers the depth of an input image, and then generates new images using both the text and depth information." Depth-to-Image can offer all sorts of new creative applications, delivering transformations that look radically different from the original but which still preserve the coherence and depth of that image (see the demo gif if you haven't looked)

Better inpainting model

Trained with a stronger NSFW filter on training data.

For me the depth-to-image model is a huge highlight and something I wasn't expecting. The NSFW filter is a non-issue (it's trivially easy to fine-tune the model on porn if you want, and porn collections are surprisingly easy to come by...).

The higher resolution features are interesting. HuggingFace has got the 1.x models working for inference in under 1G of VRAM, and if those optimizations can be preserved it opens up a bunch of interesting possibilities.
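
If you want to poke at depth-to-image yourself, my guess at what the diffusers call will look like (assuming it ships as "stabilityai/stable-diffusion-2-depth"; filename and prompt are placeholders): depth gets estimated from the init image and used as extra conditioning, which is why the layout survives even a drastic restyling.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

init = Image.open("room_photo.jpg").convert("RGB")  # placeholder filename
out = pipe(prompt="the same room as a cozy wood-panelled cabin",
           image=init, strength=0.8).images[0]
out.save("restyled.png")
```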