Diffusionary | AI Supremacy via Txt2Img
I’ve been working on a Stable Diffusion model based on public domain Life magazine covers; last week I had some neat examples from the black and white print era of the late 1890s and early 1900s. This week I incorporated two more decades’ worth of colour covers from the 1910s and 1920s. I think this ended up being too ambitious in terms of the number of styles I added into the mix; the results from the model aren’t quite coherent enough, so I will be going back to the drawing board. I may end up releasing one model specifically for black and white drawings and another for colour paintings. My takeaway about source material for models: sometimes it’s better to have fewer, higher-quality sources than a larger set of mediocre or dissimilar ones. Here are a few of the ok-ish/weird/funny samples from the model:

[Sample images from the model]
One of the major news items this past week was that AI pioneer Geoffrey Hinton left Google so that he could speak freely about the dangers of AI. I think it’s clearly dangerous for humans to create an entity that’s “smarter” than humans, but I don’t foresee us getting close to that in the near future. I also wonder if we give precedence to language models like ChatGPT in terms of thinking about sentience because the inputs and outputs are language-based and many people equate language with thought.
If we consider how current LLMs operate, there’s a simple circuit: we give them a text prompt and they respond. The LLM is in an inert state before this and afterwards. It’s not thinking of devious ways to take over the world; it’s in the same state as any simple program that waits for input before doing work. LLMs also have a kind of memory buffer (the context window) that lets them retain context, so that they can carry on a conversation over a series of prompts, making corrections or additions to a request. This memory buffer is a natural safety mechanism, because once it’s exhausted, those contextual memories disappear and the model resets back to its base state. If you wanted to create safety protocols for LLM-based AIs, restricting the size of that memory would be a good starting point.
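To make that concrete, here’s a minimal sketch in Python of a capped conversation buffer. The `generate` function is a hypothetical stand-in for any stateless LLM call, and the turn-based budget is a crude proxy for a real token limit; none of the names here come from a particular API:

```python
# Minimal sketch of a capped conversation buffer. `generate` is a
# hypothetical stand-in for any stateless LLM call: the model only
# "knows" whatever you resend in the prompt.

MAX_TURNS = 8  # crude proxy for a token budget / context window


def generate(prompt: str) -> str:
    """Placeholder for a real model call. The model is inert between calls."""
    return f"(model reply to: {prompt[-40:]})"


history: list[str] = []


def chat(user_message: str) -> str:
    history.append(f"User: {user_message}")
    # The entire "memory" is whatever fits in the window; older turns
    # silently fall out, and the model drifts back to its base behaviour.
    window = history[-MAX_TURNS:]
    reply = generate("\n".join(window))
    history.append(f"Assistant: {reply}")
    return reply
```

Shrinking MAX_TURNS (or the token budget it stands in for) directly limits how much context the model can ever accumulate.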
The other thing I wonder about is why nobody is concerned that image generators like Stable Diffusion will spontaneously gain sentience. The thought is kind of absurd, I agree, but our interactions with LLMs and our interactions with text-to-image generators are structured very similarly. It’s just that image generators produce inert pictures, while LLMs produce language, and language is so fused with our sense of our own cognitive capabilities that a language output more readily seems to us like a form of thinking. I know that very large language models produce unexpected emergent behaviours, but I think we’re playing a cognitive trick on ourselves when we equate these with the emergence of some form of sentience.
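To illustrate how similar the two interactions are, here’s a hedged sketch using the Hugging Face transformers and diffusers libraries; the model IDs are illustrative, and the hardware assumption (a CUDA GPU) is mine:

```python
# Both interactions are stateless functions from a prompt to an output.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

llm = pipeline("text-generation", model="gpt2")
txt2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "A Life magazine cover from 1923"
text_out = llm(prompt)[0]["generated_text"]  # language out
image_out = txt2img(prompt).images[0]        # pixels out
# Same circuit either way; only the output modality differs.
```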
In other news, DeepFloyd IF was released; it’s a modular, multi-stage diffusion model that generates a very small (64×64) image and then passes it through a series of upscalers. It looks amazing but requires at least 16GB of VRAM.
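For a sense of how the cascade fits together, here’s a rough sketch based on the diffusers integration of DeepFloyd IF; the model IDs and arguments follow its documentation but may have changed since:

```python
import torch
from diffusers import DiffusionPipeline

prompt = "a vintage magazine cover, oil painting"

# Stage I: text -> 64x64 base image
stage_1 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)
stage_1.enable_model_cpu_offload()  # helps fit the large VRAM footprint
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
image = stage_1(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    output_type="pt",
).images

# Stage II: 64x64 -> 256x256, conditioned on the same prompt embeddings
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None,
    variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()
image = stage_2(
    image=image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    output_type="pt",
).images
# A third, generic x4 upscaler then takes this up to 1024x1024.
```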
Researchers at Google documented a large speed boost and substantial VRAM savings when running diffusion models on-device, including on cell phones.
OpenLLaMa is a permissively licensed reproduction of Meta’s LLaMa LLM. LLaMa’s license is restrictive in the sense that the model can’t be used commercially, so this seems like a good potential replacement.
A gigantic thread of tips and tricks for running Stable Diffusion via the Automatic1111 web UI.
The code for Segment Everything Everywhere All at Once (SEEM) has been released. SEEM lets you segment any part of an image so that it can be masked or otherwise altered.
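As an example of that masking workflow, here’s a hedged sketch: SEEM’s own API isn’t shown, and I’m assuming `mask.png` is a binary mask exported from it (or any segmenter), with the editing step handled by the standard diffusers inpainting pipeline and illustrative file names:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init = Image.open("cover.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))  # white = region to alter

result = pipe(
    prompt="a bouquet of flowers",
    image=init,
    mask_image=mask,
).images[0]
result.save("edited.png")
```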
A post that details training a base model for Stable Diffusion from scratch at a cost of $50,000.
A very cool-looking library that lets you create a 3D object from a single image.
Until next week!