Crossing an AI Rubicon: Image Generation

(This is the second in a post series that starts here)

My story with Image Generation starts with DALL-E, and so I will start there. I then cover Stable Diffusion and Midjourney before heading into some thoughts. It’s hard to call what I have a conclusion, since I feel so utterly inconclusive about this technology. (Note: Many of the galleries below have captions and commentary)

DALL-E 2

A painting of single poplar tree in fall with leaves falling, lit just before golden hour, that evokes feelings of nostalgia and warmth. This was the prompt that gave me my first result from DALL-E that made me go "oh shit."

It's not a perfect painting; there are certainly some oddities… but this looked way better than it had any right to.

How did I get here?

I was creating slides for my CMPUT 229 class, and I was discussing one of my favourite assembly mnemonics, eieio, which always puts the song "Old MacDonald" in my head. The slide was a bit barren, so I thought it would be nice to have some art for this. I'd just been reading a bit about DALL-E, and so I signed up for an account, and after a bit of trial and error had an image I could use for my class.

“A fine art painting of a farmer holding a microprocessor”

The experience of playing with DALL-E was interesting. The prompts they display on the front page are often very simple things, producing surprisingly coherent results. In reality, excellent results seem to take a bit more effort than the simple prompts they propose — that or this is a question of luck, and access to many many generations for the same prompt.

DALL-E intrigued me heavily, so I played with it, up to the limit provided by their free credits. If you’re even remotely interested in this stuff, I’d encourage you to play with this as well. Even if you find the whole idea viscerally upsetting, it’s worth playing to figure out the strengths and weaknesses — and to be sure, there are weaknesses.

Of course, I opened this post being impressed: There certainly were a few results I found impressive. Even in failure, DALL-E often produced images that were nevertheless aesthetically pleasing (for example, I quite like the failed John Constable painting above).

Unfortunately, the limited credits that came for free with DALL-E limited my ability to explore these systems. I sought out other choices, and the obvious next thing to explore was…

Stable Diffusion

Stable Diffusion is an image generation system whose model has been released publicly; this has led to a number of implementations of the algorithms, and to apps that wrap everything up and make local generation possible.

My experience with Stable Diffusion has largely been that the results are not quite up to par with what DALL-E can provide. Partially this is because the model is optimized for producing 512x512 images, where DALL-E does 1024x1024. But more generally I’ve found that prompts that produce lovely results in DALL-E don’t produce results of nearly the same quality with Stable Diffusion.

Having said that, the ability to iterate has been interesting. I’ve played with two wrappers around Stable Diffusion: DiffusionBee and Draw Things AI (very powerful, but I’m not going to lie, the interface is baffling), as well as a Python library (the one that powers DiffusionBee, I think?).

Perhaps the most interesting thing I’ve found with these tools is the ability to play with parameters. For example, you can keep the same random seed but vary your prompt, to interesting effect:

Notice how the composition mostly stays the same; this is a side effect of the same starting seed. Using a command line version of Stable Diffusion, I have done a bit of larger scale experimentation with prompt changing while holding the seed still, producing some interesting effects:

“Still life of hydrangeas, artist born around X”, for X in [1400, 2025] in 25 year increments…
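A sweep like this can be sketched in a few lines of Python. This is only a minimal sketch of the idea, not the actual script used: `FIXED_SEED`, `build_prompts`, and `build_jobs` are hypothetical names, and the resulting (seed, prompt) pairs would be handed to whichever Stable Diffusion tool is in use.

```python
# Sketch of a fixed-seed prompt sweep. All names here are hypothetical;
# feed the (seed, prompt) pairs to whichever Stable Diffusion tool you use.
FIXED_SEED = 1234  # arbitrary value; what matters is that it never changes


def build_prompts(start=1400, stop=2025, step=25):
    """Return one prompt per birth year, from start to stop inclusive."""
    return [
        f"Still life of hydrangeas, artist born around {year}"
        for year in range(start, stop + 1, step)
    ]


def build_jobs(seed=FIXED_SEED):
    """Pair every prompt with the same seed so only the prompt varies."""
    return [(seed, prompt) for prompt in build_prompts()]


if __name__ == "__main__":
    for seed, prompt in build_jobs():
        print(seed, prompt)
```

Because the seed is identical for every job, any change in the output comes from the prompt alone, which is why the composition tends to stay put while the style drifts.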

Another interesting parameter exposed by these tools is the “guidance” parameter, which as I understand it controls how much the model tries to take your prompt into account. Using 0 (don’t care about my prompt) has produced some wild images:
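As a rough illustration of why 0 means "don't care about my prompt": Stable Diffusion samplers commonly use classifier-free guidance, where the model makes one noise prediction without the prompt and one with it, and the guidance value scales the difference between the two. The sketch below uses plain Python lists in place of real model outputs; `apply_guidance` and the toy numbers are made up for illustration, and individual wrappers may treat very low values differently.

```python
def apply_guidance(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), per element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for the model's two noise predictions (not real outputs).
uncond = [0.0, 1.0]  # prediction that ignores the prompt
cond = [1.0, 3.0]    # prediction that uses the prompt

print(apply_guidance(uncond, cond, 0.0))  # prompt ignored entirely
print(apply_guidance(uncond, cond, 1.0))  # plain prompt-conditioned prediction
print(apply_guidance(uncond, cond, 7.5))  # pushed well past the conditioned one
```

At a scale of 0 the prompt-conditioned prediction drops out completely, so the image is driven by the seed and the model's own priors.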

Midjourney

Midjourney is hard for me to write about, because I don’t understand it. It’s extremely clear they’re doing something clever, as Midjourney can often produce the most remarkable images from the simplest of prompts. Take a brief look through the Midjourney showcase, or look at these (deservedly!) New York Times Feature Article worthy images. Yet I have no idea how or why it works the way it does. I also find it painful to explore, as the interface (at least for free users) is a very noisy set of hundreds of channels on Discord; nothing like experimenting in public.

Despite the discomfort of working in public, it’s interesting to see what others produce. Some prompts are simple, some are complex, but I’m almost uniformly impressed by the results produced by Midjourney.

If I were an artist, Midjourney would be what scared me most: it’s clearly pulling modern styles from artists and reproducing them, sometimes with upsetting fidelity. When I showed Andrea the gallery, she said “it reminds me of my instagram feed”.

Someone described AI art as "discovery", which does feel at least a bit apt. Having said that, Midjourney has torqued itself incredibly to hit certain aesthetics with minimalist prompts.

Conclusions

It seems pretty clear that the ability to generate “good enough” art is going to have some very wide ranging impacts. As I said in my first post, the discussion of this is extremely challenging to separate from Capitalism. Some people are going to lose their jobs; more as these models get better. Will new jobs be created as a result? It seems to me that this is yet another automation that eliminates a class of jobs, making a smaller number of more valuable positions; another brick on the pedal of inequality.

I haven’t even touched on the questions of art and artistry here: Are the products of these systems art? Are prompt writers artists? Perhaps another post for another day…

Assorted Observations & Notes

  • My understanding of Stable Diffusion is that the model was trained on a data set released by LAION. There are a couple of tools to explore the data set used to train Stable Diffusion. I’ve played with this one, described here (note, there is NSFW content). Something that truly surprised me was the low quality of the captions. I had really expected that to provide good results the models would need excellent structured captions, yet it’s clearly not the case.

  • All these models thrive on the constraints provided by giving them an artist to ape. Look at galleries of AI-generated art, like the Midjourney Showcase, and you’ll see a good number of the prompts including artists by name, sometimes many of them. For some reason “by Van Gogh” doesn’t nauseate me nearly the way “by Greg Rutkowski” does: this may just be the question of Capitalism again. There are already horrifying stories of models trained on single artists.

  • In a sense, my feelings about these programs are not directly affected by how they’re implemented; yet I find myself compelled to figure more out. I have only a rough understanding at the moment of how these systems are trained and deployed.

  • These models are far from the end of this work; Google has Imagen, Imagen Video, and Imagen Editor baking. Impressive results. The section on “Limitations and Societal Impact” is a worthwhile read: “There are several ethical challenges facing text-to-image research broadly. We offer a more detailed exploration of these challenges in our paper and offer a summarized version here. First, downstream applications of text-to-image models are varied and may impact society in complex ways. The potential risks of misuse raise concerns regarding responsible open-sourcing of code and demos. At this time we have decided not to release code or a public demo.”

CMPUT 229 Haikus

I wanted to give students an opportunity to demonstrate a bit of creativity on the final, and give me a bit of a bright spot while marking, and so the final question on my final exam was the following:

For two points, write a haiku about Computer Architecture. As a reminder, the Haiku form has three lines, where the first and third line have five syllables, and the second line has seven syllables. As an example, here’s one ChatGPT wrote, poorly...

I had many excellent examples from my students, some of which made me laugh out loud. To avoid this becoming a whole poetry collection, I limited myself to five students, and asked each for permission to post their haiku.

Without further ado, in a random order:

With Tamed Lightning,

A world solemnly marches,

Keeping rhythmic time

by Benjamin Lehmann

Winter, sun shines not,

Though not halting cosmic rays

A bit flips, blue screen

by Nick Bjarnason

The Stack Canary

Sings a song to keep away

Those who come to play

by Saba Gul

Pipelines make it fast

But cause debilitating

headaches for students

by William Creaser

Never Code in C,

Not Good for Security,

Segmentation Fault

by Liam Houston

This was a nice little exercise in fun. Next final I’ll ask for permission to use them on the final itself, just to avoid having to bother students later!

Polybius: The Rise of the Roman Empire

After reading Thucydides, I was fascinated by the idea of contemporaneous historians; people writing histories of times they lived through, of events they had even participated in.

When hunting for another contemporaneous history, Polybius was the first choice that seemed to come up in my searches. Polybius was a Greek hostage of the Romans, of a peculiar sort, with a seemingly wide amount of freedom during his captivity that allowed him access to many high tiers of Roman society; he was active in both Greek and Roman politics in some of the times he wrote of in his history.

Despite the seeming alignment with what I was looking for, when I tried to read Polybius last time, I bounced off of him, and ended up reading Herodotus instead.

On a vacation last summer, we were in a wonderful used bookshop, and I picked up a few classical histories: Plutarch, another Xenophon (haven’t read yet), and The Rise of the Roman Empire, a selection from Polybius’ Histories by F. W. Walbank that uses Polybius’ discussions of Rome to chart its growth into the superpower it became.

Through this edition, I was able to get through Polybius. It required some effort, but I am glad I did it.

A forewarning: this blog post is filled with far more long quotes than any other. This is partially because in many cases, Polybius makes his case far better than any restatement I could make, and partially because I now can scan text via my phone, so I didn’t have to type out the quotes by hand.

The History

Can any one be so indifferent or idle as not to care to know by what means, and under what kind of polity, almost the whole inhabited world was conquered and brought under the dominion of the single city of Rome, and that too within a period of not quite fifty-three years? Or who again can be so completely absorbed in other subjects of contemplation or study, as to think any of them superior in importance to the accurate understanding of an event for which the past affords no precedent. -Plb 1.1.

Polybius believed that it was important to trace cause and effect through history, and so his history covers an enormous amount of ground. To keep it tractable, Walbank focuses The Rise of the Roman Empire on the parts of Polybius’ history that talk about Rome, its interactions with other nations, and the story of Carthage, as told via the Barca family of Hamilcar, Hasdrubal and Hannibal. It also interweaves some of Polybius’ own philosophy of both politics and history.

Polybius viewed history as an important education for statesmen, and saw history as valuable only insofar as it provided true stories of decision making that provide either positive or negative lessons to readers. He also thought it was critical that people who wrote history understood, by experience, the politics they would document:

There is, however, another category of authors, who appear to be justified in undertaking the writing of history, but who in fact are just like the theoretical doctors. They haunt the libraries and become thoroughly versed in memoirs and records, and then convince themselves that they are properly equipped for the task; but while they may appear to outsiders to bring everything that is needed to the writing of political history, yet in my opinion they provide no more than a part. Certainly the study of the memoirs of the past has its value for discovering what the ancients believed and the ideas which people formerly entertained about conditions, places, nations, states and events, and also for understanding the circumstances and eventualities with which each nation in earlier times had to deal. And certainly past events are relevant in making us pay attention to the future, provided that a writer inquires in each case into the facts as they actually occurred. But to persuade oneself, as Timaeus did, that the resources of documentary research alone can equip one to write an adequate history of recent events is naive beyond words. It is as though a man were to imagine that he was a capable painter, indeed a master of the art, merely by virtue of having looked at the works of the past. — Plb 12.25e

The narrative history was all new ground for me, covering the three Punic wars (though the Third Punic War was not covered in much depth). Honestly, I found that Polybius did a good job of achieving exactly the goal he set in the introduction: explaining how the Romans came to rule the known world in a very short period of time.

One of the things I love about reading history is the way it reminds me that human institutions are fragile, and change is always just around the corner.

Below, I cover two small sections of the history I found interesting, sharing some block quotes.

The Cycle of History

One of the ideas that Polybius is famous for is his notion of the ‘Rotation of the Polities’. He sees political systems as following in a natural cycle of succession. In his analysis, there are six different forms of government (or constitutions):

  1. Kingship
  2. Tyranny
  3. Aristocracy
  4. Oligarchy
  5. Democracy
  6. Mob rule

Each pair consists of a ‘good’ and a ‘bad’ version of the same thing:

The truth of what I have just said may be illustrated by the following arguments. We cannot say that every example of one-man rule is necessarily a kingship, but only those which are voluntarily accepted by their subjects, and which are governed by an appeal to reason rather than by fear or by force. Nor again can we say that every oligarchy is an aristocracy, but only those in which the power is exercised by the justest and wisest men, who have been selected on their merits. In the same way a state in which the mass of citizens is free to do whatever it pleases or takes into its head is not a democracy. But where it is both traditional and customary to reverence the gods, to care for our parents, to respect our elders, to obey the laws, and in such a community to ensure that the will of the majority prevails - this situation it is proper to describe as democracy. — Plb 6.4

From natural beginnings, political systems transition from Kingship to Tyranny, to Aristocracy, to Oligarchy, to Democracy, to Mob rule, and then finally a new king arises from the mob, and the cycle begins again.

Living in times where democracy feels fraught across the globe, it was interesting to read Polybius’ take on the decline of democracy (emphasis mine):

At this point the only hope which remains unspoiled lies with themselves, and it is in this direction that they then turn: they convert the state into a democracy instead of an oligarchy and themselves assume the superintendence and charge of affairs. Then so long as any people survive who endured the evils of oligarchical rule, they can regard their present form of government as a blessing and treasure the privileges of equality and freedom of speech. But as soon as a new generation has succeeded and the democracy falls into the hands of the grandchildren of its founders, they have become by this time so accustomed to equality and freedom of speech that they cease to value them and seek to raise themselves above their fellow-citizens, and it is noticeable that the people most liable to this temptation are the rich. So when they begin to hanker after office, and find that they cannot achieve it through their own efforts or on their merits, they begin to seduce and corrupt the people in every possible way, and thus ruin their estates. The result is that through their senseless craving for prominence they stimulate among the masses both an appetite for bribes and the habit of receiving them, and then the rule of democracy is transformed into government by violence and strong-arm methods. By this time the people have become accustomed to feed at the expense of others, and their prospects of winning a livelihood depend upon the property of their neighbours; then as soon as they find a leader who is sufficiently ambitious and daring, but is excluded from the honours of office because of his poverty, they will introduce a regime based on violence. After this they unite their forces, and proceed to massacre, banish and despoil their opponents, and finally degenerate into a state of bestiality, after which they once more find a master and a despot.
Such is the cycle of political revolution, the law of nature according to which constitutions change, are transformed, and finally revert to their original form. - Plb 6.9

Every kind of state, we may say, is liable to decline from two sources, the one being external, and the other due to its own internal evolution. For the first we cannot lay down any fixed principle, but the second pursues a regular sequence. I have already indicated which kind of state is the first to evolve, which succeeds it, and how each is transformed into its successor, so that those who can connect the opening propositions of my argument with its conclusion will be able to make their own forecast concerning the future. This, in my opinion, is quite clear. When a state, after warding off many great perils, achieves supremacy and uncontested sovereignty, it is evident that under the influence of long-established prosperity life will become more luxurious, and among the citizens themselves rivalry for office and in other spheres of activity will become fiercer than it should. As these symptoms become more marked, the craving for office and the sense of humiliation which obscurity imposes, together with the spread of ostentation and extravagance, will usher in a period of general deterioration. The principal authors of this change will be the masses, who at some moments will believe that they have a grievance against the greed of other members of society, and at others are made conceited by the flattery of those who aspire to office. By this stage they will have been roused to fury and their deliberations will constantly be swayed by passion, so that they will no longer consent to obey or even to be the equals of their leaders, but will demand everything or by far the greatest share for themselves. When this happens, the constitution will change its name to the one which sounds the most imposing of all, that of freedom and democracy, but its nature to that which is the worst of all, that is the rule of the mob. — Plb 6.57

“History doesn’t repeat, but it rhymes” feels appropriate here.

Views on Roman Superstition

I thought this passage was remarkable:

However, the sphere in which the Roman commonwealth seems to me to show its superiority most decisively is in that of religious belief.

Here we find that the very phenomenon which among other peoples is regarded as a subject for reproach, namely superstition, is actually the element which holds the Roman state together. These matters are treated with such solemnity and introduced so frequently both into public and into private life that nothing could exceed them in importance.

Many people may find this astonishing, but my own view is that the Romans have adopted these practices for the sake of the common people. This approach might not have been necessary had it ever been possible to form a state composed entirely of wise men. But as the masses are always fickle, filled with lawless desires, unreasoning anger and violent passions, they can only be restrained by mysterious terrors or other dramatizations of the subject. For this reason I believe that the ancients were by no means acting foolishly or haphazardly when they introduced to the people various notions concerning the gods and belief in the punishments of Hades, but rather that the moderns are foolish and take great risks in rejecting them. — Plb 6.56

There’s a modern-feeling cynicism about religion here; he talks about Roman religion as if it exists only as a tool, instilled in the Roman people by the powerful as a means to an end. He might even be right, and yet, in this he reminds me of a dark version of Marx, praising the Roman opioid as a key to social control.

The Edition

I read the Penguin Classics edition of The Rise of the Roman Empire by Polybius, selected by F.W. Walbank, and translated by Ian Scott-Kilvert (the same one who translated the Plutarch I read).

The introduction was good and did an excellent job of introducing Polybius’ work, or at least the selection Walbank made. I’m not sure I would have read the book had it not been for the quality of the introduction. Each chapter led with a good introductory note, setting the stage and providing historical context, which was very helpful.

Last time I read Polybius, I accused him (or the translation) of being stilted and awkward. This time, knowing how much I appreciated the translation of Plutarch by Scott-Kilvert, I suspect the blame lies with Polybius. (During the writing of this blog post, I also used Tufts’ Perseus Digital Library, which has the Evelyn S. Shuckburgh translation; comparing the two really showed how much I appreciate Scott-Kilvert’s translation skills and his use of a more modern vernacular.)

I still found him to be stilted for the early chapters this time around, but over time I adapted, or he became a better writer and it bothered me appreciably less.

Despite the promise of the edition, it was overall one of the weakest classic history editions I’ve read:

  • The footnotes were of strongly varying quality: some useful, some clearly superfluous or misplaced, and no notes to fill in the missing pieces. While I don’t think Polybius needed the very detailed notation that I had in the edition of Thucydides that I read, I was nevertheless let down often by this edition’s supplemental information.
  • The maps are atrocious. They are almost entirely unusable through huge swathes of the book. Admittedly, I’m hugely spoiled from reading the Landmark Ancient Histories, which are absolutely lousy with excellent maps, but the maps in this book were just trash.
  • There’s a diagram of a Roman camp, which is trying to help you visualize what Polybius describes in one chapter… but all the labels are in untranslated Latin, which makes it a pain to decipher.