Legends and Lattes
Legends and Lattes was exactly the book I needed at the time I needed it.
A fantasy book where the stakes are far from the end of the world, with a loving centre and an enjoyable ride.
I burned through it in about two days; now I wish there were a half dozen more like it.
Crossing an AI Rubicon: Image Generation
(This is the second in a post series that starts here)
My story with Image Generation starts with DALL-E, and so I will start there. I then cover Stable Diffusion and Midjourney before heading into some thoughts. It’s hard to call what I have a conclusion, since I feel so utterly inconclusive about this technology. (Note: Many of the galleries below have captions and commentary)
DALL-E 2
A painting of single poplar tree in fall with leaves falling, lit just before golden hour, that evokes feelings of nostalgia and warmth. This was the prompt that gave me my first result from DALL-E that made me go "oh shit."
It's not a perfect painting; there are certainly some oddities… but this looked way better than it had any right to be.
How did I get here?
I was creating slides for my CMPUT 229 class, and I was discussing one of my favourite assembly mnemonics, eieio, which always puts the song "Old Macdonald" in my head. The slide was a bit barren, so I thought, it would be nice to have some art for this. I'd just been reading a bit about DALL-E, and so I signed up for an account, and after a bit of trial and error had an image I could use for my class.
“A fine art painting of a farmer holding a microprocessor”
The experience of playing with DALL-E was interesting. The prompts they display on the front page are often very simple things, producing surprisingly coherent results. In reality, excellent results seem to take a bit more effort than the simple prompts they propose — that or this is a question of luck, and access to many many generations for the same prompt.
DALL-E intrigued me heavily, so I played with it, up to the limit provided by their free credits. If you’re even remotely interested in this stuff, I’d encourage you to play with this as well. Even if you find the whole idea viscerally upsetting, it’s worth playing to figure out the strengths and weaknesses — and to be sure, there are weaknesses.
Of course, I opened this post being impressed: There certainly were a few results I found impressive. Even in failure, DALL-E often produced images that were nevertheless aesthetically pleasing (for example, I quite like the failed John Constable painting above).
Unfortunately, the limited credits that came for free with DALL-E limited my ability to explore these systems. I sought out other choices, and the obvious next thing to explore was…
Stable Diffusion
Stable Diffusion is an image generation system whose model has been released publicly; this has led to a number of implementations of the algorithms, and to apps that wrap everything up, making it possible to do local generation.
My experience with Stable Diffusion has largely been that the results are not quite up to par with what DALL-E can provide. Partially this is because the model is optimized for producing 512x512 images, where DALL-E does 1024x1024. But more generally I’ve found that prompts that produce lovely results in DALL-E don’t produce results of nearly the same quality with Stable Diffusion.
Having said that, the ability to iterate has been interesting. I’ve played with two wrappers around Stable Diffusion, DiffusionBee and Draw Things AI (very powerful, but I’m not going to lie, the interface is baffling), as well as a Python library (the one that powers DiffusionBee I think?).
Perhaps the most interesting thing I’ve found with these tools is the ability to play with parameters. For example, you can keep the random seed the same, but vary your prompt, to interesting effect:
Notice how the composition mostly stays the same; this is a side effect of the same starting seed. Using a command line version of Stable Diffusion, I have done a bit of larger scale experimentation with prompt changing while holding the seed still, producing some interesting effects.
“Still life of hydrangeas, artist born around X”, for X in [1400, 2025] in 25 year increments…
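A rough sketch of how a sweep like this could be set up in Python. The seed value and the `build_sweep` helper are made up for illustration, and the actual image generation call is left out, since it depends on which Stable Diffusion tool you are driving:

```python
FIXED_SEED = 1234  # hypothetical seed; any value works as long as it never changes


def build_sweep(start=1400, stop=2025, step=25, seed=FIXED_SEED):
    """Return (seed, prompt) pairs, one per birth year, all sharing one seed."""
    years = range(start, stop + 1, step)  # stop + 1 so that 2025 is included
    return [
        (seed, f"Still life of hydrangeas, artist born around {year}")
        for year in years
    ]


if __name__ == "__main__":
    for seed, prompt in build_sweep():
        # Each pair would be passed to the image generator of your choice.
        print(seed, prompt)
```

Because only the year in the prompt changes between runs, any difference between the resulting images comes from the prompt rather than from the starting noise.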
Another interesting parameter exposed by these tools is the “guidance” parameter, which as I understand it controls how much the model tries to take your prompt into account. Using 0 (don’t care about my prompt) has produced some wild images:
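As far as I can tell, this parameter is usually implemented as classifier-free guidance: the model makes one noise prediction with the prompt and one without, and the guidance value decides how far to push from the second towards the first. A toy sketch of that blend, with plain lists standing in for the real tensors (the function name and the numbers are made up for illustration):

```python
def apply_guidance(uncond, cond, scale):
    """Blend two noise predictions: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, 2.0]  # toy prediction that ignores the prompt
cond = [1.0, 1.0, 4.0]    # toy prediction that uses the prompt

print(apply_guidance(uncond, cond, 0.0))  # scale 0: the prompt has no influence
print(apply_guidance(uncond, cond, 1.0))  # scale 1: the plain prompted prediction
print(apply_guidance(uncond, cond, 7.5))  # larger scales push past the prompted prediction
```

Under this reading, a guidance of 0 leaves only the unprompted prediction, which would explain why the images stop having anything to do with the prompt.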
Midjourney
Midjourney is hard for me to write about, because I don’t understand it. It’s extremely clear they’re doing something clever, as Midjourney can often produce the most remarkable images from the simplest of prompts. Take a brief look through the Midjourney showcase, or look at these (deservedly!) New York Times Feature Article worthy images. Yet I have no idea how or why it works the way it does. I also find it painful to explore, as the interface (at least for free users) is a very noisy set of hundreds of channels on Discord; nothing like experimenting in public.
Despite the discomfort of working in public, it’s interesting to see what others produce. Some prompts are simple, some are complex, but I’m almost uniformly impressed by the results produced by Midjourney.
If I were an artist, Midjourney would be what scared me most: it’s clearly pulling modern styles from artists and reproducing them, sometimes with upsetting fidelity. When I showed Andrea the gallery, she said “it reminds me of my Instagram feed”.
Someone described AI art as "discovery", which does feel at least a bit apt. Having said that, Midjourney has torqued itself incredibly to hit certain aesthetics with minimalist prompts.
Conclusions
It seems pretty clear that the ability to generate “good enough” art is going to have some very wide ranging impacts. As I said in my first post, the discussion of this is extremely challenging to separate from Capitalism. Some people are going to lose their jobs; more as these models get better. Will new jobs be created as a result? It seems to me that this is yet another automation that eliminates a class of jobs, making a smaller number of more valuable positions; another brick on the pedal of inequality.
I haven’t even touched on the questions of art and artistry here: Are the products of these systems art? Are prompt writers artists? Perhaps another post for another day…
Assorted Observations & Notes
My understanding of Stable Diffusion is that the model was trained on a data set released by LAION. There are a couple of tools to explore the data set used to train Stable Diffusion. I’ve played with this one, described here (note, there is NSFW content). Something that truly surprised me was the low quality of the captions. I had really expected that to provide good results the models would need excellent structured captions, yet it’s clearly not the case.
All these models thrive on the constraints provided by giving them an artist to ape. Look at galleries of AI generated art, like the Midjourney Showcase, and you’ll see a good number of the prompts including artists by name, sometimes many of them. For some reason “by Van Gogh” doesn’t nauseate me nearly the way “by Greg Rutkowski” does: this may just be the question of Capitalism again. There are already horrifying stories of models trained on single artists.
In a sense, my feelings about these programs are not directly affected by how they’re implemented; yet I find myself compelled to figure more out. I have only a rough understanding at the moment of how these systems are trained and deployed.
This blog series by Lior Sinai, though I’m only through part one, seems very promising. It’s pushing my math skills though.
These models are far from the end of this work; Google has Imagen, Imagen Video, and Imagen Editor baking. Impressive results. The section on “Limitations and Societal Impact” is a worthwhile read: “There are several ethical challenges facing text-to-image research broadly. We offer a more detailed exploration of these challenges in our paper and offer a summarized version here. First, downstream applications of text-to-image models are varied and may impact society in complex ways. The potential risks of misuse raise concerns regarding responsible open-sourcing of code and demos. At this time we have decided not to release code or a public demo.”
Crossing an AI Rubicon
“A painting of a new disruptive artificial intelligence technology, large models, that will change some things forever, but probably not everything”
I remember being in my CMPUT 466 Machine Learning class in Fall of 2011, when the prof started explaining deep learning. For a brief shining moment it felt like I had understood how deep learning worked… and then the math and the understanding largely abandoned me. Despite getting an A- in that course, I never felt confident in the area.
My interests of course were drawn elsewhere, but I had many opportunities to explore machine learning in various forms. Fairly consistently though, where the opportunity was machine learning, I turned it down. For whatever reason, deep learning and its applications never spoke to me, and never really attracted me. Much of it felt like smoke and mirrors -- part of this was watching so many projects consume large amounts of resources, only to fail to find deployment. The places where it seemed to work never stood out to me. I am absolutely certain that to practitioners deep learning models felt revolutionary, but I didn't see it myself and so didn't feel compelled to pay attention.
ChatGPT, DALL-E and Midjourney have forced me to acknowledge: We have crossed some sort of Rubicon with these large model technologies. I no longer have the option of ignoring them.
Yet, despite knowing that I have to pay attention… I have struggled mightily to form coherent thoughts here: since I haven't paid attention, I feel a bit like Rip van Winkle, awaking after twenty years into a future I barely understand. There are so many dimensions here that it's hard to figure out what to think on any of them -- certainly there are many ways in which the dimensions cross.
I want to write down some thoughts about all of this (like every other nerd on the internet), so expect a few blog posts on this subject over a little while. Expect me to alternate hugely between wonder and loathing. Talking this over with a friend, one thing that stood out hugely in our conversation: You can draw wildly different conclusions about this technology depending on whether or not you start from the presumption of capitalism. This is probably true of all automation technology, but it's pretty clear already that the image generation technology is going to put some artists out of work, and in a world where these artists need to make art to eat, that's an upsetting outcome.
I have much to wrestle with, and it's challenging to sort through my thoughts on this. I think the best way for me to organize myself on this is to divide this initial thinking into two pieces: First, I will cover image generation using tools like Stable Diffusion, DALL-E and Midjourney. Next time I will write about ChatGPT.
CMPUT 229 Haikus
I wanted to give students an opportunity to demonstrate a bit of creativity on the final, and give me a bit of a bright spot while marking, and so the final question on my final exam was the following:
I had many excellent examples from my students, some of which made me laugh out loud. To avoid this becoming a whole poetry collection, I limited myself to five students, asking their permission to post their haikus.
Without further ado, in a random order:
With Tamed Lightning,
A world solemnly marches,
Keeping rhythmic time
☆
Winter, sun shines not,
Though not halting cosmic rays
A bit flips, blue screen
☆
The Stack Canary
Sings a song to keep away
Those who come to play
☆
Pipelines make it fast
But cause debilitating
headaches for students
☆
Never Code in C,
Not Good for Security,
Segmentation Fault
☆
This was a nice little exercise in fun. Next final I’ll ask for permission to use them on the final itself, just to avoid having to bother students later!
Teaching CMPUT 229 - Computer Architecture I
Ok— I’m only 98% done, as I have three students who need to write a deferred final. Close enough for this blog post tho.
It is finished. I am done teaching CMPUT 229. It was one of the most intense, rewarding, and challenging times of my life. I am happy I did it, but happier still to see the end of it.
CMPUT 229 is the course where a computing science student will typically first learn assembly language, and low level details about how a CPU works. From the course catalog:
Number representation, computer architecture and organization, instruction-set architecture, assembly-level programming, procedures, stack frames, memory access through pointers, exception handling, computer arithmetic, floating-point representation, datapath, control logic, pipelining, memory hierarchy, virtual memory.
CMPUT 229 holds a special place in my heart. It was the class which most set me on my current career trajectory. Through this class I discovered my enjoyment of working low in the stack, and after this class I started into a pipeline of opportunities which eventually brought me to where I am today. My mind was blown when I realized that I had actually taken this class exactly 15 years ago.
In November 2021, my Master's supervisor Dr. Amaral reached out saying that he would be taking sabbatical, and they needed to find instructors for CMPUT 229 for Fall 2022 and Winter 2023, and asked if I would be interested. I never really got an opportunity to teach during my time at university; I always had enough funding that I never needed to TA, and it's not common for a Master's student to teach any courses. I have nevertheless always been interested in what teaching would be like, and so I jumped on the opportunity. I would teach Fall semester.
The plan was that I would teach the course largely following Nelson's previous editions of the course: I would use his course outline, topics, slides, quizzes etc. Each year, Nelson has students build new labs for the course, so I would select from some that were ready to use, and build the course around that.
In preparation for the course I reviewed the material, and planned out the lectures in a spreadsheet. I was happy to note that if I managed to keep a brisk, but in my opinion manageable, pace I would be able to keep some extra time for some lectures of my own design by the end.
When we figured out that we were expecting our second child, I figured that I would be able to manage the workload, and so didn't change my plans to teach.
Teaching
I truly enjoy teaching, especially things I find fascinating. I've known this for a while, as I used to really enjoy mentoring junior employees and interns when I worked at IBM. There's something extremely satisfying about seeing someone gain skills and understanding.
Lecturing went fairly well for the most part. It was occasionally quite challenging to teach someone else's slides where the way the slides worked didn't match my preferred way of telling the story, or where there were errors in the slides. I tried to fix this where I could, but editing slides is an enormous time sink (made worse by the fiddly failures of PowerPoint), and so I didn't do nearly as much as I would have had I more time and energy.
I really enjoyed getting to do office hours with students. The ability to hash out understanding 1-1 in front of a whiteboard is so enjoyable; it's something I really miss from working remotely these last 5 years.
I was very happy I made some extra time at the end of the course. I really enjoyed the fact that I was able to give a lecture about computer security, and that 229 had provided enough background that I was able to give a cogent overview of how the Spectre family of attacks work.
Stupid Questions
Roughly once a month at work we have a meeting we call "Stupid Questions". The goal is to make sure we're willing to ask all the questions we need answered, even if we think they might make us look silly or funny. After a couple of years of having these meetings, I found a description of this process from the science blog Slime Mold Time Mold which now forms the preamble to the document we use for these meetings:
Stupidity is all about preparing you to admit when you’re facing a problem where you don’t know what is going on, which is always. This allows you to ask incredibly dumb questions at any time.
People who don’t have experience asking stupid questions don’t understand how important they can be. Try asking more and dumber questions — lean in on how stupid you are. You will find the world opening up to you. Ignorant questions are revealing!
Fairly early on I was sure I wanted to bring this tradition into the classroom -- though I'll admit I was nervous about it; you really don't want to offend your students by calling them stupid. Eventually I made myself sufficiently brave to take a half an hour of class and present the idea of "Stupid Questions", and get students to ask them.
This went remarkably well. I truly appreciated the ability to re-explain concepts from another perspective, help students clarify their understanding, and generally have slightly off-topic discussion. I think we only did two or three stupid questions sessions, but I thought they were some of the best teaching I did.
Student Reactions
I have to say, I'm happy with the student reactions I got to my teaching. While I still don't have access to my official "Student Perspectives On Teaching" report for the end of the course, in general feedback in-person or via the midterm-evaluation I received was quite positive. Students were certainly annoyed at the level of difficulty and the workload of the course -- but to an extent, that's unavoidable for a course as ambitious as CMPUT 229. The challenges students highlighted are challenges I was distinctly aware of while teaching (slide issues, lectures that weren't as good as I wanted them to be, etc). I think I got better as a lecturer over time.
My RateMyProfessor page I think is a good testament to the work I put into this course, with uniformly high ratings.
Administration
Administration was a bit of a thorn in my side throughout this process. There were hiccups with the university (ask me about contract issues over a beer one day), and then there was the actual administration of the course itself. Managing students' absence requests, clarifications about assignments, requests for extensions etc, all ate a surprising amount of time. There's far more work involved in managing a course than I ever expected.
Examinations
While assignments and labs were marked by the TAs for the course (each of whom has my eternal gratitude for reducing the workload on me), midterms and finals had to be written and marked by me.
Marking 150+ exams is a very painful process. I put more effort into marking the midterms than perhaps was necessary, as I tried to give part marks and highlight where students made calculation mistakes. Still, each exam took days to finish marking, crammed into evenings and weekends around my work. Suddenly Scantron exams seem horribly tempting.
One of the lessons I learned this semester is that I need to have a writing partner to write exams. It's far too hard to write and edit my own exams and correctly identify drafting errors. That problem where you, the author, read your intent not your words, struck me badly on each of the midterms and the final exam. It was truly embarrassing to have to provide clarification to students live as they wrote their exams.
Alas, while it is really interesting and I plan to write more about AI stuff, ChatGPT won't suffice. While I think that it would have just-about passed my final exam, it was unable to identify missed information in some of the questions. Should I ever teach again, I think I would lean more heavily on my TAs, getting each of them to help with the drafting of a question for the exams. This would spread the workload, while helping make sure the questions were correctly done.
Workload
I'll be the first to admit: I definitely underestimated the workload of running this course. Part of this was my own fault; I never asked what the course enrollment was. So when I found out I was going to be teaching a 250 person section I was a bit gobsmacked.
Combine that with the new baby, and all the disruption to our lives that brought, and it's a wonder I didn't have a meltdown this semester. On the plus side, the section size dropped continuously until the final withdrawal deadline, at which point we stabilized at the final student count of 148.
When I went back to work with Mozilla, I was effectively working 60-hour weeks, every week. This had a real impact on my family life, my health, and my relationships. I definitely was not a fan. I let more things slip in my personal life than I am proud of, trying to manage all the pieces. While we've come out the other side intact, it was not nearly the light workload I had planned on, and much more stressful than I expected. Combine the stress with comparatively low pay, and it's a hard thing to argue in favour of. If someone is teaching casually, they're teaching for non-tangible reasons, that's for sure.
Lessons For the Future
All that being said, I sort of hope I have an opportunity to teach again, sometime in the future (5-10 years from now perhaps?). As such, there are some lessons I'd leave myself:
- Definitely don't have a baby two weeks before class starts. That was a rookie mistake on my part, and shouldn't be repeated.
- Dedicate enough prep time to ensure the assignments are in good shape. I was embarrassed at how many times students would ask me questions in office hours about their assignments, only to realize that editorial work on them previously had removed important information, or made it unclear where information needed to come from.
- Do more work on expectation setting with TAs. This was my first time teaching, and I was a pretty bad manager, because I had an unclear set of expectations that were poorly communicated. Thank you to my TAs for sticking it out with me. Having said that, the next lesson is:
- Lean more heavily on your TAs.
Conclusion
Teaching CMPUT 229 was a fascinating experience that was extremely stressful and more impactful on my family than I would have liked. I learned a lot about myself, and I think that I would do it again, but only if I was in a better place to manage the workload. Hopefully though, should I ever teach again, the experience I had this time will make the course substantially easier the second time around.
Moanin' - Art Blakey & The Jazz Messengers
Watching this, I was struck by how professional this band looks, compared to the picture in my head of casual people hanging out in a smokey room in a basement somewhere.
Computational Nostalgia
ClarisWorks inside of System7.app
The above is a screenshot from System7.app, a website running a Macintosh emulator. In it, is a ClarisWorks document with the following contents:
Holy shit, I definitely remember using ClarisWorks.
I don't normally consider myself nostalgic, and yet here I am, powerfully overwhelmed with nostalgia yet again, by an operating system I didn't use at the time.
What is it about these system emulators that speaks to me so?
Why do I sort of long for a past I didn't really know? There's something peaceful here, disconnected in a way?
Another thing that stands out to me is the immense legibility of the interface: The old greys may not be sexy, but I find them so legible compared to so many modern interfaces.
Anyhow. Guess who wants to buy an old PowerBook now.