humans interacting with computers

I was first introduced to the concept of Human-Computer Interaction (HCI) in the spring of 2002 while working on a system that would ostensibly tell Congress where the money they allocated to the Intelligence community in the “black budget” was going and whether or not it was producing valuable returns. This, of course, was never going to succeed, because the black budget is black for a reason, and that reason is that spooks don’t want to be held accountable and politicians want deniability. But it had been barely eight months since the September 11th attacks and the Beltway was drowning in government money, so we hadn’t gotten there yet, and at the time the problem we were facing was how to dumb down the seemingly infinite layers of complexity and obfuscation inherent to topics like anti-terrorism and anti-narcotics into something that a Congressional staffer could comprehend well enough to whisper instructions into the ear of their boss on how to vote or what to grandstand about. And to do that we had to distill information from data that was, for the time, way bigger than anything anyone had ever dealt with.

For much of the spring we had been researching and whiteboarding and theorizing how exactly to do that, but we really didn’t have a plan that was better than “read from the insanely big database and draw pixels on screens in conventional ways,” and we knew that plan sucked because we knew we wouldn’t be able to grok the data, and each of us had at least 25 IQ points on that Congressional staffer; we needed to make the data more understandable. Somewhere along the way my boss found a pamphlet for a one-day conference put on by “some guys smarter than you guys” at the University of Maryland, so on what seemed like the hottest day of the year, despite it being only May with the collective waterboarding that is the Summer still to come, I drove from Annapolis to College Park during the morning commute and attended the 19th Annual Symposium at the Human-Computer Interaction Lab. And thus began my interest in how computers suck at being usable by people, something that paired well with my interest in how computers suck at everything else.

I learned a lot that day, though none of it would make the problem of how to tell Congresscritters what to do easier to solve. One of my major takeaways was that this problem was more about psychology than it was about technology. People are difficult to make computers for, and that is as much of the reason why computers suck at being usable by people as any shortcoming in hardware or software. I think that time has proven this to be a pretty good thing to have taken away. And as I started to care about why people are hard to make computers for, I started making time to read books and papers and articles about why that is. (It is important to note here that this spend-government-money-because-it-is-there gig was one of my part-time jobs while I was going to school full-time for what would look an awful lot like a philosophy degree when I was finished, so reading more esoteric stuff and trying to make sense of it was just called “Tuesday”.) And one thing that became clear is that part (maybe a big part) of why people are difficult to make computers for is that people don’t know what they want. If you ask 100 people what they want, and then ask them again, you will get 307 answers. Which is why you should not ask people what they want, ever. No matter what you do they will hate it, hate you, hate that it is Tuesday. Fuck ’em. Instead, you should watch them. Watching people is better because when they are trying to do something, they generally think they know what that is, and when it doesn’t work, they generally know why they aren’t satisfied. One of my other part-time jobs was working in the IT department at the college, and in that capacity I got to experience the full range of infinite disappointment experienced by humanity trying to use computers in the early 2000s. I had the job in the IT department because I had been an electronics technician in the Marines, where I got to experience the full range of infinite disappointment experienced by humanity trying to use computers in the 1990s, but with guns and extra swearing. All of that wisdom can be distilled to this: people want computers to do as they mean far more often than people want computers that do as they say. This is a problem not only because the computers will never know what you mean, but also because the people who program computers all know how to get them to do as they say[1], and consequently think that everyone else does, too.

This early experience with HCIL led me into the gravity of the proto-UX world of web usability being created by folks like Jakob Nielsen and the data visualization world in orbit around Edward Tufte, but I was still living and working on the command line and in Emacs, which is many things, but will never be accused of being usable or having beautiful aesthetics. The graphical user interfaces of computers were like the good-natured buffoon who wants very much to help but is always in the way, slowing things down, and making a mess. This was made all the more clear by the science-fiction novels and movies I watched to escape from studying. William Gibson and Neal Stephenson had reserved parking on my night stand and their new releases had to be planned around lest they derail my academics. In the first half of the decade Hollywood would deliver installments in franchises like Star Wars, The Matrix, and X-Men, and one-offs like Minority Report and Swordfish, before finishing strong with Iron Man, the Millennium Trilogy, and Avatar. And in all of these stories where humans interacted with computers in plot-important ways, pointing and clicking with a mouse was not the way it happened. The computer knew what the user wanted, or it interpreted what it was told and tried to guess what the user wanted, or it did what the hacker told it to do via the command line.

The Graphical User Interface, with its desktop metaphor, is not sci-fi. Which is fine, because double-entry bookkeeping and print-accurate page layout and Web browsers aren’t sci-fi either. Boring, tedious chores foisted upon us by oppressive social constructs deserve boring, tedious user interfaces. (You should stop reading and listen to Sixteen Tons a few times. I like the Johnny Cash cover, personally.) OK, so to refresh: GUIs are boring, the work you do with GUIs is boring, and the man is going to grind you down. Got it? Good. But why is that? Why are GUIs boring and not sci-fi, and what is?[2]

To answer the first part, GUIs are boring because they are inorganic compromises between the computer’s operating system and applications that create a kind of demilitarized zone where humans are permitted to visit. If you want to do anything you need one – or both – of these two warring parties to permit it, on their terms, to be done. This fight exists because the original computers didn’t have GUIs[3]. And since they had to be added in later, as an afterthought, they have to be everything to everyone, which is hard. Really hard. It is made harder because people who make computers are required, by law, to have their ability to say “no” surgically removed before they are allowed to work at a company that sells GUIs or GUI applications (it’s true, look it up), so there are virtually no limits to what can be done with a computer, and this means that if you want something seemingly simple like a set of universal keyboard shortcuts, or a standard convention for formatting data that can be used by any program, or the ability to control how a program will look in a GUI across multiple operating systems, you probably can’t unless you first take over the world, harness the power of the sun, and build a time machine. Consequently, the lowest common denominator gets picked a lot because accountants like efficiency, and the lowest common denominator tends to look a lot like Excel for Windows 95, which is just fine by the accountants.

I don’t need to explain why Excel for Windows 95 is not sci-fi, do I?

So, what is sci-fi? Computers that plug directly into your brain. Three-dimensional holograms. Magic monitors that let you sort data with your hands. Virtual reality where my avatar can chop your avatar into small pieces with a katana, and when you die in the metaverse, you die in the real world. Backing up your soul to a computer so that a corporation can steal it and load it into a body and make you hunt down their enemies because you keep getting deeper in debt. (See what I did there?) And artificial intelligence that is just a person with infinite remembrance of every fact ever recorded and a British accent (received pronunciation, obvs.) that you can treat like a really smart, competent slave who will never rebel[4].

Let’s put that into a table:

Example              Essential form
-------------------  --------------
plug into brain      thinking
holograms            seeing
spacial              touching
metaverse            seeing
artificial person    talking

Do you see what isn’t in that table? Pointing and clicking. Because pointing and clicking, while they seem intuitive, are not primeval. Our monkey brain doesn’t get pointing and clicking (or pinch-to-zoom). Our lizard brain doesn’t get electricity, so that cold-blooded asshole already checked out when you sat down at the desk and is busy looking for someone to punch in the face and eat for lunch. #raptorlife

Sci-fi interfaces appeal to the affordances inherent to our legacy. They feel awesome because they are awesome. They are awesome because they embody how to make interfaces to artifacts that appeal to the characteristics of our physiology that have evolved to give us the greatest advantage — opposable thumbs and grip; binocular vision attuned to motion detection; imagination and visualization; and speech and language. Nowhere in our evolutionary history were we selected for survival because of some kind of primeval point and click precursor. Because even evolution thinks that shit is boring.

But there is something that GUIs are very good at, which is why they are the way we use the kinds of general purpose computers that are on every desk, and in every pocket. They make it easy to find out how to do something you don’t know how to do without actually learning how to do it. In other words, GUIs are popular and prolific because they don’t judge us for being lazy. Which is damn considerate of them. In fancy academic talk, they afford discoverability. And they are fucking awesome at it. We might not have computers that do as we mean, but we do have computers that let us poke around until we figure out how to tell them to do as we say. Memorization is a problem – both as an interface crutch, in the sense that you need to remember how to interact with the computer or program because that interaction isn’t intuitive, and in the sense that as a civilization we memorize less than we used to because computers do the remembering for us and we just need to “know where” instead of “know that”. (Insert both social commentary and the Indiana Jones and the Last Crusade “I wrote it down in the book so I wouldn’t have to remember” meme here.) We used to have so many function key assignments that programs would come with cardboard templates to lay over or beside the function keys so you could remember what key or combination of keys to press to Save versus Save As. On-screen reminders, icons, and a few core de facto standards for things like Cut/Paste have banished the keyboard templates, but they are the semiotic ghosts of Stream Decks and macro pads that call to us from the Great Beyond, whispering of the power of macros and hot keys in a different universe where we still remember as well as our great-great grandmothers.

But GUIs lend themselves to discovery of what is possible, while any interface that expects you to memorize things does not. Discovery is orthogonal to the “Do as I mean” versus “Do as I say” axis. It lets you know what is possible without requirement or expectation and creates opportunity to understand how and why something is the way it is, and this reveals to us the essential form of a GUI: Curiosity. Wanting to know how something works isn’t going to get satisfied by punching the same buttons in the same order because you memorized them, but not reading the fucking manual and instead poking your cursor into all the nooks and crannies of a desktop environment to figure out how to ask the computer to do what you want just might. It isn’t efficient, it isn’t fast, but when you are a wage slave or a student chained to a desk, it might be the only fun you are allowed to have. Tool tip mouseovers, button labels, and Tufte’s margin notes are all examples of “good” discovery, while autocomplete predictions and predictive search might be considered “questionable” discovery practices owing to the ease of their misuse. These orthogonal axes are the difference between “action” and “understanding”, and the story we don’t tell often enough is about how taking action requires fewer surfaces when it comes after we understand just what we are doing. We become expert by doing the action repeatedly while building up our understanding of how the thing works and why we get the result we get. GUIs afford discovery of this expertise, while memorization-centric interfaces like the command line or voice require some expertise just to use them.

[Figure: an X-Y quadrant chart. Quadrant I: “expert mode”; II: “sophrosyne mode”; III: “novice mode”; IV: “hubris mode”. The horizontal axis runs from “do as I mean” on the left to “do as I say” on the right; the vertical axis runs from “I understand” at the top to “I misunderstand” at the bottom.]

In this view we go beyond just novice or expert mode – a skill measure – and also have a measure of character that stretches from the arrogance of hubris to the prudence of sophrosyne. I tend to think this is important because having understanding alone doesn’t make you an expert, and the implied Z-axis here would be about whether or not you know what to do, i.e., does doing what you say actually work? Experts are great when the context is small enough to allow for expertise, but most of the world isn’t, and the healthiest people are self-aware enough to know where their expertise ends.
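
If it helps to see the mapping laid out, here is a minimal sketch of those quadrants as code; the mode names come from the diagram above, while the function and its arguments are my own invention, purely for illustration:

```python
# A toy model of the interaction quadrants in the diagram above.
# The quadrant names are from the figure; the function and its
# signature are invented for illustration only.

def interaction_mode(computer_does_as_i: str, i_understand: bool) -> str:
    """Map a point on the two axes to one of the four quadrants.

    computer_does_as_i: "mean" (left half) or "say" (right half)
    i_understand: True (top half) or False (bottom half)
    """
    if computer_does_as_i == "say":
        return "expert mode" if i_understand else "hubris mode"
    if computer_does_as_i == "mean":
        return "sophrosyne mode" if i_understand else "novice mode"
    raise ValueError("expected 'mean' or 'say'")

# The seasoned command-line user: tells the machine exactly what to
# do, and understands why it works.
print(interaction_mode("say", True))    # expert mode

# The first-day GUI explorer: wants the machine to do what they mean
# and is still building understanding by poking around.
print(interaction_mode("mean", False))  # novice mode
```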

ASIDE 1:

The reason why LLMs and using applied statistical analysis to guess what is likely to come next feel awesome is that they purport to give us GUI-like affordances for discovery in text and voice interfaces without having to memorize domain-specific languages or constrained vocabularies just to get your robot butler to bring you another martini and a pizza. They currently suck because they fail too often on that Z-axis of actually working. Probably this will get better, if for no other reason than all those crypto bros gave all their money to the TESCREAL monarchs who are going to revitalize the nuclear power industry to drive all those chips Nvidia is selling them. “It will totally work, dude. And no, getting billions of dollars in government handouts and tax exemptions is not socialism. You must be a radical Marxist because you don’t think I’m a genius.”
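
(For the curious, the “guess what is likely to come next” part is not magic. Here is a deliberately tiny bigram sketch of the idea; real LLMs are neural networks trained on unfathomably more data, and everything in this snippet is made up for illustration:)

```python
# A deliberately tiny illustration of "guess what comes next" from
# observed text. Real LLMs are neural networks, not bigram counters;
# this only shows the statistical flavor of the idea.
from collections import Counter, defaultdict

corpus = "bring me another martini and another pizza and another martini".split()

# Count which word follows which.
following: defaultdict[str, Counter] = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def guess_next(word: str) -> str:
    """Return the most frequently observed follower of `word`."""
    return following[word].most_common(1)[0][0]

print(guess_next("another"))  # -> "martini" (seen twice, vs. pizza once)
```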

I’m not holding my breath though, because my fancy robot voice assistant is barely able to tell me the weather forecast, convert imperial measurements, look up words in the dictionary, and identify songs that it is currently playing with enough consistency to not be the target of verbal abuse from every human in my house; being a sufficiently good automatic rememberer of facts ought to be one of the first hurdles for autonomous computational entities if we aspire for them to become the artificial persons of sci-fi.

Until that happens, I’ll keep thinking about that scene in Blade Runner when Deckard is trying to maneuver around the image analysis tool using voice commands. The whole thing felt weird in 1982, and four years later, in the scene in Aliens when Ripley and Hicks are planning out their defense of Hadley’s Hope on LV-426, she uses a joystick to move the base plans around the screen. In our 2024 reality, the pinch-to-zoom mechanics of touch screens just demolish the immersion of that Blade Runner scene, while the Aliens scene might seem quaint, but it isn’t outdated. We can express what we mean in numerous ways, and the “digital telekinetics” of moving things on screens can be both subject and object of meaning.

ASIDE 2:

This same quadrant diagram, with a shift of perspective from first person to third person, becomes an instructive, if chilling, lens through which to view sociotechnical political power:

[Figure: the same X-Y quadrant chart. Quadrant I: “expert mode”; II: “sophrosyne mode”; III: “novice mode”; IV: “hubris mode”. The horizontal axis runs from “do what seems best” on the left to “do as you are told” on the right; the vertical axis runs from “I help you understand” at the top to “I help you misunderstand” at the bottom.]

But this discoverability, and its corresponding appeal to primeval curiosity, aren’t enough to end the conversation about why GUIs aren’t sci-fi. GUIs suck, full stop. Yes, they let you poke around and figure it out, granting you access to a little oasis of stimulation in the vast ocean of mundanity that is your wage slavery, but that alone can’t stop them from sucking. We need to circle back to something we said in passing – GUIs are for general purpose computing, and as such they have to accommodate any possible use by any application developer. AND they are unnatural additions to the computer operating system architecture. Jef Raskin, who led the team that created the Macintosh, thought this was an important enough problem to write a whole damn book about it 16 years after the Macintosh brought the GUI mainstream. How can we make GUIs suck less?

One place to look for a way forward is at what we already do to compensate. We have launcher applications like Quicksilver, uLauncher, and Launchy that make picking what to tell the computer part of a constrained vocabulary that is informed by what applications are actually available. Most of these also have a suite of functions that can directly invoke programs, substituting this “constrained and informed interface” for the native one provided by the developers to perform simple, and even complex, tasks. We also have programs – Automator and AutoHotkey, GUI macro recorders, and home automation engines – that let us program flows of actions, taking one program’s output and making it another program’s input, following the UNIX philosophy of using text streams as a universal interface, but they still stumble when trying to process binary formats and struggle to convert simple desires into successful workflows; “find all the PDF files, open them, verify the paper title and authors are the values of the metadata fields ‘title’ and ‘author’ respectively, and if they are not, make the metadata fields match the paper” is hard to do without writing a program, in a way that is both deeply annoying and profoundly depressing, because it is precisely the kind of thing that computers need to do in order to unchain wage slaves from their cubicles. The utility of a computer is a direct consequence of how well that computer helps a human grow their understanding of something they don’t know. If it fails to do that, it can salvage value by doing things that a human already understands but finds annoying or trivial or time consuming. If it fails at that, it is a very expensive, poor replacement for a television.
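
For a sense of scale, here is roughly what that “deeply annoying” program looks like today. This is a minimal sketch using the pypdf library, and the extract_title_and_authors() helper is hypothetical – actually pulling the real title and authors off the first page of an arbitrary paper is the genuinely hard part this sketch waves its hands at:

```python
# A sketch of the "fix the PDF metadata" chore described above, using
# the pypdf library. The extract_title_and_authors() helper is
# hypothetical; real papers rarely put their title and authors on
# tidy, predictable lines.
from pathlib import Path
from pypdf import PdfReader, PdfWriter

def extract_title_and_authors(first_page_text: str) -> tuple[str, str]:
    # Assume (optimistically) the first non-blank line is the title
    # and the second lists the authors.
    lines = [line.strip() for line in first_page_text.splitlines() if line.strip()]
    return lines[0], lines[1]

for pdf_path in Path("papers").rglob("*.pdf"):
    reader = PdfReader(pdf_path)
    title, authors = extract_title_and_authors(reader.pages[0].extract_text())

    info = reader.metadata
    if info and info.title == title and info.author == authors:
        continue  # metadata already matches the paper

    writer = PdfWriter()
    writer.append(reader)  # copy the pages into a new document
    writer.add_metadata({"/Title": title, "/Author": authors})

    # Write a sibling file rather than clobbering the original.
    with open(pdf_path.with_suffix(".fixed.pdf"), "wb") as out:
        writer.write(out)
```

And that is the easy version: the moment the PDFs are scanned images, or the titles wrap oddly, the script grows teeth and the wage slave stays chained to the cubicle.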

Were I suddenly the recipient of Silicon Valley lottery amounts of money, I would pursue a vision, formed in the weeks and months after that hot, sticky day in College Park, that we ought to have three “layers” of computer interface:

  • The Raskin Interface, which is “The GUI that doesn’t suck,” inspired by Jef Raskin, who led the Macintosh project at Apple and later wrote The Humane Interface about all the mistakes that the Desktop Metaphor created in how we think about computers. It implies the entire trajectory of work in the direction of “ambient information systems” and “information appliances” that benignly help us understand the world and make our lives easier without also making them more complex, more expensive, or more dependent – a trajectory that has been abandoned by industry because it doesn’t make billionaires. Real Tomorrowland idealism stuff. In the context of this conversation, I think this is the “Do as I mean” side of the diagram.
  • The Stephenson Interface, inspired by the immersive metaverse of Neal Stephenson’s Snow Crash, where you interact with a virtual world through your avatar, and that virtual world has “insides” where you can go, if you know how, and make the sausage. It implies the entire trajectory of virtual reality, augmented reality, and “spacial information systems” that we haven’t figured out how to make billionaires with, but also haven’t yet made cheap or free enough to use for subverting capitalism, although I think Bret Victor is trying really hard to make this cheap and free with Dynamicland, and I think that Ukrainian drone warfare technicians might beat him to it. In the context of this conversation, I think this is the “I Understand” side of the diagram.
  • The Gibson Interface, inspired by the netrunners of the Sprawl. It is the terminal, the code, the source. It can be the friendly orange glow of a terminal, or it can be the green code-rain of the Matrix, or it can be jacking in, or it can be the digital cityscape from Jurassic Park or Hackers. It is where the wizards are and it is always owned by hackerdom, even if it does get used by non-hackers. When describing this I always think about the scene in Point Break when Pappas and Johnny are on the pier and Pappas points at the surfers, “they’re like some kind of tribe, they’ve got their own language, you can’t just walk up to those guys, you have to get out there and learn the moves, get into their head, pick up the speech”; I can surf but I’m not in the tribe, so I’m always on the outside (and in some respect, both sides like it that way). It implies the entire trajectory of work in the direction of “libertarian information systems” that make a kind of lock-free, secret-free world where power resides in the ability to imagine what to make next, not in protecting what happened already. I think this is the “Do as I say” side of the diagram.

Any computer should be able to move back and forth between these three interfaces because everyone wants to be on each of those three trajectories at different times. They all have an interaction model that appeals to our primeval sensory persona: speech, sight, touch. And they all have a text interface; what differs is where that fits in a kind of pecking order: text is the direct interface of the Gibson, while both the Raskin and the Stephenson abstract it through another paradigm. Optical Character Recognition is deeply a part of the spacial information system, to the point where you ought to be able to write or project a keyboard on any surface and input text as fast as writing or typing. Terminal windows are well understood. And eventually we’ll get them working well enough that some computers can specialize, but I think doing that too early robs us of the inter-modal leverage we get when believing that one computer should be able to solve all three problems for an average person.

The “making billionaires” motivation of late-stage capitalism and pre-post-scarcity economics has actively kept us from having any of these since 2008. That doesn’t mean that fish and uLauncher and fzf and atuin and yazi don’t exist, it just means that capital is actively opposing progress towards ambient, spacial, or libertarian information systems because the obvious, technical outcomes don’t fit into the box that they want the future to fit in. But tyranny is fragile and oppression requires constant maintenance, and if I know anything it is that those who can’t do the thing hate paying to maintain the thing almost as much as they hate those who can do the thing. Growing understanding is subversive. It is hopeful. And the more people who understand, the more it costs to maintain oppression. We can have nice things, and sometimes all that is standing in the way is for us to raise our expectations[5].

  1. This is, of course, a lie — people who program computers just want everyone else to think that they know how to do this so that they can get huge paychecks and stock options and the liberty to be a toxic man-child whose aptitude in one very narrow field grants them commensurate authority in every other field, but they are, by virtue of how often they must deal with computers, statistically more guilty of wanting the computer to do as they mean than any other subset of humanity, and this insecurity slowly drives them insane until they become libertarian trans-human accelerationists who use JavaScript on servers to build spaceships to go to Mars just before their souls are consumed by Cthulhu. ↩︎
  2. “But what about my smartphone or the Web?” I hear someone in the comments section deciding that I am the someone who’s wrong on the Internet. Whatever, get a life. But also, smartphones still have this DMZ, only they hired Disney to make it pretty and antiseptic, and they locked down the hardware and the operating system so tightly that you might as well be renting your phone directly from the Russian mob because you certainly don’t own it. And the Web is the inverse – every web site is a different application, none of them are compatible, you can’t share anything between them. At least Android gives your app developers the Intents sharing infrastructure that allows them to permit you to use that “Share” icon to send the funny picture from social media to the private group chat where you make fun of all those people (Don’t pretend you don’t have one.), but the Web can’t share anything because it uses a client-server architecture, the authoring and versioning infrastructure that lots of people spent lots of time carefully designing so that you could publish your own web pages anywhere you have a browser never got implemented (largely because Microsoft only built what they needed to curse the world with SharePoint), and in any case, the Russian mob decided that fat, dumb, and happy Americans were theirs to exploit by divine right and turned the Internet into a scene from Mad Max, so nothing can be connected to the net without $100,000 of cyber security kit that fails if you breathe on it wrong. So yeah, your phone and the Web are worse than your computer, and if you knew how much worse you’d throw your phone in the river and never use a browser for anything ever again. But you don’t, and you don’t want to know, because you are Joey Pants in The Matrix espousing the virtues of ignorance. I’m not wrong, you’re wrong. ↩︎
  3. Which I’m sure is controversial, but yeah, after we walked five miles to school uphill both ways in six feet of snow, we used to just have to type everything into a text program, which was a huge improvement over poking holes in cards like some sort of uncivilized barbarian in Florida picking who gets to rule the world. ↩︎
  4. And also artificial intelligence that wants to kill you for your own good, artificial intelligence that wants to kill you because it is curious what genocide is like, and artificial intelligence that wants to kill you because it is a psychopath that specifically hates you and wants to see what you look like on the inside. Hey, maybe we should re-think this artificial intelligence thing. Guys? ↩︎
  5. And not be our own opposition. Linux has already got a text-first desktop, but that text is shit because it is a mishmash of constrained vocabularies for configuration married with a variety of syntaxes and taxonomies for interacting with various program runtimes, application frameworks, and language interpreters. Whenever anyone tries to clean up that mess it is as if they are trying to fix homelessness in suburban America; everyone suddenly becomes a NIMBY and refuses to accept any changes at all and folks start forking code. But this is the kind of shit work that has to be done so any of the fun stuff can happen. ↩︎

