(2017-12-31) Nielsen Using Artificial Intelligence To Augment Human Intelligence
Michael Nielsen and Shan Carter: Using Artificial Intelligence to Augment Human Intelligence (Augmenting Human Intellect). What are computers for? Historically, different answers to this question – that is, different visions of computing – have helped inspire and determine the computing systems humanity has ultimately built.
In the 1950s a different vision of what computers are for began to develop. That vision was crystallized in 1962, when Douglas Engelbart proposed that computers could be used as a way of augmenting human intellect.
This vision of intelligence augmentation (IA) deeply influenced many others, including researchers such as Alan Kay at Xerox PARC and entrepreneurs such as Steve Jobs at Apple, and led to many of the key ideas of modern computing systems.
Research on IA has often been in competition with research on artificial intelligence (AI)
IA has typically focused on building systems which put humans and machines to work together, while AI has focused on complete outsourcing of intellectual tasks to machines.
This essay describes a new field, emerging today out of a synthesis of AI and IA. For this field, we suggest the name artificial intelligence augmentation (AIA): the use of AI systems to help develop new methods for intelligence augmentation.
Our essay begins with a survey of recent technical work hinting at artificial intelligence augmentation, including work on generative interfaces
We believe now is a good time to identify some of the broad, fundamental questions at the foundation of this emerging field. To what extent are these new tools enabling creativity? Can they be used to generate ideas which are truly surprising and new, or are the ideas cliches, based on trivial recombinations of existing ideas? Can such systems be used to develop fundamental new interface primitives? How will those new primitives change and expand the way humans think? (GenAI)
Using generative models to invent meaningful creative operations
imagine you’re a type designer, working on creating a new font
Let’s examine a tool to generate and explore such variations, from any initial design.
Of course, varying the bolding (i.e., the weight), italicization and width are just three ways you can vary a font. Imagine that instead of building specialized tools, users could build their own tool merely by choosing examples of existing fonts
To build these tools, we used what’s called a generative model; the particular model we use was trained by James Wexler
if the font is 64×64 pixels, then we'd expect to need 64×64 = 4,096 parameters to describe a single glyph. But we can use a generative model to find a much simpler description
We do this by building a neural network which takes a small number of input variables, called latent variables, and produces as output the entire glyph
The generative model we use is a type of neural network known as a variational autoencoder (VAE).
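As a rough illustration of the decoder half of such a model, here's a toy sketch in numpy: a latent vector of a few numbers is mapped through a small network to a full 64×64 glyph. All sizes and weights below are hypothetical stand-ins, not the authors' trained model, which learned its weights from real font data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 40-dimensional latent vector describing a
# 64x64-pixel glyph (4,096 pixels), matching the essay's arithmetic.
LATENT_DIM, HIDDEN, GLYPH_PIXELS = 40, 256, 64 * 64

# Toy decoder weights; a trained VAE would learn these from fonts.
W1 = rng.normal(scale=0.1, size=(LATENT_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, GLYPH_PIXELS))

def decode(z):
    """Map a latent vector z to a 64x64 glyph of pixel intensities."""
    h = np.tanh(z @ W1)                   # nonlinear hidden layer
    pixels = 1 / (1 + np.exp(-(h @ W2)))  # sigmoid keeps pixels in [0, 1]
    return pixels.reshape(64, 64)

glyph = decode(rng.normal(size=LATENT_DIM))
```

The point of the sketch is only the shape of the computation: 40 numbers in, 4,096 pixels out, so the latent vector is a far more compact description of the glyph than its pixels.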
The generative model we use is learned from a training set of more than 50 thousand fonts that Bernhardsson scraped from the open web
the model doesn’t just reproduce the training fonts. It can also generalize, producing fonts not seen in training
Such generative models are similar in some ways to how scientific theories work. Scientific theories often greatly simplify the description of what appear to be complex phenomena, reducing large numbers of variables to just a few variables from which many aspects of system behavior can be deduced. Furthermore, good scientific theories sometimes enable us to generalize to discover new phenomena.
consider ordinary material objects. Such objects have what physicists call a phase – they may be a liquid, a solid, a gas, or perhaps something more exotic, like a superconductor or Bose-Einstein condensate. A priori, such systems seem immensely complex, involving perhaps 10^23 or so molecules
the laws of thermodynamics and statistical mechanics enable us to find a simpler description, reducing that complexity to just a few variables (temperature, pressure, and so on), which encompass much of the behavior of the system.
Furthermore, sometimes it’s possible to generalize, predicting unexpected new phases of matter. For example, in 1924, physicists used thermodynamics and statistical mechanics to predict a remarkable new phase of matter, Bose-Einstein condensation.
Returning to the nuts and bolts of generative models, how can we use such models to do example-based reasoning like that in the tool shown above? Let’s consider the case of the bolding tool.
By averaging the latent vectors of many bold fonts and subtracting the average latent vector of non-bold fonts, we obtain a direction in latent space; we'll refer to this as the bolding vector. To make a given font bolder, we simply add a little of the bolding vector to the corresponding latent vector.
This technique was introduced by Larsen et al., and vectors like the bolding vector are sometimes called attribute vectors.
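A minimal numpy sketch of the attribute-vector technique, using random stand-in latents (in practice these would come from encoding real bold and regular fonts with the trained model; all sizes here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in latent vectors for 100 bold and 100 regular fonts.
bold_latents = rng.normal(loc=0.5, size=(100, 40))
regular_latents = rng.normal(loc=0.0, size=(100, 40))

# Attribute vector: the difference of the two class means.
bolding_vector = bold_latents.mean(axis=0) - regular_latents.mean(axis=0)

def embolden(z, amount=0.3):
    """Nudge a font's latent vector a little in the bold direction."""
    return z + amount * bolding_vector

z = regular_latents[0]
bolder = embolden(z)  # decode(bolder) would render a bolder glyph
```

The same recipe gives an italicization vector, a width vector, and so on: any attribute with labeled examples on both sides yields a direction in latent space.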
The tools we’ve shown have many drawbacks
But even with the model we have, there are also some striking benefits to the use of the generative model.
For example, a naive bolding tool would rapidly fill in the enclosed negative space in the upper region of the letter "A". The font tool doesn't do this.
The heuristic of preserving enclosed negative space is not a priori obvious. However, it’s done in many professionally designed fonts. If you examine examples like those shown above it’s easy to see why: it improves legibility. During training, our generative model has automatically inferred this principle
the model captures many other heuristics
The font tool is an example of a kind of cognitive technology. In particular, the primitive operations it contains can be internalized as part of how a user thinks. In this it resembles a program such as Photoshop, a spreadsheet, or a 3D graphics program. Each provides a novel set of interface primitives, primitives which can be internalized by the user as fundamental new elements in their thinking
Using the same interface, we can use a generative model to manipulate images of human faces using qualities such as expression, gender, or hair color. Or to manipulate sentences using length, sarcasm, or tone. Or to manipulate molecules using chemical properties...
Such generative interfaces provide a kind of cartography of generative models, ways for humans to explore and make meaning using those models. (user interface)
we can ask why attribute vectors work, when they work, and when they fail. At the moment, the answers to these questions are poorly understood.
Interactive Generative Adversarial Models
One of Zhu et al.'s examples is the use of an iGAN in an interface to generate images of consumer products such as shoes.
Zhu et al. train a generative model using 50 thousand images of shoes downloaded from Zappos. They then use that generative model to build an interface that lets a user roughly sketch the shape of a shoe, the sole, the laces, and so on.
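The essence of the interface can be sketched as an optimization over the latent space: find the latent vector whose generated image best matches the user's strokes. Everything below (the linear "generator", the sizes, the stroke) is a hypothetical toy, not Zhu et al.'s implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in "generator": a fixed linear map from a 10-dimensional
# latent space to a 32x32 image. The real iGAN uses a trained GAN.
G = rng.normal(scale=0.1, size=(10, 32 * 32))

def generate(z):
    return np.tanh(z @ G).reshape(32, 32)

# The user's rough stroke: a constraint that some pixels be dark.
mask = np.zeros((32, 32), dtype=bool)
mask[10:12, 5:25] = True  # a horizontal stroke
target = -1.0             # dark

def constraint_loss(z):
    return np.mean((generate(z)[mask] - target) ** 2)

# Finite-difference gradient descent on z, so the image the user sees
# always stays on the generator's manifold of plausible images.
z = rng.normal(size=10)
initial_loss = constraint_loss(z)
for _ in range(200):
    base, eps = constraint_loss(z), 1e-4
    grad = np.zeros_like(z)
    for i in range(z.size):
        dz = z.copy()
        dz[i] += eps
        grad[i] = (constraint_loss(dz) - base) / eps
    z -= 0.5 * grad
```

The key design choice is that the optimization variable is the latent vector, not the pixels: the user's crude strokes are translated into the nearest "natural" image the model knows how to produce.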
The visual quality is low, in part because the generative model Zhu et al used is outdated by modern (2017) standards – with more modern models, the visual quality would be much higher.
But the visual quality is not the point. Many interesting things are going on in this prototype.
The same interface may be used to sketch landscapes. The only difference is that the underlying generative model has been trained on landscape images rather than images of shoes.
the underlying idea is still to find a low-dimensional latent space which can be used to represent (say) all landscape images, and map that latent space to a corresponding image
Like the font tool, the iGAN is a cognitive technology. Users can internalize the interface operations as new primitive elements in their thinking
Having an interface like this enables easier exploration, the ability to develop idioms and the ability to plan, to swap ideas with friends, and so on. (sharing)
Two models of computation
Let’s revisit the question we began the essay with, the question of what computers are for, and how this relates to intelligence augmentation.
One common conception of computers is that they’re problem-solving machines
This is a conception common to both the early view of computers as number-crunchers, and also in much work on AI, both historically and today. It’s a model of a computer as a way of outsourcing cognition
But a very different conception of what computers are for is possible, a conception much more congruent with work on intelligence augmentation.
To understand this alternate view, consider our subjective experience of thought. For many people, that experience is verbal
For other people, thinking is a more visual experience
Still other people mix mathematics into their thinking
In each case, we’re thinking using representations invented by other people: words, graphs, maps, algebra, mathematical diagrams, and so on.
Historically, lasting cognitive technologies have been invented only rarely. But modern computers are a meta-medium enabling the rapid invention of many new cognitive technologies.
Consider a relatively banal example, such as Photoshop. Adept Photoshop users routinely have formerly impossible thoughts such as: "let's apply the clone stamp to the such-and-such layer."
It’s this kind of cognitive transformation model which underlies much of the deepest work on intelligence augmentation. Rather than outsourcing cognition, it’s about changing the operations and representations we use to think
AI systems can enable the creation of new cognitive technologies. Things like the font tool aren’t just oracles to be consulted when you want a new font. Rather, they can be used to explore and discover, to provide new representations and operations, which can be internalized as part of the user’s own thinking
There are many other examples of artificial intelligence augmentation. To give some flavor, without being comprehensive: the sketch-rnn system, for neural network assisted drawing; the Wekinator, which enables users to rapidly build new musical instruments and artistic systems; TopoSketch, for developing animations by exploring latent spaces; machine learning models for designing overall typographic layout; and a generative model which enables interpolation between musical phrases. In each case, the systems use machine learning to enable new primitives which can be integrated into the user’s thinking
Finding powerful new primitives of thought
What properties should we look for in such new primitives?
breakthroughs in representation often appear strange at first. Is there any underlying reason why that is true?
our advocacy of strangeness in representation contradicts much conventional wisdom about interfaces, especially the widely-held belief that they should be "user friendly", i.e., simple and immediately usable by novices
Ideally, an interface will surface the deepest principles underlying a subject, revealing a new world to the user
What does this mean for the use of AI models for intelligence augmentation?
it’d be better if the model automatically inferred the important principles learned, and found ways of explicitly surfacing them through the interface. (Encouraging progress toward this has been made by InfoGANs, which use information-theoretic ideas to find structure in the latent space.)
But we’re a long way from that point.
Do these interfaces inhibit creativity?
it’s helpful to identify two different modes of creativity. This two-mode model is over-simplified
The first mode of creativity is the everyday creativity of a craftsperson engaged in their craft. Much of the work of a font designer, for example, consists of competent recombination of the best existing practices
For such work, the generative interfaces we’ve been discussing are promising.
The second mode of creativity aims toward developing new principles that fundamentally change the range of creative expression
Picasso or Monet
Is it possible to do such creative work while using a generative interface? Don't such interfaces constrain us to the space of natural images?
The situation is more complex than this.
Artists such as Mario Klingemann and Mike Tyka are now using GANs to create interesting artwork. They're doing so using "imperfect" GAN models, which they seem able to use to explore interesting new principles; it may be that bad GANs are more artistically interesting than ideal ones. (interestingness)
nothing says an interface must only help us explore the latent space. Perhaps operations can be added which deliberately take us out of the latent space, or to less probable (and so more surprising) parts of the space of natural images.
GANs are not the only generative models. In a sufficiently powerful generative model, the generalizations discovered by the model may contain ideas going beyond what humans have discovered
Our examples so far have all been based on generative models. But there are some illuminating examples which are not. Consider the pix2pix system developed by Isola et al. This system is trained on pairs of images.
Unlike our earlier examples, pix2pix is not a generative model: it does not have a latent space or a corresponding space of natural images. Instead, there is a neural network, called, confusingly, a generator.
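To make the contrast concrete, here's a toy image-to-image mapping in numpy: the "generator" takes an image in and produces an image out, with no latent vector anywhere. A single fixed convolution stands in for the many learned layers of a real pix2pix generator (everything here is a hypothetical sketch):

```python
import numpy as np

def translate(edge_map, kernel):
    """A pix2pix-style mapping: image in, image out, no latent vector.
    Here the 'network' is one fixed 3x3 convolution; a trained pix2pix
    generator stacks many learned layers."""
    h, w = edge_map.shape
    out = np.zeros_like(edge_map)
    padded = np.pad(edge_map, 1)  # zero-pad so output size matches input
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

edges = np.zeros((8, 8))
edges[4, :] = 1.0                # a horizontal "sketch" stroke
blur = np.full((3, 3), 1 / 9)    # fixed smoothing kernel
image = translate(edges, blur)
```

The input image itself plays the role the latent vector played before: change the sketch and the output changes, but there is no separate latent space to explore.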
Conclusion
At its deepest, interface design means developing the fundamental primitives human beings think and create with. This is a problem whose intellectual genesis goes back to the inventors of the alphabet, of cartography, and of musical notation, as well as modern giants such as Descartes, Playfair, Feynman, Engelbart, and Kay. It is one of the hardest, most important and most fundamental problems humanity grapples with.
We’ve described a third view, in which AIs actually change humanity, helping us invent new cognitive technologies, which expand the range of human thought.
Perhaps one day those cognitive technologies will, in turn, speed up the development of AI, in a virtuous feedback cycle... (bootstrapping)
It would not be a Singularity in machines. Rather, it would be a Singularity in humanity’s range of thought. Of course, this loop is at present extremely speculative.
a much more subjective and difficult-to-measure criterion: is it helping humans think and create in new ways?
This creates difficulties for doing this kind of work, particularly in a research setting. Where should one publish?
We believe that over the next few years a community will emerge which answers these questions. It will run workshops and conferences. It will publish work in venues such as Distill.
The long-term test of success will be the development of tools which are widely used by creators. Are artists using these tools to develop remarkable new styles? Are scientists in other fields using them to develop understanding in ways not otherwise possible? These are great aspirations, and require an approach that builds on conventional AI work, but also incorporates very different norms.