Stop. Hey. What’s That Sound?



The following is a short version of an essay written for the anthology Sound Unbound: Sampling Digital Music and Culture, edited by Paul D. Miller aka DJ Spooky that Subliminal Kid, and recently released by MIT Press.


When can a sound be an image? Often, of course, sounds are words. In songs, the musical elements that surround lyrics are often more important than the lyrics themselves. The words of few lyricists resonate from the page without benefit of performance. We accept that sound and language can be woven into a synthetic experience. But what about sound and image?

One characteristic of digital media is that they allow for, even encourage, the combination of diverse media into integrated experiences. In a digital artwork words can lead to virtual architectural spaces, in which gestures may trigger images, which may in turn evoke sounds. Disparate media elements can be stitched together in a multitude of ways, layered upon one another so that it is difficult to separate them into their constituent forms. Just as a song combines music and poetry to make something that is distinct from either alone, digital media give rise to forms that wed: sound and movement; sound and space; sound and image.

Those engaged in this exploration come from across the globe. Their common interest is not specific to any country or cultural tradition. Rather, it tends to coalesce where the technology exists to support it — on university campuses, at art museums, or at institutions dedicated to digital arts, such as ZKM, in Karlsruhe, or the InterCommunication Center in Tokyo. The desire to pursue works in this emerging medium — which does more with sound than legacy technologies enable us to do — may well be universal.

Sound is inherently physical. It is a vibration, it travels through the body, and evokes a bodily response. With digital technologies we can integrate sound at a fundamental level into artworks that employ other media, opening new ways for us to share our private experience of sound with others.

The tools we have inherited for making music tap into some aspects of how we relate to sound, while wholly ignoring others. Though we rarely think of this, musicians have been restricted in their ability to play with sound — to literally construct cathedrals of sound that you could walk through — by the nature of their instruments. To date, a "cathedral of sound" has been, by necessity, a metaphor. One day that will no longer be the case. The human impulse toward mimesis is inspiring artists to employ emerging technology to create hybrid artistic forms that mirror the encounter of consciousness with the world. In the mind, sound is not so neatly sectioned off from space, touch, words, or image. One bleeds into the next, slipping and sliding in a spiral of associations. Digital media have already begun to reflect qualities of consciousness that had been beyond the means of artists to capture. In coming years, this will only accelerate.


The basics of digital technology invite artists to rethink traditional distinctions between the arts and to strive for something new. Ever since the emergence of computer-based media, engineers and artists have looked for ways to link diverse media together.

Computers play no favorites between media types; from the standpoint of a computer, the basic stuff of the Moonlight Sonata and the Mona Lisa is essentially the same — they are both strings of ones and zeros, ready to be manipulated by whatever programming sequence a code writer chooses to apply to them.

Ivan Sutherland, the great computer graphics pioneer, was perhaps the first to grasp the full implications of this state of affairs. He was working on how to use computers to create accurate visual representations. Bits in a database, he reasoned, lent themselves to presentation formats as various as the human imagination could conceive. Yes, data might be formatted to look like a simple page of typewritten text, but it was just as feasible to present it as a fully realized three-dimensional environment. While one series of algorithms might structure the output of a set of data as a two-dimensional picture, different algorithms could display that data as a volumetric space. At the tender age of 24, Sutherland proposed building what he called "the ultimate display," an interface to a computer-generated immersive environment that would synthesize all media into a representation of consciousness so convincing that "handcuffs displayed … would be confining, and a bullet displayed … would be fatal." Maybe the potential of virtual worlds got the young Sutherland overexcited, but he was not the last to be made breathless by the prospect of virtual reality.

Sutherland understood that a computer could integrate all media seamlessly into a complex experience, given the appropriate display devices and software. In the process, he hit upon one of the defining insights of our day: data are infinitely malleable.

Artists and theorists have since expanded on this insight. The Austrian artist Peter Weibel has observed that, unlike traditional forms such as painting or sculpture, digital media are variable and adaptable. "In the computer, information is not stored in enclosed systems, rather it is instantly retrievable and thus freely variable," he writes. This quality gives digital media a dynamic aspect not shared by traditional forms. Computer-based media can be called out of a database at a moment's notice, and adapted to the needs of the particular context in which they appear. Referring to the impact digital technology has had on the visual arts, Weibel wrote that "The image is now constituted by a series of events, sounds, and images made up of separate specific local events generated from within a dynamic system." The emergence of the bit has eliminated the strict separation between image, word, sound, and action. Within digital media, when such a distinction does take place, it will be because the artist has made a deliberate choice to do so.

Sound is information, just like images, words, smells, gestures, or haptic impulses that are sensed through the skin. The shaping of this information for esthetic purposes is the common strategy of the arts. But only since the rise of the computer as a media device have we come to regard art as so fundamentally a class of information, albeit information subject to a specific type of formal arrangement.

In our era, an overt understanding of the ways that information can be structured, manipulated, and shared will be central to how we express ourselves through culture. The computer is our primary tool for working with information. But how this tool affects our relationship to information, and the forms through which we engage with it, is only beginning to be examined. Lev Manovich, the Russian new media theorist now teaching at the University of California at San Diego, has done much to establish a systemized approach to this study. In his book The Language of New Media he writes, "If in physics the world is made of atoms and in genetics it is made of genes, computer programming encapsulates the world according to its own logic. The world is reduced to two kinds of software objects that are complementary to each other — data structures and algorithms." The consequences of this, he suggests, should be the focus of a new field of "info-esthetics," which would apply the legacy analytic resources of the arts to the subject of computerized information.


Music had been the most transient of arts. It was ephemeral, of a particular place and moment, then gone. It could not be caught, repeated, transported. Without a plot and text to define it, as in theater, music is particularly challenging to discuss with those who have not heard it. While the score provides an approximate transcription of a musical work, it is rough, open to interpretation. Much of a musical work remains outside the score; not only the sections calling for the performer to improvise (which is common), but more importantly the make-or-break details of tone, texture, pacing — details no written notation can capture.

Before recording and broadcast, music was a medium of immediate presence. Late 19th century technology turned the medium on its head. Recordings became the primary way that we encounter music. What had been the most ephemeral aspect of music — the detailed intonation of a fleeting performance — became concrete. You hear the exact same notes broadcast over radio, in stores, on television, again and again. Jimi Hendrix's spontaneous deconstruction of "The Star Spangled Banner," played before a few stragglers at dawn at the end of the Woodstock festival, became the anthem of a generation thanks to the close proximity of a tape deck. Every impulsive swoop and shock of feedback on that recording was as if etched in stone.

Whole libraries of criticism are devoted to the minute inflections of particular performances. They become landmarks in time, representing more than an aural experience — they exhibit a lost way of being in the world. The preserving of old sounds invented a contemporary way to fetishize the past.

The tendency to recombine fragments of media, to play with the pieces as pieces, has of course been a prominent artistic trope in recent decades. It is seen not only in music, but in a great deal of contemporary artwork, much of which emerged in dialogue with the post-structuralist theory of Lacan, Barthes, Foucault, and others. The theater of Richard Foreman is an obvious example, since he has placed the mixing of disparate elements at the center of his productions, beginning with plays like Rhoda In Potatoland from 1975. Foreman's madcap juxtapositions, which go by at a ferocious speed, mirror the barrage we feel from a non-stop flow of media fragments. He arranges these shards of consciousness into elaborate, dynamic constructions that make esthetic sense out of what in life resists literal sense. The fragments, the little pieces, are the raw material from which he builds a poetic whole.

The avant-garde wing of electronic dance music draws from the same impulse, and uses samples to similar effect. Digital media enable this tendency to go much further. Once saved in a database, a recorded sound can be subject to more manipulations than any two turntables and a mixer are capable of. A sonic element can be reconstituted on the fly according to a particular algorithm, in an interactive collaboration with the person who hears it. A sound can be linked to other sounds, but also to any form of media. A sound can lead to an image, which can in turn provoke a gesture. A sound and a gesture can be compressed into a single, inseparable event — as in life.

The mix-master sensibility is well suited to the possibilities of databases.


When audio becomes a digital file, it is stripped of its formal specificity — it becomes raw information, preceding form. As a string of ones and zeros, that data is open to a myriad of creative manipulations. It can be directed in real time to produce certain sounds, as determined by an algorithm. Or the bits of an audio file may be accessed from computer memory to recreate the sound of an originating recording. But the same bits can just as easily be read by a software program to generate an image, for example. The formal presentation of any string of bits is determined by the intentions, and capabilities, of the software that processes them. As Lev Manovich has put it, with the computer "media becomes programmable."
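The point that one string of bits can be presented as either sound or image can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function names, the 8-bit sample format, and the 8-pixel row width are choices made for this example, not anything described in the essay or taken from a particular audio program.

```python
import math

def make_tone_bytes(freq_hz=440.0, sample_rate=8000, n_samples=64):
    """Return a short sine tone as raw bytes (unsigned 8-bit samples)."""
    return bytes(
        int(127.5 + 127.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    )

def as_audio_samples(data):
    """Read the bytes as sound: amplitudes between -1.0 and 1.0."""
    return [(b - 127.5) / 127.5 for b in data]

def as_image_rows(data, width=8):
    """Read the very same bytes as an image: rows of grayscale values 0-255."""
    return [list(data[i:i + width]) for i in range(0, len(data), width)]

# One string of bits, two presentations (names here are illustrative only).
raw = make_tone_bytes()
samples = as_audio_samples(raw)   # what a sound card would be asked to play
rows = as_image_rows(raw)         # what a screen would be asked to draw
```

Nothing in `raw` marks it as belonging to one medium or the other; the interpretation lives entirely in whichever function reads it.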

This new reality has already become routine, and we give it little thought. For example, most computer programs for making and manipulating audio have visual components — like waveforms and bar graphs — that help the user to control the precise shaping of particular sounds. The same bits that generate sounds through computer speakers will trigger graphical representations on a computer screen that communicate details about volume, pitch, frequency, beats per minute, etc. There are many examples of commercial music making software that produce synched sound and graphics in this way.

Digital artists have also begun to explore the linking of sound and image outputs from a single source of data. In the mid-1990s, the British design team Anti-ROM attracted attention for interactive animations that combined chilly, cerebral abstractions with ambient techno music. Pictures on the screen and MIDI samples would respond together to the clicks of a mouse. This effect was achieved by using the software Macromedia Director, but in recent years artists have expanded on this functionality by writing their own customized programs. The Amsterdam collective NATO has created their own software to generate complex, interactive video images from audio feeds.

It should strike us as remarkable that audio data can have a simultaneous visual representation. But we tend to take it for granted. Why? Because we experience the border between sound and image (or sound and word, or sound and movement) as arbitrary to begin with. In our art, that division has been imposed upon us by our tools. Given the resources, it is conceivable that the line between sound and other media might never have been drawn.

Consider that when Thomas Edison set out to "do for the eye what the phonograph does for the ear," as he put it, his first attempt was to build a "kineto-phonograph" that treated sound and image as inextricably bound. He intended for the device to add moving images as a supplement to the phonographic experience; moving images alone were not intuitively of value to the 19th century sensibility. Edison described the machine this way: "The initial experiments took the form of microscopic pinpoint photographs, placed on a cylindrical shell, corresponding in size to the ordinary phonographic cylinder. These two cylinders were then placed side by side on a shaft, and the sound record was taken as near as possible synchronously with the photographic image, impressed on the sensitive surface of the shell." Edison's materials, ultimately, were not capable of doing the job, and he settled for moving pictures divorced from sound. But as Douglas Kahn has written, "The important facet of this enterprise … was that the world of visual images was to be installed at the size and scale of phonographic inscription."

Kahn also discusses how, prior to Edison's work on the phonograph, he intended to invent a machine that would "fuse speech and writing… [H]e sought to develop a device that could take the phonautographic signatures of vocal sounds and automatically transcribe them into the appropriate letter. This was, in effect, a phonograph where the playback was printing instead of sound." It apparently took much deliberation before Edison could de-link the intuitive interdependence he perceived between forms of expression, as they are experienced in consciousness.

Digital media expand our ability to recombine formal elements in a way that reflects our intuition. With a computer, a string of bits can be expressed simultaneously as sound, image, word, and movement. The limits of this expression lie only in the software we write, or in the hardware we build, to give it shape.

Edison's Kineto-phonograph


For practical and commercial reasons, the software developed for computer media has largely focused on replicating familiar distinctions between disciplines. The media objects these programs produce are meant to fall into familiar categories: images, sounds, shapes, texts, behaviors. It's this easy categorization that leads Lev Manovich to describe computer-based multimedia as having a modular structure. When making a digital media work, Manovich writes, "These [media] objects are assembled into large-scale objects but continue to maintain their separate identities. The objects themselves can be combined into even larger objects — again, without losing their independence." Most off-the-shelf multimedia software, like Macromedia Director, treats discrete media objects as independent pieces (sounds remain sounds, images remain images) while assembling them into complex works. An HTML document is similarly composed of separate, self-contained media elements.

But there are a growing number of computer-based artworks that challenge the traditional division between mediums.

One example is "Mori," the installation by Ken Goldberg and Randall Packer from 1999. Entering "Mori," the visitor passes through a curtain into a dark hallway and walks up an incline, guided only by glowing handrails that increase or decrease in brightness. The hallway turns a corner and leads to a widened space at the end. Under your feet, the floor vibrates, sometimes quite powerfully. The vibrations are created by speakers under the floor, which generate rich, low, quaking sounds — orchestrated rumblings — that rise and fall together with the handrail lights. The effect is of walking into the center of a hushed, meditative space that is part-cave, part-womb. A computer, out of sight, ties the installation's elements together. Through the Internet, the computer receives streaming seismographic data measured continuously from a site near the Hayward Fault, above the University of California, at Berkeley. Using the multimedia software Max, the computer translates this data into two real-time commands — one that controls the lighting, another that sequences the rumbling samples that compose the sound, which then vibrate the floor when played.
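The translation step described above, one stream of readings driving both light and sound, might be sketched roughly as follows. This is not the artists' Max patch, whose details the essay does not give; the sample names, the 0 to 10 reading range, and the 0 to 255 brightness scale are all assumptions made for illustration.

```python
# Hypothetical names for the pre-recorded rumbling samples.
SAMPLES = ["rumble_soft", "rumble_mid", "rumble_deep"]

def normalize(reading, lo, hi):
    """Scale a raw reading into 0.0-1.0, clamping values outside lo..hi."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (reading - lo) / (hi - lo)))

def to_commands(reading, lo=0.0, hi=10.0):
    """Turn one seismic reading into a lighting command and a sound command."""
    level = normalize(abs(reading), lo, hi)        # strength of the tremor
    brightness = round(level * 255)                # handrail light, 0-255
    index = min(int(level * len(SAMPLES)), len(SAMPLES) - 1)
    return {"light": brightness, "sample": SAMPLES[index], "gain": level}

# A made-up stretch of readings standing in for the live data stream.
stream = [0.2, -1.5, 4.0, -9.7, 12.0]
commands = [to_commands(r) for r in stream]
```

Because both outputs are derived from the same `level`, the lights and the rumble rise and fall together, which is the coupling the installation depends on.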

The total effect suggests an intimate connection to the physical nature of the universe. The artists offer an interpretive frame through which a profound awareness of the cosmos can be experienced. "Mori" is an example of how new media technologies open avenues for personal expression where they had not been available before. The installation is a real-time communication with the geotectonic activity of the Earth, as expressed through an esthetic conjoining of light, sound, space, and haptic sensations felt through the skin.

Significantly, while each of these media forms is discernible in itself, the originating data — the impulse at the heart of the work — is of none of these. Both the sound and lighting in "Mori" are interpretations of the real-time seismographic data, as controlled by a set of algorithms. The sound is a live mix, determined by algorithms, of samples of low frequency sounds. The audio is designed to vibrate through the listener, and to affect her bodily — not unlike dance music on a disco floor, though "Mori" is a far more delicate, nuanced experience.

The technical linchpin of the piece is the multimedia program Max. Named as an homage to Max Mathews, it was introduced in 1990, and has been updated regularly to keep pace with advances in computer processing. Unlike most other media software, Max was not designed to mimic familiar media forms. Rather, it allows for the direct manipulation of media files in real time through the algorithmic processing of data — it effectively allows the artist to control the data, and output it in any format that he wants. Using Max, a software program that plays music can send information to a program that controls a lighting console, allowing the music program to direct the lights in the room where the music is played. Max is software that recognizes the intrinsic quality of computer-based media — that it is fundamentally nothing but bits — and enables an artist to shape these bits into the media forms most appropriate for achieving his intentions. Max allows for the total abstraction of media objects, because once they have become ones and zeros circulating through Max, it makes no difference what form of media they originally began as; the form the bits take at the end of the process is up to the sole discretion of the artist.

Max points to a future where the purpose of multimedia software will be to blur lines between what were once distinct media.

Interior of Mori art installation

What happens when, rather than treating music as an inviolable art form, we see it instead as a kind of data to be manipulated for esthetic effect? How might this approach expand our notion of personal expression, enabling us to apply esthetics to experiences that had been outside the concerns of art — that had been the domain of science?

F. Richard Moore, a computer music scholar and pioneer who worked with Max Mathews at Bell Labs in the 1960s, has written about one matrix of possibilities that arises where science meets sound:

"Imagine now a computer-based music machine that senses the musical desires of an individual listener. The listener might simply turn a knob one way when the computer plays something the listener likes, the other way when the computer does something less likable. Or, better yet, the computer could sense the listener's responses directly using, say, body temperature, pulse rate, galvanic skin response, pupil size, blood pressure, etc. Imagine what music would sound like that continually adapts itself to your neurophysiological response to it for as long as you wish. Such music might be more addictive than any known drug, or it might cure any of several known medical disorders."

Where here does science end and art begin, or vice versa? Much of what Moore describes (the monitoring of body temperature, pulse rate, etc.) seems to belong to science. But he applies the legacy of esthetic practice to this territory. What do we like or dislike in music? No conclusive answers are possible. What we like at any moment depends on the context; nothing could be more subjective, or in greater flux. But inhabiting this subjectivity is the specialty of artists. Scientists will likely find that, when it comes to unlocking the mysteries of consciousness, the strategies of artists will play an increasingly important role.


No information exists in isolation. Rather, the information we come in contact with, and comprehend, consists of fragments from a continual flow. We grasp passing particles from this flow, and understand them in a contingent manner. Meaning keeps shifting; our understanding evolves as we access subsequent information, which transposes what we had encountered before and casts it in a changing light.

Digital media make the contingent nature of information explicit, because the technology reduces all formal means of personal expression into raw data ready for manipulation. It not only blurs the lines between distinct media. It invites the further shaping of this data by the person, or group of people, who are accessing it in real time.

Novels, movies, symphonies are not interactive, because they are not capable of incorporating a direct response from the audience in their formal presentation, in real time (efforts to add interactivity to traditional forms are invariably awkward, and regarded as novelties). But because digital media are at their essence bits coursing through software, they can incorporate live response (as determined by the software), and be made to fit the needs of the moment.



It is hard to predict the consequences of using new media technologies. Edison's invention of cinema never anticipated the close-up or montage, for example, which themselves had a profound influence on the social organization of the last century. Only two decades after the popular acceptance of film were both of these key cinematic techniques discovered. I say discovered, rather than invented, because the potential for each was latent in the technology of moving pictures from its earliest days. But it took a shift in awareness for this potential to be recognized, and acted upon.

We are now entering an era in which the tools at our disposal to affect consciousness are increasingly agile. Digital media are opening new avenues to intimate personal expression — through the recombining of media elements, and the blurring of distinctions between traditional mediums in a way that reflects our intuitive engagement with the world. The line where art blurs into science is at the forefront of the discovery of new esthetic experiences. New tools for personal expression provide us with fresh ways of understanding ourselves. By using these tools, our sense of self will inevitably be transformed. Technology prompts new modes of subjectivity into being.

What we think of as sound, as music, is going to change, as it changed so drastically in the modern era. Because of their extraordinary difference from what came before, digital media demand our attention. Otherwise, we will not see what it is we are becoming. Our analytical skills for identifying the effects of technology on culture have grown considerably since the days of silent film. If we see the changes, we may well be able to better direct them. After all, we are writing the computer code that is guiding the changes.

As Plato is said to have remarked, citing Damon of Athens, "When the mode of the music changes, the walls of the city shake." If you choose to see it, you will notice that the walls around you are vibrating.


