Music, too, immerses us in seemingly stable worlds! How can this be, when there is so little of it present at each moment? I will try to explain this by (1) arguing that hearing music is like viewing scenery and (2) asserting that when we hear good music our minds react in very much the same way they do when we see things. And make no mistake: I meant to say "good" music! This little theory is not meant to work for any senseless bag of musical tricks, but only for those certain kinds of music that, in their cultural times and places, command attention and approval.
(Edward Fredkin suggested to me the theory that listening to music might exercise some innate map-making mechanism in the brain. When I mentioned the puzzle of music's repetitiousness, he compared it to the way rodents explore new places: first they go one way a little, then back to home. They do it again a few times, then go a little farther. They try small digressions, but frequently return to base. Both people and mice explore new territories that way, making mental maps lest they get lost. Music might portray this building process, or even exercise those very parts of the mind.)
To see the problem in a slightly different way, consider cinema. Contrast a novice's clumsy patched and pasted reels of film with those that transport us to other worlds so artfully composed that our own worlds seem shoddy and malformed. What "hides the seams" to make great films so much less than the sum of their parts–so that we do not see them as mere sequences of scenes? What makes us feel that we are there and part of it when we are in fact immobile in our chairs, helpless to deflect an atom of the projected pattern's predetermined destiny? I will follow this idea a little further, then try to explain why good music is both more and less than sequences of notes.
Our eyes are always flashing sudden flicks of different pictures to our brains, yet none of that saccadic action leads to any sense of change or motion in the world; each thing reposes calmly in its "place"! What makes those objects stay so still while images jump and jerk so? What makes us such innate Copernicans? I will first propose how this illusion works in vision, then in music.
We will find the answer deep within the way the mind regards itself. When speaking of illusion, we assume that someone is being fooled. "I know those lines are straight," I say, "but they look bent to me." Who are those different I's and me's? We are all convinced that somewhere in each person struts a single, central self: atomic and indivisible. (And secretly we hope that it is also indestructible.)
I believe, instead, that inside each mind work many different agents. (The idea of societies of agents [Minsky 1977; 1980a; 1980b] originated in my work with Seymour Papert.) All we really need to know about agents is this: each agent knows what happens to some others, but little of what happens to the rest. It means little to say, "Eloise was unaware of X" unless we say more about which of her mind-agents were uninvolved with X. Thinking consists of making mind-agents work together; the very core of fruitful thought is breaking problems into different kinds of parts and then assigning the parts to the agents that handle them best. (Among our most important agents are those that manage these assignments, for they are the agents that embody what each person knows about what he or she knows. Without these agents we would be helpless, for we would not know what our knowing is for.)
In that division of labor we call "seeing," I will suppose that a certain mind-agent called Feature-Finder sends messages (about features it finds on the retina) to another agent, Scene-Analyzer. Scene-Analyzer draws conclusions from the messages it gets and sends its own, in turn, to other mind-parts. For instance, Feature-Finder finds and tells about some scraps of edge and texture; then Scene-Analyzer finds and tells that these might fit some bit of shape.
Perhaps those features come from glimpses of a certain real table leg. But knowing such a thing is not for agents at this level; Scene-Analyzer does not know of any such specific things. All it can do is broadcast something about shape to hosts of other agents who specialize in recognizing special things. Since special things–like tables, words, or dogs–must be involved with memory and learning, there is at least one such agent for every kind of thing this mind has learned to recognize. Thus, we can hope, this message reaches Table-Maker, an agent specialized to recognize evidence that a table is in the field of view. After many such stages, descendants of such messages finally reach Space-Builder, an agent that tries to tell of real things in real space.
Now we can see one reason why perception seems so effortless: while messages from Scene-Analyzer to Table-Maker are based on evidence that Feature-Finder supplied, the messages themselves need not say what Feature-Finder itself did, or how it did it. Partly this is because it would take Scene-Analyzer too long to explain all that. In any case, the recipients could make no use of all that information since they are not engineers or psychologists, but just little specialized nerve nets.
Only in the past few centuries have painters learned enough technique and trickery to simulate reality. (Once so informed, they often now choose different goals.) Thus Space-Builder, like an ordinary person, knows nothing of how vision works, or about perspective, foveae, or blind spots. We only learn such things in school: millennia of introspection never led to their suspicion, nor did meditation, transcendental or mundane. The mind holds tightly to its secrets not from stinginess or shame, but simply because it does not know them.
Messages, in this scheme, go various ways. Each motion of the eye or head or body makes Feature-Finder start anew, and such motions are responses by muscle-moving agents to messages that Scene-Analyzer sends when it needs more details to resolve ambiguities. Scene-Analyzer itself responds to messages from "higher up." For instance, Space-Builder may have asked, "Is that a table?" of Table-Maker, which replies to itself, "Perhaps, but it should have another leg–there," so it asks Scene-Analyzer to verify this, and Scene-Analyzer gets the job done by making Eye-Mover look down and to the left. Nor is Scene-Understander autonomous: its questions to Scene-Analyzer are responses to requests from others. There need be no first cause in such a network.
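The chain of requests in this example can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the agent names come from the text, but the functions, the message strings, the pretend "retina," and the log are all hypothetical, and nothing here is claimed to be how such agents are actually built.

```python
# Toy model of the request chain; every detail except the agent names is assumed.
log = []  # records (sender, receiver, message) so the traffic can be inspected


def feature_finder(direction):
    # Pretend retina: only a glance down and to the left reveals the extra edge.
    features = {"down-left": ["edge", "texture"]}
    return features.get(direction, [])


def eye_mover(direction):
    log.append(("Scene-Analyzer", "Eye-Mover", direction))
    # Each eye motion makes Feature-Finder start anew.
    return feature_finder(direction)


def scene_analyzer(request):
    log.append(("Table-Maker", "Scene-Analyzer", request))
    features = eye_mover("down-left")
    # Reports only a shape-level conclusion, never how the features were found.
    return "leg-like shape" if "edge" in features else "nothing"


def table_maker(question):
    log.append(("Space-Builder", "Table-Maker", question))
    shape = scene_analyzer("verify another leg")
    return "probably a table" if shape == "leg-like shape" else "unsure"


def space_builder():
    return table_maker("Is that a table?")


answer = space_builder()
```

Note that each function sees only the reply of the agent it called; `space_builder` never learns that an eye movement took place.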
When we look up, we are never afraid that the ground has disappeared–no matter that it has "dis-appeared." This is because Space-Builder remembers all the answers to its questions and never changes any of those answers without reason; moving our eyes or raising our heads provides no cause to exorcise that floor inside our current spatial model of the room. My paper on frame-systems [Minsky 1974] says more about these concepts. Here we need only these few details.
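A minimal sketch of this "remember unless given a reason" policy, in Python. The class, its method names, and the question strings are hypothetical, introduced only for illustration; the sketch assumes that the only thing allowed to change a stored answer is fresh evidence.

```python
class SpaceBuilder:
    """Toy spatial model: answers persist until new evidence replaces them."""

    def __init__(self):
        self.model = {}  # question -> remembered answer

    def observe(self, question, answer):
        # New evidence is the only thing that changes a stored answer.
        self.model[question] = answer

    def look_away(self):
        # Moving the eyes supplies no evidence, so nothing is erased.
        return None

    def recall(self, question):
        return self.model.get(question)


room = SpaceBuilder()
room.observe("is there a floor below?", "yes")
room.look_away()  # the floor has "dis-appeared" from view
still_there = room.recall("is there a floor below?")
```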
Now, back to our illusions. While Feature-Finder is not instantaneous, it is a very, very fast and highly parallel pattern matcher. Whatever Scene-Analyzer asks, Feature-Finder answers in an eye flick, a mere tenth of a second (or less if we have image buffers). More speed comes from the way in which Space-Builder can often tell itself, via its own high-speed model memory, about what has been seen before. I argue that all this speed is another root of our illusion:
If answers seem to come as soon as questions are asked, they will seem to have been there all along.
The illusion is enhanced in yet another way by "expectation" or "default." Those agents know good ways to lie and bluff! Aroused by only partial evidence that a table is in view, Table-Maker supplies Space-Builder with fictitious details about some "typical table" while its servants find out more about the real one! Once so informed, Space-Builder can quickly move and plan ahead, taking some risks but ready to make corrections later. This only works, of course, when prototypes are good, and are rightly activated–that is what intelligence is all about.
As for "awareness" of how all such things are done, there simply is not room for that. Space-Builder is too remote and different to understand how Feature-Finder does its work of eye fixation. Each part of the mind is unaware of almost all that happens in the others. (That is why we need psychologists; we think we know what happens in our minds because those agents are so facile with "defaults" – but really, we are almost always wrong about such things.) True, each agent needs to know which of its servants can do what, but as to how, that information has no place or use inside those tiny minds inside our minds.
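The "default" idea can be illustrated with a small Python sketch. It assumes, purely for illustration, that a prototype is a dictionary of typical properties and that observed evidence simply overrides the matching entries; the property names and values are invented.

```python
# Hypothetical prototype: the details here are assumptions, not from the text.
TYPICAL_TABLE = {"legs": 4, "top": "flat", "height_cm": 75}


def table_maker(evidence):
    """Describe the table: start from defaults, then let real evidence win."""
    description = dict(TYPICAL_TABLE)  # copy, so the prototype stays intact
    description.update(evidence)       # observed details replace the bluff
    return description


# With only partial evidence, the missing details are filled in by default.
first_guess = table_maker({"top": "flat"})

# Later, the servants report what the real table looks like.
corrected = table_maker({"top": "flat", "legs": 3})
```

Here `first_guess` reports four legs before anyone has counted them, and `corrected` revises that once the count arrives. The scheme is only as good as the prototype it starts from.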
How do both music and vision build things in our minds? Eye motions show us real objects; phrases show us musical objects. We "learn" a room with bodily motions; large musical sections show us musical "places." Walks and climbs move us from room to room; so do transitions between musical sections. Looking back in vision is like recapitulation in music; both give us time, at certain points, to reconfirm or change our conceptions of the whole.
Hearing a theme is like seeing a thing in a room, a section or movement is like a room, and a whole sonata is like an entire building. I do not mean to say that music builds the sorts of things that Space-Builder does. (That is too naive a comparison of sound and place.) I do mean to say that composers stimulate coherency by engaging the same sorts of inter-agent coordinations that vision uses to produce its illusion of a stable world, using, of course, different agents. I think the same is true of talk or writing, the way these very paragraphs make sense–or sense of sense–if any.