This article serves two purposes. It is a discussion about the future of computer interfaces; and it is a means by which I can purge myself of thoughts that have been accumulating on this topic for quite a few years. Even if it fails as intelligent discourse on the first, it will have succeeded in the second. Previously the title was Where Are Computer Interfaces Going? but after writing it I noticed a significant number of predictive passages and decided to be bold and move the "are". Of course now I feel obliged to add a disclaimer. I admit right here, or at least in the next sentence, that I don't know where computer interfaces are going. I don't know.
With that out of the way, I'd like to start, as many interfaces do, with the metaphor. In the 80s and 90s successful interface design and an appropriate metaphor were taken to be nearly synonymous. Although a good metaphor is important, it can impose unnecessary and artificial restrictions. So why is it so important? The best, perhaps only, reason is familiarity. Unfortunately, familiarity comes at a cost: the shorter learning curve can require speed and ability to be sacrificed.
Consider the ubiquitous desktop metaphor. What is more powerful, the abstract construct of a tree, or a single flat surface to place your papers on? Well, a tree is. In fact it is so much more powerful that it is the cornerstone of all modern file systems. Trees are great: they impose an organizational order that is common in natural systems. General graphs are, perhaps, too general. DAGs (Directed Acyclic Graphs) are a good contender, largely because of their acyclicity, but also because they extend trees in a well-defined way. I suspect that trees are so useful because we can't move backwards in time. Species speciate, languages extend, and software bloats. To fight these is to fight the increasing entropy of the universe.
Would it be a good idea not to allow folders within folders within folders just because it would be physically cumbersome, and at some point impossible? Probably not. Do icons have a real-world counterpart? Not really. Metaphors should be, and have been, taken only so far.
So what does the future hold? Will interfaces be 3D? Will we be stuck with rectangles forever? I think it's reasonable to say both have their place. People on the 3D side think that we humans see, work, live, and play in 3D. We don't. They say they can't wait until there are fully 3D monitors that you can walk around. Why? Our retinas, like those of birds whose eyes are plastered on the sides of their heads, are two-dimensional surfaces. Birds have flatter vision than we do, if not as Euclidean, because they don't have the benefit of the tiny bit of 3D depth perception a predator gets by overlapping images. I've heard graphics programmers explain that their 3D scene was being projected onto a flat 2D screen and so it was no longer really 3D. But consider this: everything you see in this world is like that. It all gets projected onto our flat retinas. We just have really big brains. A 3D scene is constructed in our mind regardless of whether what we're viewing is on a flat computer monitor or in that nether-world known as real life. In fact, most brains do a decent job of scene construction even with one eye closed. From 2D to 3D. Impressive!
People on the 2D side think that we humans see, work, live, and play in 2D. We do, after all, have flat retinas, like playing tennis on flat tennis courts, and eat dinners from flat plates on flat tables. But we don't live in 2D. Our brains are really big. 1.3 litres big. More than enough dendrites, axons and other brain-things to contain a nice 3D representation of the world we live in. Clues to build the scene abound: motion, foreshortening, and the aforementioned depth perception.
The truth is some things are better in 2D and some in 3D. Writing a letter? Use a desk. Put a flat piece of paper on it. Want to file that letter away? Wouldn't it be cool if you could just let it hover in some large 3D organizational space? Here's what I think.
It has occurred to me that 2D representations should be considered a feature of an interface. It's beneficial that text documents are lined up nicely for you in a window. If head- or eye-tracking hardware were more widespread, we'd have software that could compensate for (single) users who are not directly in front of their screens. Imagine looking at your monitor from an angle but still having the text of this article appear flat. That would be a pretty neat feature (on the other hand, it might just look strange and make you sick; hard to tell without trying it).
Because what reaches our eyes is essentially 2D, I predict pure 3D imaging devices will prove to be a novelty even if the enormous bandwidth problems can be solved. A graphics card that draws a 480x480 pixel scene at 60fps would take 8 seconds to update a 480x480x480 cube, since the cube is 480 such frames deep. Yes, I understand this is a vast simplification. Somehow restricting rendering to the surfaces of an object might help, but it sounds tricky. Regardless, the same or better effect will be achievable by feeding a separate 2D image to each eye. Technology that takes this approach will be more successful. Devices that project images directly onto the retina seem like a reasonable approach, along with any tracking technology that may go with them.
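For anyone who wants to check the 8-second figure, here is the back-of-the-envelope arithmetic as a short Python sketch. The constant names are mine, and it keeps the same vast simplification as above: the card fills one 480x480 slice of the cube per frame and does nothing smarter than that.

```python
# Back-of-the-envelope check of the volumetric update time.
# Assumption (same simplification as the text): filling the cube costs
# exactly as much as drawing one 480x480 frame per depth slice.
SLICE_WIDTH = 480
SLICE_HEIGHT = 480
CUBE_DEPTH = 480          # number of 480x480 slices in the cube
FRAMES_PER_SECOND = 60    # 2D frames the card can draw each second

pixels_per_second = SLICE_WIDTH * SLICE_HEIGHT * FRAMES_PER_SECOND
voxels_in_cube = SLICE_WIDTH * SLICE_HEIGHT * CUBE_DEPTH

# Time to refresh the whole cube once at the 2D fill rate.
seconds_per_cube_update = voxels_in_cube / pixels_per_second

# How much faster the card would need to be to refresh the cube 60 times a second.
required_speedup = seconds_per_cube_update * FRAMES_PER_SECOND

print(seconds_per_cube_update)  # 8.0
print(required_speedup)         # 480.0
```

Put another way, keeping a cube like this updating at 60fps would take roughly 480 times the fill rate of the flat case.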
The next 10 years will be a transitional phase for interface design. 3D rendering technologies already have a stable home in the entertainment, video game, simulation, and design sectors. Although 2D interfaces have dominated everything else, I expect we will start seeing more 3D incursions. Operating systems and applications are beginning to capitalize on what 3D has to offer. The precise nature of how and where 3D can best be incorporated is an open question, and a framework to evaluate these questions seems appropriate. As a rough starting point it seems reasonable to divide the attempts into two broad categories: those that are trying to simulate the physical world and those that prefer more abstract representations. If you'll indulge me, I'd like to call these two approaches, respectively, the "Physical Simulation Approach" (PSA) and the "Abstract Representation Approach" (ARA).
Developers in the PSA camp are taking physical simulations and hanging applications, web sites, movies, and pictures on simulated walls. Simulated desks have functional simulated calculators on them. And, perhaps, there is a simulated sun outside. It's all very familiar and comes with a nice minimal learning curve.
The ARA camp are working on strange visualization techniques to view complexity and patterns in large amounts of data. They have general graphs floating around in space with links joining concepts and words together in arbitrary ways. They have nifty algorithms that filter the salient characteristics of large data sets so you don't get overwhelmed. Their attempts are by far the harder to describe with these mere words.
In practice many attempts will combine aspects of both philosophies. I suspect that successful attempts at a 3D interface will have to balance these two extremes in appropriate ways. Objects in a functional 3D interface should probably be represented with models that are familiar, just as the icons on your desktop are often imitations of familiar real-world objects. This is a PSA property. On the other hand, a tree-based organizational system would be advisable. Very much an ARA concept.
Text should always be view-plane aligned, as should images. This is one of those 2D features mentioned earlier. Images and text may be scaled, but they should not present themselves at an angle. Vertical and horizontal edges need to remain vertical and horizontal. Of course, these features are trivially present with your desktop interface as well.
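One common way to get this behaviour in a 3D scene is usually called billboarding: build the text or image rectangle from the camera's own right and up directions, so that its edges stay horizontal and vertical on screen wherever the camera is. The sketch below is only an illustration under my own assumptions; the function name and the way the camera is described (two unit vectors) are hypothetical and not taken from any particular engine.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def billboard_corners(center: Vec3, cam_right: Vec3, cam_up: Vec3,
                      width: float, height: float) -> List[Vec3]:
    """Corners of a view-plane-aligned rectangle, counter-clockwise from bottom-left.

    cam_right and cam_up are assumed to be unit vectors spanning the view plane.
    """
    half_w, half_h = width / 2.0, height / 2.0

    def corner(sx: float, sy: float) -> Vec3:
        # Step from the center along the camera's right and up axes only,
        # so the rectangle never tilts relative to the screen.
        return (center[0] + sx * half_w * cam_right[0] + sy * half_h * cam_up[0],
                center[1] + sx * half_w * cam_right[1] + sy * half_h * cam_up[1],
                center[2] + sx * half_w * cam_right[2] + sy * half_h * cam_up[2])

    return [corner(-1, -1), corner(1, -1), corner(1, 1), corner(-1, 1)]

# A camera looking down the negative z axis: right is +x, up is +y.
front = billboard_corners((0.0, 0.0, -5.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 2.0, 1.0)

# A camera turned 90 degrees: its right axis now points along -z.
side = billboard_corners((5.0, 0.0, 0.0), (0.0, 0.0, -1.0), (0.0, 1.0, 0.0), 2.0, 1.0)
```

Scaling is just a matter of changing the width and height; the rectangle is never rotated out of the view plane.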
And there's an important lesson: build on the backs of giants. The desktop UI is successful for a reason, not simply because it has a familiar analogue in the physical world, but rather because it behaves in the same useful way that real desks behave. It takes advantage of a well-established ability: spatial memory. You put something down and it stays there.
Useable interfaces need a certain amount of persistence in their structure. Having objects stay where you leave them is one good way to achieve persistence. Placing objects manually, whether on your desktop or in a 3D environment, takes advantage of spatial memory. We can remember, in context, where we've left hundreds of objects (notwithstanding car keys; they get moved around too much). You probably know where your camera is and where the light switches in your home are. By positioning objects manually you can give them some context; perhaps by placing pictures of your family to the left, and panoramic vistas to the right. Contextual clues help you remember.
I've heard the assertion that adding a single extra dimension doesn't buy you much organizational power and that the added navigational complexity isn't worth it. Others think that we need an n-dimensional space to do a good job. Aside from the obvious observation that we seem to exist in a macroscopically three-dimensional world (macroscopically was added just to keep any physicists-who-may-know-better reading) and are therefore good at 3D manipulations, there is evidence that the jump from 2 to 3 dimensions is of a more fundamental significance. If you draw a bunch of dots on a piece of paper you will not be able to draw lines joining the dots in all possible configurations without some of the lines crossing, given a sufficiently large number of dots (five will do it, if every dot is joined to every other). However, once you hit three dimensions, all configurations are possible without crossings. Adding a fourth or fifth doesn't have any further beneficial effect. Admittedly there is some hand-waving going on here, but the result has implications for some possible interface designs, and it points to using three dimensions.
So why haven't interfaces changed much in the last 20 years? One possibility is that the desktop is in some way an optimal representation. More likely, however, is that it is simply a functional representation; no need to change when change takes effort, right? We expect to be able to sit down in front of a new interface and immediately be as productive as we were before. We have all learned to use the desktop and menu-driven interfaces because we haven't had a choice. It has taken time, just as learning to read and write took years when we were younger. Even the keyboard and the mouse, although perhaps easier than writing, have taken time and effort to master. New interfaces will face the same hurdles. Their designs will need tweaking to reduce the learning curve as much as possible. The users of these new interfaces will need the patience to develop efficient usage patterns, and the interfaces themselves will need to be entertaining enough to mitigate the patience required. All these efforts will yield interfaces that are not only more enjoyable, but faster and more useful.
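To take a little of the hand-waving out of the dots argument, here is a small Python sketch with function names of my own invention. It leans on two standard facts from graph theory rather than anything specific to interfaces: a simple graph drawn on paper without crossings can have at most 3n - 6 lines for n >= 3 dots, and dots placed along the curve (t, t^2, t^3) in space never have four of them lying in one plane, so straight lines between different pairs of dots cannot cross.

```python
from itertools import combinations

def complete_graph_edges(n: int) -> int:
    """Number of lines needed to join every one of n dots to every other."""
    return n * (n - 1) // 2

def planar_edge_limit(n: int) -> int:
    """Most lines a crossing-free drawing on paper can have, for n >= 3 dots."""
    return 3 * n - 6

def smallest_count_forcing_a_crossing() -> int:
    """First n at which joining every pair breaks the planar limit.

    Exceeding the limit proves a crossing is unavoidable. Staying under it
    proves nothing in general, but 3 and 4 fully joined dots are known to be
    drawable without crossings, so the answer here is exact.
    """
    n = 3
    while complete_graph_edges(n) <= planar_edge_limit(n):
        n += 1
    return n

def signed_volume(a, b, c, d) -> int:
    """Determinant of (b-a, c-a, d-a); zero exactly when the four points are coplanar."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def dots_in_space_avoid_crossings(n: int) -> bool:
    """Place n dots on the curve (t, t^2, t^3) and check no four are coplanar.

    Two crossing segments would put their four endpoints in one plane,
    so passing this check means straight links between the dots never cross.
    """
    dots = [(t, t * t, t * t * t) for t in range(1, n + 1)]
    return all(signed_volume(*four) != 0 for four in combinations(dots, 4))

print(smallest_count_forcing_a_crossing())   # 5
print(dots_in_space_avoid_crossings(10))     # True
```

The second check only samples ten dots, of course, but the same placement works for any number of them.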
Tristan Grimmer is technical director at Upper Bounds Interactive Inc. Previously a video-game programmer, and much earlier than that a young boy trying to get his Vic-20 to accurately compute Pi, Tristan now spends his time working on Tactile 3D in an attempt to rid the world of rectangles. If you would like to try a 3D interface for yourself, please visit http://www.tactile3d.com