Date: Wed Feb 19 11:25:07 EST 1997
From: Emre Erkal

Sorry, I became aware of this quite late. Hope I'm not too late...

EMRE ERKAL
a tentative short proposal for speaking trees in the brain opera
feb. 17/97

1. a. general issues::

On the speaking trees, I would like to see a dynamic achieved through an interplay of two levels of engagement. One is the layered control of the musical flow/production; the other is the actual act of music making. This conception can be followed through a series of binary poles: cerebral vs. physical, cognitive vs. performative, pure structure vs. process, the French vs. the German, and the like.

The first, cerebral level should convey a message about the multi-layered nature of the music produced. After all, the theory of The Society of Mind suggests that various assemblages of agents, at various levels of complexity and performing different tasks, together perform at a higher level. An assemblage here might be a union of people performing, or a union of the outputs of people controlling the flow of music; even the union of one cellist, a smaller assemblage, and a synthesizer would be an assemblage.

The second level should be totally experiential. Looking at the successful sing-a-stable-tone tree, it is necessary to pursue an utterly physical act, and the system should be immediate in responding.

b. in totality::

A tree will be composed of a visual display, a set of headphones, a microphone and a performative engine (this could be named better!). A performative engine will require the user to engage with the system in a profoundly physical way. Some examples I have thought of are a pump, a vibrating arm/joystick, and a trackball; acts involving the voice would also be fascinating to incorporate. These are described below.

Overall, the assembly of trees will be a dynamic collective playground, supported by a visual interface. The users will engage with the system in their own distinct physical idioms, but they will be made aware that they are playing on the same ground as other people. Together they will perform a single act: a highly dynamic and responsive one. An analogous act in the physical world is many people carrying and balancing a huge object. Each person carries a little portion of the weight and balance, but overall it is a highly complex act in which one starts discovering minute acts of balancing: minor to him/her but crucial to the whole. The users will thus come to understand the collective nature of the complex dynamics. This collective act will be comprised of many subjective elements of control (extremely subjective, due to the asymmetry of the performative acts), but the overall control will relate to the whole. A graphical interface (not necessarily a complicated or detailed one) will convey this information about the whole.

Together the users may control some parameters on other Brain Opera instruments: which set of musical segments is to be played, for example. A subgroup of the trees may control the parameters of the play another subgroup is working on. Two trees may control the parameters of a dynamically changing tone (generated by the Kurzweil). Three other trees may control a set of sound sequences, their volumes for example, and one of those sound sequences might be the one coming from the previous two trees, in real time! In totality, the trees will convey the interlinked, cascading nature of the assemblages formed.
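As a sketch of the kind of routing this implies (in C++, with all types and names hypothetical illustrations, not anything existing): an assemblage is just a set of trees whose pooled output drives one parameter of some other sound, possibly a sound produced by another assemblage.

    #include <iostream>
    #include <string>
    #include <vector>

    struct Tree {
        std::string name;
        float output;   // most recent control value from this tree's performer (0..1)
    };

    // An assemblage: member trees whose averaged output drives one parameter,
    // e.g. the volume of a sequence coming from another assemblage.
    struct Assemblage {
        std::vector<Tree*> members;
        std::string controlledParam;
        float value() const {
            float sum = 0;
            for (size_t i = 0; i < members.size(); ++i) sum += members[i]->output;
            return members.empty() ? 0 : sum / members.size();
        }
    };

    int main() {
        Tree t1 = {"tree1", 0.8f}, t2 = {"tree2", 0.4f}, t3 = {"tree3", 0.6f};
        // Two trees shape a tone; a third tree controls that tone's volume.
        Assemblage shapers;
        shapers.members.push_back(&t1);
        shapers.members.push_back(&t2);
        shapers.controlledParam = "tone.timbre";
        Assemblage mixer;
        mixer.members.push_back(&t3);
        mixer.controlledParam = "tone.volume";
        std::cout << shapers.controlledParam << " = " << shapers.value() << "\n"
                  << mixer.controlledParam  << " = " << mixer.value()  << "\n";
        // Assemblages are not static: members can be moved between groups at
        // any time, and the visual interface redraws the new grouping.
        return 0;
    }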
Assemblages will not be static; they will jump up and down, form, disperse and re-form in terms of which trees are involved in the groupings. The layers of control and the sets of parameters will change. All this information will be conveyed by the visual interface. I hold the belief that with a wise and economical design, the interface can be subtly simple. Radar displays, for example, not only convey the location of "others" but, because they present information in bursts, also help take the pace down to a more relaxed plateau. Totality and layers of control is the basic idea of The Society of Mind; the trees can be a dynamic illustration of it.

c. individually::

Individually, each tree will function as part of the collective interplay as well as a personal discovery of physical engagement with music/sound. On one side it will be a hub in a highly cerebral and cognitive interplay; on the other, the very simple discovery of a landscape between physical engagement and sound.

2. a. a detailed tree::

The tree I will describe is the pumping tree, although I have thought about others as well. An instrument which simply measures/records the instants of pump-ins by the user will be added. Upon arrival, the user is required to record a sound: a sentence, a word, a cough for example (let me call it a "keyword"). This is a strategic way to elicit words from the users: making it part of the ritual. This sound will then be used by the tree for the user to calibrate/align his/her own performance. In the case of the pumping tree, the very word that the user entered will be repeated at each pump-in. At the same time, the visual display will show a balancing act in which the user has to perform his/her part. Therefore s/he will immediately be engaged in pumping, the only means to perform, in rhythms dictated by the collective interplay (the balancing act). This physical act of pumping will repeat his "keyword", his own voice, altered and controlled in its sonic parameters by other people's acts. It will be repeated with varying intensity, flange, cut-and-spliced parts and the like. One trick the system may play to enhance collectivity is switching the keywords between users.

Each user will be operating through the use of a keyword. These words can be added, concatenated, shifted and perpetuated at various moments in the Brain Opera. The users may join and leave the trees at any time; the system is flexible enough to continue with any number of users. Speaking trees can be quite influential in the Brain Opera: other lobby instruments can be aligned by these trees in terms of their parameters. Information can flow in two directions: not only can the trees affect the Brain Opera, various parts of it (especially the main performance) can affect the collective interplay in the trees. The Internet could also be given a role; however, the immediacy of the physical engagement is crucial for the trees.

b. design issues::

There is the task of devising the visual interface. It has to be designed subtly, for economy of information. Each performing act and its ways of engaging have to be worked out: that is, the physical act and a responsive algorithm to interpret it. The visual interface then has to be such that the individual performances are meaningful as well as the collective act. This is the main design paradigm. In specifics, there are many issues of layer-definition: how to form assemblages and how to select parameters of control.
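Returning to the pumping tree of 2.a, a minimal sketch of its response loop, assuming a hypothetical playSample() standing in for the real audio path; the effect settings arrive from the other trees' acts, so nothing stands between the physical act and the sound except a lookup:

    #include <iostream>
    #include <string>

    struct EffectParams {      // set collectively by the *other* trees
        float intensity;       // playback gain
        float flangeDepth;     // 0 = dry
        bool  cutAndSplice;    // rearrange fragments of the keyword
    };

    // Placeholder for the real audio path.
    void playSample(const std::string& keyword, const EffectParams& fx) {
        std::cout << "play \"" << keyword << "\" gain=" << fx.intensity
                  << " flange=" << fx.flangeDepth
                  << (fx.cutAndSplice ? " [spliced]" : "") << "\n";
    }

    // Called once per pump-in; immediacy is the whole point.
    void onPumpIn(const std::string& keyword, const EffectParams& collective) {
        playSample(keyword, collective);
    }

    int main() {
        EffectParams fromOthers = {0.7f, 0.3f, true};
        onPumpIn("hello", fromOthers);   // each pump repeats the user's keyword
        return 0;
    }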
c. technical issues::

Multimedia programming. Graphics interfacing. Physical construction of the performative engines. I want to learn about the musical and sound parts; that is the reason I am taking the class. As a designer I can say that I am strong in conceptual design (whatever that is, something I've got an education in!), somewhat in graphics, and in cognition issues (if there are any). For myself, I would not want to be immersed in programming per se, but programming is okay. I see the distribution of work as a dynamic process that will take shape through the course of the work, if we are quick to respond to it.

==================================

Date: Tue Feb 18 16:32:30 EST 1997
From: Takuji Imai

Hi, everyone. Sorry for the delay.

Overview

My basic idea is to play and decompose a famous piece of music, such as a part of a Beethoven symphony, using everyone who attends the Brain Opera. Each person plays his/her role with a Speaking Tree, and doesn't know how her/his contribution influences the whole music before it's played. This is based on a quite simple procedure, but it can produce a very interesting effect.

Tentative story

When a participant comes to a Speaking Tree, it first plays a short phrase from a famous piece of music. After hearing it, she/he is instructed to hum or sing it, and record it. Each phrase is a certain part of a famous piece, and once all of them are recorded by participants, the whole piece can be played by their voices. Each participant may be encouraged to arrange the phrase. They can try as many times as they want, until they are satisfied with their recording.

After this, the participant is given a picture of the whole piece of music. This picture shows the structure of the piece and the location of the phrase played by the participant. The picture consists of several layers. Each layer represents an instrument, for instance a violin, and consists of a sequence of blocks representing the phrases played by that instrument. Each block is colored according to some principle, such as: the same phrases have the same color. The point is that the colors should be chosen to express musical ideas such as harmony. The picture itself should look like a piece of art: a piece of music mapped into the color domain.

In the picture, the block corresponding to the phrase played by the participant blinks. The participant can move this block anywhere in the same layer, but she/he has to consider the harmony of color: they should place their block so as to make the whole picture beautiful. This procedure creates the decomposed version of the original music.

The information collected will be used for the performance. The performance consists of three parts: first, the whole music (or part of it) played by real instruments or MIDI instruments; next, the same played by the participants' voices; and finally, the decomposed piece played by voices. The transitions between the parts should be blurred by techniques such as gradual substitution of instruments by voices.

Underlying concept

These ideas are inspired by the notion of The Society of Mind. Each phrase corresponds to a primitive agent. Like an agent, each phrase does a very simple thing and has no idea about the whole music, but once the phrases are arranged in a proper way, they can create great music. Each participant plays the role of an agent by singing or humming a phrase. In this context, however, there is a drawback: this scheme presumes a central cause, like the "Self" in a mind, which is the composer. So I proposed the decomposition part to weaken this notion.
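As one possible, purely illustrative phrase-to-color rule (the real mapping is the open design question): reduce each phrase to its dominant pitch class and place it on the circle of fifths, so identical phrases get identical colors and harmonically related phrases get nearby hues. A sketch in C++:

    #include <cmath>
    #include <iostream>

    const double PI = 3.14159265358979;

    struct Color { int r, g, b; };

    // Map a phrase's dominant pitch class (0..11, C..B) around the circle of
    // fifths so that harmonically close phrases land at nearby hues, then
    // turn the hue into a crude RGB value.
    Color phraseColor(int dominantPitchClass) {
        int fifths = (dominantPitchClass * 7) % 12;   // position on circle of fifths
        double hue = fifths / 12.0 * 2.0 * PI;
        Color c;
        c.r = (int)(127 + 127 * std::cos(hue));
        c.g = (int)(127 + 127 * std::cos(hue + 2.0)); // rough 120-degree offsets
        c.b = (int)(127 + 127 * std::cos(hue + 4.0));
        return c;
    }

    int main() {
        for (int pc = 0; pc < 12; ++pc) {
            Color c = phraseColor(pc);
            std::cout << "pitch class " << pc << " -> rgb("
                      << c.r << "," << c.g << "," << c.b << ")\n";
        }
        return 0;
    }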
Design of Speaking Trees

I think this idea can be easily implemented with the current infrastructure. One thing that should be noted is the recording duration: each phrase has to be recorded in the proper amount of time. Providing some visual and auditory cues of timing will help the participants notice the amount of time. Each Speaking Tree may have the same functions, or each may represent a different instrument. If each of them has the role of a certain instrument, they may be divided into two groups, one for men and one for women. The idea is that some instruments, such as the violin, sound like a woman's voice, and others sound like a man's.

My role in this project

The critical part of this project is how to express a piece of music through color coordination. I would like to address this part of the work. I am also interested in the design of the user interface. To do this, I have to learn some music theory and some programming techniques, because I'm not a musician or a programmer. I have some technical background, but my current specialty is journalism, and I'm not sure how I can apply it to this project. I will need several skilled programmers who know the Speaking Trees well, and musicians who can give me advice, to achieve my goal.

==================================

Date: Mon, 17 Feb 1997 17:59:47 EST
From: Veronica Lopez

My proposal has to do with the kinds of things you hear and the things you speak. For instance, in a speaking tree, a person could hear different kinds of sounds (chosen randomly from the sound library archives, while seeing images not necessarily of Minsky) and answer questions related to these sounds (or just one question), no matter what language you speak. This should work like an answering-machine system (you speak after a certain signal, and click the mouse when you finish). These answers are processed with effects such as reverb, chorus, delay, flanger, phaser, etc. (these can also be chosen at random). Then they are combined with the answers from the other speaking trees, so before your speaking tree contact ends you can hear this combination of answers: the ones that you gave and the ones that other people gave (these answers can be words, questions, feelings, etc.). Once it ends, your answers (or words) already form part of the data in the computers, and can be triggered in some way from certain lobby experiences available in the Brain Opera (rhythm trees, etc.).

About the physical design of the speaking trees, I think it's good. I would improve the earphone system so you can hear the music, the questions and yourself better (better earphones, less background noise). In relation to the background noise, I think there should be a better distribution of the rest of the lobby experiences. Within the same space, there could be a way to isolate these experiences a little more (maybe an example is the "steering wheel experience"). I know we should concentrate on the speaking trees, but the other experiences (the rhythm trees, the "touch screens", and the one that senses your hand movements through a circuit at your feet) could be oriented so that their sound aims away from the other experiences and reaches an absorbing material that makes it die off fast. This does not necessarily mean hiding an experience far away, only orienting it better.
I don't know if it would be a great help, but this, I think, can lower the background noise that sometimes (not always) makes you feel you are in the middle of the street with a lot of car alarms going off at the same time (and your mind gets stressed). The aim is to hear well what the speaking tree is saying, and to be able to concentrate and answer (the mic should receive just your voice and not the background noise).

I would like to learn the software being used, the programming (if any is used), the sampling systems, and everything that has to do with the Internet. Also, I would like to work with people who want to learn the same things as me, so we can go step by step. See you and thanks very much, Gonzalo Herrera.

==================================

Date: Mon Feb 17 17:44:41 EST 1997
From: Catalina Buttz

Hi everyone. My contribution can be found at the following web site: http://web.mit.edu/cmbuttz/www/bo-class_1.html I may include some graphics later on tonight to make it clearer. - Catalina

==================================

Date: Mon Feb 17 17:40:39 EST 1997
From: Nicolas ESTRADA

12 speaking-tree project:

Well, the way I look at it, the strength and genius of the Brain Opera is its ability to bestow musical creativity, and a desire to compose music, on any individual, whether he/she possesses a musical background or not. Yet as I look at the speaking trees, all they appear to accomplish is to capture empty phrases and answers and insert them later into a musical performance of some sort. As counterproductive as it may seem to all of the previous work done with the speaking trees, why not have people step into one of the many networked trees, assume an alternate, maybe allegoric, persona, and enter a beautiful virtual world of sound and sight extraordinaire!

I might enter this world, select the angel character, and from there choose from an appropriate selection of minimalist yet magical excerpts or samples of music, say a vast list of Gregorian or liturgical chants of some sort. It would only be a VERY minimalist and short cycle of music, because I would become one voice among many, as if in a chorus. The music generated might work like a sequencer: every small piece of music would be indexed so that the other channels/voices to be played at the same time wouldn't require any form of synchronizing/beat-matching; every voice would smoothly mix into the others as layers folded onto each other. Additionally, every different persona would have a style of music appropriate to him/herself: much like the example of the angel and liturgical chant, one might have a vampire complete with a selection of dark wave, gothic or even melancholic opera... The music created by such a technique is very unique and very rich in content, as it is composed of many voices all layered onto each other.

Now here is where it gets tricky: suppose a collection of rooms created virtually, using Open Inventor for instance. Characters would slowly sway from one side of a room to the other, virtually "emanating" the music they had selected. There could be some form of active panning, where if one voice were behind another, the composition would be acoustically different.
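A sketch of what such active panning might compute per voice, assuming a listener at the origin facing forward and a simple equal-power pan law (all constants here are placeholders, not a real implementation):

    #include <cmath>
    #include <iostream>

    const double PI = 3.14159265358979;

    struct StereoGain { double left, right; };

    // A voice at (x, y) in the room gets quieter with distance and pans by
    // its horizontal angle relative to the listener (equal-power pan law).
    StereoGain panVoice(double x, double y) {
        double dist = std::sqrt(x * x + y * y);
        double gain = 1.0 / (1.0 + dist);          // simple distance rolloff
        double angle = std::atan2(x, y);           // -pi..pi, 0 = straight ahead
        double p = (angle / PI + 1.0) / 2.0;       // 0 = hard left, 1 = hard right
        StereoGain g;
        g.left  = gain * std::cos(p * PI / 2.0);
        g.right = gain * std::sin(p * PI / 2.0);
        return g;
    }

    int main() {
        StereoGain g = panVoice(-2.0, 3.0);        // a voice ahead and to the left
        std::cout << "L=" << g.left << " R=" << g.right << "\n";
        return 0;
    }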
Suppose also that there are rooms where some form of DSP is in effect, such as a room which flanges and wraps a concert-hall envelope around each and every voice within it, or perhaps a room that phases the music in and out while creating a short-term echo; the possibilities are endless... The Internet as well can play many roles in this design: since Open Inventor is one of the de facto standards behind VRML, could it be possible to let remote users from the web log in and assume personas and aliases of their own? Aside from CPU and RAM constraints, why not?

Requirements-wise, I know SGI's Inventor libraries pretty well, but there is a lot of DSP code that would be problematic for me... as well as maybe some networking code, unless SGIs are used, in which case I'm cool... Everybody could participate composing a little minimalist sequenced music. I mean, soundz like fun...

Individualist tree project

Why not a simple Marvin tree with DSP capabilities for the user? Most people don't know what a flanger or a reverb is, so why not show them? It would require some sort of mouse interface but could probably be done with existing software...

==================================

Date: Mon Feb 17 16:24:53 EST 1997
From: Daniel Dreilinger

1. Brain MUD

Have you ever wondered what it would sound like if a Harmonic Driver was jamming with a Melody Easel? How about three Rhythm Trees playing together in a subway station? This becomes possible if we connect the Speaking Trees to each other, the Internet, the live performance, and the rest of the Mind Forest elements in a virtual environment called the Brain MUD. MUDs connect a community of people with shared interests and goals in a virtual space. Typically the interface is textual or graphical in nature; for this project, I propose a sound-oriented interface. Using the Brain MUD, audience members interact with each other and with people participating via the Internet, assemble different combinations of the Hyper-instruments (while they are being played by other audience members in the Lobby) in virtual rooms with varying acoustics, and listen and contribute to a concurrent Brain Opera performance. The proposed model allows other tree purposes to be seamlessly incorporated. For example, if you wander into the Minsky zone, the tree goes into Minsky interview mode; when you enter the video-conferencing room, you join a video conference with the other occupants of the room; when you hop into the shower, the tree becomes a Singing Tree.

Figure 1 (http://www.media.mit.edu/~daniel/classes/brain/mud.gif) shows a simplified map of the virtual world for audience participants to explore. Example locations of tree users, Internet users, and instruments are listed---these entities can move (or be moved) around the virtual environment. The state space of the virtual environment includes the following information:

1. Several acoustic environments, such as practice rooms, a large recital hall, a recording studio, a passageway to the concurrent Brain Opera performance, and perhaps some other unusual spaces, like a subway station, a stairwell, the Media Lab atrium, etc.

2. The current position of all Speaking Tree users, who are free to move about the environment. [Note: some type of directional controller will need to be added.] We can also track the rotational orientation of each tree participant's head with special headphones, and use this information to enhance the stereo imaging.
3. The virtual location of each of the Hyper-instruments (Harmonic Driving, Rhythm Trees, Melody Easels, Gesture Walls, etc.) in the virtual space---these can be 'picked up' and 'moved around' the space by audience participants at the Speaking Trees.

4. The virtual world location of Internet participants.

The virtual environment information is used to create a uniquely rendered sound sculpture for each of the Speaking Tree users. As they move throughout the space, different instruments will grow louder and softer, appearing in different spatial locations in the stereo (or 3D!) field, and will be rendered with different qualities of reverberation, depending on the acoustical properties of the room they're in. When there is a live performance going on, participants can virtually wander into the performance and hear what the live audience hears. If a Brain MUD user wishes to contribute to the live performance, they can make a tape in the virtual recording studio and deliver it to the musicians in the live performance. Internet users could be included in a number of ways. They could wander around the virtual world via their home computers and interact with other Internet users as well as the people in the Lobby. If we added some extra MIDI synthesizers and a RealAudio server, remote participants could enjoy the full sonic experience of the on-site users.

technical
---------

The challenges here are the audio rendering and the visual model of the environment. The audio could be processed in the MIDI domain, using approximations of the sounds generated by the Hyper-instruments on the individual trees' local synthesizers and digital effects units. Alternatively, the audio output of all the Hyper-instruments could be sent to a MIDI-addressable mixer which is continually adjusted to produce a unique output for each listener as they wander throughout the system. Ideally, the LCD displays a map (or 3-D view) of the world, including the locations of instruments and other audience members. There is existing software designed for creating and browsing virtual worlds---it might be possible to adapt something that already exists.

walk through
------------

Upon slipping on the Speaking Tree headphones, you hear a Harmonic Driving--Melody Easel duet. A look at the LCD display reveals that you are in the Recording Studio, listening in on a session that will soon be sent to the live Brain Opera performance next door. After a while you start to wonder how the Melody Easel would sound if accompanied by a Rhythm Tree section, one of which is currently in a practice room. On the way to the practice room, you bump into some Internet guests in the hallway. Using the microphone, you tell them about your plans to put together a concert and invite them to come along. There is a Rhythm Tree in the practice room, so you move it to the Concert Hall, which is presently occupied by a pair of Melody Easels. After you've heard enough of your ensemble, you head for the Marvin room, where you tell him your deepest thoughts about music, one-on-one.

2. Memory Browser

The Brain Opera has accumulated a large corpus of audio and visual 'memories'. Sounds (and videos, depending on some of the other project proposals) are accumulated in a number of different forms: answers to Marvin Minsky's questions, sounds gathered by the Sound Crawler web robot, recordings of visitors playing with the various Hyper-instruments, live Brain Opera performances, and Internet submissions from the general public.
Presently, the interface is "write only"---once sounds are added to the Brain Opera, there is no easy way for audience members to recall and listen to them. The Memory Browser tree will provide a simple, fun, and engaging interface for browsing the Brain Opera's vast memory. These trees will be retrofitted with a set of four fish sensors arranged around the screen, and a pair of controls labeled 'forward' and 'back'. Discrete metal contacts are added to the headphones to make the audience member conduct the RF signal. As in the Gesture Wall and the Sensor Chair, the four fish sensors will be used to create a physical sound-navigation space. At any moment in time, the fish-space is mapped to a set of anywhere from a few to hundreds of sounds. As the listener makes hand gestures through the space, different sound samples fade in and out of the foreground. Videos are played when appropriate.

All of the sound library content is organized in a hierarchy---the top level has just a few major subdivisions: Minsky Q&A, Internet sounds, Past Concert clips, and Real Time (or ???). If the forward button is pressed while the Minsky region is active, for example, the fish-space becomes mapped to the collection of Minsky audio/video question clips. At this level, the space is mapped in approximately a 10x10 grid, each point corresponding to one of the 100 questions he asks. When the user's hand passes through any one of these regions, the video segment of Marvin is instantly cued both on the LCD and in the headphones. Neighboring questions are played at a substantially reduced volume level. If the forward control is touched while a particular question is playing, the fish-space is then mapped to the myriad audience answers recorded throughout the Brain Opera's history. Sound classification algorithms could be used to arrange the sounds on a continuum: male--female in one direction, and calm--agitated in the other, for example. At this level the audience member could also use the microphone to add their own response. Continuing with the example, if the forward button is pressed while one of the answers is selected, the fish-space might become hundreds of subtle variations on the selected answer, with the two dimensions of fish-space mapped to varying degrees of digital effects, such as pitch shift, echo, etc. Figure 2 (http://www.media.mit.edu/~daniel/classes/brain/memory.gif) illustrates the hierarchical space.

Adapting this retrieval interface to the massive collection of sounds gathered by the sound-seeking web robot will be a great challenge, but a very powerful tool if we can pull it off. Keywords found on the pages accompanying the sounds could serve as one means of organizing them; classification algorithms would be another. It might even be feasible to map the fish-space to a real-time Internet robot: moving your hand in different areas would cause the web traversal to follow various hyperlinks and play any sounds encountered along the way. The rest of the Lobby experiences could be mapped to yet another part of the hierarchy. Thus gesturing through the fish-space solos the various Hyper-instruments in your headphones as they are being played by the other audience members. Zooming in on any particular instrument would allow you to modulate your version of that instrument's output.

To realize this project, I would need to learn more about information retrieval as it specifically pertains to sounds. Alternatives to the hierarchical model should be investigated.
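A minimal sketch of the hierarchical fish-space navigation described above, with a hypothetical Node type standing in for the real sound library: the 2-D hand position picks a cell in the current level's grid, and the forward/back controls would descend into or climb out of the selected node.

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Node {
        std::string label;             // e.g. a Minsky question, or an answer
        std::vector<Node> children;    // next level down, if any
    };

    // Map a normalized hand position (0..1, 0..1) onto an approximately
    // square grid laid over the current node's children.
    int cellAt(const Node& level, double handX, double handY) {
        int n = (int)level.children.size();
        if (n == 0) return -1;
        int side = 1;
        while (side * side < n) ++side;    // smallest grid that fits all children
        int col = (int)(handX * side), row = (int)(handY * side);
        int idx = row * side + col;
        return idx < n ? idx : n - 1;
    }

    int main() {
        Node minsky;
        minsky.label = "Minsky Q&A";
        for (int i = 0; i < 100; ++i) {    // the ~10x10 grid of questions
            std::ostringstream name;
            name << "question " << (i + 1);
            Node q;
            q.label = name.str();
            minsky.children.push_back(q);
        }
        // A hand near the upper-left cues the first question; 'forward' would
        // then remap the space to that question's recorded answers.
        int idx = cellAt(minsky, 0.05, 0.05);
        std::cout << "selected: " << minsky.children[idx].label << "\n";
        return 0;
    }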
I suspect this project could be done with Rogus and/or Director, but I'm not familiar with either (yet). This proposal has largely ignored the LCD display; it would be great to work with a visually oriented person with ideas on how to graphically render the collections of sounds.

==================================

Date: Mon Feb 17 16:24:13 EST 1997
From: Alex Westner

PROJECT PROPOSAL FOR A SYSTEM OF 12 SPEAKING TREES

It is difficult to create an experience that can *effectively* involve all twelve speaking trees functioning together as a system. I propose that all of the trees should have a common structure, but run different algorithms common to a few "subgroups" within the system. Allowing the trees to interact with each other in small groups not only eases the burden on the technology, it is also more analogous to Minsky's theories of the mind.

In my opinion, the chief goal of the speaking trees is to explain and, more importantly, to *create* the libretto of the Brain Opera. The trees are unique in that they directly record participants' voices. To stay true to an original intention of the Brain Opera -- to allow people coming in off the street to create "music" -- these trees can play a vital role in openly capturing thoughts and opinions about the entire Brain Opera experience. Privacy is a unique social feature of the physical structure of the speaking tree; the hope is that the participant will be less inhibited and more willing to "open up" to the instrument. The current implementation of the speaking tree, however, is a completely blind and somewhat awkward call-and-response session between Dr. Marvin Minsky and Joe Normal.

I propose that video cameras be mounted inside the trees, and that small groups of trees take advantage of the network to allow two or three participants to interact simultaneously with each other, as well as with Minsky or with any other audio/video clips that can be played in the tree. My hope is that people will be more comfortable in a discourse between live human participants, encouraging a more meaningful and interesting dialogue relevant to the Brain Opera experience. Several small groups of these trees will exist in the "system." One such group of three trees will allow an open discussion in a pseudo-videoconference style. Since the camera shot of the participant in the tree will be a close-up, there won't be much background disturbance in the image, and it may be effective to crossfade the images received from the three trees into one rendered image displayed on the LCD screen inside each tree. In this environment, participants can feel more involved and more intimate with the tree. A few groups of two trees can have two people responding to Minsky's comments and questions, or to other material: past conversations recorded in the trees, explanations of the Brain Opera from Tod and his students, anecdotes from past Brain Opera performances, music videos, live video feeds, etc. Finally, individual trees can be set up to play random recordings of recent interactions that took place in the other trees; using the mouse button, the participant can "vote" for clips that should be used in a Brain Opera performance.

How will these clips be used in a performance?
Since, in some of the proposed tree groups, we know the content of the audio and video that the participants are responding to, we can group all of the recordings into pre-determined categories based on the subject matter of the clip that was presented. It will then be intuitive for the performers to trigger a random recording from a desired subject appropriate to the current context of the piece. Continuing in this thread, there can be another, "random" content group containing clips from the proposed three-person discourse trees.

One of the biggest complaints about past Brain Opera performances is that participants were unable to recognize their contributions. The speaking trees are vital in this sense. I propose that the audio recordings from the trees be used extensively during performances. By including more audio (and video!) recordings from the trees, the audience will leave with a much more pleasant and fulfilling experience! ;-)

Some technical considerations: video camera, streaming audio and video, image processing, information storage/retrieval.

PROPOSAL FOR AN INDIVIDUAL TREE

Being a noise musician, I find the idea of having a microphone connected to a responsive computer capable of signal processing and audio analysis very exciting. However, if I were thrown into a speaking tree I wouldn't have any idea what I should say or do. How is this thing going to inspire me to make a sound? If the system asks me questions that I am supposed to respond to, I know that it really doesn't care what I have to say -- it'll just go on asking me questions until I get bored and leave. Minsky says, "What is music?" My response: "Do you like my shoes, Marvin?" "When does a speaking voice become a song?" "My favorite cereal is Frosted Flakes." "What do you like about music?" "F- you, Marvin! Yeah!" This call-and-response, point-and-click, question-and-answer type of interface does not work. Computers don't understand spoken language, and everyone from engineers to five-year-olds to god-fearing grandmothers will become frustrated after a couple of these questions or other stimuli designed to get them to respond in some controlled fashion.

My proposal for this individual tree, then, is simple and caters to an open and willing participant who *wants* to contribute to the Brain Opera. When a person steps into the tree and puts on the headphones, nothing happens on the LCD screen and no sound is heard through the headphones. The tree will then be "triggered" when the person says something like, "Hello? Does this thing work?" (At least this is something that *I* would probably do.) The person's voice will be fed back through the headphones so he/she will feel that the system is at least "turned on." After the computer hears the participant utter these first few sentences, it will start to add some strange effects (like reverb or flange) to the voice, and an abstract image will appear on the LCD screen. Now the participant knows that the tree is truly responding to them. The system should then adapt to the input received at the microphone, i.e. as the person's voice grows louder, even more effect is applied, and the image on the screen changes more violently. The idea is to slowly ease the participant into the tree, giving him/her the impression of interactivity and understanding, while developing a mutual respect between the tree and the participant.
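A sketch of how the tree might track this and scale its response, assuming a smoothed measure of microphone power drives the effect depth (applyEffects() is a stand-in for whatever DSP the tree really runs):

    #include <algorithm>
    #include <iostream>

    class EngagementMeter {
        double level;           // smoothed signal power
    public:
        EngagementMeter() : level(0) {}
        // Feed one block's RMS power; slow decay so silence eases effects off.
        void update(double rmsPower) { level = 0.9 * level + 0.1 * rmsPower; }
        double effectDepth() const { return std::min(1.0, level * 4.0); }
    };

    // Placeholder: drive reverb/flange depth and image agitation together.
    void applyEffects(double depth) {
        std::cout << "effect depth = " << depth
                  << ", image agitation = " << depth << "\n";
    }

    int main() {
        EngagementMeter meter;
        double blocks[] = {0.01, 0.05, 0.20, 0.30, 0.02};   // a voice getting louder
        for (int i = 0; i < 5; ++i) {
            meter.update(blocks[i]);
            applyEffects(meter.effectDepth());
        }
        return 0;
    }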
If the system "feels" that the participant is enjoying the experience (which may simply be detected by how much sound he/she is putting through the microphone over a short duration of time, i.e. 15 seconds), then the following statement could be made both on the screen and in the headphones (said in an excited and convincing radio-announcer type of voice): "Any words you speak, and any sound you make, may be used in the next performance of the Brain Opera. You are creating the Brain Opera! Is that exciting?!" If, at this point, the level from the microphone goes through the roof, then the tree has succeeded in hooking the participant. Little or no response, however, indicates that this participant is more of an observer, so perhaps this is a good time for Tod or someone else to pop onto the LCD screen and talk a little about the Brain Opera or some of the instruments.

Consider the first case: the tree is now interacting with an excited participant. It is important not to drop the intensity level of the experience; the tree must maintain this open-ended environment and allow the participant to contribute creatively. To keep the mood flowing, I propose that continuous background music be played through the headphones. The LCD screen will display abstract images that respond to the participant's voice, as described earlier. The tree will now send some subtle, half-whispered buzzwords through the headphones, and the participant may respond. The buzzwords are to get the participant thinking about music, sound, interactive art, the Brain Opera, etc., and will be voiced every 5-10 seconds, allowing the participant to interrupt with a response. If a response is detected, the system will not speak the next word until the participant is finished. (This can be detected by measuring signal power over time.) Now the tree can pose some questions to the participant in a "comfortable" voice (keep in mind the background music is still playing, uninterrupted, and the abstract image on the LCD screen is still responsive to the microphone input): "How do you feel about music?" "Are you smiling?" "What does the Brain Opera mean to you?" Again, the participant may respond at will.

All responses will be sent to the network to be stored and reviewed (perhaps by other speaking tree participants) for use in the next Brain Opera performance. Again, as I described in my first proposal, it is easy to group responses into categories based on the content of the buzzwords and questions posed to the participants. There is also, as before, a random-subject portion in this tree: the first 15-30 seconds of the experience. However, I expect (hope) that some participants will make some weird noises to feel out the system's response. These noises can be analyzed off-line by a classification system and then sorted into timbre categories, i.e. scratchy, siren-like, spoken, etc.

Technical issues: It seems to me that Director would be a good program for this kind of thing, except I have never used it before... I don't know anything about the image processing that would be necessary for this project. I would *REALLY* like to avoid using a mouse or any other hands-on type of device.

==================================

Date: Mon Feb 17 05:18:05 EST 1997
From: Jeff Norris

I'm going to give this another try; it seems you have to include line feeds. OK, everybody, here's my idea. It's designed with four trees in mind, but I'm sure it'd work for all twelve, or just one.
If you're in a hurry and still want to know what it's all about, try to read the Overview, Look & Feel, and Connections to the Rest of the Brain Opera sections.

The Collaborative Drum Circle:

The speaking trees are an extremely fertile environment for collaborative music composition. A creative use of the trees would separate the users from the other people in the lobby and bring them together in a space where they can work together to make an exciting visual and musical composition. At the same time, the trees have to guide their users in their creation, or the result will likely be cacophony. A cooperative drum circle would give the users great freedom to make interesting compositions while limiting them enough to ensure their success.

General Overview:

The twelve speaking trees would be divided into three groups of four, each group representing a drum circle. Each member of a drum circle would be given a drum pattern when they began to use the system. At any time, a user can replace the sample that his drum is currently using with a different prerecorded drum sound or with a recording of his voice taken from the microphone attached to the tree. In addition, each user would be allowed to select a "visual instrument," an image from a large catalog. This image would be "strobed" on all of the screens whenever his instrument plays. Periodically, the system would change the patterns each user was assigned, to create a varied and interesting composition. The end result would involve the users in a rhythmic composition not unlike the improvisational drum choruses popular in traditional African music, incorporating both traditional drum sounds and spoken words. The visual component of the experience would be reminiscent of an MTV music video made on the fly, combining pregenerated background graphics with the strobed images chosen by the users. The visual and musical aspects of the performance could be shared with everyone attending the Brain Opera through monitors and speakers in the lobby and Mind Forest.

Look and Feel:

To describe the look and feel of this use of the speaking trees, let me describe a typical user's experience. A user activates his speaking tree and is presented with a screen with four large, partially overlapping squares of different colors in the middle. The border color of his screen matches one of the colored boxes, and a line is drawn quickly from the border to the box, indicating that this box represents his contribution to the performance. Various images fade in and out in the background, depicting major historic events, abstract artwork, computer-generated graphics, etc. The user hears a repetitive, one-bar drum pattern being played, and recognizes both drum sounds and human-spoken words and syllables among the instruments. At the bottom of the screen, a white vertical line moves horizontally across a square wave. Each time the vertical line touches one of the peaks of the square wave, a particular drum sound is played, an image is displayed in his colored square, and the border of his screen becomes brighter for a moment. When the vertical line reaches the right side of the waveform, it returns to the left side and begins moving across it again. This wave represents the drum pattern this user has been given, though it is not important that all users make that connection. On the right side of the screen is an image of a large drum, indicating the instrument the user is currently playing.
He moves his mouse-device to the right, and a box in his color is drawn around the instrument, indicating his option to change it. When he moves the mouse up, the image changes to depict a different drum, which has no box drawn around it. The user presses the button, and a box is drawn around this instrument. Suddenly, he hears a new sound playing where his drum was previously playing. His pattern has stayed the same, but he has, in effect, picked up a new instrument and kept playing. In addition to the pictures of different drums, he sees a picture of a microphone. When he presses the button on this picture, his microphone activates and he is prompted to say a short, one-syllable word or sound. He says "cat." When he does this, his instrument becomes the sound he spoke into the speaking tree.

On the left side of the screen is an image of an Egyptian statue, which is the image being strobed in his box in the center of the screen. The user moves his mouse to the left, and a box is drawn around this picture in his color, indicating his option to change his visual instrument. When he moves his mouse up, the image changes to a picture of an exotic bird, a shattering wine glass, a white rose, a light bulb, and so forth, through a seemingly endless catalog of images. The user selects a picture of a mountain and focuses his attention on the center of the screen. The four boxes are flickering quickly between solid colors and various images, each in time with a different drum line in the performance, while other images fade slowly in and out in the background. The user notices that the yellow performer is using the image of a lion and has spoken the word "boy" as his instrument. The yellow performer's pattern seems to be a quick pattern on the offbeats, while our user's pattern is a slower, more regular beat, intended as the downbeats of the performance.

Suddenly, the performance goes silent for an instant and the colored boxes rotate around the center of the screen. Now the blue performer is in the position that our user previously held and is playing the downbeat pattern with his instrument, which is a low timba. Our user is now in the position previously held by the yellow performer, and the word "cat" is being played rapidly in the offbeat pattern. After a couple of minutes, the colors rotate twice more, and then all of the patterns change entirely, producing a new song altogether. The user notices that no one is standing at the blue performer's speaking tree, but the computer seems to be changing the visual and musical instruments for the blue performer periodically anyway.

Technical overview, audio component:

To accomplish the audio component of the cooperative drum circle, we need a central audio server for each set of four speaking trees, though the existence of this server need not be revealed to the users. *All four users' headphones* are actually driven by this central server, which ensures that the audio of the performance is properly synched. Thus, as far as the audio component is concerned, each speaking tree is simply an interface device: no speaking tree actually plays a single note of audio. This is very important, because the computing demands that we are going to place on the trees to produce the visual aspect of the performance would prevent them from properly playing synched audio.
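A sketch of the handful of tree-to-server messages this design implies (the tags and struct layout are assumptions; the real transport would presumably be the networking/Rogus code mentioned later, and the server sends its own small message set back, described below):

    #include <iostream>
    #include <vector>

    enum MessageType {
        CHANGE_INSTRUMENT,   // user picked a prerecorded drum sound
        VOICE_SAMPLE,        // user recorded a short vocal sample (payload attached)
        CHANGE_IMAGE         // user picked a new visual instrument
    };

    struct TreeMessage {
        MessageType type;
        int treeId;                         // which of the four trees in the circle
        int selection;                      // instrument or image index, if applicable
        std::vector<short> samplePayload;   // only for VOICE_SAMPLE, kept very short
    };

    // Server side: the only state that changes per message; all audio is
    // rendered centrally, so the trees never play a note themselves.
    void handleMessage(const TreeMessage& m) {
        switch (m.type) {
        case CHANGE_INSTRUMENT:
            std::cout << "tree " << m.treeId << ": instrument -> " << m.selection << "\n";
            break;
        case VOICE_SAMPLE:
            std::cout << "tree " << m.treeId << ": new voice sample, "
                      << m.samplePayload.size() << " frames\n";
            break;
        case CHANGE_IMAGE:
            std::cout << "tree " << m.treeId << ": image -> " << m.selection << "\n";
            break;
        }
    }

    int main() {
        TreeMessage msg;
        msg.type = CHANGE_INSTRUMENT;
        msg.treeId = 2;
        msg.selection = 5;
        handleMessage(msg);
        return 0;
    }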
Each tree only needs to inform the central server when its user wishes to change instruments, transmitting the data for a recorded sample when the user chooses to change his instrument to his voice. The samples that users record would be strictly limited to very short sounds, both to ensure a good percussive sound and to cut down on transmissions to the central server. Since the central server plays the performance to all four listeners through the headphones, the samples for the users' voices don't have to be transmitted back out to each tree, which gives the trees the cycles they're going to need to keep up with the visual component of the performance.

Technical overview, visual component:

The central server will send only three types of signal to each tree: notification that one of the performers has hit their "drum," notification that one of the performers has changed their image, and notification that the user's pattern is being changed. When the tree receives notification of a drum hit, which the central server would send every time it played a drum note, it quickly displays and hides the correct user's image in their colored box. The same set of images is on each tree, so notification that a performer has changed his image simply means that a different image is loaded and displayed for further drum hits by that user. Certainly, quickly displaying an image and erasing it on demand is not an easy coding task. It may be necessary to divide the video palette into four sections and limit each user's image to fewer colors. Then the code would simply assign the performer's solid color to all of his palette locations when that performer's drum isn't being hit, and the correct colors to his palette locations when he does hit his drum. I am nearly certain that this operation could be done quickly enough to produce the MTV "strobing" effect that I'm after, and we may be able to do even better. When the server decides that it is time to change all of the users' patterns, a short delay is acceptable while it informs each tree of its new waveform. Users will likely accept this as a pause between songs.

Connections to the rest of the Brain Opera:

To include people not actually using the speaking trees, screens could be placed all over the lobby and Mind Forest showing the visual performance going on in each of the speaking tree drum circles. The visual component of the performance would likely be quite captivating to watch. In addition, speakers could play the audio from one of the drum circles at a time. These windows into the collaborative space occupied by the drum circle participants would likely create a lot of interest in the speaking trees, and would probably keep the attendees quite entertained. To include the Internet attendees of the Brain Opera, web users could be allowed to submit images to the vast catalog of images that drum circle participants can select from. (Obviously, some censoring would be in order here.) Another conceivable option would be to create a drum circle on the Internet. Obviously, a web user's connection would not be fast enough to relay the performance back to him, but his performance could be displayed on a monitor in the Brain Opera lobby, along with the compositions of the speaking tree drum circles.

People needed to put it together:

Server God
This person(s) knows C++, can figure out the Rogus McBogus libraries, and knows how to write a server.
He has to write the code for the central audio server responsible for maintaining connections with the four speaking trees in the drum circle, playing the audio to the headphones at each tree, incorporating digitized samples from the users into the performance, and informing the trees when a user changes his image and when it's time to change to a new set of patterns.

Composer Dude
This person has to write the drum patterns that are going to be used in the drum circles.

Sound Wizard
The speaking trees have to be able to segment the digitized samples from the users into small chunks that can be used as percussive instruments, and these samples have to be in a format that can fit into the central audio server's performance. This person has to write the code to segment the samples based on time and amplitude, and figure out a way for the server to play back these samples.

Art Guy
This person has to gather or create the images that will be displayed in the background and in the users' strobe boxes.

Interface/Graphics Guru
Somebody has to write the code (C++ likely, perhaps using Inventor's new PC libraries?) that will run on each Speaking Tree. This code will strobe the images when prompted by the audio server, allow the user to select his visual and musical instruments, and otherwise make the drum circle a visually stunning thing to be a part of.

I think that I can fill the roles of Composer Dude, Art Guy, and most of Interface/Graphics Guru, but I definitely need a Server God and a Sound Wizard. I could also use someone to give me a hand with some of the processor-intensive graphics tasks I've got in mind.

Way out there: (crazier directions)

The same concept would probably work with 6, 8, or even 12 speaking trees in a single circle. The amount of work the central server is expected to do increases, as does the complexity of the performance, but the experience would certainly become more detailed. If a drum circle seems too limited, the users could be allowed to participate in a fuller musical composition. Then, in addition to the rhythm parts, the users would have control over the melodic and chordal aspects of the composition. For instance, if a user was controlling the melody, he could record his voice, which would then be pitch-matched and used as the instrument in the performance.

More modest destinations (if my idea seems too crazy):

If the visual aspects of the drum circle seem daunting, the audio component could instead be developed further, and the user would have the interface described above, but without the images being strobed in the center of the screen. If the central server (which I don't expect to be too complex) turns out to be too difficult, each of the twelve speaking trees could instead make its own audio composition, allowing the user to control all four instruments in the drum circle.

Single Tree Application:

If it turns out that four trees are not available to run the drum circle, the same idea could still be an interesting application for a single tree. I must stress, though, that the effort necessary to develop the system on multiple trees doesn't seem to be much more than what would be invested in developing it on a single tree. In a single-tree application without a central audio server, each tree would be responsible for the audio and visual components of its user's experience. This would certainly increase the load on the machine, and the images would almost certainly have to be strobed using the palette-switching techniques mentioned earlier.
Still, the system could run almost identically to the multiple-user model, perhaps giving the computer control of the other three performers, or giving the user control of all of them.

==================================

Date: Sun Feb 16 21:04:47 EST 1997
From: Seum-Lim GAN

1.) Have buttons/handles on the outside of the hood so that users can adjust some behaviours of the trees. Examples: emulate mouse clicks, adjust loudness, control what they want to answer (skip to the next question), change the graphics background, change Marvin's voice(?).

2.) Have sensors, perhaps an infra-red switch or something built inside the headphones, so that the hard-to-use and easily broken "leaf"-like button can be gotten rid of. This way, the tree will know if someone has entered or left.

3.) I am not sure what kind of microphones are used in the trees. From the problems they have, i.e. unpredictable behaviour when ambient noise is sufficiently loud, the microphones may not be directional. Perhaps they should all be replaced by highly directional ones that pick up sound only from the user's mouth, to reduce this funny problem.

4.) Have a miniature colour camera inside the hood so that images of people talking/listening can be seen elsewhere in the Brain Opera, or even within the talking tree system. This option should be made available with the talking tree system whether we use it immediately in this project or not. It should also be implemented in the singing trees while we are at it, and if possible in harmonic driving (most interesting), the gesture wall, the melody easel, etc. (everything).

5.) Redecorate/redesign the hoods so that they look less "alien". I feel that the vast number of buttons that appear on and under the hood could be hidden away, perhaps with some kind of fabric to make them less obvious.

People to work with: since my descriptions here are mainly hardware and design, someone with knowledge of C++ programming on Windows NT would be good and complementary.

Gan

==================================

Date: Sat Feb 15 16:58:47 EST 1997
From: Josh Strickon

I was thinking about everything that we have been talking about. I know that we have thought of using these structures as communications terminals, or as a collaborative instrument. I recently had a new idea. We have addressed how these could be used as an instrument. One of my favorite parts of the Brain Opera is the Minsky melodies. It is visually the best part of the movie and the only literal reference to the story of the Brain Opera. The idea is this: each tree would create your own Minsky melody. The tree itself would function in a similar manner:

1. you walk up and an intro screen appears.
2. Marvin will ask you questions.
3. you will answer them.
4. your answers to the questions will be pitch-shifted to music.
5. you will hear your personal Minsky melody.
6. you will leave.
7. by some selection process, the Minsky melodies will appear in the performance.

This is a good way to integrate people's voices into the Brain Opera. For each tree, the questions would be different, as would the melody. The overall stitching of the melodies together would be the same during the performance, i.e. tree1, followed by tree3, followed by tree5, etc., so the music would be the same, but the words would change. Technically, I don't know what it would take to shift the words to music, but I think it has been done before. Graphically, this is where the largest change would take place.
I would like to use something like Inventor or OpenGL to do some sort of expressive text. Marvin would not be the focus any more. I was reading this weekend about something Microsoft has available: some sort of agent-authoring facility that allows you to create agents that serve as guides through a program. Marvin would become more dynamic; he would not appear inside a box, but would be able to move around you and the screen. It would be interesting to play with spatialization of the sound as well. Marvin could be in front, to the side or above you. Sometimes you see him, sometimes you don't. Once you created your melody, it would be instantly archived on our web site, making your contribution a permanent part of the Brain Opera. Another idea is that each tree has an underlying melody, and using Pete's audio collage technique, all of the samples would be combined into another sample. With this technique, we analyze all of the samples from an audience and create different parts of the performance from them. Using this scheme, a person would be given an example of what the trees can do when he or she walks up. Then the questions would flow in the same way as before. When you are finished, you will hear an example of how the processing technique works, by choosing a familiar melody and hearing your voice being transposed into that sound. I think this would retain much of the functionality and intent the trees were originally built for, as well as turn them into an instrument. They would not be just a sampler anymore.

Another idea that I have is that the trees become a weave of interconnected tracks. When you answer a question, the file is sent out on the track. The trees themselves become nodes, at which a user hears the answers coming from a certain direction as well as sees, expressively, the corresponding question scroll by. When a user walks up, he slowly becomes immersed in this vocal environment. Files start zipping around. Eventually the files could reach dead ends, or they could be sucked into the performance. The performance could be a specific spot on the tracks, or could be an agent that eats up sounds. Along with the spoken text, ambient music could be playing. This idea would best be described as a Minsky net. You are in a web of sounds. Sounds become physical objects that fly by you. The screen would be a portal for the questions. This idea is convoluted and hard to describe in email, but it would function similarly to the previous idea. The closest analogy I can give is one of those restaurants with the tracks that go around carrying plates. Each station has the job of putting a plate on the track, except here there would be more tracks. When a plate gets to you, you hear the sound. Plates could eventually be taken off the tracks or fall off. Depending on where you sit, you are going to get different sounds at different times. I would still use the expressive text as well.

I am not sure how the buttons would function, or how hard it would be to assemble this system. It would create a nice environment though.

-Josh

==================================