This page describes my current and recent professional life. It is mostly about work examples and my portfolio, but I also include some background on why I do what I do, the leadership positions I have held recently and what I achieved in them, the functions I tend to have in these positions, and a concise summary of my typical modus operandi.


In short

At the very core, I am a Human-Computer Interaction (HCI) research scientist, inventor, and visionary. I dream up, create, research, design, develop, and patent systems with fundamentally novel user interfaces in the domains of human augmentation (social, knowledge, emotion, etc.), UX of autonomous vehicles (as occupant or pedestrian), augmented reality (visual, auditory, haptics of AR/VR/XR), advanced HCI and HMI, human-robot interaction, aerial robotics, gestural and conversational systems, cognitive and affect sensing, and more. My strengths are "engineering creativity," and connecting the dots of research and emerging technologies to create radically new products and services. I have 20+ years of experience in corporate and academic environments, including MIT, Samsung, Hewlett-Packard, and HARMAN International.

Motivation: why, how, and what?

Although I have been active in a multitude of (seemingly unrelated) domains, there is a common theme that spans most of my professional work: Human Augmentation. Let me put this in context. From a global historical perspective, mankind has only just begun creating technologies with the explicit intention to directly enhance our bodies and minds. Although we have built tools to enhance our motor skills and perception for many millennia (think "hammer" and "eyeglasses"), I am referring to the augmentation of higher-level mental skills. The augmentation technologies we already have, however, have impacted us deeply: mobile phones, for example, extend our conversational range and allow us to talk to people almost anywhere on the face of this earth. Mind you, although smartphones may seem like a mere sensory augmentation, they are really an augmentation of our social interaction potential. This becomes clear when we look at today's smart devices, which aim to enhance our knowledge, telling us where we are, what other people think and see, and many other useful things. It is foreseeable that Human Augmentation technologies, even in the near future, will go far beyond that and enhance us in much more extreme ways: perceptually, cognitively, emotionally, and on many other levels. I am intent on pushing the envelope in this direction, and on creating technologies that serve this purpose.

And I have been doing so for 35+ years: to understand the problem well, I studied the human psyche in depth for eight years (earning a Psychology degree), and then worked on the engineering side at MIT for another eight years (during which I designed a multitude of high-tech systems). After that, for the last 15 years in industry, I have put these two perspectives together to create technologies, systems, devices, and services. I strongly believe that to create technologies that immediately and explicitly enhance people, and allow us to interact with technology more intuitively, we need to combine deep engineering and deep psychology knowledge, and all shades in between. So I have worked in the fields of mobile communication, augmented reality, virtual worlds, artificial intelligence, robotics, and many more, and created a series of systems, some as proofs-of-concept and some close to product, each of which shows how I think we will interact with future technologies.

Positions held

From 1997 to 2005, I was at the MIT Media Lab as a research assistant and lead for about ten research projects in the domains of speech interfaces and mobile communication, conversational and communication agents, embodied agents, and wireless sensor networks. Below are summaries of my MIT projects. I supervised undergraduate researchers (software, robotics, circuit design), was in close contact with the lab's 150+ industrial sponsors, and gave over 100 presentations, talks, and demonstrations.

From 2005 to 2010, I was project leader and principal researcher at Samsung Research in San Jose, CA. I was part of SAIT, the Samsung Advanced Institute of Technology. I was in charge of initiating and managing HCI research projects, and headed a team of PhD-level researchers, engineers, and innovators. I envisioned, designed, and developed fundamentally novel user interface concepts, based on original research in the domains of ubiquitous computing, mobile communication, artificial intelligence, robotics, virtual worlds, and augmented reality. One of my larger projects explored new ways to interact naturally and intuitively with 3D content, such as AR content, virtual worlds, and games. I did this on mobile and nomadic devices, from cellphones to tablets to laptops to unfolding display devices to portable projection devices and more. I built working prototypes that demonstrated key interaction methods and intelligent user interfaces for natural interaction. Below are summaries of my representative Samsung projects.

From 2011 to 2012, I was with HP/Palm, as the director for future concepts and prototyping at Palm: I led, managed, and inspired teams of end-to-end prototyping and research engineers (hardware & software), UI prototyping engineers, and UI production developers (the latter only interim until August 2011) in the webOS Human Interface group. I created working systems of future interaction methods, filed for many patents, and contributed to strategic roadmaps across all of Palm/HP. My Palm projects were in the fields of wand and pen input (e.g., project 3D Pen), mobile 3D interfaces, remote multi-touch interfaces (e.g., project Ghost Fingers), and more.

From 2012 to 2020, I was with HARMAN International (since 2017 part of Samsung), as Vice President of Future Experience and AI. My responsibilities included assembling and leading teams of advanced research and prototyping engineers in order to create working prototypes and IP of technologies that will enable radically new UX for future HARMAN products. I also founded and led a large AI team which developed cutting-edge ML-based solutions for all HARMAN divisions. I was future-proofing the current UX of all HARMAN products, from Infotainment (automotive) to Lifestyle Audio (consumer audio) to Professional (professional audio and light). I worked out of Mountain View, California, in the heart of Silicon Valley. Going beyond the mobile focus I had at Samsung and HP, my HARMAN projects also included future user experiences in cars, and UX synergies between home, mobile, and automotive platforms. I applied my expertise in conversational systems and spatial interfaces, but was also able to include the interactive spatial audio domain, since HARMAN is deeply involved in audio systems of any kind.

During the last 3+ decades, I have worked on many professional projects. One way to get an idea of the range of projects I have done is to browse my patents and patent applications: they are essentially blueprints for many of my works. As of right now, I hold 180+ granted patents: 90+ U.S. and 90+ non-U.S. Another way is to look at my work samples, described below. But first, let me briefly explain the functions I have held.

My functions

As an engineer, I am an inventor, builder, and implementer who uses software and hardware rapid prototyping tools to create systems from the lowest level (firmware, sensors, actuators) up to the highest-level applications with GUIs and networking capabilities. Systems I have built at MIT include palm-sized wireless animatronics with full-duplex audio, conversational agents interacting with multiple parties simultaneously, autonomously hovering micro helicopters, laser-based micro projectors, and a drawing tool with a built-in camera and an array of sensors to pick up, and draw with, visual properties of everyday objects.

As an HCI researcher, I can isolate relevant scientific problems, tackle them systematically by using my extensive training as a psychologist, and then come up with novel theories and visions to solve them. Then I integrate theories and technologies into working prototypes and innovative systems, and verify their validity with rigorous user testing, be it with ethnographic methodologies or in experimental lab settings.

As a leader, I am able to assemble a team of world-class experts and lead and advise them at all research and engineering levels. I have a knack for inspiring and enabling team members, bringing out the best in them while keeping strategic requirements in mind. So far I have supervised organizations of up to nearly 60 people, but I am also very comfortable creating high-impact systems and prototypes with smaller teams.

Academics

I have earned three degrees: a Master's and a PhD in Media Arts & Sciences from MIT, and an additional Master's in Psychology, Philosophy, and Computer Science (University of Bern, Switzerland).

I received my PhD in Media Arts & Sciences from MIT in June 2005. During my studies, I worked as an HCI research assistant at the MIT Media Lab. I was part of the Speech + Mobility Group (then called the "Speech Interface Group"), headed by Chris Schmandt, where we worked on speech interfaces and mobile communication devices. Although I was in a user interface group, my personal approach included software and robotic agents to enhance those interfaces. My PhD work (as well as my Master's thesis) illustrates that, and my qualifying exam domains reflect these ideas as well.

Portfolio

My representative industry projects have focused on a wide range of domains, from user interface innovation (Samsung) to mobile device interaction (Samsung and Palm) to AR/VR/XR (Palm and HARMAN) to smart audio, AI/ML, autonomous driving, automotive interfaces, neural and brain interfaces (all at HARMAN), and many more. In a wider sense, though, many of my works revolve around the idea of Human Augmentation (as described above). I have always had the mind of an engineer, but in the last 15 years I have been fortunate enough to direct teams and larger organizations, which allowed me to execute projects at a larger scale and on a faster timeline, focusing on R&D and innovation management.

My representative MIT projects—mainly my doctoral work—focus on adding human-style social intelligence to mobile communication agents that are embodied both in the robotic and software domain.

My past projects are diverse, and include things such as autonomously levitating devices (yep, that's what they were called 40 years ago, before drones became all the rage in the last decade). Some of these past projects, particularly before I attended MIT, are not in the engineering domain at all, but in the social sciences, often in the psychology and philosophy realm (where I have deep expertise as well). In an earlier life, I even worked in an entirely different domain (audio and video engineering), but here I want to focus on my most recent professional life.

Typical Modus Operandi at recent positions I held

Commonly, one summarizes a professional life with a CV, a resume, or work samples like those below. An alternative and more concise way of describing what I can do, and how I contribute to the companies I work for, is as follows:
  1. I am a builder. I build the future, almost to production level. At my most recent position, I created, nurtured, and delivered 50+ prototypes of systems and services, then developed some of them further towards current and planned products.
  2. I have a wide range of backgrounds. I have deep expertise in HCI and Human-Centered Design, HRI and Robotics, ML and Responsible AI Design, User Experience and Interaction Design, AR/VR/XR and Knowledge Augmentation, Audio and Speech, Hearables and Post-Wearables, Deep User Understanding and Neural Sensing, Design for Sustainability, Music, Cognitive Psychology. I am a generalist, driven by solving problems, not by specific solutions.
  3. I have both technology skills and organizational skills. I excel at creating future visions, translating them into roadmaps, and then implementing them by setting up and leading programs and innovation teams. I work closely with my team, executives, business unit leaders, and product initiation groups to create an innovation and future-products pipeline. At my most recent position, I supervised 55+ employees across multiple global teams.
  4. I have a passion for invention, innovation, exploration, and creative engineering. I apply this passion to every challenge. Leveraging my comprehensive software and hardware engineering knowledge, my social sciences background, and my people sensing and leading skills, I excel at architecting and creating human-centered holistic systems. I am an orthogonal thinker, excited about investigating unexplored domains and creating future solutions to affect humanity positively in a meaningful way.

Leadership
HARMAN International, a fully owned subsidiary of Samsung. When I started working for HARMAN in 2012, it was already a long-established company, founded in 1980. It went through some serious issues until Dinesh Paliwal became CEO; he hired IP Park as CTO, who in turn hired me. In 2017, HARMAN was acquired by Samsung and became Samsung's fourth business unit. For me, this was a return to Samsung, since I had already worked there from 2005 to 2010.

Vice President, Future Experience and AI
(part of HARMAN X, and reporting directly to the HARMAN CTO)

I had a dual leadership role: first, I set up and led the Future Experience (FX) and the Corporate AI teams. The FX team, set up in 2012, did top-down, vision-driven research and engineering on systems with futuristic and advanced UX. I hired and inspired the team members, set up the group's projects, created concept videos, executed proof-of-concept prototypes, iterated towards productization, patented all important IP, and interfaced with other R&D groups and the technology strategy teams at HARMAN. The Future Experience team's charter was to come up with systems that create and showcase new user experiences spanning all business groups of HARMAN and beyond, exploring synergies, adjacencies, and new areas. Its sister group was the Corporate AI team: I initiated this team in 2018 and grew it quickly to 50 people. It was set up as an AI execution group, working on machine learning and deep learning projects which impacted all of HARMAN. My second leadership role was in advancing all things related to User Experience at HARMAN. In this role, I influenced roadmaps and R&D across all of HARMAN, from the automotive to the consumer to the professional divisions. I was in close contact with all HCI-related teams at HARMAN, and worked on "future proofing" the UX of all products.

My achievements at HARMAN include:
  • Founded and led the corporate Future Experience (FX) team which executed top-down vision-driven R&D and prototyping; this was the most forward-looking team in all of HARMAN. Delivered 50+ proof-of-concept prototypes, and 155+ granted patents to protect current and future products.
  • Established and led the 50-person corporate AI team, within record time (less than 6 months) and under budget, directly supporting all HARMAN divisions on their specific AI needs and products.
  • Delivered a suite of ML-based software modules to the HARMAN business units, to be used directly in products or as the basis for further product development, in domains from DMS to VPA to intelligent audio. Modules delivered in 2019 include facial identification, music source separation, siren detection, and emotion recognition from voice.
  • Delivered 50+ proof-of-concept prototypes showing feasibility of a new product or new UX, to be used as reference designs for future products by new product initiation teams. Examples of 2019 demonstrators: unobtrusive stress and cognitive load sensing, audio event detection systems, alarms and siren detection systems, shape-shifting automotive interfaces.
  • Impacted products at HARMAN: JBL headphones and earbuds with Ambient Aware [link] (estimated 5M sold so far) are based on my work in selective noise management. HARMAN’s Dashcam product [link] was enabled by my work in aftermarket DMS. HARMAN’s advanced DMS offerings [link] come from my work in cognitive load & stress sensing, driver readiness from combined emotional-cognitive load, and more.
  • Delivered 155+ granted patents globally to HARMAN, as lead and co-inventor [link], to protect current and future products. Domains: smart headphones, driver safety and driver monitoring systems (DMS), virtual personal assistant (VPA), augmented reality devices (AR/VR), sound management systems, UX of autonomous vehicles, hearables and post-wearables, HCI and HRI, aerial robotics/drones, neural sensing.
  • Strategically increased HARMAN’s reputation via joint R&D with high profile partners such as Google ATAP [link], BMW, Ford. Engaged with virtually all car manufacturers (OEMs) to demonstrate the advanced technology developed by my teams: during OEM Technology Days (e.g., for Toyota, Alfa Romeo, JLR, Volvo, Renault, VW, GM, BMW), Auto Shows (e.g., Shanghai Auto Show, Geneva Motor Show), Consumer Shows (e.g., CES Las Vegas).

Year: 2012 - 2020
Status: concluded
Domain: engineering, management
Type: research group, executive function
Position: vice president
Key team members USA:
  • Davide Di Censo (R&D Manager, 2012 - 2017)
  • Elliot Nahman (Engineer, UX Design & Prototyping, 2013 - 2015)
  • Mirjana Spasojevic, Ph.D. (Sr. Principal Researcher, 2014 - 2015)
  • Joey Verbeke (AI Team Head, 2016 - 2020)
  • Adam Boulanger, Ph.D. (Director of UX Prototyping, 2016 - 2018)
  • Sven Kratz, Ph.D. (Sr. Principal Engineer, 2017 - 2018)
  • Josh Sweers, MBA (Project Manager, 2017 - 2018)
  • Neeka Mansourian, MFA (3D Animator & Designer, 2017 - 2018)
  • Priya Seshadri, Ph.D. (Principal Engineer, 2018 - 2020)
  • Evgeny Burmistrov, Ph.D. (Sr. Principal Engineer, 2019 - 2020)

Key team members Russia (leadership):
  • Vladimir Aleshin (AI Execution Lead)
  • Andrey Filimonov, Ph.D. (AI Technical Lead)
  • Sergey Litvinenko, Ph.D. (AI Architect)
  • Ivan Shishalov, MBA (Project Manager)
  • Andrey Milchenko (Principal Architect)
  • Mikhail Sotnikov (Principal Architect)
  • Andrey Epishin, MBA (Senior Researcher)
  • Roman Vlasov (Principal Architect AI)
  • Dmitry Yashunin, Ph.D. (Project Manager)
HP Palm GBU. Palm was bought by HP in July 2010. In 2012, HP decided to stop producing hardware, but kept the webOS operating system, which was later sold to LG and got a successful second life as the OS for TVs, watches, projectors, fridges, etc.

Director, Future Concepts and Prototyping
(part of the HI team in the webOS/Palm business unit of HP)

I founded the team in January 2011. My focus was on leading and inspiring the team members, directing the group's research agenda, hiring new members, interfacing with external groups, and patenting. The team's charter was to do risky and holistic HCI research and end-to-end prototyping (spanning software and hardware) that pushes the edges of HCI. Our projects targeted future product releases, approximately 3-5 years out from the then-current releases. This incredibly talented team consisted of Ph.D.-level researchers with diverse backgrounds, from 3D virtual environments to robotics to architecture to speech interfaces. Each of them has since continued their career and had significant impact on the industry and our lives. I also led two other teams at Palm: about 15 designers and coders who did UI development and software UI prototyping. Within Palm and HP, I worked with engineers (software and hardware), designers, researchers (e.g., HP Labs), and planners (roadmapping and strategic planning). My team's output consisted of working prototypes and patents of interaction methods that served as groundwork for future releases. Our patenting efforts were significant, at about one invention disclosure per week.

My achievements at HP/Palm include:
  • Led and managed 25+ employees in multiple teams: end-to-end prototyping and research engineers, user interface (UI) prototyping engineers, and UI production developers. Oversaw a $2,000,000 budget.
  • Established and led the Future Concepts and Prototyping group which created a rapid sequence of prototypes showcasing future interaction methods and applications. Example projects: wand and pen input, mobile three-dimensional (3D) interfaces, remote multi-touch interfaces.
  • Helped significantly with product releases by effectively managing a UI production team, and injecting innovative features into product releases.
  • My granted patents were sold or licensed to various other companies (such as Qualcomm), and are now used in other mobile products.

Year: 2011 - 2012
Status: concluded
Domain: engineering, management
Type: research group
Position: director
Team members: Seung Wook Kim, Davide Di Censo
Samsung Electronics
Project Lead and Team Lead, HCI Research Team
(part of the Computer Science Lab at Samsung R&D)

I founded the team in 2008 and led it until my departure from Samsung at the end of 2010. The team size was between 3 and 5 members, with staff researchers holding doctoral degrees in various fields and interns from first-tier universities. I was tasked with initiating and executing strategic HCI projects, in collaboration with both other Samsung groups and external partners. My duties included leading and inspiring the team members, setting the team's direction, creating strategic and feasible project plans, keeping the projects on track, hiring, and patenting. Our main outputs were working prototypes (see some projects below), evangelization of these prototypes to Samsung executives (up to chairman, CEO, and CTO), patents on core technologies, and technical reports on our research.

My achievements at Samsung include:
  • Founded and led a group of world-class HCI experts that created proof-of-concept prototypes for novel interaction methods; completed most technology transfers within 6 months.
  • Delivered several prototypes of new interaction methods for mobile devices that made the company's phones competitive with Apple's iPhone. Some of this work was applied to cell phone products in 2012 and beyond. Obtained granted patents for the crucial technologies.
  • Transferred prototypes for interacting with 3D TVs based on bare-hand gesture interaction and other methods, from innovation group to business units in Korea.

Year: 2005 - 2010
Status: concluded
Domain: engineering, management
Type: research group
Position: project and group leader

Team members: Seung Wook Kim, Ph.D. (2008-2010), Francisco Imai, Ph.D. (2008-2009), Anton Treskunov, Ph.D. (2009-2010), Han Joo Chae (intern 2009), Nachiket Gokhale (intern 2010)



Representative industry projects
High cognitive load (CL) and stress are a significant source of driver distraction. The objective of this work was to develop novel methods to monitor biometric data at a distance (contactless) to reliably detect states of high cognitive load and stress. Our effort is an industry first, and goes well beyond our earlier systems, which employed pupil diameter changes (pupillometry) to detect CL. In this effort, we use eye motion data (such as saccades and fixations) as well as heart rate data (in particular heart rate variability) and machine learning techniques to obtain more precise and continuous estimates.

This video shows a live test run of our Cognitive Sensing technology as of the end of 2019. Our purely machine-learning-based system can determine the driver's level of stress and cognitive load (separately) from eye motion data and heart rate data (see continuous curves).

This effort was done by my AI team in Nizhny Novgorod, which was part of my Future Intelligence Labs (FIL).
Video of in-car Cognitive Sensing Technology

My team, under the super capable guidance of AI team head Joey Verbeke, created a driving simulator setup with a plethora of sensors, including infrared cameras, ultra-wideband radar, chest-worn ECG sensors, and head-worn EEG sensors, in addition to collecting various driving parameters like steering wheel position and pedal position. We developed an experimental procedure that induced cognitive load through validated methods including DRT, n-back tests, and the OSPAN task. Data was collected during hundreds of repeated experiments and labeled by our medical experts. After many iterations and investigations, eye-gaze fixations, saccades, and blink rate were identified as the most important signals for the ML-based CL classifier. In addition, heart rate variability (HRV) turned out to be the best data for our stress classifier.
Data collection rig
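To make the pipeline concrete, here is a minimal, hypothetical Python sketch of the kind of windowed feature extraction and classification described above. It is illustrative only, not our actual system: the function names, feature choices, model type, and synthetic data are all assumptions made for this sketch.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def hrv_rmssd(rr_intervals_ms):
    """Root mean square of successive R-R interval differences (a common HRV feature)."""
    diffs = np.diff(rr_intervals_ms)
    return np.sqrt(np.mean(diffs ** 2))

def eye_features(fixation_durations_ms, saccade_amplitudes_deg, blink_count, window_s):
    """Aggregate eye-motion features over one analysis window."""
    return [
        np.mean(fixation_durations_ms),         # average fixation duration
        len(fixation_durations_ms) / window_s,  # fixation rate
        np.mean(saccade_amplitudes_deg),        # average saccade amplitude
        blink_count / window_s,                 # blink rate
    ]

# Illustrative training on synthetic windows; the real labels came from
# protocols like DRT, n-back, and OSPAN, as described in the text.
rng = np.random.default_rng(0)
X_cl = rng.normal(size=(200, 4))        # eye-feature vectors, one per window
y_cl = (X_cl[:, 3] > 0).astype(int)     # placeholder "high cognitive load" labels

cl_model = GradientBoostingClassifier().fit(X_cl, y_cl)

# At runtime: one 30-second window of (synthetic) sensor data.
window = eye_features(
    fixation_durations_ms=rng.normal(250, 40, size=60),
    saccade_amplitudes_deg=rng.normal(4.0, 1.0, size=55),
    blink_count=9,
    window_s=30,
)
print("P(high cognitive load) =", cl_model.predict_proba([window])[0, 1])
print("RMSSD (stress feature) =", hrv_rmssd(rng.normal(800, 50, size=40)))
```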

Our technology is the basis of HARMAN's current DMS offering.
HARMAN DMS product

This slide summarizes the main points of the original Cognitive Sensing program from early 2019 (still relying on pupillometry only): going beyond traditional DMS, our Deep Driver Understanding technology, called Neurosense, not only provides novel driver metrics such as cognitive load and emotional load, but also highly reliable and monetizable signals such as engagement, attention, and driver readiness. We have filed multiple patents for all aspects of these and related Cognitive Sensing innovations.
Overview slide



Cognitive Sensing Systems (HARMAN)

Cognitive Sensing Systems do real-time non-invasive sensing of the brain’s cognitive load, stress, and emotional states by analyzing eye motions, facial & voice cues, and low-level physiological signals. Application areas include automotive safety, semi-autonomous driving, advertising, gaming, AR/VR/XR, and many more.

My team's deep driver understanding technologies, developed over several years, are used for HARMAN's DMS products, which can make a vehicle adapt to the specific state of the driver, e.g., during autonomous vehicle hand-off events, to dynamically adjust ADAS parameters (such as increase warning times), to adapt and personalize the vehicle UI/HMI, etc. My team also developed the core DMS system for the HARMAN Aftermarket DMS products.

This is one of the larger programs I initiated and executed at HARMAN. My teams, both the FX group in California and the AI team in Nizhny Novgorod, Russia, developed many prototypes and modules over multiple iterations. For our earliest systems, named DriveSense, we worked with the startup EyeTracking Inc. on pupillometry-based prototypes; HARMAN announced the effort in 2016 (and the awards we won), teased it, and we demoed it at CES 2016. This and more advanced systems, under the name Neurosense, were shown at subsequent CES shows and many OEM Tech Days, and created lots of media attention (e.g., the Discovery Channel did a nice piece).

Complementing these previous efforts, my AI team started developing novel (industry-first) ML-based DMS methods. By the end of 2019, working systems that measure cognitive load and stress from eye motion and heart rate data were successfully put into test vehicles and demonstrated to C-level executives. In parallel, we were building systems that combine cognitive load and emotional load (from face and voice cues) into sophisticated signals like driver readiness, and were planning large-scale validation studies with external partners on all our new algorithms and methods.

I want to add that this program had significant impact on HARMAN. Young Sohn, President and Chief Strategy Officer of Samsung Electronics, saw our Neurosense demo at McKinsey’s T-30 Silicon Valley CEO meeting in June 2016, and told my team and the HARMAN CTO that they should "connect with Samsung's automotive people". Five months later, Samsung (under the leadership of Young Sohn) announced the acquisition of HARMAN for $8 Billion. I claim we had something to do with that!

Years: 2015 - 2020
Status: ongoing
Domain: engineering

Type: full range R&D effort from early PoC to near product-level systems; many patents; multiple concept videos; various demonstrators (software and hardware); systems shown at OEM Technology Days, Auto Shows (e.g., Geneva), and Consumer Shows (e.g., CES), invited paper

My position: project and team lead
Collaborators: all FX and AI team members
Our initial concept video from the end of 2015: it describes various shape-shifting interfaces, from hand rest interfaces (surface changing), to a rotary controller (shape changing), to a steering wheel (thickness changing). It also explains why: these interfaces are hands-free and require neither hearing nor vision, so they do not add to the driver's visual and auditory information overload.

Many of the concepts we showed in this video have been implemented since then, and some are close to production level. As always, all these concepts and solutions are protected by patents.
Concept Video 2016

This is our high-fidelity shape-changing rotary controller prototype, built in 2016. It is a fully working prototype in which the shape of the knob varies via the rotational position of 12 "blades". This video focuses on the mechanical design: we used 6 servos (6 DoFs) to rotate pairs of blades, which dynamically create various controller shapes. This happens at the same time as the user is manipulating the rotary knob, so the controller is simultaneously an input and output device, closing the interaction loop.

Note that the 12 robotically moving pieces have blunt edges and feel smooth to the touch; we made sure the system cannot accidentally pinch fingers. The user experience of touching a device (in particular a controller) that can change shape while being touched is really unusual, but people get used to it within seconds. A metaphor we sometimes used was petting a dog or cat that reacts to the touch by slightly arching its back or raising its ears.
Shape-shifting controller (SSC) v1
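Purely as an illustrative sketch (this is not the device firmware; the shape names, servo angles, and timing below are invented for illustration), the control idea can be thought of as mapping named knob shapes to target angles for the six blade-pair servos and interpolating between shapes while the knob is in use:

```python
import time

# Hypothetical target angles (degrees) for the 6 blade-pair servos per knob shape.
SHAPE_PRESETS = {
    "cylinder": [0, 0, 0, 0, 0, 0],
    "hexagon":  [30, 30, 30, 30, 30, 30],
    "star":     [15, 45, 15, 45, 15, 45],
}

def interpolate(a, b, t):
    """Linear blend between two servo-angle vectors, t in [0, 1]."""
    return [x + (y - x) * t for x, y in zip(a, b)]

def morph(current_shape, target_shape, steps=20, dt=0.02):
    """Gradually move the servos so the knob changes shape while being touched."""
    start, goal = SHAPE_PRESETS[current_shape], SHAPE_PRESETS[target_shape]
    for i in range(1, steps + 1):
        angles = interpolate(start, goal, i / steps)
        # In a real device these would become PWM commands; here we just print them.
        print(f"step {i:02d}: servo angles = {[round(a, 1) for a in angles]}")
        time.sleep(dt)

morph("cylinder", "star")
```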

This demo video shows the actual user experience of our Shape-Shifting Rotary Controller demo version 1. The demonstrator includes a mock center console and infotainment system, and shows the different shapes the knob can assume and how the various shapes help specific applications.

Since it was built, this demonstrator has traveled all over the world to various trade shows and was shown to many OEMs. It illustrates well what shape-shifting can mean for a rotary controller. However, it was not intended to be a product yet, but a highly polished UX prototype.
UX demo of SSCv1

Around 2018, we developed a second-generation rotary controller with lower mechanical complexity than v1, and as such closer to a product. The animations in this video show that we decided the overall diameter of the knob would be the most desirable degree of freedom. We designed multiple variations, each with a different number of sides, resulting in different shapes such as triangle, square, hexagon, etc. All of these variations have in common that they are driven by a single actuator (single DoF), which makes them mechanically simpler and cheaper to produce.
Design of SSCv2

This video shows a close-up of the working v2 controller. In addition to robotically changing its diameter, it can also disappear into the arm rest. This "hidden until in use" capability, requested as a feature by the OEM we were working with, ensures that the knob stays flush with the arm rest unless it is being used. The video also shows how fast, or slow, the device can change its diameter, and the "wiggles" it can produce using minute shape changes. Note that such wiggling is not the same as traditional haptics (vibration), since it addresses the human sense of proprioception, not the vibration sensors in our fingers.
Mechanics of SSCv2

This video shows a UX demo of the full system, integrated into a center console. All parts are fully working. The controller was designed as a drop-in replacement for traditional center console rotary controllers.

Note that the controller rises when the user's hand is sensed in proximity (hovering over the arm rest), and retracts after a timeout without a hand. On the top-level menu, the controller assumes 3 diameters, depending on which of the 3 menus is in focus. The push button on the left enters a sub-menu, and the button on the right is a back button.
UX demo of SSCv2
Shape-Shifting Interfaces, part I (HARMAN)

Shape-Shifting Interfaces robotically change their shapes, surface textures, and rigidity to communicate with users on a semi-conscious and subtle level. Such systems use the human proprioceptive sense, not competing with vision and hearing, thus enabling "load-balancing" of the human senses. Primarily used for automotive interfaces in standard and semi-autonomous vehicles, our effort was announced by HARMAN, and our systems are currently being evaluated by a leading German OEM for productization.

Shape-Shifting Interfaces is a large-scale, long-term effort which has delivered many systems and prototypes. Our Shape-Shifting Rotary Controllers (SSC) are amazing pieces of technology, creating a low-distraction yet highly obvious user experience, such as controlling infotainment systems eyes-free. (The videos on the left show the user experience.) We refined these controllers to near product level. The HARMAN Fact Sheet describes the product reasonably well.

All shape-shifting interfaces create a never-before-seen, highly futuristic UX for vehicles of all kinds, including eVTOLs, spacecraft, and beyond. The experience is unlike anything consumers have encountered before, in the automotive domain or anywhere else. We were planning user studies to verify our assumption that such interfaces do not add to information overload and are largely processed in parallel to the other human senses.

Year: 2016 - 2020
Status: ongoing
Domain: engineering

Type: full range research and development from early PoC to near product-level systems, many patents, multiple concept videos, various demonstrators, presented at OEM Technology Days (e.g., for Toyota, Alfa Romeo, JLR, Volvo, Renault, VW, GM, BMW), Auto Shows (e.g., Shanghai Auto Show, Geneva Motor Show), Consumer Shows (e.g., CES Las Vegas)

My position: project and team lead
Collaborators: all FX team members
This video shows the high-fidelity prototype of a Shape-Shifting Steering Wheel. It is fully working and, as a demonstrator, integrated with a windshield view display that showcases various events to the driver in real time.

The mechanical solution we chose, putting the actuators in the steering column, is not how it would be productized, but it gave us 8 DoFs (with 8 servos) to address 4 fingers on each hand. For a product that is likely not necessary, but it shows the various ways to interact with the driver very well: e.g., for a countdown to a highway exit, all fingers of one hand get lifted, and then, in 1-second intervals, each actuator sequentially goes back to the flat position. This alert literally did not need any explanation. In another example, when a potential front collision is detected, all 8 actuators flash briefly (but quite noticeably), making it very clear to the driver to pay attention ahead.

We got a comprehensive patent granted for this solution several years ago.
Steering Wheel Prototype
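As a hypothetical sketch of how such haptic patterns could be sequenced (the actuator interface, timings, and names are invented for illustration; this is not the actual prototype code):

```python
import time

NUM_ACTUATORS = 8           # 4 per hand in the prototype
LIFTED, FLAT = 1.0, 0.0     # normalized actuator positions

def set_actuator(i, position):
    # Placeholder for the real actuator command (e.g., a servo PWM write).
    print(f"actuator {i}: {'lifted' if position else 'flat'}")

def countdown_to_exit():
    """Lift all fingers of one hand, then lower them one per second as the exit approaches."""
    hand = range(4)                      # actuators under one hand
    for i in hand:
        set_actuator(i, LIFTED)
    for i in hand:
        time.sleep(1.0)
        set_actuator(i, FLAT)

def collision_flash(pulses=3):
    """Briefly pulse all 8 actuators to signal a potential frontal collision."""
    for _ in range(pulses):
        for i in range(NUM_ACTUATORS):
            set_actuator(i, LIFTED)
        time.sleep(0.15)
        for i in range(NUM_ACTUATORS):
            set_actuator(i, FLAT)
        time.sleep(0.15)

countdown_to_exit()
collision_flash()
```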

Here are some shots of the Steering Wheel before it got closed up, showing the eight actuation mechanisms.
Making of SSSW

The haptic language was core to the IP of this system, and the patent got granted in 2017.
Haptic Language

This is what a Shape-Shifting Steering Wheel Cover might look like (symbol image, not the actual product!): the red areas would be inflatable, making them bulge slightly. All actuator components would be within the steering wheel cover, and a Bluetooth module would connect to the user's navigation system, or directly to the infotainment system.
Steering Wheel Cover

This video shows our Shape-Shifting Surfaces prototypes. Each of the 5 solutions shown is mechanically different and creates a different proprioceptive user experience: some have elements that lift vertically, some have surfaces that hinge inwards, and some have elements that move laterally. We were in the process of running a large-scale user research study to gauge the advantages of each solution and the user preference for each.

A primary application area for such devices would be a semi-autonomous vehicle where the user is in a "co-driving" situation with the car. In one application example, the user rests their hand on a robotically enhanced arm rest. They then get an almost subconscious understanding of the environment around the vehicle, without having to look at a display. Obviously, the mechanical assembly would be hidden in the arm rest.

Furthermore, if the user perceives the vehicle to be driving too close to another vehicle or the curb, they may push with their hand laterally against the "obstacle", and thereby move the car away from it—all while leaving the car in autonomous driving mode. This specific use case is a primary concern of users of semi-autonomous vehicles, and was confirmed in a thorough user study my group did with CMU.
Shape-Shifting Surfaces Prototypes

This video shows some early experimentation for Shape-Shifting Surfaces. The first prototype uses multiple "stamps" for each of the 4 areas in which they can appear; these are changed robotically to create multiple very clear indentations. The mechanism is reminiscent of an old ball-head typewriter, or a rolodex.

The second prototype shown in this video uses an array of cams which can be rotated to be all flat, or to create a bumpier surface. This prototype does not show the cover that would hide the mechanical pieces underneath.

These early explorations were created by the amazing Tomoki Eto in 2017.
Early Prototyping

Design iterations and some early CAD renderings for potential SSC mechanical solutions.
Design Iterations
Shape-Shifting Interfaces, part II (HARMAN)

The concept of Shape-Shifting can be applied to other automotive interfaces: in particular, to the steering wheel, and to the arm and hand rest.

Our Shape-Shifting Steering Wheel (SSSW) is one of the most intuitive demonstrators I have ever built. It is a no-brainer as a product, either fully integrated into a steering wheel (and offered to an OEM) or as a steering wheel cover (sold as an aftermarket solution to end consumers). The idea of a shape-shifting steering wheel is that it changes its thickness underneath the hands of the driver, to give subtle feedback. The simplest product would be a steering wheel cover that inflates slightly on the right or the left side, telling the driver to turn in that direction. We have built a more sophisticated system where the steering wheel's built-in robotic actuators can slightly lift each finger separately, or the whole hand at a time (see videos on the left). We have developed a whole new language to communicate with the driver, and can use it for many more applications: merge and blind-spot alerts, countdowns to turns, frontal collision alerts, pedestrian alerts, hidden object alerts, and many more. The HARMAN Fact Sheet has further details.

The Shape-Shifting Surfaces apply the concept to hand and arm rests. These efforts are a bit earlier in the productization cycle, but we have already iterated through many solutions (see videos on the left). A primary application area would be a semi-autonomous vehicle where the user is in a “co-driving” situation with the car. In one application example, the user would rest their hand on a robotically improved arm rest. They then would get an almost subconscious understanding of the environment around the vehicle, without having to look at a display. Furthermore, if the user perceives the vehicle to drive too close to another vehicle or the curb, they may push with their hand laterally against the “obstacle”, and therefore move the car away from it—all while leaving the car in autonomous driving mode. This specific use case is a primary concern of users of semi-autonomous vehicles, and was confirmed in a thorough user study my group did with CMU. (Note that their solution to the problem does not use shape-shifting interfaces since we prompted them to find other potential solutions.)

Year: 2016 - 2020
Status: ongoing
Domain: engineering

Type: full range research and development from early PoC to near product-level systems, many patents, multiple concept videos, various demonstrators, presented at OEM Technology Days (e.g., for Toyota, Alfa Romeo, JLR, Volvo, Renault, VW, GM, BMW), Auto Shows (e.g., Shanghai Auto Show, Geneva Motor Show), Consumer Shows (e.g., CES Las Vegas)

My position: project and team lead
Collaborators: all FX team members
This concept video shows the UX of earbuds with AAR. First, it shows how the user is able to cancel specific sounds, like a car alarm on the sidewalk while he tries to talk to his friend, or traffic noise while inside with the window open.

In this video, we use pointing and hand gestures so that the user can simply point at a source of sound and either lower it or cancel it. We also show that sounds from a certain direction can be enhanced with a similar pointing gesture (like when he points towards the kitchen and increases that sound to check if the water in the kettle is boiling). Later on, we show how the user can replace certain sound categories (e.g., substitute traffic with ocean noises, construction screeching with seagulls screeching, etc.), in that scenario using a voice interface.
AAR concept video

The vision of AAR is very big (as shown in the previous concept video), but it is also helpful to point out an MVP (minimum viable product): this headphone looks like a traditional ANC headphone, but implements the smallest possible amount of true selective noise control: it has two switches, one for turning off noises (similar to today's ANC), and one for turning off voices. These two switches work independently of each other. One could choose to cancel out only noises (like on an airplane when I want to talk to my neighbor), or only voices (like in an open-plan office where the voices are particularly distracting, but I do want to hear everything else), or both. Of course, an MVP could also have more switches, maybe one each for noises, voices, nature sounds, and so on. Making the UI simple binary switches would be a design decision—it could be done in other ways (see the next image of UIs).
AAR MVP headphones
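A minimal sketch of the control logic this two-switch MVP implies, assuming an upstream sound-event classifier that labels incoming segments (the labels, gains, and data structures below are assumptions made for illustration, not the product's signal chain):

```python
from dataclasses import dataclass

@dataclass
class SoundSegment:
    label: str       # output of an assumed upstream sound-event classifier
    level_db: float  # estimated level of this segment

# Independent switch states of the hypothetical MVP headphone.
cancel_noises = True
cancel_voices = False

def passthrough_gain(segment):
    """Return the gain applied to a segment before mixing it into the ear feed."""
    if segment.label == "noise" and cancel_noises:
        return 0.0   # fully cancel (idealized)
    if segment.label == "voice" and cancel_voices:
        return 0.0
    return 1.0       # pass everything else through unchanged

scene = [
    SoundSegment("voice", -20.0),
    SoundSegment("noise", -12.0),
    SoundSegment("nature", -30.0),
]
for seg in scene:
    print(seg.label, "-> gain", passthrough_gain(seg))
```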

AAR devices can come with a variety of user interfaces: we have shown gestures (in the first concept video: super intuitive), but more practical is a GUI (a slider per sound event) or a voice interface (just talk to the VPA about your preferences; see also the following automotive concept video).
AAR UIs

This concept video shows our recent automotive FX portfolio. In the section above, we show the voice user interface for an in-cabin AAR system: the user simply tells the vehicle which sounds to emphasize, de-emphasize, or cancel. Later in the segment, a siren is heard, which obviously gets passed through directly, regardless of other settings.

Although we initially focused on consumer products for AAR (because HARMAN already has ANC headphones), from an engineering perspective an in-cabin AAR system is actually easier to implement than a head-worn system: for AAR to work in a car, one needs microphones around the vehicle, an in-cabin sound system, and a voice agent—all of which already exist in most modern vehicles. Also, there are fewer computational limits on in-vehicle platforms compared to what we can fit, e.g., in earbuds or Hearables.
AAR in cars

This is the video for the Indiegogo campaign we intended to launch. The campaign could not move forward because of corporate issues, but the video nicely shows another MVP for AAR headphones: in this case, the headphones are sensitive to a few (customizable) sound events and alert the user even when they would not otherwise hear them. Obvious sound events are the user's own name (resulting in Name-Sensitive Headphones), alerts like door bells, car horns, or bicycle bells, and the sound of an arriving train. We built a PoC for CES 2016 which was largely feature complete.
Indiegogo video
Auditory Augmented Reality for Super Human Hearing (HARMAN)

We want control of what we hear. Noise cancellation headphones are ok, but not selective enough. We need a product which not only cancels exactly what we don’t want to hear, but also emphasizes what we like. This leads to a kind of SUPER-HUMAN HEARING.

Auditory Augmented Reality, short AAR, allows users to customize their auditory environment: selectively cancel unwanted sounds, increase or decrease the volume of other sounds, or add new sound sources to their sound scape. We apply AAR to products such as headphones, ear buds, Hearables, in-car audio systems, AR/VR/XR gear, etc., redefining what these sound systems can do, beyond listening to music.

To explain the idea, we created a concept video for earbud use, made AAR part of our automotive portfolio video, and produced an MVP video for headphones (the crowd-sourcing campaign it was made for did not get corporate approval).

Over the years, we created many prototypes and modules that showcase AAR, such as Name-Sensitive Headphones (shown at CES 2016, which got lots of attention), machine learning based modules for Audio Event Classification, Sound and Music Source Separation, UX demonstrators, and many more, all protected by 20+ patents.

And our AAR efforts have impacted HARMAN products directly: e.g., JBL headphones and earbuds with Ambient Aware (estimated 5M sold so far) are based on AAR. Beyond HARMAN, I also provided significant thought leadership by coining the term in 2012. Still, AAR is only now starting to become a "trend": for example, it is featured in Amy Webb's 2020 Tech Trends Report, where she describes what we had already been working on for 8 years.

Years: 2013 - 2020
Status: dormant
Domain: engineering

Type: PoC of MVP, various ML based modules, concept video for AAR earbuds, concept video for in-cabin AAR, MVP video for AAR headphones, 20 patents

My position: project and team lead
Collaborators: all FX team and some AI team members
This brief animation illustrates the effect we use to determine whether a user actually paid attention to a visual cue, such as a "Low Battery" indicator in the middle of the dashboard (pinkish, blinking alert). When it is urgent to get the driver's attention and verify that the driver actually saw the indicator, the HMI control software can oscillate the brightness of this indicator in the 1.0-2.5 Hz range. If the driver pays attention to the indicator, their pupils will oscillate in sync with the brightness changes of the indicator.
Illustration of PFT Method

This animation shows a potential use of the system in a vehicle with a full-windshield display. The technology to project outlines on a windshield exists, and works even in bright sunlight conditions. We experimented with windshield projection successfully in a different project. The point of this illustration is to show that a driver's active attention to the red square that oscillates at 1 Hz can be verified by detecting the same oscillation frequency, 1 Hz, of their pupils.
Windshield display illustration

This animation shows a potential use of the system with a car that has infrared detection of pedestrians and animals and shows them on the driver's instrument cluster (visible through the steering wheel). In this illustration, the two targets have different oscillation frequencies, so the car can tell from the pupil oscillations whether the driver paid attention to the deer (1 Hz), the pedestrian (2 Hz), or both (1 Hz and 2 Hz).

This illustrates that such a system allows for simultaneous notifications presented at varying brightness-oscillation frequencies. Because of that, it can present multiple visual notifications, outlines, and other visual information, and discern which of the multiple visual stimuli the user has attended to. Note that the color of the visual cue (e.g., red or orange) is not relevant, only the brightness changes.
IC display illustration

Example data from our experimental setup: the user was instructed to look straight ahead for 30 seconds (illustration top row, left), and then look at an oscillating target slightly to their right for 30 seconds (illustration top row, right). Note that the user was not driving a car for this data collection step. The diagram in the middle row shows the pupil diameter of the user during the 60 seconds. At the bottom, the FFT of the pupil diameter data of the second period (t = 30-60 s) shows a spike at 1 Hz which was not there in the first period, meaning that the user actively paid attention to the oscillating target in the second period.

Note that the pupil diameter of course changes for several other reasons, in particular due to increases or decreases in ambient light. However, these environmentally triggered pupil diameter changes are filtered out by our approach, and do not interfere with the highly periodic changes we introduce with the oscillation of a visual target.
Example data
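To illustrate the detection principle on synthetic data (a simplified sketch, not our actual module; the sampling rate, threshold, and noise-floor estimate are assumptions), a pupil-diameter trace can be Fourier-transformed and checked for a peak at the tag frequency:

```python
import numpy as np

FS = 60.0          # assumed eye-tracker sampling rate (Hz)
TAG_HZ = 1.0       # brightness-oscillation frequency of the visual target

def attended(pupil_diameter_mm, tag_hz=TAG_HZ, fs=FS, threshold=3.0):
    """Return True if the pupil trace shows a clear spectral peak at the tag frequency."""
    x = np.asarray(pupil_diameter_mm, dtype=float)
    x = x - np.mean(x)                               # remove DC / slow baseline
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    peak = spectrum[np.argmin(np.abs(freqs - tag_hz))]
    background = np.median(spectrum[freqs > 0.5])    # crude noise-floor estimate
    return peak > threshold * background

# Synthetic 30-second example: a small 1 Hz oscillation riding on noisy pupil data.
t = np.arange(0, 30, 1.0 / FS)
rng = np.random.default_rng(1)
trace = 3.5 + 0.05 * np.sin(2 * np.pi * TAG_HZ * t) + 0.02 * rng.normal(size=t.size)
print("Driver attended to the tagged target:", attended(trace))
```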
True User Attention Sensing (HARMAN)

Today's cars often alert their drivers visually, e.g., a low tire pressure indicator lighting up on the dash. However, a car cannot determine whether the driver has actually paid attention to such signals. (And no, requiring the driver to acknowledge an alert by pressing a button, as one would on a cellphone or laptop, is not an option since that would distract the driver from driving. And no, a driver having made eye contact with an indicator does not mean they also paid attention to it.)

We built a truly industry-first proof-of-concept prototype which can determine whether a driver looked at a specific visual alert (e.g., a low gas indicator), and can also distinguish between the driver merely having glanced at it vs. having actually noticed it. This project is based on published scientific findings on the Pupil Frequency Tagging (PFT) effect (not invented by us), which determines a user's true attention to an alert by measuring physiological signals such as pupil fluctuations or brain signals. This method is super exciting because it goes beyond measuring that a user simply looked at an alert (which could relatively easily be done by measuring eye gaze direction): it requires the user to actively pay attention to the alert.

Our MVP software module only needs access to a car's DMS camera (which will become mandatory soon anyway) and an interface to the HMI controls. If the car uses warning lights and needs confirmation of the driver's true attention, the system oscillates the warning lights' brightness (at a separate frequency for each) and simultaneously looks for corresponding pupil oscillations. A more sophisticated product can use a head-up display (HUD), windshield projection, or an instrument cluster display (IR, thermal, or night vision) to "outline" objects ahead such as pedestrians, animals, or any road obstacle, and subtly oscillate the outline's brightness. If the driver actually paid attention to the outlined object, their pupils will oscillate at the same frequency as the visual cue. The oscillating cue can even be located in the driver's peripheral vision and still cause pupil oscillations, as long as the person cognitively paid attention.

Our early-stage exploratory PoC consists of a software module, a COTS DMS camera, and a mock IC display. In our work, we confirmed experimentally that the PFT effect exists, and identified the optimal oscillation frequency, the minimum target size, the minimum acquisition time, and how much the PFT effect degrades with peripheral vision angle.

Year: 2019
Status: concluded
Domain: engineering
Type: early PoC, study, report, pending patent application
My position: team lead
Project lead: Evgeny Burmistrov
We created many prototypes that illustrate the UX of mid-air haptics in cars. These photos are from the system we installed in our test vehicle: it includes the transducer board (the keyboard-looking structure between gear shift and infotainment system), a custom trim to make the system integrate seamlessly into the interior, as well as a fully working GUI (partially visible on top) that supports the mid-air haptics effect for automotive-specific gestures.

On the right side, there is a comparison between the original trim (right) and our custom trim. In the middle right is the assembly containing the driver board, fans, and electronic components that go below the transducer array; all of that was provided by Ultraleap. At the bottom right, we see the system with the transducer board covered: one of the unique things about mid-air haptics is that the signal can pass through acoustically transparent material, such as the grille shown in this picture.

This system was demoed first to all HARMAN executives (including Young Sohn, chairman of the HARMAN board, who acquired HARMAN for Samsung because of HARMAN's automotive technologies—such as this one), then literally to all OEMs, both in Silicon Valley (where they have significant R&D labs) and in the Detroit area where all US OEMs are located.
Automotive system

Our car was great for demos, but hard to transport quickly. So we built a table-top demonstrator which perfectly shows the effect of mid-air haptics in vehicles, and built multiple infotainment applications for this platform to demonstrate the experience. This demonstrator was shown widely around the world, from China to Korea to Japan to Germany to Italy to the UK and many other international locations, as well as in the U.S.A.
Portable demonstrator

This brief segment in our portfolio video (0:36 - 0:45) shows one use case of mid-air haptics. In this concept video, the haptic effect would be projected downwards from above; in our engineering prototypes, the effect comes from transducers placed below.

In this segment, we use a visual overlay (CG) to explain the mid-air haptic effect. It is quite difficult to show visually how the effect feels; it is simply something one has to experience firsthand. The effect is quite unique, and somewhat unreal.
Concept video

This slide summarizes the main points of HARMAN's mid-air haptics system: gesture interfaces need a feedback channel, and mid-air haptics solves that problem. It is both eyes-free and ears-free, so it does not add to driver distraction.

Beyond the obvious automotive use cases that reinforce gestures, we also worked on many applications and use cases which use mid-air haptics in other ways, without gestures. All these efforts are covered and described by our patent applications.
Overview slide

We also had prototypes showing mid-air haptics in the audio space. At CES 2016, we showed a speaker with a gesture and mid-air haptics interface: a JBL Authentics L16 updated by Ultraleap (top left). Mid-air haptics would also be a great candidate for studio applications, such as 3D mixing consoles (upper right). Once the transducers become small enough, we also envision putting them on headphones, where gesture interfaces are starting to become available but obviously lack touchless haptic feedback (lower row).
Audio applications
Mid-Air Haptic Feedback Systems (HARMAN)

This technology, based on an array of ultrasonic transducers, can create a haptic effect at a distance, which is a highly unusual sensation with tremendous product potential in a variety of domains, from automotive to AR/VR/XR to 3D interfaces to wearables to gaming, and many more. This effect, sometimes called "touchless haptics," can be used for many interaction scenarios. One of them is to make in-car gesture interfaces useful so that the driver does not have to check visually if their gesture was successful.

My FX team created various prototype systems, the most recent one integrated into our test vehicle. The core technology is by Ultraleap, a startup we have worked with closely. My team developed the automotive interaction design and closed the gap between the startup's basic modules and a Tier-1 automotive product. HARMAN announced this collaboration, and shortly afterwards we were in discussions with a dominant German OEM about HARMAN becoming the provider of mid-air haptic automotive systems, integrated with the rest of the infotainment system.

Beyond automotive applications, we also applied mid-air haptics to support gestures with speakers, and showed a system at CES 2016 (with interesting reviews).

Years: 2016 - 2018
Status: dormant
Domain: engineering

Type: full range research and development from early PoC to near product-level systems, many patents, multiple concept videos, various demonstrators (software and hardware), presented at OEM Technology Days (e.g., for Toyota, Alfa Romeo, JLR, Volvo, Renault, VW, GM, BMW), Auto Shows (e.g., Shanghai Auto Show, Geneva Motor Show), Consumer Shows (e.g., CES Las Vegas), invited paper

My position: project and team lead
Collaborators: all FX team members
At Google I/O 2016, ATAP's Dr. Ivan Poupyrev presented our JBL speaker with radar-based gesture control and adaptive lighting.<br><br>

		It was a big deal that Google showed the JBL brand and logo in full during the presentation (and financial news outlets reported that the presentation and JBL name dropping did have a positive impact on the HAR stock). Even more important was that the JBL demo was fully working, showing the micro gestures for volume change and next song, as well as the turn-off gestures, together with the adaptive light design that closes the interaction loop for the user.
		Google I/O

This Verge video talks about Soli, and at 00:56 it describes our JBL speaker. They emphasize that the big deal is not just the technology prototype, but the collaboration between the brands and ecosystems. The Verge

This photo shows the core elements of the system: the micro controller which controls the lights and the audio (top), the acoustics section (loudspeakers and amplifier; middle), the radar based gesture sensor (bottom), and the LED assembly (also bottom). Engineering

This collage shows the final system as it was presented at Google I/O (top left), the development process at HARMAN's FX lab (top right), the intense collaboration process at Google's ATAP office (bottom right), a view into the packed speaker casing (bottom middle), and the CAD model showing the significant updates to the mechanical design of the JBL speaker. Design and collaboration
Gesture Enabled Speaker (Soli) (HARMAN and Google)

The FX team worked extremely closely with the Google Soli team for several months to create a working prototype of a speaker that allowed for interaction via micro and finger gestures. The loudspeaker was shown at Google I/O on May 20th, 2016.

HARMAN announced our collaboration a few days earlier. Arguably, HARMAN's stock price reacted quite positively to this collaboration: financial news outlets noted that HAR stock gained 4.81% right after our joint live presentation at Google I/O 2016.

This was a co-development project between HARMAN's FX team, the Soli team at Google’s Advanced Technology and Projects group (ATAP), led by Dr. Ivan Poupyrev, and IXDS, which did user research on the project. We integrated the Soli radar-based sensing technology into a connected home speaker to enable people to control audio without touching a button, knob, or screen. This collaboration with Google ATAP not only resulted in integrating the high-performance radar-based gesture sensor into a JBL speaker, but also in designing a novel light-feedback-based user interface to communicate with the user. We also received two patent grants (a HARMAN patent and a joint Google-HARMAN patent) covering several aspects of this work, enabling and facilitating productization.

Year: 2016
Status: concluded
Domain: engineering

Type: several demonstrators, video from Google I/O 2016, granted patent (joint with Google), granted design patent

My position: team lead
Project lead: Joey Verbeke, Davide Di Censo
This video shows the working system of our eye gaze and eye vergence controlled transparent display demonstrator. The video does not show a concept, but an actual engineering demonstrator: all parts work exactly as shown in the video (no CG or editing). We detect eye gaze directions separately for each eye (which was possible in an early version of the Tobii SDK), which allowed us to create an interactive demo that not only registers eye contact on the display, but also detects when the user focuses <i>behind</i> the display (without changing their overall gaze direction).<br><br>

		The transparent display we used was simply back-projected glass, with the glass pane coated with a foil that increases the visibility of back-projection. We also experimented with other existing transparent display technologies, such as transparent LCD panels from Samsung and LG; our version, however, had the best visibility. In the following years, we also collaborated with startups with more sophisticated windshield projection methods: they were using customized projectors and a transparent fluorescent screen coating to create projections that remain visible in bright sunlight in a car. Video of PoC
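For readers who wonder how "focusing behind the display" can be detected at all, the minimal geometric sketch below illustrates the idea: estimate the fixation point as the closest approach of the two per-eye gaze rays, and compare its depth with the known display distance. This is a simplified illustration, not the Tobii SDK or our production code; eye positions, gaze directions, and thresholds are assumed values.

```python
import numpy as np

def vergence_depth(p_left, d_left, p_right, d_right):
    """Estimate the fixation point from per-eye gaze rays (origin, direction).
    Returns the distance from the eyes to the midpoint of the shortest segment
    between the two rays, plus the fixation point itself."""
    d_left = d_left / np.linalg.norm(d_left)
    d_right = d_right / np.linalg.norm(d_right)
    w0 = p_left - p_right
    a, b, c = d_left @ d_left, d_left @ d_right, d_right @ d_right
    d, e = d_left @ w0, d_right @ w0
    denom = a * c - b * b
    if denom < 1e-9:                       # rays (nearly) parallel: looking at infinity
        return np.inf, None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    fixation = 0.5 * ((p_left + s * d_left) + (p_right + t * d_right))
    eye_center = 0.5 * (p_left + p_right)
    return np.linalg.norm(fixation - eye_center), fixation

# Illustrative decision: is the user focusing ON the transparent display
# (assumed here at ~0.6 m) or THROUGH it onto the scene behind?
DISPLAY_DISTANCE = 0.60   # m, assumption
depth, _ = vergence_depth(np.array([-0.032, 0, 0]), np.array([0.05, 0, 1.0]),
                          np.array([ 0.032, 0, 0]), np.array([-0.05, 0, 1.0]))
focused_on_display = abs(depth - DISPLAY_DISTANCE) < 0.15
```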

The granted patent describes in detail the methods we used, and the above visuals are directly from our engineering prototypes.<br><br>
			After this project concluded, we continued working on eye gaze and eye vergence efforts, based on our early expertise in that domain. E.g., one of our efforts was detecting a driver's interest in billboards on the roadside (in the advertising domain). Another effort was detecting a driver's focus on volumetric content on an in-cabin 3D display. We also continued using eye gaze and vergence in a multitude of DMS prototypes and efforts leading to products, such as detecting a driver's “daydreaming” behavior (by determining whether they actually focus on the vehicle or object in front of them, or are looking through it). Eye vergence patent

We developed many patents and systems that use eye gaze and eye vergence detection: shown above are a few. Other Eye gaze patents
Click any thumbnail for more details!
Eye Vergence and Gaze Sensing for controlling transparent displays (HARMAN)

In vehicles with head-up displays (HUDs) or windshield projection displays, our system not only measures the driver's eye gaze direction, but also their vergence (eye focus depth), allowing the car's user interface to adapt instantly and context-sensitively to the driver's needs, without the driver having to do anything specific (or even be aware of the system). The video on the left shows our working system, and how it is used.

This was an early prototype, but based on the expertise gained, we kept developing many more systems afterwards. E.g., one of our efforts was detecting a driver's interest in billboards on the roadside. Another effort was about using eye gaze to honk at a specific vehicle or person (directed alerting). We also continued using eye gaze and vergence in a multitude of DMS prototypes and efforts that are incorporated in current and upcoming HARMAN products: from detecting a driver's “daydreaming” behavior (by determining whether they actually focus on the vehicle or object in front of them, or are looking through it); to eye gaze controlled in-cabin projection systems (important information shows up inside the cabin right beside where the driver is currently looking); to eye gaze enabled navigation systems (the system monitors the driver's eye gaze and can refer to the objects they are looking at to give super contextualized navigation guidance). In a very large recent project, we even use eye motion data to calculate a driver's stress and cognitive load. Eye gaze and vergence are also used in many of our AR/VR/XR related systems, e.g., to enable super enhanced conversations across crowded rooms triggered by mutual eye contact.

Year: 2013 (this prototype)
Status: concluded
Domain: engineering
Type: demonstrator, video, granted patent (this system)
My position: team lead
Project lead: Davide Di Censo
These are some of our early exploration prototypes for “head-mounted ungrounded force actuators”. Each of them uses a different method, which we rapidly prototyped to experience the UX directly. With such systems in particular, it is all about how the user actually perceives these unusual forces; they are hard to simulate and imagine (or to explain), but they are easy to understand when experiencing working prototypes. Early PoCs

Our Method A is about dynamically changing the center of gravity (CoG) of a headphone or headset (such as an AR/VR headset). It is done either by re-distributing fluid to the left or right side of the user's head, essentially pumping it from one small container to another (illustration on the left), or by mechanically moving weights on the user's head along rails, or on hinges (illustration on the right).<br><br>

			The effect is that an originally balanced headphone or headset would then weigh heavier on one side of the user's head than the other. As a result, the user may turn their head or body to even out their own center of gravity. As they do this, they turn to face a specific direction, or tilt their head to a particular angle, thereby drawing their attention toward a particular direction. This gentle change of center of gravity is perceived by a person's vestibular sense (or equilibrioception), located in the inner ears. The method does not rely on seeing or hearing, and creates an alternate communication channel between device and user.<br><br>

			The purpose of such products would be to provide feedback, guidance, or directional attention focusing. In one application, the user is gently being “nudged” in an intended direction, e.g., as part of a pedestrian navigation system. In another application, the user’s attention is focused towards a certain direction, e.g., as part of a system that alerts the user of danger from a certain direction. In yet another application, the system is part of a virtual personal assistant (VPA) which may gently point the user towards a certain direction of interest.<br><br>

			The forces required to create this effect need not be large, and in fact could be so small that the effect is only subconsciously perceived by the wearer.<br><br>

			The illustrations above look geeky (and exaggerated), but in a product, the actuator would be hidden from sight and highly miniaturized. Method A: shifting CoG
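To give a rough feel for the magnitudes involved, here is a back-of-the-envelope sketch (all numbers are assumptions, not measurements from our prototypes): moving a small mass a few centimeters already produces a clearly defined, yet gentle, imbalance.

```python
# Back-of-the-envelope numbers for Method A (all values are illustrative assumptions):
# shifting a small mass sideways on the headband changes the lateral center of
# gravity of the headset and adds a small tilting torque about the head.
G = 9.81  # gravitational acceleration, m/s^2

def cog_shift_and_torque(moving_mass_kg, travel_m, headset_mass_kg):
    """headset_mass_kg is the total headset mass including the moving mass."""
    delta_cog = moving_mass_kg * travel_m / headset_mass_kg   # lateral CoG shift, m
    torque = moving_mass_kg * G * travel_m                    # added tilting torque, N*m
    return delta_cog, torque

# e.g., 20 g moved by 6 cm on a 300 g headphone:
print(cog_shift_and_torque(0.020, 0.06, 0.300))   # ~4 mm CoG shift, ~0.012 N*m
```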

This short demo video shows the very early Center-of-Gravity-shifting PoC we built. A small mass on top of the headphones can be dynamically positioned along a rail in order to change the Center of Gravity (CoG) of the headphones (and the user).<br><br>

		This slight imbalance may be enough for a person to tilt their head. Depending on the actual weight that is shifted, the forces involved would be very small, and may not necessarily be perceived consciously by the wearer; it would still have the intended effect, though, particularly if the person is not using visual cues (such as walking in an open area while reading a book). If the shifted mass is more than a few grams, the imbalance would become more obvious to the wearer, and they would understand it to be an intentional cue from the device to change the direction of walking or to pay attention to a specific side.<br><br>

		Obviously, on a product level the system would use near-silent linear actuators to avoid audible noises during shifting of the weight. CoG prototype

This animation shows the concept behind our Method B, “asymmetric linear acceleration of multiple masses.” The system creates the perception of a constant linear force in one direction (unidirectional) by accelerating multiple small masses along parallel rails asymmetrically and with slightly different phase timing. A strong acceleration and deceleration is generated for a very brief period in the desired direction, while a weaker acceleration and deceleration is generated over a longer period of time in the opposite direction.<br><br>

			Because of the nonlinearity of the human perceptual system, each mass generates a perceivable force only during a fraction of the motion cycle (the green area on the left, where strong acceleration and strong deceleration occur). When multiple masses on parallel rails are combined, they create the appearance of a single continuous force. Since the masses need to return to their origin to restart the cycle, the return trip uses acceleration and deceleration forces which are below the human perception threshold (the salmon colored area on the right, where weak acceleration and deceleration occur).<br><br>

			The result is that people's perception is “tricked” into perceiving a unidirectional constant linear force, as if something or someone invisible were attempting to “pull” or “push” them. In the above illustration, the effect is just along a single axis, but we also designed systems which create such “fictional” forces in 2D and 3D space.<br><br>

			The force is actually not fictional, but real. Because humans feel rapid acceleration much more strongly than slow acceleration (due to the nonlinearity of human vestibular perception), in our system people perceive the force in only one direction and not the other, thanks to the asymmetric acceleration/deceleration cycles.<br><br>

			(If the acceleration and deceleration cycles were symmetric in both directions, the effect would be a strong vibration, or rattling, like a broken piece of machinery, which would be a highly unpleasant experience. If only a single rail were used, even with asymmetric acceleration/deceleration, the effect would be an intermittent unidirectional force, which would be perceived as a kind of “poking,” or “hammering”: not a pleasant perception either.)<br><br>

			We have a granted patent on the above described system. The effect it is based on has been described in scientific publications referenced in our patent. Method B: asymmetric accelerations
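The sketch below illustrates the principle in a deliberately simplified single-mass form (a hedged illustration, not our control code; all numbers are made up): a brief strong pulse in the target direction is balanced by a long weak pulse back, so the net impulse per cycle is zero and the mass does not drift, yet only the strong pulse exceeds the perception threshold.

```python
import numpy as np

# Simplified single-mass illustration of the asymmetric-acceleration idea.
T         = 0.2    # cycle period, s (assumption)
TAU       = 0.03   # duration of the strong pulse, s (assumption)
A_STRONG  = 12.0   # m/s^2, strong pulse toward the target direction (assumption)
A_WEAK    = A_STRONG * TAU / (T - TAU)   # chosen so the net impulse per cycle is zero
THRESHOLD = 3.0    # m/s^2, illustrative perception threshold

t = np.linspace(0.0, T, 1000, endpoint=False)
a = np.where(t < TAU, A_STRONG, -A_WEAK)            # brief strong pulse, long weak return

print("mean acceleration per cycle:", a.mean())     # ~0: the mass does not drift
perceived = np.where(np.abs(a) > THRESHOLD, a, 0.0) # only the strong pulse is felt
print("mean *perceived* acceleration:", perceived.mean())   # > 0: felt as a steady pull
```

With several masses running this cycle on parallel rails, phase-shifted against each other, at least one mass is always in its strong (perceivable) phase, which is what produces the impression of a single continuous force.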

Our Method C uses air or gas flow to generate forces on the user. Although the core effect is intuitive and obvious to understand, we came up with quite sophisticated embodiments for our patent.<br><br>

			Top row: in the simplest engineering embodiment (for which we have working PoCs), one or several ducted fans are attached to headphones or head gear. The orientation of these ducted fans determines what kind of forces (DoFs) can be applied to the wearer. Note that the ducted fans in these CAD models are oversized to make the effect obvious. In a product, the fans would be miniaturized and relatively quiet: the forces that can be generated even by miniaturized or MEMS turbines are easily strong enough for the wearer to notice them.<br><br>

			Bottom row, left: the illustrations labeled Fig. 6A to 6G come directly from our granted patent, and show simple assemblies and the forces they would exert on the wearer. (Again, the turbines and fans shown are oversized for clarity.) Some of the embodiments would use ducted fans that can pan/tilt, or have vanes attached to them, for thrust vectoring.<br><br>

			Bottom row, middle: one quite practical engineering approach would be to use the equivalent of a micro quadcopter attached to the top of the headphones. This configuration allows the same 4 DoFs that today's quadcopters already possess (roll, pitch, yaw, vertical linear), applied to the wearer's head. This solution is technically highly mature, and practical. (What is not shown is how such a propulsion method would be covered in protective encasing to prevent the blades from being exposed.)<br><br>

			Bottom row, right: further embodiments include mounting the propulsion units on a user's shoulders, wrist watch, or belt. Each of them affords specific types of DoFs, which can be used for a variety of applications. And yes, they look like super hero gear, because they are shown oversized...<br><br>

			In our patent, we expand on the thermal and noise management of such systems, which obviously would have to be considered. (ANC for micro ducted fans is practical.) Also, instead of rotary engines (fans and turbines), we describe the use of micro jets based on compressed air, which reduces the number of moving parts significantly. Method C: air flow
Nudging Headphones and AR/VR/XR gear (HARMAN)

This effort is about creating PoCs for products that give the user the feeling of being gently “pushed” (or pulled) in a specific direction. This effect can be almost subconscious, and often feels quite unreal. Such gentle "nudging" can be used for headphones, Hearables (aka, smart ear buds), AR/VR/XR gear, and all kinds of wearables. Applications include pedestrian navigation systems, directional warnings and alerting (e.g., used by a VPA), gaming, and many uses for AR/VR/XR. We worked on 3 different methods:

Method A: Dynamically change the center of gravity on headphones by shifting weights mechanically or pneumatically to the left, right, or front/back. This method does work, is simple, but is not particularly sophisticated (which should not be a reason against productization, to the contrary!)

Method B: Use linear asymmetric acceleration of multiple small masses: this method is non-trivial to explain, so please refer to the illustrations on the left. The effect is amazing, and feels pretty unreal. The mechanical elements needed are not simple, though, and would likely make a product more expensive.

Method C: Use miniaturized fans and jets "mounted" on the user to nudge them in the desired direction. The effect comes from air and gas flow from axial fans and similar propulsion systems such as propellers, ducted fans, micro and MEMS turbines, and micro propulsion systems.

For all these methods, we filed patents, two of which have been granted so far.

Year: 2016
Status: concluded
Domain: engineering

Type: multiple early PoC, 3 patents: Fan-Driven Force Device (granted), Center of Gravity Shifting Device (pending), Pseudo Force Device (granted)

My position: project and team lead
Collaborators: Tanay Choudhar, and all FX team members
This brief video shows a working gesture-interactive sound system in a mock car cabin. Since the video cannot represent full spatial sound, we used JBL Pulse speakers which visualize the sound level (i.e., they light up when they play sound). Joey is grabbing and moving the sound from one corner of the cabin to the other. One use case for this method would be to move an incoming (or ongoing) phone call from one seat to another.<br><br>

		This PoC uses a gesture sensor (Kinect) to detect the grab and release gestures, as well as the hand position in 3D space. The music that is playing follows the hand when grabbed and moved around. The user can grab and then “drop” a music or audio source anywhere, even in the back seat. As such, the user does not need to fully reach out to the passenger or back seat, but can grab the sound at some distance. Furthermore, the user is even able to “throw” a sound towards a different seat or person. Video of working PoC
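The grab-and-move interaction boils down to a small state machine. The sketch below is a hedged illustration of that logic (not the actual Kinect-based PoC code); the grab radius and throw speed are assumed values.

```python
import numpy as np

class SoundGrabber:
    """Minimal sketch of the grab/move/release interaction: a tracked hand can
    pick up the nearest spatialized sound object and drop (or throw) it
    somewhere else in the cabin."""
    GRAB_RADIUS = 0.25   # m, how close the hand must be to grab a sound (assumption)
    THROW_SPEED = 1.5    # m/s, release speed above which we "throw" (assumption)

    def __init__(self, sound_positions):
        self.sounds = {name: np.asarray(p, float) for name, p in sound_positions.items()}
        self.held = None

    def update(self, hand_pos, hand_vel, hand_closed):
        """Call once per tracking frame; returns the current sound positions,
        which would be fed into the spatial audio renderer."""
        hand_pos = np.asarray(hand_pos, float)
        if self.held is None and hand_closed:
            # Try to grab the nearest sound object
            name, dist = min(((n, np.linalg.norm(p - hand_pos))
                              for n, p in self.sounds.items()), key=lambda x: x[1])
            if dist < self.GRAB_RADIUS:
                self.held = name
        elif self.held is not None:
            if hand_closed:                       # sound follows the hand
                self.sounds[self.held] = hand_pos
            else:                                 # release: drop in place, or throw
                if np.linalg.norm(hand_vel) > self.THROW_SPEED:
                    self.sounds[self.held] = hand_pos + 0.5 * np.asarray(hand_vel, float)
                self.held = None
        return self.sounds

# Example: one call per frame with hypothetical tracking data
grabber = SoundGrabber({"music": [0.4, 1.0, 0.5], "call": [-0.4, 1.0, 0.5]})
grabber.update([0.35, 1.0, 0.5], [0.0, 0.0, 0.0], hand_closed=True)
```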

Early concept video showing the interaction method in a desktop or headphones setting. The concept is that the user listens to music, and when a call alert comes in, the user swipes the music to the side, and then pulls the caller to the middle of the sound field. In order to terminate the call, the user pushes the caller's audio to the side, and pulls the music back to the middle. Brief concept video

A preliminary product sheet for our in-cabin system. Preliminary product sheet

We applied the same interaction concept to headphones. Our working PoC consists of headphones with a forward facing Leap Motion controller mounted on top of the headband. During the demo, the user can grab individual sounds that are playing in their sound field and relocate them by grabbing, moving, and releasing them.<br><br>

			The user's hands are tracked in 3D space, so the sounds can be moved left-right, but also forward-backward, and even up/down. Obviously, the headphone's capability to render sounds above or below the eye level of the user is limited. (3D sound rendering in our headphones prototypes was not part of our R&D effort; we used COTS tools such as RealSpace 3D Audio. After we concluded our project, HARMAN acquired OSSIC, which provides superb 3D sound rendering capability. Their technology is currently applied to HARMAN gaming headphones.)<br><br>

			Today, some AR gear comes with (limited) gesture sensing, but our work was done years earlier and is an industry first, hence our granted patents. Also, our solution does not <i>need</i> a display, so it works perfectly with headphones. Gestures to move sounds on headphones
Bare-Hand Gesture Control of Spatialized Sounds (HARMAN)

We developed multiple proof-of-concept prototypes for products which allow a user to move sounds (or “sound objects”), such as alerts or music, with their bare hands. We applied this interaction method to automotive in-cabin interaction, as well as to headphones, AR/VR gear, and wearables in general. This is a great example of a unified interaction method which allows the same intuitive sound control whether the user is in their car, wearing headphones, or using AR/VR gear, smartphones/tablets, or laptops. Note that the user is moving sounds, not visible objects, so there is no display involved. (Our 3D display and gestures project combines the two.)

Our in-cabin PoC is shown in the video on the left. A preliminary automotive product was called GESS, “Gesture Enabled Sound Space” (early product sheet). Our interaction system complements HARMAN's Individual Sound Zones product well.

In parallel to these automotive efforts, we also worked on systems for headphones: our working PoC has a sensor on the headband that tracks the wearer’s hands and gestures. This allows the user to use gestures for typical media control (volume up/down and next song), but more importantly, bare-hand control over the spatial distribution of sound events in their soundscape.

Years: 2013 - 2015
Status: concluded
Domain: engineering

Type: early demonstrators, brief demo video, preliminary product sheet, 2 granted patents: wearables, automotive, invited paper

My position: project and team lead
Collaborators: Davide Di Censo, Joey Verbeke, Sohan Bangaru (intern)
The concept of our See-Through-Dash system: to give the driver a better visual awareness of the vehicle’s environment, in particular close and low, we expand a driver's view through the windshield downwards through the instrument cluster.<br><br>

			What is not visible in this animation is that the instrument cluster display is 3D, and the camera system is stereoscopic (dual imagers), making the road appear far out (where it actually is), not on the display surface. Concept

This video shows our fully working PoC prototype for a See-Through-Dash system.<br><br>

		We mounted dual cameras on a robotically actuated rail, with each camera having full pan-tilt capabilities. The purpose is to adjust the perspective to the user's head position and orientation in real-time. E.g., when the user moves sideways, the cameras have to adjust their perspective; the same applies when the user turns their head, or looks up or down: the camera perspective needs to follow these changes.<br><br>

		On the display side, we originally used a zSpace 3D display which renders our dual cameras in real-time, creating a true depth effect: on the display, the background is set back, showing it not on the display surface, but further out. (Obviously this video cannot show the 3D effect, since the user needs to wear glasses.) In a production system, the 3D display would be auto-stereoscopic (no glasses), and we experimented with a variety of such displays. We do track the user's head location and orientation in 3D space with high resolution to drive the servo motors. Video of PoC
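As an illustration of how the head tracking drives the actuated cameras, the sketch below derives a rail position and pan/tilt angles from the tracked head position (a simplified stand-in for our actual servo control; the coordinate frame and example numbers are assumptions).

```python
import numpy as np

def camera_commands(head_pos, cluster_center):
    """Hedged sketch: derive a rail position and pan/tilt angles for the actuated
    stereo cameras from the tracked head position, so the camera viewpoint mimics
    the driver looking 'through' the instrument cluster. Coordinates are in an
    assumed car frame (x right, y up, z forward), in meters."""
    head_pos = np.asarray(head_pos, float)
    cluster_center = np.asarray(cluster_center, float)
    view = cluster_center - head_pos                   # the driver's look-through direction
    pan = np.degrees(np.arctan2(view[0], view[2]))     # left/right, degrees
    tilt = np.degrees(np.arctan2(view[1], np.hypot(view[0], view[2])))  # up/down, degrees
    rail_x = head_pos[0]                               # cameras shift sideways with the head
    return rail_x, pan, tilt

# Example: head 10 cm right of center, cluster 60 cm ahead and slightly lower
print(camera_commands([0.10, 1.20, 0.0], [0.0, 0.95, 0.60]))
```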

Although our PoC uses robotic cameras (which looks cool and shows the synchronization between driver head and cameras well), for a product it is more practical to use a solid-state camera array (no moving parts) mounted on the bumper. The above illustration shows the two solutions. System components
See-Through Dashboard (HARMAN)

The purpose of such a product would be to give a driver a better visual awareness of their vehicle’s environment, in particular close and low. It expands a driver's view through the windshield downwards through the instrument cluster (wider vertical view) using a 3D-capable instrument cluster display and robotically actuated stereo cameras (or a solid state camera array) mounted on the car's bumper. It creates the user experience of a "transparent cockpit," and makes driving and parking safer.

We built proof-of-concept prototypes that have mechanically actuated cameras, view dependent rendering, and stereoscopic 3D rendering on zSpace displays.

Year: 2014
Status: concluded
Domain: engineering
Type: PoC demonstrator, demo videos, 2 granted patents: one, two, invited paper
My position: team lead
Project lead: Davide Di Censo
This concept video from 2013 shows the UX of a volumetric display in a car cabin. It illustrates nicely the interaction a driver can have with a display which is able to use the space between the steering wheel and the face of the driver. We got multiple patents granted for these interaction methods.<br><br>

		The video shows very nicely the following use cases:<br><br>
		1. <b>Alerts</b>: Important information is moved towards the user, such as a speeding alarm. The idea is that important information bubbles to the top of the display, becoming more pertinent to the driver.<br>
		2. <b>Navigation</b>: Navigation instructions, such as a turn arrow, move closer to the driver as the turn approaches. Note that she does not need to look at the arrow directly: the system also works when she is looking at the road (which she should), with the arrow in her peripheral view.<br>
		3. <b>3D Gesture Sound</b>: After she turns on the music, a phone call comes in. It is shown as a ringing phone hovering over the steering wheel. She then swipes the music to the side to turn it down. In order to accept the call, she grabs the phone icon and moves it towards her ear, a gesture similar to picking up a phone. Since the display can render the icon anywhere in space, it follows her hand.
		Concept video (2013)

This brief video shows our truly working engineering prototype of the in-cabin AR volumetric display that we showed in our concept video. It uses a modified Hololens, which we made work even inside a moving vehicle.<br><br>

		The goal of this system was to re-create a UX as close as possible to the concept video, including the multiple layers of information stacked onto the steering wheel, and how pertinent information bubbles to the top of the stack. This video was recorded through the Hololens, so it shows exactly how people perceive the system when wearing the HMD. The demo's functionality includes starting and stopping music (including an actual sound wave visualization), listening to a voice mail, etc., all gesture controlled.<br><br>

		The video shows that our system also works when the driver is still outside the car, looking through the driver window. That is because we modified our Hololens to detect certain markers on the steering wheel to anchor the AR content. In fact, our demo consisted of multiple Hololenses, all of them rendering the same live 3D AR content, each from its own perspective. This means not only the driver may see the AR content, but also the passengers (if they wish).<br><br>

		Obviously we do not expect people to wear a Hololens in their car. We anticipate that light-weight AR glasses will become commercially available soon. Such AR glasses could be brought-in (as a consumer device), or even come with the car and be tethered to it, making them almost as light as passive sunglasses (just the light engine: no compute, no battery, no gesture sensor; in cars, the gesture sensor is likely built into the cabin).
		PoC using HMD (2017)

This video shows our second working prototype of an engineering system for a full cabin AR capable system, based on HMDs.<br><br>

		This system, though, goes way beyond the original concept video: it shows how to connect the in-cabin AR space with the actual outside world, expanding into an industry-first “inside-outside AR navigation solution.” The core idea is to show the UX of a 3D navigation system which starts inside the cabin (like a typical navigation system does), but then moves the navigation instructions seamlessly into the real world. This is shown at 00:20 when the small red arrow smoothly moves from the overhead view of the virtual miniature city into the real world, onto the real road.<br><br>

		This very unique UX is possible because our HMD based AR system is superior to pure instrument cluster (IC) displays, even if they were 3D capable (which most of them aren't): our solution easily combines, and even surpasses, all functionality of 3D IC displays <i>and</i> AR capable HUDs. Our system can create AR content <i>inside</i> the cabin (literally anywhere inside the cabin), but also <i>outside</i>, in the real world, with unlimited field of view (making AR HUDs look ancient). Even more, our system can render AR content that <i>seamlessly moves between these two views, a truly magical and super intuitive UX</i>.<br><br>

		As input method, we expanded from gestures to voice interaction, making a truly multi modal system.<br><br>

		Note that our car was not driven, for safety reasons, but the system was fully working, and was tied into GPS and would work while driving.
		PoC using HMD (2018)

Before we worked with HMDs (which show our UX of a volumetric display perfectly), we did work with stereoscopic displays, meant to directly replace the current instrument cluster displays. This video shows an early PoC of the HMI (the actual user interface) which runs on a 3D display. (This video of the system obviously is not 3D.) The purpose was to experiment with the UX of a 3D display. The 3D display allows for view-dependent rendering, which means when the user changes their perspective, the system adjusts the view of the 3D content. This, together with the stereoscopic rendering, makes the user experience very realistic. It does look like a pseudo hologram, with full color and in full motion.<br><br>

		The content we show has two layers: one of them the gauges (in the front), which show speed and gear. In the back, we see a 3D navigation system. Even though the UI looks crammed and possibly confusing with flat rendering, it is not confusing when watched on a volumetric 3D display.<br><br>

		We did get a granted patent for the data visualization method shown in this video.
		Early PoC of HMI on 3D display (2013)

This slide shows an overview of the project's goals and our early solutions. Overview slide
Pseudo-Holographic Instrument Cluster with Gesture Control (HARMAN)

This project is about the user experience of in-cabin 3D capable displays, combined with gesture control. Such systems can use the space between the steering wheel and driver to dynamically render contextual information. One example is to move important alerts to the focus of the driver (closer to the driver's face) to get her attention.

We started with a concept video to show the novel user experience. In multiple engineering iterations, we created working systems, first based on auto-stereoscopic displays, then based on head-mounted AR gear. Our Hololens based working prototypes show the UX of the concept video very well, but we had to adapt the HMD hardware to the moving vehicle situation.

The most likely path to productization would be via light-weight AR glasses, either brought-in (as a consumer owned device, like a smartphone), or glasses that are part of the car (tethered, which would make them super light-weight since neither battery nor compute need to be on the glasses, just the light engine).

Our sister project Bare-Hand Gesture Control of Spatialized Sounds has more details on the gesture interaction technology.

Years: 2013 - 2018
Status: dormant
Domain: engineering

Type: multiple demonstrators, concept video, multiple demo videos (2013, 2017, 2018), granted patent, invited paper

My position: project and team lead

Collaborators: Davide Di Censo, Joey Verbeke, as well as interns Brett Leibowitz, Ashkan Moezzi, and Raj Narayanaswamy

This video shows our proof-of-concept 3D pen which allows a user to write and draw anywhere, including in mid air (shown at around 1:30). It uses a video see-through tablet and tracked pen and display. This video shows also a bit of the background of this project: infrastructure needed, etc. To emphasize, this is a fully working system, not a mock-up, and the video shows exactly what the system's capabilities are.<br><br>This demo was created by my Future Concepts and Prototyping team at the HP/Palm R&D Center in Sunnyvale, California, in December 2011. Video of demo

These photos of a mock-up system show the user’s point of view, i.e., their user experience: the augmentation renders the result of the writing on the display only. The user just wrote “Hello” (without real ink), and the display renders the word exactly where it was written.<br><br>Important to keep in mind is that with this system, the writings or drawings do not need to be limited to the surface the user is looking at, but can extend above and in any direction, beyond the surface. Mockup of interaction

From a side perspective, the writing is not visible. Side view

Beautifully traced photo from above. Patent drawing
This patent drawing shows that once the 3D object is drawn (top), the user can look at it from various perspectives (below). Patent drawing
3D Pen: augmented reality writing behind tablet (HP/Palm)

Wouldn't it be great to draw and write in mid air? With this project, we explored the interaction design and UX engineering of a pen capable of writing on surfaces and in air, as seen by the user via a video see-through device (aka, any tablet these days). This results in a three-dimensionally capable pen, which can create persistent drawings on surfaces and, more importantly, in 3D space. The user (or multiple users) can see these drawings using a tablet or smart phone, using them as a Mobile AR device (MAR). A fiducial marker anchors the virtual rendering in the real world, making it persistent. An additional feature was that the pen could pick up colors before drawing (a feature inspired by I/O Brush).

Our prototype, as seen in the video, worked surprisingly well, and the UX was excellent. For our prototype system to work, we did require external sensors (OptiTrack) to track the pen and the tablet; we believe that was an acceptable compromise for an initial prototype. The goal was to eventually replace these external sensors with on-device sensors for a production device. We do have a granted patent on this system: the patent process took 6.5 years, though!
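Conceptually, the anchoring works by storing each tracked pen-tip sample in the coordinate frame of the fiducial/tablet anchor rather than in the tracker's world frame. The sketch below shows that single step in isolation (a hedged illustration with made-up poses, not our actual OptiTrack pipeline).

```python
import numpy as np

def to_anchor_frame(point_world, anchor_pose_world):
    """Hedged sketch of the anchoring step: express a tracked pen-tip position
    (given in the tracker's world frame) in the frame of the fiducial/tablet
    anchor, so the stroke stays registered to the real world while the tablet
    moves. anchor_pose_world is the anchor's 4x4 rigid transform in world coordinates."""
    T_anchor_from_world = np.linalg.inv(anchor_pose_world)
    p = np.append(np.asarray(point_world, float), 1.0)   # homogeneous point
    return (T_anchor_from_world @ p)[:3]

# Illustrative example (made-up poses): anchor 50 cm in front of the tracker origin
anchor_pose = np.eye(4)
anchor_pose[:3, 3] = [0.0, 0.0, 0.5]
stroke = [to_anchor_frame(tip, anchor_pose)
          for tip in ([0.02, 0.10, 0.55], [0.03, 0.12, 0.56])]
# For rendering, the stored points are mapped back with anchor_pose and then
# projected through the tablet's tracked camera pose.
```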

Year: 2011
Status: concluded
Domain: engineering
Type: fully working demo, video of demo, granted patent
Position: project and team lead
Collaborators: Davide Di Censo, Seung Wook Kim
This video shows a working Ghost Fingers prototype. It demonstrates the UX of a novel interaction method that enables efficient and intuitive switching between keyboard and multi-touch input on systems where the display is out of arm's reach. Note that in this video, the display is close enough to reach directly; in an actual product, the display would be far enough away that it cannot be reached directly, e.g., a TV or a large monitor a bit out of reach.<br><br> This work was done at HP/Palm in 2011 by my Future Concepts and Prototyping team. Video of demo

This video shows the finger visualization for Ghost Fingers, how they shine through translucently.<br><br> This work was done at HP/Palm in 2011 by my Future Concepts and Prototyping team. Visualization video

Prototype system: a standard QWERTY physical keyboard equipped with an image sensor (e.g., a webcam), able to detect the position of one or two hands over the keys. In text-input mode (left), the keyboard works like a normal keyboard. When the user presses a designated key (e.g., the CTRL key), the system switches to “multi-touch mode” (right). In this mode, the sensor detects the user’s fingers on the keyboard. Simultaneously (in real-time), a highly transparent image of the user’s hands (including fingers) is displayed on the remote display (e.g., overlaid over the UI), resulting in “ghost fingers” or hand outlines. These ghost fingers or hand outlines allow the user to easily manipulate the UI on the display, e.g., pressing an icon on the screen. Typical use case
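The rendering step can be sketched as follows (a hedged illustration, not our actual prototype code): while the modifier key is held, the keyboard camera image is segmented for hands and blended translucently over the remote UI. The skin-color segmentation here is a crude stand-in for the real hand tracker; the thresholds and alpha value are assumptions.

```python
import cv2
import numpy as np

def ghost_overlay(ui_frame, keyboard_cam_frame, multitouch_mode, alpha=0.35):
    """Blend a translucent image of the hands seen by the keyboard camera over
    the remote UI frame while multi-touch mode is active (both frames BGR)."""
    if not multitouch_mode:
        return ui_frame
    h, w = ui_frame.shape[:2]
    cam = cv2.resize(keyboard_cam_frame, (w, h))
    # Rough skin segmentation in YCrCb as a stand-in for proper hand tracking
    ycrcb = cv2.cvtColor(cam, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 135, 85), (255, 180, 135))
    mask = cv2.GaussianBlur(mask, (15, 15), 0).astype(np.float32) / 255.0
    mask = mask[..., None] * alpha                      # per-pixel blend weight
    return (ui_frame * (1.0 - mask) + cam * mask).astype(np.uint8)
```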

This figure illustrates the bi-manual simultaneous text and multi-touch input mode with partial Ghost Fingers. The left hand is in multi-touch input mode, manipulating the cards. The right hand is in text-input mode, typing text into an email message window.  Note that the user only sees her left hand (in multi-touch input mode) superimposed on the GUI. The right hand is not visualized, since it is not necessary for the user to see this hand. Bi-manual use case

The two patent applications contain a large number of beautiful illustrations. Patent drawing

Poster that accompanied the paper publication. Ghost Fingers poster
Ghost Fingers (HP/Palm)

For our project "Ghost Fingers" we created multiple working prototypes for a novel interaction method that enables efficient and intuitive switching between keyboard and multi-touch input on systems where the display is out of arm's reach: check out the videos on the left for how this actually looks and feels.

Ghost Fingers provided a translucent real-time visualization of the fingers and hands on the remote display, creating a closed interaction loop that enables direct manipulation even on remote displays. One of our prototypes consisted of a wireless keyboard with attached micro camera facing the keys, and which was used for two purposes: first, to determine the position of the user’s hand and fingers on the keyboard; and second, to provide a real-time translucent overlay visualization of hand and fingers over the remote UI.

Our prototypes worked very well, and the user experience was quite amazing. We wrote a paper about the system, but not before patenting all novel elements in two separate patent applications. The paper got accepted for publication, and both patents got granted, yay! The patent applications were sold to Qualcomm, as part of a large transfer of IP. However, even though the two patents got granted by the USPTO (you can look this up on Public PAIR), Qualcomm decided to abandon them by not paying the fees. I contacted Qualcomm's lawyers about this unusual step, and they said it was simply a business decision. Too bad! On the positive side, this IP is now publicly available, and is not protected, so a forward thinking company could create a product based on the descriptions in the two patent applications.

Year: 2011
Status: concluded
Domain: engineering
Type: working demos, video 1, video 2, paper, poster, patent 1, patent 2 (both granted, but inactive)
Position: project and team lead
Collaborators: Seung Wook Kim, Davide Di Censo
This brief video shows our early PoC for Augmented Glass. This was an exploratory project about the interaction design, ergonomics, and usability of thin transparent handheld displays. It was inspired by our vision that a truly useful tablet device should be optical see-through, in order to accommodate super realistic and immersive mobile AR experiences. Our work was about exploring what kind of transparent handheld display was possible, with as little bezel (frame) as possible. We created mechanical prototypes of tablet-sized glass panels (sheets of glass), modified with a coating that allowed back-projection from a tiny portable projector (pico projector).<br><br> This work was done by my Future Concepts and Prototyping team at the HP/Palm R&D Center in Sunnyvale, California, in November 2011. Video of PoC

Note the rapid prototyping method using Meccano kit parts. Photos of PoC

Keystone correction is obviously an important element of a pico short-throw projection module (not yet implemented in this PoC).
Augmented Glass (HP/Palm)

The project "Augmented Glass" was an exploratory project about the interaction design, ergonomics, and usability of thin transparent handheld displays, a potentially whole new product category that could transcend tablets. It was inspired by our vision that a truly useful tablet device should be optical see-through (like seen in many Sci-Fi movies from Avatar to Iron Man), in order to accommodate super realistic and immersive mobile AR experiences. Our work was about exploring what kind of transparent handheld display was possible, with as little bezel (frame) as possible. Also, how would one hold such a thin slab of transparent material (handle). We created mechanical prototypes of tablet-sized glass panels (sheets of glass), modified with a coating that allowed back-projection from a tiny portable projector (pico projector). Two cameras would be included eventually, one pointing forward (scene camera), one towards the user to track their face and do view dependent rendering.

Year: 2011
Status: concluded
Domain: engineering
Type: early PoC, brief video
Position: project and team lead
Collaborators: Davide Di Censo, Seung Wook Kim
iVi concept (2008) Concept
This video shows our fully working iVi prototype from 2009: all elements are as shown, no CG. Early demo video
iVi: Immersive Volumetric Interaction (Samsung R&D)

This is one of the larger projects that I created for the HCI research team. The core idea was to invent and create working prototypes of future consumer electronics devices, using new ways of interaction, such as spatial input (i.e., gestural and virtual touch interfaces) and spatial output (i.e., view dependent rendering, position dependent rendering, 3D displays). Platform focus was on nomadic and mobile devices (from cellphones to tablets to laptops), novel platforms (wearables, AR appliances), and some large display systems (i.e., 3D TV). We created dozens of prototypes (some described below) covered by a large number of patent applications. A very successful demo of spatial interaction (nVIS) was selected to be shown at the prestigious Samsung Tech Fair 2009. One of our systems with behind-display interaction (DRIVE) got selected for the Samsung Tech Fair 2010. Yet another one used hybrid inertial and vision sensors on mobile devices for position dependent rendering and navigating in virtual 3D environments (miVON).

Year: 2008 - 2010
Status: concluded
Domain: engineering, management
Type: portfolio management
Position: project and group leader

Collaborators: Seung Wook Kim (2008-2010), Francisco Imai (2008-2009), Anton Treskunov (2009-2010), Han Joo Chae (intern 2009), Nachiket Gokhale (intern 2010)

This video shows a research prototype system that demonstrates a behind-display interaction method. Anaglyph based 3D rendering (colored glasses) and face tracking for view dependent rendering create the illusion of dice sitting on top of a physical box. A depth camera is pointed at the user's hands behind the display, creating a 3D model of the hand. Hand and virtual objects interact using a physics engine. The system allows users to interact seamlessly with both real and virtual objects. <br><br>
		This video, together with a short paper, was presented at the MVA 2011 conference (12th IAPR Conference on Machine Vision Applications, Nara, Japan, June 13-15, 2011).<br><br>
		Please note that this video is not 3D itself: to the glasses-wearing user, the virtual content appears behind the display, where (from their perspective) it sits on a physical box. Video of interaction

Concept illustration of reach-behind, in AR game context Concept
DRIVE prototype based on optical see-through display panel Prototype
Some figures from the DRIVE patent application Patent drawings
DRIVE: Direct Reach Into Virtual Environment (Samsung R&D)

This novel interaction method allows users to reach behind a display and manipulate virtual content (e.g., AR objects) with their bare hands. We designed and constructed multiple prototypes: some are video see-through (tablet with an inline camera and sensors in the back), some are optical see-through (transparent LCD panel, depth sensor behind the device, front facing imager). The latter system featured (1) anaglyphic stereoscopic rendering (to make objects appear truly behind the device), (2) face tracking for view-dependent rendering (so that virtual content "sticks" to the real world), (3) hand tracking (for bare hand manipulation), and (4) virtual physics effects (allowing completely intuitive interaction with 3D content). It was realized using OpenCV for vision processing (face tracking), Ogre3D for graphics rendering, a Samsung 22-inch transparent display panel (early prototype), and a PMD CamBoard depth camera for finger tracking (closer range than what a Kinect allows).

This prototype was demonstrated internally at Samsung to a large audience (Samsung Tech Fair 2010), and we filed patent applications. (Note that anaglyphic rendering was the only available stereoscopic rendering method with the transparent display. Future systems will likely be based on active shutter glasses, or on parallax barrier or lenticular overlays. Also note that a smaller system does not have to be mounted to a frame, like our prototype, but can be handheld.)
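One building block of such a behind-display pipeline is turning the depth image of the hand into 3D geometry that the physics engine can collide with. The sketch below shows that back-projection step in isolation (a hedged illustration, not the original prototype code; the intrinsics and range cutoff are assumptions).

```python
import numpy as np

def depth_to_points(depth_m, fx, fy, cx, cy, max_range=0.8):
    """Back-project a depth image (meters) into 3D camera-frame points for the
    near region (the hand), suitable as collision geometry for a physics engine.
    fx, fy, cx, cy are the depth camera's pinhole intrinsics."""
    h, w = depth_m.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = (depth_m > 0.05) & (depth_m < max_range)   # keep only close geometry
    z = depth_m[valid]
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)                 # N x 3 point cloud

# Illustrative usage with a synthetic depth frame and assumed intrinsics
depth = np.full((120, 160), 2.0); depth[40:80, 60:100] = 0.4   # a "hand" at 40 cm
points = depth_to_points(depth, fx=140.0, fy=140.0, cx=80.0, cy=60.0)
```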

Year: 2010
Status: concluded
Domain: engineering
Type: demo, paper (MVA2011), paper (3DUI2011), video
Position: project and group leader
Collaborators: Seung Wook Kim, Anton Treskunov
This video shows a research prototype that demonstrates new media browsing methods with direct bare hand manipulation in a 3D space on a large stereoscopic display (e.g., 3D TV) with 3D spatial sound.<br><br>This demo was created at the Samsung Electronics U.S. R&D Center in San Jose, California, in August 2010.<br><br>(Note that this video uses monoscopic rendering. The actual demo renders 3D content stereoscopically, and the user wears active shutter glasses to experience the depth rendering. Similarly, the sound of this video is normal stereo sound, whereas the original demo uses 7.1 spatial sound.) Video of demo

Interactive 3DTV UI (2010) Prototype
System setup, with perceived media wall (first scene). System setup
Spatial Gestures for 3DTV UI (Samsung R&D)

This project demonstrates new media browsing methods with direct bare hand manipulation in a 3D space on a large stereoscopic display (e.g., 3D TV) with 3D spatial sound. We developed a prototype on an ARM-based embedded Linux platform with OpenGL ES (visual rendering), OpenAL (spatial audio rendering), and ARToolKit (for hand tracking). The main contribution was to create multiple gesture interaction methods in a 3D spatial setting, and implement these interaction methods in a working prototype that includes remote spatial gestures, stereoscopic image rendering, and spatial sound rendering.

Year: 2010
Status: concluded
Domain: engineering
Type: demo, video, report
Position: project and group leader
Collaborators: Seung Wook Kim, Anton Treskunov
This video shows a series of nVIS research prototypes and interaction methods, all designed and implemented by the CSL HCI team at the Samsung Electronics U.S. R&D Center in San Jose, California.<br><br>
		This video was presented at the exhibition at the ACM SIGCHI Conference on Human Factors in Computing Systems 2010 (CHI 2010).<br><br>
		The HCI team members were Seung Wook Kim, Anton Treskunov, and Stefan Marti. This project was affiliated with the Samsung Advanced Institute of Technology (SAIT). Complete demo video

Close up of some of the tiles: view-dependent rendering on a non-planar display space. Rendering with an asymmetric view frustum makes sure that the user experiences the virtual environment in a seamless way, regardless of the position and orientation of the display tiles. Curved displays
The system consists of static and even handheld (mobile) tiles, which render the 3D content spatially correct from the user's perspective (position dependent rendering). The handheld display is tracked in 6DOF to accomplish that, so the user can choose any view and perspective. Mobile & static
A walk-in version of the system was also created, using three 70-inch vertical displays. Walk-in sized
nVIS: Natural Virtual Immersive Space (Samsung R&D)

This project demonstrates novel ways to interact with volumetric 3D content on desktop and nomadic devices (e.g., laptop), using methods like curved display spaces, view-dependent rendering on non-planar displays, virtual spatial touch interaction methods, and more. We created a series of prototypes consisting of multiple display tiles simulating a curved display space (up to six tiles), rendering with an asymmetric view frustum (in OpenGL), vision-based 6DOF face tracking (OpenCV based and faceAPI), and bare hand manipulation of 3D content with IR-marker based finger tracking. The system also shows a convergence feature, by dynamically combining a mobile device (e.g., cellphone or tablet) with the rest of the display space, while maintaining spatial visual continuity among all displays. One of the systems includes upper torso movement detection for locomotion in virtual space. In addition to desktop based systems, we created a prototype for public information display spaces, based on an array of 70-inch displays. All final systems were demonstrated internally at the Samsung Tech Fair 2009.
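The asymmetric view frustum mentioned above is what keeps the virtual scene registered across tiles: the frustum is rebuilt every frame from the tracked eye position and the tile's pose. Below is a minimal sketch of that computation (illustrative only, not the original OpenGL code; it assumes the tile's right/up vectors are orthonormal).

```python
import numpy as np

def off_axis_frustum(eye, screen_center, screen_right, screen_up,
                     width, height, near, far):
    """Asymmetric (off-axis) view frustum for one display tile, built from the
    tracked eye position and the tile's pose. screen_right and screen_up must
    be orthonormal unit vectors spanning the tile's plane; sizes in meters."""
    eye = np.asarray(eye, float)
    c = np.asarray(screen_center, float)
    r = np.asarray(screen_right, float)
    u = np.asarray(screen_up, float)
    n = np.cross(r, u)                     # tile normal, pointing toward the viewer
    v = c - eye                            # eye -> tile center
    dist = -(v @ n)                        # perpendicular eye-to-tile distance
    scale = near / dist                    # project tile edges onto the near plane
    left   = ((v - 0.5 * width  * r) @ r) * scale
    right  = ((v + 0.5 * width  * r) @ r) * scale
    bottom = ((v - 0.5 * height * u) @ u) * scale
    top    = ((v + 0.5 * height * u) @ u) * scale
    return left, right, bottom, top, near, far   # e.g., parameters for glFrustum

# Example: eye 20 cm to the right of a 40x30 cm tile, 60 cm away
print(off_axis_frustum([0.2, 0.0, 0.6], [0, 0, 0], [1, 0, 0], [0, 1, 0],
                       0.40, 0.30, 0.1, 100.0))
```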

Year: 2008-2010
Status: concluded
Domain: engineering
Type: demos, videos, technical reports [CHI 2010 submission], patents
Position: project and group leader
Collaborators: Seung Wook Kim, Anton Treskunov
This video shows a collection of novel concepts for mobile interaction with virtual environments such as games, augmented reality, and virtual worlds (e.g., Second Life). These interaction methods can be used for cellphones, tablets, and other handheld devices.<br><br>
		The original concepts were created by the HCI team at the Samsung Electronics U.S. R&D Center in San Jose, California. The video was created in August 2008, and presented at the Virtual Worlds Expo of September 2008 in Los Angeles.<br><br>
		This video was intended as an outline for the HCI team's research projects that resulted in working prototypes and pending patent applications. Some of the interaction concepts are now well known, but were shown first in this concept video. Concept (2008)

This concept illustration shows multiple devices rendering 3D content spatially correct, depending on their respective positions on the 6DOF space. Multiple devices
The first part shows position dependent rendering on a netbook with two sideways facing imagers, each of them providing 2D optical flow data of the background. With this configuration, it is easy for the device to do egomotion detection, and disambiguate rotational movements from linear movements.<br><br>
		The second part shows position dependent rendering on a UMPC using only the backfacing imager. It provides 2D optical flow data of the background, which is used to determine pan and tilt of the device.<br><br>
		Note that we used neither inertial nor magnetic sensors in either demo. Egomotion sensing
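The disambiguation idea can be sketched as follows (a hedged illustration, not the original implementation): express each sideways camera's mean horizontal optical flow along the device's forward axis; for pure rotation the two flows point in opposite directions, for linear motion in the same direction. The sign factors encoding the camera mounting, and the threshold, are assumptions.

```python
import cv2
import numpy as np

def mean_horizontal_flow(prev_gray, curr_gray):
    """Mean horizontal optical flow of the background, in pixels per frame."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 21, 3, 5, 1.2, 0)
    return float(flow[..., 0].mean())

def classify_egomotion(left_prev, left_curr, right_prev, right_curr,
                       left_sign=+1.0, right_sign=-1.0, thresh=0.5):
    """Map each sideways camera's horizontal flow onto the device's forward
    axis (the *_sign factors encode the camera mounting and are assumptions),
    then compare: same direction -> linear motion, opposite -> rotation."""
    fl = left_sign * mean_horizontal_flow(left_prev, left_curr)
    fr = right_sign * mean_horizontal_flow(right_prev, right_curr)
    if abs(fl) < thresh and abs(fr) < thresh:
        return "static"
    return "linear" if fl * fr > 0 else "rotational"
```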

This video shows a software prototype for natural interaction with 3D content on a Samsung Omnia cellphone. It uses both the internal inertial sensors (accelerometers) as well as the camera for egomotion sensing. The camera is used for vision processing to determine optical flow of the background in 2D. This allows the device to detect slow linear motion, which is not possible with inertial sensors.<br><br>
		This prototype was created by the CSL HCI team at the Samsung Electronics U.S. R&D Center in San Jose, California, in summer 2009. Phone demo
miVON: Mobile Immersive Virtual Outreach Navigator (Samsung R&D)

This project is about a novel method for interacting with 3D content on mobile platforms (e.g., cellphone, tablet, etc.), showing position-dependent rendering (PDR) of a 3D scene such as a game or virtual world. The system disambiguates shifting and rotating motions based on vision-based pose estimation. We developed various prototypes: a UMPC-version using optical flow only for pose estimation and a Torque3D game engine, a netbook based prototype that used up to four cameras to disambiguate imager-based 6DOF pose estimation, and a cellphone based prototype that combined inertial and vision based sensing for 6DOF egomotion detection (see videos on the left). Multiple patents were filed, and our code base was transferred to the relevant Samsung business units.

Year: 2008-2010
Status: concluded
Domain: engineering

Type: demo, video, SBPA 2009 paper, patent

Position: project and group leader
Collaborators: Seung Wook Kim, Han Joo Chae, Nachiket Gokhale
PUPS system concept illustration, showing various use cases, and necessary technology pieces System concept
Dome umbrella with external projection and touch mockup Dome mockup
Nubrella with mockup map projection Nubrella mockup
Drawings for the patent application, showing multiple setup options. Patent drawings
PUPS: Portable Unfolding Projection Space (Samsung R&D)

This project is about mobile projection display systems, to be used as a platform for AR systems, games, and virtual worlds. The technology is based on "non-rigid semi-transparent collapsible portable projection surfaces," either wearable or backpack mounted. It includes multiple wearable display segments and cameras (some inline with the projectors), and semi-transparent and tunable opacity mobile projection surfaces which are unfolding in an origami style, adjusting to the eye positions. There is a multitude of applications for this platform, from multiparty video conferencing (conference partners are projected around the user, and a surround sound system disambiguates the voices), to augmented reality applications (the display space is relatively stable with regards to the user, and AR content will register much better with the environment than handheld or head worn AR platforms), to cloaking and rear view applications. The portable display space is ideal for touch interactions (surface is at arm's length), and can track the user's hands (gestures) as well as face (for view dependent rendering). This early stage project focused on patenting and scoping of the engineering tasks, but did not go much beyond that stage.

Year: 2008-2009
Status: concluded
Domain: engineering
Type: project plan, mockup, reports, patent application
Position: project and group leader
Collaborators: Seung Wook Kim, Francisco Imai
Pet robotics is an engineering area, but also highly relevant for HCI and HRI. Our approach is to emphasize lifelikeness of a pet robot, by employing (among others) emotional expressivity (expresses emotion with non-verbal cues, non-speech audio), soft body robotics (silent and sloppy actuator technologies; super soft sensor skin), and biomimetic learning (cognitive architecture and learning methods inspired by real pets). Pet robotics
Pet robotics makes most sense when the emotional attachment is combined with a utilitarian perspective. The purpose of an animatronic mediator is to add a familiar and physical front end to a device, eliminating the need to learn a device-specific GUI by using human-style interaction methods, both verbal and non-verbal. To achieve that, a handheld animatronic mediator (a UI peripheral to the cellphone) is needed which interprets the user’s intentions and uses natural interaction methods (speech, eye contact, voice tone detection). Animatronic mediator
Pet Robotics and Animatronic Mediators (Samsung R&D)

Pets are shown to have highly positive (therapeutic) effects on humans, but are not viable for all people (allergies, continued care necessary, living space restrictions, etc.). From a consumer electronics perspective, there is an opportunity to create robotic pets with high realism and consumer friendliness to fill in, and to create high emotional attachment by the user. Our approach emphasizes increasing the lifelikeness of a pet robot by employing (among others) emotional expressivity (expresses emotion with non-verbal cues, non-speech audio), soft body robotics (silent and sloppy actuator technologies; super soft sensor skin), and biomimetic learning (cognitive architecture and learning methods inspired by real pets). My in-depth analysis of the field covered the hard technical problems to get to a realistic artificial pet, the price segment problem (the gap between the toy and luxury segments), how to deal with the uncanny valley, and many other issues. This early stage project focused on planning and project scoping, but did not enter an engineering phase. However, it is related to my dissertation field of Autonomous Interactive Intermediaries.

Year: 2008
Status: concluded
Domain: engineering
Type: report
Position: project leader
Mobile search method by manually framing the target object with hands and fingers (using vision processing to detect these gestures). Patent drawing 1
Various options for framing an object of interest. Patent drawing 2
Mobile search method by finger pointing and snapping (using audio triangulation of the snapping sound) as an intuitive "object selection" method. Patent drawing 3
Googling Objects with Physical Browsing (Samsung R&D)

This project is about advanced methods of free-hand mobile searching. I developed two novel interaction methods for a localized in-situ search. The underlying idea is that instead of searching for websites, people who are not sitting in front of a desktop computer may search for information on physical objects that they encounter in the world: places, buildings, landmarks, monuments, artifacts, products—in fact, any kind of physical object. This "search in spatial context, right-here and right-now," or physical browsing, poses special restrictions on the user interface and search technology. I developed two novel core interaction methods, one based on manually framing the target object with hands and fingers (using vision processing to detect these gestures), the other based on finger pointing and snapping (using audio triangulation of the snapping sound) as an intuitive "object selection" method. The project yielded two granted patents.
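
To make the point-and-snap idea concrete, here is a minimal sketch (not from the original patents) of how the direction of a finger snap could be estimated from the time difference of arrival at two microphones; the microphone spacing, sample rate, and cross-correlation approach are my assumptions.

```python
# Hedged sketch: estimating the bearing of a finger snap from the time
# difference of arrival (TDOA) at two microphones. This only illustrates
# the general audio-triangulation idea; constants are assumptions.
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s at room temperature
MIC_SPACING    = 0.15    # m between the two microphones (assumed)
SAMPLE_RATE    = 48000   # Hz (assumed)

def snap_bearing(left: np.ndarray, right: np.ndarray) -> float:
    """Return the bearing of a snap in degrees (0 = straight ahead)."""
    # Cross-correlate the two channels to find the lag (in samples)
    # at which they align best.
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    # Convert the lag to a time difference, then to an angle
    # (far-field approximation: sin(theta) = c * dt / d).
    dt = lag / SAMPLE_RATE
    sin_theta = np.clip(SPEED_OF_SOUND * dt / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```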

Year: 2006-2007
Status: concluded
Domain: engineering
Type: report, 2 granted patents: point and snap, finger framing
Position: project leader
Representative projects done at MIT
Cellular Bunny in user's hand, being able to make eye contact with humans. Cellular Bunny with back open, showing R/C receiver, speaker, and LiPoly cell (early hardware). Robotic elements of the intermediary, in particular the novel eyelid actuation, which is important for conveying the dynamic alertness and sleepiness level of the intermediary.
Autonomous Interactive Intermediaries (MIT Media Lab)

An Autonomous Interactive Intermediary is a software and robotic agent that helps the user manage her mobile communication devices by, for example, harvesting "residual social intelligence" from close-by human and non-human sources. This project explores ways to make mobile communication devices socially intelligent, both in their internal reasoning and in how they interact with people, trying to avoid, for example, having our advanced communication devices interrupt us at completely inappropriate times. My Intermediary prototype is embodied in two domains: as a dual conversational agent, it is able to converse with caller and callee at the same time, mediating between them and possibly suggesting modality crossovers. As an animatronic device, it uses socially strong non-verbal cues like gaze, posture, and gestures to alert and interact with the user and co-located people in a subtle but public way. I have built working prototypes of Intermediaries, embodied as a parrot, a small bunny, and a squirrel—which is why some have called my project simply "Bunny Phone" or "Cellular Squirrel". However, it is more than just an interactive animatronic device that listens to you and whispers into your ear: when a call comes in, it detects face-to-face conversations to determine social groupings (see Conversation Finder project), may invite input ("vetoes") from the local others (see Finger Ring project), consults memory of previous interactions stored in the location (see Room Memory project), and tries to assess the importance of the incoming communication by conversing with the caller (see Issue Detection project). This is my main PhD thesis work. More...

Year: 2002 - 2005
Status: dormant (not active as industry project, but I keep working on it)
Domain: engineering

Type: system, prototypes, paper, another paper, short illustrative videos, dissertation, demo video [YouTube], patent 1, patent 2

Press: many articles
Position: lead researcher
Advisor: Chris Schmandt
PhD Thesis committee: Chris Schmandt, Cynthia Breazeal, Henry Lieberman
Collaborators: Matt Hoffman (undergraduate collaborator)
Conversation Finder badge mounted on shirt, near the neck of the user. Close up of Conversation Finder badge PCBs. Illustration of alignment of speech of four speakers: on the left side, all four speakers are part of the same conversational grouping, and their turns do not overlap. On the right, the group is split into two conversational groupings. It is clearly visible that, for example, the turns of speaker black and speaker blue overlap significantly, and they therefore cannot be part of the same conversational grouping. At the same time, speaker green and speaker black (upper right) align well, and so do speaker blue and speaker orange (lower right).
Prototype (top), dual PCB (middle), alignment example (bottom)
Conversation Finder (MIT Media Lab)

Conversation Finder is a system based on a decentralized network of body-worn wireless sensor nodes that independently try to determine with whom the user is in a face-to-face conversation. Conversational groupings are detected by looking at alignment of speech—the way we take turns when we talk to each other. Each node has a microphone and sends out short radio messages when its user is talking, and in turn listens for messages from close-by nodes. Each node then aggregates this information and continuously updates a list of people it thinks its user is talking to. A node can be queried for this information, and if necessary can activate a user's Finger Ring (see Finger Ring project). Depending on my conversational status, my phone might or might not interrupt me with an alert. This system is a component of my PhD work on Autonomous Interactive Intermediaries, a large research project in context-aware computer-mediated call control.
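
As an illustration of the turn-alignment idea, here is a minimal sketch in Python; the interval format and the overlap threshold are assumptions, not the prototype's actual firmware logic.

```python
# Minimal sketch of the turn-alignment idea: two people whose speech
# segments overlap heavily are unlikely to be in the same conversation,
# whereas interleaved, non-overlapping turns suggest they are talking to
# each other. Thresholds and the interval format are assumptions.
def overlap_fraction(turns_a, turns_b):
    """Fraction of A's talk time that overlaps with B's talk time.
    Each turn list holds (start, end) tuples in seconds."""
    talk_a = sum(e - s for s, e in turns_a)
    overlap = 0.0
    for s1, e1 in turns_a:
        for s2, e2 in turns_b:
            overlap += max(0.0, min(e1, e2) - max(s1, s2))
    return overlap / talk_a if talk_a else 0.0

def same_conversation(turns_a, turns_b, threshold=0.1):
    """Heuristic: little mutual overlap -> probably the same grouping."""
    return (overlap_fraction(turns_a, turns_b) < threshold and
            overlap_fraction(turns_b, turns_a) < threshold)

# Example: speaker "blue" talks over "black" a lot -> different conversations.
black = [(0, 2), (4, 6), (8, 10)]
green = [(2, 4), (6, 8)]            # interleaves with black
blue  = [(1, 3), (5, 7), (9, 11)]   # overlaps black
print(same_conversation(black, green))  # True
print(same_conversation(black, blue))   # False
```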

Year: 2002 - 2005
Status: dormant
Domain: engineering
Type: system, prototypes, papers, brief explanatory video 2003 [YouTube]
Position: lead researcher
Advisor: Chris Schmandt

Collaborators: Quinn Mahoney (undergraduate collaborator 2002-2003), Jonathan Harris (undergraduate collaborator 2002)

Working prototype of wireless finger ring, with all components mounted on the back of the inverted wireless module. Working system of wired finger ring, used for user tests. All components are shrink-wrapped. The large circular piece at the top of the ring is the vibration actuator. Working system of wired finger ring, used for user tests. The ring is worn on the index finger so that the microswitch can be easily reached by the thumb, even without any visual guidance. This allows the user to cast a veto during a conversation even when the hand is under the table, or hidden elsewhere, making it not (or less) obvious to the conversational partners that an interruption was vetoed.
Working prototype (top), wired rings used for user tests (middle, bottom)
Finger Ring, "Social Polling" (MIT Media Lab)

Finger Ring is a system in which a cell phone decides whether to ring by accepting votes from the others in a conversation with the called party. When a call comes in, the phone first determines who is in the user's conversation (see Conversation Finder project). It then vibrates all participants' wireless finger rings. Although the alerted people do not know if it is their own cellphones that are about to interrupt, each of them has the possibility to veto the call anonymously by touching his/her finger ring. If no one vetoes, the phone rings. Since no one knows which mobile communication device is about to interrupt, this system of “social polling” fosters collective responsibility for controlling interruption by communication devices. I have found empirical evidence that significantly more vetoes occur during a collaborative group-focused setting than during a less group oriented setting. This system is a component of my PhD work on Autonomous Interactive Intermediaries, a large research project in context-aware computer-mediated call control.
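
The polling sequence can be sketched roughly as follows; the phone, ring, and Conversation Finder objects are hypothetical stand-ins for the custom wireless hardware.

```python
# Sketch of the "social polling" sequence, assuming hypothetical helpers
# for the Conversation Finder query and the ring radio link. The real
# prototypes used custom wireless hardware; this only mirrors the logic.
import time

VETO_WINDOW_S = 4.0  # how long participants get to veto (assumed)

def handle_incoming_call(phone, conversation_finder, rings):
    members = conversation_finder.current_grouping(phone.owner)
    for person in members:
        rings[person].vibrate()          # alert everyone anonymously
    deadline = time.time() + VETO_WINDOW_S
    while time.time() < deadline:
        if any(rings[p].veto_pressed() for p in members):
            phone.divert_to_voicemail()  # someone vetoed; stay silent
            return
        time.sleep(0.1)
    phone.ring()                         # no veto: interrupt the group
```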

Year: 2002 - 2005
Status: dormant
Domain: engineering
Type: system, prototypes, paper
Position: lead researcher
Advisor: Chris Schmandt
Issue Detection (MIT Media Lab)

Issue Detection is a system that is able to assess in real time the relevance of a call to the user. Being part of a conversational agent that picks up the phone when the user is busy, it engages the caller in a conversation, using speech synthesis and speech recognition to get a rough idea of what the call might be about. Then it compares the recognized words with what it knows about what is currently "on the mind of the user". The latter is harvested continuously in the background from sources like the user's most recent web searches, modified documents, and email threads, together with more long-term information mined from the user's personal web page. The mapping process has several options in addition to literal word mapping: it can do query extensions using WordNet as well as sources of commonsense knowledge. This system is a component of my PhD work on Autonomous Interactive Intermediaries, a large research project in context-aware computer-mediated call control.
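
A rough sketch of the matching step might look like this; it uses NLTK's WordNet interface for the query extension, leaves out the commonsense-knowledge sources, and assumes the keyword lists have already been harvested.

```python
# Sketch of the relevance check, assuming the caller's recognized words and
# the user's recent-context keywords are already available as lists.
# Requires the NLTK WordNet corpus to be installed.
from nltk.corpus import wordnet as wn

def expand(word):
    """Return the word plus WordNet synonyms (lemma names)."""
    synonyms = {word.lower()}
    for synset in wn.synsets(word):
        synonyms.update(lemma.lower() for lemma in synset.lemma_names())
    return synonyms

def call_relevance(caller_words, context_keywords):
    """Fraction of caller words that map onto the user's current context."""
    context = set()
    for kw in context_keywords:
        context |= expand(kw)
    hits = sum(1 for w in caller_words if expand(w) & context)
    return hits / len(caller_words) if caller_words else 0.0
```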

Year: 2002 - 2005
Status: dormant
Domain: engineering
Type: system
Position: lead researcher
Advisor: Chris Schmandt
Illustration of channel sequence: in this example, the system would escalate the communication effort seamlessly from cellphone to 2-way pager (skipping fax machine) to another pager to calling the user on their landline.
Illustration of channel sequence
Active Messenger (MIT Media Lab)

Active Messenger (AM) is a personal software agent that forwards incoming text messages to the user's mobile and stationary communication devices such as cellular phones, text and voice pagers, fax, etc., possibly to several devices in turn, monitoring the reactions of the user and the success of the delivery. If necessary, email messages are transformed to fax messages or read to the user over the phone. AM is aware of which devices are available for each subscriber, which devices were used recently, and whether a message was received and read by the user, by exploiting back-channel information and by inferring from the user's communication behavior over time. The system treats filtering as a process rather than a routing problem. AM has been up and running since 1998, serving between two and five users, and has been refined over the last five years in a tight iterative design process. This project started out as my Master's thesis at the MIT Media Lab (finished 1999), but has been continued until the present time. More...
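
The escalation logic can be sketched like this; the channel objects and their methods are hypothetical placeholders for the real device drivers and back-channel checks.

```python
# Sketch of the escalation idea: try one channel, wait, check whether the
# user reacted (back-channel), and only then move on. Device objects and
# their methods are hypothetical stand-ins for the real channel drivers.
import time

def deliver(message, channels, wait_s=300):
    """Escalate a message through an ordered list of channels."""
    for channel in channels:
        if not channel.available():
            continue                      # skip devices the user cannot reach
        channel.send(message)             # may transform: email -> fax/voice
        time.sleep(wait_s)
        if channel.confirmed_read(message):
            return channel                # back-channel says it arrived
    return None                           # nobody confirmed; give up quietly
```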

Year: 1998 - 2005
Status: system stable (was in continuous use until 2007 or 2008!)
Domain: engineering
Type: system, thesis, paper (HCI), paper (IBM), tech report
Position: lead researcher
Advisor: Chris Schmandt
I/O Brush side view: the handle was custom hand-milled on a woodworking lathe at the MIT Woodworking Shop. I/O Brush front view: the bristles were arranged in a ring to leave space for the camera, LED, and sensors hidden inside. I/O Brush system components, as mounted inside the wooden handle. I/O Brush at the ARS Electronica exhibition.
I/O Brush (MIT Media Lab)

I/O Brush is a new drawing tool to explore colors, textures, and movements found in everyday materials by "picking up" and drawing with them. I/O Brush looks like a regular physical paintbrush but has a small video camera with lights and touch sensors embedded inside. Outside of the drawing canvas, the brush can pick up color, texture, and movement of a brushed surface. On the canvas, artists can draw with the special "ink" they just picked up from their immediate environment.

I designed the electronics on the brush (sensors, etc.), the electronics “glue” between the brush and the computers, and wrote the early software. This project is the PhD work of Prof. Kimiko Ryokai, and has been presented at many events, including a 2-year interactive exhibition at the Ars Electronica Center in Linz, Austria. More...
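
The brush's two modes (pick up off-canvas, paint on-canvas) follow a simple control flow, sketched below with hypothetical hardware helpers; the actual system ran as dedicated canvas software, so this only mirrors the logic.

```python
# Rough sketch of the brush's two modes, with hypothetical hardware helpers:
# away from the canvas, touching a surface captures frames; on the canvas,
# the captured patch is stamped at the brush position.
def brush_loop(brush, canvas):
    captured = None
    while True:
        if brush.touch_sensors_active():
            if canvas.contains(brush.position()):
                if captured is not None:
                    canvas.stamp(captured, at=brush.position())  # draw with the "ink"
            else:
                brush.leds_on()
                captured = brush.camera_frame()                  # pick up color/texture
                brush.leds_off()
```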


Year: 2003 - 2005
Status: active
Domain: engineering

Type: system, prototypes, paper, paper (design), video [MPEG (27MB)] [MOV (25MB)] [YouTube], manual, handling instructions

Position: collaborator

Collaborators: Kimiko Ryokai (lead researcher), Rob Figueiredo (undergraduate collaborator), Joshua Jen C. Monzon (undergraduate collaborator)

Advisor: Hiroshi Ishii
Early projects at MIT and before
Robotic F.A.C.E. Robotic F.A.C.E.: mechanics inside
Robotic F.A.C.E. (MIT Media Lab)

Robotic F.A.C.E., which stands for Facial Alerting in a Communication Environment, explored the use of a physical object in the form of a face as a means of user interaction, taking advantage of socially intuitive facial expressions. We have built an interface to an expressive robotic head (based on the mechanics of a commercial robotic toy) that allows the use of socially strong non-verbal facial cues to alert and notify. The head, which can be controlled via a serial protocol, is capable of expressing most basic emotions not only in a static way, but also as dynamic animation loops that vary some parameter, e.g., activity, over time. Although in later projects with animatronic components (Robotic P.A.C.E., Autonomous Interactive Intermediaries) I did not reverse engineer a toy interface anymore, the experience gained with this project was very valuable. More...
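
A serial animation loop of this kind might be sketched as follows; the command format is a placeholder, since the reverse-engineered toy protocol is not documented here.

```python
# Hedged sketch of driving an expressive head over a serial link with
# pySerial. The command strings below are placeholders, not the actual
# protocol of the reverse-engineered toy.
import math
import time
import serial  # pyserial

head = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)

def animate_emotion(emotion: str, activity: float, duration_s: float = 5.0):
    """Play a dynamic animation loop whose amplitude scales with activity."""
    t0 = time.time()
    while time.time() - t0 < duration_s:
        phase = math.sin(2 * math.pi * (time.time() - t0))
        level = int(50 + 50 * activity * phase)        # 0..100 intensity
        cmd = f"{emotion}:{level}\n".encode("ascii")   # placeholder format
        head.write(cmd)
        time.sleep(0.05)

animate_emotion("happy", activity=0.8)
```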

Year: 2003 - 2004
Status: done
Domain: engineering
Type: system
Position: lead researcher
Collaborators: Mark Newman (undergraduate collaborator)
Parrot on my shoulder. Parrot with open back zipper.
Robotic P.A.C.E. (MIT Media Lab)

The goal of the Robotic P.A.C.E. project was to explore the use of a robotic embodiment in the form of a parrot, sitting on the user's shoulder, as a means of user interaction, taking advantage of socially intuitive non-verbal cues like gaze and postures. These are different from facial expressions (as explored in the Robotic F.A.C.E. project), but at least as important for grabbing attention and interrupting in a socially appropriate way. I have built an animatronic parrot (based on a hand puppet and commercially available R/C gear) that allows the use of strong non-verbal social cues like posture and gaze to alert and notify. The wireless parrot, which can be controlled from anywhere by connecting via TCP to a server that in turn drives a hacked long-range R/C transmitter, is capable of quite expressive head and wing movements. Robotic P.A.C.E. was a first embodiment for a communication agent that reasons and acts with social intelligence.
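
Sending a command to the parrot could look roughly like this; host, port, and the command vocabulary are assumptions, not the actual server interface.

```python
# Sketch of sending posture/gaze commands to the parrot's control server
# over TCP. The real server relayed commands to a modified R/C transmitter;
# the command strings here are illustrative placeholders.
import socket

def send_parrot_command(command: str, host="localhost", port=5000):
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall((command + "\n").encode("ascii"))

send_parrot_command("gaze user")     # turn head toward the user
send_parrot_command("wings flutter") # grab attention non-verbally
```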

Year: 2003 - 2004
Status: done
Domain: engineering
Type: system, illustrative video (2 minutes) [ Quicktime 7,279kb] [YouTube]
Position: lead researcher
Final working prototype. CAD rendering for the prototype. Actual projection onto wall.
Tiny Projector (MIT Media Lab)

Mobile communication devices get smaller and smaller, but we would prefer the displays to get larger instead. The solution to this dilemma is to add projection capabilities to the mobile device. The basic idea behind TinyProjector was to create the smallest possible character projector that can be either integrated into mobile devices like cellphones, or linked wirelessly via protocols like Bluetooth. During this 2-year project, I built ten working prototypes; the latest one uses eight laser diodes and a servo-controlled mirror that "paints" characters onto any surface like a matrix printer. Because of the laser light, the projection is highly visible even in daylight and on dark backgrounds. More...
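
The "matrix printer" painting loop can be sketched as follows; the font table and the laser/mirror drivers are placeholders, not the actual firmware.

```python
# Sketch of the "matrix printer" idea: as the mirror sweeps, each step
# exposes one column of a character, and the 8 laser diodes form the rows.
# The approximate glyphs and the hardware callbacks are placeholders.
import time

FONT_5x8 = {                       # column-wise bitmaps, LSB = top row
    "A": [0x7E, 0x11, 0x11, 0x11, 0x7E],
    "I": [0x00, 0x41, 0x7F, 0x41, 0x00],
}

def paint_text(text, set_lasers, step_mirror, dwell_s=0.002):
    """set_lasers(mask): switch the 8 diodes; step_mirror(): advance one column."""
    for ch in text.upper():
        for column in FONT_5x8.get(ch, [0x00] * 5) + [0x00]:  # 1-column gap
            set_lasers(column)     # light the rows of this column
            time.sleep(dwell_s)    # dwell long enough to be visible
            step_mirror()          # move the mirror to the next column
        set_lasers(0x00)
```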

Year: 2000 - 2002
Status: done
Domain: engineering
Type: prototypes, report
Position: lead researcher
The Media Lab had a close relationship with Motorola (I was a Motorola Fellow 1997/1998), and we even had our own 2-way paging base station on the roof of the lab which we could experiment with. The system was maintained by a colleague of mine, Pascal Chenais. He gave a select group of people at the lab these cool 2-way pagers (above, Motorola SkyWriters), and my system was built on top of that infrastructure. Later on, we also used Knothole with SMS-capable devices: essentially messaging-capable cellphones, common in Europe but a novelty back then in the U.S.A. And even later, we used PDAs like the PalmPilot (and I had no idea that I would work for Palm 15 years later!). My Knothole system was agnostic to the underlying communication technology, as long as it was 2-way messaging capable.
2-way pagers like this were used for the Knothole system
Knothole (MIT Media Lab)

Knothole (KH) uses mobile devices such as cellphones and two-way pagers as mobile interfaces to our desktop computers, combining PDA functionality, communication, and Web access into a single device. Rather than put intelligence into the portable device, it relies on the wireless network to connect to services that enable access to multiple desktop databases, such as your calendar and address book, and external sources, such as news, weather, stock quotes, and traffic. To request a piece of information, the user sends a small message to the KH server, which then collects the requested information from different sources and sends it back as a short text summary. Although development of KH finished in 1998, it is still used by Active Messenger, which it enhances and with which it interacts seamlessly. More...
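
The server-side dispatch can be sketched like this; the command names, handlers, and the reply-length limit are assumptions, not the original implementation.

```python
# Sketch of the server side: a short inbound message is parsed into a
# command, dispatched to a data source, and the reply is trimmed to fit a
# pager-sized payload. Handler functions are hypothetical stand-ins.
MAX_REPLY_CHARS = 240   # roughly what a 2-way pager message could carry (assumed)

def handle_request(text, handlers):
    """handlers maps command words ('weather', 'cal', ...) to functions."""
    parts = text.strip().split(maxsplit=1)
    if not parts:
        return "empty request"
    command, args = parts[0].lower(), (parts[1] if len(parts) > 1 else "")
    handler = handlers.get(command)
    if handler is None:
        return f"unknown command: {command}"
    return handler(args)[:MAX_REPLY_CHARS]   # keep it pager friendly

# Example:
# handle_request("weather boston", {"weather": lambda city: f"{city}: 72F sunny"})
```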

Year: 1997 - 1998
Status: system stable and in continuous use
Domain: engineering
Type: system, prototypes, paper (related)
Position: lead researcher
Advisor: Chris Schmandt
Early working prototype. Rendering of an advanced miniaturized version which uses a ducted fan and shifts its center of gravity with servos on a slide to move forward, backward, left and right. Yet another render of an advanced version which uses internal lifting surfaces.
Early prototype (top), later designs (middle, bottom)
Free Flying Micro Platforms, "Zero-G Eye" (MIT Media Lab)

A Free Flying Micro Platform (FFMP) is a vision for a small autonomously hovering mobot with a wireless video camera that carries out requests for aerial photography missions. It would operate indoors and in obstacle-rich areas, where it avoids obstacles automatically. Early FFMPs would follow high-level spoken commands like "Go up a little bit," "Turn left," and "Follow me," and would try to evade capture. Later versions would understand complex spoken language such as "Give me a close-up of John Doe from an altitude of 3 feet" and would have refined situational awareness. The Zero-G Eye is a first implementation of an FFMP that was built to explore ways of creating an autonomously hovering small device. The sensor-actuator loop is working, but flight times were highly constrained because of a too low lift-to-weight ratio. Later prototypes are in different planning stages, and benefit from experience gained with the earlier devices. As a side note, I have been virtually obsessed with small hovering devices for a very long time, and have designed such devices since I was 12 years old. More...

Year: 1997 - 2001
Status: prototypes developed; project dormant
Domain: engineering
Type: prototypes, report 1, report 2, report 3, paper, paper (related)
Position: lead researcher
One of the early platforms.
Autonomous Helicopter Robot (MIT Media Lab)

The MIT Aerial Robotics Club advanced the field of Aerial Robotics within the MIT community through the physical construction of flying robots. It was the intention of this Club to assist its members in learning about the details of constructing Aerial Robots through active participation in competitions and projects that can be solved by, or will benefit from, the use of autonomous flying vehicles. Over the years, the team built several aerial robots based on large R/C controlled helicopters. For a brief time, I was part of the GNC/Ground Station group, and my job was testing, calibration, and integration of the compass module.

Year: 1998 - 1999
Status: club discontinued
Domain: engineering
Type: system, prototypes, paper
Position: collaborator of MIT Aerial Robotics Club
Collaborators: many
System diagram.
ASSOL (Adaptive Song Selector Or Locator) (MIT Media Lab)

The Adaptive Song Selector Or Locator (ASSOL) is an adaptive song selection system that dynamically generates playlists from the MP3 collections of users who are present in a public space. When a user logs into a workstation, the ASSOL server is notified, and the background music currently played in this space is influenced by the presence of the new user and her musical preferences. Her preferences are simply extracted from her personal digital music collection, which can be stored anywhere on the network; songs are streamed from their original location. A first-time user merely has to tell the ASSOL system where her music files are stored. From then on, the playlists are compiled dynamically, and adapt to all the users in a given area. In addition, the system has a Web interface that allows users to personalize certain songs to convey certain information and alert them without interrupting other people in the public space. More...
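
A minimal sketch of the playlist-building step, assuming simple per-user track lists; the real system also factored in musical preferences and the streaming details.

```python
# Sketch of the adaptive playlist idea: interleave tracks drawn from the
# collections of everyone currently logged in, so the background music
# reflects who is present. Track lists are simple paths/URLs here.
import itertools
import random

def build_playlist(present_users, collections, length=20):
    """collections: dict user -> list of track URLs (streamed from source)."""
    pools = []
    for user in present_users:
        tracks = list(collections.get(user, []))
        random.shuffle(tracks)
        pools.append(tracks)
    # Round-robin over the present users' pools so everyone is represented.
    interleaved = [t for group in itertools.zip_longest(*pools) for t in group if t]
    return interleaved[:length]
```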

Year: 2000
Status: done
Domain: engineering
Type: system, prototype, report
Position: researcher
Collaborators: Kwan Hong Lee (co-researcher)
The GUI of the system (browser based): top right is the selection of the language, and the upload area. Bottom left is where the user can synthesize voices. Bottom right is the list of word sound files that the system currently has in this language.
OpenSource SpeechSynth (MIT Media Lab)

The OpenSource SpeechSynth (OSSS) is a purely Web-based text-to-speech synthesizer for minority languages for which no commercial speech synthesizer software is available, e.g., Swiss German. It is based on a collaborative approach where many people contribute a little, so that everybody can profit from the accumulated resources. Its Web interface allows visitors both to upload sound files (words) and to synthesize existing text. The speech synthesizing method used in this project is word based, which means that the smallest sound units are words. Sentences are obtained by waveform concatenation of word sounds. Due to the word concatenation approach, the OSSS works with any conceivable human language. It currently lists 90 languages, but users can easily add a new language if they wish, and then start adding word sounds. During the four years the OSSS has been online, it has been tested by many Web visitors, specifically by the Lojban community. More...
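
The word-concatenation step can be sketched like this, assuming all contributed recordings share the same audio format (which the real site had to enforce or convert); file layout and names are made up.

```python
# Sketch of word-based concatenative synthesis: look up one recorded WAV per
# word and append the audio frames.
import wave

def synthesize(sentence, word_dir, out_path):
    words = sentence.lower().split()
    out = None
    for word in words:
        with wave.open(f"{word_dir}/{word}.wav", "rb") as w:
            frames = w.readframes(w.getnframes())
            if out is None:
                out = wave.open(out_path, "wb")
                out.setparams(w.getparams())   # copy rate/channels/width
            out.writeframes(frames)
    if out is not None:
        out.close()

# synthesize("hoi zäme", "words/swiss-german", "greeting.wav")
```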

Year: 2000 - 2001
Status: done; up and running
Domain: engineering
Type: system, report
Position: lead researcher
Early morning situation. Note that the time of day is set by manually moving the “sun” across the sky. Close up of the cloud layer (from below), while rain is falling down. Top view of the WeatherTank system.
Various weather conditions
WeatherTank (MIT Media Lab)

WeatherTank is a tangible interface that looks like a tabletop sized vivarium or a diorama, and uses everyday weather metaphors to present information from a variety of domains, e.g., "a storm is brewing" for increasingly stormy weather, indicating upcoming hectic activities in the stock exchange market. WeatherTank represents such well-known weather metaphors with desktop sized but real wind, clouds, waves, and rain, allowing users to not only see, but also feel information, taking advantage of our skills developed through our lifetimes of physical world interaction. A prototype was built that included propellers for wind, cloud machines, wave and rain generators, and a color-changing lamp as sun mounted on a rod that can be used to move the sun along an arc over the tank, allowing the user to manipulate the time of day.
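
The metaphor mapping can be sketched as a simple function from a normalized source value to actuator intensities; the thresholds and actuator names below are illustrative, not the actual control code.

```python
# Sketch of the metaphor mapping: a source value (here, stock-market
# activity on a 0..1 scale) is translated into actuator intensities for
# wind, clouds, and rain. Actuator interfaces and thresholds are assumed.
def weather_for_activity(activity: float) -> dict:
    """Map normalized market turbulence to tank actuator levels (0..1)."""
    activity = max(0.0, min(1.0, activity))
    return {
        "wind_fans":     activity,                        # "a storm is brewing"
        "cloud_machine": min(1.0, activity * 1.5),        # clouds build up early
        "rain_pump":     max(0.0, activity - 0.6) / 0.4,  # rain only when hectic
    }

# weather_for_activity(0.8) -> strong wind, full clouds, moderate rain
```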

Year: 2001
Status: done
Domain: engineering

Type: system, report (short), report, video of demo (1:14) [YouTube], unedited demo video (18.5 minutes) [RealVideo] [YouTube]

Position: researcher
Collaborators: Deva Seetharam (co-researcher)
Screenshot of the user interface. Screenshot of UI
Impressionist visualization of online communication (MIT Media Lab)

This system provides an intuitive, non-textual representation of online discussion. In the context of a chat forum, all textual information of each participant is transformed into a continuous stream of video. The semantic content of the text messages is mapped onto a sequence of videos and pictures. The mapping is realized on the side of the receiver, because a simple text line like "I love cats" means different things to different people. Some would associate this with an ad for cat food; others would be more negative because they dislike the mentality of cats, and would therefore see pictures like a dog chasing a cat. For this purpose, each participant has a personal database of semantic descriptions of pictures and videos. If the participant scans the messages of a group, this textual information is automatically transformed into user-specific parallel video streams. These video snippets have purely connotative meanings. I have built a proof-of-concept system with live video streams. More...
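
The receiver-side mapping can be sketched like this; clip paths and tags are made up, and the real system worked with richer semantic descriptions.

```python
# Sketch of the receiver-side mapping: each participant keeps a personal
# database of clips tagged with words, and an incoming text line is turned
# into that viewer's own sequence of clips.
def clips_for_message(message, personal_db):
    """personal_db: dict keyword -> list of clip paths, per receiver."""
    playlist = []
    for word in message.lower().split():
        playlist.extend(personal_db.get(word.strip(".,!?"), []))
    return playlist

# A cat lover and a cat skeptic see different streams for the same text:
cat_lover   = {"cats": ["clips/kitten_playing.mov"]}
cat_skeptic = {"cats": ["clips/dog_chasing_cat.mov"]}
print(clips_for_message("I love cats", cat_lover))
print(clips_for_message("I love cats", cat_skeptic))
```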

Year: 1998
Status: done
Domain: engineering, art installation
Type: system, report
Position: lead researcher
Example of how the system traverses WordNet during the game: the user's guess (“snakes”) and the word to guess (“Sam”, the name of a cat) are each followed up the semantic tree until they meet. This allows the system to give hints, such as: “It is not a reptile, but a mammal”. System reasoning tree
During the game, one applet (left) was responsible for text output (system output), one applet (right) was for the user to enter text. Game example
Daboo (MIT Media Lab)

Daboo is a real-time computer system for the automatic generation of text in the specific context of the word guessing game Taboo. To achieve the game’s goal—let the user guess a word as fast as possible without using certain taboo words—our system uses sophisticated algorithms to model user knowledge and to interpret the user input semantically. The former is very important for gradually enhancing the performance of the system by adapting to the user's strongest "context" of knowledge. The latter helps bridge the gap between a guess and the actual word to guess by creating a semantic relationship between the two. For this purpose we rely on the semantic inheritance tree of WordNet. Daboo acts effectively as the clue-giving party of a Taboo session by interactively generating textual descriptions in real time. More...
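
The hint mechanism can be sketched with NLTK's WordNet interface (requires the WordNet corpus); the real system's user modeling and text generation are much richer, so this only illustrates the tree traversal.

```python
# Sketch of the hint mechanism: follow the guess and the target word up the
# WordNet hypernym tree until they meet, then phrase a hint from the two
# branches.
from nltk.corpus import wordnet as wn

def hint(guess, target):
    g, t = wn.synsets(guess, pos=wn.NOUN), wn.synsets(target, pos=wn.NOUN)
    if not g or not t:
        return "no hint available"
    common = g[0].lowest_common_hypernyms(t[0])
    if not common:
        return "no hint available"
    meeting = common[0].lemma_names()[0].replace("_", " ")
    wrong_branch = g[0].lemma_names()[0].replace("_", " ")
    return f"It is not a {wrong_branch}, but it is a kind of {meeting}."

# hint("snake", "cat") -> e.g. "It is not a snake, but it is a kind of vertebrate."
```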

Year: 1997
Status: done
Domain: engineering
Type: system, report
Position: researcher
Collaborators: Keith Emnett (co-researcher)
This diagram shows one of the core relationships between the immediacy of a communication channel (X axis) and the preference of a user for a communication channel (Y axis). The relationship follows an inverted-U curve, but the curve is shifted along the X axis depending on the intimacy that the given task requires. Illustration of the relationship between core parameters: immediacy of communication channel and user preference for communication channel
Psychological Impact of Modern Communication Technologies (University of Bern)

In this two-year study, I examined in detail both the general communicative behavior and the use of communication technologies of eight subjects, using extensive problem-centered interviews. From the interview summaries, a general criterion for media separation was extracted, which allows the systematic separation of all media into two groups: on one side the verbal-vocal, realtime-interactive, and non-time-buffered media like telephone, intercom, and face-to-face communication; on the other side the text-based, asynchronous, and time-buffering media like letter, telefax, and email. The two media answering machine and online chatting (realtime communication via computer monitor and keyboard) occupy exceptional positions because they cannot be assigned to either group. Therefore, these two media were examined in detail. By analyzing them under the aspects of both a semiotic-ecological approach and a privacy regulation model, important characteristics and phenomena of their use can be explained, and future trends predicted. This work was part of my first Master's thesis in Psychology.

Year: 1993
Status: done
Domain: psychology
Type: study, thesis, paper
Position: researcher
Advisor: Urs Fuhrer
This was an extensive study, with empirical data collection from an experiment. I wish I still had visuals of the experimental setup, but in essence, we let our study participants listen to songs with either the matching music video clip (made by the artists), a mismatched music video clip (random video, not related to the music), or just the music by itself. Then we asked the participants to rate each song using a semantic differential. Title page of the 163-page report on the main results of the study
Influence of Video Clip on Perception of Music (University of Bern)

This study explored the question of whether music is perceived and rated differently when presented alone versus when presented together with a promotional video clip. Thirty-six subjects filled out semantic differentials after having listened to three different songs, each under one of the following conditions: the song was presented without video; the song was presented with the corresponding promotional video clip; the song was presented with random video. We found that subjects instructed to evaluate music do this in similar ways, independent of the presence of a matching or mismatching video clip. However, presenting a song together with a corresponding video clip decreases the possibility for the listeners/viewers to interpret the music in their own way. Furthermore, our data suggests that it is difficult and perhaps arbitrary to rate an incompatible or unmotivated music-video mix, and that an appropriate video clip makes the "meaning" of the song more unequivocal.

Year: 1989
Status: done
Domain: psychology
Type: study, report
Position: researcher
Advisor: Alfred Lang
Collaborators: Fränzi Jeker und Christoph Arn (co-researchers)
This table is a hand-made summary of all the data we collected. On the Y axis, there is time (essentially, all the music the participants listened to was on one music cassette). On the X axis, we list how many of the participants pressed the button in a specific time interval. They were instructed to press the button when they really liked a song segment. In the middle column, we listed our predictions (groups labeled A to X) of where we expected the participants to like a specific song segment (and then press the button), because these were the spots where, in our own experience, we ourselves felt goose bumps. Half of the songs we played were specifically selected because they do produce such goose bumps (again, in our own experience). Songs that produce goose bumps at all were quite rare, though: most of them were very successful in the music charts, commercial super hits. Main data table showing the self-reported real-time user behaviors
Physiological Reactions to Modern Music (University of Bern)

This half-year study was motivated by our subjective experiences of a striking physiological reaction ("goose pimples") when listening to some modern pop and rock songs. We hypothesized that this physiological reaction, possibly caused by adrenaline, is the most important factor determining whether a song pleases an audience: it can lead to massive sales of records and high rankings on music charts. This project explored whether an accumulation of such "pleasing spots" can be found when listening to specific songs. Our experimental results from 33 subjects confirm the existence of such accumulations; additionally, we examined whether we were able to predict these accumulations ahead of the experiment. Our results also show that we were able to predict them intuitively, but not all of them and not without errors. We can only speculate about the criteria for the passages that attracted attention as accumulations of conjectured physiological reactions; four factors stand out: an increase of musical density (for example, chorus passages), musical intensification of melody or harmony, an increase of rhythm, and an increase of volume.

Year: 1986
Status: done
Domain: psychology
Type: study, report
Position: researcher
Advisor: Alfred Lang
Collaborators: Sibylle Perler, Christine Thoma, Robert Müller, and Markus Fehlmann (all co-researchers)
