RoomAlive: Magical Experiences - Projection Mapping

Transcription

RoomAlive: Magical Experiences Enabled by Scalable, Adaptive Projector-Camera Units

Brett Jones1,2, Rajinder Sodhi1,2, Michael Murdock1,3, Ravish Mehra1,4, Hrvoje Benko1, Andrew D. Wilson1, Eyal Ofek1, Blair MacIntyre1,5, Nikunj Raghuvanshi1, Lior Shapira1

1 Microsoft Research, Redmond; 2 University of Illinois at Urbana-Champaign; 3 University of Southern California; 4 UNC Chapel Hill; 5 Georgia Tech

Figure 1. RoomAlive is a proof-of-concept system that transforms any room into an immersive, augmented gaming experience. E.g., RoomAlive can change the room's appearance to be (a) Tron themed, or (b) a swampy river. In an example Whack-A-Mole game, players physically (c) whack or (d) shoot moles popping out of their floor, walls and couch. (e) Players control a robot running around the room and shoot enemy robots using a video game controller. (f) An adventure game where players avoid deadly blow dart traps with physical body movement. (All images in the paper were captured live, with the real-time working prototype.)

ABSTRACT
RoomAlive is a proof-of-concept prototype that transforms any room into an immersive, augmented entertainment experience. Our system enables new interactive projection mapping experiences that dynamically adapt content to any room. Users can touch, shoot, stomp, dodge and steer projected content that seamlessly co-exists with their existing physical environment. The basic building blocks of RoomAlive are projector-depth camera units, which can be combined through a scalable, distributed framework. The projector-depth camera units are individually auto-calibrating, self-localizing, and create a unified model of the room with no user intervention. We investigate the design space of gaming experiences that are possible with RoomAlive and explore methods for dynamically mapping content based on room layout and user position. Finally, we showcase four experience prototypes that demonstrate the novel interactive experiences that are possible with RoomAlive and discuss the design challenges of adapting any game to any room.

Author Keywords
Projection mapping; spatial augmented reality; projector-camera systems

ACM Classification Keywords
H.5.2 User Interfaces: Graphical user interfaces, Input devices and strategies, Interaction styles.

INTRODUCTION
In recent years the way people play video games has changed. We can now interact more naturally with the game world by moving our bodies (e.g., Nintendo Wiimote, Microsoft Kinect and PlayStation Move). Coupled with new display technologies (e.g., Oculus Rift), we can now feel more "in the game" than ever before. However, despite these advances, the game world is still distinctly separate from our real world.
We may feel more present in the game world, but the game is not present in our world.

This paper makes a system contribution by demonstrating a novel distributed system that enables a unique set of augmented reality experiences in any room. Our proof-of-concept prototype, RoomAlive, transforms any room into an immersive, augmented entertainment experience and enables users to naturally interact with augmented content in their everyday physical environment. RoomAlive builds on a wide array of research fields. Although each individual component of RoomAlive could be improved, the unified system demonstrates the immersive, augmented gaming experiences that might be possible when games can adapt to our room.

Figure 2. Each projector depth camera unit (a procam unit) consists of a commodity wide field-of-view projector, a depth camera and a computer.

The basic building block of RoomAlive is a combination of a projector and a depth camera (Figure 2), or procam unit [28]. By tiling and overlapping multiple procam units, RoomAlive is able to cover the room's walls and furniture with input/output pixels. RoomAlive tracks users' movements and dynamically adapts the gaming content to the room. Users can touch, shoot, dodge and steer virtual content that seamlessly co-exists with the existing physical environment.

RoomAlive captures and analyses a unified 3D model of the appearance and geometry of the room, identifying planar surfaces like walls and the floor. This model is used to adapt the augmented content to the particular room, for instance spawning enemy robots only on the floor. The content also reacts to the user's movement: RoomAlive uses a distributed framework for tracking body movement and touch detection using optical-flow based particle tracking [4,15], and pointing using an infrared gun [19].

To demonstrate the scalability of our prototype system, six procam units were used to cover the floor, walls and furniture of a large living room. We explore the design space of whole-room, augmented interactive experiences and the authoring process for these experiences. Through our exploration, we showcase four example experiences that demonstrate the rich and immersive experiences that are possible with RoomAlive. Finally, we discuss design guidelines and system limitations for future game designers who wish to create adaptive, interactive projection mapped games.

MOTIVATING SCENARIO
Imagine playing a video game in your living room without a television. Instead, the game happens in the room, all around you. When the game starts, the room magically transforms into an ancient castle: the walls turn to stone, and flaming torches emerge from the walls, casting flickering shadows onto the furniture. Out of the corner of your eye, you see a glowing idol appear on your couch. You walk towards the idol when suddenly a trap opens on the wall next to you, exposing blow darts ready to fire. You leap out of the way, only to land on the floor face-to-face with a giant cockroach. You quickly get up and jump on the roach. You reach the idol successfully, and a scoreboard drops down showing that you have just scored the best time for the adventure course. This is just one of many interaction scenarios that are made possible by RoomAlive.

Figure 3. We have installed RoomAlive in a large (18 ft x 12 ft) living room using six overlapping procam units, with three units pointed towards the three walls of the room and another three pointed downwards to cover the floor.

RELATED WORK
Immersive Displays
With gaming experiences, bigger is often better. A larger display has a wider field of view, which makes the user feel more immersed and more present in the experience [16,30]. CAVE systems [10] and tiled displays [7,33] surround the user in visuals, but require special blank projection surfaces. Head mounted displays (HMDs) [26] enable users to be completely immersed in virtual environments, but require each user to wear a device. Optically opaque displays like the Oculus Rift block out the physical environment, and even see-through HMDs often limit the user's field of view (the frame of the glasses).
HMDs can also increase the chance of simulator sickness by downgrading the user's perception to the resolution, frame rate and latency of the display. RoomAlive instead provides an unencumbered, shared, augmented experience that adapts to the physical environment.

Spatial Augmented Reality
Spatial Augmented Reality (SAR) uses projected light to alter the appearance of physical objects (e.g., [3,5,34,35,36]). Previous work has explored SAR in mobile form factors [14,31,41], steerable projector systems [27,42], and even novel gaming experiences: e.g., racing a car across a desktop [6], playing chess [44], or playing miniature golf on top of wooden blocks [19]. By using real-time depth sensors, previous work has also enabled physical interactions with similar procam units [4,43,45].

Most closely related to our work is IllumiRoom [18], where a single projector was used to surround a traditional television with projected light to enhance gaming experiences. IllumiRoom explored focus-plus-context visualizations anchored by traditional screen-based gaming experiences. RoomAlive instead explores gaming completely untethered from a screen, in a unified, scalable multi-projector system that dynamically adapts gaming content to the room, and it adds further methods for physical interaction in the environment.

Commercially, SAR is known as 'projection mapping' and has recently exploded in popularity [22,29].

Projection mapping can create stunning visuals for outdoor advertising, theater, theme parks, museums and art galleries. However, the projected content is usually passive and must be laboriously created for the specific underlying surface. In contrast, with RoomAlive, users can physically interact with projected content that adapts to any living room environment.

ROOMALIVE SYSTEM
The RoomAlive system is comprised of multiple projector depth camera units, or "procam" units. Each unit contains a depth camera (which includes a color camera, infrared (IR) camera and IR emitter), a commodity wide field-of-view projector, and a computer. A single unit can be used in isolation to create a set-up like IllumiRoom [18], or multiple units can be combined to canvas an entire living room in I/O pixels.

Our current proof-of-concept prototype demonstrates the RoomAlive concept in a large living room (18 ft x 12 ft), with six procam units overlapping to cover three walls, the floor and all the furniture in the room (see Figure 3). RoomAlive is implemented as a plug-in to the Unity3D commercial game engine. Unity3D enables designers to easily author game content using an intuitive 3D editor and an interactive development environment.

With RoomAlive, procam units are connected through a distributed client-server model. Each client node is responsible for tracking the players and the room geometry within the view of its own local depth sensor. The clients are also responsible for any un-synchronized local game state (e.g., intensive physics simulations) and for rendering their local view of the global scene, including view-dependent rendering. The master server node synchronizes game state and user tracking state across units. The master server node also acts as a client, handling its own game state and rendering.

Procam Hardware
To demonstrate the flexibility of the approach, the procam units are built using a variety of commodity wide field-of-view projectors (InFocus IN1126, BenQ W770ST, BenQ W1080ST). Each unit also contains a Microsoft Kinect for Windows v1 sensor.

In our prototype, each unit is connected to its own personal computer with a high-end consumer grade graphics card (e.g., NVidia GTX 660 Ti). We use high-end personal computers to explore rich, computationally intensive interactions (e.g., GPU-based optical flow) that will be possible in smaller form factors in the future. All procam units are currently mounted to the ceiling of the living room. Procam units could also be mounted on a tripod, placed on a bookshelf or a coffee table, or generally in any location that maximizes coverage and minimizes occlusions.

Automatic Calibration
The RoomAlive calibration process is completely automatic, requiring no user intervention, and is distributed across multiple procam units. Installers simply mount the procam units with some overlap (~10%) between units.

Figure 4. To compute correspondences between two units (Unit 0 and Unit 1), we map a depth pixel in Unit 0 (d0) to an RGB pixel in Unit 0 (r0). Next, we look up the corresponding projector pixel in Unit 1 (p1) via the decoded Gray codes. We then invert the Gray code correspondences to look up the RGB pixel in Unit 1 (r1). Finally, we invert the Kinect's transfer map, resulting in depth pixel (d1).

Each procam unit displays a Gray code sequence [2], while all other procam units observe and decode the sequences. This establishes dense correspondences between each projector pixel and all Kinect cameras that can observe that point. Each correspondence is transformed into the Kinect's depth image via the Kinect SDK, resulting in 2D-to-3D point correspondences.
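The Gray code decoding step can be illustrated with a short sketch. The snippet below is a minimal illustration, not the RoomAlive implementation: it assumes each camera pixel has already been thresholded into one bit per projected pattern, and it converts the resulting Gray code into a projector column index (a second sequence of patterns typically encodes rows in the same way).

```csharp
using System;

static class GrayCodeDecoder
{
    // Convert a binary index to its Gray code (what the projector displays).
    public static int BinaryToGray(int value) => value ^ (value >> 1);

    // Convert a decoded Gray code back to the binary projector column/row index.
    public static int GrayToBinary(int gray)
    {
        int value = gray;
        for (int shift = 1; shift < 32; shift <<= 1)
            value ^= value >> shift;
        return value;
    }

    // bits[i] is the thresholded camera observation of the i-th Gray code
    // pattern (most significant bit first) at one camera pixel.
    public static int DecodePixel(bool[] bits)
    {
        int gray = 0;
        foreach (bool bit in bits)
            gray = (gray << 1) | (bit ? 1 : 0);
        return GrayToBinary(gray);
    }

    static void Main()
    {
        // Example: recover projector column 733 from its 10-bit Gray code observation.
        int gray = BinaryToGray(733);
        bool[] bits = new bool[10];
        for (int i = 0; i < 10; i++)
            bits[i] = ((gray >> (9 - i)) & 1) == 1;
        Console.WriteLine(DecodePixel(bits)); // prints 733
    }
}
```

Repeating this for every observing camera pixel yields the dense projector-to-camera correspondences described above.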
Using these correspondences, we solve for the intrinsics and extrinsics of each unit using OpenCV's calibrateCamera function. To increase robustness, we embed this in a RANSAC procedure [11]. This process only works for non-planar scenes; a procam unit viewing a planar surface may be temporarily rotated to view a non-planar scene for internal calibration.

To establish global extrinsics (rotation and translation) between units, we chain together correspondences (see Figure 4). A rigid transform is estimated between each pair of procam units using a singular value decomposition, and is refined via pairwise iterative closest point [24]. These transforms are chained together using a maximal spanning tree, weighted by the number of inliers for each pairwise transform. The global coordinate system is centered at the first unit to connect to the system, and it is adjusted so gravity points downwards (via the Kinect's accelerometer).

Automatic Scene Analysis
A unified 3D model of the room is formed by combining the depth maps from each unit (on the master node). This model is analyzed to find and label continuous planar surfaces (walls, floor) across units. This process must be repeated whenever the system performs a new scan of the environment, e.g., after recalibration or when objects are moved.

The system uses recent techniques in plane and scene model analysis [23]. To find planes, a surface normal is calculated for each point using principal component analysis, and the Hough transform [17] is used to select a finite set of planes. Each 3D point and its surface normal vote for a plane equation parameterized by its azimuth, elevation, and distance from the origin. A greedy strategy is then used to associate scene points with planes: unassigned 3D points that lie in the vicinity of a candidate plane (up to 10 cm) and have a compatible normal direction (within an angular threshold of 0.1) are associated with that plane. Planes are categorized as 'vertical', 'horizontal' or 'other' based on their orientation with respect to gravity, and the 'floor' is identified as the lowest 'horizontal' plane. Within each plane, points that are close together are converted into polygons using the outer hull of the set. Texture coordinates are assigned according to gravity and the principal component axis.
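The voting and labeling steps can be sketched in a few lines. This is an illustrative reconstruction rather than RoomAlive's actual code: the bin sizes, the angular thresholds, and the assumption of a gravity-aligned y-up frame are placeholders.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch of Hough-style plane voting and gravity-based plane labeling.
public static class PlaneFinder
{
    const float AzimuthBin = 5f;     // degrees (assumed bin size)
    const float ElevationBin = 5f;   // degrees (assumed bin size)
    const float DistanceBin = 0.05f; // meters  (assumed bin size)

    // Each 3D point and its normal vote for a plane parameterized by
    // (azimuth, elevation, distance from the origin).
    public static Dictionary<(int, int, int), int> Vote(IList<Vector3> points, IList<Vector3> normals)
    {
        var accumulator = new Dictionary<(int, int, int), int>();
        for (int i = 0; i < points.Count; i++)
        {
            Vector3 n = normals[i].normalized;
            float azimuth = Mathf.Atan2(n.z, n.x) * Mathf.Rad2Deg;
            float elevation = Mathf.Asin(Mathf.Clamp(n.y, -1f, 1f)) * Mathf.Rad2Deg;
            float distance = Vector3.Dot(n, points[i]); // signed distance of the plane from the origin

            var cell = (Mathf.RoundToInt(azimuth / AzimuthBin),
                        Mathf.RoundToInt(elevation / ElevationBin),
                        Mathf.RoundToInt(distance / DistanceBin));
            accumulator.TryGetValue(cell, out int votes);
            accumulator[cell] = votes + 1;
        }
        return accumulator;
    }

    // Label a detected plane by its orientation relative to gravity (world 'up').
    public static string Categorize(Vector3 planeNormal)
    {
        float angleToUp = Vector3.Angle(planeNormal.normalized, Vector3.up);
        if (angleToUp < 10f || angleToUp > 170f) return "horizontal";
        if (Mathf.Abs(angleToUp - 90f) < 10f) return "vertical";
        return "other";
    }
}
```

Peaks in the accumulator give candidate plane parameters; scene points are then greedily assigned to those candidates as described above.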

Authoring Augmented Content
RoomAlive makes use of Unity3D's modular plugin framework and scripting interface (Figure 5). Game designers only need to add a single game object to their scene to load the RoomAlive plugin. The game designer then has access to the RoomAlive API directly within the scripting interface of Unity3D. RoomAlive uses a high resolution 3D model of the room as seen from the current procam unit, and uses Unity3D to render the virtual objects from the projector's point of view. Alternatively, designers could use a stored room model for testing purposes.

Figure 5. In the Unity3D editor's Play mode (debug mode), a high resolution model of the current living room is automatically instantiated, along with virtual cameras (with correct intrinsics/extrinsics) for all the procam units in the scene. On the left is the scene view where artists can drag and drop game assets and view the game from any arbitrary viewpoint. On the right is a live preview of the image that the projector will display.

Game art assets can be easily imported from external content authoring software, positioned relative to the augmented environment, and previewed in situ. Behaviors can then be applied to a game object using a C# scripting interface, e.g., how to react to gun fire. These scripts have access to the RoomAlive API, enabling queries regarding scene semantics (e.g., "On floor?") or players' touch collision events.

Mapping Content
One of the key technical challenges in making a projection-based experience work in any living room is the placement of virtual game elements. Unlike traditional game design, where elements like the game terrain are known a priori, RoomAlive experiences must operate in a multitude of rooms. The mapping of game elements to a physical environment must be done in real time. Content must be mapped procedurally based on a set of rules that combine the goals of the game designer with the structure of the physical space, which can be significantly complex [39]. While we do not offer a complete solution to this problem, RoomAlive employs four techniques for mapping:

- Random mapping maps content in a uniformly random way. For example, targets presented to the user can be mapped to random positions around the room.
- Semantic mapping leverages additional semantic information recovered from the scene to map content. For instance, grass and rock game objects could be mapped to only appear in locations on the floor of the room. To accomplish this, the game script queries the RoomAlive API on start-up to supply a list of points that belong to the floor of the room, which can then be sampled to instantiate game content (see the sketch after this list).
- User-constrained mapping places content based on the current location of the user or user state. For example, a cannon that shoots at a user can be dynamically placed at a location in the room that offers a clear view of the user.
- User-driven mapping relies on users to interactively arrange content in the room, spawning content during a gaming experience by touching a physical surface or pointing with a gun at surfaces in the room. This enables users to level-edit or re-decorate their game room.
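As a concrete illustration of semantic mapping, the following Unity3D script sketches how a game might query floor points at start-up and scatter props on them. The RoomAliveScene class and its GetFloorPoints() method are hypothetical stand-ins for whatever scene-semantics API a system like this exposes, not the actual RoomAlive interface.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical semantic-mapping sketch: spawn props only on points labeled 'floor'.
public class FloorPropSpawner : MonoBehaviour
{
    public GameObject propPrefab;   // e.g., a grass or rock prefab
    public int propCount = 20;

    void Start()
    {
        // Assumed API: world-space points that scene analysis labeled as floor.
        IList<Vector3> floorPoints = RoomAliveScene.GetFloorPoints();
        if (floorPoints == null || floorPoints.Count == 0) return;

        for (int i = 0; i < propCount; i++)
        {
            // Random mapping restricted to the semantic 'floor' class.
            Vector3 position = floorPoints[Random.Range(0, floorPoints.Count)];
            Instantiate(propPrefab, position, Quaternion.identity);
        }
    }
}

// Placeholder for the assumed scene-semantics API so the sketch compiles.
public static class RoomAliveScene
{
    public static IList<Vector3> GetFloorPoints() => new List<Vector3>();
}
```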
Tracking User Interaction
RoomAlive supports interacting with augmented content through body movement, touch, pointing/shooting, and traditional controller input. For computational efficiency, processing is done locally on each client procam unit, and only the resulting changes in game state are synchronized across units.

Proxy Particle Representation
To enable interaction through body movement, touching and stomping, we use the captured depth map along with a real-time physics simulation. Using a similar approach as [4,15], moving objects (e.g., players) in the room are represented by a cloud of spherical 'proxy particles' that are capable of exerting frictional forces. This enables physically realistic interactions with virtual game objects. For instance, a collision event can be triggered by the proxy particles when the user touches or stomps the physical environment.

The proxy particles are tracked using a depth-aware optical flow algorithm. The 3D flow field pipeline uses a GPU implementation of Brox's algorithm [8] to compute optical flow on the captured 2D depth video, updating the proxy particles to follow moving objects. While flow is typically computed on the RGB image, we do not use the color video, as projected content can lead to incorrect flow results. The computed 2D flow field is re-projected onto the depth data to generate the 3D displacements of the proxy particles.
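A minimal Unity-style sketch of the proxy-particle idea is shown below. It assumes an external tracker (not shown) supplies per-particle positions and 3D displacements each frame; kinematic sphere colliders are then moved along those displacements so the physics engine can push and rub against virtual objects. The ProxyParticleField class and its parameters are illustrative assumptions, not the RoomAlive implementation.

```csharp
using UnityEngine;

// Illustrative proxy-particle field: kinematic spheres that follow tracked
// 3D displacements and exert contact/friction forces on virtual objects.
public class ProxyParticleField : MonoBehaviour
{
    public int particleCount = 256;
    public float particleRadius = 0.05f; // meters

    Rigidbody[] particles;

    void Start()
    {
        particles = new Rigidbody[particleCount];
        for (int i = 0; i < particleCount; i++)
        {
            var sphere = GameObject.CreatePrimitive(PrimitiveType.Sphere);
            sphere.transform.localScale = Vector3.one * (particleRadius * 2f);
            sphere.GetComponent<MeshRenderer>().enabled = false; // invisible proxies

            var body = sphere.AddComponent<Rigidbody>();
            body.isKinematic = true; // driven by tracking, not by gravity
            particles[i] = body;
        }
    }

    // Called each frame with positions/displacements from the flow tracker.
    public void UpdateParticles(Vector3[] positions, Vector3[] displacements)
    {
        int count = Mathf.Min(particles.Length, positions.Length);
        for (int i = 0; i < count; i++)
        {
            // MovePosition lets the physics engine compute contact velocities, so
            // collisions with virtual rigidbodies feel like being pushed by a moving surface.
            particles[i].MovePosition(positions[i] + displacements[i]);
        }
    }
}
```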

Gun/Pointing Input
To support pointing at a distance, RoomAlive supports a pointing/gun controller. The gun controller uses an integrated IR LED matching the Kinect's infrared band-pass filter. Optics within the gun focus the light, generating a light spot when the player presses the trigger. The light spot is observed by the IR camera and the target's 3D location is recovered.

Traditional Controller Input
In addition to natural user interactions, RoomAlive also supports traditional physical game controllers, such as a Microsoft Wireless Xbox controller. This allows users to interact with whole-room augmented games using the same input affordances as traditional television gaming experiences. The controller is connected to the server, which distributes user interaction to the clients.

Rendering
Projection mapped content can only physically appear where there is a physical surface. However, virtual content can appear to be at any arbitrary 3D location by displaying a perspective rendering of the virtual content on the surface, from the view direction of the player.

Figure 6. (a) Virtual objects that exist off-surface, like this game character, are rendered in a view dependent manner with radiometric compensation. (b) A virtual cube placed in front of a wall and viewed straight on. (c) The same cube viewed from a more oblique angle to the right.

RoomAlive tracks the player's head position and renders all virtual content with a two-pass view dependent rendering [5] (see Figure 6). Content that is aligned with the physical environment can be rendered realistically without the need for view dependent rendering. A unique challenge is the possible presence of multiple users. While multi-user viewpoint rendering remains an open problem [38], RoomAlive uses a simple approach of averaging the users' head positions. In practice, 3D scene geometry that strays far from existing physical surfaces makes RoomAlive a single-viewer experience. For scenes where the virtual content is near the physical surfaces, rendering with the average head position offers a good working solution [13].

Another challenge arises from the use of existing room furniture as projection surfaces, which may have a non-white surface albedo. To overcome this problem, radiometric compensation [32] is applied to compensate the projected image for the color of the surface. This process is limited by the brightness, dynamic range and color space of the projector, and some desired surface colors may be unachievable. For example, a solid red object in our physical environment cannot be made to appear completely green.
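The core of radiometric compensation can be sketched as a per-channel division of the desired appearance by the surface albedo, clamped to what the projector can output. This is a simplified, assumption-laden model (it ignores projector response curves, black offset and ambient light); the function below only illustrates why strongly colored surfaces make some target appearances unachievable.

```csharp
using UnityEngine;

public static class RadiometricCompensation
{
    // Simplified sketch: given the desired on-surface color and the surface
    // albedo (both in linear [0,1] RGB), compute the projector output color.
    // Values above 1 are unachievable and get clamped, which is why a solid
    // red surface cannot be made to appear fully green.
    public static Color Compensate(Color desired, Color albedo)
    {
        const float epsilon = 1e-3f; // avoid division by zero on black surfaces
        float r = Mathf.Clamp01(desired.r / Mathf.Max(albedo.r, epsilon));
        float g = Mathf.Clamp01(desired.g / Mathf.Max(albedo.g, epsilon));
        float b = Mathf.Clamp01(desired.b / Mathf.Max(albedo.b, epsilon));
        return new Color(r, g, b, 1f);
    }
}
```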
EXAMPLE EXPERIENCES
We envision the RoomAlive system supporting a diverse range of applications, limited only by the imagination of game designers. Four example applications were prototyped to demonstrate the potential of whole-room augmented experiences. These experiences were developed in partnership with a game designer and illustrate the interactive capabilities of the system. They represent a limited survey and are not an exhaustive list. An accompanying video showcases the experiences.

Whack-A-Mole
The Whack-A-Mole experience demonstrates a combination of whole body movement, touch and gun/pointing input. Similar to the popular arcade game, in Whack-A-Mole users race to whack, stomp, and shoot moles that randomly appear in the living room. The moles are generated uniformly across the entire room. First, a crack appears on a surface in the room (see Figure 7). Then an audio clip plays, "I'm over here", attracting the user's attention, followed by a 3D animated mole that emerges from the crack. The mole is rendered from the player's viewpoint and casts appropriate shadows onto the physical room. Figure 1c-d shows a player whacking and shooting a mole on a wall and floor. A virtual scoreboard on the wall counts the player's achievements.

Figure 7. Playing room-adaptive Whack-A-Mole with an IR gun. First, a crack appears to draw the user's attention.

Robot Attack
RoomAlive also supports non-physically realistic gaming mechanics that create entirely new experiences in the living room. Robot Attack (based on Unity3D's Angry Bots) allows a user to control a virtual character that can run across the walls, floor, chair, etc. (see Figure 8 and Figure 1e). The character is entirely constrained to the surfaces of the living room.

The character can move forwards, backwards, left and right on the surface, but not off of the surface. As the character moves around the living room, surface-constrained enemy robots appear, tracking and firing weapons at the virtual character. The character must defend against and shoot back at the robots. The surface-constrained nature of the experience enables the game characters to walk up walls, adapting their orientation to the normal vector of the surface. Weapons follow a similar behavior, where 'bullets' remain surface constrained, going up and around objects rather than bouncing away. The game is controlled using a traditional game controller.

Figure 8. (left) A virtual character runs up the side of a bookshelf. (right) A virtual character fights a robot tank.

Traps
While the previous experiences demonstrate virtual objects near a physical surface, virtual objects can also appear unattached to a physical surface. In Traps, a user is surrounded by virtual traps and must navigate the physical environment to avoid being hit by virtual darts. This experience is inspired by the many adventure games where players must run, dodge and advance through complex obstacles. If a user passes through a trigger volume, darts are emitted and collide with the physics-enabled proxy particles that represent the user and any dynamically moving object. Because the darts move through open space, they are rendered in a view dependent fashion. If the user is hit, blood particles are emitted at the collision location and follow the user based on proxy particle tracking (see Figure 9).

Figure 9. Blow dart traps pop out of the wall, forcing the user to dodge. If the user is hit, battle damage shows on their body.

Setting the Stage
Imagine being able to instantly transform your entire room to match the environment of a video game or film. RoomAlive can use projected light to automatically augment the appearance of the entire room, procedurally changing the surface color of objects in the room environment. Figure 1(a-b) shows two examples of procedurally generated environments. Any Unity3D material, including procedural materials, can be easily assigned to a part of the room geometry with a few clicks; the automatically generated texture coordinates are used for each room polygon. Game designers can also specify that a semantic group (e.g., vertical surfaces) be given a texture. We demonstrate special effects that use 3D scene information, such as rain, or animated insects crawling over floors and tables (Figure 14).

Figure 10. The room is transformed (via the extracted polygons and texture coordinates) into (a) a cockroach infestation and (b) a futuristic style game room.
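A sketch of this kind of semantic material assignment is shown below. It assumes the scene-analysis step has produced one mesh GameObject per extracted room polygon, tagged with a hypothetical SurfaceLabel component; the component name and its fields are illustrative, not RoomAlive's actual API.

```csharp
using UnityEngine;

// Hypothetical label attached to each extracted room polygon by scene analysis.
public class SurfaceLabel : MonoBehaviour
{
    public string category; // e.g., "floor", "vertical", "other"
}

// Assigns a material to every room polygon in a given semantic group,
// relying on the texture coordinates generated during scene analysis.
public class StageDresser : MonoBehaviour
{
    public Material wallMaterial;   // e.g., stone castle walls
    public Material floorMaterial;  // e.g., swampy river bed

    void Start()
    {
        foreach (SurfaceLabel label in FindObjectsOfType<SurfaceLabel>())
        {
            var meshRenderer = label.GetComponent<MeshRenderer>();
            if (meshRenderer == null) continue;

            if (label.category == "vertical")
                meshRenderer.material = wallMaterial;
            else if (label.category == "floor")
                meshRenderer.material = floorMaterial;
        }
    }
}
```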
DESIGN CONSIDERATIONS
In order to create experiences for RoomAlive, game designers must consider the unique benefits and challenges associated with interactive, adaptive, projection-mapped game content. Four critical aspects should be considered when designing room-sized augmented experiences.

Input Choice Tradeoffs
RoomAlive supports a variety of input techniques. Some, such as touching or stomping, enable physical interactions, which move users around the room. However, certain parts of a living room environment may be less suitable for touch-based experiences. For example, an expensive vase or painting may be inappropriate for direct contact. Particularly for children's games, designers may want to limit touch interaction to the floor and walls (large vertical planes). Users could also tag fragile objects, removing them from areas of play. Alternatively, the experiences could rely solely on whole body movement, pointing and shooting, or traditional controller input. Pointing and controller input are ideal for controlling objects at a distance, such as a virtual car driving across a wall. We generally found that experiences were most enjoyable when all forms of input were available to users.

Capturing User Attention
When the display surface consists of a large living room environment, directing a user's attention becomes a critical aspect of designing the experience. If a user is on one side of the living room, they may not realize that their attention is needed on the opposite end of the room. Surround sound may be used to direct the user's attention to an approximate location in the room. The designer can also incorporate a warm-up animation into the game design; for instance, in Whack-A-Mole, a large crack appears seconds before the mole pops out, attracting the user's attention by appearing in their peripheral vision (Figure 7).

Selecting Game Physics
Living room projected experiences enable game designers to create unique interactions that may not be physically realistic. The concept of gravity can be applied globally to all virtual objects in an experience, as is done in a typical game. Alternatively, gravity can be applied locally to a virtual object, constraining the object to a surface [19]. For example, a ball rolling on a surface can bounce off the wall or continue rolling up the wall, depending on the experience. A local surface adherence model requires a local coordinate system, where each virtual object's local "floor plane" lies tangent to a surface. For instance, a virtual character's "up" direction would be the normal perpendicular to the tangent plane on the surface (see Robot Attack).
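A minimal sketch of such a local surface-adherence model in Unity3D follows. It applies per-object "gravity" along the negative surface normal and aligns the object's up axis with that normal; the raycast-based normal lookup, probe distance, and force constant are illustrative assumptions, not RoomAlive's implementation.

```csharp
using UnityEngine;

// Illustrative local surface adherence: the object's "up" is the surface normal
// of whatever room geometry lies beneath it, and gravity pulls toward that surface.
[RequireComponent(typeof(Rigidbody))]
public class SurfaceAdherence : MonoBehaviour
{
    public float localGravity = 9.81f;
    public float probeDistance = 1.0f;

    Rigidbody body;
    Vector3 surfaceNormal = Vector3.up;

    void Start()
    {
        body = GetComponent<Rigidbody>();
        body.useGravity = false; // replace global gravity with a local model
    }

    void FixedUpdate()
    {
        // Probe the room geometry along the object's current "down" direction.
        if (Physics.Raycast(transform.position, -surfaceNormal, out RaycastHit hit, probeDistance))
            surfaceNormal = hit.normal;

        // Pull toward the surface and keep the character's up axis on the normal.
        body.AddForce(-surfaceNormal * localGravity, ForceMode.Acceleration);
        Quaternion target = Quaternion.FromToRotation(transform.up, surfaceNormal) * transform.rotation;
        transform.rotation = Quaternion.Slerp(transform.rotation, target, 0.2f);
    }
}
```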

Figure 11. Calibration errors between units result in ghosting artifacts in regions where projectors overlap. (left) The character as rendered by a single projector, and (right) in an overlapping region.

Controlling Player Movement
In traditional video games, the user is represented as an avatar in a virtual world. The game designer has explicit control over the actions that are possible in the virtual world, including where the user's avatar can walk, what objects they can interact with, etc. In a RoomAlive experience, a player's movement and actions are completely controlled by the user, and are therefore uncontrollable by the game designer. Imagine a game that involves building physical pillow forts for a virtual snow fight. The game designer cannot stop a user from knocking over another user's pillow fort. A game designer cannot control where a user walks or which surfaces they interact with. Everything is fair game. Therefore, the designer must take care to handle edge cases and ensure that the game mechanics guide the user into desirable behavior.

SYSTEM LIMITATIONS
While RoomAlive enables new and exciting interaction possibilities in the living room, there are several challenges.
