A High Performance Framework For Agent Based Pedestrian Dynamics On GPU .


A HIGH PERFORMANCE FRAMEWORK FOR AGENT BASED PEDESTRIANDYNAMICS ON GPU HARDWAREPaul RichmondDr Daniela M. RomanoDepartment of Computer ScienceUniversity of SheffieldRegent Court, 211 PortobelloSheffield, UKWeb: http://www.dcs.shef.ac.uk/ paulE-mail: paul@dcs.shef.ac.ukKEYWORDSGraphics, Parallel methods, Interactive simulation, Realtime simulation, Scientific visualization.ABSTRACTPedestrian simulations have recently focused on top-downimplementations ignoring more computationally intensiveagent based dynamics. This paper presents a framework foragent based modelling on the Graphics Processing Unit(GPU) which demonstrates large scale pedestrian simulationand rendering. GPU hardware offers significantperformance for dynamic large crowd simulations, howeverthe process of mapping computational tasks to the GPU isnot trivial and expert knowledge is required. An agent basedspecification technique is therefore presented, which allowsthe underlying GPU data storage and agent communicationto be hidden. The framework allows the use of static mapsto set static environment obstacles and a zoning technique isdescribed for route planning. Parallel population feedbackroutines are also used to implement Level of Detail (LOD)rendering which avoids any costly CPU data read-back.INTRODUCTIONAgent based dynamics are particularly well suited toreplicating pedestrian and general crowd behaviours. Theindividual style of modelling closely resembles the cognitiveprocess of decision making and allows complex interactionsto emerge from specifying a number of simple rules. Whilstearlier work (Reynolds 1999) has focused on thespecification of such rules in agent based environments,more recent work uses non agent based approximationtechniques (Treuille et al. 2006, Courty and Musse 2005) toavoid direct evaluation of inter-agent interactions. Suchtechniques are computationally cheaper and often map toGraphics Processing Unit (GPU) hardware offeringsignificant performance improvements. This has becomeparticularly important as urban environment visualisationhas scaled in accordance to GPU hardware requiring farlarger population sizes at interactive rates.The trend towards utilising the GPU for dynamicssimulation is well documented (Harris et al. 2002). Theparallel nature of GPU processing offers a massive speed upopportunity for algorithms, which are written withparallelism in mind. In the case of large scale real-timepedestrian dynamics, where populations can reach tens ofthousands, utilisation of the GPU is almost a requirement.Without maintaining positional data of pedestrians in GPUmemory, the transfer from system memory quickly becomesa significant bottleneck and limiting factor. GPU basedalternatives have the significant advantage of maintainingdata where it is required for rendering, eliminating thisproblem entirely.Despite the obvious performance advantages of harnessingthe power of the GPU, a significant disadvantage is thedifficulty of mapping computational tasks to the GPUsarchitecture. Whilst CPU based programming allowssimultaneous read write access to DRAM, the data flow ofthe GPU is more closely resembled by a stream processor.Traditionally this has been made available to programmersthough varying programmable stages (the vertex, geometryand pixel stages). Those that use the GPU for generalpurpose computation, or General Purpose computation onGPUs (GPGPU), have used these programmable stageswithin a traditional graphics environment by placing datainto textures and using the fragment processor to invoke aquad of parallel processes. More recently the introduction ofNVIDIA CUDA and unified programmable stage processorshas allowed more direct access to graphics hardware,removing the requirement to deal directly with textures andgraphics primitives. Whilst this is advantageous the lack ofstandardisation between hardware vendors makes traditionalgraphics GPGPU the most flexible option. Further moreexpert knowledge of the underlying hardware is required ineither case to achieve the best performance results.This paper presents an agent based framework forpedestrian modelling on the GPU using graphics basedGPGPU. Its primary goals are to hide the underlying GPUdata storage and agent communication, allowing user focuson agent based pedestrian scripting. It builds upon previouswork (Richmond and Romano 2008) allowing C likescripting to be used for agent behaviour with acomplementary Object Orientated API for setting up initialconditions and initial agent data. Static maps can be set

allowing agent interaction with obstacles which are alsoused to demonstrate route planning through theenvironment. Pedestrian animation and rendering isimplemented through a key framed animation techniquewith a level of detail system which is maintained entirely onthe GPU. Whilst the agent based pedestrian dynamics usedin this paper are based on simple but well established work(described in the following section), it is advised that thedescribed framework is suitable for inclusion of moreadvanced cognitive and emotional models (Romano et al.2005).Much attention in this work is given to a spatial partitioningtechnique which distributes pedestrians across theprocessing node with communication between them handledwith a Message Passing Interface (MPI) library. Thistechnique, as also demonstrated more generically by theFlexible Large-scale Agent Modelling Environment(FLAME) (Adra et al 2008), is one which inspires thiswork. The use of spatial partitions is essential in producinglarge simulations and scalable algorithms, particularly withrespect to pedestrians where there is no need for totalcommunication between all agents.AGENT BASED PEDESTRIAN DYNAMICSCoutry and Musse (Courty and Musse 2004) present a GPUimplementation of pedestrian dynamics which is inspired byHelbings social forces model (Helbing and Molnar 1995).Its uses a cellular system which allows agents to movebetween empty cells in discreetly partitioned space. As eachcell occupies at most a single agent the social repulsive forceis calculated on the fly during the update stage byconsidering a neighbourhood of surrounding cells.Improving upon this by avoiding the potential of agentsoccupying the same grid space, Courty and Musse (Courtyand Musse 2005) propose an implementation of adynamically created force field texture. This technique usesGPU hardware blending to compute the sum of all forces foreach discrete partition. The result is rendered to an offscreen texture, the force field texture, and is then usedduring the update stage to avoid direct communicationbetween pedestrians is the system. Similarly to this Treuilleet al. (Treuille et al. 2006) uses the same technique toconstruct a density field of pedestrians which is in turn usedto determine a more complex discomfort field. Thediscomfort field is constructed in real time by consideringlocal collision avoidance and global path planning.Performance of this technique is dependant on theresolution of the discomfort grid. With a grid of 60² Treuille(Treuille et al. 2006) reported performance of 10,000pedestrians 5fps before rendering in-between frames.Agent Base Modelling (ABM) of dynamic systems gainedpopularity through Craig Reynolds who, in 1987demonstrated flocking bird behaviour through an agentbased Boids model (Reynolds 1987). Unlike dynamicsystems controlled through sets of partial differentialequations, ABM examines the low level individualbehaviours which when part of a social system result inemergence of non predictable social events. In the case ofthe Boids model the low level behaviour can be described bythree simple rules; separation at close range, attraction tothe perceived flock centre and velocity matching ofperceived neighbours. This concept is extended in Reynoldslater work (Reynolds 1999) to include advanced behaviourssuch as pursuit, evasion, obstacle avoidance and flow fieldfollowing for general agent locomotion.Helbing (Helbing et al 2002) introduced the social forcesmodel as a class of equations which balance variousenvironmental and social influences on pedestrians in bothnormal and panic situations. Essentially this builds uponprevious work (Helbing and Molnar 1995) whichmathematically formalises the concept of social forces as abehavioural description of conflict situations. Mostimportant in this model is the description of a repulsivesocial force acting between pedestrians. In its simplest form,this force acts as a mechanism for collision avoidancegiving greater precedence to situations in font of thepedestrian, over situations occurring behind. Additionallylonger range attractive environment forces such as attractionto window displays, special attractions or acquaintances aremodelled using a force summation which are combined withthe social repulsive force. In emergency situations anadditional short range body force and sliding friction forceis described which avoids compression in high densitypopulations. Whilst simplistic with respect to the dynamicalrules, complex natural pedestrian activity can be observed inHelbings (Helbing and Molnar 1995) work. Phenomenonsuch as lane formation and intersection dynamics are justtwo of the emergent behaviours empirically observed in realworld conditions.Building upon Helbings (Helbing and Molnar 1995) work,whilst considering large population sizes, Quin et al. (Quinet al. 2003) demonstrated a technique which used aSWARM cluster of 10 processors to simulate 10,000evacuating pedestrians using the social forces method.As recent pedestrian modelling has tended towards lessagent based techniques, it is no surprise that recent highperformance agent based simulations are concentrated onmore general agent based and swarming applications.Reynolds more recent work on PSCrowd (Reynolds 2006),demonstrated through schools of fish, is analogous topedestrian behaviour and presents a 2D environmentperformance of 15’000 low resolution (36 polygon) fish at60fps. The technique used to implement this is based on theprevious work (Reynolds 1999) and uses a spatiallypartitioned 3D or 2D environment. The high performancespeed is achieved by utilising the Playstation3’s cellprocessor to batch a number of spatial buckets across to thecell processors 8 parallel Synergistic Processing Units(SPU’s). Unlike Reynolds earlier work, the neighbourhoodheuristic in this case is based on N-nearest neighbour, witheach agent maintaining its own neighbourhood list. Asinteraction is not based on a defined radius or limitedvision, this technique is highly dependant on the size of theneighbourhood lists.

More general agent based work has recently beendemonstrated by D’Souza et al. (D’Souza et al. 2007). Thiswork concentrates on implementing a number of 2D agentbased models on the GPU with particular attention toperformance. Similar to the previously described methods,spatial partitioning is used however each partition isresponsible for maintaining only a single agent. Althoughthis likens the system to that of cellular one, agents havecontinuous values and collisions (multiple agents occupyingthe same space) are resolved using a multi pass prioritysystem. Agent communication is achieved in the samefashion cellular automaton and performs an expandingsearch into neighbouring cells. Whilst this technique boastsa performance of up to 2 million simple agents in real time,it is important to understand the limitations of such atechnique when applied to a pedestrian system. Firstly finegrained partitioning requires large amounts of space. If alarger partition size is used, space is saved with the trade-offthat there will be more costly collisions to resolve. Secondlyif agents are well distributed, particularly into local groupsacross a large area then large amounts of partitioned spaceare wasted by holding no useful data. Decreasing thepartition size again helps to resolve this however as in thefirst case higher concentrated areas have to then resolveadditional collisions.AGENT BASED MODELLING ON THE GPUAs this work aims to present a useable framework for agentbased pedestrian specification, rather than a singlesimulation example, the mapping of agent data to the GPUis a process which it is important to abstract. The ideabehind this is that both agent scripting and theprogramming interface should be able to get and set agentdata variables without explicit knowledge of the underlyingGPU memory architecture. This is achieved through atranslation function ‘F’ which provides a mapping for bothagent update scripts and the getting and setting of initialagent data into a number of 2D stacked 32bit floating pointtextures (agent space). Similarly to previous work onparticle systems (Latta 2004, Kipfer et al. 2004) a 1D list ofvariables is easily translated into 2D texture space with a 2Dposition (i, j) in each stacked texture ‘t’ representing anindividual set of data. As texture access is read/write only,‘2t’ textures are required in total with data being stored inup to all four of the red, green, blue and alpha colourchannels respectfully (figure 1). Within OpenGL the FrameBuffer Object extension allows simultaneous writing toMultiple Render Targets (MRTs), on our implementationhardware (a NVIDIA GeForce 8800 card) up to 8 targets aresupported giving a total of 32 agent variables. For apotential non communicating system of agents, simulationis achievable by performing N parallel operations byrendering a single quad primitive. Assuming the quadprimitive is the same dimension as agent space andrendered from an orthogonal perspective, rasterisation willinvoke a parallel operation for each of the (i, j) agentpositions.Figure 1 - Mapping of three agent variables to 2D TextureSpaceIn many of the previous examples partition space is used toeither directly hold agent data or hold pointers to data inagent space. Similarly this work splits space into discretespatial partitions however the partition size is equal to theagent’s communication radius with each partition holdingany number of agents. Agent communication can then beachieved by consulting all agents within an agents own andneighbouring partitions. On the CPU this task is trivial andeach agent in a partition can be stored as a linked list. Onthe GPU this is significantly more difficult as dynamicstorage sizes are unsuitable for parallel computation. Insteada dynamic structure is computed before each agent updateand adapting techniques used in rigid body particles physics(Harada 2007, Green 2007) a matrix is calculated allowingeach agent within each spatial partition to be located inagent space. The first stage of the technique involves sortingthe agents in agent space depending on their location. Aswith particle based work (Kipfer et al 2004, Green 2007)each partition is assigned a unique spatial bin identifierbased on their x and y position. Agents can easily determinetheir location identifier and are sorted using cache efficientbitonic sorting (Govindaraju et al. 2005) into a theoretical1D (stored physically in 2D texture space) list according tothis. A vertex scattering technique is then applied, where foreach agent a vertex is drawn at the origin with texturecoordinates allowing each of the agent sort values to belooked up from agent space. Each sort value is compared tothat of the previous agent and if the sort value is the first inthe sorted 1D list then a linear search is performed until thelast sort value is also found. Assuming an off-screen buffersize with the same dimensions as partition space is used, thestart and end positions of each partition identifier can bescattered by offsetting the vertices position with the startand end indices preserved as texture coordinates. Verticeswhich are processed and are found not to be boundaryindices are simply positioned outside of the off-screen bufferand ignored. The end result of this is that each occupied

partition space in the off screen matrix contains the startand end positions of all agents within it. This is then used inthe same way as a CPU list to iterate through each agenttesting for agents within the communication radius. Whilstnot all agents tested will be suitable for communication thisimproves dramatically on the worst case O(n²) and unlikenearest neighbour techniques guarantees communication forall agents within the set radius.to store an x and y (goal) point value for each of the zones.These are used as goal points which the agent will movetowards. In environments where zones consist of complexobstacles with narrow gateways such as door ways,experimentation has shown that the doorway itself must usea zoned area. This step ensures that pedestrians intelligentlypass through gateways avoiding a situation where theredesired path is blocked.AGENT BEHAVIOUR SCRIPTINGUsing the API to simply get and set global 2D data it ispossible to encode environmental forces (Courty and Musse2005) avoiding the costly computation of each agent wallcombination as described in Helbings model (Helbing andMolnar 1995). As the granularity of this environmentalforce field texture is independent from agent interactionlarge grained force fields can be used, providing they havesufficient detail to capture the smallest static obstacle. 2Ddata arrays are stored in GPU memory as textures and as aresult up to 4 values can be stored in each array. For thepurposes of obstacle avoidance the first two components ofthe array (red and green) are enough to hold a directionalvelocity. In the remaining components of the array theexamples in this paper store a height map value, used forrendering static geometry at a later stage and a non regularenvironment space identifier. This identifier has the purposeof zoning the environment to allow longer range pathplanning where pedestrian’s behaviour is beyond that of arandom walk. Figure 2 demonstrates an environment mapwith (black) static obstacles split into 6 zones. Twoadditional 2D data sets encode firstly a navigation lookupgrid which indicates the next zone to move into from thecurrent zone (vertical) to the desired long range goal zone(horizontal). Secondly an additional set of data is requiredzone 1zone 2zone 3zone 5zone 6Goal XGloa Y123456zone 4ZoneZoneZoneZoneZoneZoneUse of the agent based pedestrian API is dependant on anappropriately defined agent update script. Agent scripts aredefined in a C like environment and must override afunction, agentMain, accepting an agent and globalvariables structure as arguments and returning a singleagent structure. As reference to variables in the agentstructures are mapped at compile time with the previouslydescribed mapping function, agent update scripts are free toreference agent variables directly. With respect tocommunication, agent scripts use two placeholdersFOR EACH AGENT A, and END FOR EACH to hide theunderlying algorithm. Further vision test can be applied toeach agent if the desired behaviour requires this. For agentpopulations with no environmental influences this is all thatis required to compute an agent’s new internal variables andhence provide a convincing simulation. The globalsargument to the main update function offers an extension tothis by allowing global static variables to be set betweeneach update stage. Single Float and 2D data array valuescan be set using the programming interfacesAgentPopulation class. In addition to this role the class alsohandles the initial setup, agent script compilation (to validcg code), agent initialisation and stepping of the simulationloop.Zone 1Zone 2Zone 3Zone 4Zone 5Zone neZone123456xxxxxxyyyyyyFigure 2 – A simple illustrative zoned environment withcorresponding data tablesPEDESTRIAN BEHAVIOURThe pedestrian behaviour in this paper is influenced mainlyby Reynolds work (Reynolds 1999) however as many factorsdraw strong parallels with Helbings social forces model(Helbing and Molnar 1995) the exact rules are somewhat ofa hybrid. Formally the following equation (1) is able todescribe the force exerted on each pedestrian during theupdate stage for all examples within this paper.Fi Ri Cri Gi Mi(1)The total force exerted on each agent Fi , is the result of asocial repulsion force Ri , a close range interaction force Cri ,a short term goal force Gi and an environmental force fieldforce Mi . This force is then used as a directional steeringforce to make some change to the pedestrian internalvelocity. This altered velocity is then checked to ensure itdoes not exceed the pedestrians maximum velocity and ifrequired the velocity is normalised accordingly. The socialrepulsion force uses the same preference to events in thedirect line of sight as in Helbings (Helbing and Molnar1995) model. The symbol λ is simplified to represent ascalar value indicating the size of angle between thepedestrian i’s line of sight and pedestrian j’s position.Additionally as in Reynolds work (Reynolds 1999) agents

are given a limited vision which filters agents outside theirfield of view. More formally the following equation (2)describes the force Ri where the letter j represents onlyagents from within the limited vision filter.0 IRi S ij Pi Pj i 2j ( Pi Pj ) (2)The static value S indicates a scalar value controlling theinfluence of the social repulsive force. The positions Pi andPj represent the positions of agent i and j respectfully andthe distance between them represents the directional forcevector between the two agents. This force is further scaledby the inverse square of the distance between the two agents.The value I is used to scale the effect of the inverse distanceeffect and in most examples has been tweaked depending onthe interaction radius between agents. Unlike the socialrepulsive force, the force Cri is independent of the directionbetween agents. Its influence is over a far smaller radius arerarely has effect on pedestrians unless there is a highconcentration in which it acts mainly as collision avoidance.The force Gi acts as an influence towards a specific goalpoint in the environment. In the case of a random walk thegoal position is directly in front of the pedestrianencouraging them to follow their current path. In casesinvolving longer range goals, this goal value is found as aresult of environmental lookups as described in the previoussection.PEDESTRIAN VISUALISATIONPedestrian rendering is achieved through two very differentmethods. The first of these uses a primitive directionaltriangle. In this case each triangle primitive is rendered atthe origin with multi texture coordinates reflecting an agentposition in (i, j) agent space. A vertex shader then looks upagent positional and velocity data and translates and rotatesthe primate object accordingly. As an entire populationrepresented as primitive objects contains relatively fewOpenGL calls in total, the entire population can be stored ina single display list. For more advanced pedestrianrepresentations where individual model sizes become muchlarger this technique quickly becomes unsuitable. In suchcases it is necessary to store only a single modelrepresentation in a display list. The single display list isthen called for each agent with the multi texture coordinatevalue set before the display list is called, allowing eachinstance of the model to be translated by a differing set ofagent values.As pedestrians do not move as static objects it is importantto animate them with walking behaviour as they movearound the environment. Key-framing provides an idealanimation technique as it is computationally cheap and iseasily capable of representing simple human locomotion.Through experimentation it is evident that reasonablewalking behaviour can be achieved through interpolationbetween only two key frames. For improved fidelity multiplekey frames can be used however as all draw calls are storedin a single display list it is necessary to store the positionaland normal information for each key frame model in thelist. Whilst this has a visually improved effect on animationof close pedestrians the overall performance degradationmakes interpolation between two key frames the preferredoption.With the previous technique every pedestrian in thepopulation is rendered with the same detail level. A moresuitable technique is to therefore apply a LOD renderingsystem which varies the pedestrian’s fidelity depending ondistance to the viewer.This is achieved through the use of a generalised feedbacksystem available for retrieving data about the agentpopulation without CPU read-back from the graphics card.Parallel reduction is used to reduce values in agent space tosingular values for a number of common reduction functionssuch as minimum, maximum, sum and count. For thepurposes of a LOD system is it required that the totalnumber of each detail level is know. An agent variabletherefore is used to hold a LOD level and is calculatedduring the agent update stage. A reduction function for eachdetail level then uses a filtered count function, counting onlythe number of occurrences of the specified detail level. Afterthe parallel reduction is complete the agent data must thenbe sorted according to the LOD levels. This ensures thatcalling a display list for each detail level, the number oftimes reported by the feedback step, matches the detaillevels to the agent data. Whilst this technique iscomputationally more expensive due to the secondary sort, itallows a massive reduction in rendering overheads whenhigh resolution models are required. This technique isdemonstrated in figure 3 which shows pedestrians colouredby their corresponding LOD. Additionally the sametechnique can be applied to achieve variance in pedestrianrepresentation. In this case varying pedestrian models eachwith an associated value are used within the population withthe value used as feedback and sorting key. This techniquealso allows simultaneous LOD rendering as long as eachLOD pedestrian representation combination has anassociated identifier and display list representation.

Figure 3 - 65'000 fully interacting agent based pedestrains rendered by LOD levelPERFORMANCE AND RESULTSIn order to test the simulation and rendering performances arange of pedestrian population sizes have been tested usingvarying rendering fidelities. The results obtained are basedon a single PC with an AMD Athlon 2.51Ghz Dual CoreProcessor with 3GB of RAM and a GeForce 8800 GT. Inall cases pedestrian behaviour incorporates all forcesdescribed in equation 1. Figure 4 additionally demonstrate amore complex force map representing the Peace Gardensarea of Sheffield city centre. The results of a pedestriansimulation compared to satellite imagery are also show infigure 9. Simple zoning is demonstrated in figure 5 and hasan immeasurable performance effect on the simulationcompared to the following results presented in this section.Figure 4 - A Force Map of Sheffield Peace GardensFigure 5 - Zoning around a congestion point (Black toWhite boundaries represent walls)Simulation performance is considered by increasing thepopulation size whilst maintaining a roughly staticpopulation density. Figure 6 demonstrates the populationdensity by showing the number of pedestrians considered forcommunication (lookups) and the number actually insidethe pedestrians communication radius (communications).As the population density is constant the communications tolookup ratio remains at roughly 35% in all cases. Figure 7shows a performance chart which demonstrates the effect ofincreased population sized on performance. Two pedestrianvision (or interaction) radii are used, the first of 4m issuggested in Helbings work (Helbing and Molnar 1995)whilst the second 32m radius acts to demonstrate theperformance in cases of longer range social force planningor higher congestion population densities. In both cases anenvironment force map is used to simply direct pedestriansaway form the outer edges of the environment. From theresults it is clear that the simulation performance which alsoincludes the basic direction triangle rendering is suitable for

large population sizes. More specifically interactivepopulation sizes of 262’144 pedestrians can be maintainedat 13 fps or 65’536 at 42 fps for a 4m pedestrian vision.LookupsCommunications350300250200Detail Level 0400Detail Level 1Detail Level 2832Vision (m)Figure 6 - Pedestrian vision compared to pedestrian lookupsand inter-agent communications5504m vision32m vision500450Frames per Second (FPS)Billboards450Fram es per Second (FP 655362621441048576Pedestrain CountFigure 7 - Simulation and rendering performance forprimitive agents with 4m and 32m visionIn order to consider more advanced rendering polygonmodels of increasing complexity have been used (detail level0 represents lowest complexity, detail level 2 the highest).Figure 8 demonstrates that even at relatively highpopulation sizes of 16384 pedestrians, high detail models(level 2) of up to 1000 polygons can be used at 40fps. Themore modest pedestrian representation of 400 polygons(level 1) with the same population size achieves aperformance of 40 fps and the more intelligent dynamicLOD rendering improves further achieving 50fps. What isnoticeable is the cost of using a dynamic LOD for agentpopulation sizes of less than 1024. In fact when agent sizesare this small the overhead of the additional sort stepsuggests that high resolution models for agents can be usedwhilst sustaining a performance of almost 200fps.256102440961638465536Pedestrian Population2621441048576Figure 8 - Simulation and rendering performance foradvanced pedestrian rendering at various detail levelsWith respect to previous work the performance of thesimulations in this paper are beyond that of existing socialforces implementations (Courty and Musse 2004, Courtyand Musse 2005). Whilst improvements in GPU hardwarehave some part to play in this, the decision to m

environment. Pedestrian animation and rendering is implemented through a key framed animation technique with a level of detail system which is maintained entirely on the GPU. Whilst the agent based pedestrian dynamics used in this paper are based on simple but well established work