
Audio Engineering Society Convention Paper
Presented at the 145th Convention, 2018 October 17–20, New York, NY, USA

This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention paper has been reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. This paper is available in the AES E-Library, http://www.aes.org/e-lib. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Audio Forensic Gunshot Analysis and Multilateration

Robert C. Maher and Ethan R. Hoerr
Electrical & Computer Engineering, Montana State University, Bozeman, MT, USA 59717-3780
Correspondence should be addressed to R.C. Maher (rob.maher@montana.edu)

ABSTRACT
This paper considers the opportunities and challenges of acoustic multilateration in gunshot forensics cases. Audio forensic investigations involving gunshot sounds may consist of multiple simultaneous but unsynchronized recordings obtained in the vicinity of the shooting incident. The multiple recordings may provide information useful to the forensic investigation, such as the location and orientation of the firearm, and, if multiple guns were present, addressing the common question "who shot first?" Sound source localization from multiple recordings typically employs time difference of arrival (TDOA) estimation and related principles known as multilateration.
In theory, multilateration can provide a good estimate of the sound source location, but in practice acoustic echoes, refraction, diffraction, reverberation, noise, and spatial/temporal uncertainty can be confounding.

1 Introduction

Audio forensic evidence is of increasing importance in law enforcement investigations because of the growing use of personal audio/video recorders carried by officers on duty, the routine use of dashboard audio/video systems in police cruisers, and the increasing likelihood that images and sounds from criminal incidents will be captured by public or private surveillance systems or by mobile handheld devices. In some cases, gunshots and other firearm sounds may be captured by these recording devices [1, 2, 3].

In the United States, criminal actions involving firearms are of ongoing concern to law enforcement and the public. For example, the Bureau of Justice Statistics reported that over 467,000 individuals experienced a nonfatal criminal incident involving a firearm in 2011, and that over 11,000 individuals died from firearm homicides that year [4].

In some cases, the sound of gunfire may be recorded simultaneously by two or more different microphones located in proximity to the scene of the shooting. Audio forensic examiners may be asked to reconstruct a shooting scene from these multiple audio recordings using the relative time-of-arrival of the gunshot sound at the several microphone locations, assuming the recordings are somehow synchronized in time and the precise spatial locations of the microphones are known. Although commercial gunshot detection and localization systems are available that use deliberately deployed, time-synchronized microphones at known locations [5], in a forensic reconstruction it is more likely that the audio forensic examiner will need to work with an ad hoc collection of recordings from mobile audio recording devices that happened to be in acoustic range of the gunshot. Along with estimating the proper time synchronization, the examiner will also have to estimate the microphone positions, potential reflections, and the local speed of sound, which is determined from an estimate of the air temperature.

This paper is organized as follows. First, we review the mathematical formulation of acoustic multilateration, the proper general term for techniques that estimate the location of a sound source given a set of spatially distributed observing microphones. Next, we present an example scenario involving gunshot observations and representative forensic questions typical of audio forensic cases.
We simulate acoustical observations with various degrees of uncertainty about the time synchronization and microphone positions. Finally, we summarize the practical considerations of which audio forensic examiners need to be aware.

2 Acoustic multilateration

When a source produces a sound pulse, the pulse will generally arrive at slightly different times at two spatially separated microphones. The difference in arrival time is due to the difference in path length from the source to each of the two microphones, assuming a constant speed of sound. The time difference of arrival (TDOA) at two known microphone locations identifies a locus of possible positions of the source with respect to the two microphones: all positions whose difference in distance from the two microphones produces the measured TDOA. Source position estimation based upon TDOA is known as multilateration [6].¹

¹ The TDOA multilateration approach is sometimes erroneously referred to as triangulation, which is a different procedure based on angle measurements.

For example, consider a simplified two-dimensional (planar) case with two microphones located at known Cartesian coordinates (x1, y1) and (x2, y2), respectively, and the sound source at an unknown location (x, y), as shown in Figure 1. The pulse produced by the source at (x, y) arrives at the Mic 1 position Δt seconds before it arrives at the Mic 2 position.

Figure 1: Example geometry for unknown source position and known microphone pair.

In general, the absolute time at which the sound pulse occurs at (x, y) is not known, only the relative time of arrival at the two microphones, Δt. Using the speed of sound, c, in meters per second, the path length difference (path2 − path1) is cΔt meters. Thus, any point (x, y) for which (path2 − path1) equals cΔt could be the location of the pulse source. From the example geometry,

    path1 = sqrt((x − x1)^2 + (y − y1)^2)        (1)

    path2 = sqrt((x − x2)^2 + (y − y2)^2)        (2)

    path2 − path1 = cΔt                          (3)

Accordingly, the unknown (x, y) coordinates of the pulse source must satisfy Equation 3, given the TDOA Δt. Since there are two unknowns and only one equation, there are many possible solutions. The solutions can be found numerically or analytically.

To find the analytical solution, it is mathematically convenient to rotate and translate the coordinate system so that the origin lies at the midpoint of the line connecting the two microphones, with that line serving as the x-axis (abscissa) of the Cartesian system. This plan-view geometry is shown in Figure 2. The microphones are located at (−x0, 0) and (x0, 0), and any point (x′, y′) along the curve is a possible source location. For example, in this coordinate system the point (−xa, 0) represents a solution such that cΔt = (x0 + xa) − (x0 − xa), or xa = cΔt/2.

Figure 2: Multilateration geometry with translated and rotated coordinate system.

In the new coordinate system, the equations equivalent to Equations 1–3 become

    path1 = sqrt((x′ + x0)^2 + (y′)^2)           (4)

    path2 = sqrt((x′ − x0)^2 + (y′)^2)           (5)

with path2 − path1 = cΔt = 2xa. Inserting (4) and (5) into this difference relation, squaring and expanding to isolate the radical, then squaring again to simplify into a form with x′ and y′, the result is the familiar hyperbola formula:

    (x′)^2 / xa^2 − (y′)^2 / (x0^2 − xa^2) = 1   (6)

where

    xa^2 = (cΔt)^2 / 4                           (7)

A compact calculation based upon aligning the origin of the coordinate system at the closest sensor is also possible [7].

From (6) and (7), it becomes apparent that the multilateration expression has a singularity if cΔt becomes equal to the microphone spacing, 2x0.
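As a numerical check on this derivation, every point on the appropriate branch of the hyperbola of Equations 6 and 7 reproduces the measured TDOA. The following sketch uses assumed example values for x0 and Δt (they are not taken from this paper):

```python
import math

# Numerical check of the hyperbola form (Equations 6-7).
# The values of c, x0, and dt below are assumed for illustration.
c = 343.2          # speed of sound, m/s (air temperature 20 C)
x0 = 5.0           # half the microphone spacing: mics at (-x0, 0) and (x0, 0)
dt = 0.01          # assumed TDOA in seconds; requires c*dt < 2*x0
xa = c * dt / 2.0                  # Equation 7: xa = c*dt/2
b2 = x0**2 - xa**2                 # denominator term of Equation 6

# Sample points on the left branch (x' <= -xa), the branch closer to
# Mic 1, where path2 - path1 is positive:
for yp in [0.0, 1.0, 5.0, 20.0]:
    xp = -xa * math.sqrt(1.0 + yp**2 / b2)   # solve Equation 6 for x'
    path1 = math.hypot(xp + x0, yp)          # Equation 4
    path2 = math.hypot(xp - x0, yp)          # Equation 5
    # Each point on the hyperbola reproduces the measured TDOA:
    assert abs((path2 - path1) - c * dt) < 1e-9
```

The loop confirms that the path length difference equals cΔt everywhere along the branch, which is why a single TDOA constrains the source to a curve rather than a point.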
In other words, referring to Figure 2, the time difference of arrival Δt has its maximum value when the sound source is in line with the x-axis, at a position with |x′| ≥ x0 and y′ = 0, for which the hyperbola degenerates into a ray along the axis. In general, if the microphones are closely spaced (small x0), the acceptable range of Δt is correspondingly small (|Δt| ≤ 2x0/c), so the precision of the time difference measurement is particularly important, as will be considered later.

With N ≥ 3 microphones in a two-dimensional plane, there are N − 1 TDOA values relative to a reference microphone, and these mic-to-mic differences can be used to create N − 1 hyperbolas. In principle, the analytical procedure resulting in Equations 6 and 7 can be repeated for each time difference, and the resulting intersection point of the hyperbolas would, at least theoretically, represent the location of the sound source.

In practice, the audio recordings used to determine the TDOA will contain noise, so the time estimates will have some level of uncertainty. A timing discrepancy in the N − 1 Δt values means that the solution for the intersection of the multiple hyperbolas is also uncertain. Solving multiple nonlinear equations in the presence of noise is typically handled with numerical solvers rather than a closed-form analytical solution, and such methods are applicable to TDOA multilateration.

If the sound source is not an impulse, the time delay estimate can be obtained using cross-correlation of the received signals. The reliability of the time estimates will depend upon the characteristics of the signal and of any interfering noise entering the correlation operation.

3 The forensic situation

As described in the introduction, a relatively common contemporary scenario in audio forensic gunshot investigations involves several audio recordings of a shooting incident captured simultaneously by multiple unsynchronized recording systems positioned in an indiscriminate manner.
If the incident takes place out of the view of the camera(s), or occurs at night or otherwise without good recorded images, the audio information may be very important to the investigation. Such a scenario (see Figure 3) could involve the sound of gunfire recorded simultaneously by several dashboard camera systems in law enforcement vehicles, body recorders worn by officers, nearby commercial or residential surveillance systems, and even mobile recording devices such as cell phone video [8, 9, 10, 11].

Figure 3: Scenario with multiple simultaneous but unsynchronized audio recordings from spatially distributed microphones.

Questions that may arise for the audio forensic examiner could include [12]:

- What is the estimated location of the first gunshot with respect to a reference position, based upon the audio evidence?
- Were the second and subsequent shots fired from the same position as the first shot?

3.1 Position estimation

In order to attempt multilateration using the available audio recordings, the first requirement is a reliable estimate of the position of each recording device at the incident scene. Depending upon the circumstances and the type of recordings available, this estimation may be relatively straightforward, or it may be quite ambiguous. The audio forensic examiner would need to determine an appropriate spatial reference point, such as an individual with a microphone who is visible in one or more of the dashcam videos and who is standing at a spatially identifiable spot, such as near a street sign or fence post. The other recording devices would then need to be located with respect to the known reference point.

In some circumstances there may be survey information, diagrams, and maps prepared by crime scene analysts. In other circumstances there may be still photographs of the scene, witness recollections, or other sources of spatial information. A common challenge is that recordings from vest cameras or mobile handheld devices are often made while the individual is moving, not stationary in a fixed and identifiable location. The audio forensic examiner will need to assess all of the information and the uncertainty associated with the estimated recording positions.

3.2 Synchronization

The second requirement for multilateration is synchronization. The multiple recordings generally do not share a common clock, although dashboard camera systems often have at least two audio channels recorded synchronously: one channel is typically from a microphone inside the cabin of the vehicle, and the other is often fed by a wireless microphone worn by the law enforcement officer. The various recordings may also have different sampling rates and formats.

The first step is to collect and catalog all of the available recordings. Files in a compressed format (e.g., MP3 or a proprietary codec) need to be decoded into standard PCM .wav files. It is also convenient to perform high-quality sampling rate conversion so that all of the available audio is at a common high sampling rate, such as 48 kHz.

The second step is to start with recording channels that are known to be synchronized, such as the dashboard recordings mentioned above. If multiple dashcams were at the scene, a useful strategy is to identify an audible signal that is common to all of the dashcams.
One possible solution is to identify a signal transmitted from the law enforcement dispatch center and picked up simultaneously by the radios in the various cruisers: the cabin microphone in each dashboard recorder would capture the dispatch signal, and this audible signal common to the cabin recordings can be used as a time reference to align the associated recordings.

If no dispatcher radio signal is available, and for other bystander recordings, the examiner would need to find another common signal from a known location at the scene and use that signal for time alignment.

3.3 TDOA estimation

In the case of gunshots, the onset of the sound may be clearly observable in each of the synchronized audio recordings. The audio forensic examiner would need to use a waveform display program to identify the corresponding time instants, or use a cross-correlation procedure to find the time delay corresponding to the best alignment.

However, there are often significant problems in trying to interpret distorted recordings if the microphone system was close enough to the sound source that the signal overloaded the recording system. For example, the output of some audio recording systems essentially drops out when presented with extremely loud sounds (Figure 4).

If the recording device uses a speech encoding system (e.g., VSELP), the recording may not be able to represent an impulsive signal such as a gunshot in a manner suitable for time-of-arrival determination. In other cases the microphone may not be in a direct line-of-sight position with respect to the gunshot, so the received signal has actually reflected off nearby surfaces, diffracted around obstacles, or otherwise traveled over a path not radial to the sound source. The audio forensic examiner needs to consider all of these issues [12].
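The cross-correlation procedure for TDOA estimation can be sketched as follows. This is a minimal illustration with synthetic signals, not the authors' implementation; the pulse shape, sample rate, and noise level are assumed for the example.

```python
import numpy as np

def tdoa_by_crosscorrelation(sig_ref, sig_other, fs):
    """Estimate the TDOA (in seconds) of sig_other relative to sig_ref.

    Minimal sketch: full cross-correlation, with the peak index
    converted to a lag in samples. A positive result means sig_other
    lags sig_ref.
    """
    xc = np.correlate(sig_other, sig_ref, mode="full")
    lag_samples = int(np.argmax(xc)) - (len(sig_ref) - 1)
    return lag_samples / fs

# Synthetic example: the same decaying, gunshot-like pulse arriving
# 120 samples later in the second channel, with a little noise added.
fs = 48000
rng = np.random.default_rng(0)
pulse = np.exp(-np.arange(200) / 30.0)       # assumed impulsive shape
a = np.zeros(4800)
b = np.zeros(4800)
a[500:700] = pulse
b[620:820] = pulse
a += 0.01 * rng.standard_normal(a.size)
b += 0.01 * rng.standard_normal(b.size)

dt = tdoa_by_crosscorrelation(a, b, fs)
print(dt)   # 120 samples / 48000 Hz = 0.0025 s
```

As noted above, the reliability of this estimate depends on the signal and the interfering noise: clipping, dropouts, or codec distortion can flatten or shift the correlation peak.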

Figure 4: Example forensic audio recording with two simultaneous channels, from a confidential law enforcement source. Upper: a wireless vest microphone, showing signal dropouts due to overloading. Lower: a microphone in the cabin of the law enforcement vehicle, showing clipping but no dropout.

4 Example investigation

To contemplate several practical questions related to multilateration from heterogeneous audio recordings of a gunshot incident, consider the following two-dimensional example scenario.

- A gun is fired at an arbitrary location: [-20 meters, 36 meters].
- The sound is observed at four arbitrary microphone locations, for what we will refer to as Configuration 1: [-7, -3], [-3, 1], [2, 0], and [5, -1] meters.
- The distances from the source to the microphones are 41.11 m, 38.91 m, 42.19 m, and 44.65 m (microphone 2 is closest).
- Using speed of sound c = 343.2 m/s (air temperature 20 °C), the theoretical TDOAs with respect to microphone 2 are: Δt1,2 = 6.408 ms, Δt3,2 = 9.556 ms, and Δt4,2 = 16.736 ms.

The plan view of this 2-D example configuration is shown in Figure 5.

Figure 5: Example Configuration 1.

4.1 Multilateration using Configuration 1

Running the multilateration algorithm with the specified microphone coordinates and the expected TDOAs to simulate a forensic location estimation task, the calculated source position is [-18.92, 33.59] meters, an estimation error of [-1.08, 2.41] meters compared to the true location. The discrepancy is due to the numerical calculations used in the multilateration. Position discrepancies become more likely if the source is located in a position that is in line with the inter-sensor vector, because this orientation obscures the source distance.

4.2 Multilateration using Configuration 2

Next, we keep the source location fixed but consider a different scenario in which the microphones are spread slightly farther apart, at new arbitrary locations, giving Configuration 2: [-10, -3], [-4, 1], [5, 0], and [10, -1] meters, as shown in Figure 6.

In this configuration the multilateration algorithm calculates the source position to be [-19.97, 35.88] meters, an estimation error of [-0.03, 0.12] meters compared to the true location. Wider sensor spacing can provide greater inter-sensor time delays and lower sensitivity to timing and position errors.

4.3 Position uncertainty

But if the microphone locations themselves are uncertain, what happens to the multilateration solution? Using Configuration 1 and Configuration 2, we perform a Monte Carlo simulation in which we randomly move each of the four simulated microphone positions within a 0.5 meter radius of its specified position, while keeping the source location unchanged.
The TDOA information is then used to find the estimated source location, which is compared to the true location.

Figure 6: Example Configuration 2 (microphones spread slightly more in the horizontal direction).

While not a perfect simulation of the uncertainty found in a real forensic case, the results are intended to show how an imperfect set of microphone location estimates contributes to uncertainty in the multilateration result. The estimated source location compared to the true location for Configuration 1, with the 0.5 meter microphone position uncertainty, is shown in Figure 7. The position discrepancies can give substantial estimation errors, although the magnitude of the error varies for different microphone locations. Additional review indicates that the estimated direction (azimuth) of the source is often nearly correct, which may be useful in certain circumstances.

Performing 1000 trials with Configuration 2 (microphones spaced slightly farther apart) and again moving the microphone positions randomly within a 0.5 meter radius, the estimated source location compared to the true location is shown in Figure 8.
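The kind of estimation task described in this section can be sketched as follows, using the Configuration 1 values quoted above. This is not the authors' solver; the direct grid-search minimization and the perturbation scheme are assumptions for illustration, so the exact error figures reported in the paper will not necessarily be reproduced.

```python
import numpy as np

# 2-D TDOA multilateration by direct search (a sketch, not the paper's
# numerical method), using the Configuration 1 values from Section 4.
c = 343.2                                   # speed of sound, m/s
mics = np.array([[-7.0, -3.0], [-3.0, 1.0], [2.0, 0.0], [5.0, -1.0]])
ref = 1                                     # microphone 2 is the reference
tdoa = {0: 6.408e-3, 2: 9.556e-3, 3: 16.736e-3}   # t_i - t_ref, seconds

def tdoa_misfit(points, mics, tdoa, ref):
    """Sum of squared TDOA residuals for an array of candidate points."""
    d = np.linalg.norm(points[:, None, :] - mics[None, :, :], axis=2)
    err = np.zeros(len(points))
    for i, dt in tdoa.items():
        err += ((d[:, i] - d[:, ref]) - c * dt) ** 2
    return err

def locate(mics, tdoa, ref):
    # Coarse grid search over +/- 60 m, then one fine refinement pass.
    center = np.zeros(2)
    span = 60.0
    for step in (1.0, 0.01):
        xs = np.arange(center[0] - span, center[0] + span + step, step)
        ys = np.arange(center[1] - span, center[1] + span + step, step)
        gx, gy = np.meshgrid(xs, ys)
        pts = np.column_stack([gx.ravel(), gy.ravel()])
        center = pts[np.argmin(tdoa_misfit(pts, mics, tdoa, ref))]
        span = 1.5                          # shrink the window for refinement
    return center

est = locate(mics, tdoa, ref)
print(est)        # close to the true source at [-20, 36]

# Monte Carlo in the spirit of Section 4.3: perturb each microphone
# within a 0.5 m radius (perturbation scheme assumed) and re-estimate.
rng = np.random.default_rng(1)
for _ in range(3):      # a few trials for illustration; the paper uses 1000
    r = rng.uniform(0.0, 0.5, 4)
    th = rng.uniform(0.0, 2.0 * np.pi, 4)
    jitter = np.column_stack([r * np.cos(th), r * np.sin(th)])
    print(locate(mics + jitter, tdoa, ref))
```

With the unperturbed microphone positions the search lands near the true source; with perturbed positions the scatter of the estimates illustrates the sensitivity discussed above.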

Figure 7: Simulated source location estimation error for Configuration 1 (in meters) for 1000 trials with 0.5 meter uncertainty in microphone positions.

Figure 8: Simulated source location estimation error for Configuration 2 (in meters) for 1000 trials with 0.5 meter uncertainty in microphone positions.

As found with Configuration 1, Configuration 2 also shows substantial estimation errors due to the position discrepancy, but the sensitivity is reduced: more of the estimates are close to the true value with the wider spacing of Configuration 2.

From the simulation, it is clear that uncertainty in the microphone positions, even as seemingly small as 0.5 meters, can lead to substantial errors in the multilateration estimate of the source position. An audio forensic examination involving an ad hoc collection of uncertain microphone positions must proceed with caution regarding any conclusions about the estimated source location.

5 Conclusions

This paper has described several considerations regarding practical sound source localization from multiple recordings, assuming a two-dimensional geometry. The well-known multilateration procedure employs time difference of arrival (TDOA) information at the microphone positions. In theory, multilateration provides a good estimate of the sound source location. In practice, there can be difficulty in determining the TDOA values in the presence of noise, acoustic echoes, refraction, diffraction, and reverberation. As demonstrated using simulated geometries, uncertainty about the spatial locations of the sensors can lead to errors in the multilateration results. An audio forensic examiner should take this uncertainty into account when performing calculations using data from multiple unsynchronized recording systems positioned in an indiscriminate manner [13].

References

[1] R.C. Maher, "Audio forensic examination: authenticity, enhancement, and interpretation," IEEE Sig. Proc. Mag., vol. 26, pp. 84-94 (2009).

[2] R.C. Maher, "Lending an ear in the courtroom: forensic acoustics," Acoustics Today, vol. 11, no. 3, pp. 22-29 (2015).

[3] R.C. Maher and S.R. Shaw, "Deciphering gunshot recordings," Proc. Audio Eng. Soc. 33rd Conf. Audio Forensics—Theory and Practice, Denver, CO (2008).

[4] Bureau of Justice Statistics, Office of Justice Programs, U.S. Department of Justice, 11pr.cfm.

[5] G.L. Duckworth, D.C. Gilbert, and J.E. Barger, "Acoustic counter-sniper system," Proc. of SPIE, Command, Control, Communications, and Intelligence Systems for Law Enforcement, E.M. Carapezza and D. Spector, eds., pp. 262-275 (1997).

[6] W.R. Fried and M. Kayton, Avionics Navigation Systems, Wiley (1969).

[7] B.T. Fang, "Simple solutions for hyperbolic and related position fixes," IEEE Trans. on Aerospace Elect. Sys., vol. 26, no. 5 (1990).

[8] B.M. Brustad and J.C. Freytag, "A survey of audio forensic gunshot investigations," Proc. Audio Eng. Soc. 26th Conf. Audio Forensics in the Digital Age, Denver, CO (2005).

[9] R.C. Maher, "Modeling and signal processing of acoustic gunshot recordings," Proc. IEEE Signal Processing Society 12th DSP Workshop, Jackson Lake, WY, pp. 257-261 (2006).

[10] R.C. Maher, "Acoustical characterization of gunshots," Proc. IEEE SAFE Workshop on Signal Processing Applications for Public Security and Forensics, Washington, D.C. (2007).

[11] S.D. Beck, H. Nakasone, and K.W. Marr, "Variations in recorded acoustic gunshot waveforms generated by small firearms," J. Acoust. Soc. Am., vol. 129, no. 4, pp. 1748-1759 (2011).

[12] R.C. Maher, "Gunshot recordings from a criminal incident: who shot first?" J. Acoust. Soc. Am., vol. 139, no. 4, p. 2024 (2016). Lay language version available.

[13] National Academy of Sciences, Strengthening Forensic Science in the United States: A Path Forward, National Academy Press, Washington, DC (2009).
