Approximate Optimal Motion Planning to Avoid Unknown Moving Avoidance Regions


Patryk Deptula, Hsi-Yuan Chen, Ryan A. Licitra, Joel A. Rosenfeld, and Warren E. Dixon, Fellow, IEEE

IEEE Transactions on Robotics, vol. 36, no. 2, April 2020

Abstract—In this article, an infinite-horizon optimal regulation problem is considered for a control-affine nonlinear autonomous agent subject to input constraints in the presence of dynamic avoidance regions. A local model-based approximate dynamic programming method is implemented to approximate the value function in a local neighborhood of the agent. By performing local approximations, prior knowledge of the locations of avoidance regions is not required. To alleviate the need for a priori knowledge of the number of avoidance regions in the operating domain, an extension is provided that modifies the value function approximation. The developed feedback-based motion planning strategy guarantees uniformly ultimately bounded convergence of the approximated control policy to the optimal policy while also ensuring that the agent remains outside the avoidance regions. Simulations are included to demonstrate the preliminary development for a kinematic unicycle and a generic nonlinear system. Results from three experiments are also presented to illustrate the performance of the developed method, where a quadcopter achieves approximate optimal regulation while avoiding three mobile obstacles. To demonstrate the developed method, known avoidance regions are used in the first experiment, unknown avoidance regions are used in the second experiment, and an unknown time-varying obstacle directed by a remote pilot is included in the third experiment.

Index Terms—Data-based control, learning and adaptive systems, motion and path planning, neural and fuzzy control, optimization and optimal control.

Manuscript received February 7, 2019; accepted November 12, 2019. Date of publication December 10, 2019; date of current version April 2, 2020. This article was recommended for publication by Associate Editor S.-J. Chung and Editor P. Robuffo Giordano upon evaluation of the reviewers' comments. This work was supported in part by the National Science Foundation (NSF) under Award 1509516, in part by the Office of Naval Research under Grant N00014-13-1-0151, and in part by the Air Force Office of Scientific Research (AFOSR) under Award FA9550-19-1-0169. The work of P. Deptula was done prior to joining The Charles Stark Draper Laboratory, Inc. The work of H.-Y. Chen was done prior to joining Amazon Robotics. (Corresponding author: Patryk Deptula.)

P. Deptula is with the Perception and Autonomy Group, The Charles Stark Draper Laboratory, Inc., Cambridge, MA 02139 USA (e-mail: pdeptula@draper.com). H.-Y. Chen is with Amazon Robotics, North Reading, MA 01864 USA (e-mail: hsiyuc@amazon.com). R. A. Licitra and W. E. Dixon are with the Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, FL 32611 USA (e-mail: rlicitra@ufl.edu; wdixon@ufl.edu). J. A. Rosenfeld is with the Department of Mathematics and Statistics, University of South Florida, Tampa, FL 33620 USA (e-mail: rosenfeldj@usf.edu).

Digital Object Identifier 10.1109/TRO.2019.2955321

I. INTRODUCTION

Many challenges exist for real-time navigation in uncertain environments. To operate safely in an uncertain environment, an autonomous agent must identify and react to possible collisions. In practice, challenges come from limitations in computational resources, sensing, communication, and mobility. Hence, robot navigation, motion planning, and path planning continue to be an active research area (cf., [1] and references therein).

Because motion and path-planning strategies need to account for environmental factors with various uncertainties, they can be divided into two groups: global and local approaches [2]. Global planners seek the best trajectory by using models of the entire environment, are computed before a mission begins, and tend to provide high-level plans (cf., [3]–[7]). Local planners (sometimes referred to as reactive methods) plan only a few time steps forward based on limited knowledge obtained from sensory data; hence, they have the advantage of providing optimal feedback if the agent is forced off of its original path, but they may need to be recomputed online (cf., [7]–[10]). Since complex operating conditions present significant navigation, guidance, and control challenges (i.e., agents' dynamics, obstacles, disturbances, or even faults), online feedback-based control/guidance algorithms with online learning and adaptation capabilities are essential for replanning and execution in dynamically changing and uncertain environments. Constrained optimization methods can be leveraged to generate guidance/control laws for agents operating in complex environments. However, agents often exhibit nonlinear dynamics and navigate in environments with uncertain dynamics or constraints, which makes the determination of analytical solutions to constrained optimization problems difficult. Traditional guidance/control solutions exploit numerical methods to generate approximate optimal solutions. For instance, approaches may use pseudospectral methods, they may solve the Hamilton–Jacobi–Bellman (HJB) equation offline via discretization and interpolation, or viscosity solutions can be computed offline before a mission begins (cf., [3], [11]–[14]). Such results may provide performance guarantees; however, numerical nonlinear optimization problems are typically computationally expensive (often preventing real-time implementation), especially as the dimension of the system increases. Generally, numerical methods are unable to consider uncertainty in the dynamics or environment and are ill suited for dynamically changing environments because new guidance/control solutions would need to be recalculated offline in the event of a change in the environment. Such challenges motivate the use of approximate optimal control methods that use parametric function approximation techniques capable of approximating the solution to the HJB online (cf., [15]–[26]).
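For context on the equation these parametric methods target, the infinite-horizon HJB for a control-affine system with instantaneous cost $r(x, u)$ takes the standard stationary form below. This is the generic textbook statement, not the article's exact formulation, which adds input constraints and avoidance penalties:

$$0 = \min_{u \in \mathbb{R}^m} \left[\, r(x, u) + \nabla V^*(x)\left(f(x) + g(x)\,u\right) \right]$$

where $V^*$ denotes the optimal value function and $\nabla V^*$ its gradient with respect to the state. ADP methods parameterize $V^*$ (e.g., with NN weights) and adjust the parameters online so that the HJB residual is driven toward zero.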

Further complicating the task of optimal motion planning are agent actuator constraints and state constraints (e.g., static or mobile avoidance regions) often present en route to an objective. Certain avoidance regions may remain undiscovered until they fall within a given detection range. The concept of avoidance control was introduced in [7] for two-player pursuit-evasion games. Results such as [9], [10], and [27]–[29] have since used navigation functions for low-level control with collision avoidance in applications such as multiagent systems. Other results, such as [30]–[32], have considered collision avoidance in multiagent systems with limited sensing by using bounded avoidance functions in the controller, which are only active when agents are within a defined sensing radius. The results in [9] and [28]–[32] do not consider optimal controllers and, in certain cases, do not consider control constraints. Compared to such results, which do not consider optimality, work such as [33] utilizes unbounded avoidance functions to explicitly compute optimal controllers for cooperative avoidance in multiagent systems. Moreover, results such as [34]–[36] develop sets of feasible states along with safe controllers using reachability methods, such as [14], by formulating differential games between two players. Furthermore, results such as [37]–[41] approach collision avoidance problems through the use of collision cones in conjunction with other methods based on the engagement geometry between two point objects. In such works, dynamically moving objects are modeled by quadric surfaces, and collision conditions are derived for dynamic inversion-based avoidance strategies between agents. Despite this progress, the results in [33] rely on explicitly computed controllers, which are unknown when the optimal value function is unknown, and while results such as [37]–[40] establish a framework for providing collision cones, they are still combined with methods which may not necessarily be optimal (cf., [41]). Although results such as [14] and [34]–[36] provide optimality guarantees, they rely on numerical techniques, which tend to be computationally intensive and need to be re-solved when conditions change.

Over the last several years, model predictive control (MPC) has gained attention for its capability to solve finite-horizon optimal control problems in real time (cf., [8], [42]–[45]). MPC has been applied to a plethora of optimization problems and is known for handling complex problems, such as multiobjective problems, point-to-point trajectory generation problems, and collision avoidance (cf., [8], [42]–[45]). Specifically, works such as [8] consider multiobjective MPC frameworks for autonomous underwater vehicles with different prioritized objectives, where the main objective is path convergence while the secondary objective differs (e.g., speed assignment, which can be sacrificed at times in lieu of better performance on path convergence), or the objective is purely trajectory generation, as in [42] and [43], where the goal is point-to-point trajectory generation (i.e., offline multiagent trajectory generation or trajectory generation for constrained linearized agent models). Unlike the aforementioned MPC results, results such as [44] and [45] take advantage of MPC's capacity for fast optimization by combining it with other methods when considering collision avoidance problems.
Although MPC has been shown to be effective in motion/path planning and obstacle avoidance problems, the system dynamics are generally considered in discretized form, and at each time step a finite-horizon optimal control problem needs to be solved in which a sequence of control inputs is generated. Even in the absence of obstacles, MPC methods generally do not yield an optimal policy over the complete trajectory since new solutions need to be recomputed at the end of each time horizon. Specifically, limited-horizon methods, such as MPC, often require linear dynamics (cf., [42], [43]) or at least known dynamics (cf., [8], [42]–[45]). Since, in practice, the environment and agents are prone to uncertainties, motivation exists to use parametric methods, such as neural networks (NNs), to approximate optimal controllers online for continuous-state nonlinear systems.

In recent years, approximate dynamic programming (ADP) has been successfully used in deterministic autonomous control-affine systems to solve optimal control problems [15]–[18], [46], [47]. By utilizing parametric approximation methods, ADP methods approximate the value function, which is the solution to the HJB equation and is used to compute the optimal policy online and forward in time. Input constraints are considered in [19]–[21] by using a nonquadratic cost function [48] to yield a bounded approximate optimal controller.

For general nonlinear systems, generic basis functions, such as Gaussian radial basis functions, polynomials, or universal kernel functions, are used to approximate the value function. One limitation of these generic approximation methods is that they only ensure an approximation over a compact neighborhood of the origin. Once outside the compact set, the approximation error tends to either grow or decay depending on the selected functions. Consequently, in the absence of domain knowledge, a large number of basis functions, and hence a large number of unknown parameters, are required for value function approximation. A recent advancement in ADP utilizes computationally efficient state-following (StaF) kernel basis functions for local approximation of the value function around the current state, thereby reducing the number of basis functions required for sufficient value function approximation [22], [49]–[51]. The authors in [49] utilized the StaF approximation method to develop an approximate optimal online path planner with static obstacle avoidance. However, the development in [49] used a transitioning controller which switched between the approximate controller and a robust controller when the obstacles were sensed.
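To make the StaF idea concrete, the following minimal sketch evaluates a local value function approximation with kernel centers that ride along with the agent's state. It is an illustration only: the Gaussian kernels, the offsets, the width, and the weight vector are assumptions chosen for this example, not the formal StaF construction of [22], [49]–[51], which also supplies the online weight update laws.

```python
import numpy as np

def staf_features(y, x, offsets, width=0.5):
    """Local basis evaluated at a query point y, with kernel centers
    that follow the agent's current state x (StaF-style placement)."""
    centers = x + offsets                    # centers move with the state
    d = y[None, :] - centers                 # (p, n) query-to-center differences
    return np.exp(-np.sum(d**2, axis=1) / (2.0 * width**2))

# Hypothetical usage: p = 3 kernels around a 2-D state.
rng = np.random.default_rng(0)
offsets = 0.25 * rng.standard_normal((3, 2))  # fixed offsets in a small ball (assumed)
x = np.array([1.0, -0.5])                     # current agent state
W_hat = rng.standard_normal(3)                # weight estimates (learned online in ADP)

# Approximate value near the current state; valid only in a local window around x.
V_hat = W_hat @ staf_features(x, x, offsets)
```

Because only a handful of kernels is maintained near the current state, the number of unknown weights stays small, which is the computational advantage the article exploits.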
Inspired by advances in [22]–[26], [49], and [50], an approximate local optimal feedback-based motion planner is developed in this article that considers input and state constraints with mobile avoidance regions. The developed method differs from numerical approaches, such as [15]–[26], and MPC approaches, such as [42] and [43], because this article provides an online closed-loop feedback controller whose computational efficiency is provided by the local StaF approximation method. Moreover, the agent's trajectory is not computed offline; instead, the agent adjusts its trajectory online when it encounters an obstacle. Compared to works such as [9] and [28]–[32], which do not consider optimality, the controller designed in this article is based on an optimal control formulation that provides an approximate optimal control solution. In addition, unlike [49] and other path planners, this method tackles the challenge of avoiding dynamic avoidance regions within the control strategy without switching between controllers. Since the StaF method uses local approximations, it does not require knowledge of uncertainties in the state space outside an approximation window. Local approximations of the StaF kernel method can be applied when an agent is approaching avoidance regions, represented as (n-1)-spheres and not known a priori, in addition to state and system constraints. Because the avoidance regions become coupled with the agent in the HJB, their respective states must be incorporated when approximating the value function. Hence, a basis is given for each region which is zero outside of the sensing radius but is active when the avoidance region is sensed. In applications such as station keeping of marine craft (e.g., [52]), knowledge of the weights for an avoidance region may provide useful information, as the approximation of the value function can be improved every time the region is encountered. To prevent collision, a penalizing term is added to the cost function which guarantees that the agent stays outside of the avoidance regions. A Lyapunov-based stability analysis is presented that guarantees uniformly ultimately bounded convergence while also ensuring that the agent remains outside of the avoidance regions. This work extends the preliminary results in [53]. Unlike the preliminary work in [53], this article provides a unique value function representation and approximation, the actor update law is modified, and a more detailed stability analysis is included. The significance of this work over [53] is the mathematical development that considers an uncertain number of avoidance regions by transforming the autonomous value function approximation into a nonautonomous approximation. Because time does not lie on a compact set, it cannot be used directly in the StaF NNs; hence, a transformation is performed so that a bounded signal of time is leveraged in the NNs. Moreover, experimental validations are presented to illustrate the performance of the developed path-planning strategy.

Notation

In the following development, $\mathbb{R}$ denotes the set of real numbers, $\mathbb{R}^n$ and $\mathbb{R}^{n \times m}$ denote the sets of real $n$-vectors and $n \times m$ matrices, and $\mathbb{R}_{\geq a}$ and $\mathbb{R}_{>a}$ denote the sets of real numbers greater than or equal to $a$ and strictly greater than $a$, respectively, where $a \in \mathbb{R}$. The $n \times n$ identity matrix, the column vector of ones of dimension $j$, and the zero matrix of dimension $m \times n$ are denoted by $I_n$, $1_j$, and $0_{m \times n}$, respectively; hence, if $n = 1$, $0_{m \times n}$ reduces to a vector of zeros. The partial derivative of $k$ with respect to the state $x$ is denoted by $\nabla k(x, y, \ldots)$, while the transpose of a matrix or vector is denoted by $(\cdot)^T$. For a vector $\xi \in \mathbb{R}^m$, the notations $\mathrm{Tanh}(\xi) \in \mathbb{R}^m$ and $\mathrm{sgn}(\xi) \in \mathbb{R}^m$ are defined as $\mathrm{Tanh}(\xi) \triangleq [\tanh(\xi_1), \ldots, \tanh(\xi_m)]^T$ and $\mathrm{sgn}(\xi) \triangleq [\mathrm{sgn}(\xi_1), \ldots, \mathrm{sgn}(\xi_m)]^T$, respectively, where $\tanh(\cdot)$ denotes the hyperbolic tangent function and $\mathrm{sgn}(\cdot)$ denotes the signum function. The notation $U[a, b]1_{n \times 1}$ denotes an $n$-dimensional vector whose entries are selected from a uniform distribution on $[a, b]$, and $1_{n \times m}$ denotes an $n \times m$ matrix of ones.
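As an illustration of where the $\mathrm{Tanh}(\cdot)$ notation appears, under the nonquadratic input-penalty formulation of [48] used in [19]–[21] (stated here in its generic form; the article's own policy additionally accounts for avoidance regions), minimizing the constrained-input HJB yields a saturated policy of the form

$$u^*(x) = -\bar{u}\,\mathrm{Tanh}\!\left(\frac{1}{2\bar{u}}\,R^{-1} g(x)^T \left(\nabla V^*(x)\right)^T\right)$$

where $\bar{u} \in \mathbb{R}_{>0}$ is the input bound, $R$ is a positive-definite input-weight matrix, and $V^*$ is the optimal value function, so each input channel is automatically confined to $[-\bar{u}, \bar{u}]$.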
II. PROBLEM FORMULATION

Consider an autonomous agent with control-affine nonlinear dynamics given by

$$\dot{x}(t) = f(x(t)) + g(x(t))\,u(t) \tag{1}$$

for all $t \in \mathbb{R}_{\geq t_0}$, where $x : \mathbb{R}_{\geq t_0} \to \mathbb{R}^n$ denotes the state, $f : \mathbb{R}^n \to \mathbb{R}^n$ denotes the drift dynamics, $g : \mathbb{R}^n \to \mathbb{R}^{n \times m}$ denotes the control effectiveness, $u : \mathbb{R}_{\geq t_0} \to \mathbb{R}^m$ denotes the control input, and $t_0 \in \mathbb{R}_{\geq 0}$ denotes the initial time. In addition, consider dynamic avoidance regions with nonlinear dynamics given by

$$\dot{z}_i(t) = h_i(z_i(t)) \tag{2}$$

for all $t \in \mathbb{R}_{\geq t_0}$, where $z_i : \mathbb{R}_{\geq t_0} \to \mathbb{R}^n$ denotes the state of the center of the $i$th avoidance region and $h_i : \mathbb{R}^n \to \mathbb{R}^n$ denotes the drift dynamics of the $i$th zone, $i \in \mathcal{M} \triangleq \{1, 2, \ldots, M\}$, where $M$ is the number of avoidance regions in the state space $\mathbb{R}^n$.(^1) The dynamics in (2) are modeled as autonomous and isolated systems to facilitate the control problem formulation. The representation of the dynamics in (2) would require complete knowledge of the dynamics over the entire operating domain. However, motivated by real systems in which agents may only have local sensing, it is desired to consider only the zone inside a detection radius. Therefore, to alleviate the need for the HJB to require knowledge of the avoidance region dynamics outside of the agent's ability to sense the obstacles, the avoidance regions are represented as

$$\dot{z}_i(t) = F_i(x(t), z_i(t))\,h_i(z_i(t)) \tag{3}$$

for all $t \in \mathbb{R}_{\geq t_0}$. In (3), $F_i : \mathbb{R}^n \times \mathbb{R}^n \to [0, 1]$ is a smooth transition function that satisfies $F_i(x, z_i) = 0$ for $\|x - z_i\| \geq r_d$ and $F_i(x, z_i) = 1$ for $\|x - z_i\| \leq \bar{r}$, where $r_d \in \mathbb{R}_{>0}$ denotes the detection radius of the system in (1), and $\bar{r} \in (r_a, r_d)$, where $r_a \in \mathbb{R}_{>0}$ denotes the radius of the avoidance region. From the agent's perspective, the dynamics of the obstacles do not affect the agent outside of the sensing radius.

Remark 1: In application, a standard practice is to enforce a minimum avoidance radius to ensure safety [30], [31]. In addition, the detection radius $r_d$ and the avoidance radius $r_a$ depend on system parameters such as the maximum agent velocity limits.

Assumption 1: The number of dynamic avoidance regions $M$ is known; however, the state of each region is unknown until it is within the sensing radius of the agent. Section VII presents an approach to alleviate Assumption 1.

Assumption 2: The drift dynamics $f$ and $h_i$ and the control effectiveness $g$ are locally Lipschitz continuous, and $g$ is bounded such that $0 < \|g(x(t))\| \leq \bar{g}$ for all $x \in \mathbb{R}^n$ and all $t \in \mathbb{R}_{\geq t_0}$, where $\bar{g} \in \mathbb{R}_{>0}$. Furthermore, $f(0) = 0$, and $\nabla f : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ is continuous.

Assumption 3: The equilibrium points $z_{ie}$ of the obstacles given by the dynamics in (3) lie outside of a ball of radius $r_d$ centered at the origin; that is, the origin is sufficiently clear of obstacles. Furthermore, obstacles do not trap the agent, meaning

(^1) The terms avoidance regions and obstacles are used interchangeably.
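Returning to the transition function in (3): the article only requires the smoothness and boundary properties stated above, so the following Python sketch is one possible construction under assumed radii. The function names and the exp(-1/s)-based bump are illustrative choices, not taken from the article.

```python
import numpy as np

def _bump(s):
    """Standard C-infinity building block: exp(-1/s) for s > 0, else 0."""
    return np.exp(-1.0 / s) if s > 0.0 else 0.0

def transition(x, z_i, r_bar, r_d):
    """A smooth F_i(x, z_i): equals 1 for ||x - z_i|| <= r_bar, equals 0 for
    ||x - z_i|| >= r_d, and transitions smoothly in between (r_a < r_bar < r_d)."""
    rho = np.linalg.norm(x - z_i)
    a = _bump(r_d - rho)      # vanishes once the obstacle is outside detection
    b = _bump(rho - r_bar)    # vanishes once the obstacle is well inside detection
    return a / (a + b)        # denominator stays positive whenever r_bar < r_d

def sensed_obstacle_dynamics(x, z_i, h_i, r_bar, r_d):
    """Obstacle model (3): the drift h_i only influences the formulation
    while the obstacle is within the agent's sensing radius."""
    return transition(x, z_i, r_bar, r_d) * h_i(z_i)

# Hypothetical usage: a circling obstacle sensed by an agent at the origin.
h_i = lambda z: np.array([-z[1], z[0]])        # assumed obstacle drift
x, z_i = np.array([0.0, 0.0]), np.array([1.5, 0.0])
dz = sensed_obstacle_dynamics(x, z_i, h_i, r_bar=0.5, r_d=2.0)
```

With this construction, $F_i$ is identically zero beyond the detection radius, so the agent's HJB never needs obstacle information it cannot sense, which is exactly the role (3) plays in the formulation.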

The goal is to simultaneously design and i
