RoboCup Challenges. Robotics. Simulation League Small League Medium-sized League (less interest) SONY Legged League Humanoid League

Similar documents
Waiting for Affordable Housing in NYC

Network Analysis: Minimum Spanning Tree, The Shortest Path Problem, Maximal Flow Problem. Métodos Cuantitativos M. en C. Eduardo Bustos Farías 1

Oligopoly Theory (6) Endogenous Timing in Oligopoly

Marginalized kernels for biological sequences

Hunting the Elusive Within-person and Between-person Effects in Random Coefficients Growth Models

Housing market and finance

ECON 522- SECTION 4- INTELLECTUAL PROPERTY, FUGITIVE PROP- 1. Intellectual Property. 2. Adverse Possession. 3. Fugitive Property

The Improved Net Rate Analysis

Network Analysis: Minimum Spanning Tree,

Economics. Oligopoly. Measuring Market Concentration. In this chapter, look for the answers to these questions: N. Gregory Mankiw

THE LEGAL AND FINANCIAL FRAMEWORK OF AN EFFICIENT PRIVATE RENTAL SECTOR: THE GERMAN EXPERIENCE

On the Relationship between Track Geometry Defects and Development of Internal Rail Defects

Optimization of Multiple Related Negotiation through Multi-Negotiation Network

Oligopoly. Introduction: Between Monopoly and Competition. In this chapter, look for the answers to these questions: Two extremes

Nonlocal methods for image processing

11.433J / J Real Estate Economics Fall 2008

Oligopoly. Introduction: Between Monopoly and Competition. In this chapter, look for the answers to these questions: Two extremes

SAS at Los Angeles County Assessor s Office

Cube Land integration between land use and transportation

Automatic Cryptanalysis of Block Ciphers with CP

Overview of OR Modeling Approach & Introduction to Linear Programming

Oligopoly Theory (8) Product Differentiation and Spatial Competition

Negative Gearing and Welfare: A Quantitative Study of the Australian Housing Market

Online Appendix "The Housing Market(s) of San Diego"

Bargaining position, bargaining power, and the property rights approach

Influence of Digital Computer Technology on Architectural Design Teaching Mode

Sorting based on amenities and income

METROPOLITAN COUNCIL S FORECASTS METHODOLOGY

United States High Power Rocket Sporting Code

Homes.com Fusion Portal. Connect Your Members with Quality Leads ALLIANCE NETWORK MEMBER

United States Soccer Federation. Introduction to Referee Assigning

Analyzing Ventilation Effects of Different Apartment Styles by CFD

The Analytic Hierarchy Process. M. En C. Eduardo Bustos Farías

CUDAHY MUNICIPAL CODE

Course Descriptions Real Estate and the Built Environment

Urbanization and Property Rights

Why Good Architects Act as Chameleons

IFRS Training. IAS 38 Intangible Assets. Professional Advisory Services

Comparative Housing Market Analysis: Minnetonka and Surrounding Communities

Solar Open House Toolkit

Incentives for Spatially Coordinated Land Conservation: A Conditional Agglomeration Bonus

Depreciation and Depletion

Land Rent. Economics 312 Martin Farnham

Resolving Conflicts at Complex Intersections June 20, 2012

Edward Mitchell AIA; Yale University and Edward Mitchell Architects, New Haven CT

A Brief Overview of H-GAC s Regional Growth Forecast Methodology

The New Starts Grant and Affordable Housing A Roadmap for Austin s Project Connect

Chapter 1. Introduction: Some Representative Problems. CS 350: Winter 2018

Groupe de Recherche en Économie et Développement International. Cahier de recherche / Working Paper 04-06

Robustness of multi-objective optimization of building refurbishment to suboptimal weather data

Metro Boston Perfect Fit Parking Initiative

Evacuation Design Focused on Quality of Flow

Waverley Elementary School Feasibility Study

RULES OF THE GEORGIA STATE BOARD OF REGISTRATION FOR PROFESSIONAL ENGINEERS AND LAND SURVEYORS CHAPTER APPLICATIONS TABLE OF CONTENTS

Multifamily Owners: Including Utilities May Be Killing Your Profits Learn how to protect your NOI

Article 7. URBAN CENTER (C-) NEIGHBORHOOD CONTEXT

Game theory. Análisis Económico de la Empresa. M. En C. Eduardo Bustos Farías 1

The Home Selling Process

METROPOLITAN COUNCIL S FORECASTS METHODOLOGY JUNE 14, 2017

The Proposal of Cadastral Value Determination Based on Artificial Intelligence

Click to edit Master title style

Mortgage Market Institutions and Housing Market Outcomes

Figure 1. The chart showing how the effort and cost of the design changes are affected as the project progresses (Anon.) Simulation tools are a key co

ZONING CITY ACREAGE PERCENT OF CITY ACREAGE TOTAL. Residential Low (RL) 1, % Residential Medium (RM) % Residential High (RH) 228.

Real Estate Reference Material

Real Estate Services Proposal

38 th & Blake Height Amendments: Public Meeting #5 Building Design Comments July 13 th, 2016

Proposed FASB Staff Position No. 142-d, Amortization and Impairment of Acquired Renewable Intangible Assets (FSP 142-d)

Financial Accounting Standards Committee

2011 AICP Review Course

7224 Nall Ave Prairie Village, KS 66208

For legal reasons, we cannot and will not respond to messages asking for more information about a property.

*Predicted median absolute deviation of a CASA value estimate from the sale price

DATA APPENDIX. 1. Census Variables

Residential - Mixed Housing Urban and Suburban zones

Professional - Loyal - Dedicated. Compliments of: Liz Wright Buyers Specialist

JAMES G. PROPP. Department of Mathematical Sciences Phone: (617) University of Massachusetts Lowell

86M 4.2% Executive Summary. Valuation Whitepaper. The purposes of this paper are threefold: At a Glance. Median absolute prediction error (MdAPE)

We contacted all RNOs in the area to come to their meetings and personally explain the draft, and take questions. Four RNOs took us up on the offer,

Software Architecture Context

14.471: Fall 2012: Recitation 4: Government intervention in the housing market: Who wins, who loses?

Public Review of the Slot Home Text Amendment


Ad-valorem and Royalty Licensing under Decreasing Returns to Scale

Sell Your House in DAYS Instead of Months

Naked Exclusion with Minimum-Share Requirements

Village of Scarsdale

Design idea on planning skill training system of real estate development projects in colleges and universities

** PRELIMINARY DRAFT** An Agent-Based Model of Exurban Land Development

Financial Computer Systems Inc. (203)

John Barbian Carolyn Barbian. January 5, 2013

Mutual Exchanges Policy

UNIT FIVE RATIONAL EXPRESSIONS 18 HOURS MATH 521B

for taxation 2019 Finnish revaluation of land Presented at the FIG Working Week 2017, May 29 - June 2, 2017 in Helsinki, Finland

RBC-Pembina Home Location Study. Understanding where Greater Toronto Area residents prefer to live

Chapter 7. Objective1 Assess Costs and Benefits of Renting. Chapter Objectives

MULTIFAMILY PERFORMANCE PROGRAM Technical Topic New Buildings Savings Opportunity - Stairwell Lighting

Winning with Foreclosures

Companies are grouped into four types based on how they choose office space to rent.

The Impact of Balconies on Wind Induced Ventilation of Singlesided Naturally Ventilated Multi-storey Apartment

Transcription:

Robotics (c) 2003 Thomas G. Dietterich 1 RoboCup Challenges Simulation League Small League Medium-sized League (less interest) SONY Legged League Humanoid League (c) 2003 Thomas G. Dietterich 2

Small League Overhead camera Central controlling computer for each team Fast and agile Winning teams have had the best hardware (c) 2003 Thomas G. Dietterich 3 Small League: CMU vs. Cornell @ American Open (2003) (c) 2003 Thomas G. Dietterich 4

Medium-Sized League Largest fully-autonomous robots Has been plagued by hardware challenges (c) 2003 Thomas G. Dietterich 5 Humanoid League Still demonstrating technology and skills (kicking, vision, localization) (c) 2003 Thomas G. Dietterich 6

SONY Legged League CMU vs. New South Wales (1999) (c) 2003 Thomas G. Dietterich 7 CMU vs. New South Wales (2002) (c) 2003 Thomas G. Dietterich 8

Special Purpose Vision Bright Light Dim Light (c) 2003 Thomas G. Dietterich 9 What the Dog Sees (c) 2003 Thomas G. Dietterich 10

Making Sense of Sensing P(Image t CameraPose t, World t ) P(CameraPose t BodyPose t ) P(BodyPose t BodyPose t-1, Action t ) P(World t World t-1 ) argmax Wt P(W t I t,a t,)= argmax Wt Wt-1,Ct,Ct-1,Bt,Bt-1 P(I t W t,c t ) P(W t W t-1 ) P(C t B t ) P(B t B t-1,a t ) = argmax Wt Ct P(I t W t,c t ) Wt-1 P(W t W t- 1 ) Bt P(C t B t ) Bt-1 P(B t B t-1,a t ) B t-1 A t B t C t I t W t-1 W t (c) 2003 Thomas G. Dietterich 11 The World Locations (and orientations and velocities) of self ball other players on same team players on other team (c) 2003 Thomas G. Dietterich 12

Actions Actions can be described at many levels of detail low level actions: moving body joints intermediate level actions: walking gaits, shooting and passing motions, localization motions, celebration dances learned or programmed prior to the game higher-level actions: shoot on goal, pass to X, keep away from Y decisions are made at this level during the game (c) 2003 Thomas G. Dietterich 13 Choosing Actions to Maximize Utility Markov Decision Process Set of states X Set of actions A State transition function: P(X t X t-1,a t ) Reward function: R(X t-1,a t,x t ) Discount factor γ Policy: π: X a A maps from states to actions Value of a policy: E[R 1 + γ R 2 + γ 2 R 3 + L] (c) 2003 Thomas G. Dietterich 14

The Reward Function Overall Reward function R(X) reward received when entering state X example: scoring goal R = +1 example: opponent scores: R = -1 reward is zero most of the time. We say that reward is delayed (c) 2003 Thomas G. Dietterich 15 The Value Function V π (X) is the expected discounted reward of being in state X and executing policy π. V π (X) = X P(X X,π(X)) [R(X,π(X),X ) + γv π (X )] 0.6 X 1 4.0 a 1 0.4 X π a 2 0.3 0.7 X 2 X 3-2.0 V π (X) = 0.3 γ (-2) + 0.7 γ (1.5) = 0.405 1.5 (c) 2003 Thomas G. (γ Dietterich = 0.9) 16

Computing the Optimal Policy by Computing its Value Function Let V*(X) denote the expected discounted reward of following the optimal policy, π*, starting in state X. V*(X) = max a X P(X X,a) [R(X,a,X ) + γ V*(X )] Value Iteration: Initialize V(X) = 0 in all states X repeat until V converges: for each state X, compute V*(X) := max a S P(X X,a) [R(X,a,X ) + γ V*(X )] (c) 2003 Thomas G. Dietterich 17 Computing the Optimal Policy from V* π * (X) := argmax a X P(X X,a) [R(X,a,X ) + γ V*(X )] Perform a one-step lookahead, evaluate the resulting states X using V*, and choose the best action (c) 2003 Thomas G. Dietterich 18

Value Iteration Scale-up Problems Requires O( X A B) time, where B is the branching factor (number of states resulting from an action) Not practical for more than 30,000 states Not practical for continuous state spaces Where do the probability distributions come from? (c) 2003 Thomas G. Dietterich 19 Reinforcement Learning Learn the transition function and the reward function by experimenting with the environment Perform value iteration to compute π* Other methods compute V* or π* directly without learning P(X X,A) or R(X,A,X ) Q learning SARSA(λ) (c) 2003 Thomas G. Dietterich 20

Scaling Methods Value Function Approximation Compact parameterizations of value functions (e.g., as linear, polynomial, or non-linear functions) Policy Approximation Compact representation of the policy Gradient descent in policy space (c) 2003 Thomas G. Dietterich 21 Multiple Agents The Markov Decision Process is a model of only a single agent, but robocup involves multiple cooperative and competitive agents There is a separate reward function for each agent, but it depends on the actions of all of the other agents R 1 (X, a 1,, a N, X ) R 2 (X, a 1,, a N, X ) R N (X, a 1,, a N, X ) (c) 2003 Thomas G. Dietterich 22

Game Theory Each agent ( player ) has a policy for choosing actions The combination of policies results in a value function for each player Each player seeks to optimize his/her own value function Stable solutions: Nash Equilibrium Each player s current policy is a local optimum if all of the other players policies are kept fixed Each player has no incentive to change Computing Nash Equilibria in general is a research problem, although there are special cases where solutions are known. (c) 2003 Thomas G. Dietterich 23 Stochastic Policies In games, the optimal policy may be stochastic (i.e., actions are chosen according to a probability distribution) π(x,a) = probability of choosing action A in state X Example: Rock, Paper, Scissors Nash equilibrium: choose randomly among the three actions (c) 2003 Thomas G. Dietterich 24

How to choose actions when you don t know your opponent s policy Consider one or more policies that your opponent is likely to play Design a policy that works well against all of them (c) 2003 Thomas G. Dietterich 25 The Segway League? (c) 2003 Thomas G. Dietterich 26

Non-Mobile Robot Motion Planning Industrial robot arms Degrees of Freedom (one for each independent direction in which a robot or one of its effectors can move) How many (internal) degrees of freedom does this arm have? P R R R R R (c) 2003 Thomas G. Dietterich 27 Kinematics and Dynamics Kinematic State joint angle of each joint Dynamic State Kinematic State + velocities and accelerations of each joint (c) 2003 Thomas G. Dietterich 28

Holonomic vs. Non-Holonomic Automobile (on a plane) 3 degrees of freedom (x,y,θ) only 2 controllable degrees of freedom wheels and steering Holonomic: number of degrees of freedom = number of controllable degrees of freedom easier to control, often more expensive Non-Holonomic: degrees of freedom > controllable degrees of freedom (c) 2003 Thomas G. Dietterich 29 Path Planning Want to move robot arm from one location (conf-1) to another (conf-2) (c) 2003 Thomas G. Dietterich 30

Two Different Coordinate Systems Locations can be specified in two different coordinate systems Workspace Coordinates position of end-effector (x,y,z) and possibly its orientation (roll,pitch,yaw) Joint Coordinates angle of each joint (c) 2003 Thomas G. Dietterich 31 Configuration Space (C-Space) (c) 2003 Thomas G. Dietterich 32

Forward and Inverse Kinematics Forward Kinematics Given joint angles compute workspace coordinates easy Inverse Kinematics Given workspace coordinates compute joint angles hard: may exist multiple solutions (often infinitely many) Path planning involves finding a path easy to do in joint angle space avoiding obstacles easy to do in workspace (c) 2003 Thomas G. Dietterich 33 Computing Obstacle Representations in C-Space Must convert each obstacle from a region of workspace to a region in configuration space Often done by sampling generate grid of points in C-space test if corresponding point is occupied by obstacle Interesting computational geometry challenge (c) 2003 Thomas G. Dietterich 34

Path Planning in Configuration Space Cell Decomposition Methods Potential Field Methods Voronoi Graph Methods Probabilistic Roadmap Methods key problem: C-Space is continuous! (c) 2003 Thomas G. Dietterich 35 Cell Decomposition Define a grid of cells for free space A path consists of a sequence of cells Legal moves: go from center of one cell to center of 8 neighboring cells: Converts path planning to discrete search problem (use A* or Value Iteration) (c) 2003 Thomas G. Dietterich 36

Cell Decomposition cell color indicates optimal value function (distance to goal along optimal policy) (c) 2003 Thomas G. Dietterich 37 Problems with Cell Decomposition How do we handle cells that overlap obstacles? ignore: algorithm is incomplete (possible plan will not be found) include: algorithm is unsound (plan may not work) Number of cells grows exponentially with number of joints (dimensionality of C-Space) Paths may touch (or pass too close to) obstacles (c) 2003 Thomas G. Dietterich 38

Solutions to Cell Problems Cells too big/too small Use variable resolution cell size. Degree cell size near obstacles Cell scaling Voronoi and Roadmap methods Touching obstacles Potential Field Methods (c) 2003 Thomas G. Dietterich 39 Potential Field Method Define a cost for getting close to obstacles ( the potential ) Find optimal path that minimizes the combined path length + cost (c) 2003 Thomas G. Dietterich 40

Potential Field Result (c) 2003 Thomas G. Dietterich 41 Define set of points equidistant from two or more obstacles This has lower dimensionality (often 1- D). Finitely-many intersections. Path: from start to Voronoi skeleton, along skeleton, from skeleton to end Voronoi Methods ( skeletonization ) (c) 2003 Thomas G. Dietterich 42

Problems with Voronoi Method Resulting paths maximize clearance from obstacles Does not work well in large open spaces Path goes through middle of space Computing the diagram can be difficult in C-Space. (c) 2003 Thomas G. Dietterich 43 Probabilistic Roadmap Draw a sample of points in C-Space. Keep those points that are in free space. Compute Delauney Triangulation of the sample points This gives a graph of points in free space Search in this graph (c) 2003 Thomas G. Dietterich 44

Probabilistic Roadmap (c) 2003 Thomas G. Dietterich 45 Scaling Problems All of these methods do not scale to very high dimensional spaces Probabilistic roadmap and Voronoi method scale best Probabilistic roadmap is cheapest to compute Sampling can be dynamically refined based on initial paths (c) 2003 Thomas G. Dietterich 46

Executing Robot Plans Path only specifies the kinematic state of the robot arm Actually moving the arm must deal with dynamics: acceleration, mass, friction, etc. Control theory has well-developed methods for smoothly following a trajectory e.g., PID controllers (proportional integral derivative controllers) (c) 2003 Thomas G. Dietterich 47 Robotics Summary Robots live in partially-observable, stochastic environments that may contain other cooperative and competitive agents Robot Tasks localization mapping action selection planning (single agent; multiple cooperative agents; multiple competitive agents; teams) action execution (c) 2003 Thomas G. Dietterich 48

Robot Planning For single-agent stochastic environments MDP model Value Iteration Reinforcement Learning For multiple-agent stochastic environments Game theory model Still a research topic For single-agent deterministic (non-mobile) environment Configuration space planning (c) 2003 Thomas G. Dietterich 49