Using the probabilities within the stochastic state machine, we incorporate reinforcement learning in the architecture.
For example, assume that when a hand is presented in front of the robot, there are several possible responses, say 5 possible behaviors. One of them is the “give me a paw” behavior. At the beginning of learning, the probability of each behavior being manifested is 0.2. When the “give me a paw” behavior is selected with its initial probability, the user gives a reward, such as petting the robot’s head. This increases the probability of that behavior from 0.2 to 0.4, and the probability of each of the other four behaviors decreases to 0.15, so the total remains 1. If the hand is then presented in front of the robot again, the “give me a paw” behavior has a higher probability of being selected. Thus, a user can customize AIBO’s response through reinforcement learning. This also increases the complexity of behaviors.
by Kohtaro Sabe "Development of Entertainment Robot and Its Future"
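
The update in the example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AIBO's actual implementation: the function names `reinforce` and `select_behavior`, the behavior names other than "give me a paw", and the rule of rescaling the unrewarded behaviors proportionally are all hypothetical. The excerpt only gives the before and after probabilities (0.2 to 0.4 for the rewarded behavior, 0.15 for each of the others).

```python
import random


def reinforce(probs, rewarded, new_prob):
    """Set the rewarded behavior's probability to new_prob and rescale
    the remaining behaviors proportionally so the total stays 1.

    Assumption: proportional rescaling is one simple rule that reproduces
    the numbers in the excerpt; the source does not state the actual rule.
    """
    if rewarded not in probs:
        raise KeyError(rewarded)
    if not 0.0 < new_prob < 1.0:
        raise ValueError("new_prob must be strictly between 0 and 1")
    rest = sum(p for name, p in probs.items() if name != rewarded)
    scale = (1.0 - new_prob) / rest
    return {
        name: new_prob if name == rewarded else p * scale
        for name, p in probs.items()
    }


def select_behavior(probs, rng=random):
    """Pick one behavior at random, weighted by its current probability."""
    names = list(probs)
    weights = [probs[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Five possible responses to a presented hand, each starting at 0.2.
# Only "give_paw" comes from the excerpt; the other names are made up.
behaviors = ["give_paw", "sit", "bark", "wag_tail", "ignore"]
probs = {name: 1.0 / len(behaviors) for name in behaviors}

# The user pets the robot's head after "give_paw", so it is reinforced.
updated = reinforce(probs, "give_paw", 0.4)
```

After the update, `updated["give_paw"]` is 0.4 and each of the other four entries is approximately 0.15, so `select_behavior(updated)` returns "give_paw" twice as often as before.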