Reinforcement Learning - Markov Decision Process

Introduction

Reinforcement Learning

In reinforcement learning we are given x and z, and we must find y and f(x).

Markov Decision Process

Properties:

Policy

Given a particular state, a policy suggests the action to take in that state.
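
As a minimal sketch, a deterministic policy can be pictured as a lookup table from states to actions. The state and action names below are hypothetical and chosen only for illustration.

```python
# Hypothetical states and actions, invented for this illustration.
policy = {
    "start": "right",
    "middle": "right",
    "near_goal": "up",
}

def act(state):
    """Return the action the policy suggests for the given state."""
    return policy[state]

suggested = act("start")
```

Here the policy always gives the same action for the same state. A policy could also be stochastic and return a probability for each action, which this sketch does not cover.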

Optimal Policy

It is the policy that maximizes the total reward received.

Hence, in reinforcement learning, x refers to the state, z refers to the rewards, y refers to the actions, and f(x) refers to the optimal policy.
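
The mapping above can be made concrete with a small sketch. It assumes a hypothetical four-state chain in which only entering the last state gives a reward, and it uses value iteration with a discount factor of 0.9 to recover f(x). The environment, the discount factor, and the choice of value iteration are all assumptions made for illustration; these notes do not prescribe them.

```python
GAMMA = 0.9                    # assumed discount factor
STATES = [0, 1, 2, 3]          # x: hypothetical chain of states
TERMINAL = 3                   # the episode ends here
ACTIONS = ["left", "right"]    # y: available actions

def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0   # z: the reward
    return next_state, reward

def action_value(state, action, values):
    """Immediate reward plus the discounted value of the next state."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]

def value_iteration(sweeps=50):
    """Repeatedly back up state values until they settle."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s != TERMINAL:
                values[s] = max(action_value(s, a, values) for a in ACTIONS)
    return values

def greedy_policy(values):
    """f(x): in each state pick the action with the highest value."""
    return {
        s: max(ACTIONS, key=lambda a: action_value(s, a, values))
        for s in STATES
        if s != TERMINAL
    }

values = value_iteration()
optimal_policy = greedy_policy(values)
```

In this toy chain the recovered policy moves right in every non-terminal state, since that is the only way to reach the rewarding state.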

Temporal Credit Assignment

Identifying which action led to the rewards we received or, for the state we are in, which sequence of actions brought us there.
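
One common way to spread credit back over earlier actions is to give each step the discounted sum of the rewards that followed it. The sketch below assumes a discount factor of 0.9 and a hypothetical episode in which only the last step is rewarded. It illustrates the idea and is not the only way to assign credit.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return the discounted reward-to-go for every step of an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step sees the rewards that came after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical episode: only the final action is rewarded.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
credit = discounted_returns(episode_rewards)
```

With these assumed numbers the credit comes out to roughly 0.729, 0.81, 0.9 and 1.0, so actions closer to the reward receive more credit while earlier actions still receive some.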

The choice of rewards determines the end result, so the rewards must be chosen carefully.

Sequence of Rewards