
Implementation analysis


QLearning class:

The following explanations should make the pseudo code in the diagrams easier to follow. Starting with the main functions:

  • observe reward value: receives the grid symbol that was stepped on and returns its corresponding reward. It also returns a boolean that determines whether the current episode ends.
  • extract possible actions: the actions available on the grid are defined here, in this case as spatial movement directions. If necessary, they can be redesigned for other applications.
  • choose action: applies the ε-greedy policy when selecting the action (exploration vs. exploitation). It could be swapped for another strategy if needed, for example Boltzmann exploration.
  • learn: the Bellman equation update. This function coordinates most of the previously described ones and fills the q table attribute (a minimal sketch follows this list).
  • infer path: deduces the optimal policy from the q table, given an initial state and a maximum number of steps.

To continue with, the secondary functions used for sensitivity analysis and visualization purposes:

  • visualize inferred path: logged string representation of the states resulting from the optimal policy.
  • visualize max quality action: Seaborn and Matplotlib coloured representation of the maximum quality value in each state.
  • q_value_ascii_action: translates the resulting quality values into Unicode characters.

In the Jupyter notebook practical example, an additional plotted sensitivity convergence function can be found.
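As a rough orientation, the sketch below shows how the choose action and learn functions described above could look. It is a minimal illustration, not the repository's actual code: the class layout, attribute names (q_table, actions) and hyperparameter defaults are assumptions.

```python
import random
from collections import defaultdict

class QLearning:
    """Minimal sketch of the class described above (illustrative only)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions             # e.g. ["up", "down", "left", "right"]
        self.alpha = alpha                 # learning rate
        self.gamma = gamma                 # discount factor
        self.epsilon = epsilon             # exploration probability
        self.q_table = defaultdict(float)  # maps (state, action) -> quality value

    def choose_action(self, state):
        # Epsilon-greedy policy: explore with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q_table[(state, a)])

    def learn(self, state, action, reward, next_state):
        # Bellman update: move Q(s, a) towards reward + gamma * max_a' Q(s', a').
        best_next = max(self.q_table[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q_table[(state, action)] += self.alpha * (target - self.q_table[(state, action)])
```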

Learning process:
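A possible training loop, continuing the sketch above. The grid encoding, the symbols ('S', 'G', 'X', '.'), the reward values and the helper names (observe_reward_value, extract_possible_actions, step) are assumptions for illustration and may differ from the actual implementation.

```python
# Hypothetical grid: 'S' start, 'G' goal, 'X' hole, '.' free cell.
GRID = ["S..",
        ".X.",
        "..G"]

def observe_reward_value(symbol):
    # Map the symbol stepped on to a reward and an episode-ending flag (assumed values).
    if symbol == "G":
        return 1.0, True
    if symbol == "X":
        return -1.0, True
    return 0.0, False

def extract_possible_actions():
    # Spatial movement directions, as described above.
    return ["up", "down", "left", "right"]

def step(state, action):
    # Apply the movement, clamping to the grid borders.
    row, col = state
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    d_row, d_col = moves[action]
    row = min(max(row + d_row, 0), len(GRID) - 1)
    col = min(max(col + d_col, 0), len(GRID[0]) - 1)
    return (row, col)

agent = QLearning(actions=extract_possible_actions())
for _ in range(500):                      # episodes
    state = (0, 0)                        # start cell
    for _ in range(100):                  # step limit per episode
        action = agent.choose_action(state)
        next_state = step(state, action)
        reward, done = observe_reward_value(GRID[next_state[0]][next_state[1]])
        agent.learn(state, action, reward, next_state)
        state = next_state
        if done:
            break
```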

Inferring process:
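And a possible reading of the inference step, reusing the agent, GRID, step and observe_reward_value defined in the sketches above; the infer_path signature shown here is assumed, not taken from the repository.

```python
def infer_path(agent, start_state, max_steps=20):
    # Greedily follow the highest-quality action in each state until a terminal
    # symbol is reached or the step limit runs out.
    path = [start_state]
    state = start_state
    for _ in range(max_steps):
        action = max(agent.actions, key=lambda a: agent.q_table[(state, a)])
        state = step(state, action)
        path.append(state)
        _, done = observe_reward_value(GRID[state[0]][state[1]])
        if done:
            break
    return path

# Prints a list of grid coordinates, ideally ending at the goal if training converged.
print(infer_path(agent, (0, 0)))
```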
