soarcreator/CartPole-Q-Learning

CartPoleLinearApproximation

First attempt at Q-learning. It failed, probably because the term theta * w_3 cannot express the lifespan, which is the sum of the future rewards. The lifespan is longer when theta is small than when it is large, but all this model can do is multiply theta by w_3.

lifespan(theta) = theta * w_3 > 0

lifespan(1) = 1 * w_3
=> must be a large positive number.
=> w_3 must be large.

lifespan(100) = 100 * w_3
=> must be a small positive number.
=> w_3 must be small.
=> contradiction

I think that is why this approach does not work well and why w diverges.
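The contradiction can be checked numerically with a small sketch. It is not the training code from this repository. The target lifespans (200 at theta = 1, 5 at theta = 100) are made-up numbers chosen only to match "long when theta is small, short when theta is large". The helper names `required_weight` and `fit_w3` are hypothetical, and a plain least-squares fit stands in for the Q-learning update.

```python
# Hypothetical (theta, target lifespan) pairs: assumed values, not measured data.
samples = [(1.0, 200.0), (100.0, 5.0)]


def required_weight(theta, target):
    """Weight w_3 that makes theta * w_3 equal the target exactly."""
    return target / theta


def fit_w3(samples):
    """Least-squares w_3 for the model lifespan(theta) = theta * w_3."""
    numerator = sum(theta * target for theta, target in samples)
    denominator = sum(theta * theta for theta, _ in samples)
    return numerator / denominator


# Each sample demands a very different w_3, so no single value fits both.
w_for_small_theta = required_weight(*samples[0])
w_for_large_theta = required_weight(*samples[1])
print("w_3 needed at theta=1:  ", w_for_small_theta)
print("w_3 needed at theta=100:", w_for_large_theta)

# The best compromise still predicts a lifespan that grows with theta,
# which is the opposite of the assumed targets.
w3 = fit_w3(samples)
for theta, target in samples:
    print(f"theta={theta:>5}: target={target:>5}, predicted={theta * w3:.3f}")
```

With these assumed targets, the two samples call for weights that differ by several orders of magnitude, and the fitted model still predicts a longer lifespan at theta = 100 than at theta = 1.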
