Saturday, June 4, 2022

Reinforcement Learning as One Big Sequence Modeling Problem

Transformers as dynamics models

Predictive dynamics models often achieve low single-step error but poor long-horizon accuracy, because small errors compound as predictions are fed back into the model. We show that Transformers are more reliable long-horizon predictors than state-of-the-art single-step models, even in continuous Markovian domains.
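To make the compounding-error point concrete, here is a minimal toy sketch. It is not the Trajectory Transformer or any model from the post: the scalar linear dynamics, the coefficients, and the `rollout` helper are all assumptions chosen for illustration. It rolls out a model whose single-step error is 1% and compares it against the true trajectory over 100 steps.

```python
def rollout(coef, x0, horizon):
    """Feed a scalar linear model its own predictions for `horizon` steps."""
    states = [x0]
    for _ in range(horizon):
        states.append(coef * states[-1])
    return states


# Hypothetical values, chosen only for illustration.
TRUE_COEF = 1.0    # assumed ground-truth dynamics: x[t+1] = x[t]
MODEL_COEF = 1.01  # assumed learned model, off by 1% per step

true_traj = rollout(TRUE_COEF, 1.0, 100)
model_traj = rollout(MODEL_COEF, 1.0, 100)

# Absolute prediction error at every step of the rollout.
errors = [abs(m - t) for m, t in zip(model_traj, true_traj)]
single_step_error = errors[1]
final_error = errors[-1]

print(f"error after 1 step:    {single_step_error:.4f}")
print(f"error after 100 steps: {final_error:.4f}")
```

In this toy setting the error after one step is 0.01, while after 100 steps it has grown to roughly 1.7, more than a hundred times larger. This is the gap between single-step and long-horizon accuracy that the paragraph above describes.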

         
Attention patterns of the Trajectory Transformer, showing (left) a discovered
Markovian strategy and (right) an approach with action smoothing.


from Hacker News https://ift.tt/oPTmBVf
