Mathematics & Machine Learning Seminar
CANCELED
When decisions are made at high frequency, traditional reinforcement learning (RL) agents struggle to accurately estimate the values of their actions (their action-values). As a result, their performance is inconsistent and often poor. However, the extent to which distributional RL (DRL) agents suffer similarly is unknown. For instance, does estimating the full distribution of action-conditioned returns lessen this struggle?
In this talk, we will show that DRL agents are just as sensitive to decision frequency as their traditional counterparts, and how to make them more robust. We will introduce distributional perspectives on action gaps and advantages. In particular, we will introduce the superiority as a probabilistic generalization of the advantage function, the core object in approaches to mitigating performance issues in high-frequency value-based RL. In addition, we will build a collection of superiority-based DRL algorithms. Through simulations in an option-trading domain, we will show that proper modeling of the superiority distribution produces improved controllers at high decision frequencies.
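
For orientation, a minimal sketch of the objects involved, assuming only the standard definition of the advantage function; the precise definition of the superiority used in the talk is not given here, and the distributional analogue below is an illustrative assumption whose expectation recovers the advantage.

% Standard advantage function (the core object referenced above):
% the expected benefit of action a in state s over the policy's value.
\[
  A^\pi(s, a) \;=\; Q^\pi(s, a) - V^\pi(s)
\]
% Hypothetical distributional analogue ("superiority", assumed form):
% replace the expected return Q^\pi(s, a) with the random return Z^\pi(s, a),
% so the superiority is a random variable rather than a scalar:
\[
  S^\pi(s, a) \;=\; Z^\pi(s, a) - V^\pi(s),
  \qquad \mathbb{E}\big[S^\pi(s, a)\big] = A^\pi(s, a).
\]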
