RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ (NeurIPS GenPlan 2023)
Bhatia, A., Nashed, S. B., & Zilberstein, S. (2023). In NeurIPS Workshop on Generalization in Planning.
TL;DR: Incorporating task-specific Q-value estimates as inputs to a meta-RL policy can lead to improved generalization and better performance over longer adaptation periods.