example:
gap:
example:
contribution:
example: We introduce a new update rule for online and offline RL that directly models the maximal value using Extreme Value Theory (EVT), drawing inspiration from economics.
methodology:
example: Our method directly estimates the optimal soft-value functions (LogSumExp) in the maximum entropy RL setting, without needing to sample from a policy.
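A minimal sketch of why no policy sampling is needed, assuming the update fits a scalar value with a Gumbel-regression-style objective, mean(exp(z) - z - 1) with z = (q - v) / beta. That objective is an assumption about the form of the update, not something stated in this outline. Its minimizer is the log-mean-exp of the Q-values (LogSumExp up to a constant shift of beta * log n), so the soft value can be fit from Q-values of dataset actions alone. The function names, the temperature `beta`, and the Newton solver are illustrative choices.

```python
import math


def soft_value(q_values, beta):
    """Closed form: beta * log(mean(exp(q / beta))), computed stably."""
    m = max(q_values)
    mean_exp = sum(math.exp((q - m) / beta) for q in q_values) / len(q_values)
    return m + beta * math.log(mean_exp)


def gumbel_loss(v, q_values, beta):
    """Assumed objective: mean of exp(z) - z - 1 with z = (q - v) / beta."""
    total = 0.0
    for q in q_values:
        z = (q - v) / beta
        total += math.exp(z) - z - 1.0
    return total / len(q_values)


def fit_value(q_values, beta, steps=50):
    """Minimize gumbel_loss over a scalar v with Newton's method.

    Only the given Q-values are used; no actions are sampled from a policy.
    Intended for short lists; a very long list with a tiny beta could
    overflow exp() after the first step.
    """
    v = max(q_values)  # start at the max so every initial z is <= 0
    n = len(q_values)
    for _ in range(steps):
        mean_exp = sum(math.exp((q - v) / beta) for q in q_values) / n
        grad = (1.0 - mean_exp) / beta   # d(loss)/dv
        hess = mean_exp / beta ** 2      # d^2(loss)/dv^2, always positive
        v -= grad / hess
    return v
```

For example, `fit_value([1.0, 2.0, 3.0], beta=0.5)` should agree with `soft_value([1.0, 2.0, 3.0], beta=0.5)` up to floating-point error, and both values lie between the mean and the maximum of the inputs.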
benefits:
example: This avoids computing Q-values on out-of-distribution actions, which is often a substantial source of error.
results:
experimental results
take-home message: