author list

example:

brief

gap:

example:

Many RL algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an infinite number of possible actions.

contribution:

example: we introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT), drawing inspiration from economics.
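One standard EVT fact explains why maxima and LogSumExp are linked. If each action's value is perturbed by independent Gumbel noise with scale beta, the expected maximum equals beta * log(sum(exp(Q / beta))) plus beta times the Euler-Mascheroni constant. This is the Gumbel-max identity. The sketch below checks it numerically with a Monte Carlo estimate. It only illustrates the idea and is not the authors' implementation. The Q-values, the temperature `beta`, and all function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (mean of a standard Gumbel)


def soft_max_value(q_values, beta):
    """beta * log(sum(exp(q / beta))), computed stably by factoring out the max."""
    m = max(q_values)
    return m + beta * math.log(sum(math.exp((q - m) / beta) for q in q_values))


def standard_gumbel(rng):
    """Draw from Gumbel(0, 1) by inverting its CDF: -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, and log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def mc_expected_max(q_values, beta, n_trials=100_000, seed=0):
    """Monte Carlo estimate of E[max_a (Q(a) + beta * G_a)] with G_a ~ Gumbel(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += max(q + beta * standard_gumbel(rng) for q in q_values)
    return total / n_trials


q = [1.0, 2.0, 0.5]  # hypothetical Q-values for three discrete actions
beta = 0.5           # hypothetical temperature
analytic = soft_max_value(q, beta) + beta * EULER_GAMMA
estimate = mc_expected_max(q, beta)
print(f"analytic={analytic:.4f}  monte_carlo={estimate:.4f}")
```

The two numbers should agree up to Monte Carlo error. As `beta` shrinks, `soft_max_value` approaches the hard maximum `max(q)`.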

methodology:

example: the method directly estimates the optimal soft-value function (a LogSumExp over Q-values) in the maximum entropy RL setting, without needing to sample from a policy.
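A minimal sketch of how a LogSumExp-style value can be estimated by regression on sampled Q-values, without sampling actions from a policy. It assumes a Gumbel regression loss of the form mean(exp(z) - z - 1), where z = (Q - v) / beta. Setting the derivative to zero gives mean(exp((Q - v) / beta)) = 1, so the minimizer is v = beta * log(mean(exp(Q / beta))). That is a log-mean-exp over the sampled actions. Here `v` is a single scalar fitted by plain gradient descent instead of a network. The Q-values, `beta`, learning rate, and function names are all hypothetical, and refinements a practical implementation may need (for example, clipping large z) are omitted.

```python
import math


def gumbel_loss(v, q_samples, beta):
    """Gumbel regression loss: mean of exp(z) - z - 1 with z = (q - v) / beta."""
    total = 0.0
    for q in q_samples:
        z = (q - v) / beta
        total += math.exp(z) - z - 1.0
    return total / len(q_samples)


def gumbel_loss_grad(v, q_samples, beta):
    """Derivative of gumbel_loss with respect to v: (1 - mean(exp(z))) / beta."""
    mean_exp = sum(math.exp((q - v) / beta) for q in q_samples) / len(q_samples)
    return (1.0 - mean_exp) / beta


def fit_value(q_samples, beta, lr=0.1, steps=2000):
    """Minimize the (convex) Gumbel loss over a scalar v by gradient descent."""
    v = sum(q_samples) / len(q_samples)  # start from the ordinary mean
    for _ in range(steps):
        v -= lr * gumbel_loss_grad(v, q_samples, beta)
    return v


def log_mean_exp(q_samples, beta):
    """Closed-form minimizer beta * log(mean(exp(q / beta))), computed stably."""
    m = max(q_samples)
    s = sum(math.exp((q - m) / beta) for q in q_samples)
    return m + beta * math.log(s / len(q_samples))


q_data = [1.0, 2.0, 0.5, 3.0]  # hypothetical Q(s, a) at dataset actions for one state
beta = 1.0
v_fit = fit_value(q_data, beta)
print(f"gradient descent v={v_fit:.6f}  closed form={log_mean_exp(q_data, beta):.6f}")
```

The fitted value lies between the mean and the maximum of the sampled Q-values, and it moves toward the maximum as `beta` decreases. In this sketch, Q is only evaluated at the sampled (dataset) actions, never at actions proposed by a learned policy.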

benefits:

example: avoids computing Q-values on out-of-distribution actions, which are often a substantial source of error

results:

experiment results

take-home message: