Math Bio Seminar
Sequential decision-making (DM) problems are ubiquitous across scientific domains, with applications spanning engineering, economics, management, and biology. These problems involve a feedback loop in which, at each time step, a decision-maker chooses a decision based on the available information. The result, from the viewpoint of an external observer, is a sequence of sensed information and decisions iterated over time. This talk focuses on sequential DM problems in which the agent aims to track a desired behavior while simultaneously minimizing a task-specific cost. After setting up the problem, we show that it is convex in the space of densities even when the task-specific cost is not. We then give an explicit expression for the optimal policy, which, rather counterintuitively, we show must be randomized. Finally, we leverage the structure of the policies to tackle an inverse problem in which we want to estimate the task-specific cost by observing actions sampled from the policy. Throughout the talk, the results are illustrated via concrete examples from different domains.