You have probably heard about Neural ODEs, a neural network architecture based on
ordinary differential equations. To train this kind of model, a mysterious
trick called the adjoint state method is used. How does it work, why do we
need it, and how is it related to backpropagation?