Thanks, that makes sense. So for dropout one essentially pretends the node doesn't exist during training.
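A minimal sketch of that idea (inverted dropout, my own illustration, not code from this thread): multiply the activations by a random binary mask, then rescale the survivors so the expected activation stays the same.

```python
import numpy as np

def dropout(activations, p=0.5, rng=None):
    # Inverted dropout: zero each unit with probability p ("pretend it
    # doesn't exist"), then scale survivors by 1/(1-p) so the expected
    # activation is unchanged and no rescaling is needed at test time.
    rng = rng or np.random.default_rng(0)
    mask = (rng.random(activations.shape) >= p).astype(activations.dtype)
    return activations * mask / (1.0 - p)

a = np.ones((2, 4))
out = dropout(a, p=0.5)
# Each entry is either 0.0 (dropped) or 2.0 (kept and rescaled).
```

At inference time the mask is simply omitted, which is why the dropped node really does behave as if it never existed during that training step.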
True, it does get a bit harder for some algorithms, but in the end it is often just one more tensor multiplication. Other advanced algorithms are quite easy to implement, though. For example, I implemented the Adam optimizer: it was straightforward, just a few lines of code, and it gives a wonderful speed-up.
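To show what "just a few lines" means, here is a sketch of a single Adam update step following the standard algorithm (hyperparameter defaults from the original paper; the function name and interface are my own, not from this conversation):

```python
import numpy as np

def adam_step(param, grad, m, v, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient (m) and its square (v).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero-initialization of m and v
    # during the first steps (t is the 1-based step counter).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive step.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 1.0.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
```

The whole update is element-wise tensor arithmetic, which is why it slots so easily into an existing backprop loop.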