Programming Machine Learning: MNIST benchmark for multi-layer networks

I expect that you’ll start overfitting quite soon, at least if you keep using a simple dataset such as MNIST. That’s what I got from my experiments. Let me know if that matches your experience.