Training a network using a distributed system
This is useful when your network is large enough that the matrix multiplications involved in training become unwieldy on a single traditional PC. The problem is particularly acute when you have harsh time constraints (e.g. online training). If you're thinking about a distributed system only because you want to fiddle with network parameters without waiting a day between attempts, then simply run multiple instances of the network with different parameters on different machines. That way you can make use of your cluster without dealing with actual distributed computation.
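The "multiple instances with different parameters" approach can be sketched as follows. This is only a minimal illustration, not a full job scheduler: the parameter names, their values, and the machine names are all made up for the example, and actually launching the training runs on each machine is left out.

```python
from itertools import product

# Hypothetical hyperparameter grid; names and values are illustrative only.
GRID = {
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_units": [64, 256],
}

# Hypothetical machine names standing in for the nodes of your cluster.
MACHINES = ["node-a", "node-b", "node-c"]


def build_configs(grid):
    """Expand the grid into one config dict per parameter combination."""
    names = sorted(grid)
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def assign_round_robin(configs, machines):
    """Give each machine its own independent list of configs to train."""
    plan = {machine: [] for machine in machines}
    for i, config in enumerate(configs):
        plan[machines[i % len(machines)]].append(config)
    return plan


plan = assign_round_robin(build_configs(GRID), MACHINES)
for machine, configs in plan.items():
    # Each machine trains its configs in isolation; no communication is needed.
    print(machine, configs)
```

Because every run is independent, the machines never have to exchange weights or gradients, which is what keeps this so much simpler than a truly distributed implementation.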
For example, suppose you're training a network to find the number of people in images. Instead of using a predefined set of training examples (pairs of an image and its number of people), you decide to have the program pull random images from Google. While the network is processing an image, you must view the image and provide feedback on how many people are actually in it. Since this is image processing, your network size is probably on the scale of millions of units. And since you're providing the feedback in real time, the speed of the network's computations matters. Thus, you should probably invest in a distributed implementation.
Training a network on a GPU
This is the right choice if the major computational bottleneck isn't the network size but the size of the training set (though the networks are still generally quite large). Since GPUs are ideal for applying the same vector/matrix operation across a large number of inputs at once, they are mainly used when you can use batch training with a very large batch size.
For example, suppose you're training a network to answer questions posed in natural language. You have a huge database of question-answer pairs and don't mind the network only updating its weights every 10000 questions. With such a large batch size, and presumably a rather large network as well, a GPU-based implementation would be a good idea.
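To make "only updating its weights every N examples" concrete, here is a rough sketch of batch training. It rests on several assumptions that are not in the scenario above: a tiny linear model instead of a real network, synthetic data, squared-error loss, and a batch size of 100 rather than 10000. It runs on the CPU in plain Python; the point is only that the inner loop over a batch performs the same computation for every example, which is the part a GPU would carry out as one large matrix operation.

```python
import random


def batch_gradient(weights, batch):
    """Average squared-error gradient of a linear model over one batch."""
    grad = [0.0] * len(weights)
    for inputs, target in batch:
        # Same computation for every example: this is the part that maps
        # well onto a single large matrix operation on a GPU.
        prediction = sum(w * x for w, x in zip(weights, inputs))
        error = prediction - target
        for j, x in enumerate(inputs):
            grad[j] += 2.0 * error * x
    return [g / len(batch) for g in grad]


def train(data, batch_size, learning_rate, epochs):
    """Batch gradient descent: weights change once per batch, not per example."""
    weights = [0.0] * len(data[0][0])
    updates = 0
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = batch_gradient(weights, batch)
            weights = [w - learning_rate * g for w, g in zip(weights, grad)]
            updates += 1
    return weights, updates


# Synthetic data (an assumption for this sketch): targets come from known weights.
random.seed(0)
TRUE_WEIGHTS = [2.0, -1.0]
data = []
for _ in range(1000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    data.append((x, sum(w * xi for w, xi in zip(TRUE_WEIGHTS, x))))

weights, updates = train(data, batch_size=100, learning_rate=0.5, epochs=50)
print(weights, updates)
```

With 1000 examples and a batch size of 100, each pass over the data produces only 10 weight updates; a larger batch size means fewer updates, each one doing more parallelizable work.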