Training may be based on reinforcement, as is common in real life, rather than on explicit examples. However, reinforcement can be reduced to the case of explicit examples using dynamic programming.
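To make that reduction concrete, here is a minimal sketch on a made-up three-state problem (the MDP, names, and rewards are all illustrative, not from the original): value iteration, the dynamic program, computes state values from reinforcement alone, and the best actions it implies become explicit (state, action) training examples.

```python
# Reduce reinforcement to explicit examples: run value iteration
# (dynamic programming) on a small known MDP, then read off
# (state -> best action) pairs as supervised training examples.

# transitions[state][action] = (next_state, reward); a toy MDP.
transitions = {
    "s0": {"left": ("s0", 0.0), "right": ("s1", 0.0)},
    "s1": {"left": ("s0", 0.0), "right": ("s2", 1.0)},
    "s2": {"left": ("s2", 0.0), "right": ("s2", 0.0)},  # absorbing
}
gamma = 0.9  # discount factor

# Value iteration: V(s) <- max_a [ r(s,a) + gamma * V(s') ]
V = {s: 0.0 for s in transitions}
for _ in range(100):
    V = {
        s: max(r + gamma * V[s2] for (s2, r) in acts.values())
        for s, acts in transitions.items()
    }

# Extract explicit (state, action) examples from the learned values.
examples = {
    s: max(acts, key=lambda a: acts[a][1] + gamma * V[acts[a][0]])
    for s, acts in transitions.items()
}
print(examples)  # e.g. {'s0': 'right', 's1': 'right', 's2': 'left'}
```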
Regression can be reduced to surface learning or conditional probability learning, which are therefore harder.
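For instance, ordinary least-squares regression can be read as learning the conditional density p(y|x) as a Gaussian and reporting its mean. The sketch below, on made-up data, makes the reduction explicit: the conditional model gives back the regression function plus a predictive spread.

```python
# Regression via conditional probability learning: model
# p(y|x) = N(w*x + b, sigma^2), fit by maximum likelihood
# (which is least squares), then predict with the conditional mean.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + 0.5 + rng.normal(0, 0.3, size=200)  # synthetic data

# Maximum-likelihood fit of the conditional Gaussian.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
sigma = np.sqrt(np.mean((y - (w * x + b)) ** 2))

# The regression answer is just the conditional mean E[y|x].
print(f"E[y|x=0.5] = {w * 0.5 + b:.2f}, predictive std = {sigma:.2f}")
```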
Conditional probability learning can itself be reduced to unconditional (joint) probability learning, which is therefore harder.
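The reduction is just the definition of conditioning: p(y|x) = p(x,y)/p(x), where p(x) is obtained by summing the joint over y. A tiny illustration with a made-up joint table:

```python
# Recover a conditional from the unconditional (joint) distribution:
# p(y|x) = p(x, y) / p(x). The joint table below is invented.
joint = {  # p(x, y) over x in {0, 1}, y in {0, 1}
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.40,
}

def conditional(y, x):
    px = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal p(x)
    return joint[(x, y)] / px

print(conditional(1, 0))  # p(y=1 | x=0) = 0.10 / 0.40 = 0.25
```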
Unconditional probability learning is the hardest of all. Consider that you never directly observe a probability, only outcomes drawn from it. Probability learning is also the most lucrative, since it subsumes the rest. The learning process itself can be modeled with probabilities (so-called Bayesian learning).
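A minimal sketch of Bayesian learning, with made-up coin flips: the coin's bias p is never observed, only outcomes, so the learner keeps a probability distribution over p itself and updates it with each flip.

```python
# Bayesian learning of an unobserved probability: maintain a
# Beta(a, b) distribution over the coin's bias p and update it
# with each observed flip. The flip sequence is illustrative.
a, b = 1.0, 1.0  # uniform prior over the unknown probability p
for flip in [1, 1, 0, 1, 0, 1, 1, 1]:  # outcomes, never p itself
    if flip:
        a += 1  # one more head
    else:
        b += 1  # one more tail

# Posterior mean estimate of p; uncertainty shrinks as data accumulates.
print(f"E[p | data] = {a / (a + b):.2f}  (posterior Beta({a:.0f}, {b:.0f}))")
```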
Note the similarity with compression of information. The first issue is how to design an encoder/decoder pair given knowledge of the information's distribution; this was solved by Huffman and arithmetic coding. The second issue is how to discover that distribution automatically; this is still unsolved in general, but Lempel-Ziv does a good job. Many coding algorithms, especially image coders, unfortunately do not distinguish these two issues; the same is true of many learning algorithms. I think learning research should strive to recover the distinction and embrace probabilistic models as the representation of a domain.
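To make the first issue concrete, here is a sketch of Huffman coding for a known distribution (the probabilities are made up). Note what it deliberately leaves out: discovering the distribution, the second and harder issue, happens elsewhere; the coder simply consumes whatever distribution it is handed.

```python
# Huffman coding given a known distribution: repeatedly merge the
# two least probable nodes. Only the coding problem is solved here;
# the distribution itself is assumed known.
import heapq
from itertools import count

probs = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}

# Each heap entry is (probability, tiebreak, {symbol: partial code}).
# The counter breaks ties so tuples never compare the dicts.
tie = count()
heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
heapq.heapify(heap)
while len(heap) > 1:
    p1, _, code1 = heapq.heappop(heap)
    p2, _, code2 = heapq.heappop(heap)
    merged = {s: "0" + c for s, c in code1.items()}
    merged.update({s: "1" + c for s, c in code2.items()})
    heapq.heappush(heap, (p1 + p2, next(tie), merged))

codebook = heap[0][2]
print(codebook)  # e.g. {'a': '0', 'b': '10', 'c': '111', 'd': '110'}
```

Swapping in a better model of the source changes the codebook but not the coder; that separation is exactly the distinction the paragraph above argues learning research should keep.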