Humans are a pattern-seeking species. This is obvious in the way people "see" faces and objects in clouds, tortilla chips and the tops of tomato sauce cans.
Music, at its most basic, is an auditory pattern. Both visual and auditory pattern recognition would have been very advantageous to hunter-gatherers and to people living on a savanna or being hunted by carnivores.
I can see how learning to be attracted to patterns, if only so we can determine whether something is a threat, a friend, or nothing, could lead us to appreciate and even mimic patterns. The more complex the pattern a person can produce, the more talented that person seems, and voilà, we have an artist.