... representation. In the figure there are threeexplicit regions. The left region represents speech (0–8 sec-onds), the middle region represents silence (8–10 seconds),and the right region represents ... andnonperiodic speech to short-time stationary and periodicspeech, due to phoneme transitions (e.g., consonant tovowel). On the other hand, music and environmental soundscan be periodic or monotonic, ... the potential beginning of the music segment and, at the same time, the measurement ofsegment duration begins. The algorithm then waits for the transition from music to speech (t2), and if the...