In this paper, we propose combining left-right hidden Markov models (HMMs) with a k-means threshold on the likelihood ratio test (LRT) to identify the start and end of speech. The method builds two models, one for non-speech and one for speech, rather than two states; that is, each model can include several states. In the experiments, we compare the voice activity detection (VAD) results of a two-state hidden semi-Markov model (HSMM) with those of the proposed algorithm. We also compare the accuracy and robustness of the k-means threshold and the adaptive threshold under background noise at high signal-to-noise ratio (SNR). The results show that the k-means threshold is more effective than the adaptive threshold, and that the proposed method outperforms the two-state HSMM-based VAD, especially in low-SNR environments.
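To make the idea of a k-means threshold on LRT scores concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes that each frame already has a log-likelihood ratio score (speech-model log-likelihood minus non-speech-model log-likelihood), that two clusters are used, and that the threshold is taken as the midpoint between the two centroids. The function names `kmeans_threshold` and `speech_endpoints` and the example scores are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def kmeans_threshold(scores: Sequence[float], iters: int = 50) -> float:
    """Two-cluster 1-D k-means; returns the midpoint between the centroids.

    Assumption: the decision threshold is the centroid midpoint.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    lo, hi = min(scores), max(scores)  # initialise centroids at the extremes
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        low = [s for s in scores if s <= mid]
        high = [s for s in scores if s > mid]
        if not high:  # all scores identical: nothing to separate
            break
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if new_lo == lo and new_hi == hi:  # assignments have stabilised
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


def speech_endpoints(
    scores: Sequence[float], threshold: float
) -> Optional[Tuple[int, int]]:
    """Return (first, last) frame index whose score exceeds the threshold."""
    flags: List[bool] = [s > threshold for s in scores]
    if True not in flags:
        return None
    start = flags.index(True)
    end = len(flags) - 1 - flags[::-1].index(True)
    return start, end


# Hypothetical per-frame log-likelihood ratios (illustrative values only).
llr = [-2.1, -1.8, -2.4, 3.0, 3.5, 2.8, 3.2, -1.9, -2.2]
threshold = kmeans_threshold(llr)
endpoints = speech_endpoints(llr, threshold)
```

In this toy sequence the threshold falls between the low-scoring and high-scoring groups of frames, so the detected speech segment spans frames 3 to 6. Unlike a fixed threshold, the value is derived from the scores of the utterance itself.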