In this paper, we present a method for video semantic mining. Speech signals, video caption text, and video frame images are all key factors in how a person understands video content. Based on this observation, we propose a method that integrates continuous speech recognition, video caption text recognition, and object recognition. The video is first segmented into a series of shots by shot detection. The caption text and the speech recognition results are then treated as two paragraphs of text, and the object recognition results are represented as a bag of words. These three sources of text are processed by part-of-speech tagging and stemming, and only the nouns are kept, so that a video is finally represented by three bags of words. The words are further represented as a graph, in which the vertices stand for the words and the edges denote the semantic distance between two neighboring words. In the last step, we apply a dense subgraph finding method to mine the semantic meaning of the video. Experiments show that our video semantic mining method is efficient.
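
The final two steps (building a word graph from the three bags of words, then extracting a dense subgraph) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the three bags are merged by set union, edges carry a similarity weight (the inverse notion of semantic distance) looked up in a hand-made toy table, and the dense subgraph is found with the standard greedy peeling heuristic, which may differ from the method actually used. All names (`TOY_RELATEDNESS`, `toy_similarity`, `build_graph`, `densest_subgraph`) and the sample words are hypothetical.

```python
# Toy relatedness scores between nouns (hypothetical values, for illustration only).
TOY_RELATEDNESS = {
    frozenset(pair): score
    for pair, score in [
        (("goal", "match"), 0.9), (("goal", "player"), 0.8),
        (("match", "player"), 0.9), (("ball", "player"), 0.8),
        (("ball", "goal"), 0.8), (("ball", "match"), 0.7),
        (("grass", "ball"), 0.5), (("score", "match"), 0.8),
        (("score", "goal"), 0.9),
    ]
}


def toy_similarity(u, v):
    """Stand-in for a real semantic measure; unknown pairs score 0."""
    return TOY_RELATEDNESS.get(frozenset((u, v)), 0.0)


def build_graph(words, similarity, threshold=0.5):
    """Vertices are words; an edge links two words whose similarity >= threshold."""
    vocab = sorted(set(words))
    graph = {w: {} for w in vocab}
    for i, u in enumerate(vocab):
        for v in vocab[i + 1:]:
            weight = similarity(u, v)
            if weight >= threshold:
                graph[u][v] = weight
                graph[v][u] = weight
    return graph


def densest_subgraph(graph):
    """Greedy peeling: repeatedly drop the vertex of lowest weighted degree and
    remember the vertex set with the highest density (edge weight / vertices)."""
    adj = {u: dict(nbrs) for u, nbrs in graph.items()}
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    total = sum(degree.values()) / 2.0  # each edge is counted from both ends
    best_density, best_set = -1.0, set()
    while adj:
        density = total / len(adj)
        if density > best_density:
            best_density, best_set = density, set(adj)
        u = min(adj, key=lambda x: (degree[x], x))  # ties broken alphabetically
        for v, weight in adj[u].items():
            del adj[v][u]
            degree[v] -= weight
        total -= degree[u]
        del adj[u]
        del degree[u]
    return best_set, best_density


# Three bags of nouns, assumed to be already POS-filtered and stemmed.
speech_bag = ["goal", "player", "weather"]
caption_bag = ["match", "player", "score"]
object_bag = ["ball", "player", "grass"]

word_graph = build_graph(speech_bag + caption_bag + object_bag, toy_similarity)
best_words, best_density = densest_subgraph(word_graph)
print(sorted(best_words))
```

On this toy input the peeling step discards the weakly connected nouns ("weather", "grass") and keeps the five tightly related ones, which would then serve as the semantic summary of the video. Greedy peeling is chosen here only because it is short and needs no external library; a real system would plug in an actual semantic distance measure in place of the lookup table.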