Macro Segmentation and Content Analysis of TV Broadcast Stream

Abstract:

This study presents an unsupervised approach to analyzing massive raw TV streams. TV programs are extracted via repetition-based detection of the inter-programs (IPs), and their content is then analyzed with an audio-based segmentation and classification algorithm. Both acoustic and visual information are used for IP detection so as to avoid missing true positives. A novel audio fingerprinting scheme and a shot-based indexing algorithm are introduced to make detection both efficient and accurate. After the TV programs are further segmented into clips, Gaussian Mixture Models (GMMs) are used to classify the clips into three types: pure speech, non-pure speech, and non-speech. Experiments on a test dataset of more than 500 hours of content-unknown TV streams show that program extraction and content analysis achieve F-measures of 0.986 and 0.887, respectively. The experiments also demonstrate that the proposed algorithm for detecting repeated IPs outperforms the state-of-the-art approach.
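The GMM-based clip classification mentioned in the abstract can be illustrated with a minimal sketch. It assumes one diagonal-covariance GMM per class and assigns a clip to the class whose model gives the highest total log-likelihood over the clip's frames. The abstract does not specify the features, the number of mixture components, or the decision rule, so the two-dimensional toy features, the model parameters in `MODELS`, and the function names below are all hypothetical and are not the authors' implementation.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a clip (a list of frame vectors) under one GMM.

    `gmm` is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over the components, for numerical stability.
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total

def classify_clip(frames, models):
    """Return the label whose GMM scores the clip highest."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))

# Hypothetical toy models (assumed values, not taken from the paper):
# two components per class, two-dimensional features.
MODELS = {
    "pure speech": [
        (0.5, [0.0, 0.0], [1.0, 1.0]),
        (0.5, [1.0, 0.5], [1.0, 1.0]),
    ],
    "non-pure speech": [
        (0.5, [4.0, 4.0], [1.0, 1.0]),
        (0.5, [5.0, 4.5], [1.0, 1.0]),
    ],
    "non-speech": [
        (0.5, [-4.0, 4.0], [1.0, 1.0]),
        (0.5, [-5.0, 5.0], [1.0, 1.0]),
    ],
}

# A three-frame clip whose features lie near the "pure speech" components.
clip = [[0.2, -0.1], [0.9, 0.4], [0.5, 0.3]]
label = classify_clip(clip, MODELS)
print(label)
```

In practice the model parameters would be estimated from labeled training audio (for example with the EM algorithm) rather than written by hand, and the frame vectors would be acoustic features extracted from each clip.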

Pages: 3194-3198

Online since: January 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
