This paper presents a research effort to develop a computer vision-based augmented reality assembly system and describes the system architecture for assembly interaction. To realize intuitive human-computer interaction, a computer vision-based hand tracking and gesture recognition approach is applied. The approach uses a color-based method to segment the hand from the background, locates the hand position by means of a marker attached to the hand, and then recognizes the gesture according to the geometric constraints of the human hand. In comparison with traditional approaches, the new method does not require a stationary camera and is not sensitive to intensity differences. As a result, it achieves real-time performance and is easy to implement. Moreover, occlusion identification is studied to improve the blending of real and virtual objects. Finally, a prototype system is presented to demonstrate the effectiveness and robustness of the proposed approaches.
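As a rough illustration of the color-based segmentation and marker localization steps, the following minimal sketch classifies pixels by normalized (r, g) chromaticity, which discards overall brightness, and takes the centroid of the marker-colored pixels as the hand position. The choice of chromaticity space, the function names `chromaticity`, `segment_by_color`, and `centroid`, and the tolerance value are all assumptions made for illustration; the abstract does not specify the color model or the localization rule actually used, and the gesture recognition step is not covered here.

```python
import numpy as np

def chromaticity(image):
    """Convert an H x W x 3 RGB image to normalized (r, g) chromaticity.

    Dividing by R + G + B removes the overall intensity, so a pixel and a
    uniformly dimmed copy of it map to (nearly) the same point.
    """
    rgb = image.astype(np.float64)
    total = rgb.sum(axis=2, keepdims=True)
    total[total == 0] = 1.0  # avoid division by zero on pure black pixels
    return (rgb / total)[..., :2]

def segment_by_color(image, ref_rgb, tol=0.05):
    """Boolean mask of pixels whose chromaticity lies within tol of ref_rgb.

    ref_rgb is a hypothetical reference color (e.g. sampled skin or marker
    color); tol is an assumed threshold, not a value from the paper.
    """
    ref = chromaticity(np.array(ref_rgb, dtype=np.float64).reshape(1, 1, 3))[0, 0]
    dist = np.linalg.norm(chromaticity(image) - ref, axis=2)
    return dist < tol

def centroid(mask):
    """Return the (row, col) centroid of a boolean mask, or None if empty."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())
```

In use, one would call `segment_by_color` once with a skin reference color to obtain the hand mask and once with the marker color, then pass the marker mask to `centroid` to estimate the hand position in each frame. Because the comparison is made on chromaticity rather than raw RGB, the same reference colors remain usable when the scene brightness changes uniformly.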