This paper establishes a 3-D localization model and, based on this model, proposes a collaborative localization framework. In this framework, each node that observes the object sends its attitude information and the position of the object's projection in its camera image to the cluster head. The cluster head uses an algorithm proposed in this paper to select a subset of nodes to participate in localization. The localization algorithm is based on the least squares method. Because the framework is built on a 3-D model, it requires no prior knowledge of the object's size or other prerequisites. At the end of this paper, simulations examine how the number of nodes selected for localization affects localization accuracy. The results suggest that selecting 3 to 4 nodes is appropriate. The theoretical analysis and the simulation results also indicate that the framework incurs a constant computation time while achieving high localization accuracy (an error of 0.01 m in our simulation environment).
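The abstract does not give the least-squares formulation itself. The following is a minimal sketch under stated assumptions, not the paper's method. It assumes the cluster head knows each selected node's camera center in world coordinates. It also assumes each node's report (attitude plus image-plane projection) can be turned into a ray by rotating the pinhole ray (u, v, f) with the node's attitude matrix. The object position is then estimated as the point minimizing the sum of squared perpendicular distances to all rays. The pinhole model, the rotation convention, and the names `ray_direction`, `solve3` and `locate` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def ray_direction(rotation: Mat3, u: float, v: float, focal: float) -> Vec3:
    """Unit world-frame direction of the ray through image point (u, v).

    Hypothetical convention: pinhole camera with the optical axis along the
    camera-frame z axis; `rotation` maps camera-frame vectors to world frame.
    """
    cam = (u, v, focal)
    world = [sum(rotation[i][k] * cam[k] for k in range(3)) for i in range(3)]
    norm = math.sqrt(sum(x * x for x in world))
    return (world[0] / norm, world[1] / norm, world[2] / norm)


def _det3(m: Mat3) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a: Mat3, b: Sequence[float]) -> Vec3:
    """Solve the 3x3 linear system a x = b by Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; position is undetermined")
    x = []
    for col in range(3):
        # Replace column `col` of a with b.
        m = [[b[i] if j == col else a[i][j] for j in range(3)] for i in range(3)]
        x.append(_det3(m) / d)
    return (x[0], x[1], x[2])


def locate(centers: List[Vec3], directions: List[Vec3]) -> Vec3:
    """Least-squares point closest to all rays (center c, unit direction d).

    Minimizes sum ||(I - d d^T)(x - c)||^2, whose normal equations are
    (sum P) x = sum P c with P = I - d d^T.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        for i in range(3):
            for j in range(3):
                # Entry (i, j) of the projector onto the plane orthogonal to d.
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += p
                b[i] += p * c[j]
    return solve3(a, b)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Three hypothetical nodes observing an object at (1, 2, 3).
    centers = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
    dirs = [ray_direction(identity, 1 / 3, 2 / 3, 1.0),
            ray_direction(identity, -4 / 3, 2 / 3, 1.0),
            ray_direction(identity, 1 / 3, -1.0, 1.0)]
    print(locate(centers, dirs))
```

In this sketch the work per selected node is a fixed-size accumulation, followed by a single 3x3 solve. So with a fixed number of selected nodes (for example 3 to 4) the cost does not grow with the total number of nodes in the cluster. Rays that are parallel or nearly parallel make the system singular, which is one reason the choice of participating nodes matters.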