Authors: Ming Cao, Zheng Zhou
Abstract: Because failure evolution algorithms for brittle rocks usually fit the Single Instruction, Multiple Data (SIMD) model, implementing them on SIMD hardware is appropriate. The graphics processing unit (GPU) is a widely available SIMD hardware component that can substantially increase computing performance. In this paper we propose a novel GPU-based parallel cellular automaton algorithm for failure evolution processes of brittle rocks. The implementation details and optimization methods are presented. The performance results show that our GPU implementation runs 39 times faster than the original algorithm on a common general-purpose processor (CPU).
268
Authors: Suren Chilingaryan, Andrei Shkarin, Roman Shkarin, Matthias Vogelgesang, Sergey Tsapko
Abstract: Many vendors provide FFT libraries, but no software is available for benchmarking them automatically on all available devices. This article presents an application that makes it easy to measure the performance and precision of various FFT libraries on the available GPUs and CPUs. The application was used to find the fastest FFT library for NVIDIA GTX TESLA and NVIDIA GTX TITAN. The results show that the best implementation is provided by the cuFFT library developed by NVIDIA.
673
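The kind of measurement described in the FFT benchmarking abstract above (Chilingaryan et al.) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' application: the pure-Python radix-2 `fft`/`ifft` pair stands in for a vendor library, and the `benchmark` helper, its parameters, and the round-trip error metric are all hypothetical choices made for this sketch.

```python
import cmath
import math
import time


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized). len(x) must be a power of two.

    Stand-in for a library under test; not taken from the paper.
    """
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(x):
    """Inverse transform: unnormalized inverse FFT scaled by 1/n."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def benchmark(forward, backward, n=1024, repeats=5):
    """Return (best forward time in seconds, max round-trip error).

    Precision is estimated here as max |x - ifft(fft(x))| on a fixed
    test signal; this metric is an assumption of the sketch.
    """
    signal = [complex(math.sin(0.1 * i), math.cos(0.3 * i)) for i in range(n)]
    best = float("inf")
    spectrum = None
    for _ in range(repeats):
        start = time.perf_counter()
        spectrum = forward(signal)
        best = min(best, time.perf_counter() - start)
    restored = backward(spectrum)
    max_err = max(abs(a - b) for a, b in zip(signal, restored))
    return best, max_err


if __name__ == "__main__":
    seconds, err = benchmark(fft, ifft)
    print(f"best time: {seconds:.6f} s, max round-trip error: {err:.3e}")
```

To compare several libraries, the same `benchmark` call could be repeated with each library's forward and inverse transforms passed in place of `fft` and `ifft`.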
Authors: Qing Hu Zhang, Dong Wang, Ya Peng Jiang, Jun Quan Chen
Abstract: We present a CUDA-based parallel solution for accelerating the solution of large-scale finite element (FE) equations for electric and magnetic fields. JCG is used to solve the equations, and a corresponding kernel function is designed for sparse matrix-vector multiplication (SpMV). A speed test for solving FE equations was run on an NVIDIA Tesla K20c GPU. The results show that the custom-kernel method can be 17.1 times faster than the CPU solution, whereas using only GPU library functions to solve the equations does not guarantee an advantage over the CPU.
207
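As an illustration of the solver structure named in the abstract above (Zhang et al.), the following is a minimal CPU-side sketch. It assumes that JCG means the Jacobi-preconditioned conjugate gradient method and that the matrix is stored in CSR form; neither is stated in the abstract. The function names `spmv_csr` and `jcg_solve` are hypothetical. In a CUDA version, the outer loop of `spmv_csr` is the part that would typically be mapped to one thread (or warp) per row.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A in CSR form.

    Each row is independent, which is what makes SpMV a natural GPU kernel.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def jcg_solve(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (CSR) using
    conjugate gradients with a Jacobi (diagonal) preconditioner."""
    n = len(b)
    # Jacobi preconditioner: reciprocal of each diagonal entry.
    inv_diag = [0.0] * n
    for row in range(n):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            if col_idx[k] == row:
                inv_diag[row] = 1.0 / values[k]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        ap = spmv_csr(row_ptr, col_idx, values, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


if __name__ == "__main__":
    # A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]], chosen only for illustration.
    row_ptr = [0, 2, 5, 7]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    values = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
    b = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
    print(jcg_solve(row_ptr, col_idx, values, b))
```

The SpMV call dominates each iteration, which is consistent with the abstract's focus on designing a dedicated kernel for that step.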
Authors: Song Wang, Shan Liang Yang, Ge Li
Abstract: This paper builds an infrared scene of a sphere target based on JMASE, which provides an EO/IR environment and is suited to building engineering-level and engagement-level infrared imaging simulation systems. In addition, to speed up the infrared imaging simulation, we analyzed the external rendering mode used in the JMASE EO/IR environment and found that image compositing in external rendering is a highly independent process that is well suited to parallel computing. Tested on an NVIDIA TESLA C2075 GPU with CUDA and compared with the corresponding sequential process on the CPU, the parallel process achieves a speedup of over 10.
2045
Authors: T.R. Vuyets, Vladislav A. Ovchinnikov
Abstract: Digital holography is a comparatively new observation method for micron-sized particles. It is based on numerical reconstruction of recorded interference fringes. The reconstruction calculations are both time- and memory-intensive. The aim of this study was to develop a faster, more efficient algorithm for digital hologram reconstruction. To this end, the algorithm was implemented for both the central processing unit (CPU) and the graphics processing unit (GPU), and its run time was measured for both configurations. The results showed that the GPU version is faster and more suitable for reconstruction, which makes real-time analysis possible.
949
Authors: Ping Zhou, Mei Liu
Abstract: Many stereo matching problems can be converted into an energy minimization problem: a special network graph is constructed, its minimum cut is computed, and the optimal solution is obtained from that cut. For the graph cuts algorithm, the complete network graph includes all vertices and disparity edges, so the time and space costs are huge. In this paper, we propose a method that combines local and global stereo matching. A reduced network graph is built from the possible disparity values of each pixel, global optimization is then applied, and the maximum flow is solved with CUDA parallel computing, which greatly reduces time and space consumption.
486
Authors: Zheng Feng Zhan, Ge Li, Xue He Zhang
Abstract: This paper addresses two shortcomings of the SIFT algorithm: its high time consumption and the small number of matching points it yields. On the one hand, the paper improves the original SIFT algorithm by proposing a SIFT-based region growing algorithm, which yields many matching points that are useful for generating the disparity map. On the other hand, the paper uses a heterogeneous CPU and GPU platform and analyzes the CUDA programming model and memory model. The algorithm is analyzed in detail so that it can be implemented on CUDA. Experimental results show that, compared with the original algorithm, the proposed algorithm is about 10 times faster and generates a good disparity map.
1652
Authors: Yong Xiang Xia, Zhi Cai Shi, Yu Zhang, Jian Dai
Abstract: To optimize the training procedure of an SVM-based intrusion detection system (IDS) and reduce its time consumption, this study proposes a GPU-based SVM intrusion detection method. A GPU-based parallel computing model is adopted in simulation experiments with the KDD Cup 1999 data. The results demonstrate that the time consumption of IDS training is reduced while the performance of the IDS is maintained.
606
Abstract: This article carefully analyzes the existing target tracking process and algorithms, together with the video image processing algorithms that consume large amounts of CPU time on large data volumes. Operational processes are designed for each algorithm in the image pre-processing module and the tracking module. Several algorithms in the image pre-processing module are studied, each is accelerated in parallel on the GPU, and all are tested on a single CUDA computing device. Experimental data show that accelerating parts of the algorithm with CUDA improves the overall efficiency of the target tracking algorithm, which has some practical significance.
122
Authors: Li Li Dong, Zong Shuai Ma, Wei Dong, Xiang Zhang
Abstract: This paper analyzes the MMPI psychological data of a company's employees. Because the traditional K-Means algorithm is sensitive to the initial cluster centers, the paper uses the hierarchical clustering algorithm CURE to mitigate the problem. CUDA is then used to run the clustering several times in order to improve the execution efficiency of the algorithm. Experiments verify that the improved K-Means algorithm performs well in both execution efficiency and clustering results.
1664