Network coding has emerged as a promising technique for improving network throughput and bandwidth. However, its high computational complexity means that its practicality remains a challenge. At the same time, GPU-accelerated applications are typically confined to using the GPU as a coprocessor that consumes datasets transferred from the CPU. Therefore, an aggressive parallel network coding scheme is customized for the GPU using CUDA (Compute Unified Device Architecture), in which the dataset is partitioned to exploit both thread-level and data-level parallelism, and collaboration between the GPU and CPU is introduced into decoding, together with the texture cache, so that the GPU can act not only as a data consumer but also as a data producer. Moreover, random linear network coding is parallelized on a CUDA-enabled GPU to validate the proposed techniques. Experimental results demonstrate that the proposed techniques are effective for parallelizing network coding on GPU-accelerated systems.
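
For readers unfamiliar with random linear network coding, the following is a minimal sequential sketch of the arithmetic involved, not the GPU implementation described here. It assumes coding over GF(2^8) with the reduction polynomial 0x11D; the field choice and the function names `gf_mul`, `gf_inv`, `encode`, and `decode` are illustrative assumptions. Each coded block is a linear combination of the source blocks, and decoding is Gauss-Jordan elimination on the coefficient matrix.

```python
# GF(2^8) log/antilog tables. Assumption: reduction polynomial 0x11D
# (x^8 + x^4 + x^3 + x^2 + 1), for which 2 is a primitive element.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a nonzero field element."""
    return EXP[255 - LOG[a]]


def encode(blocks, coeffs):
    """Return one coded block: sum over i of coeffs[i] * blocks[i]."""
    out = [0] * len(blocks[0])
    for c, blk in zip(coeffs, blocks):
        for j, byte in enumerate(blk):
            out[j] ^= gf_mul(c, byte)  # addition in GF(2^8) is XOR
    return out


def decode(coeff_rows, coded_blocks):
    """Recover the source blocks, or return None if the rows are dependent."""
    n = len(coeff_rows)
    # Augmented matrix [coefficients | coded payload].
    rows = [list(c) + list(p) for c, p in zip(coeff_rows, coded_blocks)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, v) for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, w) for v, w in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]
```

In `encode`, every output byte `out[j]` depends only on column `j` of the source blocks, so the inner loop has no cross-iteration dependencies. This independence is the kind of data-level parallelism that a GPU mapping could assign to separate threads, although the actual partitioning used in this work may differ.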