GPUs are high-performance co-processors to the CPU for scientific computing, including computational fluid dynamics (CFD). We present an optimized shared memory allocation strategy for solving 2D CFD problems with the Red-Black successive over-relaxation (SOR) method on a GPU using CUDA (Compute Unified Device Architecture). Results for the lid-driven cavity flow are compared with benchmark data. We discuss the speedup obtained for the same problem size on an NVIDIA GTX480 relative to an Intel Core-Dual 3.0 GHz processor: for a problem size of 756 × 756, the GPU code runs 120 times faster than the sequential code on the CPU. Based on this work, we conclude that proper use of the memory hierarchy plays a key role in improving the computational performance of the GPU.
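To make the Red-Black ordering concrete, the following is a minimal sequential sketch in Python, not the CUDA implementation described here. It assumes a simple model problem (Laplace's equation on a square grid with the value 1 on the top edge and 0 on the other edges) rather than the full CFD solver, and the function name `red_black_sor`, the relaxation factor, and the tolerance are illustrative choices. The point it shows is that every "red" node depends only on "black" neighbours and vice versa, so all nodes of one colour could be updated independently, which is the property a GPU kernel would exploit.

```python
def red_black_sor(n, omega=1.5, tol=1e-10, max_sweeps=10_000):
    """Solve Laplace's equation on an n x n grid with u = 1 on the top edge.

    Hypothetical helper for illustration; returns (grid, sweeps_used).
    """
    u = [[0.0] * n for _ in range(n)]
    for j in range(n):
        u[0][j] = 1.0  # top boundary; the other three edges stay at 0

    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for color in (0, 1):  # 0 = red nodes, 1 = black nodes
            for i in range(1, n - 1):
                # First interior column j with (i + j) % 2 == color.
                j_start = 1 + ((i + 1 + color) % 2)
                for j in range(j_start, n - 1, 2):
                    # Gauss-Seidel value from the four opposite-colour neighbours.
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1])
                    # Over-relax: move past the Gauss-Seidel value by factor omega.
                    delta = omega * (gs - u[i][j])
                    u[i][j] += delta
                    max_change = max(max_change, abs(delta))
        if max_change < tol:
            return u, sweep
    return u, max_sweeps


if __name__ == "__main__":
    grid, sweeps = red_black_sor(17)
    print(f"converged in {sweeps} sweeps, centre value = {grid[8][8]:.6f}")
```

In a GPU version, the two inner loops over `i` and `j` for a given colour would typically be replaced by one thread per node, with a synchronisation between the red and the black half-sweep; how the tiles are staged in shared memory is specific to the strategy presented in this work and is not reproduced in this sketch.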