This paper introduces a new learning approach for curvilinear bipedal walking of the Nao humanoid robot using a policy gradient method. A walking policy is modeled by a set of policy parameters that control factors in programmable central pattern generators. A “programmable” central pattern generator is built from coupled nonlinear oscillators that can shape their state equations using training trajectories. The proposed model offers several benefits, including smooth walking patterns and modulation during walking to increase or decrease speed. A suitable curvilinear walk was achieved that closely resembles ordinary human walking. The model can be extended and used for Nao soccer players in both the Standard Platform League and the 3D Soccer Simulation League of the RoboCup competitions to train different types of motions.
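
As a rough illustration of the kind of nonlinear oscillator a central pattern generator can be built from, the sketch below integrates a single Hopf oscillator with forward Euler steps. The choice of a Hopf oscillator, the function names `hopf_step` and `simulate`, and all parameter values are assumptions made for illustration; the abstract does not specify the oscillator equations, the coupling scheme, or the learning rule. The sketch shows only that such an oscillator settles onto a smooth limit cycle whose amplitude and frequency are set by two parameters, which is the sort of quantity a policy parameter could modulate during walking.

```python
import math


def hopf_step(x, y, mu, omega, gamma, dt):
    """One forward-Euler step of a Hopf oscillator (assumed model, not from the paper).

    mu    : squared radius of the limit cycle (amplitude = sqrt(mu))
    omega : angular frequency in rad/s
    gamma : convergence rate toward the limit cycle
    """
    r2 = x * x + y * y
    dx = gamma * (mu - r2) * x - omega * y
    dy = gamma * (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy


def simulate(mu=1.0, omega=2.0 * math.pi, gamma=8.0,
             dt=0.001, steps=5000, x0=0.1, y0=0.0):
    """Integrate the oscillator from (x0, y0) and return the final state."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = hopf_step(x, y, mu, omega, gamma, dt)
    return x, y


if __name__ == "__main__":
    # Hypothetical settings: a larger mu gives a larger oscillation amplitude.
    for mu in (1.0, 0.25):
        x, y = simulate(mu=mu)
        print(f"mu={mu}: final radius = {math.hypot(x, y):.3f}")
```

In this sketch, changing `mu` rescales the oscillation amplitude and changing `omega` rescales its frequency without any discontinuity in the output, which is one way to picture the smooth speed modulation described above.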