Fine-Grained Parallel and Distributed Spatial Stochastic Simulation of Biological Reactions
To date, discrete event stochastic simulations of large scale biological reaction systems are extremely compute-intensive and time-consuming. Besides, it has been widely accepted that spatial factor plays a critical role in the dynamics of most biological reaction systems. The NSM (the Next Sub-Volume Method), a spatial variation of the Gillespie’s stochastic simulation algorithm (SSA), has been proposed for spatially stochastic simulation of those systems. While being able to explore high degree of parallelism in systems, NSM is inherently sequential, which still suffers from the problem of low simulation speed. Fine-grained parallel execution is an elegant way to speed up sequential simulations. Thus, based on the discrete event simulation framework JAMES II, we design and implement a PDES (Parallel Discrete Event Simulation) TW (time warp) simulator to enable the fine-grained parallel execution of spatial stochastic simulations of biological reaction systems using the ANSM (the Abstract NSM), a parallel variation of the NSM. The simulation results of classical Lotka-Volterra biological reaction system show that our time warp simulator obtains remarkable parallel speed-up against sequential execution of the NSM.I.IntroductionThe goal of Systems biology is to obtain system-level investigations of the structure and behavior of biological reaction systems by integrating biology with system theory, mathematics and computer science , since the isolated knowledge of parts can not explain the dynamics of a whole system. As the complement of “wet-lab” experiments, stochastic simulation, being called the “dry-computational” experiment, plays a more and more important role in computing systems biology . Among many methods explored in systems biology, discrete event stochastic simulation is of greatly importance , since a great number of researches have present that stochasticity or “noise” have a crucial effect on the dynamics of small population biological reaction systems . Furthermore, recent research shows that the stochasticity is not only important in biological reaction systems with small population but also in some moderate/large population systems .To date, Gillespie’s SSA  is widely considered to be the most accurate way to capture the dynamics of biological reaction systems instead of traditional mathematical method . However, SSA-based stochastic simulation is confronted with two main challenges: Firstly, this type of simulation is extremely time-consuming, since when the types of species and the number of reactions in the biological system are large, SSA requires a huge amount of steps to sample these reactions; Secondly, the assumption that the systems are spatially homogeneous or well-stirred is hardly met in most real biological systems and spatial factors play a key role in the behaviors of most real biological systems . The next sub-volume method (NSM) , presents us an elegant way to access the special problem via domain partition. To our disappointment, sequential stochastic simulation with the NSM is still very time-consuming, and additionally introduced diffusion among neighbor sub-volumes makes things worse. Whereas, the NSM explores a very high degree of parallelism among sub-volumes, and parallelization has been widely accepted as the most meaningful way to tackle the performance bottleneck of sequential simulations . Thus, adapting parallel discrete event simulation (PDES) techniques to discrete event stochastic simulation would be particularly promising. Although there are a few attempts have been conducted , research in this filed is still in its infancy and many issues are in need of further discussion. The next section of the paper presents the background and related work in this domain. In section III, we give the details of design and implementation of model interfaces of LP paradigm and the time warp simulator based on the discrete event simulation framework JAMES II; the benchmark model and experiment results are shown in Section IV; in the last section, we conclude the paper with some future work.II. Background and Related WorkA. Parallel Discrete Event Simulation (PDES)The notion Logical Process (LP) is introduced to PDES as the abstract of the physical process , where a system consisting of many physical processes is usually modeled by a set of LP. LP is regarded as the smallest unit that can be executed in PDES and each LP holds a sub-partition of the whole system’s state variables as its private ones. When a LP processes an event, it can only modify the state variables of its own. If one LP needs to modify one of its neighbors’ state variables, it has to schedule an event to the target neighbor. That is to say event message exchanging is the only way that LPs interact with each other. Because of the data dependences or interactions among LPs, synchronization protocols have to be introduced to PDES to guarantee the so-called local causality constraint (LCC) . By now, there are a larger number of synchronization algorithms have been proposed, e.g. the null-message , the time warp (TW) , breath time warp (BTW)  and etc. According to whether can events of LPs be processed optimistically, they are generally divided into two types: conservative algorithms and optimistic algorithms. However, Dematté and Mazza have theoretically pointed out the disadvantages of pure conservative parallel simulation for biochemical reaction systems . B. NSM and ANSM The NSM is a spatial variation of Gillespie’ SSA, which integrates the direct method (DM)  with the next reaction method (NRM) . The NSM presents us a pretty good way to tackle the aspect of space in biological systems by partitioning a spatially inhomogeneous system into many much more smaller “homogeneous” ones, which can be simulated by SSA separately. However, the NSM is inherently combined with the sequential semantics, and all sub-volumes share one common data structure for events or messages. Thus, directly parallelization of the NSM may be confronted with the so-called boundary problem and high costs of synchronously accessing the common data structure . In order to obtain higher efficiency of parallel simulation, parallelization of NSM has to firstly free the NSM from the sequential semantics and secondly partition the shared data structure into many “parallel” ones. One of these is the abstract next sub-volume method (ANSM) . In the ANSM, each sub-volume is modeled by a logical process (LP) based on the LP paradigm of PDES, where each LP held its own event queue and state variables (see Fig. 1). In addition, the so-called retraction mechanism was introduced in the ANSM too (see algorithm 1). Besides, based on the ANSM, Wang etc.  have experimentally tested the performance of several PDES algorithms in the platform called YH-SUPE . However, their platform is designed for general simulation applications, thus it would sacrifice some performance for being not able to take into account the characteristics of biological reaction systems. Using the similar ideas of the ANSM, Dematté and Mazza have designed and realized an optimistic simulator. However, they processed events in time-stepped manner, which would lose a specific degree of precisions compared with the discrete event manner, and it is very hard to transfer a time-stepped simulation to a discrete event one. In addition, Jeschke etc. have designed and implemented a dynamic time-window simulator to execution the NSM in parallel on the grid computing environment, however, they paid main attention on the analysis of communication costs and determining a better size of the time-window.Fig. 1: the variations from SSA to NSM and from NSM to ANSMC. JAMES II JAMES II is an open source discrete event simulation experiment framework developed by the University of Rostock in Germany. It focuses on high flexibility and scalability . Based on the plug-in scheme , each function of JAMES II is defined as a specific plug-in type, and all plug-in types and plug-ins are declared in XML-files . Combined with the factory method pattern JAMES II innovatively split up the model and simulator, which makes JAMES II is very flexible to add and reuse both of models and simulators. In addition, JAMES II supports various types of modelling formalisms, e.g. cellular automata, discrete event system specification (DEVS), SpacePi, StochasticPi and etc.. Besides, a well-defined simulator selection mechanism is designed and developed in JAMES II, which can not only automatically choose the proper simulators according to the modeling formalism but also pick out a specific simulator from a serious of simulators supporting the same modeling formalism according to the user settings .III. The Model Interface and SimulatorAs we have mentioned in section II (part C), model and simulator are split up into two separate parts. Thus, in this section, we introduce the designation and implementation of model interface of LP paradigm and more importantly the time warp simulator.A. The Mod Interface of LP ParadigmJAMES II provides abstract model interfaces for different modeling formalism, based on which Wang etc. have designed and implemented model interface of LP paradigm. However, this interface is not scalable well for parallel and distributed simulation of larger scale systems. In our implementation, we accommodate the interface to the situation of parallel and distributed situations. Firstly, the neighbor LP’s reference is replaced by its name in LP’s neighbor queue, because it is improper even dangerous that a local LP hold the references of other LPs in remote memory space. In addition, (pseudo-)random number plays a crucial role to obtain valid and meaningful results in stochastic simulations. However, it is still a very challenge work to find a good random number generator (RNG) . Thus, in order to focus on our problems, we introduce one of the uniform RNGs of JAMES II to this model interface, where each LP holds a private RNG so that random number streams of different LPs can be independent stochastically. B. The Time Warp SimulatorBased on the simulator interface provided by JAMES II, we design and implement the time warp simulator, which contains the (master-)simulator, (LP-)simulator. The simulator works strictly as master/worker(s) paradigm for fine-grained parallel and distributed stochastic simulations. Communication costs are crucial to the performance of a fine-grained parallel and distributed simulation. Based on the Java remote method invocation (RMI) mechanism, P2P (peer-to-peer) communication is implemented among all (master-and LP-)simulators, where a simulator holds all the proxies of targeted ones that work on remote workers. One of the advantages of this communication approach is that PDES codes can be transferred to various hardwire environment, such as Clusters, Grids and distributed computing environment, with only a little modification; The other is that RMI mechanism is easy to realized and independent to any other non-Java libraries. Since the straggler event problem, states have to be saved to rollback events that are pre-processed optimistically. Each time being modified, the state is cloned to a queue by Java clone mechanism. Problem of this copy state saving approach is that it would cause loads of memory space. However, the problem can be made up by a condign GVT calculating mechanism. GVT reduction scheme also has a significant impact on the performance of parallel simulators, since it marks the highest time boundary of events that can be committed so that memories of fossils (processed events and states) less than GVT can be reallocated. GVT calculating is a very knotty for the notorious simultaneous reporting problem and transient messages problem. According to our problem, another GVT algorithm, called Twice Notification (TN-GVT) (see algorithm 2), is contributed to this already rich repository instead of implementing one of GVT algorithms in reference  and .This algorithm looks like the synchronous algorithm described in reference  (pp. 114), however, they are essentially different from each other. This algorithm has never stopped the simulators from processing events when GVT reduction, while algorithm in reference  blocks all simulators for GVT calculating. As for the transient message problem, it can be neglect in our implementation, because RMI based remote communication approach is synchronized, that means a simulator will not go on its processing until the remote the massage get to its destination. And because of this, the high-costs message acknowledgement, prevalent over many classical asynchronous GVT algorithms, is not needed anymore too, which should be constructive to the whole performance of the time warp simulator.IV. Benchmark Model and Experiment ResultsA. The Lotka-Volterra Predator-prey SystemIn our experiment, the spatial version of Lotka-Volterra predator-prey system is introduced as the benchmark model (see Fig. 2). We choose the system for two considerations: 1) this system is a classical experimental model that has been used in many related researches , so it is credible and the simulation results are comparable; 2) it is simple but helpful enough to test the issues we are interested in. The space of predator-prey System is partitioned into a 2D NXN grid, where N denotes the edge size of the grid. Initially the population of the Grass, Preys and Predators are set to 1000 in each single sub-volume (LP). In Fig. 2, r1, r2, r3 stand for the reaction constants of the reaction 1, 2 and 3 respectively. We use dGrass, dPrey and dPredator to stand for the diffusion rate of Grass, Prey and Predator separately. Being similar to reference , we also take the assumption that the population of the grass remains stable, and thus dGrass is set to zero.R1: Grass + Prey ->2Prey (1)R2: Predator +Prey -> 2Predator (2)R3: Predator -> NULL (3)r1=0.01; r2=0.01; r3=10 (4)dGrass=0.0; dPrey=2.5; dPredato=5.0 (5)Fig. 2: predator-prey systemB. Experiment ResultsThe simulation runs have been executed on a Linux Cluster with 40 computing nodes. Each computing node is equipped with two 64bit 2.53 GHz Intel Xeon QuadCore Processors with 24GB RAM, and nodes are interconnected with Gigabit Ethernet connection. The operating system is Kylin Server 3.5, with kernel 2.6.18. Experiments have been conducted on the benchmark model of different size of mode to investigate the execution time and speedup of the time warp simulator. As shown in Fig. 3, the execution time of simulation on single processor with 8 cores is compared. The result shows that it will take more wall clock time to simulate much larger scale systems for the same simulation time. This testifies the fact that larger scale systems will leads to more events in the same time interval. More importantly, the blue line shows that the sequential simulation performance declines very fast when the mode scale becomes large. The bottleneck of sequential simulator is due to the costs of accessing a long event queue to choose the next events. Besides, from the comparison between group 1 and group 2 in this experiment, we could also conclude that high diffusion rate increased the simulation time greatly both in sequential and parallel simulations. This is because LP paradigm has to split diffusion into two processes (diffusion (in) and diffusion (out) event) for two interactive LPs involved in diffusion and high diffusion rate will lead to high proportional of diffusion to reaction. In the second step shown in Fig. 4, the relationship between the speedups from time warp of two different model sizes and the number of work cores involved are demonstrated. The speedup is calculated against the sequential execution of the spatial reaction-diffusion systems model with the same model size and parameters using NSM.Fig. 4 shows the comparison of speedup of time warp on a 64X64 grid and a 100X100 grid. In the case of a 64X64 grid, under the condition that only one node is used, the lowest speedup (a little bigger than 1) is achieved when two cores involved, and the highest speedup (about 6) is achieved when 8 cores involved. The influence of the number of cores used in parallel simulation is investigated. In most cases, large number of cores could bring in considerable improvements in the performance of parallel simulation. Also, compared with the two results in Fig. 4, the simulation of larger model achieves better speedup. Combined with time tests (Fig. 3), we find that sequential simulator’s performance declines sharply when the model scale becomes very large, which makes the time warp simulator get better speed-up correspondingly.Fig. 3: Execution time (wall clock time) of Seq. and time warp with respect to different model sizes (N=32, 64, 100, and 128) and model parameters based on single computing node with 8 cores. Results of the test are grouped by the diffusion rates (Group 1: Sequential 1 and Time Warp 1. dPrey=2.5, dPredator=5.0; Group 2: dPrey=0.25, dPredator=0.5, Sequential 2 and Time Warp 2).Fig. 4: Speedup of time warp with respect to the number of work cores and the model size (N=64 and 100). Work cores are chose from one computing node. Diffusion rates are dPrey=2.5, dPredator=5.0 and dGrass=0.0.V. Conclusion and Future WorkIn this paper, a time warp simulator based on the discrete event simulation framework JAMES II is designed and implemented for fine-grained parallel and distributed discrete event spatial stochastic simulation of biological reaction systems. Several challenges have been overcome, such as state saving, roll back and especially GVT reduction in parallel execution of simulations. The Lotka-Volterra Predator-Prey system is chosen as the benchmark model to test the performance of our time warp simulator and the best experiment results show that it can obtain about 6 times of speed-up against the sequential simulation. The domain this paper concerns with is in the infancy, many interesting issues are worthy of further investigated, e.g. there are many excellent PDES optimistic synchronization algorithms (e.g. the BTW) as well. Next step, we would like to fill some of them into JAMES II. In addition, Gillespie approximation methods (tau-leap etc.) sacrifice some degree of precision for higher simulation speed, but still could not address the aspect of space of biological reaction systems. The combination of spatial element and approximation methods would be very interesting and promising; however, the parallel execution of tau-leap methods should have to overcome many obstacles on the road ahead.AcknowledgmentThis work is supported by the National Natural Science Foundation of China (NSF) Grant (No.60773019) and the Ph.D. Programs Foundation of Ministry of Education of China (No. 200899980004). The authors would like to show their great gratitude to Dr. Jan Himmelspach and Dr. Roland Ewald at the University of Rostock, Germany for their invaluable advice and kindly help with JAMES II.ReferencesH. Kitano, "Computational systems biology." Nature, vol. 420, no. 6912, pp. 206-210, November 2002.H. Kitano, "Systems biology: a brief overview." Science (New York, N.Y.), vol. 295, no. 5560, pp. 1662-1664, March 2002.A. Aderem, "Systems biology: Its practice and challenges," Cell, vol. 121, no. 4, pp. 511-513, May 2005. [Online]. Available: http://dx.doi.org/10.1016/j.cell.2005.04.020.H. de Jong, "Modeling and simulation of genetic regulatory systems: A literature review," Journal of Computational Biology, vol. 9, no. 1, pp. 67-103, January 2002.C. W. Gardiner, Handbook of Stochastic Methods: for Physics, Chemistry and the Natural Sciences (Springer Series in Synergetics), 3rd ed. Springer, April 2004.D. T. Gillespie, "Simulation methods in systems biology," in Formal Methods for Computational Systems Biology, ser. Lecture Notes in Computer Science, M. Bernardo, P. Degano, and G. Zavattaro, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, vol. 5016, ch. 5, pp. 125-167.Y. Tao, Y. Jia, and G. T. Dewey, "Stochastic fluctuations in gene expression far from equilibrium: Omega expansion and linear noise approximation," The Journal of Chemical Physics, vol. 122, no. 12, 2005.D. T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," Journal of Physical Chemistry, vol. 81, no. 25, pp. 2340-2361, December 1977.D. T. Gillespie, "Stochastic simulation of chemical kinetics," Annual Review of Physical Chemistry, vol. 58, no. 1, pp. 35-55, 2007.D. T. Gillespie, "Approximate accelerated stochastic simulation of chemically reacting systems," The Journal of Chemical Physics, vol. 115, no. 4, pp. 1716-1733, 2001.J. Himmelspach, R. Ewald, and A. M. Uhrmacher, "A flexible and scalable experimentation layer," in WSC '08: Proceedings of the 40th Conference on Winter Simulation. Winter Simulation Conference, 2008, pp. 827-835.J. Himmelspach and A. M. Uhrmacher, "Plug'n simulate," in 40th Annual Simulation Symposium (ANSS'07). Washington, DC, USA: IEEE, March 2007, pp. 137-143.R. Ewald, J. Himmelspach, M. Jeschke, S. Leye, and A. M. Uhrmacher, "Flexible experimentation in the modeling and simulation framework james ii-implications for computational systems biology," Brief Bioinform, vol. 11, no. 3, pp. bbp067-300, January 2010.A. Uhrmacher, J. Himmelspach, M. Jeschke, M. John, S. Leye, C. Maus, M. Röhl, and R. Ewald, "One modelling formalism & simulator is not enough! a perspective for computational biology based on james ii," in Formal Methods in Systems Biology, ser. Lecture Notes in Computer Science, J. Fisher, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, vol. 5054, ch. 9, pp. 123-138. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-68413-8_9.R. Ewald, J. Himmelspach, and A. M. Uhrmacher, "An algorithm selection approach for simulation systems," pads, vol. 0, pp. 91-98, 2008.Bing Wang, Jan Himmelspach, Roland Ewald, Yiping Yao, and Adelinde M Uhrmacher. Experimental analysis of logical process simulation algorithms in james ii[C]// In M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, editors, Proceedings of the Winter Simulation Conference, IEEE Computer Science, 2009. 1167-1179.Ewald, J. Rössel, J. Himmelspach, and A. M. Uhrmacher, "A plug-in-based architecture for random number generation in simulation systems," in WSC '08: Proceedings of the 40th Conference on Winter Simulation. Winter Simulation Conference, 2008, pp. 836-844.J. Elf and M. Ehrenberg, "Spontaneous separation of bi-stable biochemical systems into spatial domains of opposite phases." Systems biology, vol. 1, no. 2, pp. 230-236, December 2004.K. Takahashi, S. Arjunan, and M. Tomita, "Space in systems biology of signaling pathways? Towards intracellular molecular crowding in silico," FEBS Letters, vol. 579, no. 8, pp. 1783-1788, March 2005.J. V. Rodriguez, J. A. Kaandorp, M. Dobrzynski, and J. G. Blom, "Spatial stochastic modelling of the phosphoenolpyruvate-dependent phosphotransferase (pts) pathway in escherichia coli," Bioinformatics, vol. 22, no. 15, pp. 1895-1901, August 2006.D. Ridgway, G. Broderick, and M. Ellison, "Accommodating space, time and randomness in network simulation," Current Opinion in Biotechnology, vol. 17, no. 5, pp. 493-498, October 2006.J. V. Rodriguez, J. A. Kaandorp, M. Dobrzynski, and J. G. Blom, "Spatial stochastic modelling of the phosphoenolpyruvate-dependent phosphotransferase (pts) pathway in escherichia coli," Bioinformatics, vol. 22, no. 15, pp. 1895-1901, August 2006.W. G. Wilson, A. M. Deroos, and E. Mccauley, "Spatial instabilities within the diffusive lotka-volterra system: Individual-based simulation results," Theoretical Population Biology, vol. 43, no. 1, pp. 91-127, February 1993.K. Kruse and J. Elf. Kinetics in spatially extended systems. In Z. Szallasi, J. Stelling, and V. Periwal, editors, System Modeling in Cellular Biology. From Concepts to Nuts and Bolts, pages 177–198. MIT Press, Cambridge, MA, 2006.M. A. Gibson and J. Bruck, "Efficient exact stochastic simulation of chemical systems with many species and many channels," The Journal of Physical Chemistry A, vol. 104, no. 9, pp. 1876-1889, March 2000.R. M. Fujimoto, Parallel and Distributed Simulation Systems (Wiley Series on Parallel and Distributed Computing). Wiley-Interscience, January 2000.Y. Yao and Y. Zhang, “Solution for analytic simulation based on parallel processing,” Journal of System Simulation, vol. 20, No.24, pp. 6617–6621, 2008.G. Chen and B. K. Szymanski, "Dsim: scaling time warp to 1,033 processors," in WSC '05: Proceedings of the 37th conference on Winter simulation. Winter Simulation Conference, 2005, pp. 346-355.M. Jeschke, A. Park, R. Ewald, R. Fujimoto, and A. M. Uhrmacher, "Parallel and distributed spatial simulation of chemical reactions," in 2008 22nd Workshop on Principles of Advanced and Distributed Simulation. Washington, DC, USA: IEEE, June 2008, pp. 51-59.B. Wang, Y. Yao, Y. Zhao, B. Hou, and S. Peng, "Experimental analysis of optimistic synchronization algorithms for parallel simulation of reaction-diffusion systems," High Performance Computational Systems Biology, International Workshop on, vol. 0, pp. 91-100, October 2009.L. Dematté and T. Mazza, "On parallel stochastic simulation of diffusive systems," in Computational Methods in Systems Biology, M. Heiner and A. M. Uhrmacher, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, vol. 5307, ch. 16, pp. 191-210.D. R. Jefferson, "Virtual time," ACM Trans. Program. Lang. Syst., vol. 7, no. 3, pp. 404-425, July 1985.J. S. Steinman, "Breathing time warp," SIGSIM Simul. Dig., vol. 23, no. 1, pp. 109-118, July 1993. [Online]. Available: http://dx.doi.org/10.1145/174134.158473 S. K. Park and K. W. Miller, "Random number generators: good ones are hard to find," Commun. ACM, vol. 31, no. 10, pp. 1192-1201, October 1988.
F. Xing et al., "Fine-Grained Parallel and Distributed Spatial Stochastic Simulation of Biological Reactions", Advanced Materials Research, Vol. 345, pp. 104-112, 2012