Bacterial Chemotaxis in Silico
Home Biology Software Data Publications Links

BCT | StochSim | Smoldyn

StochSim

Introduction

The computer program StochSim was written by Carl Firth (formerly Carl Morton-Firth) as part of his PhD work at the University of Cambridge (Morton-Firth, 1998). It was developed as part of a study of bacterial chemotaxis as a more realistic way to represent the stochastic features of this signalling pathway and also as a means to handle the large numbers of individual reactions encountered (Morton-Firth & Bray, 1998; Morton-Firth et al., 1999). The program now provides a general purpose biochemical simulator in which individual molecules or molecular complexes are represented as individual software objects. Reactions between molecules occur stochastically, according to probabilities derived from known rate constants. An important feature of the program is its ability to represent multiple post-translational modifications and conformational states of protein molecules.

StochSim consists of a platform-independent core simulation engine encapsulating the algorithm described above and a separate graphical user interface. The program is available for download from SourceForge.

Complementary information about the program is available on the website of Nicolas Le Novère.

Description of the algorithm

Each molecule is represented as a separate software object in StochSim, and the simulation also includes dummy molecules, or "pseudo-molecules", used in the simulation of unimolecular reactions. Time is quantised into a series of discrete, independent time-slices, the size of which is determined by the most rapid reaction in the system. At the start of the simulation, the user assigns the maximum number of molecules, N, the system will use. In each time-slice, one molecule (not a pseudo-molecule) is selected at random from N possibilities (the probability of selection of each molecule is 1/N). Then, another object, either a molecule or a pseudo-molecule, is selected at random from N possibilities. If two molecules are selected, any reaction that occurs will be bimolecular, whereas if one molecule and a pseudo-molecule are selected, it will be unimolecular. Another random number is then generated and used to see if a reaction occurs. The probability of a reaction is retrieved from a look-up table and if the probability exceeds the random number, the particles do not react. On the other hand, if the probability is less than the random number, the particles react, and the system is updated accordingly. The next time-slice then begins with another pair of molecules being selected.

Iteration steps of the StochSim algorithm

Schematic of three iteration steps of the StochSim algorithm. In this highly simplified diagram, the system contains four molecules of an enzyme E, four molecules of its substrate S, and two molecules of the enzyme product, P. In the first step the program happens to choose one S and one P. Referring to a look-up table, it finds that these two molecules cannot react, and so proceeds to the next iteration. In the second time step, one E and one S are selected. The look-up table reveals that these two can react, and gives a probability for this to happen. In this case the program decides (on the basis of a random number) that the reaction does in fact occur, and creates an enzyme substrate complex ES — the first stage of the enzyme-catalysed reaction. In the third iteration, ES has replaced the previously separate E and S, and a new selection is made.

Molecules that can exist in more than one state can be encoded in the program as a "multistate molecule" with a series of binary flags. Each flag represents a state or property of the molecule, such as a conformational state, ligand binding, phosphorylation, methylation, or other covalent modification. The flags specify the instantaneous state of the molecule and may modify the reactions it can perform. For instance, a multistate molecule may participate in a reaction at an increased rate as a result of phosphorylation, or fail to react because it is in an inactive conformation. The flags themselves can be modified in each time-slice as a result of a reaction, or they can be instantaneously equilibrated according to a fixed probability. The latter tactic is used with processes such as ligand binding or conformational change that occur several orders of magnitude faster than other chemical reactions in the system.

Example of a StochSim multistate molecule

Schematic of a multistate molecule used by the StochSim algorithm, in this case the complex of the bacterial aspartate receptor Tar with two other proteins, CheA and CheW. Circular flags represent different binary states of the complex, such as bound to aspartate or not bound, methylated at position #2 or not, charged with a phosphoryl group or not. Reaction probabilities of the complex are modified according to the current state of its binary flags.

If, in a particular time step, StochSim selects one or more multistate molecules, it then proceeds in the following manner. First, any rapidly equilibrated "fast flags" on the molecule are assigned to be on or off according to a weighted probability. A protein-conformation flag, for example, can be set to be active or inactive, according to which other flags of the molecules are currently on. A ligand-binding flag can, if desired, be set in a similar fashion, based on the concentration of ligand and the Kd.

Once the fast flags have been set, the program inspects the reactions available to species A and B. The chemical change associated with each type of reaction (binding, phosphotransfer, methylation, etc.) is represented in the program together with a "base values" of the reaction rate constants. The effect of the state flags on each instance of a reaction during the simulation is accessed from an array of state-dependent scaling factors, calculated at the beginning of the program when the reaction system is being initialised. Values in the array modify the reaction probability according to the particular set of binary flags. In this manner, StochSim calculates a set of probabilities corresponding to the reactions available to the particular states of molecules A and B, and then uses a random number to select which reaction (if any) will be executed in the next step. The reaction will be performed, if appropriate, and the relevant slow flag flipped.

Although it sounds complicated, the above sequence of events within an individual iteration takes place very quickly and even a relatively slow computer can carry out hundreds of thousands of iterations every second. Moreover, the strategy has the advantage of being intuitively simple and close to physical reality. For example, it is easy, if required, to label selected molecules and to follow their changes with time. Lastly, the speed of the program depends not on the number of reactions but on the numbers of molecular species in the simulation (with a time of execution proportional to N2).

Comparison with the Gillespie algorithm

The stochastic simulation of biochemical reactions was pioneered by Gillespie, who developed an elegant and efficient algorithm for this purpose (Gillespie, 1976; Gillespie, 1977). Gillespie showed in a rigorous fashion that his algorithm gives the same result, on average, as conventional kinetic treatments. In ensuing years, the algorithm has been widely used to analyse biochemical kinetics, for example, to simulate the stochastic events in lambda lysogeny (McAdams & Arkin, 1997). In view of its evident success, the question therefore arises: why did we not use the Gillespie algorithm but chose to develop our own formulation?

The Gillespie algorithm makes time steps of variable length, based on the rate constants and population size of each chemical species. The probability of one reaction occurring relative to another is obtained by multiplying the rate constant of each reaction with the numbers of its substrate molecules. A random number is then used to choose which reaction will occur, based on relative probabilities, and another random number determines how long the step will last. The chemical populations are altered according to the stoichiometry of the reaction and the process is repeated. Perhaps because the algorithm was developed at a time when computers were several thousand times slower than they are today, it makes extremely efficient use of CPU time. At each iteration, it selects the reaction most likely to occur and chooses a time step that optimises that reaction, so that the simulation proceeds extremely efficiently.

However, the efficiency of the Gillespie algorithm comes at a cost. The elegant algorithm that selects which reaction to perform, and what time interval to take, does not represent individual molecules in the system. With regard to the reactions of a typical cell signalling pathway, for example, it cannot associate physical quantities with each molecule, nor trace the fate of particular molecules over a period of time. This means that it is not possible to extend this algorithm to a more physically realistic model in which energies and conformational states are associated with each molecule. Similarly, without the ability to associate positional and velocity information with each particle, the algorithm cannot be easily adapted to simulate spatial location or diffusion.

A second deficiency of the Gillespie algorithm (from a cell-biological standpoint) is that it cannot easily handle the reactions of multistate molecules. Protein molecules are very frequently modified in the cell so as to alter their catalytic activity, binding affinity and so on. Cell signalling pathways, for example, carry information in the form of chemical changes such as phosphorylation or methylation, or as conformational state. A multi-protein complex may contain upwards of twenty sites, each of which can often be modified independently and each of which can, in principle, influence how the complex will participate in chemical reactions. With twenty sites, a complex can exist in a total of 220, or one million, unique states, each of which could react in a slightly different way. If our multi-protein complex interacts with only ten other chemical species, a detailed model may contain as many as ten million distinct chemical reactions — a combinatorial explosion. Any program in which the time taken increases in proportion to the number of reactions, as in a conventional, deterministic model, or in the Gillespie method, will come to a halt under these conditions.

To summarise, StochSim is likely to be slower than the Gillespie algorithm in calculating the eventual outcome of a small set of simple biochemical reactions, especially when the number of molecules is large. However, if the system contains molecules that can exist in multiple states, then StochSim will not only be faster but also closer to physical reality. It is easy, if required, to label selected molecules in this program and to follow their changes with time, including changes to their detailed post-translational modification and conformational state. Although the basic algorithm does not consider the spatial location of molecules, the individual-based nature of the program allows the incorporation of spatial information in a straightforward manner.

Spatial representation

The original version of StochSim (1.0) treated the entire reaction system as a uniformly mixed solution. Although this is clearly not how molecules are arranged within living cells, the omission of spatial heterogeneity has been the norm in biochemical simulations because it greatly facilitates modelling and reduces the computational load of simulation. However, as the resolution of our understanding of biochemical processes increases, it is becoming clear that even in bacteria, the simplest of cells, the spatial organisation of molecules often plays an important role. We have therefore extended StochSim to incorporate explicit spatial representation.

In versions of StochSim later than 1.2, a simple two-dimensional spatial structure is implemented, in which nearest-neighbour interactions of molecules (such as clustered receptors on a membrane) can be simulated. The original implementation of this spatial structure only allowed geometries composed of square units with four nearest neighbours, but as of version 1.4, two additional geometries, one composed of triangles and the other of hexagons, are supported. These three are the only three regular tessellations which can cover a two-dimensional surface. The new geometries can be used, for example, to reflect the recent prediction of the structural arrangement of the chemotaxis receptor complex (Shimizu et al., 2000).

The primary purpose of this spatial structure implementation so far has been to study various properties of complex arrays of fixed size and geometry. The binding of other complexes in the reaction system to complexes within the complex array is permitted, but does not alter the geometry of the array. Each node of the complex array has exactly one complex, and three to six nearest-neighbour nodes. The geometry of the array can therefore be triangular, rectangular or hexagonal, and real or toroidal boundaries can be used.

Sample output

Timecourse of number of CheYp molecules in response to a pulse of aspartate

Calculation of the number of CheYp molecules in a bacterium exposed to the attractant aspartate. The continuous line represents the results of numerical integration of approximately 70 kinetic equations; the fluctuating red line shows the stochastic prediction for the same set of reactions obtained using StochSim.

Stochastic simulation of a simple equilibration reaction

Stochastic simulation of a simple equilibration A (black) <—> B (red) showing the effects of the number of molecules in the system. For very large volumes (large numbers of molecules) the timecourse is a smooth line indistinguishable from a deterministic simulation. However, as the volume and the number of molecules becomes smaller, the time course becomes more noisy until, eventually, individual transitions A —> B and B —> A become visible.


Complex array with square geometry Complex array with triangular geometry Complex array with hexagonal geometry

The complex arrays in StochSim 1.4 can have geometries based on squares (left), triangles (centre) or hexagons (right). The three snapshots above are outputs of the activity of a 65 x 65 receptor cluster during a simulation of bacterial chemotaxis. The grey level of each small polygon represents the activity of a single receptor (black = low activity; white = high activity). Snapshots (and animations) such as these, which highlight specific states of the molecules in the spatial structure, can be generated from the StochSim output. Click on each image for a larger version.

Frequently asked questions (FAQ)

Q. How accurate is StochSim?
A. The simulation proceeds in a series of small time steps, dt, and as with any method of numerical integration, this introduces inaccuracies. Small errors can arise in StochSim because the program can simulate at most one reaction event in each interval, dt, whereas in reality molecular events occur at random times. However, the errors are as often positive and negative and therefore not cumulative. Moreover, they can be made as small as required by reducing the size of the time steps.

Q. If a reaction changes the total number of "particles", as in A + B —> C, or A —> B + C will that affect other reactions in the system? Why don't fluctuations in total numbers of particles cause an error?
A. At the start of the simulation the user assigns a maximum number of molecular species, N. All subsequent selections of molecules are made randomly from these N possibilities. The chance of selecting any individual molecule in the system at any step is therefore 1/N. This probability is not affected by other changes in the system.

Q. How are the reaction probabilities calculated? Do these change during the simulation?
A. Reaction probabilities are calculated directly from the experimentally determined kinetic rates, the number of molecules in the system, and the size of the time increment (Morton-Firth, 1998; Morton-Firth & Bray, 1998). Probabilities are calculated at the start of the simulation and do not change thereafter.

Q. Gillespie showed that his algorithm could be related to the chemical Master equation. Has this been done for StochSim?
A. Not explicitly. But the physical assumptions of StochSim have been derived from what Gillespie called the "fundamental hypothesis of stochastic chemical kinetics" (Shimizu & Bray, 2001).

Q. What are pseudo-molecules?
A. These are dummy molecules, created at the start of each run and used in the simulation of unimolecular reactions. The StochSim algorithm always chooses two molecules in each iteration. If one of these happens to be a pseudo-molecule, the program tries to react the other chosen molecule in a unimolecular reaction. Pseudo-molecules allow the program to decide between uni- or bimolecular reactions without having to generate an extra random number.

Q. How does StochSim represent unimolecular reactions?
A. Unimolecular reactions are represented, like other reactions, in a look-up table created at the start of the simulation. This table contains the name of the molecular species and the reaction probability.

Q. From the description of the algorithm it seems that StochSim will visit many molecules before it finds one that reacts. Won't that make it very slow?
A. It is true that in StochSim the majority of time steps are "unproductive" in the sense that no chemical change occurs, but the search algorithm itself is very rapid. As for any simulation, the overall time taken depends on the size and complexity of the system and the accuracy required. For systems of up to ten thousand molecular species capable of multiple states, nothing is faster than StochSim (we believe!).

Q. How do the number of molecules and the number of reactions affect the time of simulation?
A. The execution time of StochSim simulations scales with the maximum number of molecular species (proportional to N2). It is not affected by the number of reactions per se.

Q. How does StochSim deal with constant rates, such as a constant synthesis of ATP or the binding of a diffusible molecule present in vast excess?
A. If species A is produced at a constant rate we represent the source as a dummy complex type X that is unaltered by the reaction. The rate of X —> A remains constant. Similarly, if we want the rate of the bimolecular reaction L + R <—> RL to remain constant, we assume that the concentration of L is fixed and incorporate it into the rate constant.

Q. What happens if the algorithm chooses two molecular species that do not react?
A. The time is updated and the next two molecules are selected.

References

Gillespie, D. T. (1976). A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 22, 403-434.

Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340-2361.

Firth, C. A. J. M. & Bray, D. (2000). Stochastic simulation of cell signaling pathways. In "Computational Modeling of Genetic and Biochemical Networks". Bower, J. M. & Bolouri, H., eds. The MIT Press. Cambridge, MA. 263-286.

Le Novère, N. & Shimizu, T. S. (2001). StochSim: modelling of stochastic biomolecular processes. Bioinformatics 17, 575-576.

McAdams, H. H. & Arkin, A. (1997). Stochastic mechanisms in gene expression. Proc. Natl. Acad. Sci. USA 94, 814-819.

Morton-Firth, C. J. (1998). Stochastic simulation of cell signalling pathways. Ph. D., Cambridge.

Morton-Firth, C. J. & Bray, D. (1998). Predicting temporal fluctuations in an intracellular signalling pathway. J. Theor. Biol. 192, 117-128.

Morton-Firth, C. J., Shimizu, T. S. & Bray, D. (1999). A free-energy-based stochastic simulation of the Tar receptor complex. J. Mol. Biol. 286, 1059-1074.

Shimizu, T. S. and Bray, D. (2001) Computational cell biology - the stochastic approach. In "Foundations of Systems Biology". Kitano, H., ed. The MIT Press. Cambridge, MA.

Shimizu, T . S., Le Novère, N., Levin, M. D., Beavil, A. J., Sutton, B. J. & Bray, D. (2000). Molecular model of a lattice of signalling proteins involved in bacterial chemotaxis. Nat. Cell. Biol. 2, 792-796.

Site designed by Matthew Levin. This page updated 5 September 2010.
Valid HTML 4.01 Transitional