Gillespie algorithm for stochastic simulations of signaling pathways – vectorization in MATLAB

Modeling of signaling pathways is an important part of cancer research, as it is essential to understand how proteins interact with each other to provide or impair a specific cell function. There are two main approaches to tackle that problem mathematically: 1) deterministic approach, in which we assume that the number of molecules is large and stochastic effects are negligible, 2) stochastic approach (typically using Gillespie algorithm), in which we model exact number of molecules and take into account inherent system noise. Today I’ll focus on the stochastic approach. You can find online plenty of packages to perform the stochastic simulations using Gillespie algorithm, but I haven’t seen any that is harnessing the vectorization in MATLAB. Today I will show that using vectorization we can increase the speed of the simulations about 100 fold. Such an increase in computational speed is very important, as more and more frequently agent based models incorporate signaling pathways on the cell level.

Let us start with basic information about the Gillespie algorithm. Using the algorithm we describe the change in the number of molecules from K different species (x1(t),…,xK(t)) that at each moment can undergo one of L different biochemical reactions. Each reaction has a specific propensity function, e.g. biding of two molecules from different species can have propensity r*x1(t)*x2(t) where r is biding rate. The propensities determine the time to next reaction (exponentially distributed) and which reaction occurs next (discrete distribution with propensity based probabilites). After a reaction occurs the system is updated according to stoichiometric matrix, i.e. matrix describing how many reactants are removed from the system and what products are created.

As a working example we will consider a very basic gene expression pathway – we will describe the number of mRNAs and corresponding proteins. Four considered reactions will be: transcription, translation, mRNA degradation, and protein degradation. Ok, let us start coding by defining considered pathway in the main script.

%DEFINING REACTION RATES = 1/10;     %transcription rate = 10;   %translation rate = 1/150; %mRNA degradation rate = 1/30;  %protein degradation rate

prop = @(x,par)([,...      %transcription, one mRNA molecule created
       *x(1),... %translation, one protein created
       *x(1),... %mRNA degradation, one mRNA molecule removed
       *x(2)]);  %protein degradation, one protein molecule removed

init = [0;1];

%column corresponds to the reaction, row corresponds to the molecule
%order as in prop and init variables
stoch = [1 0 -1 0;... 0 1 0 -1];

tmesh = linspace(0,1000,100) ;

Now in a few lies we can code the whole Gillespie algorithm in a separate function file (without using any vectorization techniques first).

function trajectory = SSA(tmesh, par,prop,stoch, init )
   %tmesh - time mesh on which solution should be returned
   %per - parameters of the pathway
   %prop - definition of propensity functions
   %stoch - stochiometric matrix
   %init - initial condition for the pathway

   t = 0;                                          %current time
   state = init(:);                                %variable with current system state
   trajectory = zeros(length(init),length(tmesh)); %preparing output trajectory
   trajectory(:,1) = init(:);                      %setting initial value as the first element in trajectory
   cindx = 2;                                      %current trajectory index
   N = length(tmesh);                              %number of time points

   while t<tmesh(end)
      Q = feval(prop,state,par);          %calculating propensities of the reactions
      Qs = sum(Q);                        %total propensity
      dt = -log(rand())/Qs;               %generating time to the next reaction
      R = sum(rand >= cumsum([0 Q])/Qs);  %selecting reaction
      state = state + stoch(:,R);         %updating state
      t = t + dt;                         %updating time

      %writing the output
      while cindx<=N && t>tmesh(cindx)
         trajectory(:,cindx) = state;
         cindx = cindx+1;

and add the following lines in the main script to simulate and plot trajectory.

traj = SSA(tmesh, par,prop,stoch, init );

xlabel('Time, min')

xlabel('Time, min')

Below are the resulting plots after one stochastic realization of the whole pathway.

mRNA protein

So now we can simulate our simple pathway, but there is one problem: it takes about 3.5 seconds to generate trajectory presented in the above plot. If we want to simulate that pathway for each agent present in the system and we reach the level of 2000 agents, then simulating the pathway for each of the agents using simple for loop will take about 2 hours… What can we do to decrease computational time? We could of course use multiple CPUs – using 100 of them will allow to do the computations in about a minute. But who has access to 100 CPUs? Only few of us. What about using the vectorization option in MATLAB and modifying the SSA function to perform those 2000 simulations simultaneously? That is definitely worth trying. We need only to add additional argument to SSAv function – number of trajectories to generate, and modify few lines to do vectorized calculations:

function trajectory = SSAv(tmesh, par,prop,stoch, init, nSim )
   %tmesh - time mesh on which solution should be returned
   %par - parameters of the pathway
   %prop - definition of propensity functions
   %stoch - stochiometric matrix
   %init - initial condition for the pathway
   %nSim - number of simulations to perform

   tmesh = tmesh(:);   %reshaping mesh to be vertical vector
   t = zeros(nSim,1);  %current time for each simulation
   state = repmat(init(:)',nSim,1); %current state variable for each simulation
   trajectory = zeros(nSim,length(init),length(tmesh)); %preparing output trajectory
   trajectory(:,:,1) = state;%setting initial value as the first element in trajectory
   cindx = 2*ones(nSim,1);%current trajectory indices
   N = length(tmesh); %number of time points
   aux = 1:nSim; %

   while ~isempty(t)
      Q = feval(prop,state,par);         %calculating propensities of the reactions
      Qs = sum(Q,2);                     %total propensities
      dt = -log(rand(size(Qs,1),1))./Qs; %generating time to the next reaction
      P = bsxfun(@rdivide, cumsum([zeros(size(Qs,1),1) Q],2),Qs); %probabilities for each reaction
      R = sum(bsxfun(@ge,rand(size(Qs,1),1),P),2);                %selecting reaction
      state = state + stoch(:,R)';       %updating state
      t = t + dt;                        %updating time

     %writing the output
     update = t > tmesh(cindx);
     while any(update)
        %updating state
        iupdt = find(update);
        for i = 1:length(iupdt)
           trajectory(aux(iupdt(i)),:,cindx(iupdt(i))) = state(iupdt(i),:);
        cindx = cindx+update;

        %removing finished simulations from the variables
        indx = cindx > N;
        if any(indx)
           cindx(indx) = [];
           t(indx) = [];
           aux(indx) = [];
           state(indx,:) = [];
           if isempty(cindx)
        update = t > tmesh(cindx);

We need also to make sure that in the main script that the propensity functions are defined in the vectorized manner (everything else remains the same).

prop = @(x,par)([*ones(size(x,1),1),...

Below are the plots of trajectories generated using the vectorized function for nSim = 100.

mRNA100 protein100

Now the important question is how much time did it take to simulate those 100 trajectories. We know that using the simple non-vectorized code and a for loop it would take about 360 seconds. The vectorized function did it in 16 seconds! That is 22 fold decrease in computational time. Plot below shows how computational time changes with the number of trajectories to simulate.


Vectorized code generated 2000 trajectories in 62 seconds! So we don’t need those 100 CPUs to obtain reasonable time of computations.

I hope that this post will convince some people that MATLAB is not bad after all.


Agent-Based Modeling (ABM) of Tumors in Java

Before I begin I would like to quickly introduce myself — my name is Sameer Puri and I am a high school student working in an internship under Dr. Heiko Enderling through the HIP IMO program at Moffitt Cancer Center.

Over the past 5 weeks, I have been developing an ABM in Java for my research. My principal goal in developing this ABM was to quickly generate tumors with 1 million cells. (And prove that Java can perform better in some cases…!)

When I started, I was working with an ABM written by @jpoleszczuk and Heiko in C++ (available here). Having a strong background in Java and little past experience with C++, it was difficult for me to understand. Structs, vectors, memcpy, and other features of C++ are not present in Java. Instead of trying to learn C++, I converted the code to Java, my language of expertise.


I decided to optimize the code before making any extensions for my research. In Java, finding slow-down points is simple, since an array of profiling tools are available to find which part of the code is slowing down the ABM. I utilized the web-based profiler WarmRoast, developed by sk89q (on GitHub here).

  • One of the biggest time hogs was the Random class which was implemented in the 1990s, and after 20 years, much more efficient pseudo-random number generators have emerged. I replaced it with the XorShift generator, developed by George Marsaglia. The XorShift is theoretically 3 times faster than the default generator. However, the speed-up was far more significant — the call for a random double was so fast that it disappeared from the WarmRoast profiler’s results.
  • The returnEmptyPlace method was optimized with a strategy known as pregeneration, or exchanging memory for lower CPU time. The returnEmptyPlace method looks for an empty spot around a cell in the ABM, shuffling a list of 8 directions each time. Instead, I pregenerated all 40,320 permutations of the 8 direction vectors surrounding a lattice square. Now, the returnEmptyPlace method simply uses the speedy XorShift generator to select a random permutation of the directions list to use.
  • I also optimized the shuffling of the list of cells in the main simulation loop. I created a new class called SPCollections which uses the XorRandom generator for shuffling. There was a large speed-up here as well.
  • Using NetBeans’ built-in hashcode generator, I added hashcodes for the Cell and Indx classes, which will speed up the usage of HashMaps or other hash-based structures.
  • Since memory was not a serious concern for me, I made one optimization. The original model in C++ used the char primitive type for its lattice. I turned the lattice into a byte lattice, which has half the memory footprint (chars are 16 bits, bytes are 8 bits).


  • For my research, I had to grow 10 tumors in silico with one million cells each. To speed up simulation I utilized parallel computing with an ExecutorService. With my ABM, you can specify the number of “trials” you want to run. For each trial run, a simulator instance is created and run on a separate thread. Most CPUs today have at least 4 cores. Code in most programming languages tend to make use of only one core unless programmers specify methods that allow for parallel execution of code. By creating a separate thread for each simulation, the threads can operate on separate cores and maximize CPU usage!
  • To universalize my data, I saved it utilizing Java serialization. With serialization, Java can output objects to files, which can be loaded later on without needing to code a loader. Others will be able to easily load data generated using my ABM and run their own operations on it.
  • I coded an image generator to automatically generate tumor images. Once a simulation is finished, the image generator takes the serialized output of the simulation and generates three types of PNG images for each dumped instance: proliferation, quiescence, and nutrition. It also creates an animated GIF to show the growth of the tumor over time. Each image has the time step (1 step = 1 hour), number of stem cells and number of regular cancer cells at the upper left corner.

Proliferation during Development

Yellow = Glioma Stem Cell Reddish = few divisions ==> Black = many divisions

Nutrition during Development (on rim)

Nutrients of lattice on the outer edge is displayed. Unoccupied (0) and occupied (9) lattice spots are white

Quiescence during Development

Orange = quiescent Red = can divide/proliferate






You can download my ABM here

Parameter dependent implementations

When dealing with ordinary or partial differential equations we know that existing numerical solvers have different performance depending on the specific mathematical problem. In particular, the performance of any given solver can dramatically change if we change model parameters. Let us take the well known logistic equation as an example. Let C(t) be the current number of cancer cells, r the growth rate, and K the maximal number of cancer cells that can be sustained by the host. Then the growth dynamics can be described by the following equation:


The above equation has of course analytical solution, but let us use a MATLAB ordinary differential equations solver to find a numerical solution. Generally it is a good idea to start with the solver that has the highest order – in our case it will be ode45 based on the Runge-Kutta method. We will also consider another solver, ode15s, which is designed for solving stiff systems. We fix the value of carrying capacity K, initial condition C(0), and time interval in which we solve the equation. We will investigate the performance of both solvers for different values of growth rate r. The plots below show that for lower values of growth rate ode45 outperforms the ode15s, but situation changes when the growth rate is large (while keeping small relative difference between the calculated solutions).


That transition in solvers performance can be explained when one looks how solution changes for different values of growth rates (see figure below) – the larger the growth rate the steeper the solution. Stiff solvers, like ode15s, are designed to effectively deal with such phase transitions.


So, if we need to consider parameter dependent performance of the numerical solvers for differential equations, should we do the same for agent based models of cancer growth? Should we create parameter dependent implementations and switch between them on the fly? I think that yes, and I’ll try to convince you with the simple example below.

Let us consider a very simple CA: cell can either divide or migrate if there is a free spot – nothing else. We can adapt the very basic C++ implementation from the previous post as a baseline: we have a boolean matrix (in which value true indicates that the spot is occupied by the cell) and additional vector of cells keeping information about locations of all viable cells present in the system. At each simulation step we go through all of the cells in a vector in random order and decide about faith of each cell.

Let us tweak the code and at each iteration drop from memory (from vector) cells that are quiescent (completely surrounded). If the cell stays quiescent for prolonged time we save some computational time by skipping its neighborhood search. However, after each migration event we need to add some cells back to the vector – that is additional cost generated be searching the lattice again. The essential part of the tweaked code in C++ is:

   for (int i=0; i<nSteps; i++) {
        random_shuffle(cells.begin(), cells.end()); //shuffling cells
        while (!cells.empty()) {
            currCell=cells.back(); //pick the cell
            newSite = returnEmptyPlace(currCell);
            if (newSite) {//if there is a free spot in neigh
                if ((double)rand()/(double)RAND_MAX < pDiv) {//cell decides to proliferate
                    lattice[newSite] = true;
                } else if ((double)rand()/(double)RAND_MAX < pmig) {//cell decides to migrate
                    lattice[newSite] = true;
                } else {//cell is doing nothing
        //freeing spots after cell migration and adding cells to vector
        for(int j =0; j<toDelete.size(); j++)
            lattice[] = false;
        c = !toDelete.empty();
        while(!toDelete.empty()) {
            newSite = toDelete.back();
            //adding cells that can have space after migration event
            for(int j=0;j<8;j++) {//searching through neighborhood
                if (newSite+indcNeigh[j]>=0) {
        //removing duplicates in cells vector
        if (c) {
            sort( cells.begin(), cells.end() );
            cells.erase( unique( cells.begin(), cells.end() ), cells.end() );

Depending on the proliferation/migration rates we can imagine that the tweaked code will have different performance. For low  probability of migration event only cells on the tumor boundary will have some space (green cells on the left figure below) and we save a lot of time by dropping quiescent cells (blue cells in figures below). For higher migration rates, however, less than 30% of the cells are quiescent at each proliferation step (compare right figure below) and the benefit of dropping cells can be outweighed by the after migration update steps.
solid diffusive

The plot below shows the computational time needed to reach 500,000 cells when using the baseline code (all cells, blue line) and tweaked one (only proliferation, dashed green line) for a fixed proliferation probability. For lower proliferation rates change in the performance would occur even earlier.


I think that this simple example show that we always need to think how our model will behave before choosing the way to implement it.