# Gillespie algorithm for stochastic simulations of signaling pathways – vectorization in MATLAB

Modeling of signaling pathways is an important part of cancer research, as it is essential to understand how proteins interact with each other to provide or impair a specific cell function. There are two main approaches to tackle that problem mathematically: 1) deterministic approach, in which we assume that the number of molecules is large and stochastic effects are negligible, 2) stochastic approach (typically using Gillespie algorithm), in which we model exact number of molecules and take into account inherent system noise. Today I’ll focus on the stochastic approach. You can find online plenty of packages to perform the stochastic simulations using Gillespie algorithm, but I haven’t seen any that is harnessing the vectorization in MATLAB. Today I will show that using vectorization we can increase the speed of the simulations about 100 fold. Such an increase in computational speed is very important, as more and more frequently agent based models incorporate signaling pathways on the cell level.

Let us start with basic information about the Gillespie algorithm. Using the algorithm we describe the change in the number of molecules from K different species (x1(t),…,xK(t)) that at each moment can undergo one of L different biochemical reactions. Each reaction has a specific propensity function, e.g. biding of two molecules from different species can have propensity r*x1(t)*x2(t) where r is biding rate. The propensities determine the time to next reaction (exponentially distributed) and which reaction occurs next (discrete distribution with propensity based probabilites). After a reaction occurs the system is updated according to stoichiometric matrix, i.e. matrix describing how many reactants are removed from the system and what products are created.

As a working example we will consider a very basic gene expression pathway – we will describe the number of mRNAs and corresponding proteins. Four considered reactions will be: transcription, translation, mRNA degradation, and protein degradation. Ok, let us start coding by defining considered pathway in the main script.

```%DEFINING REACTION RATES
par.kr = 1/10;     %transcription rate
par.kp = 10;   %translation rate
par.gr = 1/150; %mRNA degradation rate
par.gp = 1/30;  %protein degradation rate

%DEFINING PROPENSITY FUNCTIONS AS A FUNCTION HANDLE
prop = @(x,par)([par.kr,...      %transcription, one mRNA molecule created
par.kp*x(1),... %translation, one protein created
par.gr*x(1),... %mRNA degradation, one mRNA molecule removed
par.gp*x(2)]);  %protein degradation, one protein molecule removed

%DEFINING INITIAL CONDITION, order [mRNA, Protein]
init = [0;1];

%DEFINING STOICHIOMETRIC MATRIX
%column corresponds to the reaction, row corresponds to the molecule
%order as in prop and init variables
stoch = [1 0 -1 0;... 0 1 0 -1];

%DEFINING TIME MESH FOR THE OUTPUT TRAJECTORY
tmesh = linspace(0,1000,100) ;
```

Now in a few lies we can code the whole Gillespie algorithm in a separate function file (without using any vectorization techniques first).

```function trajectory = SSA(tmesh, par,prop,stoch, init )
%tmesh - time mesh on which solution should be returned
%per - parameters of the pathway
%prop - definition of propensity functions
%stoch - stochiometric matrix
%init - initial condition for the pathway

t = 0;                                          %current time
state = init(:);                                %variable with current system state
trajectory = zeros(length(init),length(tmesh)); %preparing output trajectory
trajectory(:,1) = init(:);                      %setting initial value as the first element in trajectory
cindx = 2;                                      %current trajectory index
N = length(tmesh);                              %number of time points

while t<tmesh(end)
Q = feval(prop,state,par);          %calculating propensities of the reactions
Qs = sum(Q);                        %total propensity
dt = -log(rand())/Qs;               %generating time to the next reaction
R = sum(rand >= cumsum([0 Q])/Qs);  %selecting reaction
state = state + stoch(:,R);         %updating state
t = t + dt;                         %updating time

%writing the output
while cindx<=N && t>tmesh(cindx)
trajectory(:,cindx) = state;
cindx = cindx+1;
end
end
end
```

and add the following lines in the main script to simulate and plot trajectory.

```%simulating
traj = SSA(tmesh, par,prop,stoch, init );

%plotting
figure(1)
plot(tmesh,traj(1,:))
xlabel('Time, min')
ylabel('#mRNAs')

figure(2)
plot(tmesh,traj(2,:),'r')
xlabel('Time, min')
ylabel('#Proteins')
```

Below are the resulting plots after one stochastic realization of the whole pathway.

So now we can simulate our simple pathway, but there is one problem: it takes about 3.5 seconds to generate trajectory presented in the above plot. If we want to simulate that pathway for each agent present in the system and we reach the level of 2000 agents, then simulating the pathway for each of the agents using simple for loop will take about 2 hours… What can we do to decrease computational time? We could of course use multiple CPUs – using 100 of them will allow to do the computations in about a minute. But who has access to 100 CPUs? Only few of us. What about using the vectorization option in MATLAB and modifying the SSA function to perform those 2000 simulations simultaneously? That is definitely worth trying. We need only to add additional argument to SSAv function – number of trajectories to generate, and modify few lines to do vectorized calculations:

```function trajectory = SSAv(tmesh, par,prop,stoch, init, nSim )
%tmesh - time mesh on which solution should be returned
%par - parameters of the pathway
%prop - definition of propensity functions
%stoch - stochiometric matrix
%init - initial condition for the pathway
%nSim - number of simulations to perform

tmesh = tmesh(:);   %reshaping mesh to be vertical vector
t = zeros(nSim,1);  %current time for each simulation
state = repmat(init(:)',nSim,1); %current state variable for each simulation
trajectory = zeros(nSim,length(init),length(tmesh)); %preparing output trajectory
trajectory(:,:,1) = state;%setting initial value as the first element in trajectory
cindx = 2*ones(nSim,1);%current trajectory indices
N = length(tmesh); %number of time points
aux = 1:nSim; %

while ~isempty(t)
Q = feval(prop,state,par);         %calculating propensities of the reactions
Qs = sum(Q,2);                     %total propensities
dt = -log(rand(size(Qs,1),1))./Qs; %generating time to the next reaction
P = bsxfun(@rdivide, cumsum([zeros(size(Qs,1),1) Q],2),Qs); %probabilities for each reaction
R = sum(bsxfun(@ge,rand(size(Qs,1),1),P),2);                %selecting reaction
state = state + stoch(:,R)';       %updating state
t = t + dt;                        %updating time

%writing the output
update = t > tmesh(cindx);
while any(update)
%updating state
iupdt = find(update);
for i = 1:length(iupdt)
trajectory(aux(iupdt(i)),:,cindx(iupdt(i))) = state(iupdt(i),:);
end
cindx = cindx+update;

%removing finished simulations from the variables
indx = cindx > N;
if any(indx)
cindx(indx) = [];
t(indx) = [];
aux(indx) = [];
state(indx,:) = [];
if isempty(cindx)
break;
end
end
update = t > tmesh(cindx);
end
end
end
```

We need also to make sure that in the main script that the propensity functions are defined in the vectorized manner (everything else remains the same).

```%DEFINING PROPENSITY FUNCTIONS AS A FUNCTION HANDLE IN VECTORIZED MANNER
prop = @(x,par)([par.kr*ones(size(x,1),1),...
par.kp*x(:,1),...
par.gr*x(:,1),...
par.gp*x(:,2)]);
```

Below are the plots of trajectories generated using the vectorized function for nSim = 100.

Now the important question is how much time did it take to simulate those 100 trajectories. We know that using the simple non-vectorized code and a for loop it would take about 360 seconds. The vectorized function did it in 16 seconds! That is 22 fold decrease in computational time. Plot below shows how computational time changes with the number of trajectories to simulate. Vectorized code generated 2000 trajectories in 62 seconds! So we don’t need those 100 CPUs to obtain reasonable time of computations.

I hope that this post will convince some people that MATLAB is not bad after all.