Modeling of signaling pathways is an important part of cancer research, as it is essential to understand how proteins interact with each other to provide or impair a specific cell function. There are two main approaches to tackle that problem mathematically: 1) deterministic approach, in which we assume that the number of molecules is large and stochastic effects are negligible, 2) stochastic approach (typically using Gillespie algorithm), in which we model exact number of molecules and take into account inherent system noise. Today I’ll focus on the stochastic approach. You can find online plenty of packages to perform the stochastic simulations using Gillespie algorithm, but I haven’t seen any that is harnessing the vectorization in MATLAB. Today I will show that using vectorization we can increase the speed of the simulations about 100 fold. Such an increase in computational speed is very important, as more and more frequently agent based models incorporate signaling pathways on the cell level.
Let us start with basic information about the Gillespie algorithm. Using the algorithm we describe the change in the number of molecules from K different species (x1(t),…,xK(t)) that at each moment can undergo one of L different biochemical reactions. Each reaction has a specific propensity function, e.g. biding of two molecules from different species can have propensity r*x1(t)*x2(t) where r is biding rate. The propensities determine the time to next reaction (exponentially distributed) and which reaction occurs next (discrete distribution with propensity based probabilites). After a reaction occurs the system is updated according to stoichiometric matrix, i.e. matrix describing how many reactants are removed from the system and what products are created.
As a working example we will consider a very basic gene expression pathway – we will describe the number of mRNAs and corresponding proteins. Four considered reactions will be: transcription, translation, mRNA degradation, and protein degradation. Ok, let us start coding by defining considered pathway in the main script.
%DEFINING REACTION RATES par.kr = 1/10; %transcription rate par.kp = 10; %translation rate par.gr = 1/150; %mRNA degradation rate par.gp = 1/30; %protein degradation rate %DEFINING PROPENSITY FUNCTIONS AS A FUNCTION HANDLE prop = @(x,par)([par.kr,... %transcription, one mRNA molecule created par.kp*x(1),... %translation, one protein created par.gr*x(1),... %mRNA degradation, one mRNA molecule removed par.gp*x(2)]); %protein degradation, one protein molecule removed %DEFINING INITIAL CONDITION, order [mRNA, Protein] init = [0;1]; %DEFINING STOICHIOMETRIC MATRIX %column corresponds to the reaction, row corresponds to the molecule %order as in prop and init variables stoch = [1 0 -1 0;... 0 1 0 -1]; %DEFINING TIME MESH FOR THE OUTPUT TRAJECTORY tmesh = linspace(0,1000,100) ;
Now in a few lies we can code the whole Gillespie algorithm in a separate function file (without using any vectorization techniques first).
function trajectory = SSA(tmesh, par,prop,stoch, init ) %tmesh - time mesh on which solution should be returned %per - parameters of the pathway %prop - definition of propensity functions %stoch - stochiometric matrix %init - initial condition for the pathway t = 0; %current time state = init(:); %variable with current system state trajectory = zeros(length(init),length(tmesh)); %preparing output trajectory trajectory(:,1) = init(:); %setting initial value as the first element in trajectory cindx = 2; %current trajectory index N = length(tmesh); %number of time points while t<tmesh(end) Q = feval(prop,state,par); %calculating propensities of the reactions Qs = sum(Q); %total propensity dt = -log(rand())/Qs; %generating time to the next reaction R = sum(rand >= cumsum([0 Q])/Qs); %selecting reaction state = state + stoch(:,R); %updating state t = t + dt; %updating time %writing the output while cindx<=N && t>tmesh(cindx) trajectory(:,cindx) = state; cindx = cindx+1; end end end
and add the following lines in the main script to simulate and plot trajectory.
%simulating traj = SSA(tmesh, par,prop,stoch, init ); %plotting figure(1) plot(tmesh,traj(1,:)) xlabel('Time, min') ylabel('#mRNAs') figure(2) plot(tmesh,traj(2,:),'r') xlabel('Time, min') ylabel('#Proteins')
Below are the resulting plots after one stochastic realization of the whole pathway.
So now we can simulate our simple pathway, but there is one problem: it takes about 3.5 seconds to generate trajectory presented in the above plot. If we want to simulate that pathway for each agent present in the system and we reach the level of 2000 agents, then simulating the pathway for each of the agents using simple for loop will take about 2 hours… What can we do to decrease computational time? We could of course use multiple CPUs – using 100 of them will allow to do the computations in about a minute. But who has access to 100 CPUs? Only few of us. What about using the vectorization option in MATLAB and modifying the SSA function to perform those 2000 simulations simultaneously? That is definitely worth trying. We need only to add additional argument to SSAv function – number of trajectories to generate, and modify few lines to do vectorized calculations:
function trajectory = SSAv(tmesh, par,prop,stoch, init, nSim ) %tmesh - time mesh on which solution should be returned %par - parameters of the pathway %prop - definition of propensity functions %stoch - stochiometric matrix %init - initial condition for the pathway %nSim - number of simulations to perform tmesh = tmesh(:); %reshaping mesh to be vertical vector t = zeros(nSim,1); %current time for each simulation state = repmat(init(:)',nSim,1); %current state variable for each simulation trajectory = zeros(nSim,length(init),length(tmesh)); %preparing output trajectory trajectory(:,:,1) = state;%setting initial value as the first element in trajectory cindx = 2*ones(nSim,1);%current trajectory indices N = length(tmesh); %number of time points aux = 1:nSim; % while ~isempty(t) Q = feval(prop,state,par); %calculating propensities of the reactions Qs = sum(Q,2); %total propensities dt = -log(rand(size(Qs,1),1))./Qs; %generating time to the next reaction P = bsxfun(@rdivide, cumsum([zeros(size(Qs,1),1) Q],2),Qs); %probabilities for each reaction R = sum(bsxfun(@ge,rand(size(Qs,1),1),P),2); %selecting reaction state = state + stoch(:,R)'; %updating state t = t + dt; %updating time %writing the output update = t > tmesh(cindx); while any(update) %updating state iupdt = find(update); for i = 1:length(iupdt) trajectory(aux(iupdt(i)),:,cindx(iupdt(i))) = state(iupdt(i),:); end cindx = cindx+update; %removing finished simulations from the variables indx = cindx > N; if any(indx) cindx(indx) = []; t(indx) = []; aux(indx) = []; state(indx,:) = []; if isempty(cindx) break; end end update = t > tmesh(cindx); end end end
We need also to make sure that in the main script that the propensity functions are defined in the vectorized manner (everything else remains the same).
%DEFINING PROPENSITY FUNCTIONS AS A FUNCTION HANDLE IN VECTORIZED MANNER prop = @(x,par)([par.kr*ones(size(x,1),1),... par.kp*x(:,1),... par.gr*x(:,1),... par.gp*x(:,2)]);
Below are the plots of trajectories generated using the vectorized function for nSim = 100.
Now the important question is how much time did it take to simulate those 100 trajectories. We know that using the simple non-vectorized code and a for loop it would take about 360 seconds. The vectorized function did it in 16 seconds! That is 22 fold decrease in computational time. Plot below shows how computational time changes with the number of trajectories to simulate.
Vectorized code generated 2000 trajectories in 62 seconds! So we don’t need those 100 CPUs to obtain reasonable time of computations.
I hope that this post will convince some people that MATLAB is not bad after all.