Java, C++, MATLAB, Julia or Python for cellular automatons? Speed comparison.

Thanks to @spuri4096 and @chandlergatenbee, the basic cancer stem cell driven model of tumor growth that I implemented in the very first Compute Cancer blog is now programmed in 4 languages:

  1. C++ (post with the code here)
  2. MATLAB (post with the code here)
  3. Java (post with the code here)
  4. Python (post with the code here)
  5. Octave – the same code as for MATLAB. Works without any modifications.
  6. Julia (post with the code here)

Isn’t that the perfect opportunity to see which one is the fastest?

First of all, codes are not completely identical. There are small differences between C++ and MATLAB codes in how the death of a cell is updated, but that shouldn’t affect the comparison too much. The code in Java is much more sophisticated – it uses coded lattice concept and dynamically expanding lattice. Python code (as much as I can see…) does pretty much the same job as the C++ code, so that is the fairest comparison out of all.

I will compare the time needed to simulate cancer from a single cancer stem cell to 100, 250 and 500 thousand of cells. The domain size is set to 1000×1000 grid points (not for Java code, in which lattice expands on demand), so 1 million is the maximal cell number. Of course all simulation parameters are also the same for all codes: probability of symmetric division (ps) = 0.3; proliferation probability (pdiv) = 1/24; migration probability = 15/24; and probability of spontaneous death (alpha) = 0.05. Because of the stochastic nature of the model I will run 20 simulations for each code and report the average time (+- standard deviation).

The results are summarized on the plot below.


As probably everyone expected C++ outcompeted other codes.
Java also did great.
What might surprise some people, MATLAB didn’t perform that bad (it doesn’t suck that bad as people usually think).
The speed of Octave was a surprise for me – I had patience only to run simulations for 100K cell limit…

Of course in each case there is definitely a room for improvement using even more sophisticated language specific programming tricks.

However, from my perspective – if you don’t want to spend too much time on learning programing language, MATLAB (Octave) might be not such a bad idea.

Tumor growth under angiogenic signaling – data fitting (part 1)

In todays post I will show how to fit the model to the experimental data using MATLAB with Optimization Toolbox. It will be one of the many posts in which I will try to illustrate problems with parameter estimation. Let’s start with introducing the model.

Hahnfeldt and colleagues proposed a mathematical model of the concept that tumor growth and host blood vessel support is bidirectionally modulated (Hahnfeldt et al., Cancer Research, 1999). Tumor volume (V) and effective vascular support (K) that defines the tumor carrying capacity are time-dependent variables described by a set of coupled ordinary differential equations (ODEs). Tumor growth is assumed to be governed by the Gompertz law:


With constant effective vascular support K=Kmax, initial rapid tumor growth is followed by a slowdown as the tumor volume approaches carrying capacity Kmax. To account for reciprocal interaction of the tumor with the host vasculature, carrying capacity through vascular support can be described by a variable K modulated by the tumor V:


where -\lambda_2K represents spontaneous loss of functional vasculature, bV represents vessels growth stimulation due to factors secreted by the tumor proportionally to its size, and dKV^{2/3} describes endogenous inhibition of previously generated vasculature due to factors secreted by the tumor proportionally to the tumor surface-to-volume ratio.

We will try to see how well the model works on the actual experimental data (not the one from the paper). Quick Google image search using the term “mouse bevacizumab” (bavacizumab is antiangiogenic drug) returned a figure from the paper by Tsukihara an others (Tsukihara et al., Anticancer Research, 2015) in which we have a plot of relative tumor volume in time in the presence or absence of bevacizumab. We will use that figure to grab the experimental data. There is a very good tool available on MATLAB Central called “grabit” that allows to easily collect the data from the plot (screenshot below).


After calibrating the the axis, we can grab the average values and the standard deviation (for each point first grab the average then the top of the standard deviation bar, skip the point at day 0) first for the control experiment and than for Bevacizumab alone. After that it is enough to save the data into the file e.g. measurements.mat. First we need to write function that will read saved file and restructure it to the format that we want to work on.

function data = processMeasurements( file )
    %load the file with the measurements
    tmp = load(file); 
    %x values are in the first column and we need to take 
    %every second point (we have taken average and standard deviation at the same point).
    %We also round the values to the full days
    data.t = round(tmp.measurements(1:2:end/2,1))'; 
    %now we grab the averages. In first row we will have data for control
    %and in the second row for bevacizumab
    data.average = reshape(tmp.measurements(1:2:end,2),[],2)';
    %now we grab the standard deviations. In first row we will have data for control
    %and in the second row for bevacizumab
    data.std = reshape(tmp.measurements(2:2:end,2)-tmp.measurements(1:2:end,2),[],2)';

To make sure that everything is structured properly we can write the function that will plot grabbed data,

function plotData(data)
    hold on
    errorbar(data.t,data.average(1,:), data.std(1,:),'Color','r','LineStyle','none','Marker','o');
    errorbar(data.t,data.average(2,:), data.std(2,:),'Color','b','LineStyle','none','Marker','s');
    hold off
    ylabel('RTV (mean and std)');

which should result in the following plot


Now we can start working on the code that will fit the Hahnfeldt model to the data. In the part 1 of the post we will focus only on fitting to the control data – fitting both curves will be covered in part 2. The main script that we will run is the following

%read and restructure experimental data
data = processMeasurements('measurements.mat');

%set initial values for parameters and initial conditions
params = initializeParams();

%set which params should be estimated
%names need to be consistent with the params structure!
%in most of the studies based on Hahnfeldt model lambda2 is assumed to be 0
%we will do the same and won't fit its value
paramsToFit = {'lambda1','b','d'};

%perform data fitting
[params, err] = fitParameters(params,data,paramsToFit);

%solve the model for new parameters
sol = solveModel(data.t(end),params);

%plot data together with solution
plotDataAndSol(data, sol);

We have already implemented processMeasurements function, so we can write down the initializeParams function.

function params = initializeParams()

    %model parameters (we take the values from Hahnfeldt et al. paper)
    params.lambda1 = 0.192;
    params.lambda2 = 0;
    params.b = 5.85;
    params.d = 0.00873;
    %in the experimental paper implanted tumors had about 8 cubic milimiters
    params.V0 = 8;
    %initial tumor is not visualized. It is typically assumed that avascular tumor can grow to about 1 mm in radius
    params.K0 = 4/3*pi;

Now we need to code the fitParameters function. We will use the lsqnonlin function that is provided in Optimization Toolbox, as it handles by default constrains on the parameters values (we assume that all parameters are non-negative).

function [params, err] = fitParameters(params,data,paramsToFit)

    %preparing initial parameters values
    x0 = zeros(size(paramsToFit));
    for i = 1:length(paramsToFit)
        x0(i) = params.(paramsToFit{i});
    opt = optimset('Display','iter');
    %Fmin is the function that returns the fit error for a given set of parameters
    %it will need data, params and paramsToFit variables so we pass them
    %as additional arguments to the lsqnonlin function 
    %(they will be passed forward to Fmin)
    flag = 0;
    while flag<1 %repeat until solver reached tolerance or converged
        opt = optimset('Display','iter');
        [x, fval, ~, flag] = lsqnonlin(@Fmin,x0,zeros(size(x0,[],opt,data,params,paramsToFit);
        x0 = x;
    %rewritng x (vector of fitted values) back into params structure
    for i = 1:length(paramsToFit)
        params.(paramsToFit{i}) = x(i);

The Fmin function will calculate the sum of squared differences between the model solution and the experimental data.

function err = Fmin( x, data, params,paramsToFit )

    %rewritng x into params structure
    for i = 1:length(paramsToFit)
        params.(paramsToFit{i}) = x(i);
        %we solve the model
        sol = solveModel([0 data.t],params);
        %we calculate relative tumor volume at points of measurements
        RTV = sol.y(1,2:end)./sol.y(1,1);
        %we calculate the error
        err = RTV - data.average(1,:);


Now we need to write the solveModel function that returns model solution for given set of parameters and initial conditions.

function sol = solveModel( T, params )
    %we assume that initial conditions are in the params structure

    %we use MATLAB built-in ODE solver to calculate the solution
    sol = ode45(@ODE,[0 T(end)],[params.V0 params.K0]);

    %if user provided time mesh instead of only final point
    %we evaluate the solution on that mesh
    if length(T)>1
        sol.y = deval(sol,T);
        sol.x = T;

    %model definition
    function dy = ODE(~,y)
       dy = zeros(2,1);
       dy(1) = -params.lambda1*y(1)*log(y(1)/y(2));
       dy(2) = -params.lambda2*y(2)+params.b*y(1)-params.d*y(2)*y(1)^(2/3);


Finally, the function that will plot the solution and the data on the same plot.

function plotDataAndSol(data,sol)

    hold on
    errorbar(data.t,data.average(1,:), data.std(1,:),'Color','r','LineStyle','none','Marker','o');
    hold off
    ylabel('RTV (mean and std)');


Now we are ready to run the main script. This results in the following plot, which shows nice correspondence of the model solution to the experimental data,


with estimated parameters values \lambda_1 = 0.0785, b =1509.7 , d = 13.95. It seems that there is nothing more to do – we estimated the parameters and nicely fitted model to the experimental data. However…

Let’s assume that we didn’t grab the data from the plot – we don’t consider that source of the noise. We all know, however, that there is uncertainty in experimental measurements, which is hard to estimate. Let us perform the same fitting procedure, but for the data that is perturbed by up to 1% (each point, uniformly) – quite a small measurement error!

Plotting the estimated values of parameters (see below) shows that even such a small error introduces huge variability with estimated parameters values. Parameter b has the values in the range from 384 to 1720 and d in the range from 3 to 11. How certain can we be with our parameters estimations then? That is quite a simple model and “clean” experimental data…


Quick implementation of hexagonal lattice for ABM

In the agent based modeling we are typically more interested in the rules governing the cell fate rather than the basic setting of the lattice (if we don’t look at the off lattice model). However, the particular setting of the computational domain can have an effect on the model dynamics. In 2D we typically consider the following lattices and neighborhoods:


with the rectangular grid with Moore neighborhood being probably most frequently utilized. With von Neumann neighborhood we have the fewest number of neighbors and cells are spatially saturated earlier. The problem with Moore neighborhood is that the distance to all of the sites is not the same. Hexagonal grid has good properties (same distance, 6 neighbors), but is less frequently utilized, because implementation is more involved. In todays post I will show that essentially there is no difference in implementation between all of those lattices.

We will use the codes for basic ABM model posted before as a template (MATLAB version here and C++ version here). In both cases we considered rectangular lattice with Moore neighborhood. If we set migration probability to zero, set large value of proliferation capacity and set the spontaneous death rate to zero, i.e. we simulate essentially only division events, we will see in visualizations of simulated tumors what kind of neighborhood we assumed:


Let us see how the visualization looks like for different types of neighborhoods/lattices.

In both implementations we had a separate array defining the neighborhood of the cell. Thus, in order to modify the code to von Neumann neighborhood we need to change only two consecutive lines in the MATLAB code in which we define the neighborhood and the permutations table:

aux = int32([-N -1 1 N])'; %indices to heighborhood
Pms = perms(uint8(1:4))'; %permutations

In C++ implementation we need to modify the definition of the neighborhood:

static const int indcNeigh[] = {-N, -1, 1, N};//neighborhood

and the for loop in the returnEmptyPlace function:

for(int j=0;j<4;j++) {//searching through neighborhood
        if (!lattice[indx+indcNeigh[j]]) {
            neigh[nF] = indx+indcNeigh[j];

As it could be expected, switching to von Neumann neighborhood made the simulated tumor diamond shaped.


Let us now consider hexagonal lattice. The very first observation that we need to make is that hexagonal lattice is essentially an rectangular lattice with shifted odd (or even) rows and two definitions of neighborhood (one for cells in odd rows, second for cells in even rows), see picture below


Thus, in the modified implementation we again only need to change the definition of the neighborhood and add the condition for choosing the proper one. In MATLAB it can be achieved by introducing two definitions of neighborhoods

auxR = int32([-N -1 1 N-1 N N+1])'; %indices to heighborhood
auxL = int32([-N -1 1 -N-1 N -N+1])'; %indices to heighborhood

Pms = perms(uint8(1:6))'; %permutations

and modify the lines in the main loop in which we create the neighborhood for all viable cells (create variable S)

    odd = mod(cells,2) ~= 0; %selecting cells in the odd rows
    S = zeros(6,length(cells));
    if any(odd) %creating neighborhood for the cells in the odd rows
        SR = bsxfun(@plus,cells(odd),auxR(Pms(:,randi(nP,1,sum(odd)))));
        S(:,odd) = SR;
    even = ~odd;
    if any(even) %creating neighborhood for the cells in the even rows
        SL = bsxfun(@plus,cells(even),auxL(Pms(:,randi(nP,1,sum(even)))));
        S(:,even) = SL;

In C++ the changes are even easier. We again introduce two neighborhoods:

static const int indcNeighR[] = {-N,-1,1,N-1,N,N+1};//neighborhood
static const int indcNeighL[] = {-N,-1,1,-N-1,N,-N+1};//neighborhood

and modify the returnEmptyPlace function

for(int j=0;j<6;j++) {//searching through neighborhood
        if (indx % 2) {//if odd row
            if (!lattice[indx+indcNeighR[j]]) {
                neigh[nF] = indx+indcNeighR[j];
        } else {//if even row
            if (!lattice[indx+indcNeighL[j]]) {
                neigh[nF] = indx+indcNeighL[j];

We might also want to change the code for visualization. In the previous settings it was enough to represent each cell as a pixel in the image. In case of hexagonal lattice in order to be correct we need to somehow shift the pixels in odd (or even rows). The easiest way to achieve that is to represent each cell as 4 pixels and adjust position according to the row number:


The modified visualization code in MATLAB is the following

function visualizeHex( N, cells, cellsIsStem, cellsPmax, pmax )

    %select cells in the odd rows
    odd = mod(cells,2) ~= 0;

    M = ones(2*N,2*N,3); %matrix for image, we expand it
    %disperse cell by one spot
    i = mod(double(cells)-1,N)+1; %row, REMEMBER THAT CELLS ARE UINTs!
    j = ceil(double(cells)/N); %column
    cells = (i*2-1)+(2*j-1)*2*N;
    %add cells to the top right in odd rows
    cells = [cells reshape(bsxfun(@plus,cells(odd),[-1; 2*N-1; 2*N]),1,[])];
    cellsIsStem = [cellsIsStem reshape(repmat(cellsIsStem(odd),3,1),1,[])];
    cellsPmax = [cellsPmax reshape(repmat(cellsPmax(odd),3,1),1,[])];
    %add cells to top left in even rows
    even = [~odd false(1,3*sum(odd))];
    cells = [cells reshape(bsxfun(@plus,cells(even),[-1; -2*N-1; -2*N]),1,[])];
    cellsIsStem = [cellsIsStem reshape(repmat(cellsIsStem(even),3,1),1,[])];
    cellsPmax = [cellsPmax reshape(repmat(cellsPmax(even),3,1),1,[])];
    color = hot(3*pmax);
    M(cells(~cellsIsStem)) = color(cellsPmax(~cellsIsStem)+1,1);
    M(cells(~cellsIsStem)+4*N*N) = color(cellsPmax(~cellsIsStem)+1,2);
    M(cells(~cellsIsStem)+8*N*N) = color(cellsPmax(~cellsIsStem)+1,3);

    CSCs = cells(cellsIsStem);
    M(CSCs) = color(2*pmax,1);
    M(CSCs+4*N*N) = color(2*pmax,2);
    M(CSCs+8*N*N) = color(2*pmax,3);


Finally, simulating the tumor with exactly the same parameters settings on the hexagonal lattice results in way more circular tumor.


Gillespie algorithm for stochastic simulations of signaling pathways – vectorization in MATLAB

Modeling of signaling pathways is an important part of cancer research, as it is essential to understand how proteins interact with each other to provide or impair a specific cell function. There are two main approaches to tackle that problem mathematically: 1) deterministic approach, in which we assume that the number of molecules is large and stochastic effects are negligible, 2) stochastic approach (typically using Gillespie algorithm), in which we model exact number of molecules and take into account inherent system noise. Today I’ll focus on the stochastic approach. You can find online plenty of packages to perform the stochastic simulations using Gillespie algorithm, but I haven’t seen any that is harnessing the vectorization in MATLAB. Today I will show that using vectorization we can increase the speed of the simulations about 100 fold. Such an increase in computational speed is very important, as more and more frequently agent based models incorporate signaling pathways on the cell level.

Let us start with basic information about the Gillespie algorithm. Using the algorithm we describe the change in the number of molecules from K different species (x1(t),…,xK(t)) that at each moment can undergo one of L different biochemical reactions. Each reaction has a specific propensity function, e.g. biding of two molecules from different species can have propensity r*x1(t)*x2(t) where r is biding rate. The propensities determine the time to next reaction (exponentially distributed) and which reaction occurs next (discrete distribution with propensity based probabilites). After a reaction occurs the system is updated according to stoichiometric matrix, i.e. matrix describing how many reactants are removed from the system and what products are created.

As a working example we will consider a very basic gene expression pathway – we will describe the number of mRNAs and corresponding proteins. Four considered reactions will be: transcription, translation, mRNA degradation, and protein degradation. Ok, let us start coding by defining considered pathway in the main script.

%DEFINING REACTION RATES = 1/10;     %transcription rate = 10;   %translation rate = 1/150; %mRNA degradation rate = 1/30;  %protein degradation rate

prop = @(x,par)([,...      %transcription, one mRNA molecule created
       *x(1),... %translation, one protein created
       *x(1),... %mRNA degradation, one mRNA molecule removed
       *x(2)]);  %protein degradation, one protein molecule removed

init = [0;1];

%column corresponds to the reaction, row corresponds to the molecule
%order as in prop and init variables
stoch = [1 0 -1 0;... 0 1 0 -1];

tmesh = linspace(0,1000,100) ;

Now in a few lies we can code the whole Gillespie algorithm in a separate function file (without using any vectorization techniques first).

function trajectory = SSA(tmesh, par,prop,stoch, init )
   %tmesh - time mesh on which solution should be returned
   %per - parameters of the pathway
   %prop - definition of propensity functions
   %stoch - stochiometric matrix
   %init - initial condition for the pathway

   t = 0;                                          %current time
   state = init(:);                                %variable with current system state
   trajectory = zeros(length(init),length(tmesh)); %preparing output trajectory
   trajectory(:,1) = init(:);                      %setting initial value as the first element in trajectory
   cindx = 2;                                      %current trajectory index
   N = length(tmesh);                              %number of time points

   while t<tmesh(end)
      Q = feval(prop,state,par);          %calculating propensities of the reactions
      Qs = sum(Q);                        %total propensity
      dt = -log(rand())/Qs;               %generating time to the next reaction
      R = sum(rand >= cumsum([0 Q])/Qs);  %selecting reaction
      state = state + stoch(:,R);         %updating state
      t = t + dt;                         %updating time

      %writing the output
      while cindx<=N && t>tmesh(cindx)
         trajectory(:,cindx) = state;
         cindx = cindx+1;

and add the following lines in the main script to simulate and plot trajectory.

traj = SSA(tmesh, par,prop,stoch, init );

xlabel('Time, min')

xlabel('Time, min')

Below are the resulting plots after one stochastic realization of the whole pathway.

mRNA protein

So now we can simulate our simple pathway, but there is one problem: it takes about 3.5 seconds to generate trajectory presented in the above plot. If we want to simulate that pathway for each agent present in the system and we reach the level of 2000 agents, then simulating the pathway for each of the agents using simple for loop will take about 2 hours… What can we do to decrease computational time? We could of course use multiple CPUs – using 100 of them will allow to do the computations in about a minute. But who has access to 100 CPUs? Only few of us. What about using the vectorization option in MATLAB and modifying the SSA function to perform those 2000 simulations simultaneously? That is definitely worth trying. We need only to add additional argument to SSAv function – number of trajectories to generate, and modify few lines to do vectorized calculations:

function trajectory = SSAv(tmesh, par,prop,stoch, init, nSim )
   %tmesh - time mesh on which solution should be returned
   %par - parameters of the pathway
   %prop - definition of propensity functions
   %stoch - stochiometric matrix
   %init - initial condition for the pathway
   %nSim - number of simulations to perform

   tmesh = tmesh(:);   %reshaping mesh to be vertical vector
   t = zeros(nSim,1);  %current time for each simulation
   state = repmat(init(:)',nSim,1); %current state variable for each simulation
   trajectory = zeros(nSim,length(init),length(tmesh)); %preparing output trajectory
   trajectory(:,:,1) = state;%setting initial value as the first element in trajectory
   cindx = 2*ones(nSim,1);%current trajectory indices
   N = length(tmesh); %number of time points
   aux = 1:nSim; %

   while ~isempty(t)
      Q = feval(prop,state,par);         %calculating propensities of the reactions
      Qs = sum(Q,2);                     %total propensities
      dt = -log(rand(size(Qs,1),1))./Qs; %generating time to the next reaction
      P = bsxfun(@rdivide, cumsum([zeros(size(Qs,1),1) Q],2),Qs); %probabilities for each reaction
      R = sum(bsxfun(@ge,rand(size(Qs,1),1),P),2);                %selecting reaction
      state = state + stoch(:,R)';       %updating state
      t = t + dt;                        %updating time

     %writing the output
     update = t > tmesh(cindx);
     while any(update)
        %updating state
        iupdt = find(update);
        for i = 1:length(iupdt)
           trajectory(aux(iupdt(i)),:,cindx(iupdt(i))) = state(iupdt(i),:);
        cindx = cindx+update;

        %removing finished simulations from the variables
        indx = cindx > N;
        if any(indx)
           cindx(indx) = [];
           t(indx) = [];
           aux(indx) = [];
           state(indx,:) = [];
           if isempty(cindx)
        update = t > tmesh(cindx);

We need also to make sure that in the main script that the propensity functions are defined in the vectorized manner (everything else remains the same).

prop = @(x,par)([*ones(size(x,1),1),...

Below are the plots of trajectories generated using the vectorized function for nSim = 100.

mRNA100 protein100

Now the important question is how much time did it take to simulate those 100 trajectories. We know that using the simple non-vectorized code and a for loop it would take about 360 seconds. The vectorized function did it in 16 seconds! That is 22 fold decrease in computational time. Plot below shows how computational time changes with the number of trajectories to simulate.


Vectorized code generated 2000 trajectories in 62 seconds! So we don’t need those 100 CPUs to obtain reasonable time of computations.

I hope that this post will convince some people that MATLAB is not bad after all.

Quick parallel implementation of local sensitivity analysis procedure for agent-based tumor growth model

In the last couple of posts I’ve shown how to implement agent-based model of cancer stem cell driven tumor growth, both in MATLAB and C++. Having the implementations we can go one step further and perform some analysis of the tumor growth characteristics predicted by the model. We will start with performing local sensitivity analysis, i.e. we will try to answer the question how the perturbation of parameter values impacts the growth dynamics. Typically it is being done by perturbing one of the parameters by a fixed amount and analyzing the response of the model to that change. In our case the response will be the percentage change in average tumor size after 3 months of growth. Sounds fairly simple, but…

We have 5 different parameters in the model: proliferation capacity, probability of division, probability of spontaneous death, probability of symmetric division, and probability of migration. Moreover, let us assume that we would like to investigate 3 different values of parameter perturbation magnitude (5%, 10% and 20%). In order to be able to analyze the change in the average size we need to have its decent estimator, i.e. we need sufficient number of simulated stochastic trajectories – let us assume that 100 simulations is enough to have a good estimation of “true” average. So in order to perform the sensitivity analysis we will need to perform 100 + 3*5*100 = 1600 simulations (remember about the growth for nominal parameters values). Even if single simulation takes typically 30 seconds, then we will wait more than 13 hours to obtain the result using single CPU – that is a lot!

After looking at the above numbers we can make a straightforward decision right now – we will use C++ instead of MATLAB, because the model implementation in C++ is a several times faster. However 1) we will need to write a lot of a code in order to perform sensitivity analysis, 2) using multiple CPU is not that straightforward as in MATLAB. Is there a better way to proceed?

Few weeks ago I’ve shown here how to wrap your C++ in order to use it from within MATLAB as a function without loosing the C++ performance. Why not to use it to make our lives easier by making sensitivity code short and harness easily the power of multiple CPUs?

We will start the coding (in MATLAB) with setting all simulation parameters

nSim = 100; %number of simulations to perform for a given set of parameters
tmax = 30*3; %number of days to simulate
N = 1000; %size of the simulation domain

%nominal parameter values [rhomax, pdiv, alpha, ps, pmig]
nominal = [10, 1/24, 0.05, 0.3,10/24];

perturb = [0.05 0.1 0.2]; %perturbation magnitudes, percent of initial value up

Now we just need to construct a loops that will iterate through all possible perturbations and simulations. If we don’t use Parallel Toolbox, i.e. don’t use multiple CPUs, it doesn’t really matter how we will do that – performance will be similar. Otherwise, implementation is not that straightforward even though from MATLABs documentation it seems that it is enough to change for to parfor. The most important thing is how will we divide the work between the CPUs. The simples idea is to spread the considered perturbations values among the CPUs – that will allow to use 15 CPUs in our setting. However, I’ve got machine with 24 CPUs, so that would be a waste of resources – bad idea. The other idea would be to use parfor loop to spread all 100 simulations for a given perturbation value on all 24 CPUs and go through all perturbation values in a simple loop – now we are using all available CPUs. But are we doing that efficiently? No. The thing is that CPUs need to be synchronized before proceeding to the next iteration of the loop going through perturbation values. So some of the CPUs will be idle while waiting for other ones to finish with parfor loop. In order to make the code even more efficient we will just use one parfor loop and throw all 1600 simulation on 24 CPUs at the same time. Let us first prepare the output variable.

HTCG = zeros(1,nSim + length(nominal)*length(perturb)*nSim);

Before writing the final piece of the code we need to solve one more issue. Namely, in the C++ implementation we used srand(time(NULL)) to initiate the seed for random number generator. It is perfectly fine when we use single CPU – each simulation will take some time and we don’t need to worry about uniqueness of the seed. The problem is when we want to use multiple CPUs – all initial parallel simulations will start with exactly the same seed. One way to solve that is to pass the current loop iteration number (i) to C++ and use srand(time(NULL)+i) – that is what I have done. After solving that issue we can write the final piece of the code.

parfor i = 1:length(HTCG)
    params = nominal; %setting parameters to nominal values
    if i>nSim %simulation is for perturbed parameters
        %translating linear index to considered parameter and perturbation value
        j = ceil((i-nSim)/(nSim*length(perturb)));
        k = ceil((mod((i-nSim)-1,nSim*length(perturb))+1)/nSim);
        %updating parameters
        params(j) = params(j)*(1+perturb(k));
        if k == 1 %if proliferation capacity parameter we need to round it
           params(j) = round(params(j)); 

    [~, cells] = CA(params,[tmax*24,N,i]);
    HTCG(i) = length(cells)/3;
    clear mex %important! without that the internal 
              %variables in CA functions won't be cleared and next simulation
              %won't begin with single initial cell

Then in the command line we start the parallel pool with 24 workers (CPUs), by typing parpool(24) command, and run the code. Screenshot below shows nicely how all of the 24 CPUs are being used – no resource wasted!

CPUs occupancy

We can then add few additional lines of the code to plot the results.

nom = mean(HTCG(1:100)); %average for nominal parameters value
%calculating averages for perturbed sets values
av = squeeze(mean(reshape(HTCG(101:end),nSim,length(perturb),length(nominal))));

%plotting results of sensitivity analysis
legend({'\rho_{max}', 'p_{div}', '\alpha', 'p_s', 'p_{mig}'})
xlabel('% perturbation')
ylabel('% change')

And “voila!” – the resulting figure shows that the perturbation in the proliferation capacity has the highest impact on the tumor growth dynamics.


Wrapping C++ code of agent-based tumor growth model into MATLAB

Last week I posted a C++ implementation of basic cancer stem cell driven tumor growth model. About 100 lines of a code allowed to perform simulation, but there was nothing about exporting the results, doing more complicated analysis, visualization, or performing data fitting. Of course each of those tasks can be performed by writing another large piece of the code.

Two weeks ago I posted a MATLAB code for the very similar model, which was quite fast but a lot slower than C++ implementation. On the other hand it was straightforward to visualize the tumor and it’s not much of a hassle to perform more complicated analysis or data fitting using variety of MATLABs built-in functions.

Each of those two approaches has their own advantages: C++ is fast, MATLAB has a lot of built-in in tools.

Today I will show how to take the best out of both approaches, i.e. I’ll show how to easily wrap our C++ code into MATLAB. The whole C++ based simulation will be invoked from MATLAB as a function and results will be visible straight from it.

First we need to add necessary libraries.

#include "mex.h"
#include <matrix.h>

Then we need to modify our original definition of global variables (see previous post) so they can be adjusted based on the input parameters (for example lattice has to be dynamically allocated).

int N;
bool* lattice;
vector<cell> cells;

char pmax;
double pDiv, alpha, ps, pmig; 
int* indcNeigh;

Then the only thing that we need to do is to delete old main() function and use mexFunction() instead. In the mexFunction() we need to read first the parameters that will be passed from MATLAB (input parameters to the function). Then we need to assign the memory for the lattice and set other auxiliary variables. Finally, after performing the simulation we need to write variables to the output, so the MATLAB can see the results.

void mexFunction(
		 int          nlhs, //number of output arguments
		 mxArray      *plhs[], //output arrays
		 int          nrhs, //number of input arguments
		 const mxArray *prhs[] //input arrays
    double *par=mxGetPr(prhs[0]); //model parameters, first input variable
    double *settings=mxGetPr(prhs[1]); //simulation settings, second input variable

    int Nsteps = settings[0]; //number of simulation steps to perform
    N=(int)settings[1]; //size of the lattice
    pmax = par[0]; //divisions before proliferative capacity exhaustion 
    pDiv = par[1]; //division probability 
    alpha = par[2]; //spontaneous death probability 
    ps = par[3]; //probability of symmetric division 
    pmig = par[4]; //probability of migration
    //creating lattice with a given size
    lattice = new bool [N*N];
    fill_n(lattice, N*N, false);
    indcNeigh = new int [8];//cell neighborhood
    //assigning values
    indcNeigh[0] = -N-1; indcNeigh[1] = -N; indcNeigh[2] = -N+1;
    indcNeigh[3] = -1; indcNeigh[4] = 1; indcNeigh[5] = N-1;
    indcNeigh[6] = N; indcNeigh[7] = N+1;

    srand(time(NULL)); //initialize random number generator
    initialize(); //initialize CA
    simulate(Nsteps); //simulate CA
    //1) writing the lattice as a logical matrix
	plhs[0] = mxCreateLogicalMatrix(N,N); //lattice
    mxLogical *ptr = mxGetLogicals(plhs[0]);
    for(int i=0; i<N*N; i++) 
        ptr[i] = lattice[i];

    //2) writing the cells vector as a vector
    plhs[1] = mxCreateDoubleMatrix(1,cells.size()*3,mxREAL);
    double* ptr2 = mxGetPr(plhs[1]);
    for(int i=0; i<cells.size(); i++) {
        ptr2[3*i] =;
        ptr2[3*i+1] =;
        ptr2[3*i+2] =;

After making all of the changes we can compile our .cpp file using the MATLAB command “mex CA.cpp”. Remember all other pieces of C++ associated with model itself remain the same – there is no need to change anything.

From now on we can perform our simulation in MATLAB environment by invoking the following command

[lattice, cells] = CA([pmax, pdiv, alpha, ps,pmig],[nSteps, N]);

and further use of all MATLABs benefits, especially data fitting features. At the same time all of the simulations will be performed at C++ speed! Imagine using Parallel Toolbox in MATLAB to harness multiple CPUs with only few lines of the code.

Setting complex domains for agent based models using bitmaps

In the previous posts (CA in MATLAB and C++) I’ve shown how to implement cancer stem sell driven tumor growth model. In both codes I have set the boundaries of the lattice to true without adding those sites to additional cells vector, so the boundary was treated as an occupied site, but not as a viable cell. This way of coding the system makes it easy to implement more complex domains. Today, I’ll show how to read the complex domains from bitmap (.bmp) file and use them for simulations using previously posted codes. The basic idea is that the image will be loaded into the program and the pixels with values “close” to white will be treated as free sites.

First we need to prepare the image. The posted codes will assume that the image is a square (image=width). We can draw anything we want, remembering that white pixels will be interpreted as free sites. We will write C++ code that will be able to read properly 24 bits bitmap images without the embedded information about the color space. In order to export image in that format we can use GIMP program with the following settings during the export.

Screen Shot 2015-06-09 at 11.20.46 PM

I’ve prepared two exemplary images that I will use further in the simulations. The first one is adapted from the paper by Enderling et al. and the second is a generated text using PowerPoint.
tissue text

Firt, the basic C++ code. We could use additional libraries to load images, but we want to make code as platform independent as possible. We assume that the array lattice and its size are already defined as the global variables (see previous code).

#include <math.h>  

void domainFromImage(char* fileName) {

    FILE* f = fopen(fileName, "rb"); //open to read as binary
    unsigned char head[54];
    fread(head, sizeof(unsigned char), 54, f); // read the header
    //in BMP format the size of each row is rounded up to multiple of 4 bytes
    int Nb = ceil(3.*(double)N/4.)*4; //true size of the row in bytes, including padding

    unsigned char* img = new unsigned char[Nb]; // allocate one row of pixels
    for(int i = 0; i < N; i++) { //for each row
        fread(img, sizeof(unsigned char), Nb, f);//read one row
        for (int j=0; j<N; j++)
            //set to free only those that have high intensity close to white (>240 in all three channels)
            lattice[i*N+j] = !(img[3*j]>250 && img[3*j+1]>250 && img[3*j+2]>250);
    fclose(f);//close file
    //filling boundary
    for (int i=0; i<N; i++) {lattice[i]=true;};//left
    for (int i=0; i<N*N; i+=N) {lattice[i]=true;};//top
    for (int i=N-1; i<N*N; i+=N) {lattice[i]=true;};//bottom
    for (int i=N*(N-1); i<N*N; i++) {lattice[i]=true;};//right

The same task in MATLAB is way easier to implement…

function [L, N] = domainFromImage( filename )
    L = imread(filename); %reading image
    L = ~(L(:,:,1)>250 & L(:,:,2)>250 & L(:,:,3)>250); %setting those values...
    N = size(L,1);%size of the domain (assuming square)
    %filling border
    L([1:N 1:N:N*N N:N:N*N N*(N-1):N*N]) = true; %boundary

What is most important MATLAB code is not so much dependent on the format of the image – it can be easily modified to other image types (imread function is very flexible).

Below is the exemplary result of simulation using image representing the tissue.


Tumor growth can be also simulated for the domain generated using PowerPoint text.


Visualization of 3D tumor using isosurfaces and simple blur

Today, I’ll show how to simply visualize 3D tumor using MATLAB’s isosurface function combined with a simple blurring technique.

First, we will modify the code from the previous post in order to simulate tumor in 3D. There are only a few changes: redefinition of the domain and the neighborhood; generating permutations on the fly instead of storing them in Pms varaible; and the way in which the initial cell is placed.

N = 200; %cubic domain dimension
nSteps = 30*24; %number of simulation steps

pprol = 1/24; %probability of proliferating
pmig = 10/24; %probability of migrating
pdeath = 1/100; %probability of dying
pmax = 10; %proliferation capacity
ps = 3/10; %probability of symmetric division

L = false(N,N,N); %domain definition
L(1,:,:) = true; L(end,:,:) = true; %filling boundary
L(:,1,:) = true; L(:,end,:) = true; %filling boundary
L(:,:,1) = true; L(:,:,end) = true; %filling boundary

L(N*N*round(N/2)+N*round(N/2)+round(N/2)) = true;
cells = int32(N*N*round(N/2)+N*round(N/2)+round(N/2));
cellsIsStem = true;
cellsPmax = uint8(pmax);

aux = int32([[-N-1 -N -N+1 -1 1 N-1 N N+1] ...
             [-N-1 -N -N+1 0 -1 1 N-1 N N+1]-N*N ...
             [-N-1 -N -N+1 0 -1 1 N-1 N N+1]+N*N])'; %indices to heighborhood

for i = 1:nSteps
    sh = randperm(length(cells));
    cells = cells(sh);
    cellsIsStem = cellsIsStem(sh);
    cellsPmax = cellsPmax(sh);

    Pms = cell2mat(arrayfun(@(x)randperm(26),(1:length(cells))','UniformOutput',0))';
    S = bsxfun(@plus,cells,aux(Pms));
    S(L(S)) = 0; %setting occupied spots to 0
    indxF = find(any(S)); %selecting cells with at least one free spot
    nC = length(indxF); %number of cells with free spot
    P = rand(1,nC)<pprol; %proliferation
    Ps = P & rand(1,nC)<ps & cellsIsStem(indxF);%symmetric division
    De = P & (cellsPmax(indxF) == 0);%proliferation exhaution
    D = P & (rand(1,nC)<pdeath) & ~cellsIsStem(indxF); %death at proliferation attempt
    M = ~P & (rand(1,nC)<pmig); %go when no grow
    del = D | De; %cells to delete
    act = find((P | M) & ~del); %indices to the cells that will do something
    for ii = act %only for those that will do anything
        ngh = S(:,indxF(ii)); 
        ngh(ngh==0) = [];
        indO = find(~L(ngh),1,'first'); %selecting free spot
        if ~isempty(indO) %if there is still a free spot
            L(ngh(indO)) = true;
            if P(ii) %proliferation
                cells = [cells uint32(ngh(indO))];
                if Ps(ii) %symmetric division
                   cellsIsStem = [cellsIsStem true];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))];  
                   cellsIsStem = [cellsIsStem false];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))-1];
                   if ~cellsIsStem(indxF(ii))
                    cellsPmax(indxF(ii)) = cellsPmax(indxF(ii))-1;
            else %migration
                L(cells(indxF(ii))) = false;
                cells(indxF(ii)) = uint32(ngh(indO));
    if ~isempty(del) %updating death
        L(cells(indxF(del))) = false;
        cells(indxF(del)) = [];
        cellsIsStem(indxF(del)) = [];
        cellsPmax(indxF(del)) = [];

Using the above code I simulated two tumors that have different growth characteristics, both with pmig = 5/24 and one with ps = 3/10 and the other one with ps = 5/100. Now we need to visualize them.

Having the whole tumor stored in the L variable we can use isosurface function straight away and plot the tumor as it is. The only thing that we need to do is to remove the cells from the L boundary.

function visualize( N, L)

%clearing cells from the boundary
L(1,:,:) = false; L(end,:,:) = false;
L(:,1,:) = false; L(:,end,:) = false;
L(:,:,1) = false; L(:,:,end) = false;

%calculating isosurfaces and plotting
p = patch(isosurface(1:N,1:N,1:N,L,0.25));

xlim([1 N]);
ylim([1 N]);
zlim([1 N]);
view([90 0]);
lighting gouraud    

The results of the above function are plotted below.


As you can see the visualization doesn’t really allow to see the 3D structure of the tumors, as individual separated cells are plotted and there is no clear 3D surface. We will fix that by using a simple blurring technique. The basic idea is that we will replace each site value by the average value from its neighborhood (several times in a loop). That will allow to delete separated cells and make 3D surface smoother. We add an input variable blLevels to the function in order to define how may iterations of that procedure should be performed.

function visualize( N, L, cells, blLevels )

%clearing cells from the boundary
L(1,:,:) = false; L(end,:,:) = false;
L(:,1,:) = false; L(:,end,:) = false;
L(:,:,1) = false; L(:,:,end) = false;

if blLevels %if perform basic blur

%auxilary variable with indices to the cell negiborhood
aux = int32([[-N-1 -N -N+1 -1 1 N-1 N N+1] ...
             [-N-1 -N -N+1 0 -1 1 N-1 N N+1]-N*N ...
             [-N-1 -N -N+1 0 -1 1 N-1 N N+1]+N*N])';

%creating indices to cells and their beignorhoods
S = [cells unique(reshape(bsxfun(@plus,cells,aux),1,[]))];
%creating indices to neigborhood of indices in S
S2 = bsxfun(@plus,S,aux);
%making sure that indices are still within the lattice
S2(S2<1) = []; S2(S2>N*N*N) = [];

    %changing lattice from logical variable to float
    L = single(L);
    for i = 1:blLevels %for number of blurs
        L(S) = mean(L(S2)); %taking the average of neighborhood

%calculating isosurfaces and plotting
p = patch(isosurface(1:N,1:N,1:N,L,0.25));

xlim([1 N]);
ylim([1 N]);
zlim([1 N]);
view([90 0]);
lighting gouraud    

If we apply blLevels = 1 we get the following, nicer plot of the tumor.


For blLevels = 2 we get the following plots.


Finally, when applying three iterations of blur (blLevels = 3) we get 3D plots nicely showing the tumor shape.


What is also great it takes only about 3 seconds to plot a tumor with more than quarter of a million of cells (left one).

Cancer stem cell driven tumor growth model in less than 70 lines of code

Today I’ve decided to show how to efficiently code a cancer stem cell driven tumor growth model in less then 70 lines of code in MATLAB.

Quick guide to the model: A cell, either cancer stem cell (CSC) or non-stem cancer cell (CC), occupies a single grid point on a two-dimensional square lattice. CSCs have unlimited proliferation potential and at each division they produce either another CSC (symmetric division) or a CC (asymmetric division). CCs are eroding their proliferation potential at each division and die when its value reaches 0. Moreover, at each proliferation attempt, CCs may undergo spontaneous death and then be removed from the system.

We start with defining initial settings of a domain and simulation timespan.

N = 1000; %square domain dimension
nSteps = 6*30*24; %number of simulation steps

Next we define the values of all parameters used in the simulation.

pprol = 1/24; %probability of proliferation
pmig = 10/24; %probability of migrating
pdeath = 1/100; %probability of death
pmax = 10; %initial proliferation capacity of CC
ps = 3/10; %probability of symmetric division

Now we can intialize domain and place initial CSC in its center. Our computational domain is represented by N x N Boolean matrix (a true value indicates the lattice point is occupied). An additional integer vector cells is maintained to store indices for all viable cells present in the system (to avoid costly search for cells on the lattice).

L = false(N,N);
L([1:N 1:N:N*N N:N:N*N N*(N-1):N*N]) = true; %boundary
L(N*round(N/2)+round(N/2)) = true;
cells = int32(N*round(N/2)+round(N/2));
cellsIsStem = true;
cellsPmax = uint8(pmax);

To reduce the amount of used memory we utilized int32 and unit8 types as they occupy less memory than double (double is default type in MATLAB). We also set all the boundary values of L to true in order to make sure that we will not address any site out of the lattice. Typically one would use an if statement when addressing the lattice site to make sure that index to a site is within the domain. Here, because we have set the boundaries to true without adding those sites to cells vector, we don’t need to do that – boundary will be treated as an occupied site, but not as a viable cell.

We can now define auxiliary variables that will be utilized further.

aux = int32([-N-1 -N -N+1 -1 1 N-1 N N+1])'; %indices to heighborhood
Pms = perms(uint8(1:8))'; %permutations
nP = size(Pms,2); %number of permutations

Variable aux simply defines cell’s neighborhood and Pms is a variable in which we store all possible permutations of variable aux.

Now we can start the main loop of the program and randomly shuffle cells at its beginning.

for i = 1:nSteps

sh = randperm(length(cells));
cells = cells(sh);
cellsIsStem = cellsIsStem(sh);
cellsPmax = cellsPmax(sh);

Now few lines in which we first create indices to neighborhoods of all of the cells already in random order (variable S), then we flag by 0 all of the spots that are occupied and finally select only indices to those cells that have at least one free spot in the neighborhood (variable indxF).

S = bsxfun(@plus,cells,aux(Pms(:,randi(nP,1,length(cells)))));
S(L(S)) = 0; %setting occupied spots to 0
indxF = find(any(S)); %selecting cells with at least one free spot
nC = length(indxF); %number of cells with free spot

Now we can decide about the faith of the cells that can perform action (are not completely surrounded by other cells).

    P = rand(1,nC)<pprol; %proliferation
    Ps = P & rand(1,nC)<ps & cellsIsStem(indxF); %symmetric division
    De = P & (cellsPmax(indxF) == 0); %proliferation capacity exhaution
    D = P & (rand(1,nC)<pdeath) & ~cellsIsStem(indxF); %death at proliferation attempt
    M = ~P & (rand(1,nC)<pmig); %go when no grow

In the additional loop we perform update of the system.

    del = D | De; %cells to delete
    act = find((P | M) & ~del); %indices to the cells that will perform action
    for ii = act %only for those that will do anything
        ngh = S(:,indxF(ii)); %cells neighborhood
        ngh(ngh==0) = []; %erasing places that were previously occupied
        indO = find(~L(ngh),1,'first'); %selecting free spot
        if ~isempty(indO) %if there is still a free spot
            L(ngh(indO)) = true; %updating occupancy
            if P(ii) %proliferation
                cells = [cells uint32(ngh(indO))]; %adding new cell
                if Ps(ii) %symmetric division
                   cellsIsStem = [cellsIsStem true];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))];  
                   cellsIsStem = [cellsIsStem false];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))-1];
                   if ~cellsIsStem(indxF(ii))
                    cellsPmax(indxF(ii)) = cellsPmax(indxF(ii))-1;
            else %migration
                L(cells(indxF(ii))) = false; %freeing spot
                cells(indxF(ii)) = uint32(ngh(indO));

At the end we need to erase cells that died.

    if ~isempty(del) %updating death
        L(cells(indxF(del))) = false;
        cells(indxF(del)) = [];
        cellsIsStem(indxF(del)) = [];
        cellsPmax(indxF(del)) = [];

This is it. Whole CA is coded and we are ready to run simulations!

In order to visualize simulations results we can implement short function.

function visualize( N, cells, cellsIsStem, cellsPmax, pmax )

    M = ones(N,N,3); %matrix for image
    color = hot(3*pmax); %colormap
    %drawing CCs
    M(cells(~cellsIsStem)) = color(cellsPmax(~cellsIsStem)+1,1);
    M(cells(~cellsIsStem)+N*N) = color(cellsPmax(~cellsIsStem)+1,2);
    M(cells(~cellsIsStem)+2*N*N) = color(cellsPmax(~cellsIsStem)+1,3);
    %drawing CSCs, we want to draw them as 3x3 points
    aux = int32([-N-1 -N -N+1 -1 1 N-1 N N+1])';
    CSCs = cells(cellsIsStem);
    plusSurr = bsxfun(@plus,CSCs,aux);
    M(plusSurr) = color(2*pmax,1);
    M(plusSurr+N*N) = color(2*pmax,2);
    M(plusSurr+2*N*N) = color(2*pmax,3);

Below is the visualization of the tumor simulated using the above code. It is consisted of 369,504 cells in total (8563 CSCs). The whole simulation took about 12 minutes on my laptop, what taking into account the lattice size and number of cells is a reasonable time.


Loop-free search for proliferative cells in MATLAB

When thinking about implementation of agent-based model only few of us consider MATLAB as a coding environment, even though it has plenty of built-in visualization and analysis tools. I get that. It’s a high level language that is not very efficient with for loops, so a basic loop-based code will execute much faster when coded in C++ (and others). However, there is a very cool feature of MATLAB – vectorization, that allows to skip plenty of loops and significantly speed-up the computations. In this post I’ll show how to utilize this feature to make a loop free search for a cells that have a free spot in the neighborhood.

Our computational domain is represented by N x N Boolean matrix L (a true value indicates the lattice point is occupied). An additional integer vector cells is maintained to store indices for all viable cells present in the system (to avoid costly search for cells on the lattice). Our task is to write a code that returns indices to those cells that have at least one free spot in the 8 spot neighborhood.

A basic loop-based code in MATLAB that will do the task is the following:

cellsInd = false(size(cells));
    for j = 1:length(cells)
        loop = true; 
        for ii = -1:1:1
            if ~loop
            for kk = -N:N:N
                if ~L(cells(j)+ii+kk)
                   loop = false; 
                   cellsInd(j) = true;
    cellsInd = find(cellsInd);

The code is similar to the one that would be written in C++. For 2000×2000 lattice with 800,000 cells evaluation of 200 iterations of the above code take about 83 seconds. Not very impressive… Same task for a code written in C++ takes about 3 seconds (when using vectors from STL library). Can we do something about that?

As I wrote, vectorization is a very cool MATLAB feature. When using it, coding the whole task takes only one two lines:

aux = [-N-1; -N; -N+1; -1; 1; N-1; N; N+1]';
cellsInd = find(~all(L(bsxfun(@plus,cells,aux))));

Moreover, the evaluation time is reduced from about 83 seconds to about 9.5 seconds. Impressive speed-up and elegant code in two lines.

Still, both codes are outperformed by C++, but the difference when using vectorized code is bearable.