Avoiding boundary effects – dynamically expanding lattice for ABM

In a typical setting, agent-based model (ABM) simulations take place on a domain of fixed size, e.g. square lattice of size 500 x 500. Depending on the model, this can introduce so-called boundary effects, i.e. model predictions that are in part caused by the limited amount of space. Some time ago me and @theheikman published a paper in which we investigated the impact of evolution of cancer stem cells on the rate of tumor growth (link to the paper here). Because of the ABM formulation, we had to implement domain that can freely expand on demand. Otherwise, the tumor evolution would stop once the cells fill all the space. Of course, we could set the initial domain size to be so big that the space wouldn’t be filled in the simulated timeframe. However, it is hard to know a priori what domain size will be sufficient. In this post I will show how to implement in C++ an ABM lattice that dynamically expands if one of the cells touches its boundary. We will be working on pointers and memory allocating/freeing routines.

Our lattice will be a boolean array in which value of true will indicate that the spot is occupied by the cell. First, we need to create lattice of initial size NxN.

lattice=new bool *[N];
for (int i=0; i<N; i++) {
lattice[i] = new bool [N] ();
fill_n(lattice[i], N, false);
}

An element of the lattice can be easily accessed using double indexing, i.e. lattice[i][j] = true.

Now we need to have a procedure that will expand our lattice, by fixed amount of rows (gN) from top, bottom and fixed amount of columns (also gN) to right and left.

void expandLattice() {
bool **aux;
aux=new bool *[N+2*gN];

for (int i=0; i<gN; i++) { //adding empty columns to the left
aux[i] = new bool [N+2*gN] ();
fill_n(aux[i], N+2*gN, false);
}

for (int i=N+gN; i<N+2*gN; i++) { //adding empty columns to the right
aux[i] = new bool [N+2*gN] ();
fill_n(aux[i], N+2*gN, false);
}

//copying the interior columns
for (int i=gN; i<N+gN; i++) {
bool *aux2;
aux2 = new bool [N+2*gN] ();
fill_n(aux2, N+2*gN, false); //filling with false values
memcpy(aux2+gN, lattice[i-gN],N*sizeof(bool)); //copying existing values
free(lattice[i-gN]);
aux[i] = aux2;
}

free(lattice);
lattice=aux;
N += 2*gN;
}

Very important part of the above procedure is invoking free() function in order to deallocate the memory previously occupied by the lattice.

And that is about it: if an event of touching the boundary is detected, we just invoke the expandLattice() function. We also need to remember to free the allocated memory at the very end of the simulation, by putting

for (int i=0; i<N; i++)
free(lattice[i]);

free(lattice);

at the end of main() function.

In CAexpandP file you can find the code for the cancer stem cell driven model of tumor growth considered in the previous posts that uses dynamically expanding lattice (change extension from .doc to .cpp). Plot below nicely shows how the domain size grows together with the tumor when using that code with initial N = 100 and gN = 100 (plot shows the average of 20 simulations).

results

 

Advertisements

Julia – excellent opensource alternative for MATLAB

After my last post about ABM speed comparison @marcjwills_ suggested that Julia programming language could be a good alternative for slower, but easy to code programming languages – I’ve decided to give it a try. On the project webpage I found promising speed comparisons showing that Julia can indeed be my new tool of choice. After easy download and installation I saw the terminal window that brought a huge smile to my face.

Screen Shot 2015-10-04 at 3.00.41 PM

Quick web search allowed me to find nice IDE for Julia called Juno – simple and not overloaded with features. What might be a great news to some people (especially @dbasanta) is that Julia after few easy tweaks can use matplotlib.

The obvious first test to do is to update speed comparison post with Julia. First, I’ve decided to rewrite the partially vectorized MATLAB code in Julia without making any essential changes. After browsing the web I found couple of useful webpages that helped me with doing that without much effort:

The differences between Julia and MATLAB are really minor and most of the functions work in exactly the same way. I only had to be careful with logical operations between matrices and scalars – in Julia you need to add “.” to the operator to make it work as in MATLAB. Other changes that I had to make were:

  1. To create logical lattice: “false(N,N)” to “falses(N,N)”.
  2. Array elements are accessed with “[i]” instead of “(i)”.
  3. To create array of all permutations: “perms(uint8(1:8))'” to “hcat(collect(permutations(1:8))…)”.
  4. “bsxfun()” to “broadcast()”
  5. Adding elements to vectors using “push!()” function instead of concatenation.
  6. Deleting elements by using “deleteat!()” function.

The working code in Julia is the following:

N = 1000; #square domain dimension

pprol = 1/24; #probability of proliferating
pmig = 15/24; #probability of migrating
pdeath = 5/100; #probability of dying
pmax = int8(10); #proliferation capacity
ps = 0.3; #probability of symmetric division

L = falses(N,N);
L[[1:N, 1:N:N*N, N:N:N*N, N*(N-1):N*N]] = true; #boundary

L[N*round(N/2)+round(N/2)] = true;
cells = int32([N*round(N/2)+round(N/2)]);
cellsIsStem = [true];
cellsPmax = int8([pmax]);

aux = int32([-N-1 -N -N+1 -1 1 N-1 N N+1])'; #indices to heighborhood
Pms = hcat(collect(permutations(1:8))...); #permutations
nP = size(Pms,2); #number of permutations
nSteps = 0;

while(length(cells)<10^4)
    sh = randperm(length(cells));
    cells = cells[sh];
    cellsIsStem = cellsIsStem[sh];
    cellsPmax = cellsPmax[sh];
    
    S = broadcast(+,cells',aux[Pms[:,rand(1:nP,length(cells))]])
    
    S[L[S]] = 0; #setting occupied spots to 0
    indxF = int32(find(any(S,1))); #selecting cells with at least one free spot
    nC = length(indxF); #number of cells with free spot
    
    P = rand(nC).<pprol; #proliferation
    Ps = P & (rand(nC).<ps) & (cellsIsStem[indxF]);#symmetric division
    De = P & (cellsPmax[indxF] .== 0);#proliferation exhaution
    D = P & (rand(nC).<pdeath) & (~cellsIsStem[indxF]); #death at proliferation attempt
    M = (~P) & (rand(nC).<pmig); #go when no grow del = D | De; #cells to delete act = find((P | M) & (~del)); #indices to the cells that will do something for ii = act #only for those that will do anything ngh = S[:,indxF[ii]]; ngh = ngh[ngh.>0];
        indO = find(~L[ngh]); #selecting free spot
        if ~isempty(indO) #if there is still a free spot
            indO = indO[1];
            L[ngh[indO]] = true;
            if P[ii] #proliferation
                push!(cells,int32(ngh[indO]));
                if Ps[ii] #symmetric division
                   push!(cellsIsStem,true);
                   push!(cellsPmax,cellsPmax[indxF[ii]]);  
                else
                   push!(cellsIsStem,false);
                   push!(cellsPmax, cellsPmax[indxF[ii]]-1);
                   if ~cellsIsStem[indxF[ii]]
                    cellsPmax[indxF[ii]] = cellsPmax[indxF[ii]]-1;
                   end
                end
            else #migration
                L[cells[indxF[ii]]] = false;
                cells[indxF[ii]] = int32(ngh[indO]);
            end
        end
    end

    if any(del) #updating death
        L[cells[indxF[del]]] = false;
        deleteat!(cells,indxF[del]);
        deleteat!(cellsIsStem,indxF[del]);
        deleteat!(cellsPmax,indxF[del]);
    end 
end

To visualize the results of simulations we can use Gadfly plotting package and use simple spy() function (analog to MATLAB).

using Gadfly
spy(L)

Screen Shot 2015-10-04 at 3.41.52 PM

Ok, so modifications were easy and straightforward, but what about speed? Amazingly Julia performed almost exactly the same as MATLAB (see plot below).

speed

But that is not all… MATLAB code was using vectorization feature. Without that, when using only loops like in C++, it would be several times slower. Is that also the case in Julia?

Below is the C++ style code in Julia – only loops, no vectorization. One of the awesome feature of Julia, that MATLAB lacks, is definition of functions within the script (see “returnEmptySpace” function below).

N = 1000; #square domain dimension

aux = int32([-N-1 -N -N+1 -1 1 N-1 N N+1]); #indices to heighborhood

pprol = 1/24; #probability of proliferating
pmig = 15/24; #probability of migrating
pdeath = 5/100; #probability of dying
pmax = int8(10); #proliferation capacity
ps = 0.3;

L = falses(N,N);
L[[1:N, 1:N:(N*N), N:N:(N*N), (N*(N-1)):(N*N)]] = true; #boundary

L[N*round(N/2)+round(N/2)] = true;
cells = [int32(N*round(N/2)+round(N/2))];
cellsIsStem = [true];
cellsPmax = [int8(pmax)];

#defining function that returns random empty place around cell
function returnEmptyPlace(indx::Int32)
   neigh = int32(zeros(1,8))
   nF = 0
   for i = 1:8
      if ~L[indx+aux[i]]
         neigh[nF+1] = indx+aux[i]
         nF+=1
      end
   end
   if nF>0
      return neigh[rand(1:nF,1)];
   else
      return 0;
   end
end
      

while length(cells)<10^5

   sh = randperm(length(cells));
   cells = cells[sh];
   cellsIsStem = cellsIsStem[sh];
   cellsPmax = cellsPmax[sh];

   newCells = Int32[]
   newCellsIsStem = Bool[]
   newCellsPmax = Int8[]

   i = 1;
   while i<=length(cells) deleted = false; newSite = returnEmptyPlace(cells[i])[1]; if newSite > 0 #if there is a new spot
        if rand()<pprol
            if cellsIsStem[i] #is stem cell
                push!(newCells, newSite);
                L[newSite] = true;
                push!(newCellsPmax,cellsPmax[i]);
                if (rand()<ps) #symmetric division push!(newCellsIsStem, true); else push!(newCellsIsStem, false); end else if (cellsPmax[i]>0) && rand()>pdeath
                    cellsPmax[i]-=1
                    L[newSite]=true;
                    push!(newCells,newSite);
                    push!(newCellsPmax,cellsPmax[i]);
                    push!(newCellsIsStem, false);
                else
                    L[cells[i]]=false;
                    splice!(cells,i)
                    splice!(cellsPmax,i)
                    splice!(cellsIsStem,i)
                    deleted = true;
                end
            end
        elseif rand() < pmig
            L[newSite]=true;
            L[cells[i]]=false;
            push!(newCells,newSite);
            push!(newCellsPmax,cellsPmax[i]);
            push!(newCellsIsStem, cellsIsStem[i]);
            splice!(cells,i)
            splice!(cellsPmax,i)
            splice!(cellsIsStem,i)
            deleted = true;
        end
      end

      if ~deleted
        i+=1;
      end
   end

   cells = [cells, newCells];
   cellsPmax = [cellsPmax, newCellsPmax];
   cellsIsStem = [cellsIsStem, newCellsIsStem]
end

How about the speed of a new code? I was amazed with the results…

speed2

Java, C++, MATLAB, Julia or Python for cellular automatons? Speed comparison.

Thanks to @spuri4096 and @chandlergatenbee, the basic cancer stem cell driven model of tumor growth that I implemented in the very first Compute Cancer blog is now programmed in 4 languages:

  1. C++ (post with the code here)
  2. MATLAB (post with the code here)
  3. Java (post with the code here)
  4. Python (post with the code here)
  5. Octave – the same code as for MATLAB. Works without any modifications.
  6. Julia (post with the code here)

Isn’t that the perfect opportunity to see which one is the fastest?

First of all, codes are not completely identical. There are small differences between C++ and MATLAB codes in how the death of a cell is updated, but that shouldn’t affect the comparison too much. The code in Java is much more sophisticated – it uses coded lattice concept and dynamically expanding lattice. Python code (as much as I can see…) does pretty much the same job as the C++ code, so that is the fairest comparison out of all.

I will compare the time needed to simulate cancer from a single cancer stem cell to 100, 250 and 500 thousand of cells. The domain size is set to 1000×1000 grid points (not for Java code, in which lattice expands on demand), so 1 million is the maximal cell number. Of course all simulation parameters are also the same for all codes: probability of symmetric division (ps) = 0.3; proliferation probability (pdiv) = 1/24; migration probability = 15/24; and probability of spontaneous death (alpha) = 0.05. Because of the stochastic nature of the model I will run 20 simulations for each code and report the average time (+- standard deviation).

The results are summarized on the plot below.

speed2

As probably everyone expected C++ outcompeted other codes.
Java also did great.
What might surprise some people, MATLAB didn’t perform that bad (it doesn’t suck that bad as people usually think).
The speed of Octave was a surprise for me – I had patience only to run simulations for 100K cell limit…

Of course in each case there is definitely a room for improvement using even more sophisticated language specific programming tricks.

However, from my perspective – if you don’t want to spend too much time on learning programing language, MATLAB (Octave) might be not such a bad idea.

Quick implementation of hexagonal lattice for ABM

In the agent based modeling we are typically more interested in the rules governing the cell fate rather than the basic setting of the lattice (if we don’t look at the off lattice model). However, the particular setting of the computational domain can have an effect on the model dynamics. In 2D we typically consider the following lattices and neighborhoods:

picture1

with the rectangular grid with Moore neighborhood being probably most frequently utilized. With von Neumann neighborhood we have the fewest number of neighbors and cells are spatially saturated earlier. The problem with Moore neighborhood is that the distance to all of the sites is not the same. Hexagonal grid has good properties (same distance, 6 neighbors), but is less frequently utilized, because implementation is more involved. In todays post I will show that essentially there is no difference in implementation between all of those lattices.

We will use the codes for basic ABM model posted before as a template (MATLAB version here and C++ version here). In both cases we considered rectangular lattice with Moore neighborhood. If we set migration probability to zero, set large value of proliferation capacity and set the spontaneous death rate to zero, i.e. we simulate essentially only division events, we will see in visualizations of simulated tumors what kind of neighborhood we assumed:

square

Let us see how the visualization looks like for different types of neighborhoods/lattices.

In both implementations we had a separate array defining the neighborhood of the cell. Thus, in order to modify the code to von Neumann neighborhood we need to change only two consecutive lines in the MATLAB code in which we define the neighborhood and the permutations table:

aux = int32([-N -1 1 N])'; %indices to heighborhood
Pms = perms(uint8(1:4))'; %permutations

In C++ implementation we need to modify the definition of the neighborhood:

static const int indcNeigh[] = {-N, -1, 1, N};//neighborhood

and the for loop in the returnEmptyPlace function:

for(int j=0;j<4;j++) {//searching through neighborhood
        if (!lattice[indx+indcNeigh[j]]) {
            neigh[nF] = indx+indcNeigh[j];
            nF++;
        }
}

As it could be expected, switching to von Neumann neighborhood made the simulated tumor diamond shaped.

diamond

Let us now consider hexagonal lattice. The very first observation that we need to make is that hexagonal lattice is essentially an rectangular lattice with shifted odd (or even) rows and two definitions of neighborhood (one for cells in odd rows, second for cells in even rows), see picture below

concept

Thus, in the modified implementation we again only need to change the definition of the neighborhood and add the condition for choosing the proper one. In MATLAB it can be achieved by introducing two definitions of neighborhoods

auxR = int32([-N -1 1 N-1 N N+1])'; %indices to heighborhood
auxL = int32([-N -1 1 -N-1 N -N+1])'; %indices to heighborhood

Pms = perms(uint8(1:6))'; %permutations

and modify the lines in the main loop in which we create the neighborhood for all viable cells (create variable S)

    odd = mod(cells,2) ~= 0; %selecting cells in the odd rows
    S = zeros(6,length(cells));
    if any(odd) %creating neighborhood for the cells in the odd rows
        SR = bsxfun(@plus,cells(odd),auxR(Pms(:,randi(nP,1,sum(odd)))));
        S(:,odd) = SR;
    end
    even = ~odd;
    if any(even) %creating neighborhood for the cells in the even rows
        SL = bsxfun(@plus,cells(even),auxL(Pms(:,randi(nP,1,sum(even)))));
        S(:,even) = SL;
    end

In C++ the changes are even easier. We again introduce two neighborhoods:

static const int indcNeighR[] = {-N,-1,1,N-1,N,N+1};//neighborhood
static const int indcNeighL[] = {-N,-1,1,-N-1,N,-N+1};//neighborhood

and modify the returnEmptyPlace function

for(int j=0;j<6;j++) {//searching through neighborhood
        if (indx % 2) {//if odd row
            if (!lattice[indx+indcNeighR[j]]) {
                neigh[nF] = indx+indcNeighR[j];
                nF++;
            }
        } else {//if even row
            if (!lattice[indx+indcNeighL[j]]) {
                neigh[nF] = indx+indcNeighL[j];
                nF++;
            }
        }
}

We might also want to change the code for visualization. In the previous settings it was enough to represent each cell as a pixel in the image. In case of hexagonal lattice in order to be correct we need to somehow shift the pixels in odd (or even rows). The easiest way to achieve that is to represent each cell as 4 pixels and adjust position according to the row number:

drawing

The modified visualization code in MATLAB is the following

function visualizeHex( N, cells, cellsIsStem, cellsPmax, pmax )

    %select cells in the odd rows
    odd = mod(cells,2) ~= 0;

    M = ones(2*N,2*N,3); %matrix for image, we expand it
    
    %disperse cell by one spot
    i = mod(double(cells)-1,N)+1; %row, REMEMBER THAT CELLS ARE UINTs!
    j = ceil(double(cells)/N); %column
    cells = (i*2-1)+(2*j-1)*2*N;
   
    %add cells to the top right in odd rows
    cells = [cells reshape(bsxfun(@plus,cells(odd),[-1; 2*N-1; 2*N]),1,[])];
    cellsIsStem = [cellsIsStem reshape(repmat(cellsIsStem(odd),3,1),1,[])];
    cellsPmax = [cellsPmax reshape(repmat(cellsPmax(odd),3,1),1,[])];
    
    %add cells to top left in even rows
    even = [~odd false(1,3*sum(odd))];
    cells = [cells reshape(bsxfun(@plus,cells(even),[-1; -2*N-1; -2*N]),1,[])];
    cellsIsStem = [cellsIsStem reshape(repmat(cellsIsStem(even),3,1),1,[])];
    cellsPmax = [cellsPmax reshape(repmat(cellsPmax(even),3,1),1,[])];
    
    %plotting
    color = hot(3*pmax);
    M(cells(~cellsIsStem)) = color(cellsPmax(~cellsIsStem)+1,1);
    M(cells(~cellsIsStem)+4*N*N) = color(cellsPmax(~cellsIsStem)+1,2);
    M(cells(~cellsIsStem)+8*N*N) = color(cellsPmax(~cellsIsStem)+1,3);

    CSCs = cells(cellsIsStem);
    M(CSCs) = color(2*pmax,1);
    M(CSCs+4*N*N) = color(2*pmax,2);
    M(CSCs+8*N*N) = color(2*pmax,3);

    figure(1)
    clf
    imshow(M);
    
end

Finally, simulating the tumor with exactly the same parameters settings on the hexagonal lattice results in way more circular tumor.

hexagonal

Agent-Based Modeling (ABM) of Tumors in Java

Before I begin I would like to quickly introduce myself — my name is Sameer Puri and I am a high school student working in an internship under Dr. Heiko Enderling through the HIP IMO program at Moffitt Cancer Center.

Over the past 5 weeks, I have been developing an ABM in Java for my research. My principal goal in developing this ABM was to quickly generate tumors with 1 million cells. (And prove that Java can perform better in some cases…!)

When I started, I was working with an ABM written by @jpoleszczuk and Heiko in C++ (available here). Having a strong background in Java and little past experience with C++, it was difficult for me to understand. Structs, vectors, memcpy, and other features of C++ are not present in Java. Instead of trying to learn C++, I converted the code to Java, my language of expertise.

Optimization

I decided to optimize the code before making any extensions for my research. In Java, finding slow-down points is simple, since an array of profiling tools are available to find which part of the code is slowing down the ABM. I utilized the web-based profiler WarmRoast, developed by sk89q (on GitHub here).

  • One of the biggest time hogs was the Random class which was implemented in the 1990s, and after 20 years, much more efficient pseudo-random number generators have emerged. I replaced it with the XorShift generator, developed by George Marsaglia. The XorShift is theoretically 3 times faster than the default generator. However, the speed-up was far more significant — the call for a random double was so fast that it disappeared from the WarmRoast profiler’s results.
  • The returnEmptyPlace method was optimized with a strategy known as pregeneration, or exchanging memory for lower CPU time. The returnEmptyPlace method looks for an empty spot around a cell in the ABM, shuffling a list of 8 directions each time. Instead, I pregenerated all 40,320 permutations of the 8 direction vectors surrounding a lattice square. Now, the returnEmptyPlace method simply uses the speedy XorShift generator to select a random permutation of the directions list to use.
  • I also optimized the shuffling of the list of cells in the main simulation loop. I created a new class called SPCollections which uses the XorRandom generator for shuffling. There was a large speed-up here as well.
  • Using NetBeans’ built-in hashcode generator, I added hashcodes for the Cell and Indx classes, which will speed up the usage of HashMaps or other hash-based structures.
  • Since memory was not a serious concern for me, I made one optimization. The original model in C++ used the char primitive type for its lattice. I turned the lattice into a byte lattice, which has half the memory footprint (chars are 16 bits, bytes are 8 bits).

Additions

  • For my research, I had to grow 10 tumors in silico with one million cells each. To speed up simulation I utilized parallel computing with an ExecutorService. With my ABM, you can specify the number of “trials” you want to run. For each trial run, a simulator instance is created and run on a separate thread. Most CPUs today have at least 4 cores. Code in most programming languages tend to make use of only one core unless programmers specify methods that allow for parallel execution of code. By creating a separate thread for each simulation, the threads can operate on separate cores and maximize CPU usage!
  • To universalize my data, I saved it utilizing Java serialization. With serialization, Java can output objects to files, which can be loaded later on without needing to code a loader. Others will be able to easily load data generated using my ABM and run their own operations on it.
  • I coded an image generator to automatically generate tumor images. Once a simulation is finished, the image generator takes the serialized output of the simulation and generates three types of PNG images for each dumped instance: proliferation, quiescence, and nutrition. It also creates an animated GIF to show the growth of the tumor over time. Each image has the time step (1 step = 1 hour), number of stem cells and number of regular cancer cells at the upper left corner.

Proliferation during Development


Yellow = Glioma Stem Cell Reddish = few divisions ==> Black = many divisions

Nutrition during Development (on rim)

Nutrients of lattice on the outer edge is displayed. Unoccupied (0) and occupied (9) lattice spots are white

Quiescence during Development

Orange = quiescent Red = can divide/proliferate


 

 

 

 

Download

You can download my ABM here

Setting complex domains for agent based models using bitmaps

In the previous posts (CA in MATLAB and C++) I’ve shown how to implement cancer stem sell driven tumor growth model. In both codes I have set the boundaries of the lattice to true without adding those sites to additional cells vector, so the boundary was treated as an occupied site, but not as a viable cell. This way of coding the system makes it easy to implement more complex domains. Today, I’ll show how to read the complex domains from bitmap (.bmp) file and use them for simulations using previously posted codes. The basic idea is that the image will be loaded into the program and the pixels with values “close” to white will be treated as free sites.

First we need to prepare the image. The posted codes will assume that the image is a square (image=width). We can draw anything we want, remembering that white pixels will be interpreted as free sites. We will write C++ code that will be able to read properly 24 bits bitmap images without the embedded information about the color space. In order to export image in that format we can use GIMP program with the following settings during the export.

Screen Shot 2015-06-09 at 11.20.46 PM

I’ve prepared two exemplary images that I will use further in the simulations. The first one is adapted from the paper by Enderling et al. and the second is a generated text using PowerPoint.
tissue text

Firt, the basic C++ code. We could use additional libraries to load images, but we want to make code as platform independent as possible. We assume that the array lattice and its size are already defined as the global variables (see previous code).

#include <math.h>  

void domainFromImage(char* fileName) {

    FILE* f = fopen(fileName, "rb"); //open to read as binary
    unsigned char head[54];
    fread(head, sizeof(unsigned char), 54, f); // read the header
    
    //in BMP format the size of each row is rounded up to multiple of 4 bytes
    int Nb = ceil(3.*(double)N/4.)*4; //true size of the row in bytes, including padding

    unsigned char* img = new unsigned char[Nb]; // allocate one row of pixels
    
    for(int i = 0; i < N; i++) { //for each row
        fread(img, sizeof(unsigned char), Nb, f);//read one row
        for (int j=0; j<N; j++)
            //set to free only those that have high intensity close to white (>240 in all three channels)
            lattice[i*N+j] = !(img[3*j]>250 && img[3*j+1]>250 && img[3*j+2]>250);
    }
    fclose(f);//close file
    
    //filling boundary
    for (int i=0; i<N; i++) {lattice[i]=true;};//left
    for (int i=0; i<N*N; i+=N) {lattice[i]=true;};//top
    for (int i=N-1; i<N*N; i+=N) {lattice[i]=true;};//bottom
    for (int i=N*(N-1); i<N*N; i++) {lattice[i]=true;};//right
}

The same task in MATLAB is way easier to implement…

function [L, N] = domainFromImage( filename )
    L = imread(filename); %reading image
    L = ~(L(:,:,1)>250 & L(:,:,2)>250 & L(:,:,3)>250); %setting those values...
    N = size(L,1);%size of the domain (assuming square)
    
    %filling border
    L([1:N 1:N:N*N N:N:N*N N*(N-1):N*N]) = true; %boundary
end

What is most important MATLAB code is not so much dependent on the format of the image – it can be easily modified to other image types (imread function is very flexible).

Below is the exemplary result of simulation using image representing the tissue.

tissueTumor

Tumor growth can be also simulated for the domain generated using PowerPoint text.

ht4

Cancer stem cell driven tumor growth model in C++

Because of the feedback that I’ve received after publishing first three posts, I’ve decided to change a tool of interest from MATLAB to C++. Today, I’ll show how to quickly implement a cancer stem cell driven tumor growth model in C++. It is almost the same model as implemented in my previous post (I’ll explain why it is “almost” the same at the end of this post).

Quick guide to the model: A cell, either cancer stem cell (CSC) or non-stem cancer cell (CC), occupies a single grid point on a two-dimensional square lattice. CSCs have unlimited proliferation potential and at each division they produce either another CSC (symmetric division) or a CC (asymmetric division). CCs are eroding their proliferation potential at each division and die when its value reaches 0. Moreover, at each proliferation attempt, CCs may undergo spontaneous death and then be removed from the system.

First we start with including the necessary headers and setting the namespace. Apart from the standard functions and datatypes, we will use the vector datatype to store the cells and a shuffling function from algorithm library.

#include <vector>
#include <algorithm>
#include <stdlib.h>
#include <time.h>

using namespace std;

Now we can define a cell. It will be defined by three variables: index to the place on the lattice (integer), remaining proliferation capacity (char), and boolean variable defining stemness.

struct cell {//defining cell
    int place;
    char p;
    bool is_stem;
};

Next step is to define the lattice size, the lattice itself, vector that will contain all viable cells, and auxiliary variable defining the cells neighborhood.

static const int N = 2000; //lattice size
bool lattice[N*N] = {false}; //empty lattice
vector<cell> cells; //vector containing all cells present in the system

static const int indcNeigh[] = {-N-1, -N, -N+1, -1, 1, N-1, N, N+1};//neighborhood

The parameters of the model are defined as follows.

char pmax=10; //proliferation capacity
double pDiv=1./24.; //division probability
double alpha=0.05; //spontaneous death probability
double ps=0.05; //probability of symmetric division
double pmig=10./24.; //probability of migration

Having made all of those initial definitions we can finally start coding the main program.
Let us start with writing the function that will initialize the whole simulation, i.e. fill the lattice boundary and put the initial stem cell.

void initialize() {
    for (int i=0; i<N; i++) {lattice[i]=true;};//filling left
    for (int i=0; i<N*N; i=i+N) {lattice[i]=true;};//filling top
    for (int i=N-1; i<N*N; i=i+N) {lattice[i]=true;};//filling bottom
    for (int i=N*(N-1); i<N*N; i++) {lattice[i]=true;};//filling right
    
    lattice[N/2*N+N/2] = true; //initial cell in the middle
    cell initialCell = {N/2*N+N/2,pmax,true};//initial cell definition
    cells.push_back(initialCell);
}

As in the previous post, we set all the boundary values of lattice to true in order to make sure that we will not address any site out of the lattice. Typically one would use an if statement when addressing the lattice site to make sure that index to a site is within the domain. Here, because we have set the boundaries to true without adding those sites to cells vector, we don’t need to do that – boundary will be treated as an occupied site, but not as a viable cell.

The second auxiliary function that we will use returns the index to the randomly selected empty space around a given spot.

int returnEmptyPlace(int indx) {
    int neigh[8], nF = 0;
    for(int j=0;j<8;j++) {//searching through neighborhood
        if (!lattice[indx+indcNeigh[j]]) {//if free spot
            neigh[nF] = indx+indcNeigh[j]; //save the index
            nF++; //increase the number of found free spots
        }
    }
    if(nF) {//selecting free spot at random
        return neigh[rand() % nF];
    } else {//no free spot
        return 0;
    }
}

If there is no free spot the function returns 0.
Finally we can code the part in which all the magic happens – main simulation procedure. It is hard to dissect the whole procedures into parts, so I did my best to explain everything in the comments.

void simulate(int nSteps) {
    
    vector<cell> cellsTmp;
    int newSite;
    cell currCell, newCell;
    
    for (int i=0; i<nSteps; i++) {
       random_shuffle(cells.begin(), cells.end()); //shuffling cells
       while (!cells.empty()) {
           currCell=cells.back(); //pick the cell
           cells.pop_back();
           newSite = returnEmptyPlace(currCell.place);
           
           if (newSite) {//if there is a new spot
               newCell = currCell;
               newCell.place = newSite;
               if ((double)rand()/(double)RAND_MAX < pDiv) {
                   if (currCell.is_stem) {
                       lattice[newSite]=true;
                       cellsTmp.push_back(currCell);
                       if ((double)rand()/(double)RAND_MAX > ps) {//asymmetric division
                           newCell.is_stem = false;
                       }
                       cellsTmp.push_back(newCell);
                   } else if (currCell.p > 0 && (double)rand()/(double)RAND_MAX > alpha) {
                       currCell.p--;
                       newCell.p--;
                       lattice[newSite] = true;
                       cellsTmp.push_back(currCell);
                       cellsTmp.push_back(newCell);
                   } else {
                       lattice[currCell.place] = false;
                   }
               } else if ((double)rand()/(double)RAND_MAX < pmig) {
                   lattice[currCell.place] = false;
                   lattice[newSite] = true;
                   cellsTmp.push_back(newCell);
               } else {//doing nothing
                 cellsTmp.push_back(currCell);
               }
           } else {//no free spot
               cellsTmp.push_back(currCell);
           }
       }
       cells.swap(cellsTmp);
    }
}

Now we wrap everything in the main function, compile and run.

int main() {
    srand(time(NULL)); //initialize random number generator
    initialize(); //initialize CA
    simulate(24*30*6);
    return 0;
}

Everything in about 100 lines of the code.
What about the speed of the code? It took about 40 seconds to simulate the tumor presented below, which is consisted of about 460,000 cells (on 2000×2000 lattice).

Screen Shot 2015-06-06 at 12.24.53 AM

Why is it “almost” the same model as the one presented in previous post? The answer is in details. In the above C++ implementation lattice is updated after every single cell movement/death, i.e. because of the random shuffling a new free spot can be created for a cell that at the beginning of the iteration was quiescent (without a free spot), so it can successfully proliferate in the same iteration. In the MATLAB implementation that kind of behavior was avoided.

Cancer stem cell driven tumor growth model in less than 70 lines of code

Today I’ve decided to show how to efficiently code a cancer stem cell driven tumor growth model in less then 70 lines of code in MATLAB.

Quick guide to the model: A cell, either cancer stem cell (CSC) or non-stem cancer cell (CC), occupies a single grid point on a two-dimensional square lattice. CSCs have unlimited proliferation potential and at each division they produce either another CSC (symmetric division) or a CC (asymmetric division). CCs are eroding their proliferation potential at each division and die when its value reaches 0. Moreover, at each proliferation attempt, CCs may undergo spontaneous death and then be removed from the system.

We start with defining initial settings of a domain and simulation timespan.

N = 1000; %square domain dimension
nSteps = 6*30*24; %number of simulation steps

Next we define the values of all parameters used in the simulation.

pprol = 1/24; %probability of proliferation
pmig = 10/24; %probability of migrating
pdeath = 1/100; %probability of death
pmax = 10; %initial proliferation capacity of CC
ps = 3/10; %probability of symmetric division

Now we can intialize domain and place initial CSC in its center. Our computational domain is represented by N x N Boolean matrix (a true value indicates the lattice point is occupied). An additional integer vector cells is maintained to store indices for all viable cells present in the system (to avoid costly search for cells on the lattice).

L = false(N,N);
L([1:N 1:N:N*N N:N:N*N N*(N-1):N*N]) = true; %boundary
L(N*round(N/2)+round(N/2)) = true;
cells = int32(N*round(N/2)+round(N/2));
cellsIsStem = true;
cellsPmax = uint8(pmax);

To reduce the amount of used memory we utilized int32 and unit8 types as they occupy less memory than double (double is default type in MATLAB). We also set all the boundary values of L to true in order to make sure that we will not address any site out of the lattice. Typically one would use an if statement when addressing the lattice site to make sure that index to a site is within the domain. Here, because we have set the boundaries to true without adding those sites to cells vector, we don’t need to do that – boundary will be treated as an occupied site, but not as a viable cell.

We can now define auxiliary variables that will be utilized further.

aux = int32([-N-1 -N -N+1 -1 1 N-1 N N+1])'; %indices to heighborhood
Pms = perms(uint8(1:8))'; %permutations
nP = size(Pms,2); %number of permutations

Variable aux simply defines cell’s neighborhood and Pms is a variable in which we store all possible permutations of variable aux.

Now we can start the main loop of the program and randomly shuffle cells at its beginning.

for i = 1:nSteps

sh = randperm(length(cells));
cells = cells(sh);
cellsIsStem = cellsIsStem(sh);
cellsPmax = cellsPmax(sh);

Now few lines in which we first create indices to neighborhoods of all of the cells already in random order (variable S), then we flag by 0 all of the spots that are occupied and finally select only indices to those cells that have at least one free spot in the neighborhood (variable indxF).

S = bsxfun(@plus,cells,aux(Pms(:,randi(nP,1,length(cells)))));
S(L(S)) = 0; %setting occupied spots to 0
indxF = find(any(S)); %selecting cells with at least one free spot
nC = length(indxF); %number of cells with free spot

Now we can decide about the faith of the cells that can perform action (are not completely surrounded by other cells).

   
    P = rand(1,nC)<pprol; %proliferation
    Ps = P & rand(1,nC)<ps & cellsIsStem(indxF); %symmetric division
    De = P & (cellsPmax(indxF) == 0); %proliferation capacity exhaution
    D = P & (rand(1,nC)<pdeath) & ~cellsIsStem(indxF); %death at proliferation attempt
    M = ~P & (rand(1,nC)<pmig); %go when no grow

In the additional loop we perform update of the system.

   
    del = D | De; %cells to delete
    act = find((P | M) & ~del); %indices to the cells that will perform action
    for ii = act %only for those that will do anything
        ngh = S(:,indxF(ii)); %cells neighborhood
        ngh(ngh==0) = []; %erasing places that were previously occupied
        indO = find(~L(ngh),1,'first'); %selecting free spot
        if ~isempty(indO) %if there is still a free spot
            L(ngh(indO)) = true; %updating occupancy
            if P(ii) %proliferation
                cells = [cells uint32(ngh(indO))]; %adding new cell
                if Ps(ii) %symmetric division
                   cellsIsStem = [cellsIsStem true];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))];  
                else
                   cellsIsStem = [cellsIsStem false];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))-1];
                   if ~cellsIsStem(indxF(ii))
                    cellsPmax(indxF(ii)) = cellsPmax(indxF(ii))-1;
                   end
                end
            else %migration
                L(cells(indxF(ii))) = false; %freeing spot
                cells(indxF(ii)) = uint32(ngh(indO));
            end
        end
    end

At the end we need to erase cells that died.

    
    if ~isempty(del) %updating death
        L(cells(indxF(del))) = false;
        cells(indxF(del)) = [];
        cellsIsStem(indxF(del)) = [];
        cellsPmax(indxF(del)) = [];
    end 
end

This is it. Whole CA is coded and we are ready to run simulations!

In order to visualize simulations results we can implement short function.

    
function visualize( N, cells, cellsIsStem, cellsPmax, pmax )

    M = ones(N,N,3); %matrix for image
    color = hot(3*pmax); %colormap
    %drawing CCs
    M(cells(~cellsIsStem)) = color(cellsPmax(~cellsIsStem)+1,1);
    M(cells(~cellsIsStem)+N*N) = color(cellsPmax(~cellsIsStem)+1,2);
    M(cells(~cellsIsStem)+2*N*N) = color(cellsPmax(~cellsIsStem)+1,3);
    %drawing CSCs, we want to draw them as 3x3 points
    aux = int32([-N-1 -N -N+1 -1 1 N-1 N N+1])';
    CSCs = cells(cellsIsStem);
    plusSurr = bsxfun(@plus,CSCs,aux);
    M(plusSurr) = color(2*pmax,1);
    M(plusSurr+N*N) = color(2*pmax,2);
    M(plusSurr+2*N*N) = color(2*pmax,3);
   
    figure(1)
    imshow(M);
end

Below is the visualization of the tumor simulated using the above code. It is consisted of 369,504 cells in total (8563 CSCs). The whole simulation took about 12 minutes on my laptop, what taking into account the lattice size and number of cells is a reasonable time.

higherps

Loop-free search for proliferative cells in MATLAB

When thinking about implementation of agent-based model only few of us consider MATLAB as a coding environment, even though it has plenty of built-in visualization and analysis tools. I get that. It’s a high level language that is not very efficient with for loops, so a basic loop-based code will execute much faster when coded in C++ (and others). However, there is a very cool feature of MATLAB – vectorization, that allows to skip plenty of loops and significantly speed-up the computations. In this post I’ll show how to utilize this feature to make a loop free search for a cells that have a free spot in the neighborhood.

Our computational domain is represented by N x N Boolean matrix L (a true value indicates the lattice point is occupied). An additional integer vector cells is maintained to store indices for all viable cells present in the system (to avoid costly search for cells on the lattice). Our task is to write a code that returns indices to those cells that have at least one free spot in the 8 spot neighborhood.

A basic loop-based code in MATLAB that will do the task is the following:

cellsInd = false(size(cells));
    for j = 1:length(cells)
        loop = true; 
        for ii = -1:1:1
            if ~loop
                break;
            end
            for kk = -N:N:N
                if ~L(cells(j)+ii+kk)
                   loop = false; 
                   cellsInd(j) = true;
                   break;
                end
            end
        end
    end
    cellsInd = find(cellsInd);
end

The code is similar to the one that would be written in C++. For 2000×2000 lattice with 800,000 cells evaluation of 200 iterations of the above code take about 83 seconds. Not very impressive… Same task for a code written in C++ takes about 3 seconds (when using vectors from STL library). Can we do something about that?

As I wrote, vectorization is a very cool MATLAB feature. When using it, coding the whole task takes only one two lines:

aux = [-N-1; -N; -N+1; -1; 1; N-1; N; N+1]';
cellsInd = find(~all(L(bsxfun(@plus,cells,aux))));

Moreover, the evaluation time is reduced from about 83 seconds to about 9.5 seconds. Impressive speed-up and elegant code in two lines.

Still, both codes are outperformed by C++, but the difference when using vectorized code is bearable.