Cancer Stem Cell CA in Python

Python doesn’t seem to be the first programming language people go to when developing cellular automata models. However, given that Python is an object-orientated language that is easy to read and write, it might actually be ideal for such models, especially if you prefer to think from the perspective of the agent (if you’d rather model using matrices you can do that too by using Python’s NumPy package). As we’ll see, it can also be pretty fast. Below I’ll cover one way to implement the Cancer Stem cell CA described in the Cancer Stem Cell CA in Matlab post.

GETTING PYTHON

If you don’t have Python, the easiest way to get it and nearly all of its scientific packages is to download the Anaconda distribution from https://store.continuum.io/cshop/anaconda/. This installation and all modules will be kept in it’s own folder, so it won’t interfere with the Python that comes bundled in your OS. It also means that uninstalling is just a matter of dropping the folder in the trash. You may also want to download an IDE, a list of which can be found on the Anaconda website (http://docs.continuum.io/anaconda/ide_integration ;my personal favorite is PyCharm).

As far as versions are concerned, the “go-to” version is Python 2.7. However, 2.7 is no longer being developed so you will eventually have to migrate to Python 3. For a long time many avoided this upgrade because several of the most important packages hadn’t been ported over, but that has since changed and most of the critical packages like Numpy, SciPy, Matplotlib, Pandas, etc… are now available in Python 3. If you choose to go with Python 2, it’s important to note that when dividing you need to use a float in the denominator, as division by an integer returns a rounded down integer. This isn’t the case with Python 3, however.

OVERVIEW

Before we begin coding, I’d like to provide an overview of how the model will be implemented. Instead of using a matrix to track cells, we’ll create a dictionary of cells. The keys to this dictionary will be the cell’s position, while the value will be the cell itself. Each cell we create will know the positions of it’s neighbors, and the cell will use the dictionary of cells to look up its neighbors (a dictionary in Python is really just a hash table, so this look-up is fairly fast). A nice advantage of using a dictionary is that we can easily add and remove entries from it, giving us the ability to iterate only across cells, not empty positions in a matrix. This effectively allows the domain to grow and shrink according to how many cells are present. During each round of this model, we’ll shuffle the values in this dictionary and iterate through them, having each cell execute it’s actions.

One final note. I’ve taken screenshots of the code to make it easier to read, but the raw code can be found in this .doc file: CSC_CA_in_python. It’s not possible to upload .py files, so if you download the code, just change the extension from .doc to .py and then run the CA. Or, just copy and paste into your own file and run it.

GETTING THINGS READY

Let’s start coding by importing the packages we will be using.The first is NumPy, Python’s package for scientific computing (http://www.numpy.org), a package you’ll almost always need. Second, we’ll import the package we’ll use to visualize our results, matplotlib (http://matplotlib.org). Next, we’ll import Numba, a ‘just in time’ compiler than can dramatically speed up sections of code in which numerical calculations are called for (http://numba.pydata.org). Finally, importing the random module will allow us to randomly select cells.

imports

We’ll begin writing up our CA by defining a few methods that we’ll use to construct the world of our CA. First, lets construct that dictionary which will contain the list of positions in a central position’s Moore neighborhood. To do this, we can use both list comprehension and dictionary comprehension. Basically, these approaches allow us to build lists or dictionaries without creating an empty list and messy for loop to populate that list/dictionary. Not only is list comprehension more convenient and cleaner, but it is also faster than the alternative.

builders

SPEEDUP TRICKS

Here we’ll setup a few methods to speed up our CA. We’ll frequently request a random binomial number and shuffle cells, so we’ll call those methods once, here at the top, so they don’t have to be reevaluated each time we call them, something that would slow down our model. Next, we’ll create a few methods to speed up the generation of random binomial numbers (something we’ll do frequently) by creating methods to call binomial using numba’s just in time compiler (jit). To do this, all we have to do is create a wrapper method, and then add the @jit decorator. This small modification provides a dramatic speed-up; without it the CA and plotting takes 16 min, but adding the simple @jit decorator reduces the time it takes down to around 5 minutes! Who says Python is always slow 🙂

speedups

CANCER CELL CLASS

cancer_cell_class

OK, now that everything is set up, we’ll start creating our Cancer Cell class, which defines all of the cancer cell’s attributes and behaviors. First, we define the mandatory __init__ method, which defines the attributes each newly created cancer cell will have. In this case, each new cancer cell will be assigned a position (pos), the number of divisions it has remaining (divisions_remaining), and the list containing the positions in its neighborhood (neighbor_pos_list). This list is assigned simply by using the cell’s position to look up that list in the dictionary_of_neighbor_pos_lists that we created earlier.

Now that we’ve defined our cell’s attributes, we can start defining what exactly our cells will do. Because this is a fairly simple CA, we’ll just define all of the behaviors in one method (sort of…), which we’ll call act. First we’ll write up the bit of code for cell division. Recall that a cancer cell can only divide if that the cell is lucky and has an empty space in its neighborhood. In this case, if divide_q is 1, the cell is lucky and has a chance to divide. If the cell is indeed lucky, it then uses the locate_empty_neighbor_position method to find all of the free spaces in its neighborhood (the second requirement for division). Jumping to that locate_empty_neighbor_position method, the first thing we’ll do is create a list of positions that are NOT in the cell dictionary, that is, spaces which are currently unoccupied. If there are actually empty positions, a randomly selected empty position in the neighborhood is returned. Moving back to the act method, a new Cancer Cell is created, assigned that random position, made to look up its neighbor list in the dictionary_of_neighbor_pos_lists, and then added to the cell dictionary. Finally, the dividing cell decreases the number of divisions that it can undergo by one.

After a cell has tried to divide, it will determine if it needs to die. Recall that a cell will die for one of two reasons: 1) it dies because it has exceed its maximum proliferative potential, that is,  divisions_remaining <= 0; or 2) it dies spontaneously, that is, if die_q = 1 . If either of these conditions are true, the cell and it’s position are deleted from the cell dictionary, thus removing it from the CA and freeing up space.

CANCER STEM CELL CLASS

cancer_stem_cell_class

Now that our Cancer Cell class is created, we can start defining the behaviors of Cancer Stem Cells. We could simply clog up our code by adding a bunch of if/else statements to our Cancer Cell’s act method, but instead we’ll create a Cancer Stem Cell class, which will be a subclass of the Cancer Cell class. Classes in Python have inheritance, which means that our new Cancer Stem Cell class has all of attributes and methods of the Cancer Cell class, and so we don’t need to redefine all attributes or re-write the locate_empty_neighbor_position method. The only attribute we’ll redefine is the PLOT_ID, which we’ll use when visualizing the results of our CA. Next, we’ll redefine the act method. The process of division is the same as before, except that the stem cell can either divide symmetrically or asymmetrically. Which type of division occurs is determined by calling the divide_symmetrically_q method. If divide_symmetrically = 1, a new Cancer Stem Cell is created and added to the cell dictionary; if divide_symmetrically = 0, a normal Cancer Cell is created and added to the dictionary instead. Now we have our Cancer Cells and Cancer Stem Cells setup and ready for action!

RUN THE MODEL

run2_7

The last thing we have to do is setup the initial conditions of our CA, which we’ll do under the if __name__ == “__main__” line. First, we’ll define the constants at the top. Next, we’ll create the initial Cancer Stem Cell, which will be positioned in the middle of world. We do this by initializing this first stem cell, using the position in the center of the world and DICTIONARY_OF_NEIGHBOR_POS_LISTS, and then add it to the cell dictionary. Next, we’ll copy the cells in the cell dictionary to a list (line 189 for Python 2, or line 192 for Python 3). While copying the values to a list isn’t ideal, it does provide two advantages. First, we can shuffle this list, allowing us to move through our cells randomly. Second, this list is what allows us to change the cell dictionary on the fly, as the length of a dictionary cannot be changed while iterating through it. Finally, we loop through our list of randomly ordered cells, having each one conduct their actions by calling their act method, a process that is repeated until maximum number of reps has been met.

VISUALIZE RESULTS

visualization

After we’ve completed our simulation, we would like to visualize our results, so we create a visualization_matrix, which is just matrix of zeros. Because we don’t care about the order of our dictionary, and won’t be modifying it’s length, we just iterate through the cells in our cell dictionary. Each cell then adds it’s PLOT_ID number to the matrix, so that stem cells will be one color, and normal cells another color. Now we use matplotlib’s imshow and show methods to visualize the results:

CSC_CA

CONCLUSION

Using Python has its pros and cons. The pros are that it’s easy to read and write, reducing development time and making it easier to share code. The con is that, in its native form, Python is not the fastest. However, as we’ve seen, there are tools to increase performance. Numba is only one of those tools, but there are others, with the most popular probably being Numba Pro, CUDA, Cython, and PyPy. There are also ways to take advantage of multithreading and mulitprocessing to speed up your models even more. So, if its possible to use these tools, you can have both fast development and execution speed! I hope you found this post helpful, and that maybe you’ll even consider using Python for your next project.

Advertisement

Cancer stem cell driven tumor growth model in C++

Because of the feedback that I’ve received after publishing first three posts, I’ve decided to change a tool of interest from MATLAB to C++. Today, I’ll show how to quickly implement a cancer stem cell driven tumor growth model in C++. It is almost the same model as implemented in my previous post (I’ll explain why it is “almost” the same at the end of this post).

Quick guide to the model: A cell, either cancer stem cell (CSC) or non-stem cancer cell (CC), occupies a single grid point on a two-dimensional square lattice. CSCs have unlimited proliferation potential and at each division they produce either another CSC (symmetric division) or a CC (asymmetric division). CCs are eroding their proliferation potential at each division and die when its value reaches 0. Moreover, at each proliferation attempt, CCs may undergo spontaneous death and then be removed from the system.

First we start with including the necessary headers and setting the namespace. Apart from the standard functions and datatypes, we will use the vector datatype to store the cells and a shuffling function from algorithm library.

#include <vector>
#include <algorithm>
#include <stdlib.h>
#include <time.h>

using namespace std;

Now we can define a cell. It will be defined by three variables: index to the place on the lattice (integer), remaining proliferation capacity (char), and boolean variable defining stemness.

struct cell {//defining cell
    int place;
    char p;
    bool is_stem;
};

Next step is to define the lattice size, the lattice itself, vector that will contain all viable cells, and auxiliary variable defining the cells neighborhood.

static const int N = 2000; //lattice size
bool lattice[N*N] = {false}; //empty lattice
vector<cell> cells; //vector containing all cells present in the system

static const int indcNeigh[] = {-N-1, -N, -N+1, -1, 1, N-1, N, N+1};//neighborhood

The parameters of the model are defined as follows.

char pmax=10; //proliferation capacity
double pDiv=1./24.; //division probability
double alpha=0.05; //spontaneous death probability
double ps=0.05; //probability of symmetric division
double pmig=10./24.; //probability of migration

Having made all of those initial definitions we can finally start coding the main program.
Let us start with writing the function that will initialize the whole simulation, i.e. fill the lattice boundary and put the initial stem cell.

void initialize() {
    for (int i=0; i<N; i++) {lattice[i]=true;};//filling left
    for (int i=0; i<N*N; i=i+N) {lattice[i]=true;};//filling top
    for (int i=N-1; i<N*N; i=i+N) {lattice[i]=true;};//filling bottom
    for (int i=N*(N-1); i<N*N; i++) {lattice[i]=true;};//filling right
    
    lattice[N/2*N+N/2] = true; //initial cell in the middle
    cell initialCell = {N/2*N+N/2,pmax,true};//initial cell definition
    cells.push_back(initialCell);
}

As in the previous post, we set all the boundary values of lattice to true in order to make sure that we will not address any site out of the lattice. Typically one would use an if statement when addressing the lattice site to make sure that index to a site is within the domain. Here, because we have set the boundaries to true without adding those sites to cells vector, we don’t need to do that – boundary will be treated as an occupied site, but not as a viable cell.

The second auxiliary function that we will use returns the index to the randomly selected empty space around a given spot.

int returnEmptyPlace(int indx) {
    int neigh[8], nF = 0;
    for(int j=0;j<8;j++) {//searching through neighborhood
        if (!lattice[indx+indcNeigh[j]]) {//if free spot
            neigh[nF] = indx+indcNeigh[j]; //save the index
            nF++; //increase the number of found free spots
        }
    }
    if(nF) {//selecting free spot at random
        return neigh[rand() % nF];
    } else {//no free spot
        return 0;
    }
}

If there is no free spot the function returns 0.
Finally we can code the part in which all the magic happens – main simulation procedure. It is hard to dissect the whole procedures into parts, so I did my best to explain everything in the comments.

void simulate(int nSteps) {
    
    vector<cell> cellsTmp;
    int newSite;
    cell currCell, newCell;
    
    for (int i=0; i<nSteps; i++) {
       random_shuffle(cells.begin(), cells.end()); //shuffling cells
       while (!cells.empty()) {
           currCell=cells.back(); //pick the cell
           cells.pop_back();
           newSite = returnEmptyPlace(currCell.place);
           
           if (newSite) {//if there is a new spot
               newCell = currCell;
               newCell.place = newSite;
               if ((double)rand()/(double)RAND_MAX < pDiv) {
                   if (currCell.is_stem) {
                       lattice[newSite]=true;
                       cellsTmp.push_back(currCell);
                       if ((double)rand()/(double)RAND_MAX > ps) {//asymmetric division
                           newCell.is_stem = false;
                       }
                       cellsTmp.push_back(newCell);
                   } else if (currCell.p > 0 && (double)rand()/(double)RAND_MAX > alpha) {
                       currCell.p--;
                       newCell.p--;
                       lattice[newSite] = true;
                       cellsTmp.push_back(currCell);
                       cellsTmp.push_back(newCell);
                   } else {
                       lattice[currCell.place] = false;
                   }
               } else if ((double)rand()/(double)RAND_MAX < pmig) {
                   lattice[currCell.place] = false;
                   lattice[newSite] = true;
                   cellsTmp.push_back(newCell);
               } else {//doing nothing
                 cellsTmp.push_back(currCell);
               }
           } else {//no free spot
               cellsTmp.push_back(currCell);
           }
       }
       cells.swap(cellsTmp);
    }
}

Now we wrap everything in the main function, compile and run.

int main() {
    srand(time(NULL)); //initialize random number generator
    initialize(); //initialize CA
    simulate(24*30*6);
    return 0;
}

Everything in about 100 lines of the code.
What about the speed of the code? It took about 40 seconds to simulate the tumor presented below, which is consisted of about 460,000 cells (on 2000×2000 lattice).

Screen Shot 2015-06-06 at 12.24.53 AM

Why is it “almost” the same model as the one presented in previous post? The answer is in details. In the above C++ implementation lattice is updated after every single cell movement/death, i.e. because of the random shuffling a new free spot can be created for a cell that at the beginning of the iteration was quiescent (without a free spot), so it can successfully proliferate in the same iteration. In the MATLAB implementation that kind of behavior was avoided.

Cancer stem cell driven tumor growth model in less than 70 lines of code

Today I’ve decided to show how to efficiently code a cancer stem cell driven tumor growth model in less then 70 lines of code in MATLAB.

Quick guide to the model: A cell, either cancer stem cell (CSC) or non-stem cancer cell (CC), occupies a single grid point on a two-dimensional square lattice. CSCs have unlimited proliferation potential and at each division they produce either another CSC (symmetric division) or a CC (asymmetric division). CCs are eroding their proliferation potential at each division and die when its value reaches 0. Moreover, at each proliferation attempt, CCs may undergo spontaneous death and then be removed from the system.

We start with defining initial settings of a domain and simulation timespan.

N = 1000; %square domain dimension
nSteps = 6*30*24; %number of simulation steps

Next we define the values of all parameters used in the simulation.

pprol = 1/24; %probability of proliferation
pmig = 10/24; %probability of migrating
pdeath = 1/100; %probability of death
pmax = 10; %initial proliferation capacity of CC
ps = 3/10; %probability of symmetric division

Now we can intialize domain and place initial CSC in its center. Our computational domain is represented by N x N Boolean matrix (a true value indicates the lattice point is occupied). An additional integer vector cells is maintained to store indices for all viable cells present in the system (to avoid costly search for cells on the lattice).

L = false(N,N);
L([1:N 1:N:N*N N:N:N*N N*(N-1):N*N]) = true; %boundary
L(N*round(N/2)+round(N/2)) = true;
cells = int32(N*round(N/2)+round(N/2));
cellsIsStem = true;
cellsPmax = uint8(pmax);

To reduce the amount of used memory we utilized int32 and unit8 types as they occupy less memory than double (double is default type in MATLAB). We also set all the boundary values of L to true in order to make sure that we will not address any site out of the lattice. Typically one would use an if statement when addressing the lattice site to make sure that index to a site is within the domain. Here, because we have set the boundaries to true without adding those sites to cells vector, we don’t need to do that – boundary will be treated as an occupied site, but not as a viable cell.

We can now define auxiliary variables that will be utilized further.

aux = int32([-N-1 -N -N+1 -1 1 N-1 N N+1])'; %indices to heighborhood
Pms = perms(uint8(1:8))'; %permutations
nP = size(Pms,2); %number of permutations

Variable aux simply defines cell’s neighborhood and Pms is a variable in which we store all possible permutations of variable aux.

Now we can start the main loop of the program and randomly shuffle cells at its beginning.

for i = 1:nSteps

sh = randperm(length(cells));
cells = cells(sh);
cellsIsStem = cellsIsStem(sh);
cellsPmax = cellsPmax(sh);

Now few lines in which we first create indices to neighborhoods of all of the cells already in random order (variable S), then we flag by 0 all of the spots that are occupied and finally select only indices to those cells that have at least one free spot in the neighborhood (variable indxF).

S = bsxfun(@plus,cells,aux(Pms(:,randi(nP,1,length(cells)))));
S(L(S)) = 0; %setting occupied spots to 0
indxF = find(any(S)); %selecting cells with at least one free spot
nC = length(indxF); %number of cells with free spot

Now we can decide about the faith of the cells that can perform action (are not completely surrounded by other cells).

   
    P = rand(1,nC)<pprol; %proliferation
    Ps = P & rand(1,nC)<ps & cellsIsStem(indxF); %symmetric division
    De = P & (cellsPmax(indxF) == 0); %proliferation capacity exhaution
    D = P & (rand(1,nC)<pdeath) & ~cellsIsStem(indxF); %death at proliferation attempt
    M = ~P & (rand(1,nC)<pmig); %go when no grow

In the additional loop we perform update of the system.

   
    del = D | De; %cells to delete
    act = find((P | M) & ~del); %indices to the cells that will perform action
    for ii = act %only for those that will do anything
        ngh = S(:,indxF(ii)); %cells neighborhood
        ngh(ngh==0) = []; %erasing places that were previously occupied
        indO = find(~L(ngh),1,'first'); %selecting free spot
        if ~isempty(indO) %if there is still a free spot
            L(ngh(indO)) = true; %updating occupancy
            if P(ii) %proliferation
                cells = [cells uint32(ngh(indO))]; %adding new cell
                if Ps(ii) %symmetric division
                   cellsIsStem = [cellsIsStem true];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))];  
                else
                   cellsIsStem = [cellsIsStem false];
                   cellsPmax = [cellsPmax cellsPmax(indxF(ii))-1];
                   if ~cellsIsStem(indxF(ii))
                    cellsPmax(indxF(ii)) = cellsPmax(indxF(ii))-1;
                   end
                end
            else %migration
                L(cells(indxF(ii))) = false; %freeing spot
                cells(indxF(ii)) = uint32(ngh(indO));
            end
        end
    end

At the end we need to erase cells that died.

    
    if ~isempty(del) %updating death
        L(cells(indxF(del))) = false;
        cells(indxF(del)) = [];
        cellsIsStem(indxF(del)) = [];
        cellsPmax(indxF(del)) = [];
    end 
end

This is it. Whole CA is coded and we are ready to run simulations!

In order to visualize simulations results we can implement short function.

    
function visualize( N, cells, cellsIsStem, cellsPmax, pmax )

    M = ones(N,N,3); %matrix for image
    color = hot(3*pmax); %colormap
    %drawing CCs
    M(cells(~cellsIsStem)) = color(cellsPmax(~cellsIsStem)+1,1);
    M(cells(~cellsIsStem)+N*N) = color(cellsPmax(~cellsIsStem)+1,2);
    M(cells(~cellsIsStem)+2*N*N) = color(cellsPmax(~cellsIsStem)+1,3);
    %drawing CSCs, we want to draw them as 3x3 points
    aux = int32([-N-1 -N -N+1 -1 1 N-1 N N+1])';
    CSCs = cells(cellsIsStem);
    plusSurr = bsxfun(@plus,CSCs,aux);
    M(plusSurr) = color(2*pmax,1);
    M(plusSurr+N*N) = color(2*pmax,2);
    M(plusSurr+2*N*N) = color(2*pmax,3);
   
    figure(1)
    imshow(M);
end

Below is the visualization of the tumor simulated using the above code. It is consisted of 369,504 cells in total (8563 CSCs). The whole simulation took about 12 minutes on my laptop, what taking into account the lattice size and number of cells is a reasonable time.

higherps