Some ideas about how to efficently store simulation data

After my last post about visualization of 3D simulated tumors @jc_atlantis made an excellent point that another reason why 3D agent-based simulations are not performed frequently is the large amount of generated data. Of course, one approach to overcome that problem would be to store only information that is really essential and not the 1-1 simulation snapshot. However, in some cases we can’t know in advance what information will be important and thus, we need better ways to save our simulation outputs.

In this post I will show few tricks (in C++) how to efficiently store output of 2D agent-based model presented in one of the previous posts. As a result we will reduce the size of generated simulation output from about 1Gb to reasonable 35Mb. Of course, presented ideas can be utilized in 3D agent-based model and in other programing languages like Python, Java or MATLAB.

First of all let us define the task: we want to store information about spatial distribution of 1,000,000 cells on 2D lattice. Each cell is described by two variables: remaining proliferation potential (p, unsigned char variable) and if it is cancer stem cell (is_stem, boolean variable). All of the cells to be saved are stored in the STL vector. The above assumptions are defined by the following piece of the C++ code:

struct cell {//defining a cell
    unsigned int place; //varible keeping linear index to cell's place on
    unsigned char p; //remaining proliferation potential
    bool is_stem; //is the cell a cancer stem cell?

    cell(unsigned int a=0, unsigned char b=0, bool c=0):place(a), p(b), is_stem(c){}; //constructor
};

static const int N = 2000; //lattice size
vector<cell> cells; //vector containing all cells present in the system

Each cell is occupying  (usigned int = 4 bytes) + (unsigned char = 1 byte) + (bool  = 1 byte) = 6 bytes of memory. Hence, as we want to save 1 mln cells,  our output file shouldn’t exceed 6 megabytes (Mb) of memory.

1. Outputting to text file

Information about cells is typically written to a text file in which each line describes one cell and variables values are separated by special character (standard .csv format). That format allows simulation output to be easily loaded to other programs for further analysis. Moreover, the code generating and loading the output files is very simple:

void save_cells_ASCII(string fileName) {
    ofstream file_op(fileName);

    for(int i=0; i<cells.size();i++)
        file_op << cells.at(i).place << " " << (int)cells.at(i).p << " " << cells.at(i).is_stem <<  endl;          file_op.close(); } void load_cells_ASCII(string fileName) {     ifstream myfile (fileName);     string line;          unsigned int place;     int p;     bool is_stem;          while ( getline (myfile,line) ) {             istringstream iss(line);             iss >> place >> p >> is_stem;
            cells.push_back(cell(place, (unsigned char)p, is_stem));
    }

    myfile.close();
}

Code above generated a file of 11Mb of size – way more to be expected from the cell definition. Why is it so? Explanation is quite straightforward. In the generated text file each character occupies 1 byte of disk space – it is OK for the proliferation capacity (p) and boolean variable (is_stem), as they occupy the same amount of space in the memory. However, the cell position 2234521, which when stored as unsigned int occupies 4 bytes of space in the memory, occupies 7 bytes of space on the disk. In addition, each space between the written value in the text file occupies additional byte of space. All of the above together generates the wasteful 11Mb file on the disk.

2. Binary files

In order to to have a file on disk occupying exactly the same amount of space as the variables in the memory, we need to use binary files. It is a little bit more complicated to operate on binary files, but the idea is simple: we just write each variable byte by byte (1 byte = 8 bits). First, we write functions that 1) prepare array of pointers to char (byte) variables that will be saved; 2) read simulation snapshot from array of pointers to char variables.

void save_cells_binary(char** data, unsigned long* sizeData) {
    unsigned long Ncells = cells.size(); //number of cells
    int sizeOfCell = sizeof(int)+sizeof(char)+sizeof(bool);
    *sizeData = Ncells*sizeOfCell;
    *data = (char*)malloc( *sizeData );

    memcpy( *data, (char*)&Ncells, sizeof(unsigned long));
    memcpy((char*)&Ncells, *data, sizeof(unsigned long));

    for(int i=0; i<Ncells; i++) {
        memcpy(*data+i*sizeOfCell+sizeof(unsigned long), (char*)&cells.at(i).place, sizeof(int));
        memcpy(*data+i*sizeOfCell+sizeof(int)+sizeof(unsigned long), (char*)&cells.at(i).p, sizeof(char));
        memcpy(*data+i*sizeOfCell+sizeof(int)+sizeof(char)+sizeof(unsigned long), (char*)&cells.at(i).is_stem, sizeof(bool));
    }
}

void load_cells_binary(char* data) {
    unsigned long Ncells;
    memcpy((char*)&Ncells, data, sizeof(unsigned long));
    int sizeOfCell = sizeof(int)+sizeof(char)+sizeof(bool);

    unsigned int place;
    unsigned char p;
    bool is_stem;

    for(int i=0; i<Ncells; i++) {         memcpy((char*)&place, data+sizeof(unsigned long) + i*sizeOfCell, sizeof(unsigned long));         memcpy((char*)&p, data+sizeof(unsigned long) + i*sizeOfCell+sizeof(int), sizeof(char));         memcpy((char*)&is_stem, data+sizeof(unsigned long) + i*sizeOfCell+sizeof(int)+sizeof(char), sizeof(bool));                  cells.push_back(cell(place, p, is_stem));     } } 

Now we need only functions that will operate on the binary files and 1) read the char array from the binary file; 2) save char array to binary file.

 void readWholeBinary(string fileName, char** data, unsigned long* sizeData) {     ifstream file_op(fileName, ios::binary | ios::ate);     *sizeData = file_op.tellg();     file_op.seekg(0, ios::beg);     *data = (char*)malloc( *sizeData );     file_op.read(*data, *sizeData);     file_op.close(); } void saveWholeBinary(string fileName, char* data, unsigned long sizeData, unsigned long originalSize) {     ofstream file_op(fileName, ios::binary | ios::out);     if (originalSize>sizeData) //there was compression, we ass original file size to the beginning of the file
        file_op.write((char*)&originalSize, sizeof(unsigned long));
    file_op.write(data, sizeData);
    file_op.close();
}

Executing the above code generates 5.51 Mb file on the disk (we had about 950,000 cells to save). This is about half of the space that is occupied by the text file!

3. Using zip compression

All of us probably used zip compression to store or send files through e-mail. Why won’t we write files generated by our simulation already compressed with zip? There are quite a few C++ zip libraries, but I’ve chosen zlib library as it is quite easy to use (an excellent tutorial can be found here). It operates on the char arrays that we already generate on the way to save binary files, so compressing the file before writing takes only few lines of the code.

void compressFunction (char** dataCompressed, char* dataOriginal, unsigned long* sizeDataCompressed, unsigned long sizeDataOriginal) {
    *sizeDataCompressed  = ((sizeDataOriginal) * 1.1) + 12;
    *dataCompressed = (char*)malloc(*sizeDataCompressed);
    compress((Bytef*)(*dataCompressed),sizeDataCompressed,(Bytef*)dataOriginal,sizeDataOriginal );// size of source data in bytes
}

void uncompressFunction (char** dataUncompressed, char* dataCompressed, unsigned long sizeDataCompressed, unsigned long* sizeDataUncompressed) {
    memcpy((char*)sizeDataUncompressed, dataCompressed, sizeof(unsigned long));
    *dataUncompressed = (char*)malloc( *sizeDataUncompressed );
    uncompress((Bytef*)(*dataUncompressed), sizeDataUncompressed, (Bytef*)(dataCompressed+sizeof(unsigned long)), sizeDataCompressed-sizeof(unsigned long));
}

The binary file after compression takes 3.57 Mb of space on disk – way better.

4. Collapsing the variables

We can easily save a byte of memory per cell if we notice that the proliferation capacity variable in our simulations has the value smaller than 50. Unsigned char, however, can store the value up to 255. Thus, we can add the value 100 to the remaining proliferation capacity if the cell is cancer stem cell and forget about saving boolean variable.

 memcpy(*data+i*sizeOfCell+sizeof(unsigned long), (char*)&cells.at(i).place, sizeof(int));
        val = cells.at(i).p+(unsigned char)cells.at(i).is_stem*100;
        memcpy(*data+i*sizeOfCell+sizeof(int)+sizeof(unsigned long), (char*)&val, sizeof(char));

5. Using information about the space to reduce the file size

The most amount of disk space is used by the information about the cells location on the lattice (unsigned int = 4 bytes). Can we skip writing that information for most of the cells? Yes, we can. In the code below we just write the cells row by row, and store the information about the location of the first cell in the lattice row. If there is an empty space between the cells in the row we put the value 255 and it the remainder of the row is empty we put value 254.

void save_cells_usingSpace(char** data, unsigned long* sizeData) {

    unsigned long Ncells = cells.size(); //number of cells
    int sizeOfCell = sizeof(int)+sizeof(char)+sizeof(bool);
    *sizeData = 0; //we will update that value
    *data = (char*)malloc( Ncells*sizeOfCell ); //we alloc more than we need

    unsigned char lat[N][N] = {0};
    int x, y;
    for (int i = 0; i<cells.size(); i++) {
        x = cells.at(i).place % N;
        y = floor((double)cells.at(i).place/(double)N);
        lat[x][y] = (unsigned char)cells.at(i).is_stem*(unsigned char)100 + (unsigned char)cells.at(i).p + (unsigned char)1;
    }

    memcpy(*data, (char*)&N, sizeof(int));
    *sizeData += sizeof(int);

    //255 means empty space, 254 means end of line
    unsigned char es = 255, el = 254;

    for (unsigned short int i = 0; i<N; i++) {

        int sep = 0;
        bool st = false;

        for (unsigned short int j = 0; j<N; j++) {             if(st == false && lat[i][j]>0) {
                st = true;
                memcpy(*data+*sizeData,(char*)&i, sizeof(unsigned short int));
                memcpy(*data+*sizeData+sizeof(unsigned short int),(char*)&j, sizeof(unsigned short int));
                memcpy(*data+*sizeData+2*sizeof(unsigned short int),(char*)&lat[i][j], sizeof(unsigned char));
                *sizeData += 2*sizeof(unsigned short int)+sizeof(unsigned char);
            } else if (st == true && lat[i][j] == 0 ) { //we have empty space
                sep++;
            } else if (st==true && lat[i][j]>0){
                for (int k = 0; k<sep; k++) {
                     memcpy(*data+*sizeData,(char*)&es, sizeof(unsigned char));
                     *sizeData += sizeof(unsigned char);
                }

                sep = 0;
                memcpy(*data+*sizeData,(char*)&lat[i][j], sizeof(unsigned char));
                *sizeData += sizeof(unsigned char);
            }
        }

        if (st == true) {
            memcpy(*data+*sizeData,(char*)&el, sizeof(unsigned char));
            *sizeData += sizeof(unsigned char);
        }
    }
}

void load_cells_usingSpace(char* data, unsigned long dataSize) {
    unsigned short int x, y;
    unsigned char read;

    int Nm=0;
    memcpy((char*)&Nm, data, sizeof(int));

    unsigned long i = sizeof(int);

    while(i < dataSize) {

    memcpy((char*)&x,data+i, sizeof(unsigned short int));
    memcpy((char*)&y,data+i+sizeof(unsigned short int), sizeof(unsigned short int));
    i += 2*sizeof(unsigned short int);

        unsigned int add = 0;

        while (true) {
            memcpy((char*)&read,data+i, sizeof(unsigned char));
            i+=sizeof(unsigned char);
            if (read == 254) { //end of line
                break;
            } else if (read < 255){//actual cell                 cells.push_back(cell((unsigned int)x + (unsigned int)y*Nm + add*Nm,(read-1) % 100,(read - 1)>=100));
            }
            add++;
        }
    }

}

The above approach when combined with zip compression generated file that takes only 0.47 Mb of disk space. That is less then a byte per cell!

6. Overall comparison

Fig 1. shows the amount of used space by a file written using different approaches presented above. As it can be seen the last approach uses about 25x less disk space than the standard text files approach. When saving simulation snapshots every 10 days the total amount of generated data was reduced from 1 Gb when using standard text file approach to about 35 Mb!
fileSizeFigure 1. Comparison of the amount of used disk space.

What is important binary files approach is way faster that text file approach, even when using zip compression algorithm, see Fig. 2.

speed

Figure 2. File read/write speed for different save/read functions.

Of course, the trade-off is that it won’t be that easy to read our simulation files to other programs. However, we can always write wrapper functions that will allow other programs to interpret our files (see e.g. this post).

Advertisement

Some tips&tricks for 3D tumor visualization

There are several reasons why we rarely see 3D agent based models of tumor growth, with I think the two most important ones being: 1) 3D simulations are way more computationally expensive than 2D simulations; 2) it’s harder to visualize results of 3D simulations to present all necessary information. The other frequently used argument is that there is no need for 3D, because the crucial phenomena can be observed as well in 2D. However, when one considers e.g. stochastic models there might be a huge difference in the system behavior between 2D and 3D case (see very nice post Why is that a 2D random walk is recurrent, while a 3D random walk is transient?).

Possible improvements in the computational speed are highly dependent on the given model setting and typically require sophisticated techniques. Problem with nice visualizations, however, just requires proper tools. Today, I will show my attempts to make a good-looking 3D tumor visualizations, which I have taken over the last several years.

Several years ago when working with @heikman on the project about the impact of cellular senescence  I was using The Persistence of Vision Raytracer (POV-Ray) in order to obtain high quality 3D tumor visualization. After simulating the tumor up to a given cell number I was parsing the information about cells locations and their basic properties into a POV-Ray specific format. In my approach, each cell was separate actor in the scene with possible different surface/texture/color properties. As it can be expected renderings with raytracing software are very good (see Fig.1 below), but the whole procedure to prepare input files to POV-Ray, to set-up properly camera and light was really time-consuming. In addition, rendering takes a long time, so there is no chance for “live” rendering during the simulation, which would allow to observe simulation behavior.

Figure 1. Exemplary 3D tumor renderings obtained using POV-Ray software. Bottom-left picture shows tumor consisted of 9.4 million of cells.

I really wanted to have a tool that would allow me to seamlessly simulate and visualize 3D tumor at the same time. I quickly realized that rendering each cell as a separate object is not a good approach, when you simulate more than 1 million cells. Hence, I started looking around for an algorithm that would allow to extract tumor surface from the cloud of cells. I quickly found well known Marching Cubes algorithm  that does the trick very well.  I’ve implemented the algorithm and visualized resulting surface using Visualization Library in C++. I was quite happy with the results (see Fig. 2 below), however, at some point code became so complicated and so model specific that I assumed that it doesn’t have any future.

Untitled-2

Figure 2. “Live” 3D tumor visualization during the simulation. Code developed in C++ using Visualization Library.

Quite recently I started using VTK library from within C++ (you can also use it form within Python) and I quickly realized that it might be the ultimate tool for 3D visualization. Now I will show how easy it is to produce great visualizations that can be used in “live” renderings during 3D simulations using VTK library.

In the following example I will use glyph mapper that basically takes my cloud of points (cells), creates cubes around them, and extracts the surface. First, let us include all necessary headers and define necessary variables.

#include "vtkCubeSource.h"
#include "vtkPolyDataMapper.h"
#include "vtkRenderer.h"
#include "vtkRenderWindow.h"
#include "vtkRenderWindowInteractor.h"
#include "vtkInteractorStyleTrackballCamera.h"
#include "vtkPolyData.h"
#include "vtkGlyph3DMapper.h"

vtkSmartPointer<vtkRenderer> renderer;
vtkSmartPointer<vtkRenderWindow> renderWindow;
vtkSmartPointer<vtkRenderWindowInteractor> renderWindowInteractor;
vtkSmartPointer<vtkPoints> points;

Now we can write the function that will initialize the renderer and rendering window together with its interactor object.

void initializeRendering() {
    renderer = vtkSmartPointer<vtkRenderer>::New(); //creating new renderer
    renderer->SetBackground(1., 1., 1.); //setting background color to white

    renderWindow = vtkSmartPointer<vtkRenderWindow>::New(); //creating renderer window
    renderWindow->SetSize(1000, 1000); //setting size of the window
    renderWindow->AddRenderer(renderer); //adding previously created renderer

    renderWindowInteractor = vtkSmartPointer<vtkRenderWindowInteractor>::New(); //creating render
    renderWindowInteractor->SetRenderWindow(renderWindow); //adding previously created renderer window

    points = vtkSmartPointer<vtkPoints>::New();         //vector with points
}

I assume that we have our cell objects are stored in STL vector structure and that each cell contains information about its (x,y,z) position in 3D space. Having that we can write a really short function that adds our cells to the scene.

void addCells(vector<Cell>::iterator begin, vector<Cell>::iterator end) {
    vector<Cell>::iterator cell = begin;
    for (cell; cell<end; cell++)
points->InsertNextPoint(cell->x,cell->y,cell->z);
}

Finally, we can write the rendering function that is invoked in the main part of the code.

void visualizeLesion(vector<Cell>::iterator begin, vector<Cell>::iterator end) {

    initializeRendering();

    addCells(begin, end);

    // Combine into a polydata
    vtkSmartPointer<vtkPolyData> polydata = vtkSmartPointer<vtkPolyData>::New();
    polydata->SetPoints(points);

    vtkSmartPointer<vtkCubeSource> cubeSource = vtkSmartPointer<vtkCubeSource>::New();//you can use sphere if you want

    vtkSmartPointer<vtkGlyph3D> glyph3D = vtkSmartPointer<vtkGlyph3D>::New();
    glyph3D->SetSourceConnection(cubeSource->GetOutputPort());
    glyph3D->ScalingOff();
    glyph3D->SetInputData(polydata);
    glyph3D->Update();

    // Create a mapper and actor
    vtkSmartPointer<vtkPolyDataMapper> mapper = vtkSmartPointer<vtkPolyDataMapper>::New();
    mapper->SetInputConnection(glyph3D->GetOutputPort());

    vtkSmartPointer<vtkActor> actor = vtkSmartPointer<vtkActor>::New();
    actor->SetMapper(mapper);

    // Add the actor to the scene
    renderer->AddActor(actor);

    // Render
    renderWindow->Render();

    vtkSmartPointer<vtkInteractorStyleTrackballCamera> style = vtkSmartPointer<vtkInteractorStyleTrackballCamera>::New(); //like paraview

    renderWindowInteractor->SetInteractorStyle( style );

    renderWindowInteractor->Start();
}

Snapshot of resulting rendering window is presented in Fig. 3 – and yes, I agree, it’s not very impressive. However, we can easily tweak it and make it significantly better.

plain3D

Figure 3. Exemplary 3D tumor visualized using 3D glyph mapper in VTK.

First of all to have better a better perspective we can slice the tumor in half and take resulting parts slightly apart. This can be achieved by modification of the addCells function (note that in my setting center of the tumor is at (x,y,z) = (250,250,250)).

void addCells(vector<Cell>::iterator begin, vector<Cell>::iterator end) {
    vector<Cell>::iterator cell = begin;
    for (cell; cell<end; cell++) {         if (cell->z < 250) {             points->InsertNextPoint(cell->x,cell->y,cell->z);
        } else {
            points->InsertNextPoint(cell->x,cell->y,cell->z+50);
        }
    }
}

Fig. 4 shows the snapshot of the rendering window after slicing the tumor – much better, but there is still place for easy improvement. Let’s add some color!

sliced
Figure 4. Exemplary 3D tumor visualized using 3D glyph mapper in VTK.

I assume that each cell has also information about the number of free spots in its direct neighborhood. We will color the tumor based on that information.

First, we need to add additional headers and variables.

#include "vtkCubeSource.h"
#include "vtkPolyDataMapper.h"
#include "vtkRenderer.h"
#include "vtkRenderWindow.h"
#include "vtkRenderWindowInteractor.h"
#include "vtkInteractorStyleTrackballCamera.h"
#include "vtkPolyData.h"
#include "vtkGlyph3DMapper.h"
#include "vtkPointData.h"
#include "vtkFloatArray.h"
#include "vtkLookupTable.h"

vtkSmartPointer<vtkRenderer> renderer;
vtkSmartPointer<vtkRenderWindow> renderWindow;
vtkSmartPointer<vtkRenderWindowInteractor> renderWindowInteractor;

vtkSmartPointer<vtkPoints> points;

vtkSmartPointer<vtkFloatArray> scalar;

Then we need to modify the function that initializes rendering

void initializeRendering() {
    renderer = vtkSmartPointer<vtkRenderer>::New(); //creating new renderer
    renderer->SetBackground(1., 1., 1.); //setting background color to white

    renderWindow = vtkSmartPointer<vtkRenderWindow>::New(); //creating renderer window
    renderWindow->SetSize(1000, 1000); //setting size of the window
    renderWindow->AddRenderer(renderer); //adding previously created renderer

    renderWindowInteractor = vtkSmartPointer<vtkRenderWindowInteractor>::New(); //creating render
    renderWindowInteractor->SetRenderWindow(renderWindow); //adding previously created renderer window

    points = vtkSmartPointer<vtkPoints>::New();         //vector with points
    scalar = vtkSmartPointer<vtkFloatArray>::New(); //vector with scalar values of the points
    scalar->SetNumberOfComponents(1); //one value per point
}

and addCells function

void addCells(vector<Cell>::iterator begin, vector<Cell>::iterator end) {
    vector<Cell>::iterator cell = begin;
    for (cell; cell<end; cell++) {         if (cell->z < 250) {             points->InsertNextPoint(cell->x,cell->y,cell->z);
        } else {
            points->InsertNextPoint(cell->x,cell->y,cell->z+50);
        }

        scalar->InsertNextValue((double)cell->nSpots/26.);
    }
}

Finally, we can modify the visualization function.

void visualizeLesion(vector<Cell>::iterator begin, vector<Cell>::iterator end) {

    initializeRendering();

    addCells(begin, end);

    //creating new color table
    vtkSmartPointer<vtkLookupTable> colorLookupTable = vtkSmartPointer<vtkLookupTable>::New();
    colorLookupTable->SetNumberOfColors(256);
    colorLookupTable->SetHueRange( 0.667, 0.0);
    colorLookupTable->Build();

    // Combine into a polydata
    vtkSmartPointer<vtkPolyData> polydata = vtkSmartPointer<vtkPolyData>::New();
    polydata->SetPoints(points);
    polydata->GetPointData()->SetScalars(scalar);

    vtkSmartPointer<vtkCubeSource> cubeSource = vtkSmartPointer<vtkCubeSource>::New();//you can use sphere if you want

    vtkSmartPointer<vtkGlyph3D> glyph3D = vtkSmartPointer<vtkGlyph3D>::New();
    glyph3D->SetColorModeToColorByScalar();
    glyph3D->SetSourceConnection(cubeSource->GetOutputPort());
    glyph3D->ScalingOff();
    glyph3D->SetInputData(polydata);
    glyph3D->Update();

    // Create a mapper and actor
    vtkSmartPointer<vtkPolyDataMapper> mapper = vtkSmartPointer<vtkPolyDataMapper>::New();
    mapper->SetInputConnection(glyph3D->GetOutputPort());
    mapper->SetLookupTable(colorLookupTable);

    vtkSmartPointer<vtkActor> actor = vtkSmartPointer<vtkActor>::New();
    actor->SetMapper(mapper);

    // Add the actor to the scene
    renderer->AddActor(actor);

    // Render
    renderWindow->Render();

    vtkSmartPointer<vtkInteractorStyleTrackballCamera> style = vtkSmartPointer<vtkInteractorStyleTrackballCamera>::New(); //like paraview

    renderWindowInteractor->SetInteractorStyle( style );

    renderWindowInteractor->Start();
}

Snapshot of sliced and colored tumor is presented in Fig. 5 – fast, good-looking and easy.
slicedColor
Figure 5. Exemplary 3D tumor visualized using 3D glyph mapper in VTK. Color represents the number of free spots in the direct neighborhood of each cell (jet colormap).

Avoiding boundary effects – dynamically expanding lattice for ABM

In a typical setting, agent-based model (ABM) simulations take place on a domain of fixed size, e.g. square lattice of size 500 x 500. Depending on the model, this can introduce so-called boundary effects, i.e. model predictions that are in part caused by the limited amount of space. Some time ago me and @theheikman published a paper in which we investigated the impact of evolution of cancer stem cells on the rate of tumor growth (link to the paper here). Because of the ABM formulation, we had to implement domain that can freely expand on demand. Otherwise, the tumor evolution would stop once the cells fill all the space. Of course, we could set the initial domain size to be so big that the space wouldn’t be filled in the simulated timeframe. However, it is hard to know a priori what domain size will be sufficient. In this post I will show how to implement in C++ an ABM lattice that dynamically expands if one of the cells touches its boundary. We will be working on pointers and memory allocating/freeing routines.

Our lattice will be a boolean array in which value of true will indicate that the spot is occupied by the cell. First, we need to create lattice of initial size NxN.

lattice=new bool *[N];
for (int i=0; i&lt;N; i++) {
lattice[i] = new bool [N] ();
fill_n(lattice[i], N, false);
}

An element of the lattice can be easily accessed using double indexing, i.e. lattice[i][j] = true.

Now we need to have a procedure that will expand our lattice, by fixed amount of rows (gN) from top, bottom and fixed amount of columns (also gN) to right and left.

void expandLattice() {
bool **aux;
aux=new bool *[N+2*gN];

for (int i=0; i&lt;gN; i++) { //adding empty columns to the left
aux[i] = new bool [N+2*gN] ();
fill_n(aux[i], N+2*gN, false);
}

for (int i=N+gN; i&lt;N+2*gN; i++) { //adding empty columns to the right
aux[i] = new bool [N+2*gN] ();
fill_n(aux[i], N+2*gN, false);
}

//copying the interior columns
for (int i=gN; i&lt;N+gN; i++) {
bool *aux2;
aux2 = new bool [N+2*gN] ();
fill_n(aux2, N+2*gN, false); //filling with false values
memcpy(aux2+gN, lattice[i-gN],N*sizeof(bool)); //copying existing values
free(lattice[i-gN]);
aux[i] = aux2;
}

free(lattice);
lattice=aux;
N += 2*gN;
}

Very important part of the above procedure is invoking free() function in order to deallocate the memory previously occupied by the lattice.

And that is about it: if an event of touching the boundary is detected, we just invoke the expandLattice() function. We also need to remember to free the allocated memory at the very end of the simulation, by putting

for (int i=0; i&lt;N; i++)
free(lattice[i]);

free(lattice);

at the end of main() function.

In CAexpandP file you can find the code for the cancer stem cell driven model of tumor growth considered in the previous posts that uses dynamically expanding lattice (change extension from .doc to .cpp). Plot below nicely shows how the domain size grows together with the tumor when using that code with initial N = 100 and gN = 100 (plot shows the average of 20 simulations).

results

 

Java, C++, MATLAB, Julia or Python for cellular automatons? Speed comparison.

Thanks to @spuri4096 and @chandlergatenbee, the basic cancer stem cell driven model of tumor growth that I implemented in the very first Compute Cancer blog is now programmed in 4 languages:

  1. C++ (post with the code here)
  2. MATLAB (post with the code here)
  3. Java (post with the code here)
  4. Python (post with the code here)
  5. Octave – the same code as for MATLAB. Works without any modifications.
  6. Julia (post with the code here)

Isn’t that the perfect opportunity to see which one is the fastest?

First of all, codes are not completely identical. There are small differences between C++ and MATLAB codes in how the death of a cell is updated, but that shouldn’t affect the comparison too much. The code in Java is much more sophisticated – it uses coded lattice concept and dynamically expanding lattice. Python code (as much as I can see…) does pretty much the same job as the C++ code, so that is the fairest comparison out of all.

I will compare the time needed to simulate cancer from a single cancer stem cell to 100, 250 and 500 thousand of cells. The domain size is set to 1000×1000 grid points (not for Java code, in which lattice expands on demand), so 1 million is the maximal cell number. Of course all simulation parameters are also the same for all codes: probability of symmetric division (ps) = 0.3; proliferation probability (pdiv) = 1/24; migration probability = 15/24; and probability of spontaneous death (alpha) = 0.05. Because of the stochastic nature of the model I will run 20 simulations for each code and report the average time (+- standard deviation).

The results are summarized on the plot below.

speed2

As probably everyone expected C++ outcompeted other codes.
Java also did great.
What might surprise some people, MATLAB didn’t perform that bad (it doesn’t suck that bad as people usually think).
The speed of Octave was a surprise for me – I had patience only to run simulations for 100K cell limit…

Of course in each case there is definitely a room for improvement using even more sophisticated language specific programming tricks.

However, from my perspective – if you don’t want to spend too much time on learning programing language, MATLAB (Octave) might be not such a bad idea.

Quick implementation of hexagonal lattice for ABM

In the agent based modeling we are typically more interested in the rules governing the cell fate rather than the basic setting of the lattice (if we don’t look at the off lattice model). However, the particular setting of the computational domain can have an effect on the model dynamics. In 2D we typically consider the following lattices and neighborhoods:

picture1

with the rectangular grid with Moore neighborhood being probably most frequently utilized. With von Neumann neighborhood we have the fewest number of neighbors and cells are spatially saturated earlier. The problem with Moore neighborhood is that the distance to all of the sites is not the same. Hexagonal grid has good properties (same distance, 6 neighbors), but is less frequently utilized, because implementation is more involved. In todays post I will show that essentially there is no difference in implementation between all of those lattices.

We will use the codes for basic ABM model posted before as a template (MATLAB version here and C++ version here). In both cases we considered rectangular lattice with Moore neighborhood. If we set migration probability to zero, set large value of proliferation capacity and set the spontaneous death rate to zero, i.e. we simulate essentially only division events, we will see in visualizations of simulated tumors what kind of neighborhood we assumed:

square

Let us see how the visualization looks like for different types of neighborhoods/lattices.

In both implementations we had a separate array defining the neighborhood of the cell. Thus, in order to modify the code to von Neumann neighborhood we need to change only two consecutive lines in the MATLAB code in which we define the neighborhood and the permutations table:

aux = int32([-N -1 1 N])'; %indices to heighborhood
Pms = perms(uint8(1:4))'; %permutations

In C++ implementation we need to modify the definition of the neighborhood:

static const int indcNeigh[] = {-N, -1, 1, N};//neighborhood

and the for loop in the returnEmptyPlace function:

for(int j=0;j<4;j++) {//searching through neighborhood
        if (!lattice[indx+indcNeigh[j]]) {
            neigh[nF] = indx+indcNeigh[j];
            nF++;
        }
}

As it could be expected, switching to von Neumann neighborhood made the simulated tumor diamond shaped.

diamond

Let us now consider hexagonal lattice. The very first observation that we need to make is that hexagonal lattice is essentially an rectangular lattice with shifted odd (or even) rows and two definitions of neighborhood (one for cells in odd rows, second for cells in even rows), see picture below

concept

Thus, in the modified implementation we again only need to change the definition of the neighborhood and add the condition for choosing the proper one. In MATLAB it can be achieved by introducing two definitions of neighborhoods

auxR = int32([-N -1 1 N-1 N N+1])'; %indices to heighborhood
auxL = int32([-N -1 1 -N-1 N -N+1])'; %indices to heighborhood

Pms = perms(uint8(1:6))'; %permutations

and modify the lines in the main loop in which we create the neighborhood for all viable cells (create variable S)

    odd = mod(cells,2) ~= 0; %selecting cells in the odd rows
    S = zeros(6,length(cells));
    if any(odd) %creating neighborhood for the cells in the odd rows
        SR = bsxfun(@plus,cells(odd),auxR(Pms(:,randi(nP,1,sum(odd)))));
        S(:,odd) = SR;
    end
    even = ~odd;
    if any(even) %creating neighborhood for the cells in the even rows
        SL = bsxfun(@plus,cells(even),auxL(Pms(:,randi(nP,1,sum(even)))));
        S(:,even) = SL;
    end

In C++ the changes are even easier. We again introduce two neighborhoods:

static const int indcNeighR[] = {-N,-1,1,N-1,N,N+1};//neighborhood
static const int indcNeighL[] = {-N,-1,1,-N-1,N,-N+1};//neighborhood

and modify the returnEmptyPlace function

for(int j=0;j<6;j++) {//searching through neighborhood
        if (indx % 2) {//if odd row
            if (!lattice[indx+indcNeighR[j]]) {
                neigh[nF] = indx+indcNeighR[j];
                nF++;
            }
        } else {//if even row
            if (!lattice[indx+indcNeighL[j]]) {
                neigh[nF] = indx+indcNeighL[j];
                nF++;
            }
        }
}

We might also want to change the code for visualization. In the previous settings it was enough to represent each cell as a pixel in the image. In case of hexagonal lattice in order to be correct we need to somehow shift the pixels in odd (or even rows). The easiest way to achieve that is to represent each cell as 4 pixels and adjust position according to the row number:

drawing

The modified visualization code in MATLAB is the following

function visualizeHex( N, cells, cellsIsStem, cellsPmax, pmax )

    %select cells in the odd rows
    odd = mod(cells,2) ~= 0;

    M = ones(2*N,2*N,3); %matrix for image, we expand it
    
    %disperse cell by one spot
    i = mod(double(cells)-1,N)+1; %row, REMEMBER THAT CELLS ARE UINTs!
    j = ceil(double(cells)/N); %column
    cells = (i*2-1)+(2*j-1)*2*N;
   
    %add cells to the top right in odd rows
    cells = [cells reshape(bsxfun(@plus,cells(odd),[-1; 2*N-1; 2*N]),1,[])];
    cellsIsStem = [cellsIsStem reshape(repmat(cellsIsStem(odd),3,1),1,[])];
    cellsPmax = [cellsPmax reshape(repmat(cellsPmax(odd),3,1),1,[])];
    
    %add cells to top left in even rows
    even = [~odd false(1,3*sum(odd))];
    cells = [cells reshape(bsxfun(@plus,cells(even),[-1; -2*N-1; -2*N]),1,[])];
    cellsIsStem = [cellsIsStem reshape(repmat(cellsIsStem(even),3,1),1,[])];
    cellsPmax = [cellsPmax reshape(repmat(cellsPmax(even),3,1),1,[])];
    
    %plotting
    color = hot(3*pmax);
    M(cells(~cellsIsStem)) = color(cellsPmax(~cellsIsStem)+1,1);
    M(cells(~cellsIsStem)+4*N*N) = color(cellsPmax(~cellsIsStem)+1,2);
    M(cells(~cellsIsStem)+8*N*N) = color(cellsPmax(~cellsIsStem)+1,3);

    CSCs = cells(cellsIsStem);
    M(CSCs) = color(2*pmax,1);
    M(CSCs+4*N*N) = color(2*pmax,2);
    M(CSCs+8*N*N) = color(2*pmax,3);

    figure(1)
    clf
    imshow(M);
    
end

Finally, simulating the tumor with exactly the same parameters settings on the hexagonal lattice results in way more circular tumor.

hexagonal

Parameter dependent implementations

When dealing with ordinary or partial differential equations we know that existing numerical solvers have different performance depending on the specific mathematical problem. In particular, the performance of any given solver can dramatically change if we change model parameters. Let us take the well known logistic equation as an example. Let C(t) be the current number of cancer cells, r the growth rate, and K the maximal number of cancer cells that can be sustained by the host. Then the growth dynamics can be described by the following equation:

\frac{dC(t)}{dt}=rC(t)\left(1-\frac{C(t)}{K}\right).

The above equation has of course analytical solution, but let us use a MATLAB ordinary differential equations solver to find a numerical solution. Generally it is a good idea to start with the solver that has the highest order – in our case it will be ode45 based on the Runge-Kutta method. We will also consider another solver, ode15s, which is designed for solving stiff systems. We fix the value of carrying capacity K, initial condition C(0), and time interval in which we solve the equation. We will investigate the performance of both solvers for different values of growth rate r. The plots below show that for lower values of growth rate ode45 outperforms the ode15s, but situation changes when the growth rate is large (while keeping small relative difference between the calculated solutions).

timesdifference

That transition in solvers performance can be explained when one looks how solution changes for different values of growth rates (see figure below) – the larger the growth rate the steeper the solution. Stiff solvers, like ode15s, are designed to effectively deal with such phase transitions.

solution

So, if we need to consider parameter dependent performance of the numerical solvers for differential equations, should we do the same for agent based models of cancer growth? Should we create parameter dependent implementations and switch between them on the fly? I think that yes, and I’ll try to convince you with the simple example below.

Let us consider a very simple CA: cell can either divide or migrate if there is a free spot – nothing else. We can adapt the very basic C++ implementation from the previous post as a baseline: we have a boolean matrix (in which value true indicates that the spot is occupied by the cell) and additional vector of cells keeping information about locations of all viable cells present in the system. At each simulation step we go through all of the cells in a vector in random order and decide about faith of each cell.

Let us tweak the code and at each iteration drop from memory (from vector) cells that are quiescent (completely surrounded). If the cell stays quiescent for prolonged time we save some computational time by skipping its neighborhood search. However, after each migration event we need to add some cells back to the vector – that is additional cost generated be searching the lattice again. The essential part of the tweaked code in C++ is:

   for (int i=0; i<nSteps; i++) {
        random_shuffle(cells.begin(), cells.end()); //shuffling cells
        while (!cells.empty()) {
            currCell=cells.back(); //pick the cell
            cells.pop_back();
            newSite = returnEmptyPlace(currCell);
            if (newSite) {//if there is a free spot in neigh
                if ((double)rand()/(double)RAND_MAX < pDiv) {//cell decides to proliferate
                    lattice[newSite] = true;
                    cellsTmp.push_back(currCell);
                    cellsTmp.push_back(newSite);
                    nC++;
                } else if ((double)rand()/(double)RAND_MAX < pmig) {//cell decides to migrate
                    toDelete.push_back(currCell);
                    lattice[newSite] = true;
                } else {//cell is doing nothing
                    cellsTmp.push_back(currCell);
                }
            } 
        }
        cells.swap(cellsTmp);
        
        //freeing spots after cell migration and adding cells to vector
        for(int j =0; j<toDelete.size(); j++)
            lattice[toDelete.at(j)] = false;
    
        c = !toDelete.empty();
        while(!toDelete.empty()) {
            newSite = toDelete.back();
            toDelete.pop_back();
            //adding cells that can have space after migration event
            for(int j=0;j<8;j++) {//searching through neighborhood
                if (newSite+indcNeigh[j]>=0) {
                    cells.push_back(newSite+indcNeigh[j]);
                }
            }
        }
        
        //removing duplicates in cells vector
        if (c) {
            sort( cells.begin(), cells.end() );
            cells.erase( unique( cells.begin(), cells.end() ), cells.end() );
        }
    }

Depending on the proliferation/migration rates we can imagine that the tweaked code will have different performance. For low  probability of migration event only cells on the tumor boundary will have some space (green cells on the left figure below) and we save a lot of time by dropping quiescent cells (blue cells in figures below). For higher migration rates, however, less than 30% of the cells are quiescent at each proliferation step (compare right figure below) and the benefit of dropping cells can be outweighed by the after migration update steps.
solid diffusive

The plot below shows the computational time needed to reach 500,000 cells when using the baseline code (all cells, blue line) and tweaked one (only proliferation, dashed green line) for a fixed proliferation probability. For lower proliferation rates change in the performance would occur even earlier.

timeCA

I think that this simple example show that we always need to think how our model will behave before choosing the way to implement it.

Quick parallel implementation of local sensitivity analysis procedure for agent-based tumor growth model

In the last couple of posts I’ve shown how to implement agent-based model of cancer stem cell driven tumor growth, both in MATLAB and C++. Having the implementations we can go one step further and perform some analysis of the tumor growth characteristics predicted by the model. We will start with performing local sensitivity analysis, i.e. we will try to answer the question how the perturbation of parameter values impacts the growth dynamics. Typically it is being done by perturbing one of the parameters by a fixed amount and analyzing the response of the model to that change. In our case the response will be the percentage change in average tumor size after 3 months of growth. Sounds fairly simple, but…

We have 5 different parameters in the model: proliferation capacity, probability of division, probability of spontaneous death, probability of symmetric division, and probability of migration. Moreover, let us assume that we would like to investigate 3 different values of parameter perturbation magnitude (5%, 10% and 20%). In order to be able to analyze the change in the average size we need to have its decent estimator, i.e. we need sufficient number of simulated stochastic trajectories – let us assume that 100 simulations is enough to have a good estimation of “true” average. So in order to perform the sensitivity analysis we will need to perform 100 + 3*5*100 = 1600 simulations (remember about the growth for nominal parameters values). Even if single simulation takes typically 30 seconds, then we will wait more than 13 hours to obtain the result using single CPU – that is a lot!

After looking at the above numbers we can make a straightforward decision right now – we will use C++ instead of MATLAB, because the model implementation in C++ is a several times faster. However 1) we will need to write a lot of a code in order to perform sensitivity analysis, 2) using multiple CPU is not that straightforward as in MATLAB. Is there a better way to proceed?

Few weeks ago I’ve shown here how to wrap your C++ in order to use it from within MATLAB as a function without loosing the C++ performance. Why not to use it to make our lives easier by making sensitivity code short and harness easily the power of multiple CPUs?

We will start the coding (in MATLAB) with setting all simulation parameters

nSim = 100; %number of simulations to perform for a given set of parameters
tmax = 30*3; %number of days to simulate
N = 1000; %size of the simulation domain

%nominal parameter values [rhomax, pdiv, alpha, ps, pmig]
nominal = [10, 1/24, 0.05, 0.3,10/24];

perturb = [0.05 0.1 0.2]; %perturbation magnitudes, percent of initial value up

Now we just need to construct a loops that will iterate through all possible perturbations and simulations. If we don’t use Parallel Toolbox, i.e. don’t use multiple CPUs, it doesn’t really matter how we will do that – performance will be similar. Otherwise, implementation is not that straightforward even though from MATLABs documentation it seems that it is enough to change for to parfor. The most important thing is how will we divide the work between the CPUs. The simples idea is to spread the considered perturbations values among the CPUs – that will allow to use 15 CPUs in our setting. However, I’ve got machine with 24 CPUs, so that would be a waste of resources – bad idea. The other idea would be to use parfor loop to spread all 100 simulations for a given perturbation value on all 24 CPUs and go through all perturbation values in a simple loop – now we are using all available CPUs. But are we doing that efficiently? No. The thing is that CPUs need to be synchronized before proceeding to the next iteration of the loop going through perturbation values. So some of the CPUs will be idle while waiting for other ones to finish with parfor loop. In order to make the code even more efficient we will just use one parfor loop and throw all 1600 simulation on 24 CPUs at the same time. Let us first prepare the output variable.

HTCG = zeros(1,nSim + length(nominal)*length(perturb)*nSim);

Before writing the final piece of the code we need to solve one more issue. Namely, in the C++ implementation we used srand(time(NULL)) to initiate the seed for random number generator. It is perfectly fine when we use single CPU – each simulation will take some time and we don’t need to worry about uniqueness of the seed. The problem is when we want to use multiple CPUs – all initial parallel simulations will start with exactly the same seed. One way to solve that is to pass the current loop iteration number (i) to C++ and use srand(time(NULL)+i) – that is what I have done. After solving that issue we can write the final piece of the code.

parfor i = 1:length(HTCG)
    %%PREPARING PARAMETERS VALUES
    params = nominal; %setting parameters to nominal values
    if i>nSim %simulation is for perturbed parameters
        %translating linear index to considered parameter and perturbation value
        j = ceil((i-nSim)/(nSim*length(perturb)));
        k = ceil((mod((i-nSim)-1,nSim*length(perturb))+1)/nSim);
        %updating parameters
        params(j) = params(j)*(1+perturb(k));
        if k == 1 %if proliferation capacity parameter we need to round it
           params(j) = round(params(j)); 
        end
    end

    %%RUNNING SIMULATION AND SAVING OUTPUT
    [~, cells] = CA(params,[tmax*24,N,i]);
    HTCG(i) = length(cells)/3;
    clear mex %important! without that the internal 
              %variables in CA functions won't be cleared and next simulation
              %won't begin with single initial cell
end

Then in the command line we start the parallel pool with 24 workers (CPUs), by typing parpool(24) command, and run the code. Screenshot below shows nicely how all of the 24 CPUs are being used – no resource wasted!

CPUs occupancy

We can then add few additional lines of the code to plot the results.


nom = mean(HTCG(1:100)); %average for nominal parameters value
%calculating averages for perturbed sets values
av = squeeze(mean(reshape(HTCG(101:end),nSim,length(perturb),length(nominal))));

%plotting results of sensitivity analysis
set(0,'DefaultAxesFontSize',18)
figure(1)
clf
bar(perturb*100,(av-nom)/nom*100)
legend({'\rho_{max}', 'p_{div}', '\alpha', 'p_s', 'p_{mig}'})
xlabel('% perturbation')
ylabel('% change')

And “voila!” – the resulting figure shows that the perturbation in the proliferation capacity has the highest impact on the tumor growth dynamics.

sensResults

Wrapping C++ code of agent-based tumor growth model into MATLAB

Last week I posted a C++ implementation of basic cancer stem cell driven tumor growth model. About 100 lines of a code allowed to perform simulation, but there was nothing about exporting the results, doing more complicated analysis, visualization, or performing data fitting. Of course each of those tasks can be performed by writing another large piece of the code.

Two weeks ago I posted a MATLAB code for the very similar model, which was quite fast but a lot slower than C++ implementation. On the other hand it was straightforward to visualize the tumor and it’s not much of a hassle to perform more complicated analysis or data fitting using variety of MATLABs built-in functions.

Each of those two approaches has their own advantages: C++ is fast, MATLAB has a lot of built-in in tools.

Today I will show how to take the best out of both approaches, i.e. I’ll show how to easily wrap our C++ code into MATLAB. The whole C++ based simulation will be invoked from MATLAB as a function and results will be visible straight from it.

First we need to add necessary libraries.

#include "mex.h"
#include <matrix.h>

Then we need to modify our original definition of global variables (see previous post) so they can be adjusted based on the input parameters (for example lattice has to be dynamically allocated).

int N;
bool* lattice;
vector<cell> cells;

char pmax;
double pDiv, alpha, ps, pmig; 
int* indcNeigh;

Then the only thing that we need to do is to delete old main() function and use mexFunction() instead. In the mexFunction() we need to read first the parameters that will be passed from MATLAB (input parameters to the function). Then we need to assign the memory for the lattice and set other auxiliary variables. Finally, after performing the simulation we need to write variables to the output, so the MATLAB can see the results.

void mexFunction(
		 int          nlhs, //number of output arguments
		 mxArray      *plhs[], //output arrays
		 int          nrhs, //number of input arguments
		 const mxArray *prhs[] //input arrays
		 )
{
    /*READING INPUT PARAMETERS*/
    double *par=mxGetPr(prhs[0]); //model parameters, first input variable
    double *settings=mxGetPr(prhs[1]); //simulation settings, second input variable

    int Nsteps = settings[0]; //number of simulation steps to perform
    N=(int)settings[1]; //size of the lattice
    pmax = par[0]; //divisions before proliferative capacity exhaustion 
    pDiv = par[1]; //division probability 
    alpha = par[2]; //spontaneous death probability 
    ps = par[3]; //probability of symmetric division 
    pmig = par[4]; //probability of migration
    
    /*CREATING LATTICE AND AUXILARY VARIABLES*/
    //creating lattice with a given size
    lattice = new bool [N*N];
    fill_n(lattice, N*N, false);
    
    indcNeigh = new int [8];//cell neighborhood
    //assigning values
    indcNeigh[0] = -N-1; indcNeigh[1] = -N; indcNeigh[2] = -N+1;
    indcNeigh[3] = -1; indcNeigh[4] = 1; indcNeigh[5] = N-1;
    indcNeigh[6] = N; indcNeigh[7] = N+1;

    /*OLD PIECE OF MAIN() FUNCTION*/	
    srand(time(NULL)); //initialize random number generator
    initialize(); //initialize CA
    simulate(Nsteps); //simulate CA
    
    /*WRITING TO OUTPUT VARIABLES*/
    //1) writing the lattice as a logical matrix
	plhs[0] = mxCreateLogicalMatrix(N,N); //lattice
    mxLogical *ptr = mxGetLogicals(plhs[0]);
    for(int i=0; i<N*N; i++) 
        ptr[i] = lattice[i];

    //2) writing the cells vector as a vector
    plhs[1] = mxCreateDoubleMatrix(1,cells.size()*3,mxREAL);
    double* ptr2 = mxGetPr(plhs[1]);
    for(int i=0; i<cells.size(); i++) {
        ptr2[3*i] = cells.at(i).place;
        ptr2[3*i+1] = cells.at(i).p;
        ptr2[3*i+2] = cells.at(i).is_stem;
    }
}

After making all of the changes we can compile our .cpp file using the MATLAB command “mex CA.cpp”. Remember all other pieces of C++ associated with model itself remain the same – there is no need to change anything.

From now on we can perform our simulation in MATLAB environment by invoking the following command

[lattice, cells] = CA([pmax, pdiv, alpha, ps,pmig],[nSteps, N]);

and further use of all MATLABs benefits, especially data fitting features. At the same time all of the simulations will be performed at C++ speed! Imagine using Parallel Toolbox in MATLAB to harness multiple CPUs with only few lines of the code.

Setting complex domains for agent based models using bitmaps

In the previous posts (CA in MATLAB and C++) I’ve shown how to implement cancer stem sell driven tumor growth model. In both codes I have set the boundaries of the lattice to true without adding those sites to additional cells vector, so the boundary was treated as an occupied site, but not as a viable cell. This way of coding the system makes it easy to implement more complex domains. Today, I’ll show how to read the complex domains from bitmap (.bmp) file and use them for simulations using previously posted codes. The basic idea is that the image will be loaded into the program and the pixels with values “close” to white will be treated as free sites.

First we need to prepare the image. The posted codes will assume that the image is a square (image=width). We can draw anything we want, remembering that white pixels will be interpreted as free sites. We will write C++ code that will be able to read properly 24 bits bitmap images without the embedded information about the color space. In order to export image in that format we can use GIMP program with the following settings during the export.

Screen Shot 2015-06-09 at 11.20.46 PM

I’ve prepared two exemplary images that I will use further in the simulations. The first one is adapted from the paper by Enderling et al. and the second is a generated text using PowerPoint.
tissue text

Firt, the basic C++ code. We could use additional libraries to load images, but we want to make code as platform independent as possible. We assume that the array lattice and its size are already defined as the global variables (see previous code).

#include <math.h>  

void domainFromImage(char* fileName) {

    FILE* f = fopen(fileName, "rb"); //open to read as binary
    unsigned char head[54];
    fread(head, sizeof(unsigned char), 54, f); // read the header
    
    //in BMP format the size of each row is rounded up to multiple of 4 bytes
    int Nb = ceil(3.*(double)N/4.)*4; //true size of the row in bytes, including padding

    unsigned char* img = new unsigned char[Nb]; // allocate one row of pixels
    
    for(int i = 0; i < N; i++) { //for each row
        fread(img, sizeof(unsigned char), Nb, f);//read one row
        for (int j=0; j<N; j++)
            //set to free only those that have high intensity close to white (>240 in all three channels)
            lattice[i*N+j] = !(img[3*j]>250 && img[3*j+1]>250 && img[3*j+2]>250);
    }
    fclose(f);//close file
    
    //filling boundary
    for (int i=0; i<N; i++) {lattice[i]=true;};//left
    for (int i=0; i<N*N; i+=N) {lattice[i]=true;};//top
    for (int i=N-1; i<N*N; i+=N) {lattice[i]=true;};//bottom
    for (int i=N*(N-1); i<N*N; i++) {lattice[i]=true;};//right
}

The same task in MATLAB is way easier to implement…

function [L, N] = domainFromImage( filename )
    L = imread(filename); %reading image
    L = ~(L(:,:,1)>250 & L(:,:,2)>250 & L(:,:,3)>250); %setting those values...
    N = size(L,1);%size of the domain (assuming square)
    
    %filling border
    L([1:N 1:N:N*N N:N:N*N N*(N-1):N*N]) = true; %boundary
end

What is most important MATLAB code is not so much dependent on the format of the image – it can be easily modified to other image types (imread function is very flexible).

Below is the exemplary result of simulation using image representing the tissue.

tissueTumor

Tumor growth can be also simulated for the domain generated using PowerPoint text.

ht4

Cancer stem cell driven tumor growth model in C++

Because of the feedback that I’ve received after publishing first three posts, I’ve decided to change a tool of interest from MATLAB to C++. Today, I’ll show how to quickly implement a cancer stem cell driven tumor growth model in C++. It is almost the same model as implemented in my previous post (I’ll explain why it is “almost” the same at the end of this post).

Quick guide to the model: A cell, either cancer stem cell (CSC) or non-stem cancer cell (CC), occupies a single grid point on a two-dimensional square lattice. CSCs have unlimited proliferation potential and at each division they produce either another CSC (symmetric division) or a CC (asymmetric division). CCs are eroding their proliferation potential at each division and die when its value reaches 0. Moreover, at each proliferation attempt, CCs may undergo spontaneous death and then be removed from the system.

First we start with including the necessary headers and setting the namespace. Apart from the standard functions and datatypes, we will use the vector datatype to store the cells and a shuffling function from algorithm library.

#include <vector>
#include <algorithm>
#include <stdlib.h>
#include <time.h>

using namespace std;

Now we can define a cell. It will be defined by three variables: index to the place on the lattice (integer), remaining proliferation capacity (char), and boolean variable defining stemness.

struct cell {//defining cell
    int place;
    char p;
    bool is_stem;
};

Next step is to define the lattice size, the lattice itself, vector that will contain all viable cells, and auxiliary variable defining the cells neighborhood.

static const int N = 2000; //lattice size
bool lattice[N*N] = {false}; //empty lattice
vector<cell> cells; //vector containing all cells present in the system

static const int indcNeigh[] = {-N-1, -N, -N+1, -1, 1, N-1, N, N+1};//neighborhood

The parameters of the model are defined as follows.

char pmax=10; //proliferation capacity
double pDiv=1./24.; //division probability
double alpha=0.05; //spontaneous death probability
double ps=0.05; //probability of symmetric division
double pmig=10./24.; //probability of migration

Having made all of those initial definitions we can finally start coding the main program.
Let us start with writing the function that will initialize the whole simulation, i.e. fill the lattice boundary and put the initial stem cell.

void initialize() {
    for (int i=0; i<N; i++) {lattice[i]=true;};//filling left
    for (int i=0; i<N*N; i=i+N) {lattice[i]=true;};//filling top
    for (int i=N-1; i<N*N; i=i+N) {lattice[i]=true;};//filling bottom
    for (int i=N*(N-1); i<N*N; i++) {lattice[i]=true;};//filling right
    
    lattice[N/2*N+N/2] = true; //initial cell in the middle
    cell initialCell = {N/2*N+N/2,pmax,true};//initial cell definition
    cells.push_back(initialCell);
}

As in the previous post, we set all the boundary values of lattice to true in order to make sure that we will not address any site out of the lattice. Typically one would use an if statement when addressing the lattice site to make sure that index to a site is within the domain. Here, because we have set the boundaries to true without adding those sites to cells vector, we don’t need to do that – boundary will be treated as an occupied site, but not as a viable cell.

The second auxiliary function that we will use returns the index to the randomly selected empty space around a given spot.

int returnEmptyPlace(int indx) {
    int neigh[8], nF = 0;
    for(int j=0;j<8;j++) {//searching through neighborhood
        if (!lattice[indx+indcNeigh[j]]) {//if free spot
            neigh[nF] = indx+indcNeigh[j]; //save the index
            nF++; //increase the number of found free spots
        }
    }
    if(nF) {//selecting free spot at random
        return neigh[rand() % nF];
    } else {//no free spot
        return 0;
    }
}

If there is no free spot the function returns 0.
Finally we can code the part in which all the magic happens – main simulation procedure. It is hard to dissect the whole procedures into parts, so I did my best to explain everything in the comments.

void simulate(int nSteps) {
    
    vector<cell> cellsTmp;
    int newSite;
    cell currCell, newCell;
    
    for (int i=0; i<nSteps; i++) {
       random_shuffle(cells.begin(), cells.end()); //shuffling cells
       while (!cells.empty()) {
           currCell=cells.back(); //pick the cell
           cells.pop_back();
           newSite = returnEmptyPlace(currCell.place);
           
           if (newSite) {//if there is a new spot
               newCell = currCell;
               newCell.place = newSite;
               if ((double)rand()/(double)RAND_MAX < pDiv) {
                   if (currCell.is_stem) {
                       lattice[newSite]=true;
                       cellsTmp.push_back(currCell);
                       if ((double)rand()/(double)RAND_MAX > ps) {//asymmetric division
                           newCell.is_stem = false;
                       }
                       cellsTmp.push_back(newCell);
                   } else if (currCell.p > 0 && (double)rand()/(double)RAND_MAX > alpha) {
                       currCell.p--;
                       newCell.p--;
                       lattice[newSite] = true;
                       cellsTmp.push_back(currCell);
                       cellsTmp.push_back(newCell);
                   } else {
                       lattice[currCell.place] = false;
                   }
               } else if ((double)rand()/(double)RAND_MAX < pmig) {
                   lattice[currCell.place] = false;
                   lattice[newSite] = true;
                   cellsTmp.push_back(newCell);
               } else {//doing nothing
                 cellsTmp.push_back(currCell);
               }
           } else {//no free spot
               cellsTmp.push_back(currCell);
           }
       }
       cells.swap(cellsTmp);
    }
}

Now we wrap everything in the main function, compile and run.

int main() {
    srand(time(NULL)); //initialize random number generator
    initialize(); //initialize CA
    simulate(24*30*6);
    return 0;
}

Everything in about 100 lines of the code.
What about the speed of the code? It took about 40 seconds to simulate the tumor presented below, which is consisted of about 460,000 cells (on 2000×2000 lattice).

Screen Shot 2015-06-06 at 12.24.53 AM

Why is it “almost” the same model as the one presented in previous post? The answer is in details. In the above C++ implementation lattice is updated after every single cell movement/death, i.e. because of the random shuffling a new free spot can be created for a cell that at the beginning of the iteration was quiescent (without a free spot), so it can successfully proliferate in the same iteration. In the MATLAB implementation that kind of behavior was avoided.