Java, C++, MATLAB, Julia or Python for cellular automatons? Speed comparison.

Thanks to @spuri4096 and @chandlergatenbee, the basic cancer stem cell driven model of tumor growth that I implemented in the very first Compute Cancer blog is now programmed in 4 languages:

1. C++ (post with the code here)
2. MATLAB (post with the code here)
3. Java (post with the code here)
4. Python (post with the code here)
5. Octave – the same code as for MATLAB. Works without any modifications.
6. Julia (post with the code here)

Isn’t that the perfect opportunity to see which one is the fastest?

First of all, codes are not completely identical. There are small differences between C++ and MATLAB codes in how the death of a cell is updated, but that shouldn’t affect the comparison too much. The code in Java is much more sophisticated – it uses coded lattice concept and dynamically expanding lattice. Python code (as much as I can see…) does pretty much the same job as the C++ code, so that is the fairest comparison out of all.

I will compare the time needed to simulate cancer from a single cancer stem cell to 100, 250 and 500 thousand of cells. The domain size is set to 1000×1000 grid points (not for Java code, in which lattice expands on demand), so 1 million is the maximal cell number. Of course all simulation parameters are also the same for all codes: probability of symmetric division (ps) = 0.3; proliferation probability (pdiv) = 1/24; migration probability = 15/24; and probability of spontaneous death (alpha) = 0.05. Because of the stochastic nature of the model I will run 20 simulations for each code and report the average time (+- standard deviation).

The results are summarized on the plot below.

As probably everyone expected C++ outcompeted other codes.
Java also did great.
What might surprise some people, MATLAB didn’t perform that bad (it doesn’t suck that bad as people usually think).
The speed of Octave was a surprise for me – I had patience only to run simulations for 100K cell limit…

Of course in each case there is definitely a room for improvement using even more sophisticated language specific programming tricks.

However, from my perspective – if you don’t want to spend too much time on learning programing language, MATLAB (Octave) might be not such a bad idea.

Cancer Stem Cell CA in Python

Python doesn’t seem to be the first programming language people go to when developing cellular automata models. However, given that Python is an object-orientated language that is easy to read and write, it might actually be ideal for such models, especially if you prefer to think from the perspective of the agent (if you’d rather model using matrices you can do that too by using Python’s NumPy package). As we’ll see, it can also be pretty fast. Below I’ll cover one way to implement the Cancer Stem cell CA described in the Cancer Stem Cell CA in Matlab post.

GETTING PYTHON

If you don’t have Python, the easiest way to get it and nearly all of its scientific packages is to download the Anaconda distribution from https://store.continuum.io/cshop/anaconda/. This installation and all modules will be kept in it’s own folder, so it won’t interfere with the Python that comes bundled in your OS. It also means that uninstalling is just a matter of dropping the folder in the trash. You may also want to download an IDE, a list of which can be found on the Anaconda website (http://docs.continuum.io/anaconda/ide_integration ;my personal favorite is PyCharm).

As far as versions are concerned, the “go-to” version is Python 2.7. However, 2.7 is no longer being developed so you will eventually have to migrate to Python 3. For a long time many avoided this upgrade because several of the most important packages hadn’t been ported over, but that has since changed and most of the critical packages like Numpy, SciPy, Matplotlib, Pandas, etc… are now available in Python 3. If you choose to go with Python 2, it’s important to note that when dividing you need to use a float in the denominator, as division by an integer returns a rounded down integer. This isn’t the case with Python 3, however.

OVERVIEW

Before we begin coding, I’d like to provide an overview of how the model will be implemented. Instead of using a matrix to track cells, we’ll create a dictionary of cells. The keys to this dictionary will be the cell’s position, while the value will be the cell itself. Each cell we create will know the positions of it’s neighbors, and the cell will use the dictionary of cells to look up its neighbors (a dictionary in Python is really just a hash table, so this look-up is fairly fast). A nice advantage of using a dictionary is that we can easily add and remove entries from it, giving us the ability to iterate only across cells, not empty positions in a matrix. This effectively allows the domain to grow and shrink according to how many cells are present. During each round of this model, we’ll shuffle the values in this dictionary and iterate through them, having each cell execute it’s actions.

One final note. I’ve taken screenshots of the code to make it easier to read, but the raw code can be found in this .doc file: CSC_CA_in_python. It’s not possible to upload .py files, so if you download the code, just change the extension from .doc to .py and then run the CA. Or, just copy and paste into your own file and run it.

Let’s start coding by importing the packages we will be using.The first is NumPy, Python’s package for scientific computing (http://www.numpy.org), a package you’ll almost always need. Second, we’ll import the package we’ll use to visualize our results, matplotlib (http://matplotlib.org). Next, we’ll import Numba, a ‘just in time’ compiler than can dramatically speed up sections of code in which numerical calculations are called for (http://numba.pydata.org). Finally, importing the random module will allow us to randomly select cells.

We’ll begin writing up our CA by defining a few methods that we’ll use to construct the world of our CA. First, lets construct that dictionary which will contain the list of positions in a central position’s Moore neighborhood. To do this, we can use both list comprehension and dictionary comprehension. Basically, these approaches allow us to build lists or dictionaries without creating an empty list and messy for loop to populate that list/dictionary. Not only is list comprehension more convenient and cleaner, but it is also faster than the alternative.

SPEEDUP TRICKS

Here we’ll setup a few methods to speed up our CA. We’ll frequently request a random binomial number and shuffle cells, so we’ll call those methods once, here at the top, so they don’t have to be reevaluated each time we call them, something that would slow down our model. Next, we’ll create a few methods to speed up the generation of random binomial numbers (something we’ll do frequently) by creating methods to call binomial using numba’s just in time compiler (jit). To do this, all we have to do is create a wrapper method, and then add the @jit decorator. This small modification provides a dramatic speed-up; without it the CA and plotting takes 16 min, but adding the simple @jit decorator reduces the time it takes down to around 5 minutes! Who says Python is always slow 🙂

CANCER CELL CLASS

OK, now that everything is set up, we’ll start creating our Cancer Cell class, which defines all of the cancer cell’s attributes and behaviors. First, we define the mandatory __init__ method, which defines the attributes each newly created cancer cell will have. In this case, each new cancer cell will be assigned a position (pos), the number of divisions it has remaining (divisions_remaining), and the list containing the positions in its neighborhood (neighbor_pos_list). This list is assigned simply by using the cell’s position to look up that list in the dictionary_of_neighbor_pos_lists that we created earlier.

Now that we’ve defined our cell’s attributes, we can start defining what exactly our cells will do. Because this is a fairly simple CA, we’ll just define all of the behaviors in one method (sort of…), which we’ll call act. First we’ll write up the bit of code for cell division. Recall that a cancer cell can only divide if that the cell is lucky and has an empty space in its neighborhood. In this case, if divide_q is 1, the cell is lucky and has a chance to divide. If the cell is indeed lucky, it then uses the locate_empty_neighbor_position method to find all of the free spaces in its neighborhood (the second requirement for division). Jumping to that locate_empty_neighbor_position method, the first thing we’ll do is create a list of positions that are NOT in the cell dictionary, that is, spaces which are currently unoccupied. If there are actually empty positions, a randomly selected empty position in the neighborhood is returned. Moving back to the act method, a new Cancer Cell is created, assigned that random position, made to look up its neighbor list in the dictionary_of_neighbor_pos_lists, and then added to the cell dictionary. Finally, the dividing cell decreases the number of divisions that it can undergo by one.

After a cell has tried to divide, it will determine if it needs to die. Recall that a cell will die for one of two reasons: 1) it dies because it has exceed its maximum proliferative potential, that is,  divisions_remaining <= 0; or 2) it dies spontaneously, that is, if die_q = 1 . If either of these conditions are true, the cell and it’s position are deleted from the cell dictionary, thus removing it from the CA and freeing up space.

CANCER STEM CELL CLASS

Now that our Cancer Cell class is created, we can start defining the behaviors of Cancer Stem Cells. We could simply clog up our code by adding a bunch of if/else statements to our Cancer Cell’s act method, but instead we’ll create a Cancer Stem Cell class, which will be a subclass of the Cancer Cell class. Classes in Python have inheritance, which means that our new Cancer Stem Cell class has all of attributes and methods of the Cancer Cell class, and so we don’t need to redefine all attributes or re-write the locate_empty_neighbor_position method. The only attribute we’ll redefine is the PLOT_ID, which we’ll use when visualizing the results of our CA. Next, we’ll redefine the act method. The process of division is the same as before, except that the stem cell can either divide symmetrically or asymmetrically. Which type of division occurs is determined by calling the divide_symmetrically_q method. If divide_symmetrically = 1, a new Cancer Stem Cell is created and added to the cell dictionary; if divide_symmetrically = 0, a normal Cancer Cell is created and added to the dictionary instead. Now we have our Cancer Cells and Cancer Stem Cells setup and ready for action!

RUN THE MODEL

The last thing we have to do is setup the initial conditions of our CA, which we’ll do under the if __name__ == “__main__” line. First, we’ll define the constants at the top. Next, we’ll create the initial Cancer Stem Cell, which will be positioned in the middle of world. We do this by initializing this first stem cell, using the position in the center of the world and DICTIONARY_OF_NEIGHBOR_POS_LISTS, and then add it to the cell dictionary. Next, we’ll copy the cells in the cell dictionary to a list (line 189 for Python 2, or line 192 for Python 3). While copying the values to a list isn’t ideal, it does provide two advantages. First, we can shuffle this list, allowing us to move through our cells randomly. Second, this list is what allows us to change the cell dictionary on the fly, as the length of a dictionary cannot be changed while iterating through it. Finally, we loop through our list of randomly ordered cells, having each one conduct their actions by calling their act method, a process that is repeated until maximum number of reps has been met.

VISUALIZE RESULTS

After we’ve completed our simulation, we would like to visualize our results, so we create a visualization_matrix, which is just matrix of zeros. Because we don’t care about the order of our dictionary, and won’t be modifying it’s length, we just iterate through the cells in our cell dictionary. Each cell then adds it’s PLOT_ID number to the matrix, so that stem cells will be one color, and normal cells another color. Now we use matplotlib’s imshow and show methods to visualize the results:

CONCLUSION

Using Python has its pros and cons. The pros are that it’s easy to read and write, reducing development time and making it easier to share code. The con is that, in its native form, Python is not the fastest. However, as we’ve seen, there are tools to increase performance. Numba is only one of those tools, but there are others, with the most popular probably being Numba Pro, CUDA, Cython, and PyPy. There are also ways to take advantage of multithreading and mulitprocessing to speed up your models even more. So, if its possible to use these tools, you can have both fast development and execution speed! I hope you found this post helpful, and that maybe you’ll even consider using Python for your next project.