How to make Deformetrica run faster


Deformetrica is based on methods that have a reputation for being slow. Well, considering all the things Deformetrica can do at the same time (especially in atlas construction), this reputation is probably not deserved…

Anyway, slow run times may also be due to the way you use Deformetrica. Below is a list of things you may want to consider to speed Deformetrica up:

  • Kernel-type: cudaexact would be the natural choice, but the best kernel evaluation method depends on many factors (number of points, spatial extent of the data, computer architecture, etc.). That is why we advise experimenting with this parameter on a sample of your data to determine which kernel type is best suited to your simulations.
  • Multi-threading: the atlas construction method is multi-threaded, and each subject can potentially be processed in a different thread. So don't forget to set the number of threads in paramDiffeos.xml to either the number of subjects or the maximum number of threads available on your machine, whichever is smaller. If you deal with a very large number of subjects, consider running Deformetrica on a cluster. Note that multi-threading does not work for registration.
  • Kernel-width: more often than not, one wants to try very small kernel sizes in the hope of increasing matching accuracy. However, very small values rarely pay off; see for yourself by experimenting with the toy example in Tutorial #2. In general, we suggest starting with large kernel sizes (for both the deformation and the currents/varifolds). You then get quick results and can decrease the kernel sizes carefully from there.
  • Mesh resolution: keep in mind that a finer mesh does not improve matching accuracy! The mesh sampling should be adapted to the kernel size of the currents/varifolds metric. The best strategy is to use coarse meshes for registration (especially the source mesh), and then apply the resulting deformation to a fine mesh. This is much faster, and more often than not as good as what you would have obtained by computing the registration on the fine mesh.
  • Number of time points: this is the number of steps used to discretize the point trajectories along the deformation. The default value is 10 (a time step of 0.1). Decreasing it to 5 roughly halves the execution time, which makes it a good value for quick tests.
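Two of the tips above (start with large kernel widths and shrink them; adapt the mesh sampling to the currents/varifolds kernel width) can be checked with a few lines of Python before launching a run. The sketch below is not part of Deformetrica: `kernel_width_schedule` and `mean_edge_length` are hypothetical helpers, the halving factor is an arbitrary assumption, and the mesh is assumed to be available as plain lists of vertex coordinates and triangle index triples.

```python
import math

def kernel_width_schedule(start, stop, factor=0.5):
    """Coarse-to-fine kernel widths: begin large, shrink geometrically,
    and stop once the next width would fall below `stop`."""
    if not (0 < stop <= start) or not (0 < factor < 1):
        raise ValueError("need 0 < stop <= start and 0 < factor < 1")
    widths = [start]
    while widths[-1] * factor >= stop:
        widths.append(widths[-1] * factor)
    return widths

def mean_edge_length(vertices, triangles):
    """Average length of the unique edges of a triangle mesh given as
    a list of (x, y, z) vertices and a list of vertex-index triples."""
    edges = set()
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))  # store each shared edge once
    if not edges:
        raise ValueError("mesh has no triangles")
    return sum(math.dist(vertices[i], vertices[j]) for i, j in edges) / len(edges)

# Hypothetical widths in the units of the data (e.g. mm).
print(kernel_width_schedule(40.0, 5.0))  # [40.0, 20.0, 10.0, 5.0]

# Toy mesh: a unit square split into two triangles.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(round(mean_edge_length(square, [(0, 1, 2), (0, 2, 3)]), 3))  # 1.083
```

Each width in the schedule would be tried in turn, inspecting the results before going smaller. If the mean edge length of the source mesh is far below the currents/varifolds kernel width, the mesh is probably finer than the metric can exploit and is a candidate for decimation; treat this as a rule of thumb we assume here, not a threshold defined by Deformetrica.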


The main limiting factors of the code are:

  • the number of control points: one control point is placed every deformation kernel width, so the more spread out the data and the smaller the kernel width, the more control points.
  • the size of the FFT grids when the p3m option is used: the smaller the kernel widths, the more grid points and the slower the algorithm.
  • the number of vertices in the source/template mesh.
  • the number of time points for the discretization of the non-linear deformation.
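To see how these factors interact, here is a back-of-the-envelope sketch in Python. It assumes control points on a regular grid spaced one deformation kernel width apart over the bounding box of the data, and exact (direct-summation) kernel evaluation, where each time step costs on the order of n_cp² kernel terms to move the control points plus n_vertices × n_cp terms to move the template vertices. The function names and the example data set are hypothetical, and the model ignores the currents/varifolds data term, the gradient computation, and the FFT-based p3m kernels.

```python
import math

def control_point_count(extent, kernel_width):
    """Control points on a regular grid with one point every `kernel_width`
    along each axis of the data's bounding box (extent = side lengths)."""
    return math.prod(math.floor(e / kernel_width) + 1 for e in extent)

def relative_cost(n_control_points, n_vertices, n_time_points):
    """Rough kernel-evaluation count for one pass along the trajectory with
    direct summation: at each time step, control points interact pairwise
    (n_cp^2 terms) and every template vertex is moved by every control
    point (n_vertices * n_cp terms)."""
    per_step = n_control_points ** 2 + n_vertices * n_control_points
    return n_time_points * per_step

# Hypothetical 3D data set: a 100 x 100 x 100 bounding box, 5000 template vertices.
coarse = control_point_count((100, 100, 100), kernel_width=10.0)
fine = control_point_count((100, 100, 100), kernel_width=5.0)
print(coarse, fine)  # 1331 9261

# Halving the deformation kernel width gives ~7x more control points in 3D
# and a much higher cost per iteration:
print(relative_cost(fine, 5000, 10) / relative_cost(coarse, 5000, 10))

# Halving the number of time points halves the cost in this model:
print(relative_cost(coarse, 5000, 5) / relative_cost(coarse, 5000, 10))  # 0.5
```

In this simplified model the number of time points enters linearly, while the kernel width enters through the control-point count, which grows with the cube of the inverse width in 3D. That is why the kernel width is usually the first parameter worth revisiting.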
