You have some C/C++ code that you've used to build a package for R. Now you'd like to profile that code. How do you do it?
You can compile and install your package with, e.g.,
R CMD INSTALL --no-multiarch --clean .
But you'll notice that R is quite insistent about including the -g -O2 compiler flags. The -O2 optimizations inline and reorder code, which can complicate debugging and make per-line profiles harder to read.
To work around this, create the file $HOME/.R/Makevars, which holds per-user compiler settings for R.
Assuming your my_package/src/Makevars file contains
CXX_STD = CXX11
add the following line to $HOME/.R/Makevars to override R's optimization flags:
CXX1XFLAGS += -Og
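Note that -Og is an optimization level tuned for debugging: it keeps a reasonable level of optimization while leaving out transformations that get in the way of debugging. The name of the flags variable depends on the C++ standard your package selects. CXX1XFLAGS is the one for C++11 here; other standards use their own variables (for example, CXXFLAGS when no CXX_STD is set), and newer R releases spell the C++11 variable CXX11FLAGS.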
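The Makevars step can be scripted. The following is a minimal sketch, assuming a POSIX shell and the CXX1XFLAGS variable discussed above; it appends the override only when it is not already present, so it is safe to run more than once.

```shell
# Path of the per-user Makevars file described above.
makevars="$HOME/.R/Makevars"

# Create ~/.R if it does not exist yet.
mkdir -p "$(dirname "$makevars")"

# The override line; swap in the variable that matches your C++ standard.
line='CXX1XFLAGS += -Og'

# Append the line only if an identical line is not already in the file.
grep -qxF "$line" "$makevars" 2>/dev/null || printf '%s\n' "$line" >> "$makevars"

# Show the resulting file for a quick sanity check.
cat "$makevars"
```

After this, reinstall the package with R CMD INSTALL and check the compiler command lines it prints to confirm that -Og appears after -O2.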
You can then install oprofile. On Debian or Ubuntu, use
sudo apt-get install oprofile
Then write a test script, here called tests.R, that exercises the functions/methods you wish to profile. Run this script under the profiler using:
operf R -f tests.R
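For illustration, a tests.R file could be generated as in the sketch below. Note that my_function() is a hypothetical name standing in for whatever exported function of my_package reaches your C++ code, and the input data is made up. The loop is there because a sampling profiler needs the code to run long enough to collect a useful number of samples.

```shell
# Write a small driver script for the profiler to run.
# Assumption: my_function() is a hypothetical exported function of my_package
# that calls into the C++ code of interest; substitute your own.
cat > tests.R <<'EOF'
library(my_package)

set.seed(1)                # make the workload reproducible
x <- runif(1e5)            # hypothetical input data

# Repeat the call so the run lasts long enough to collect many samples.
for (i in seq_len(100)) {
  my_function(x)
}
EOF
```

Adjust the repetition count until the run takes at least a few seconds; very short runs tend to give noisy profiles.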
The summary results can be viewed with:
opreport
And the per-symbol breakdown for your library alone with:
opreport -l ~/R/x86_64-pc-linux-gnu-library/3.2/my_package/libs/my_package.so
Source files annotated with the samples per line can be generated with:
opannotate --source ~/R/x86_64-pc-linux-gnu-library/3.2/my_package/libs/my_package.so
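The library paths above hard-code the R version and platform directory. As a hedged alternative, the sketch below asks R itself where the package's shared object lives, using base R's system.file(). The helper name pkg_so_path is hypothetical, and the sketch assumes a Linux-style layout in which the library is libs/<package>.so.

```shell
# pkg_so_path PACKAGE
# Print the path of PACKAGE's shared object, as located by R.
# Assumption: the library is installed as <package dir>/libs/<package>.so.
pkg_so_path() {
    pkg="$1"
    # system.file() returns an empty string if the package or directory is missing.
    libdir=$(Rscript -e "cat(system.file('libs', package = '$pkg'))") || return 1
    if [ -z "$libdir" ]; then
        echo "could not locate libs directory for package '$pkg'" >&2
        return 1
    fi
    printf '%s/%s.so\n' "$libdir" "$pkg"
}
```

With the function defined, the two commands become opreport -l "$(pkg_so_path my_package)" and opannotate --source "$(pkg_so_path my_package)", and they keep working after an R upgrade moves the library directory.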
The output of opreport -l appears as follows:
samples   %        image name     symbol name
2013892   66.7818  my_package.so  MyClass::hot_method(std::vector<double> const&)
831090    27.5594  my_package.so  MyClass::another_method(unsigned long)
69978      2.3205  my_package.so  void DoStuff(int,int,double)
56868      1.8858  my_package.so  int AdjustThings(unsigned long, unsigned long)
8845       0.2933  my_package.so  MyClass::cheerios(int)
OProfile works by periodically sampling where your program is executing. The first column is the number of samples that landed in each function (listed at right), while the second column is the percentage of all samples this represents.
The foregoing indicates that MyClass::hot_method() is probably where optimization efforts should be focused.
The annotated source produced by opannotate appears as follows:
: nearest_neighbors.push_back(which_lib[0]);
118511 3.9299 : for(auto curr_lib: which_lib)
: {
3219 0.1067 : curr_distance = dist[curr_lib];
1398851 46.3867 : if(curr_distance <= dist[nearest_neighbors.back()])
: {
: i = nearest_neighbors.size();
123725 4.1028 : while((i > 0) && (curr_distance < dist[nearest_neighbors[i-1]]))
: {
: i--;
: }
: nearest_neighbors.insert(nearest_neighbors.begin()+i, curr_lib);
Here the first column is again the number of samples and the second column is again the percentage of the total samples this represents. This gives a pretty good idea of where the bottleneck is: the comparison against dist[nearest_neighbors.back()] alone accounts for about 46% of all samples.