by Bob Muenchen
In my ongoing quest to analyze the world of analytics, I’ve updated the Growth in Capability section of The Popularity of Data Analysis Software. To save you the trouble of foraging through that tome, I’ve pasted it below.
Growth in Capability
The capability of analytics software has grown significantly over the years. It would be helpful to be able to plot the growth of each software package’s capabilities, but such data are hard to obtain. John Fox (2009) acquired them for R’s main distribution site http://cran.r-project.org/, and I collected the data for later versions following his method.
Figure 9 shows the number of R packages on CRAN for the last version released in each year. The growth curve follows a rapid parabolic arc (quadratic fit with R-squared=.995). The right-most point is for version 3.1.2, the last version released in late 2014.
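For readers curious how a quadratic fit and its R-squared are computed, here is a minimal Python sketch using only the standard library. It is not the code behind Figure 9. The data points are synthetic placeholders rather than the actual CRAN counts, and the helper names solve3 and quadratic_fit are hypothetical.

```python
# Minimal sketch: least-squares quadratic fit plus R-squared, standard library only.
# The sample points are synthetic placeholders, NOT the CRAN counts in Figure 9.

def solve3(a, b):
    """Solve a 3x3 linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def quadratic_fit(xs, ys):
    """Fit y ~ c0 + c1*x + c2*x^2; return ([c0, c1, c2], r_squared)."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    moment_sums = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    normal_matrix = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    coef = solve3(normal_matrix, moment_sums)
    fitted = [coef[0] + coef[1] * x + coef[2] * x * x for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return coef, 1.0 - ss_res / ss_tot

# Synthetic series: an exact quadratic plus a small alternating wiggle.
years = list(range(10))  # hypothetical "years since first release"
counts = [5 + 3 * x + 2 * x * x + (1 if x % 2 else -1) for x in years]

coef, r_squared = quadratic_fit(years, counts)
print("coefficients:", coef)
print("R-squared:", r_squared)
```

An R-squared close to 1, like the .995 reported above, means the quadratic curve accounts for nearly all of the year-to-year variation in the counts.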
To put this astonishing growth in perspective, let us compare it to the most dominant commercial package, SAS. In version 9.3, SAS contained around 1,200 commands that are roughly equivalent to R functions (procs, functions, etc. in Base, Stat, ETS, HP Forecasting, Graph, IML, Macro, OR, QC). In 2014, R added 1,357 packages, counting only CRAN, or approximately 27,642 functions. During 2014 alone, R added more functions/procs than SAS Institute has written in its entire history.
Of course, while SAS and R commands solve many of the same problems, they are certainly not perfectly equivalent. Some SAS procedures have many more options to control their output than R functions do, so one SAS procedure may be equivalent to many R functions. On the other hand, R functions can nest inside one another, creating nearly infinite combinations. SAS is now out with version 9.4 and I have not repeated the arduous task of recounting its commands. If SAS Institute would provide the figure, I would include it here. While the comparison is far from perfect, it does provide an interesting perspective on the size and growth rate of R.
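As a rough check, the 27,642 figure matches what you get by multiplying the 1,357 new CRAN packages by the average of 20.37 functions per package quoted later in this post. That this is how the estimate was derived is an assumption, and the variable names below are illustrative only.

```python
# Rough check of the 2014 estimate, using figures quoted in the post.
cran_packages_added_2014 = 1357
functions_per_package = 20.37  # Rdocumentation average quoted later in the post
sas_commands_v93 = 1200        # "around 1,200" commands in SAS 9.3

# Assumption: new functions = new packages * average functions per package.
functions_added_2014 = cran_packages_added_2014 * functions_per_package
ratio_to_sas = functions_added_2014 / sas_commands_v93

print(round(functions_added_2014))  # 27642
print(round(ratio_to_sas, 1))       # 23.0
```

On these numbers, the functions added to CRAN in 2014 come to roughly 23 times the SAS 9.3 command count.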
As rapid as R’s growth has been, these data represent only the main CRAN repository. R has eight other software repositories, such as Bioconductor, that are not included in Figure 9. A program run on 5/22/2015 counted 8,954 R packages at all major repositories, 6,663 of which were at CRAN. (I excluded the GitHub repository since it contains duplicates of CRAN packages that I could not easily remove.) So the growth curve for the software at all repositories would be approximately 34.4% higher on the y-axis than the one shown in Figure 9. Therefore, the estimated total growth in R functions for 2014 was 27,642 * 1.344, or about 37,151.
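The scaling step can be sketched in a few lines of Python. It takes the 27,642 CRAN estimate for 2014 given earlier as the base and assumes the other repositories grew in proportion to CRAN. The variable names are illustrative.

```python
# Scale the CRAN-only estimate up to all major repositories (GitHub excluded).
all_repo_packages = 8954   # counted on 5/22/2015
cran_packages = 6663       # CRAN share of that count
cran_functions_2014 = 27642  # CRAN-only estimate for 2014 quoted earlier

scale = round(all_repo_packages / cran_packages, 3)  # rounded as in the post
percent_higher = (scale - 1) * 100

# Assumption: non-CRAN repositories grew in proportion to CRAN.
all_repo_functions_2014 = cran_functions_2014 * scale

print(scale)                           # 1.344
print(round(percent_higher, 1))        # 34.4
print(round(all_repo_functions_2014))  # 37151
```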
As with any analysis software, individuals also maintain their own separate collections, typically available on their web sites. However, those are not easily counted.
What’s the total number of R functions? The Rdocumentation site shows the latest counts of both packages and functions on CRAN, Bioconductor and GitHub. Those counts indicate an average of 20.37 functions per package. Since a program run on 5/22/2015 counted 8,954 R packages at all major repositories except GitHub, on that date there were approximately 182,393 total functions in R. In total, R has over 150 times as many commands as SAS.
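The arithmetic behind that total can be sketched as follows, using only figures quoted in this post. It assumes the 20.37 average applies evenly across repositories, and the variable names are illustrative.

```python
# Estimate the total number of R functions and compare with SAS 9.3.
total_packages = 8954          # all major repositories except GitHub, 5/22/2015
functions_per_package = 20.37  # average reported by Rdocumentation
sas_commands_v93 = 1200        # "around 1,200" commands in SAS 9.3

# Assumption: the average holds across all repositories.
total_functions = total_packages * functions_per_package
times_sas = total_functions / sas_commands_v93

print(round(total_functions))  # 182393
print(round(times_sas))        # 152
```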
I invite you to follow me here or at http://twitter.com/BobMuenchen. If you’re interested in learning R, DataCamp.com offers my 16-hour interactive workshop, R for SAS, SPSS and Stata Users for $25. That’s a monthly fee, but it definitely won’t take you a month to take it! For students & academics, it’s $9. I also do R training on-site.