
Read the original article on Sicara’s blog here.

Data analysis with RStudio is great, apart from R's famously poor performance.

What if AWS could save you days without changing your usual workflow?

I have been using R for almost ten years now. I like R; in fact, I love it. I have been pushed to switch to Python, but to me RStudio remains an unbeatable, state-of-the-art IDE for data analysis and research as a whole. While getting working R code is quite straightforward, getting high-performance R code may become a headache. All the usual tricks (matrix calculations, *apply functions, the compiler, Rcpp) may not bring a sufficient speedup. You may still have to wait minutes (or hours, for heavy statistical simulations) for your computation to finish.

But aren't statistical simulations just different trials of the same thing? What if you used parallel computing to actually run them in parallel and achieve a scalable speedup? What if you did it on AWS?

In this post I want to share my experience on how to get a working RStudio on AWS with your own files and as many CPUs as you need, without any painful or complicated devops tasks.

Let us define a toy example as an illustration. Consider, for instance, that one has a prior over n data samples, so that the likelihood of the data could be something like this:

    n <- 1e7
    X <- rnorm(n)
    # Product of Gaussian densities, with mean X and standard deviation |X|
    model <- function(x) prod(dnorm(x, mean = X, sd = abs(X)))

A single computation would take:

    system.time(model(rnorm(n)))
    #  user  system elapsed
    # 1.648   0.054   1.708

Any statistical operation on this model (optimization, evidence estimation, etc.) that requires hundreds or thousands of calls to the model would therefore probably take minutes or hours. This bad performance is directly proportional to the number of calls to your model. Distributing these computations over several independent computers would directly cut the wall-clock time, i.e. the time you have to wait for your computation to be done. Hence you would directly increase the performance of your code. If you don’t want to buy a new, expensive, high-performance multicore laptop, cloud computing is your best bet.
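
As a rough sketch of what spreading independent model calls over several workers could look like on a single multicore machine, the following uses R's base `parallel` package. The reduced `n`, the number of calls (`n_calls`) and the worker count (`n_workers`) are assumptions chosen so the example runs quickly; they are not values from the setup described in this post.

```r
library(parallel)

# Assumption: a much smaller n than in the toy example, to keep the sketch fast.
n <- 1e5
set.seed(1)
X <- rnorm(n)

# Same toy model as in the post: product of Gaussian densities.
model <- function(x) prod(dnorm(x, mean = X, sd = abs(X)))

# Hypothetical workload: a handful of independent calls to the model.
n_calls <- 8
inputs <- replicate(n_calls, rnorm(n), simplify = FALSE)

# Sequential baseline: one call after the other.
seq_res <- lapply(inputs, model)

# Parallel version on a local socket cluster (works on Linux, macOS and Windows).
# detectCores() can return NA, hence the na.rm = TRUE guard.
n_workers <- max(1, min(2, detectCores(), na.rm = TRUE))
cl <- makeCluster(n_workers)

# The workers are fresh R sessions, so they need their own copy of X.
clusterExport(cl, "X", envir = environment())

# Each element of inputs is sent to a worker and evaluated with model().
par_res <- parLapply(cl, inputs, model)
stopCluster(cl)

# Both approaches should return the same values.
print(identical(seq_res, par_res))
```

Because the calls do not depend on each other, raising `n_workers` toward the number of available cores (for instance on a larger cloud instance) is where the wall-clock gain would come from. For a workload this small, the cost of starting the cluster and copying `X` may well outweigh the benefit.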
