Interviews: Ask Author and Programmer Andy Nicholls About R
Andy Nicholls has been an R programmer and consultant for Mango Solutions since 2011 (where he currently manages the R consultancy team), after a long stint as a statistician in the pharmaceutical industry. He has a serious background in mathematics, too, with a Masters in math and another in Statistics with Applications in Medicine. Andy has taught more than 50 on-site R training courses and has been involved in the development of more than 30 R packages; he's also a regular contributor to events at LondonR, the largest R user group in the UK. But since not everyone can get to London for a user group meeting, you can get some of the insights he's gained as an R expert in Sams Teach Yourself R In 24 Hours (available in print or at Safari), of which he is the lead author. Today, though, you can ask Andy directly about the much-lauded statistics-oriented free software (GPL) language -- why to use it, how to get started, how to get things done, and where those intriguing release names come from. (The about page is helpful, too.) As usual, please ask as many questions as you'd like, but one question at a time, please.
Note: Slashdot is always looking for interesting interview guests. Who do you want to ask? Let us know!
R? (Score:5, Funny)
Is that a pirates-only language?
Re: (Score:2)
Only if you use Mate[y].
Re: (Score:1)
only if calculating the Planck Constant
Re: (Score:2)
I'm still waiting for the answers of 'ask ray kurzweil'. It's been two months already.
Evolution of R (Score:4, Interesting)
How has the way you use R changed over time? For myself, I don't think I've gone through an entire R session in the past six months without loading dplyr. Combine that with the pipeline operator and I think if you'd shown the R code I wrote yesterday to me of two years ago, I wouldn't have believed it was the same language.
Future of R, now that programmers use it? (Score:2, Insightful)
What's your take on the future of R? It used to be that it was a tool for statisticians, and now it's been discovered by programmers. As a statistician who's not a programmer, but who hangs out sometimes on slashdot and stackoverflow, it feels sometimes like it's in danger of becoming just another language for programmers, instead of a tool for statisticians. Should I be worried? Can it be both? Is this mass inflow of programmers going to change it somehow? Or am I just having a "get off my lawn" moment?
Re: (Score:1)
The only reason any sane programmer uses it is because they have to write some stat code using some obscure test or analysis package only available in R.
Re:Future of R, now that programmers use it? (Score:4, Insightful)
As a statistician who's not a programmer, but who hangs out sometimes on slashdot and stackoverflow, it feels sometimes like it's in danger of becoming just another language for programmers, instead of a tool for statisticians.
As a programmer who used to research programming languages, there's no danger of that at all.
It's not much of a stretch to say that no programmer really uses R. At most, programmers use the high-quality statistical libraries which only work with R. R is basically the best statistical packages ever written bound together by one of the worst programming languages ever developed.
Re: (Score:1)
It's not much of a stretch to say that no programmer really uses R. At most, programmers use the high-quality statistical libraries which only work with R. R is basically the best statistical packages ever written bound together by one of the worst programming languages ever developed.
This is it *exactly*!
Re: (Score:3)
I actually program exclusively in R and find it OK once you learn the quirks. Where it excels is in sort of "jotting" down thoughts about programs, e.g. you can define an S3 class and then make an object that only has a few of the properties, or claim your object is of a class it is not. This would drive any Java programmer bananas but it's super nice for going fast and loose.
Similarly, the fact that it can recover your call in addition to the arguments you passed makes several functions work much better when you have
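A minimal sketch of the two behaviours described in this comment, for readers who have not met S3. The class name "reading" and the functions describe() and show_call() are invented for illustration, and match.call() is only an assumption about which mechanism the poster means by "recover your call".

```r
# Hypothetical S3 generic and class "reading"; all names here are invented
# for this illustration and do not come from the thread.
describe <- function(x, ...) UseMethod("describe")

describe.reading <- function(x, ...) {
  # Tolerate objects that lack the unit field.
  unit <- if (is.null(x$unit)) "(no unit)" else x$unit
  paste(x$value, unit)
}

# A complete object: a list with both fields and the class attribute set.
full <- structure(list(value = 21.5, unit = "C"), class = "reading")

# An object with only some of the fields still dispatches to the method.
partial <- structure(list(value = 3), class = "reading")

# Any object can simply claim the class; R performs no validation.
impostor <- list(colour = "red")
class(impostor) <- "reading"

print(describe(full))
print(describe(partial))
print(inherits(impostor, "reading"))

# match.call() returns the call as it was written, with the arguments
# matched to their formal names, rather than the evaluated values.
show_call <- function(x, ...) match.call()
cl <- show_call(1 + 2, flag = TRUE)
print(cl)
```

Nothing stops the impostor object from reaching describe.reading(), which is the "fast and loose" trade-off the comment refers to: convenient while sketching, but all validation is left to the author.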
Re: (Score:2)
I actually program exclusively in R and find it OK once you learn the quirks.
I dunno -- there's an awful lot that's cumbersome about R and constantly does my head in. My pet bugbears:
No native hash/dictionary construct (there is the third-party hash library, but that's not great for portability).
It's not possible to define functions at the end of your code, making code difficult to read (or requiring you to source a separate script that contains your functions, but again, portability suffers).
Variable scoping is ... odd (many people have written previously about R quirks in this re
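For readers weighing the first two complaints, here is a hedged sketch of two workarounds that need only base R. They are common idioms rather than anything proposed in the thread, and the names counts, main and helper are made up for the example: an environment can stand in for a string-keyed dictionary, and wrapping the top-level logic in a function lets helpers be defined further down the script.

```r
# 1. An environment used as a string-keyed dictionary (base R only).
counts <- new.env(hash = TRUE)
for (w in c("apple", "pear", "apple")) {
  # Missing keys come back as NULL, so start the count at 0.
  old <- if (is.null(counts[[w]])) 0 else counts[[w]]
  counts[[w]] <- old + 1
}
print(sort(ls(counts)))   # the keys currently stored
print(counts[["apple"]])  # the value stored under one key

# 2. Helpers defined after the code that uses them.
# The body of main() is not evaluated until main() is called, so helper()
# only has to exist by the time the final line runs.
main <- function() {
  helper(2)
}

helper <- function(x) x * 10

result <- main()
print(result)
```

Keys in an environment must be character strings, and environments have reference semantics (they are not copied on assignment), so this is not a drop-in replacement for a dictionary type in every situation.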
Key advantages of R (Score:5, Interesting)
In your view, what are the key advantages of R over other scientific computing languages, most notably Matlab (which has to be considered with its plethora of toolboxes of course)?
Re:Key advantages of R (Score:5, Interesting)
In your view, what are the key advantages of R over other scientific computing languages, most notably Matlab (which has to be considered with its plethora of toolboxes of course)?
Or Python with scipy/numpy, or Julia, given their open source nature in addition to the plethora of libraries.
Re: (Score:2)
While I am really only dipping my toe into R I decided to do some research on this question a while back.
I have used python for a number of scientific applications and was attempting to determine if I should use Rpy2 (http://rpy2.bitbucket.org/). It initially made sense to keep all of the data retrieval, formatting and analysis in a few python scripts. However, it seems that the design of the R language intrinsically accounts for the problem solving methodology: "R is designed to operate the way that proble
Tips for new statisticians (Score:1)
For those that are relatively new to R and hope to enter the field of statistics, where would you recommend focusing your R training efforts?
For example, which programming concepts, or fields of application, or packages, etc. do you feel are especially worthy of attention?
Similarly, what would you recommend we avoid?
Re: (Score:3)
Hoisting the AC for asking a good question.
To add on: R is gaining massive traction in graduate programs but so many professors teach it like it's SPSS, almost as a cargo cult coding language, and so much of the documentation is written for people who are already experienced coders. Is there any decent introduction to R for someone that doesn't already know it (or another programming language) fluently?
What about the painful side of R? (Score:5, Interesting)
There's an entire book, the R Inferno, dedicated to R's many "quirks" and problems. Is there ever a plan to dedicate some time to focusing on cleaning up the language and making it less painful to use?
Harsh crowd (Score:2, Interesting)
In my experience (from searching for R advice online - I've never mailed the R discussion list myself) the R community is incredibly harsh and unforgiving of new users. Answers to beginners' questions are normally brusque - often extremely so. (I remember one exchange, where a user basically asked "I've read the documentation for par, and I don't understand ...", and the response was, in its entirety, "?par" -- which, for those unfamiliar with R, is the command to bring up the documentation for par.)
On the
Re: (Score:2)
As a statistician: someone not trained in statistics using statistical methods when they don't understand the concepts in that mathematically dense paper from 1963 is a dangerous thing. If you want me to be your statistics consultant, pay me my consulting rate. I don't generally consult for free, on the r-help mailing list or elsewhere.
If you don't understand that 1963 paper, you need a statistics consultant. Don't expect someone to do your statistical work for free.
I think you just beautifully proved the OP's point.
Impressed with R's speed (Score:2)
I encountered R via Johns Hopkins University's data science series of Coursera courses which I highly recommend. The first one is at https://www.coursera.org/learn... [coursera.org]
As a mainly Python programmer, but someone with an eclectic interest in programming languages (I enjoy Prolog, Lisp, ML...), I've found R very intriguing: it's a very "functional" programming language, but also object oriented (using dollar signs instead of the customary dots). I've also found R to be incredibly quick -- provided you know and use
Re: (Score:2)
R has been around longer than Java, and is based on S which is older than C++. There's a huge body of existing code and libraries to leverage. But from what I gather, the real reason to use R is because the only other option you're being offered is SAS, and you don't want to deal with that mess! Or so I hear.
Bottom line, if you're not being threatened with SAS, there may be little reason to learn R. But if you are, or if you think there's any danger you might be, R is probably something you want to learn AS
R vs Python (Score:1)
I am myself an R aficionado, but how do you answer someone who says that Python has gone a long way toward being a good contender for data analysis tasks (SciPy, Pandas, Scikit etc...)?
Using R to learn statistics (Score:4, Interesting)
Minitab (Score:2)
I think Minitab is better. How would you convince me otherwise?
How about errors and debugging? (Score:3)
I have in mind cases like the following, in which confusion about list access using the [ operator (when [[ should have been used) produces a cryptic error message with no traceback available.
> symlog_scaler <- list(linear_to=2.5, abscissa=2.0,
+ scaling_function=function(x,linear_to=2.5,abscissa=2.0){
+ y <- x; linear_to = abs(linear_to); big_ix = (linear_to<x)
+ y[big_ix] = linear_to + log(1+(x[big_ix] - linear_to), base=abscissa)
+ small_ix = (-linear_to>x)
+ y[small_ix] = -(linear_to + log(1+(-x[small_ix] - linear_to),base=abscissa))
+ y})
> symlog_scaler$scaling_function(-5:5)
[1] -4.307355 -3.821928 -3.084963 -2.000000 -1.000000 0.000000 1.000000 2.000000 3.084963
[10] 3.821928 4.307355
> symlog_scaler['scaling_function'](-5:5)
Error: attempt to apply non-function
> traceback()
No traceback available
>
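A stripped-down sketch of what goes wrong in the session above, using a smaller stand-in list (scaler and its doubling function are invented for the example): single brackets return a one-element list, not the function, so calling the result fails, while double brackets or $ extract the element itself.

```r
# A smaller stand-in for symlog_scaler; the doubling function is a placeholder.
scaler <- list(linear_to = 2.5, scaling_function = function(x) x * 2)

# Single brackets keep the container: the result is a list of length one.
sub <- scaler["scaling_function"]
print(is.list(sub))
print(is.function(sub))

# Double brackets (or $) extract the element, which is the function itself.
f <- scaler[["scaling_function"]]
print(f(1:3))
print(scaler$scaling_function(1:3))

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  scaler["scaling_function"](1:3),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking the type of the object with is.function() or str() before calling it is one way to diagnose this kind of failure when traceback() has nothing to report.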
Third party GUIs (Score:2)
I have been impressed with the strong community surrounding R, and the excellent third party libraries that are available in the CRAN.
What is your view on the various third party GUIs that exist for R, such as RStudio, Tinn-R and RExcel? Do you use or recommend any of them?
Re: (Score:2)
Hopefully it wasn't me who drove you to that drink. I'm more of a gin man, myself, but I do enjoy a good rum; might I ask what you're pouring tonight?
Re: (Score:2)
my persona can be as nice as the next person's UNTIL I am attacked
And this is why I was trying to point out that my initial comment, months ago, was indeed a joke and not a directed attack. Ya gotta admit, ya jumped in pretty heavy at the onset.
We good?
Cruzan is good stuff, definitely one of my choice rums when I go that route. Getting any "spendier" than that is just for show. My gin of preference is Citadelle; I bought a bottle of Tanqueray #10 at 4x the cost per ounce one night when I wanted to indulge and it's ended up being a show piece, certainly not best of bre
Re: (Score:1)
I'd gladly lay off but you started up again even now
Everything you are referring to was posted before we supposedly made amends and had already been replied to by you.
Your POST HISTORY SHOWS YOU CONSTANTLY COMING IN AFTER I HAVE BEEN IN POSTS TOO
I stood up for you in one post and directly replied to you in another, in this very topic. Aside from that, there was another thread a few days ago where we interacted, and I made one off-the-cuff remark about wishing you'd leave me alone (in a thread where that type of comment was actually quite relevant), which was also made during that little tiff.
My only posts to or about you since our su
Re: (Score:1)
Ken M usually does not post as anonymous.