By Harrison Jones, ASA, Actuarial and Insurance Solutions at Deloitte
Actuaries are travelling deeper into the fields of data science and machine learning, where open-source software is widely used. Two of the most popular open-source programming languages used by actuaries are R and Python. Both provide considerable functionality for the type of data analysis actuaries require, including, but not limited to, data manipulation, data visualization, and the fitting of statistical models.
The topic of R vs. Python is already hotly debated by many professionals in the field of data science. The argument might not be as long-lived as tabs vs. spaces; however, R vs. Python is still discussed at length in many opinion- and fact-based articles. It’s also reasonable to assume that the debate will continue for as long as these two programming languages remain popular.
This article serves three primary purposes:
- to provide the reader with helpful resources where the facts have been presented and the question already debated;
- to add to the existing material with one actuary’s opinion; and,
- to present an argument that, in the long run, interoperability should be the primary focus of organizations.
If you plan on doing your own research into R vs. Python, ensure that your source is credible and was written in the last few years (the more recent the better). Both languages evolve rapidly, so a recent and credible source is vital.
- DataCamp provides an easy-to-read and visually appealing infographic. It correctly starts by identifying four key questions that need to be answered, and it recognizes that there is no universal answer to R vs. Python.
- IBM’s article highlights a vital point: “Increasingly, the question isn’t which to choose, but how to make the best use of both programming languages for your specific use cases.”
One fault, however (and this is true of many R vs. Python articles), is the implication that R is only used by non-programmers:
“R is a statistical tool used by academics, engineers and scientists without any programming skills. Python is a production-ready language used in a wide range of industry, research and engineering workflows.”
The fact is that both R and Python are approachable languages to learn, and each has historically been favoured in a different community.
- RStudio presents interesting nuances to the R vs. Python debate, sharing a similar opinion to IBM’s: “both languages are valuable, and both are here to stay.”
Notably, Hadley Wickham, the Chief Data Scientist for RStudio and arguably one of the most prominent R programmers, says, “Use whatever makes you happy.” A company whose primary revenue stream began with commercial software products for teams using its R GUI has made a point of highlighting its support for Python and other open-source data science software.
One actuary’s opinion
I’ve spent my fair share of time using Python; however, I have a personal preference for using R through the RStudio GUI. With R, I’ve performed general data management and manipulation, data visualization, predictive modelling, pricing, reserving, and capital modelling, all of which I know can also be done in Python. I was guided into R as my primary tool for actuarial data analysis through a combination of my university courses and the preferences of my team and clients. What I was not guided by was any material difference in the functionality or performance of R over Python.
A lot of articles or blog posts that weigh in on the R vs. Python debate focus too heavily on comparing features. This is done without recognizing the fact that in most cases, both programming languages can do whatever an actuary requires. The debate shouldn’t yield an obvious answer, and in my opinion, the debate doesn’t need to happen.
Interoperability is the way forward
Organizations would do well to recognize that actuarial teams might not want to be limited to only one option. This is more important than picking one “best” software to use. In this context, interoperability¹ refers to an actuarial team having many software options at their disposal without losing efficiency when it comes to using or exchanging data, code, and models.
The obvious advantage of an organization prioritizing interoperability is flexibility. With more programming language options, there are more features, which cater to a wider audience. Another interesting side effect relates to “cognitive diversity.” It’s not a proven fact that an interoperable organization will be more cognitively diverse; however, it’s reasonable to assume that if two users are approaching a problem from two different programming languages, they will likely approach it in different ways. Cognitive diversity has been shown to have many benefits: it engages employees (especially those in younger generations), boosts problem solving, and eliminates “groupthink.”
The disadvantage of building interoperability in an organization is, well, the building. It is not always an easy transition. It requires upfront work and buy-in from all members of the team. There are software tools that can help make the job easier:
- Both RStudio and Jupyter Notebook allow users to write code in both R and Python.
- Feather (V2) is a portable file format for storing data frames that can be written and read from both R and Python.
- reticulate is an R package that allows the user to run Python code, and rpy2 is a Python package that allows the user to run R code.
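To illustrate why a typed file format such as Feather matters when data moves between tools, here is a minimal sketch that uses only Python’s standard library. It does not use Feather itself; instead, it round-trips a small table through CSV, a swapped-in stand-in, to show that every value comes back as text and the receiving tool must re-infer the column types. The `claims` records, their column names, and the `csv_round_trip` helper are all hypothetical and invented for illustration.

```python
import csv
import io

# Hypothetical claims records; names and values are invented for illustration.
claims = [
    {"policy_id": 101, "paid": 1250.50, "open": True},
    {"policy_id": 102, "paid": 0.0, "open": False},
]


def csv_round_trip(rows):
    """Write rows to CSV text and read them back, as a second tool would."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


restored = csv_round_trip(claims)

# Compare the type of each value before and after the round trip.
# After the round trip every value is a string, so the reader has to
# re-infer which columns are integers, floats, or booleans.
for before, after in zip(claims, restored):
    for column in before:
        print(
            column,
            type(before[column]).__name__,
            "->",
            type(after[column]).__name__,
        )
```

A format that stores column types alongside the data avoids this re-inference step. With pandas and pyarrow installed, `DataFrame.to_feather` and `pandas.read_feather` cover the Python side of a Feather round trip, and the arrow package’s `write_feather` and `read_feather` cover the R side.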
For most actuarial use cases, the differences in features between R and Python are negligible. However, a majority of actuaries will have a preference between the two programming languages, and that’s OK. For that matter, they might prefer using another option such as Julia, an open-source programming language that is gaining in popularity due to its processing speed and growing support community.
The bottom line is that organizations should adapt and build a cognitively diverse environment through interoperability. There is upfront work to achieve this, but it will yield significant benefits in the long run.
- Panel | Debunking the R vs. Python Myth | RStudio (2020)
- reticulate R package
- rpy2 Python package
- Interoperable data storage using Feather V2
¹ I’m using the term interoperability in a narrow sense, limited to the actuarial use of open-source software. The following article provides a more detailed explanation of the broader idea of software interoperability: https://www.formstack.com/resources/blog-software-interoperability