Python or R
I. Comparison
As stated in the previous article, programming plays a crucial role in a Data Scientist’s Toolbox. But there are so many languages out there, so how do you decide what to learn?
If you search Google for “data science programming language”, you will notice that Python and R stand out from the rest for their support for data science tasks (statistics, machine learning, etc.). So the question of which programming language to use can be narrowed down to “Python or R”.
Both have great packages to support you with your data science tasks, and each comes with its own strengths. There have been a lot of articles on Python and R, their differences and their advantages over each other. Below are just some main points (personal experience included) to guide you through your selection. A detailed comparison will be provided at the end of this section.
1. Similarities
- Both are great languages for data science and analytics.
- Both have amazing packages/modules (collections of ready-made functions, for example for creating a matrix or a dataframe, so you don’t have to write these functions from scratch) to help you with your tasks, like numpy, pandas and sklearn in Python and the famous tidyverse in R.
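As a rough illustration of what "ready-made functions" means on the Python side, the sketch below builds a small matrix with numpy and a small dataframe with pandas. The names and values are made up for the example, and it assumes both packages are already installed.

```python
import numpy as np
import pandas as pd

# A 2x3 matrix, created with a single call instead of hand-written nested loops.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
column_sums = matrix.sum(axis=0)  # add up the values in each column

# A small dataframe; the names and scores are made-up example data.
df = pd.DataFrame({
    "name": ["An", "Binh", "Chi"],
    "score": [7.5, 8.0, 9.0],
})
average_score = df["score"].mean()

print(matrix.shape)
print(column_sums)
print(average_score)
```

In R, the tidyverse packages play a similar role for dataframes.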
2. Differences
Python is a general-purpose language, used for everything from web development to web apps and more, whereas R is mainly used for statistical tasks. From my viewpoint, there are 6 main points that help distinguish the two programming languages: speed, community support, job opportunities, visualization, statistics and deep learning. They are ordered from basic concepts to more complicated ones.
- Speed:
  - Python is an interpreted, dynamically typed language, so plain Python code is pretty slow.
  - R is slow too (with tidyverse). However, the data.table package can significantly improve its speed. The issue can also be overcome with C/FORTRAN (other programming languages, whose code can be called from inside R).
  -> Plus for R
- Community:
  - Python has a large community from which you can find help.
  - The R community is somewhat smaller.
  -> Plus for Python
- Job opportunities:
  - The demand for Python programmers has risen sharply in recent years, from roughly the same as the demand for R to about twice as much. I believe this trend will continue for a long time, given that Python is a general-purpose programming language.
  -> Plus for Python
- Visualization:
  - In Python, seaborn and matplotlib are the two go-to modules when it comes to visualization. However, getting the result you want tends to be more convoluted.
  - In R, the single ggplot2 package will cover basically all of your basic visualization needs. The syntax is straightforward, and the settings are easy to understand and tweak to configure your output.
  -> Plus for R
- Statistics:
  - Python does have scipy to support statistical tasks. However, its uses and applications are very limited compared to a statistical programming language like R.
  - R is basically a statistical programming language.
  -> Plus for R
- Deep learning:
  - Python is more suitable for deep learning and NLP tasks.
  - R is not suitable for those tasks (very limited GPU support).
  -> Plus for Python
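The speed point is easier to judge with a measurement. The sketch below is a minimal timing comparison using only Python's standard library: the same sum is computed with an explicit Python loop and with the built-in `sum`, which runs in C. The function names and the input size are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

def loop_sum(values):
    """Add the numbers one by one in an explicit Python loop."""
    total = 0
    for v in values:
        total += v
    return total

def builtin_sum(values):
    """Let the built-in sum(), implemented in C, do the same work."""
    return sum(values)

numbers = list(range(100_000))  # arbitrary input size for the demo

# Both approaches must agree before timing them makes sense.
assert loop_sum(numbers) == builtin_sum(numbers)

# Run each version 20 times and report the total elapsed seconds.
loop_seconds = timeit.timeit(lambda: loop_sum(numbers), number=20)
builtin_seconds = timeit.timeit(lambda: builtin_sum(numbers), number=20)

print(f"explicit loop: {loop_seconds:.4f} s")
print(f"built-in sum:  {builtin_seconds:.4f} s")
```

The built-in version usually comes out clearly faster. The same idea, handing the heavy loop to compiled code, is what makes numpy in Python and data.table or C/FORTRAN code in R quick.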
So in my opinion, R is a better tool for data science. Python is cool too, but it is just too slow to fit my needs. But what do I know :D All in all, as both have their own strengths, the choice boils down to which language fits your needs, not to which one is better than the other. Both languages are absolutely cool and nice to know as a Data Scientist. Some tasks are done more efficiently in R, and some should definitely be done in Python.
As a piece of advice, I would highly recommend learning Python first, as its learning curve is much gentler than R’s. Python’s syntax is also very easy to read compared to R’s. So if you choose Python as your first language to learn, your life will be much less painful.
For a detailed look at their pros and cons, you can access the infographic made by DataCamp here.
II. Installation
For personal reasons, I usually use base Python/R together with an Integrated Development Environment (IDE). An IDE gives you a code editor, a console to run your code, a debugger and smart code completion. Sounds cool, right? But as these IDEs run on top of base R/Python, you know what you have to install first, right?
1. R
- Base R can be installed from CRAN.
- RStudio, an IDE for R, can be installed via the following link.
2. Python
The same logic applies to the installation of Python. However, the installation can be a little bit trickier.
- Base:
  - Python can be installed from its homepage. I suggest using version 3.7 of Python, since it is more widely supported by existing modules and packages. One thing to remember: make sure you check the option “Add Python to PATH” during installation, or else you will need to do it manually later.
  - In order to install Python modules, you need pip. Recent Python installers include it by default, so make sure it is available before you go on.
- IDE: 2 options
  - Option 1: As for a Python IDE, I personally use Visual Studio Code together with Jupyter, which is needed to run notebooks and interactive code cells in VS Code. Using Python with VS Code helps reduce the disk space needed to run Python, since you can choose which modules to install. After installing VS Code, you will also need to go to its Extensions view and install the Python extension from there!
  - Option 2: Some people may suggest using Anaconda, as its installation is pretty straightforward and it creates a whole environment that is ready to use. However, such ease also brings a big, heavy disadvantage. Yes, you heard me right: heavy and big. A huge number of modules are installed together with Anaconda, requiring a large amount of disk space.
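Whichever option you pick, it helps to check what actually got installed. The sketch below is a small standard-library script, written under the assumption that you run it with the Python you just installed. The function name `describe_environment` is made up for this example. It reports the interpreter version, whether a `python` command is found on PATH, and whether pip is available.

```python
import importlib.util
import shutil
import sys

def describe_environment():
    """Collect a few facts about the current Python installation."""
    return {
        # Version of the interpreter running this script, as "major.minor.micro".
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # Full path of the `python` command found on PATH, or None if there is none.
        "python_on_path": shutil.which("python"),
        # True if the pip module can be imported by this interpreter.
        "pip_available": importlib.util.find_spec("pip") is not None,
    }

if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `python_on_path` comes back as `None` on Windows, the “Add Python to PATH” option was probably not checked. If pip is available, modules are typically installed from a terminal with `python -m pip install` followed by the module name.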