Hi everybody!
In this blog post, I would like to introduce you to Python, the most popular programming language for Machine Learning and Data Mining (ML and DM) today, and to Jupyter, a convenient environment for writing Python code.
Python
Let’s start with Python.
How to choose a language?
When selecting a programming language to learn and use, what are your criteria? Let me guess: some of the most crucial factors are probably the following:
With the criteria listed above, Python seems to be the strongest candidate for our purpose.
Python’s basic properties
Python is a high-level, interpreted and general-purpose programming language, first released by Guido van Rossum in 1991.
For ML and DM, there is a very large community of researchers and developers working in Python, together with a huge number of free, reliable, open-source libraries that cover most of our needs (e.g. NumPy, Pandas, scikit-learn and Matplotlib).
Version
A little history: the first version of Python was released in 1991, Python 2 followed in 2000, and Python 3, the current major version, arrived in 2008.
Note that Python 3 is not fully backward-compatible with Python 2, so be careful when porting your code from Python 2 to 3.
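To give a feel for what "not backward-compatible" means in practice, here is a small Python 3 snippet covering three well-known differences: print is a function, `/` always performs true division, and text (`str`) is kept separate from encoded bytes. The Python 2 behaviour is only described in the comments, since that code would not run under Python 3.

```python
# Python 3: print is a function, so the parentheses are required.
# (Python 2 also accepted the statement form `print "Hello friends"`,
# which is a SyntaxError in Python 3.)
print("Hello friends")

# Python 3: `/` is true division, even between two integers.
# (In Python 2, 7 / 2 evaluated to 3 because both operands are ints.)
print(7 / 2)    # 3.5

# `//` is floor division in Python 3 (and in Python 2.2+ as well).
print(7 // 2)   # 3

# Python 3: str holds Unicode text; encoded bytes are a separate type.
text = "café"
data = text.encode("utf-8")
print(type(text).__name__, type(data).__name__)  # str bytes
print(len(text), len(data))  # 4 5 ("é" takes two bytes in UTF-8)
```

Differences like these are why code written for Python 2 often needs changes before it runs correctly on Python 3.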
A brief comparison with R
R is another language frequently used for ML and DM. In the past, R, not Python, was actually the most popular choice in this field. That changed only in the last several years, as Python not only rose to the top in ML and DM but also became the second-most popular programming language in the world overall.
R is well supported by a large number of useful libraries for statistical analysis, and it has a long history as an essential research tool, so why is R being overtaken by Python? I have used both R and Python, and the following are the reasons why I prefer Python:
In conclusion, if you are new to this field and are wondering which language to use, I suggest Python, or more specifically Python 3 (Python 2 is old and has not been supported since 1 January 2020).
All the code in my blog posts on Machine Learning and Data Mining will be in Python 3, and most of it is written in an environment called Jupyter Notebook (or its newer version, Jupyter Lab). So let me introduce Jupyter to you.
Jupyter
Jupyter Notebook is a web-based interactive environment for coding Python.
Jupyter Lab is an updated version of the Jupyter Notebook with additional features (e.g. opening multiple notebooks in the same tab), which makes it more similar to a general IDE.
For simple usage, Jupyter Notebook seems to be enough. However, as tools evolve, we should adapt too. It is quite likely that Jupyter Lab will eventually replace the classic Notebook, given that it is backward-compatible with Jupyter Notebook.
Below is a screenshot of a sample notebook created with Jupyter Notebook.
As you can see, it is quite simple. At the top is the name of the notebook, followed by a toolbar and, finally, the notebook itself, which consists of cells. Right below each cell is its output. For example, in the first code cell I wrote a command to print ‘Hello friends’, and the greeting appears just below that cell.
A cell can contain either Python code (of course) or Markdown, which is formatted plain text. In the notebook above, I used two cells as Markdown: the first contains ‘My introduction to Jupyter’ and the second contains ‘Example of shared-memory’.
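Under the hood, a notebook is saved as a JSON file (`.ipynb`) holding a list of cells, each tagged as either `"markdown"` or `"code"`. The sketch below builds such a structure by hand with the standard `json` module, following our understanding of the nbformat version 4 layout. The cell order is an assumed reconstruction of the sample notebook described above, and the helper names `markdown_cell` and `code_cell` are hypothetical. For real work, the `nbformat` library is the safer way to create notebooks.

```python
import json

def markdown_cell(text):
    # A Markdown cell holds formatted text and has no outputs.
    return {"cell_type": "markdown", "metadata": {}, "source": [text]}

def code_cell(source):
    # A code cell holds Python source; outputs and the execution count
    # are filled in by Jupyter when the cell is run.
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": [source],
    }

# Assumed layout, loosely based on the sample notebook described in the text.
notebook = {
    "cells": [
        markdown_cell("My introduction to Jupyter"),
        code_cell('print("Hello friends")'),
        markdown_cell("Example of shared-memory"),
        code_cell("x = 100"),
        code_cell("print(x)"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialize to the JSON text that would go into a .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])
```

Writing `notebook_json` to a file with an `.ipynb` extension should give a notebook that Jupyter can open, with the two Markdown cells rendered as text and the three code cells ready to run.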
Remember that even though we can run each cell separately, all cells share the same memory space. All the cells in a notebook are executed by the same kernel, a single running Python process, so a variable created in one cell stays available to every cell run after it. This is different from running separate script files, each of which starts with its own fresh memory. For example, in the second cell I declare a variable x and assign the value 100 to it; this value of x is then queried in the third cell, which shows that the second and third cells share the same memory space.
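The shared-memory behaviour can be imitated in plain Python. In this minimal sketch, each "cell" is just a string of source code (a hypothetical stand-in for real notebook cells), and the built-in `exec` runs it against a namespace dictionary. Reusing one dictionary for all cells mimics a notebook kernel; giving each cell a fresh dictionary mimics separate, independent scripts. This only illustrates the idea and is not how Jupyter itself is implemented.

```python
# Hypothetical "cells": each string is the source code of one cell.
cells = [
    'print("Hello friends")',
    "x = 100",
    "print(x)",
]

# Notebook-like behaviour: every cell runs in ONE shared namespace,
# so the variable x created by the second cell is visible to the third.
shared_namespace = {}
for source in cells:
    exec(source, shared_namespace)

# Contrast: give every cell its own fresh namespace, similar to running
# each snippet as a separate script. The third cell can no longer find x.
isolated_error = None
for source in cells:
    try:
        exec(source, {})
    except NameError as err:
        isolated_error = err

print(shared_namespace["x"])  # 100
print(isolated_error)         # name 'x' is not defined
```

The first loop prints `Hello friends` and `100`, just like the notebook; the second loop prints `Hello friends` and then fails on the third cell, because `x` only ever existed in the second cell's private namespace.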
You can try using Python in an IDE (like Spyder or PyCharm) and compare the experience with Jupyter. For data mining, data analysis or machine learning, I bet you will love Jupyter; it is much more comfortable for this kind of exploratory work.
For more instructions on how to use Jupyter, click Help on the toolbar or visit Jupyter’s website.
Install Python and Jupyter
There are primarily 2 ways to install Python and Jupyter:
Conclusion
Python is a simple, easy-to-learn, versatile and “quite” fast programming language, which makes it the ideal choice for Data Mining and Machine Learning at the moment.
Jupyter Notebook (or Lab) is a lightweight, simple and comprehensive environment for managing your data analysis code.
To install Python, together with Jupyter and the other common libraries for ML and DM, I recommend downloading and using Anaconda, a single distribution that contains everything you need.
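After installing, a quick way to confirm that everything is in place is to run a short check, for example in the first cell of a new notebook. The sketch below prints the Python version and reports whether each package can be found, using `importlib.util.find_spec` so that nothing is actually imported. The package list is an assumption based on the libraries mentioned in this post (note that scikit-learn is imported as `sklearn`), and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import sys

# Assumed list of import names to look for; adjust it to your needs.
packages = ["numpy", "pandas", "sklearn", "matplotlib", "notebook", "jupyterlab"]

def check_environment(names):
    """Return {import_name: True/False} without importing the packages."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

print("Python", sys.version.split()[0])
for name, found in check_environment(packages).items():
    print(f"{name:12} {'installed' if found else 'missing'}")
```

With a standard Anaconda installation, all of these packages are expected to show up as installed; any that are missing can be added afterwards.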