Python - Coding 101


As mentioned in the previous post on Introduction to Data Science, coding is one of the most fundamental and important skills in a data scientist's toolbox.

As a result, I have decided to launch Coding 101, which I hope will act as a guideline for your first baby steps towards being an expert coder. I also expect that you have already chosen which programming language to learn, based on my previous post on choosing R or Python.

The 101 series will first cover the most basic topics in programming, like how to do math with R or Python, the data types in each of these languages, and the most popular functions and packages. Then we will talk about more advanced topics in Python and R, like writing functions and control flow. I will also introduce the most popular libraries in the 2 programming languages in later posts of the series.

After going through the series, you will be equipped with enough knowledge and tools to start experimenting with Exploratory Data Analysis and extracting insights from data.

This is the 101 series for Python. If you are an R learner, refer to the R - Coding 101 series.

The Python - Coding 101 series will have the following structure (updating):

Welcome to the first article of the Python - Coding 101 series. After reading this article, you will learn:

In this article, we will talk about:

Please note that every line of code below is written in Python 3. Syntax in Python 2 may be different, so keep that in mind or things could get confusing quickly.

First things first: you need an environment to start learning and try some coding yourself, so follow these steps:

I. Basic

1. Operators

1.1. Arithmetic operators

Now that you have set up an environment, the first thing to try is using Python as a calculator!

Python provides operators, which are special symbols that represent computations like addition and multiplication. A summary of arithmetic operators in Python can be found below:

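# add (=5)
print(2 + 3)
# subtract (=3)
print(4 - 1)
# multiply (=6)
print(2 * 3)
# divide (=3.0, division always returns a float in Python 3)
print(6 / 2)
# power (=9)
print(3 ** 2)
# modulo, the remainder of a division (=1)
print(5 % 2)
# floor division, the whole-number quotient (=2)
print(5 // 2)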

print is a function in Python that displays a result on the screen. Using print here does not literally print anything on paper; it prompts Python to show the result of the code inside the parentheses. If you don't use print in a notebook cell, only the result of the last line will be displayed. Functions are discussed in more detail later in this post.

Another important detail is the # symbol, which marks the start of a comment. Comments are a best practice when working on large projects, where you need to write down what you are doing in each step. If you run a line that contains a comment, the comment will be ignored :D Comments have no effect on the execution of the program.
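To round off the arithmetic operators above, here is a small sketch of how /, // and % relate to each other. The numbers 17 and 5 are hypothetical values picked only for illustration.

```python
# Hypothetical values chosen only for illustration
a = 17
b = 5

true_division = a / b     # always a float in Python 3
floor_division = a // b   # quotient rounded down to a whole number
remainder = a % b         # what is left over after floor division

print(true_division)   # 3.4
print(floor_division)  # 3
print(remainder)       # 2

# The quotient and the remainder together rebuild the original number
print(floor_division * b + remainder == a)  # True
```

If you ever need the quotient and the remainder at the same time, the built-in divmod(a, b) returns both in one call.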

1.2. Logical operators

What if you want to compare two inputs, to see whether one is equal to, or greater than, the other?

In such cases, you can use logical operators. Logical operators include:

Example:

(2 + 4) == (2 * 3)

The output will be of type bool, which will be discussed in the data types section.
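As a rough illustration of the idea above, the sketch below tries a few comparisons on a hypothetical value x. Note that the Python documentation calls ==, !=, <, >, <= and >= comparison operators, and uses the name Boolean operators for and, or and not, which combine the results of comparisons.

```python
x = 7  # hypothetical value, used only for illustration

is_equal = (x == 7)       # True
is_different = (x != 7)   # False
is_greater = (x > 10)     # False
is_at_most = (x <= 7)     # True

# and / or / not combine Boolean results
in_range = (x > 0) and (x < 10)   # True
outside = (x < 0) or (x > 10)     # False
not_equal = not is_equal          # False

print(is_equal, is_different, is_greater, is_at_most)
print(in_range, outside, not_equal)
print(type(is_equal))  # <class 'bool'>
```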

2. Variables, expressions and statements

One of the most powerful features of a programming language is the ability to manipulate variables. A variable is a name that refers to a value.

Note that I use the phrase "refers to a value" instead of "holds a value". Wonder why? Because a variable is just a name pointing to the real value stored in memory.

An assignment statement creates a new variable and gives it a value. In Python, an assignment statement is written as follows: variable = value

Example:

number = 5
multiplication_expression = 5 * 2
squared = 2 ** 2
print(number)

Can you guess which of the 4 lines above are assignment statements?

In the above example, the first 3 lines are assignment statements, whereas the last one is not. print(number) only displays 5 on the screen; it does not create any variable, so it is not an assignment statement. A line like print(number) is still a statement, though. A statement is a unit of code that has an effect, like creating a variable or displaying a value. An expression, such as 5 * 2, is a combination of values and operators that Python evaluates to a value; generally, statements don't have values.
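To illustrate the earlier point that a variable refers to a value rather than holding it, here is a minimal sketch. It uses a list, a data type covered in the data types section, and the names first, second and copied are hypothetical.

```python
first = [1, 2, 3]
second = first          # no copy is made: both names refer to the same list

second.append(4)        # change the list through the second name
print(first)            # [1, 2, 3, 4]
print(first is second)  # True, both names point to one object

copied = list(first)    # list() builds a new, separate list
copied.append(5)
print(first)            # [1, 2, 3, 4], unchanged by the edit to the copy
print(first is copied)  # False
```

This is why changing a list through one name can be visible through another name: the assignment gave the same value a second name instead of duplicating it.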

3. Data types

Most of the code we have written above uses only one data type (int); I hope you noticed that. But Python is not limited to int: there are other built-in data types, as well as data types from external modules. Built-in data types are types that can be used directly without installing any libraries. You can use the type() function to retrieve the type of any value.
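As a quick sketch of type() in action, the example below checks a handful of values. The variable names are hypothetical and the selection of types is not exhaustive.

```python
# Hypothetical sample values, one per built-in type
whole_number = 5
decimal_number = 2.5
text = 'hello'
flag = (2 + 4) == (2 * 3)
division_result = 6 / 2

print(type(whole_number))     # <class 'int'>
print(type(decimal_number))   # <class 'float'>
print(type(text))             # <class 'str'>
print(type(flag))             # <class 'bool'>
print(type(division_result))  # <class 'float'>
```

Note the last line: dividing two int values with / gives a float in Python 3, even when the result is a whole number.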

3.1. Built-in

3.2. External data types

Apart from built-in data types, there are some other important data types that you need to know in order to master the art of data exploration, including but not limited to the NumPy array and the pandas DataFrame.

A NumPy array is a grid of values, all of the same type, indexed by a tuple of non-negative integers. A list is the closest built-in Python equivalent of an array, but it is resizable and can contain elements of different types. However, lists have some disadvantages.

Firstly, a list typically takes more memory than a NumPy array holding the same numbers, and numerical operations on a list are usually slower too.

Secondly, we cannot perform element-wise arithmetic on a list. What would you expect the program to display when you run the code below?

test = [1,2,3]
print(test * 2)

If you expect to get a result like [2, 4, 6], then a list is not a good choice here. With a list, as in the above example, the program will output [1, 2, 3, 1, 2, 3], because multiplying a list repeats it. With a NumPy array, the multiplication is applied to each element, which gives the result we wanted.
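The difference can be sketched as follows. This assumes the third-party NumPy package is installed, and the variable names are hypothetical.

```python
import numpy as np  # third-party package, assumed to be installed

values = [1, 2, 3]

# Multiplying a list repeats its contents
print(values * 2)               # [1, 2, 3, 1, 2, 3]

# Multiplying a NumPy array works element by element
array = np.array(values)
doubled = array * 2
print(doubled)                  # [2 4 6]

# A pure-Python way to double each element, without NumPy
print([v * 2 for v in values])  # [2, 4, 6]
```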

Another advantage of NumPy arrays is that they can be multi-dimensional, something Python lists can only imitate by nesting lists inside lists.
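Here is a small sketch of a two-dimensional array, again assuming NumPy is installed. The names nested and matrix are hypothetical.

```python
import numpy as np  # third-party package, assumed to be installed

nested = [[1, 2, 3], [4, 5, 6]]   # a list of lists
matrix = np.array(nested)         # a 2-dimensional NumPy array

print(matrix.shape)   # (2, 3): 2 rows and 3 columns
print(matrix.ndim)    # 2
print(matrix[1, 2])   # 6, indexed with a tuple of row and column
print(nested[1][2])   # 6, the nested list needs two separate lookups
print(matrix * 10)    # arithmetic applies to every element
```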

The pandas DataFrame is modeled on the R data frame. A DataFrame is a way to store data in a rectangular grid that is easy to get an overview of. Each row of the grid corresponds to the measurements or values of one instance, while each column is a vector containing data for a specific variable. This means that the values in a row do not need to be, but can be, of the same type: they can be numeric, character, logical, etc.

DataFrames are similar to SQL tables or the spreadsheets that you work with in Excel or Google Sheets. The pandas DataFrame is also built on top of NumPy and is an integral part of the Python data ecosystem, which is a huge advantage.
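A minimal sketch of a DataFrame might look like this. It assumes the third-party pandas package is installed, and the table contents are made-up sample data, not real measurements.

```python
import pandas as pd  # third-party package, assumed to be installed

# Made-up sample data: each key is a column, each position is a row
people = pd.DataFrame({
    'name': ['An', 'Binh', 'Chi'],
    'age': [25, 31, 28],
    'is_student': [True, False, False],
})

print(people)                # the whole table
print(people.dtypes)         # each column has its own data type
print(people.shape)          # (3, 3): 3 rows and 3 columns
print(people['age'].mean())  # 28.0
```

Each column keeps a single type, while a row mixes text, numbers and Booleans, as described above.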

4. Functions, methods and packages

For example:

string = 'I am a string'
print(string.upper()) # upper is a method
print(list()) # list is a function

As you may have seen, methods and functions are pretty much the same. A function is defined independently, so it can be called by its name alone. A method is defined within a class, so it cannot be called by its name alone: we need to call it on an object of that class, as in string.upper().

You can find out more about classes and object-oriented programming here.
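To make the distinction concrete, here is a rough sketch that defines one function and one method. The names shout, Greeter and greet are hypothetical and exist only for this example.

```python
def shout(text):
    """A function: defined on its own and called by its name."""
    return text.upper() + '!'


class Greeter:
    """A tiny class, used only to host a method."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        """A method: defined inside a class and called on an object."""
        return 'Hello, ' + self.name


print(shout('hi'))        # HI!

greeter = Greeter('Ada')  # create an object of the class first
print(greeter.greet())    # Hello, Ada
```

Calling greet() on its own would raise a NameError, because the method only exists as part of the Greeter class.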

II. Advanced

1. Loop

So far we have learnt how to run code cells. But what if you want to run the same code repeatedly? Do we need to press Ctrl+Enter many times? Luckily, there are loops to help us with such tasks.

Python has two primitive loop commands: the while statement and the for statement.
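The two loop statements can be sketched as follows. The variable names and the numbers used are hypothetical.

```python
# for: repeat once for each item in a sequence
squares = []
for n in range(1, 4):       # n takes the values 1, 2, 3
    squares.append(n ** 2)
print(squares)              # [1, 4, 9]

# while: repeat as long as a condition stays True
countdown = 3
while countdown > 0:
    print(countdown)        # prints 3, then 2, then 1
    countdown -= 1          # without this line the loop would never end
print(countdown)            # 0
```

As a rule of thumb, a for loop fits when you know what you are iterating over, and a while loop fits when you only know the condition for stopping.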

2. Conditional & Control flow

From the previous tutorials in this series, you now have quite a bit of Python code under your belt. Everything you have seen so far has consisted of sequential execution, in which statements are always performed one after the next, in exactly the order specified.

But the world is often more complicated than that. Frequently, a program needs to skip over some statements, execute a series of statements repetitively, or choose between alternate sets of statements to execute. That is where control structures come in. A control structure directs the order of execution of the statements in a program.

Conditional control flow is expressed with an if statement in Python. The syntax is as follows:

if <expr1>:
    <statement1>
elif <expr2>:
    <statement2>
else:
    <statement3>

in which each <expr> is an expression evaluated in a Boolean context (giving a result of True or False). Evaluation follows a top-down order, meaning that <expr2> is evaluated only if <expr1> is False, and <statement3> is executed only if both <expr1> and <expr2> are False.

Note that you can also include control flow inside loops.
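A concrete instance of the if / elif / else template could look like the sketch below. The function name describe_temperature and the thresholds 10 and 25 are hypothetical choices for illustration.

```python
def describe_temperature(celsius):
    """Return a rough label for a temperature in degrees Celsius."""
    if celsius < 10:
        return 'cold'
    elif celsius < 25:   # only checked when the first test is False
        return 'mild'
    else:                # reached only when both tests are False
        return 'hot'


print(describe_temperature(5))    # cold
print(describe_temperature(18))   # mild
print(describe_temperature(30))   # hot
```

Because the branches are checked from top to bottom, the elif branch does not need to repeat that celsius is at least 10.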
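As a small sketch of control flow inside a loop, the example below sorts the numbers 1 to 6 into even and odd. The list names are hypothetical.

```python
evens = []
odds = []

for n in range(1, 7):    # n takes the values 1 to 6
    if n % 2 == 0:       # a remainder of 0 means n is even
        evens.append(n)
    else:
        odds.append(n)

print(evens)  # [2, 4, 6]
print(odds)   # [1, 3, 5]
```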

III. Using external packages

Now that you have a knowledge of basic Python, the next step is to leverage external packages.

The most common packages include:

To learn these packages and to review what you have learnt so far, you can refer to cheat sheets, which give you a brief overview of the packages and the most common functions in each package.

I have collected several cheat sheets so far. You can find the links to download them below:

IV. Last words

Now that you have enough in your data science toolbox, you can start your first project.

Guidance on your first project will be online shortly. So stay tuned!

If you have any questions, feel free to comment below, or contact me by the ‘Contact me!’ button on the top right corner of the website.

In the meantime, practice again and again and againnn.