Python - Coding 101
As mentioned in the previous post on Introduction to Data Science, coding is the most fundamental and important skill in a data scientist's toolbox.
As a result, I have decided to launch Coding 101, which I hope will act as a guideline for your first baby steps towards becoming an expert coder. I also assume that you have already chosen which programming language to learn after reading my previous post on choosing R or Python.
The 101 series will first cover the most basic topics in programming, such as how to do math with R or Python, the data types in each of these languages, and the most popular functions and packages. Then we will talk about more advanced topics in Python and R, such as writing functions and control flow. I will also introduce the most popular libraries in the 2 programming languages in the later posts of the series.
After going through the series, you will be equipped with enough knowledge and tools to start experimenting with Exploratory Data Analysis and extracting insights from data.
This is the 101 series for Python. If you are an R learner, refer to the series of R - Coding 101.
The Python - Coding 101 series is structured as follows (updating):
- Coding 101
  - Introduction (you are here!)
  - Coding 101 - exercises
- Writing functions in Python
- Control flows in Python
- Errors and exceptions
- Classes
Welcome to the first article of the Python - Coding 101 series. After reading this article, you will learn:
- How to do math with Python
- What is a variable and how to create one
- Must-know data types in Python
- Basic syntax of looping and conditions
- Must-know library & package in Python
Please note that every line of code below is written in Python 3. Syntax in Python 2 may be different, so keep that in mind or things could get confusing easily.
First things first: you need an environment to start learning and try some coding yourself, so follow these steps:
- Open VS Code (installation guide here)
- Select a Python interpreter: use the Python: Select Interpreter command from the Command Palette (Ctrl+Shift+P)
- Open a Jupyter notebook to start coding: use the Python: Create Blank New Jupyter Notebook command, also from the Command Palette. Make sure that you have already installed Jupyter beforehand by opening a terminal and running pip install jupyter.
I. Basic
1. Operators
1.1. Arithmetic operators
Now that you have set up an environment, the first thing to try out is using Python as a calculator!
Python provides operators, which are special symbols that represent computations like addition and multiplication. A summary of arithmetic operators in Python can be found below:
# add (=5)
print(2 + 3)
# subtract (=3)
print(4 - 1)
# multiply (=6)
print(2 * 3)
# divide (=3.0, division always returns a float)
print(6 / 2)
# power (=9)
print(3 ** 2)
# modulo, the remainder of a division (=1)
print(5 % 2)
# quotient, rounded down (=2)
print(5 // 2)
print is a function in Python that displays a result on the screen. Using print here does not literally print anything on paper; it prompts Python to show the result of the code inside the parentheses on the screen. If you do not use print, a notebook cell only displays the result of its last line. More about functions will be discussed later in this post.
Another important detail is the # symbol, which denotes the start of a comment. Writing comments is a best practice when working on a large project, where you need to write down what you are doing in each step. If you tell the program to run these comments, they will be ignored :D Comments have no effect on the execution of the program.
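To round off the arithmetic operators, here is a small sketch of how division, quotient and modulo relate to each other. The variable names are made up for illustration only.

```python
# True division always returns a float, even when the result is a whole number.
true_division = 6 / 2

# Floor division (the quotient) drops the fractional part.
floor_division = 5 // 2

# Modulo returns the remainder of the division.
remainder = 5 % 2

# The quotient and the remainder fit together: 5 == 2 * 2 + 1
rebuilt = floor_division * 2 + remainder

print(true_division, floor_division, remainder, rebuilt)
```

Running the cell should display 3.0 2 1 5, which shows that / gives a float while // and % stay with integers.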
1.2. Logical operators
What if you want to compare two inputs to see if one is equal to, or greater than, the other?
In such cases, you can use comparison operators, which return a logical value (True or False). They include:
- ==: equal
- !=: different from
- >=: greater than or equal
- >: greater than
- <=: smaller than or equal
- <: smaller than
Example:
(2 + 4) == (2 * 3)
The output will be of type bool, which will be discussed in the data types section.
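As a quick illustration of the operators listed above, the sketch below compares the two sides of the example and stores each result in a variable. The names left, right and the is_... variables are only chosen for this example.

```python
left = 2 + 4
right = 2 * 3

is_equal = left == right             # both sides are 6
is_different = left != right
is_greater = left > 5
is_smaller_or_equal = left <= 5

print(is_equal, is_different, is_greater, is_smaller_or_equal)
print(type(is_equal))
```

The first line of output should read True False True False, and the second line confirms that the result of a comparison is a bool.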
2. Variables, expressions and statements
One of the most powerful features of a programming language is the ability to manipulate variables. A variable is a name that refers to a value.
Note that I use the term refers to a value instead of holds a value. Wonder why? Because a variable is just a name pointing to the real value stored in memory.
An assignment statement creates a new variable and gives it a value. In Python, assignment statements are written as follows: variable = value
Example:
number = 5
multiplication_expression = 5 * 2
squared = 2 ** 2
print(number)
Can you guess which of the 4 lines above are assignment statements?
In the above example, the first 3 lines are assignment statements, whereas the last one is not. print(number) only displays 5 on the screen; it does not create any variable, so it is not an assignment statement. A line like print(number) is still a statement. A statement is a unit of code that has an effect, like creating a variable or displaying a value. Generally, statements don't have any values.
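The idea that a variable refers to a value, rather than holding it, can be seen in a small experiment. This is only a sketch: it uses a list (a data type introduced in the next section), and the names first, second and third are invented for the example.

```python
first = [1, 2, 3]
second = first            # no copy is made: both names refer to the same list
second.append(4)          # change the list through the second name

print(first)              # the change is visible through the first name too
same_object = first is second
print(same_object)

third = list(first)       # list() builds a new, separate list
third.append(5)
print(first, third)       # first is unaffected by the change to third
```

Because first and second point to the same value in memory, changing it through one name changes what the other name shows. Creating a real copy, as with third, breaks that link.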
3. Data types
Most of the code we have written above uses only one data type (int); I hope you noticed that. But data types in Python are not limited to int: there are other built-in data types, as well as data types from external modules. Built-in data types are types which can be used directly without installing any libraries. You can use the type() function to retrieve the type of any value.
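For instance, a minimal sketch of type() applied to a few of the values seen so far could look like this (the variable names are arbitrary):

```python
type_of_int = type(5)
type_of_float = type(6 / 2)       # division returns a float
type_of_bool = type(2 > 1)        # comparisons return a bool
type_of_str = type('cutecat')

print(type_of_int, type_of_float, type_of_bool, type_of_str)
```

Each of these types is described in the sections that follow.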
3.1. Built-in
- Numeric
  - Integer: int
    Used to store whole numbers (positive, negative or zero). Example: 4, 2, 12
  - Float: float
    Used to store fractional numbers. Accurate to about 15 significant digits. Example: 4.5, 2.3
  - Complex: complex
    Used to store complex numbers, written in the form x + yj. Example: 2 + 1j
- Boolean: bool
  Used to store logical values. Example: True, False (notice which character is capitalized)
- Sequence
  A sequence is an ordered collection of other values. You can use the len() function to get the length of a sequence.
  # string
  animal = 'cutecat'
  # list
  a_list = [1, 'element 2']
  # how to access each value of the sequence?
  letter = animal[0]
  print("letter is " + letter)
  # how to access multiple values of the sequence?
  letters_1 = animal[1:]
  letters_2 = animal[:-1]
  letters_3 = animal[1:-2]
  letters_4 = animal[1:2]
  print("letters_1 is " + letters_1)
  print("letters_2 is " + letters_2)
  print("letters_3 is " + letters_3)
  print("letters_4 is " + letters_4)
  # changing an element of a list
  a_list[0] = 'changed element'
  print('changed list', a_list)
General
- How to access each value of the sequence?
  You can do it by using the indexing operator [index]. Forward indexes are numbered from 0, backward indexes from -1.
  In the example above, cutecat is a string. Note that you need to use quotation marks if you want to denote something as a string. You should copy the script above into your notebook and run it to see what the result looks like.
- How to access multiple values of the sequence?
  Do it by applying the slicing syntax [index1:index2]. This syntax can be explained as follows: get the values from index 1 (inclusive) to index 2 (exclusive). You can also use [:index2] to get the values from the beginning to index 2 (exclusive), or [index1:] to get the values from index 1 (inclusive) to the end of the sequence.
- How to concatenate sequences?
  You can concatenate sequences by using +. Example: 'cute' + 'cat'. The output will be 'cutecat'.
  You can also use the multiplication operator *, which repeats a sequence. Example: 'cutecat' * 2. The output will be 'cutecatcutecat'.
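The indexing, slicing and concatenation rules above can be checked with the same cutecat string. The sketch below stores each result in a variable whose name is only chosen for illustration; the expected value is given in the comment next to it.

```python
animal = 'cutecat'

first_letter = animal[0]       # 'c'  (forward index starts at 0)
last_letter = animal[-1]       # 't'  (backward index starts at -1)
from_second = animal[1:]       # 'utecat'
without_last = animal[:-1]     # 'cuteca'
middle = animal[1:-2]          # 'utec'
one_letter = animal[1:2]       # 'u'  (index 2 is excluded)
length = len(animal)           # 7

repeated = ('cute' + 'cat') * 2  # 'cutecatcutecat'

print(first_letter, last_letter, from_second, without_last, middle, one_letter)
print(length, repeated)
```

The same operations work on lists and tuples, since they are sequences too.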
Sequence data types
- String: str
  A string is defined within quotation marks ' '.
  Strings are immutable: you cannot change a character in place, but you can build new strings by adding/multiplying strings with the + or * operator.
- List: list
  A list is one of the most used data types in Python and is very flexible. The items in a list do not need to be of the same type. Example: a_list = [1, 'element 2'].
  A list is defined within square brackets [], where items are separated by commas.
  Lists are mutable. You can add/multiply, change and delete list elements with ease.
  - Add/Multiply: use the + or * operator
  - Change: a_list[0] = 'changed element'
    Input:
    a_list = [1, 'element 2']
    a_list[0] = 'changed element'
    print(a_list)
    Output:
    ['changed element', 'element 2']
  - Delete: del a_list[0]
    Input:
    a_list = [1, 'element 2']
    del a_list[0]
    print(a_list)
    Output:
    ['element 2']
- Tuple: tuple
  Tuples are like lists, but they are immutable. Tuples are used to write-protect data and are usually slightly faster than lists because they cannot change dynamically.
  A tuple is defined within parentheses (), where items are separated by commas.
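To see the difference between mutable and immutable sequences in practice, here is a minimal sketch that tries to change the first element of a list, a tuple and a string. The ..._changed flags are hypothetical names used only to record what happened.

```python
a_list = [1, 'element 2']
a_list[0] = 'changed element'        # works: lists are mutable

a_tuple = (1, 'element 2')
try:
    a_tuple[0] = 'changed element'   # fails: tuples are immutable
    tuple_changed = True
except TypeError:
    tuple_changed = False

animal = 'cutecat'
try:
    animal[0] = 'm'                  # fails: strings are immutable too
    string_changed = True
except TypeError:
    string_changed = False

# Building a new string from pieces of the old one is always possible.
new_animal = 'm' + animal[1:]

print(a_list, tuple_changed, string_changed, new_animal)
```

The try/except syntax is covered later in the series; here it only keeps the cell from stopping at the first error.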
- Mapping
  Dictionary: dict. A Python dictionary is a collection of key-value pairs (since Python 3.7 it keeps the order in which the pairs were inserted). A dictionary is designed to retrieve data quickly. However, you need to know the key in order to retrieve its value.
  A dictionary is defined within curly brackets {}, where items are in the form of key:value pairs. A value can be of any kind, but a key needs to be of an immutable data type such as a number, a string or a tuple (a list, for example, cannot be a key).
  You can retrieve, add and change the data inside a dictionary using a syntax like the indexing operator of sequences.
  - Retrieve:
    Input:
    dictionary = {1:'value','key':2}
    print("first value is", dictionary[1])
    Output:
    first value is value
  - Add:
    Input:
    dictionary = {1:'value','key':2}
    dictionary['new_key'] = 'new_value'
    print(dictionary)
    Output:
    {1: 'value', 'key': 2, 'new_key': 'new_value'}
  - Change:
    Input:
    dictionary = {1:'value','key':2}
    dictionary['key'] = 'new_value'
    print(dictionary)
    Output:
    {1: 'value', 'key': 'new_value'}
- Set
  A set is an unordered collection of unique items. A set is defined by values separated by commas inside braces { }. Since a set is an unordered collection, indexing has no meaning. Hence, the indexing and slicing operator [] does not work.
  You can use set operations like union and intersection.
  - Union:
    Input:
    set1 = {12,3,1}
    set2 = {12,1,3,4,4,6,13}
    print(set1.union(set2))
    Output:
    {1, 3, 4, 6, 12, 13}
  - Intersect:
    Input:
    set1 = {12,3,1}
    set2 = {12,1,3,4,4,6,13}
    print(set1.intersection(set2))
    Output:
    {1, 3, 12}
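The dictionary and set operations above can be combined into one short sketch. It also uses a few standard features not shown earlier in this post, which are worth knowing: dict.get for a safe lookup, del for removing a key, and the |, & and - operators as shorthand for set operations. sorted() is used so that the displayed order does not depend on how Python stores the set.

```python
dictionary = {1: 'value', 'key': 2}

first_value = dictionary[1]
# get returns a default instead of raising KeyError for an unknown key.
missing = dictionary.get('unknown', 'no such key')

dictionary['new_key'] = 'new_value'   # add
del dictionary['key']                 # delete
keys = list(dictionary)               # the remaining keys, in insertion order

set1 = {12, 3, 1}
set2 = {12, 1, 3, 4, 4, 6, 13}        # the duplicate 4 is stored only once

union = sorted(set1 | set2)           # same as set1.union(set2)
intersection = sorted(set1 & set2)    # same as set1.intersection(set2)
difference = sorted(set2 - set1)      # items of set2 that are not in set1

print(first_value, missing, keys)
print(union, intersection, difference)
```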
3.2. External data types
Apart from the built-in data types, there are some other important data types that you need to know in order to master the art of data exploration, including but not limited to the NumPy array and the pandas DataFrame.
- NumPy array
A NumPy array is a grid of values, all of the same type, indexed by a tuple of non-negative integers. A list is the closest built-in Python equivalent of an array, but it is resizeable and can contain elements of different types. However, lists have some disadvantages.
Firstly, a list takes more memory than an array holding the same numbers, and numerical operations on a list are slower than on a NumPy array.
Secondly, we cannot perform element-wise arithmetic on a list. What would you expect the program to display for the input below?
test = [1,2,3]
print(test * 2)
If you expect to get a result like [2, 4, 6], then a list is not a good choice here. With a list, as in the above example, the program will output [1, 2, 3, 1, 2, 3]. With a NumPy array, the same multiplication is applied to every element, which is something we cannot do directly with a list.
Another advantage of NumPy is that you can create true multi-dimensional arrays, which Python lists can only imitate by nesting lists inside lists.
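A minimal sketch of the comparison, assuming NumPy is installed (pip install numpy) and imported under its usual alias np. The variable names are illustrative only.

```python
import numpy as np

test_list = [1, 2, 3]
test_array = np.array(test_list)

repeated_list = test_list * 2         # the list is repeated: [1, 2, 3, 1, 2, 3]
doubled_array = test_array * 2        # every element is doubled: 2, 4, 6

# A two-dimensional array with 2 rows and 3 columns.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
matrix_shape = matrix.shape           # (2, 3)
column_sums = matrix.sum(axis=0)      # sums down each column: 5, 7, 9

print(repeated_list)
print(doubled_array)
print(matrix_shape, column_sums)
```

The shape attribute is the tuple of dimensions, and axis=0 tells sum to add the values down each column.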
- Pandas dataframe
The pandas DataFrame is modeled on the R data frame. A DataFrame is a way to store data in a rectangular grid that can easily be overviewed. Each row of the grid corresponds to the measurements or values of one instance, while each column is a vector containing the data for one specific variable. This means that the values in a row do not need to be of the same type, although they can be: they can be numeric, character, logical, etc.
DataFrames are similar to SQL tables or the spreadsheets that you work with in Excel or Google Sheets. pandas is also tightly integrated with the rest of the Python and NumPy ecosystem, which is a huge advantage.
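As a rough illustration of the rows-and-columns idea, the sketch below builds a tiny DataFrame, assuming pandas is installed (pip install pandas). The column names and values are invented for the example.

```python
import pandas as pd

# Each key becomes a column; each position in the lists becomes a row.
df = pd.DataFrame({
    'name': ['An', 'Binh', 'Chi'],        # text column
    'age': [23, 31, 27],                  # numeric column
    'is_student': [True, False, False],   # logical column
})

n_rows, n_columns = df.shape
mean_age = df['age'].mean()

# Keep only the rows where age is above 25, then take the name column.
older_than_25 = df[df['age'] > 25]['name'].tolist()

print(df)
print(n_rows, n_columns, mean_age, older_than_25)
```

Each column holds one type of data, while a single row mixes text, numbers and logical values.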
4. Functions, methods and packages
- Methods
  - A method is called by its name, but it is associated with an object (dependent).
  - A method is implicitly passed the object on which it is invoked.
  - It may or may not return any data.
  - A method can operate on the data (instance variables) contained in the corresponding class.
- Functions
  - A function is a block of code that is also called by its name (independent).
  - A function can have different parameters or none at all. If any data (parameters) are passed, they are passed explicitly.
  - It may or may not return any data.
  - A function is not tied to a class or its instances.
For example:
string = 'I am a string'
print(string.upper()) # upper is a method
print(len(string)) # len is a function
As you may have seen, methods and functions are pretty much the same, but a function can be called by its name alone, as it is defined independently. A method cannot be called by its name alone: we need to invoke it through an object of the class in which it is defined. In other words, a method is defined within a class and hence depends on that class.
You can find out more about classes and object-oriented programming here.
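The difference becomes clearer when you write both yourself. The sketch below is only an illustration: shout, Cat and greet are hypothetical names, and writing functions and classes is covered in detail in later posts of the series.

```python
def shout(text):
    """A function: the data is passed explicitly as an argument."""
    return text.upper() + '!'


class Cat:
    def __init__(self, name):
        self.name = name          # instance variable stored on the object

    def greet(self):
        """A method: the object it is called on arrives implicitly as self."""
        return 'Meow, I am ' + self.name


function_result = shout('hello')   # called by its name alone

cat = Cat('Tom')
method_result = cat.greet()        # called through an object
same_call = Cat.greet(cat)         # what Python does behind the scenes

print(function_result, method_result, same_call)
```

The last two calls return the same text, which shows that cat.greet() is just a shorter way of passing cat to the method defined in the Cat class.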
- Packages
  To understand Python packages, you need to understand the concept of a Python module first.
  A module can be considered a code library: a file containing a set of functions you want to include in your application. A Python package is a collection of Python modules.
  Some of the most popular packages include SciPy (for scientific and statistical computing), pandas (for dataframes) and sklearn (for machine learning).
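Modules and packages are brought into your code with the import statement. The sketch below uses only modules that ship with Python, so it runs without installing anything. External packages such as pandas are imported in the same way once they have been installed with pip.

```python
import math                       # import a whole module
from statistics import mean      # import a single function from a module
import collections as col        # import a module under a shorter alias

root = math.sqrt(16)              # call a function through the module name
average = mean([1, 2, 3, 4])      # call the imported function directly
counts = col.Counter('cutecat')   # count how often each letter appears

print(root, average, counts['c'], counts['u'])
```

The alias form is the one you will meet most often in data science code, for example import numpy as np or import pandas as pd.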
II. Advanced
1. Loop
So far we have learnt how to run code cells. But what if you want to run a piece of code repeatedly? Do we need to press Ctrl+Enter many times? Luckily, there are loops to help us with such tasks.
Python has two primitive loop commands: the while statement and the for statement.
- while statement
  There are two main parts of a while loop: the condition and the set of statements. With the while loop we can execute a set of statements as long as a condition is true.
  i = 1
  while i < 6:
      print(i)
      i += 1
  In the above example, the condition of the loop is i < 6 and the statements to be executed are print(i) and i += 1. The loop can be interpreted as follows: starting from i = 1, display i on the screen and increment i by one unit, and repeat that as long as i is less than 6. The last value displayed is 5; the loop ends once i reaches 6.
  In a while loop, you can also use some keywords to make it suit your needs, including break, continue and else.
  - break: with the break statement we can stop the loop even if the while condition is true
    # exit the loop when i = 3
    i = 1
    while i < 6:
        print(i)
        if i == 3:
            break
        i += 1
  - continue: stop the current iteration, and continue with the next
    # Continue to the next iteration if i is 3 (do not print 3)
    i = 0
    while i < 6:
        i += 1
        if i == 3:
            continue
        print(i)
  - else: run a block of code once when the condition is no longer true
    # Print a message once the condition is false:
    i = 1
    while i < 6:
        print(i)
        i += 1
    else:
        print("i is no longer less than 6")
- for statement
  Remember the sequence data types above? Their elements can be iterated over by a for loop.
  fruits = ["apple", "banana", "cherry"]
  for x in fruits:
      print(x)
  In the above example, the loop can be interpreted as follows: for each element in fruits, print that element.
  You can also include a for loop inside a for loop, making it a nested loop, in which the "inner loop" will be executed one time for each iteration of the "outer loop".
  adj = ["red", "big", "tasty"]
  fruits = ["apple", "banana", "cherry"]
  for x in adj:
      for y in fruits:
          print(x, y)
  In a for loop, the break, continue and else keywords discussed in the above section can also be applied.
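Here is a small sketch of break, continue and else working together in a for loop. It looks for the first even number in a list; the list and the variable names are made up for the example. It also uses the built-in range() function, which produces a sequence of integers to loop over.

```python
numbers = [1, 3, 4, 7, 8]

first_even = None
for n in numbers:
    if n % 2 == 1:
        continue              # odd number: skip to the next iteration
    first_even = n
    break                     # even number found: stop the loop
else:
    # This block runs only if the loop finished without hitting break.
    first_even = 'no even number'

# range(1, 4) yields 1, 2, 3 (the end value is excluded, like in slicing).
squares = []
for i in range(1, 4):
    squares.append(i ** 2)

print(first_even, squares)
```

Note that the else block is skipped when the loop is ended by break, which makes it handy for "search and report if nothing was found" patterns.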
2. Conditional & Control flow
From the previous sections of this article, you now have quite a bit of Python code under your belt. Apart from loops, everything you have seen so far has consisted of sequential execution, in which statements are always performed one after the next, in exactly the order specified.
But the world is often more complicated than that. Frequently, a program needs to skip over some statements, execute a series of statements repetitively, or choose between alternate sets of statements to execute. That is where control structures come in. A control structure directs the order of execution of the statements in a program.
Conditional control flow is written with an if statement in Python. The syntax is as follows:
if <expr1>:
    <statement1>
elif <expr2>:
    <statement2>
else:
    <statement3>
in which each <expr> is an expression evaluated in a Boolean context (giving a result of True or False). Evaluation follows a top-down order, meaning that <expr2> is evaluated only if <expr1> is False, and <statement3> is executed only if both <expr1> and <expr2> are False.
Note that you can also include control flow inside loops.
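As a concrete instance of the syntax above, the following sketch labels a few numbers as negative, zero or positive, and runs the check inside a for loop. The function name describe and the sample values are only for illustration.

```python
def describe(number):
    """Return a label for the sign of a number."""
    if number < 0:
        return 'negative'
    elif number == 0:         # checked only if the first condition was False
        return 'zero'
    else:                     # reached only if both conditions were False
        return 'positive'


labels = []
for value in [-2, 0, 5]:
    labels.append(describe(value))

print(labels)
```

Only one of the three branches runs for each value, following the top-down order described above.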
III. Using external packages
Now that you have a basic knowledge of Python, the next step is to leverage the advantages of external packages.
The most common packages include:
- NumPy and Pandas for data wrangling
- matplotlib and seaborn for visualization
- sklearn for machine learning
To learn these packages and to review what you have learnt so far, you can refer to cheat sheets, which will give you a brief overview of the packages and the most common functions in them.
I have collected several cheat sheets so far; you can find the links to download them below:
- Basic Python (source: dataquest)
- Intermediate Python (source: dataquest)
- Pandas (source: dataquest)
- Seaborn (source: datacamp)
- Scikit-learn (source: datacamp)
IV. Last words
Now that you have enough in your data science toolbox, you can start doing your first project.
Guidance on your first project will be online shortly. So stay tuned!
If you have any questions, feel free to comment below, or contact me by the ‘Contact me!’ button on the top right corner of the website.
In the meantime, practice again and again and againnn.