My Data Science


Data is the new oil.

Data Science is the sexiest job of the 21st century.

These days, you have probably heard sayings like these at least once. The popularity and usefulness of Data Science are undeniable, but do you really understand what Data Science is?

From my personal experience, a data scientist is someone who knows more about Computer Science than a statistician, and more about Statistics than a computer scientist.

To become such a data scientist, I have been taking the following approach:

Basic Programming

To do data science, the first thing you need to know is how to code. Of course, you can use applications like Stata or SPSS for some statistical tasks.

However, programming languages like R and Python give you more flexibility and broader access to algorithms.

There are great paid resources for learning to code, such as DataCamp and DataQuest. If you want to learn R, Python or SQL, I will also do my best to help you through posts on this website.

Starting your first project

With the skills you have learnt from basic programming, you can now start small projects that analyze data and extract insights, such as exploring the classic Iris species dataset.

This is not only a great way to sharpen your analytical skills, but also a way to review your coding ability. Learning without practice is like water off a duck’s back: you will easily forget what you have learnt if you don’t review it regularly.

My favorite go-to websites for such projects are Kaggle and DrivenData.
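To give a flavour of such a first project, here is a minimal Python sketch that summarises a handful of Iris rows by species using only the standard library. The sample is hard-coded for illustration. It contains three rows per species taken from the classic Iris data. The function name `mean_petal_length_by_species` is hypothetical. A real project would load the full 150-row dataset from a CSV file, for example one downloaded from Kaggle.

```python
from collections import defaultdict
from statistics import mean

# Hard-coded illustrative sample, three rows per species:
# (sepal_length, sepal_width, petal_length, petal_width, species), lengths in cm.
# A real project would read the full dataset from a file instead.
SAMPLE = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]


def mean_petal_length_by_species(rows):
    """Group rows by species and return the average petal length of each group."""
    groups = defaultdict(list)
    for _sepal_len, _sepal_wid, petal_length, _petal_wid, species in rows:
        groups[species].append(petal_length)
    return {species: mean(values) for species, values in groups.items()}


if __name__ == "__main__":
    for species, avg in mean_petal_length_by_species(SAMPLE).items():
        print(f"{species:<11} mean petal length = {avg:.2f} cm")
```

Even on this tiny sample, the average petal length of setosa (about 1.37 cm) stands well apart from versicolor (4.70 cm) and virginica (about 5.67 cm). Simple group-by summaries like this are often the first insight a small project surfaces.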

Math/Statistics

Math and statistics are the heart of data science. Without them, algorithms could not be developed and machine learning would never have been born. Nonetheless, math and statistics often do not get enough attention, mostly because of some “coding bootcamps” that promise you can skip years of learning math and become a data scientist in just a few weeks.

Well, that is true, if you want to become a shitty data scientist!

I would recommend Khan Academy for a basic understanding of these subjects. I will also include some posts dedicated to them on this website for those who want to learn directly from this blog.
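Here is one small illustration of the statistics hiding inside a familiar algorithm: simple linear regression. With one predictor, fitting it by ordinary least squares reduces to two textbook formulas. The first is slope = S_xy / S_xx, the sum of cross-deviations divided by the sum of squared deviations of x. The second is intercept = mean(y) - slope * mean(x). The sketch below computes these directly. The function name and the data points are hypothetical, chosen only for illustration.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (single predictor)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations (proportional to the covariance of x and y).
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared deviations of x (proportional to the variance of x).
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    # These points lie exactly on y = 2x + 1, so the fit should recover slope 2 and intercept 1.
    slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
    print(slope, intercept)
```

Knowing where these formulas come from helps you judge when the model is appropriate. It also tells you why the fit breaks down when all the x values are identical.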

Machine Learning

Supervised, unsupervised and reinforcement learning also play a crucial role in a data scientist’s toolbox. They can unearth insights and patterns in input data that would be hard or impossible to spot with the naked eye.
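The supervised case can be made concrete with a minimal k-nearest-neighbours classifier in plain Python. It "learns" from labelled examples simply by storing them. It then labels a new point by majority vote among the k closest stored points. The function name `knn_predict`, the use of Euclidean distance and the toy points are all illustrative assumptions. A library such as scikit-learn provides tested implementations for real work.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training point's Euclidean distance to the query with its label, closest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count labels among the k closest points and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D data: two well-separated clusters labelled "a" and "b".
    points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, (1.2, 1.4)))
    print(knn_predict(points, labels, (8.2, 8.1)))
```

The "pattern" here is simply that nearby points tend to share a label. Unsupervised methods such as clustering look for similar structure without being given any labels at all.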

Deep Learning

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. From my point of view, if you don’t work at a heavily technical company, it is hard to use deep learning to its full potential, since it typically needs a huge amount of data (often millions of records) and comes with a high computational cost.

Still, knowledge of deep learning can help you in large-scale projects such as automatic speech recognition, image recognition and natural language processing.
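Artificial neural networks are built from layers of simple units. Each unit computes a weighted sum of its inputs and passes it through a non-linear activation. The sketch below shows only the forward pass of a tiny two-layer network that computes XOR. I picked its weights by hand. This is an assumption made for illustration, since real networks learn their weights from data with backpropagation. Real deep-learning models also have far more layers and are usually built with a framework rather than by hand. All names in the code are hypothetical.

```python
def step(z):
    """Heaviside step activation: 1 if the weighted input is positive, else 0."""
    return 1 if z > 0 else 0


def dense_layer(inputs, weights, biases):
    """One fully connected layer: each unit sums its weighted inputs, adds a bias, applies step."""
    return [
        step(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


# Hand-picked (not learned) parameters: hidden unit 1 computes OR, hidden unit 2 computes AND.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [-0.5, -1.5]
# The output unit fires when OR is true but AND is false, which is exactly XOR.
OUTPUT_WEIGHTS = [[1.0, -1.0]]
OUTPUT_BIASES = [-0.5]


def xor_network(x1, x2):
    """Run the two-layer forward pass on a pair of binary inputs."""
    hidden = dense_layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return dense_layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor_network(a, b))
```

A single unit cannot compute XOR on its own, because the two classes are not linearly separable. The hidden layer solves this by building intermediate features (OR and AND) that make the final decision easy. That is a very small taste of the "representation learning" in the definition above.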

Last words

I will do my best to cover these aspects of data science on my blog, so that you can take a structured approach to the field. Stay tuned for more updates!