Python for Data Science

Mohammed Ameen
Published in Analytics Vidhya
7 min read · Nov 10, 2020

Python is a popular programming language used heavily in Data Science. In these articles, we will cover all the basics you should know before we look at how to use Python for Data Science. The concepts we will cover are:

  1. Introduction to Python.
  2. Data Structures.
  3. Conditions and Loops.
  4. Functions.
  5. Exceptions and File Handling.
  6. Regular Expressions and Web Scraping.
  7. Object-Oriented Programming (OOP).

Once we are done with the topics mentioned above, we are going to learn three important libraries. These libraries are:

  1. NumPy.
  2. Pandas.
  3. Matplotlib.

To make it easier to read, learn and practice, I’ll break down these ten topics into ten articles. I hope you are all excited and eager to learn. Let’s dive in.

1. Introduction to Python

What is Python? Why should we use it for Data Science?

Python is a popular, dynamic, interpreted, object-oriented, high-level programming language that is easy to learn and understand. Guido van Rossum began developing it in the late 1980s, and today it can be found almost everywhere. To name a few examples, Python is used not only in small projects but also at big organisations such as Google, Microsoft, Facebook, Netflix and NASA.

Python is one of the fastest-growing programming languages, according to Stack Overflow.

Python is widely used in Data Science. It is a favourite tool of data scientists because it is easy to learn and supported on multiple platforms. Thanks to its massive collection of libraries, it is a popular choice among data scientists for all kinds of data science projects and applications. Its open-source community is growing day by day.

Some of the advantages of using Python:

  1. Readable and Maintainable code.
  2. Helps in developing large and complex software applications.
  3. Allows us to run the same code in different systems.
  4. Very large and robust libraries.
  5. Many open-source frameworks and tools such as NumPy, Pandas, Matplotlib, SciPy and scikit-learn are used in machine learning and data analytics.

Before we get started, I highly recommend that you install the latest version of Python and download Anaconda. Get familiar with Jupyter Notebook and you will be good to go!

Our First Python program (Hello World)

Open Jupyter Notebook, type the following code into a cell and press Shift + Enter. There you go. Congratulations, you just wrote your first program. Hurray!!
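The code cell from the original post is not reproduced here, so what follows is only a minimal sketch of what a first program usually looks like. The variable name message is my own choice for illustration.

```python
# Store the greeting in a variable (the name "message" is just an example)
message = "Hello, World!"

# print() writes the text to the output area below the cell
print(message)
```

Running the cell with Shift + Enter should show the greeting right below it.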

Python: Variables and Keywords

Variables are containers for storing data values. In Python, we assign values to variables. The variable is created the moment you assign a value to it.

Let’s try to visualise it! Imagine you have a box in which you can place a single item. The box is the container and the item is the data value.

As you can see from the picture above, the box is a variable that holds a string: “Samia”. In Python 3, a string is a sequence of Unicode characters (e.g. “H20-O2” is a string). In Python, we can identify strings by single or double quotation marks.

It is very important that we understand the assignment operator ( = ).

To understand it better, let’s take an example: say we create a variable named myNum that holds an integer: 4.

Each time we try to use the same variable box to store another number, the initial value gets erased. It is important to know that in Python every variable can be overwritten.
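Here is a small sketch of that overwriting behaviour. myNum and the value 4 come from the example above; the second value, 7, is an arbitrary number picked for illustration.

```python
myNum = 4       # the box now holds 4
print(myNum)    # 4

myNum = 7       # assigning again replaces the old value (7 is an arbitrary choice)
print(myNum)    # 7
```

After the second assignment the 4 is gone; the variable only remembers the latest value.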

Let’s take another example

Say we have a student’s information, including whether they are covid positive or negative. Let’s take four attributes of a particular student (name, age, marks, positive_or_negative) and store them in Python variables. Type the following code in a Jupyter notebook cell:

name = "Carlton"
age = 21
marks = 92.5
positive_or_negative = False

Just like other programming languages, Python has different data types. As you can see, the name variable holds “Carlton”, a string value surrounded by double quotation marks.

The age and marks variables hold the integer 21 and the float 92.5, which are numeric values. There are three numeric data types in Python:

  1. int
  2. float
  3. complex

The positive_or_negative variable holds a Boolean value. A Boolean literal can have only two values: True or False.

In numeric contexts, True is equal to 1 and False is equal to 0.
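A quick way to convince yourself of this is to compare and add Boolean values directly. The variable name total below is only for illustration.

```python
positive_or_negative = False

print(True == 1)     # True
print(False == 0)    # True

# Booleans behave like 1 and 0 in arithmetic
total = True + True
print(total)         # 2

# bool is a subclass of int, which is why the above works
print(isinstance(positive_or_negative, int))   # True
```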

To check the type of an object in Python, we use the type() function. Don’t worry about functions for now. Just try it!!

Example

x = 10

print(type(x))
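The same check can be applied to the four student variables from earlier. This is just a sketch reusing those values:

```python
name = "Carlton"
age = 21
marks = 92.5
positive_or_negative = False

print(type(name))                   # <class 'str'>
print(type(age))                    # <class 'int'>
print(type(marks))                  # <class 'float'>
print(type(positive_or_negative))   # <class 'bool'>
```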

Python Basic Mathematical Operators.

Operators are used in performing operations on variables and values. There are various mathematical operators in Python:

  • Arithmetic operators
  • Comparison operators
  • Logical operators
  • Assignment operators
  • Identity operators
  • Membership operators

1. Arithmetic Operators.

Arithmetic operators are used with numeric values to perform common mathematical operations.

Let’s take two variables x and y and perform these operations in a Jupyter notebook:
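The notebook screenshot is not available here, so the snippet below is a sketch with values of my own choosing (x = 7 and y = 2 are assumptions, not the original numbers).

```python
x = 7   # assumed value for illustration
y = 2   # assumed value for illustration

print(x + y)    # addition: 9
print(x - y)    # subtraction: 5
print(x * y)    # multiplication: 14
print(x / y)    # division (always gives a float): 3.5
print(x % y)    # modulus, the remainder of the division: 1
print(x ** y)   # exponentiation: 49
print(x // y)   # floor division: 3
```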

2. Comparison Operators.

Comparison operators are used to compare two values. The result is a Boolean value: True or False.

Performing comparison operators on two variables x = 4 and y = 6 in Jupyter notebook:
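With x = 4 and y = 6 as above, the comparisons would look roughly like this:

```python
x = 4
y = 6

print(x == y)   # equal to: False
print(x != y)   # not equal to: True
print(x > y)    # greater than: False
print(x < y)    # less than: True
print(x >= y)   # greater than or equal to: False
print(x <= y)   # less than or equal to: True
```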

3. Logical operators.

Logical operators are used to combine conditional statements.
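Python has three logical operators: and, or and not. A small sketch, reusing x = 4 and y = 6 from the comparison example (the conditions themselves are made up for illustration):

```python
x = 4
y = 6

both = x < 5 and y < 10     # True only if both conditions are True
either = x > 5 or y > 5     # True if at least one condition is True
negated = not both          # flips True to False and False to True

print(both)      # True
print(either)    # True
print(negated)   # False
```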

4. Assignment Operators.

Assignment operators are used in assigning values to variables. Some of the operators are:

Let’s try to understand the assignment operators by assigning a value to a variable c. I want you to have fun and play around with different operators in your Jupyter notebook.

I’m gonna assign an integer value 23 to the variable c.
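The original screenshots are missing, so here is a sketch of how that could go. Starting from c = 23, each line updates c in place; the numbers on the right-hand side are arbitrary.

```python
c = 23      # plain assignment
c += 2      # same as c = c + 2   -> 25
c -= 5      # same as c = c - 5   -> 20
c *= 3      # same as c = c * 3   -> 60
c /= 4      # same as c = c / 4   -> 15.0 (division gives a float)
c //= 2     # same as c = c // 2  -> 7.0
c %= 4      # same as c = c % 4   -> 3.0
c **= 2     # same as c = c ** 2  -> 9.0

print(c)    # 9.0
```

Try printing c after every line to see how the value changes step by step.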

5. Identity Operators.

Identity operators are used to compare objects, not to check whether they are equal but whether they are the same object, with the same memory location:

Operator    Description                                                  Example
is          Returns True if both variables are the same object.         x is y
is not      Returns True if both variables are not the same object.     x is not y
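The difference between "equal" and "the same object" is easiest to see with lists. In this sketch, x, y and z are names I picked for illustration:

```python
x = ["a", "b"]
y = ["a", "b"]   # a separate list with the same contents
z = x            # z points to the very same list as x

print(x == y)       # True: the contents are equal
print(x is y)       # False: they are two different objects
print(x is z)       # True: both names refer to one object
print(x is not y)   # True
```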

6. Membership operators.

Membership operators are used to test whether a sequence is present in an object.

Operator    Description                                                                        Example
in          Returns True if a sequence with the specified value is present in the object.     x in y
not in      Returns True if a sequence with the specified value is not present in the object. x not in y

For example, let’s take the name ram and assign it to a variable called name. Let’s say we want to know whether a particular letter is present in it or not.

name = "ram"
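The article does not say which letter is being checked, so the letters "a" and "z" below are my own picks for a quick sketch:

```python
name = "ram"

print("a" in name)       # True: "a" appears in "ram"
print("z" in name)       # False: "z" does not appear in "ram"
print("z" not in name)   # True
```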

Note: try it with your own values in your Jupyter notebook! You will only learn when you practice and understand the concept.

Python Operator Precedence.

Operator precedence affects how an expression is evaluated.

The following table lists all operators from highest precedence to lowest.

Example:

x = 10 + 6 * 3; here, x is assigned 28, not 48, because the operator * has higher precedence than +. So Python multiplies 6 * 3 first and then adds 10.
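You can check this yourself, and use parentheses when you want the addition to happen first. The variable y is only there for comparison:

```python
x = 10 + 6 * 3      # * is evaluated before +
y = (10 + 6) * 3    # parentheses force the addition to go first

print(x)   # 28
print(y)   # 48
```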

Python Comments

Comments are my favorite. They not only help someone else understand your code but also make the code more readable.

Comments start with a #, and Python ignores whatever you write after it on that line.
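A short sketch of the two common ways to write comments, reusing the marks variable from earlier:

```python
# A comment can take up a whole line, like this one
marks = 92.5  # or it can follow code on the same line

# print("this line is commented out, so it never runs")
print(marks)  # 92.5
```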

Conclusion.

In this article, we covered the introduction to Python and the fundamentals you need to start your journey into Python for Data Science. You learned what variables are, the different data types, and the different Python operators and how to use them. In the next article, I will introduce data structures in Python. Till then, happy learning!

Data Science enthusiast-Love reading books and exploring different cultures.