Numpy V Pandas

November 18, 2019

Numpy and Pandas are separate packages, but they are closely related and are used together all the time.
In this post I share the differences between them and which one is the better fit for a particular situation. Both are powerful and are building blocks of data analysis and scientific computing. Before we get to the main content, here is a brief introduction to each.

Numpy

Numpy, short for NUMerical PYthon, provides the ndarray (N-dimensional array), which holds homogeneous data. It is implemented largely in C, with some Cython and other lower-level code. An ndarray is similar to a conventional array or list, but operations on it are much faster.

>>> import numpy as np
>>> a = np.arange(10).reshape(2,5)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

We can get the datatype of the array by using the dtype.name attribute (the default integer type is platform dependent; int64 is typical on 64-bit systems).

>>> a.dtype.name
'int64'
>>> type(a)
<class 'numpy.ndarray'>
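
To make the "homogeneous data" point concrete, here is a minimal sketch. The names mixed, arr and doubled are only illustrative. It shows that when a list mixes integers and floats, Numpy stores every element with a single common dtype, and that arithmetic is applied to the whole array at once.

```python
import numpy as np

# A plain Python list can mix integers and floats freely.
mixed = [1, 2.5, 3]

# Numpy picks one dtype that can represent every element,
# so the integers are upcast to floating point.
arr = np.array(mixed)
print(arr.dtype.name)    # float64

# Arithmetic is applied to the whole array, without an explicit loop.
doubled = arr * 2
print(doubled.tolist())  # [2.0, 5.0, 6.0]
```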

For more info: Numpy Documentation

Pandas

The name pandas is derived from PANel DAta. Its central structure is the DataFrame, which holds heterogeneous data. It is built on top of Numpy and so inherits many of its qualities. DataFrames are similar to Numpy arrays, but they can do more than that. We can also convert a pandas DataFrame to a Numpy array, but this can be a costly operation when the columns have different types, because the values must be cast to a common type.

>>> import numpy as np
>>> import pandas as pd

One-dimensional pandas arrays are called Series, and can be created like this:

>>> s = pd.Series([1, 3, 5, np.nan, 6, 8])
>>> s
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

Two-dimensional pandas structures are called DataFrames. We are going to create a DataFrame, but first we create a series of indices.

>>> dates = pd.date_range('20130101', periods=6)
>>> print(dates)
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01, ..., 2013-01-06]
Length: 6, Freq: D, Timezone: None

After creating the indices, we create the DataFrame:

>>> df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
>>> df
                   A         B         C         D
2013-01-01 -0.228804  1.756711  0.029835  0.589072
2013-01-02 -0.214418  0.073005 -0.339403 -0.523901
2013-01-03  0.515138 -0.603327  0.785776 -0.661374
2013-01-04 -0.154879 -1.164844 -1.618861  0.904558
2013-01-05 -0.669651 -1.488846  1.431594  1.468455
2013-01-06  1.037434 -0.596740 -0.451529 -0.288568
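
The DataFrame-to-Numpy conversion mentioned earlier in this section can be sketched as follows. The frame df_mixed and its column names are hypothetical, made up for illustration. The sketch assumes a pandas version that provides DataFrame.to_numpy() (0.24 or later).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer column and one string column.
df_mixed = pd.DataFrame({"count": [1, 2, 3], "label": ["a", "b", "c"]})

# The columns share no common numeric type, so every value is
# stored as a generic Python object in the resulting array.
as_objects = df_mixed.to_numpy()
print(as_objects.dtype)   # object

# Converting only the numeric column keeps a compact integer dtype.
as_numbers = df_mixed[["count"]].to_numpy()
print(as_numbers.dtype.kind)  # i (a signed integer dtype)
```

The object array loses most of the speed and memory benefits of Numpy, which is one reason to select the numeric columns before converting.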

Numpy Vs Pandas

Numpy consumes less memory and can perform faster when compared with Pandas.
Pandas has a wide range of functions to read tabular files such as CSV, TSV, etc., and can also get data from relational databases like MySQL.
As a rough rule of thumb, Numpy tends to perform better on smaller data, around 50K rows or fewer, whereas Pandas tends to perform better with 500K rows or more.
We can integrate Numpy with C/C++ and Fortran code.
The two are not independent: they are different packages, but Pandas is built on top of Numpy.
Slicing works differently between them.
In a DataFrame, we can have column names, which makes the data more readable.
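
The last two points, slicing and column names, can be illustrated with a small sketch. The row labels r0, r1, r2 and the variable names are assumptions made for this example. Numpy and pandas .iloc slice by position and exclude the end position, while pandas .loc slices by label and includes the end label.

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape(3, 4)
df = pd.DataFrame(arr, index=["r0", "r1", "r2"], columns=list("ABCD"))

# Numpy: positional slicing, the end position (2) is excluded.
np_rows = arr[0:2, 1]

# pandas .iloc: also positional, end excluded, same as Numpy.
iloc_rows = df.iloc[0:2, 1]

# pandas .loc: label-based slicing, the end label ("r2") is included.
loc_rows = df.loc["r0":"r2", "B"]

# A column can be addressed by name instead of by position.
col_b = df["B"]

print(np_rows.tolist())    # [1, 5]
print(iloc_rows.tolist())  # [1, 5]
print(loc_rows.tolist())   # [1, 5, 9]
print(col_b.tolist())      # [1, 5, 9]
```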
