Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language. It comes with two primary data structures:

  • Series – (One dimensional)
  • DataFrame – (Two dimensional)
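
To make the difference between the two structures concrete, here is a minimal sketch. The labels, column names, and values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical row labels, used only for this illustration.
labels = ["a", "b", "c"]

# A Series is a one-dimensional labeled array.
prices = pd.Series([10.5, 11.0, 9.75], index=labels, name="price")

# A DataFrame is a two-dimensional labeled table.
table = pd.DataFrame(
    {"price": [10.5, 11.0, 9.75], "quantity": [3, 1, 4]},
    index=labels,
)

# Selecting a single column of a DataFrame gives back a Series.
price_column = table["price"]

# Column arithmetic is element-wise, so no explicit loop is needed.
total = (table["price"] * table["quantity"]).sum()

print(prices.ndim, table.ndim)  # number of dimensions of each structure
print(total)
```

Selecting one column of a DataFrame returns a Series, so the two structures work naturally together.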

These two structures help us handle the majority of use cases. Those who are comfortable with the R programming language can easily implement their logic in a more powerful way using pandas, since users get almost all of the functionality of R's data frame. Pandas is built on top of the popular NumPy package.

Pandas has very good time series handling and processing capabilities. Its built-in vectorized operations let us avoid unnecessary loops and hand-written logic. It is capable of:

  • Frequency conversion (e.g., creating 5-minute data from a dataset with 1-second frequency)
  • Date range generation
  • Moving window statistics
  • Date shifting, etc.
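
The four capabilities listed above can be sketched roughly as follows. The start date, series length, and window size are arbitrary assumptions chosen for illustration, and the data is synthetic.

```python
import numpy as np
import pandas as pd

# Date range generation: 600 timestamps at 1-second frequency (10 minutes).
# The start date is an arbitrary choice for this example.
index = pd.date_range("2024-01-01 00:00:00", periods=600, freq="s")

# Synthetic data: the values 0.0, 1.0, ..., 599.0, one per second.
series = pd.Series(np.arange(600, dtype=float), index=index)

# Frequency conversion: aggregate 1-second data into 5-minute means.
five_min = series.resample("5min").mean()

# Moving window statistics: mean over a rolling window of 3 observations.
rolling_mean = series.rolling(window=3).mean()

# Date shifting: move every value one step later along the same index.
shifted = series.shift(1)

print(five_min)
print(rolling_mean.head())
print(shifted.head())
```

None of these steps needs an explicit Python loop; each is a single method call on the Series.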

Since plenty of documentation on pandas already exists, I am not going to explain it in detail here. Instead, I will explain some use cases with pandas implementations in later blog posts, where I will be using pandas and other scientific libraries extensively.

 
