Pandas Groupby: a simple but detailed tutorial | by Shiu-Tang Li – Towards Data Science
Shiu-Tang Li
Towards Data Science
Pandas groupby is a powerful tool for data analysis. However, it is not very intuitive for beginners, because the output of groupby is not a Pandas DataFrame object but a Pandas DataFrameGroupBy object. A DataFrame can be displayed easily; a DataFrameGroupBy object cannot, and an object that cannot be inspected directly is harder to manipulate.
Some of the tutorials I found online contain either too much unnecessary information or too little for readers to understand how groupby works. I think a guide covering the key tools used frequently in a data scientist's day-to-day work would help, which is why I wrote this article.
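To make the point about DataFrameGroupBy concrete, here is a minimal sketch. The DataFrame `df` and its `team` and `points` columns are made up for illustration and are not from the article. It shows that `groupby` returns a DataFrameGroupBy object, and a few ways one might inspect it.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 12, 7, 9, 11],
})

grouped = df.groupby("team")

# The result is a DataFrameGroupBy object, not a DataFrame.
print(type(grouped).__name__)  # DataFrameGroupBy

# Iterating yields (group key, sub-DataFrame) pairs, which is one way to "see" it.
for name, sub_df in grouped:
    print(name)
    print(sub_df)

# Other ways to inspect the groups.
group_sizes = grouped.size()       # number of rows in each group
team_b = grouped.get_group("B")    # the rows belonging to one group
```

Iterating over the groups or calling `get_group` is usually enough to check that the rows were split the way you expected before applying any aggregation.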
Important notes.
1. I assume the reader already knows how group-by calculations work in R, SQL, Excel (or a similar tool).
2. All code was tested with Pandas 1.0.3. It may not work as shown in older Pandas versions.
Examples are provided in each section. There may be several ways to generate the same result; I go with the one I use most often.
Here’s the outline:
In order to generate the statistics for each group in the data set, we need to classify the data into groups, based on one or more columns.
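As a rough sketch of this step, the example below groups a small, made-up `sales` table (the column names `region`, `product` and `amount` are assumptions for illustration) first by one column and then by two, computing a sum for each group.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["pen", "ink", "pen", "pen", "ink"],
    "amount": [5, 8, 3, 4, 7],
})

# Group by a single column: one total per region.
by_region = sales.groupby("region")["amount"].sum()

# Group by several columns: pass a list; the result has a two-level index.
by_region_product = sales.groupby(["region", "product"])["amount"].sum()

print(by_region)
print(by_region_product)
```

Passing a list of column names creates one group per distinct combination of values, much like listing several columns in a SQL `GROUP BY` clause.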
Mathematician → Data scientist → Software engineer