add pandas df filtering exercise

Merged Toby Hodges requested to merge pandas-ex-1 into master
All threads resolved!

for context: the dataframe mentioned in the exercise is built from this CSV, i.e.

covid_cases = pd.read_csv('https://git.embl.de/grp-bio-it-workshops/intermediate-python/-/raw/master/data/CovidCaseData_20200624.csv')

one possible solution would be:

covid_cases[covid_cases['year'] == 2019][covid_cases['cases'] > 0]

Happy for review from anyone @ralves @meechan @ext.bauer

Activity

  • Toby Hodges mentioned in merge request !12 (merged)

    • Resolved by Toby Hodges

      Nice exercise. Like it. I could not test the compilation.

      Remark on solution:

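      The proposed solution raises a UserWarning: the boolean mask in the second "filter" is built from the full covid_cases DataFrame, so its index is larger than that of the result of the first filter and pandas has to reindex it.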

      In [9]: covid_cases[covid_cases['year'] == 2019][covid_cases['cases'] > 0]                           
      <ipython-input-9-de53dbbe1f91>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
        covid_cases[covid_cases['year'] == 2019][covid_cases['cases'] > 0]
      Out[9]: 
               dateRep  day  month  year  ...  geoId  countryterritoryCode   popData2019 continentExp
      5181  31/12/2019   31     12  2019  ...     CN                   CHN  1.433784e+09         Asia
      
      [1 rows x 11 columns]
      

      Proposal: Combine the two boolean masks with & in a single indexing step

      c = covid_cases                                                                              
      c[(c['year']==2019) & (c['cases']>0)] 
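
      The combined-mask form can be tried without downloading the CSV. The following is a minimal sketch on a made-up frame; `toy` and its values are assumptions for illustration and are not taken from CovidCaseData_20200624.csv.

```python
import pandas as pd

# Hypothetical stand-in for covid_cases; the values are invented.
toy = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "cases": [0, 27, 0, 5],
    "geoId": ["AF", "CN", "AF", "CN"],
})

# Each comparison needs its own parentheses, because & binds more
# tightly than == and >. Both masks share the index of toy, so no
# reindexing is needed.
combined = toy[(toy["year"] == 2019) & (toy["cases"] > 0)]

# The same filter written with DataFrame.query, as an alternative spelling.
queried = toy.query("year == 2019 and cases > 0")

print(combined)
```

      Both spellings select the same rows here; which one reads better for learners is a matter of taste.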

      Explanation:

      In [1]: import pandas as pd                                                                          
      
      In [2]: covid_cases = pd.read_csv('https://git.embl.de/grp-bio-it-workshops/intermediate-python/-/raw
         ...: /master/data/CovidCaseData_20200624.csv')                                                    
      
      In [3]: c0 = covid_cases                                                                             
      
      In [4]: c1 = c0[c0['year']==2019]                                                                    
      
      In [5]: c2 = c1[c1['cases']>1]                                                                       
      
      In [6]: c1[c0['cases']>1]                                                                            
      <ipython-input-6-81a87b0cc528>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
        c1[c0['cases']>1]
      Out[6]: 
               dateRep  day  month  year  ...  geoId  countryterritoryCode   popData2019 continentExp
      5181  31/12/2019   31     12  2019  ...     CN                   CHN  1.433784e+09         Asia
      
      [1 rows x 11 columns]
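
      The index mismatch behind the warning can also be seen on a small made-up frame. This is only a sketch: `toy`, `first` and `full_mask` are hypothetical names, and whether the warning is emitted may depend on the pandas version in use.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for covid_cases; the values are invented.
toy = pd.DataFrame({"year": [2019, 2019, 2020, 2020], "cases": [0, 27, 0, 5]})

first = toy[toy["year"] == 2019]   # keeps only the rows labelled 0 and 1
full_mask = toy["cases"] > 0       # still has one entry per row of toy

# Indexing the filtered frame with the full-length mask makes pandas
# align the mask to first.index; record any warnings raised on the way.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chained = first[full_mask]

# Building the mask from the filtered frame keeps the indexes identical.
clean = first[first["cases"] > 0]

print(len(first), len(full_mask))
print([str(w.message) for w in caught])
```

      The selected rows are the same either way; only the chained form relies on pandas silently aligning a mask that was built for a different frame.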
      Edited by Julian Bauer
  • Toby Hodges added 1 commit

  • Toby Hodges resolved all threads

  • Toby Hodges added 1 commit

    • 37f4c945 - add missing solution style tag

  • Toby Hodges added 25 commits

  • Renato Alves
  • Toby Hodges added 50 commits

  • merged

  • Toby Hodges mentioned in commit 666a5414

  • Toby Hodges resolved all threads
