Pandas DataFrame conditional .mean() depending on values in certain columns
Problem description
I'm trying to create a new column that holds the mean of the values in an existing column of the same DataFrame. However, the mean should be computed per group, where the groups are defined by three other columns.
Out[184]: 
   YEAR daytype hourtype  scenario  option_value    
0  2015     SAT     of_h         0      0.134499       
1  2015     SUN     of_h         1     63.019250      
2  2015     WD      of_h         2     52.113516       
3  2015     WD      pk_h         3     43.126513       
4  2015     SAT     of_h         4     56.431392 
Basically, I would like a new column 'mean' that contains the mean of "option_value" over all rows that share the same "YEAR", "daytype", and "hourtype".
I tried the following approach, but without success:
In [185]: o2['premium']=o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_cf'].mean()
TypeError: incompatible index of inserted column with frame index
Recommended answer
Here is one approach:
In [19]: def cust_mean(grp):
   ....:     grp['mean'] = grp['option_value'].mean()
   ....:     return grp
   ....:
In [20]: o2.groupby(['YEAR', 'daytype', 'hourtype']).apply(cust_mean)
Out[20]:
   YEAR daytype hourtype  scenario  option_value       mean
0  2015     SAT     of_h         0      0.134499  28.282946
1  2015     SUN     of_h         1     63.019250  63.019250
2  2015      WD     of_h         2     52.113516  52.113516
3  2015      WD     pk_h         3     43.126513  43.126513
4  2015     SAT     of_h         4     56.431392  28.282946
So, what was going wrong with your attempt?
It returns an aggregate whose shape differs from the original DataFrame: one row per group, indexed by the group keys, so it cannot be aligned with the frame's row index.
In [21]: o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value'].mean()
Out[21]:
YEAR  daytype  hourtype
2015  SAT      of_h        28.282946
      SUN      of_h        63.019250
      WD       of_h        52.113516
               pk_h        43.126513
Name: option_value, dtype: float64
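To make the mismatch concrete, here is a minimal sketch that rebuilds the sample frame (values copied from the question; the variable names `keys` and `agg` are my own) and compares the aggregate's index with the frame's index.

```python
import pandas as pd

# Sample data copied from the question.
o2 = pd.DataFrame({
    "YEAR": [2015] * 5,
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]  # hypothetical helper name
agg = o2.groupby(keys)["option_value"].mean()

# The frame has one row per observation; the aggregate has one row per group.
print(len(o2), len(agg))

# The aggregate is indexed by the three group keys, not by the row labels 0..4.
print(agg.index.names)
print(o2.index.tolist())
```

Because the two indexes have nothing in common, a plain column assignment has no way to match group means to rows.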
Alternatively, use transform, which returns a result aligned with the original index:
In [1461]: o2['premium'] = (o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value']
                              .transform('mean'))
In [1462]: o2
Out[1462]:
   YEAR daytype hourtype  scenario  option_value    premium
0  2015     SAT     of_h         0      0.134499  28.282946
1  2015     SUN     of_h         1     63.019250  63.019250
2  2015      WD     of_h         2     52.113516  52.113516
3  2015      WD     pk_h         3     43.126513  43.126513
4  2015     SAT     of_h         4     56.431392  28.282946
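For reference, a self-contained sketch of the transform approach on the sample data follows. It also shows, purely as an illustration that is not part of the original answer, that joining the aggregate back on the group keys with `DataFrame.join(on=...)` should give the same numbers; the names `keys`, `agg`, `joined`, and `premium_via_join` are assumptions of mine.

```python
import pandas as pd

# Sample data copied from the question.
o2 = pd.DataFrame({
    "YEAR": [2015] * 5,
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]  # hypothetical helper name

# transform broadcasts each group's mean back to every row of that group,
# so the result carries the same index as o2 and can be assigned directly.
o2["premium"] = o2.groupby(keys)["option_value"].transform("mean")

# Alternative for comparison (not from the original answer): compute the
# aggregate, then join it back by matching the key columns to its MultiIndex.
agg = o2.groupby(keys)["option_value"].mean()
joined = o2.join(agg.rename("premium_via_join"), on=keys)

print(joined)
```

transform is usually the shorter option when only one derived column is needed; the join variant may be handy when the aggregate is reused elsewhere.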