Averaging data from multiple data files in Python with pandas
Problem description
I have 30 CSV data files from 30 replicate runs of an experiment I ran. I am using pandas' read_csv() function to read the data into a list of DataFrames. I would like to create a single DataFrame out of this list containing, for each column, the element-wise average of the 30 DataFrames. Is there a built-in way to accomplish this?
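For the 30-file case itself, one possible approach (an alternative to the column-wise grouping in the answer) is to stack the frames vertically and average the rows that share an index label. This is a sketch assuming every file has the same columns and the same number of rows in the same order; the in-memory CSV strings stand in for hypothetical files such as run_01.csv.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the replicate files. With real files this would be
# something like: frames = [pd.read_csv(p) for p in sorted(glob.glob("run_*.csv"))]
csv_texts = [
    "A,B,C\n1.0,2.0,3.0\n4.0,5.0,6.0\n",
    "A,B,C\n3.0,4.0,5.0\n6.0,7.0,8.0\n",
    "A,B,C\n2.0,3.0,4.0\n5.0,6.0,7.0\n",
]

# Each replicate gets the default RangeIndex 0, 1, ..., so row i of every
# replicate carries the same index label i.
frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# Stack the replicates vertically, then average all rows sharing a label:
# the result holds the element-wise mean across replicates.
average = pd.concat(frames).groupby(level=0).mean()
print(average)  # A = [2.0, 5.0], B = [3.0, 6.0], C = [4.0, 7.0]
```

If the files could differ in row order, they would need a shared key column set as the index before stacking.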
To clarify, I'll expand on the example in the answers below. Say I have two DataFrames:
>>> x
          A         B         C
0 -0.264438 -1.026059 -0.619500
1  0.927272  0.302904 -0.032399
2 -0.264273 -0.386314 -0.217601
3 -0.871858 -0.348382  1.100491
>>> y
          A         B         C
0  1.923135  0.135355 -0.285491
1 -0.208940  0.642432 -0.764902
2  1.477419 -1.659804 -0.431375
3 -1.191664  0.152576  0.935773
What merging function should I use to make a 3D array of sorts out of the DataFrames? e.g.,
>>> automagic_merge(x, y)
                        A                       B                       C
0   [-0.264438, 1.923135]   [-1.026059, 0.135355]  [-0.619500, -0.285491]
1  [ 0.927272, -0.208940]   [ 0.302904, 0.642432]  [-0.032399, -0.764902]
2   [-0.264273, 1.477419]  [-0.386314, -1.659804]  [-0.217601, -0.431375]
3  [-0.871858, -1.191664]   [-0.348382, 0.152576]   [ 1.100491, 0.935773]
so that I can calculate the average, s.e.m., etc. on those lists instead of on the entire column.
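The "3D array of sorts" can also be taken literally with NumPy: stack each frame's values into an array of shape (replicates, rows, columns) and reduce along the first axis. This is a sketch assuming all frames share the same index and column order; the s.e.m. here is the sample standard deviation (ddof=1) divided by the square root of the number of replicates.

```python
import numpy as np
import pandas as pd

# The two example frames from the question.
x = pd.DataFrame(
    {"A": [-0.264438, 0.927272, -0.264273, -0.871858],
     "B": [-1.026059, 0.302904, -0.386314, -0.348382],
     "C": [-0.619500, -0.032399, -0.217601, 1.100491]})
y = pd.DataFrame(
    {"A": [1.923135, -0.208940, 1.477419, -1.191664],
     "B": [0.135355, 0.642432, -1.659804, 0.152576],
     "C": [-0.285491, -0.764902, -0.431375, 0.935773]})
frames = [x, y]

# Stack into a 3D array of shape (n_replicates, n_rows, n_columns).
cube = np.stack([df.to_numpy() for df in frames])
n = cube.shape[0]

# Reduce along the replicate axis (axis 0) to get one value per cell,
# then wrap the 2D result back into a DataFrame with the original labels.
mean = pd.DataFrame(cube.mean(axis=0), index=x.index, columns=x.columns)
sem = pd.DataFrame(cube.std(axis=0, ddof=1) / np.sqrt(n),
                   index=x.index, columns=x.columns)
print(mean.round(6))
print(sem.round(6))
```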
Check it out:
In [14]: glued = pd.concat([x, y], axis=1, keys=['x', 'y'])
In [15]: glued
Out[15]:
          x                             y
          A         B         C         A         B         C
0 -0.264438 -1.026059 -0.619500  1.923135  0.135355 -0.285491
1  0.927272  0.302904 -0.032399 -0.208940  0.642432 -0.764902
2 -0.264273 -0.386314 -0.217601  1.477419 -1.659804 -0.431375
3 -0.871858 -0.348382  1.100491 -1.191664  0.152576  0.935773
In [16]: glued.swaplevel(0, 1, axis=1).sort_index(axis=1)
Out[16]:
          A                   B                   C
          x         y         x         y         x         y
0 -0.264438  1.923135 -1.026059  0.135355 -0.619500 -0.285491
1  0.927272 -0.208940  0.302904  0.642432 -0.032399 -0.764902
2 -0.264273  1.477419 -0.386314 -1.659804 -0.217601 -0.431375
3 -0.871858 -1.191664 -0.348382  0.152576  1.100491  0.935773
In [17]: glued = glued.swaplevel(0, 1, axis=1).sort_index(axis=1)
In [18]: glued
Out[18]:
          A                   B                   C
          x         y         x         y         x         y
0 -0.264438  1.923135 -1.026059  0.135355 -0.619500 -0.285491
1  0.927272 -0.208940  0.302904  0.642432 -0.032399 -0.764902
2 -0.264273  1.477419 -0.386314 -1.659804 -0.217601 -0.431375
3 -0.871858 -1.191664 -0.348382  0.152576  1.100491  0.935773
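For the record, swapping the levels and sorting was not necessary; it was just for visual purposes. Without the swap, the column names A, B, C sit on level 1 of the column index, so you would group on level=1 instead of level=0.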
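Because the swap only affects the display, the averaging also works on the unswapped frame by grouping on the inner column level (level 1), where the original column names live. A minimal sketch for a list of any length, with three small hypothetical frames standing in for the 30 replicates; the transpose makes the grouping run over rows:

```python
import pandas as pd

# Hypothetical replicate frames; all share the same index and columns.
frames = [
    pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]}),
    pd.DataFrame({"A": [3.0, 4.0], "B": [5.0, 6.0]}),
    pd.DataFrame({"A": [5.0, 6.0], "B": [7.0, 8.0]}),
]

# Keys 0, 1, 2 become the outer column level; A and B become the inner level.
glued = pd.concat(frames, axis=1, keys=list(range(len(frames))))

# No swaplevel: transpose so the column MultiIndex becomes the row index,
# group on the inner level (the original column names), then transpose back.
mean = glued.T.groupby(level=1).mean().T
print(mean)  # A = [3.0, 4.0], B = [5.0, 6.0]
```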
Then you can do stuff like:
In [19]: glued.T.groupby(level=0).mean().T
Out[19]:
          A         B         C
0  0.829349 -0.445352 -0.452496
1  0.359166  0.472668 -0.398650
2  0.606573 -1.023059 -0.324488
3 -1.031761 -0.097903  1.018132
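The same grouping also yields the other per-cell statistics mentioned in the question. A sketch assuming the x and y frames from the question: GroupBy.sem() and GroupBy.std() use ddof=1 by default, and the transpose stands in for groupby(level=0, axis=1), since the axis argument of DataFrame.groupby is deprecated in recent pandas releases.

```python
import pandas as pd

# The two example frames from the question.
x = pd.DataFrame(
    {"A": [-0.264438, 0.927272, -0.264273, -0.871858],
     "B": [-1.026059, 0.302904, -0.386314, -0.348382],
     "C": [-0.619500, -0.032399, -0.217601, 1.100491]})
y = pd.DataFrame(
    {"A": [1.923135, -0.208940, 1.477419, -1.191664],
     "B": [0.135355, 0.642432, -1.659804, 0.152576],
     "C": [-0.285491, -0.764902, -0.431375, 0.935773]})

glued = pd.concat([x, y], axis=1, keys=["x", "y"])
glued = glued.swaplevel(0, 1, axis=1).sort_index(axis=1)

# Transpose so the (column name, replicate) pairs become rows, then group on
# the column name (level 0) and reduce over the replicates in each group.
grouped = glued.T.groupby(level=0)
mean = grouped.mean().T
sem = grouped.sem().T  # standard error of the mean (ddof=1)
std = grouped.std().T  # sample standard deviation (ddof=1)
print(sem.round(6))
```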