How to implement Conflation for probability distribution in python?

Problem description

I looked online for a way to combine several continuous probability distributions into one continuous probability distribution. The method is called Conflation, and it is described in the following article: An Optimal Method for Consolidating Data from Different Experiments. From this article, I learned that Conflation is a better way to combine distributions than averaging.

From what I understood from the article, for continuous distributions the equation multiplies the probability density values of the several distributions and divides by the integral of that product. For discrete distributions, it multiplies the probability values of the several distributions and divides by the sum of the probability values of the several distributions. (Details can be found on page 5 of the article.)
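Here are the two definitions written out. This is a sketch of the paper's equations as I understand them. The & symbol for conflation, f_i for densities, and p_i for probability mass functions are notation chosen here. In both cases the denominator runs over every point y of the domain, not only the point x:

```latex
% Continuous case: f_1, ..., f_n are the probability density functions of P_1, ..., P_n
\&(P_1,\dots,P_n)(x) = \frac{f_1(x)\, f_2(x) \cdots f_n(x)}{\int_{-\infty}^{\infty} f_1(y)\, f_2(y) \cdots f_n(y)\, dy}

% Discrete case: p_1, ..., p_n are probability mass functions on a common set of points
\&(P_1,\dots,P_n)(x) = \frac{p_1(x)\, p_2(x) \cdots p_n(x)}{\sum_{y} p_1(y)\, p_2(y) \cdots p_n(y)}
```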

Say I have 4 lists taken from, for example, 4 normal distributions:

list_1 = [5, 8, 6, 2, 1]
list_2 = [2, 6, 1, 3, 8]
list_3 = [1, 9, 2, 7, 5]
list_4 = [3, 2, 4, 1, 6]

After applying Conflation, the resulting list becomes:

Con_list = [2.73, 34.56, 3.69, 3.23, 12]

(Correct me if I am wrong)
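For reference, the numbers in Con_list can be reproduced exactly by dividing the element-wise product of the four lists by the element-wise sum of the same four lists. The discrete conflation formula does something different: it divides each product by the sum of all the products, so the result sums to 1. The sketch below computes both. It assumes each list holds unnormalised probability values at the same five points. The question does not say whether the lists are samples or PDF values.

```python
from functools import reduce
import operator

list_1 = [5, 8, 6, 2, 1]
list_2 = [2, 6, 1, 3, 8]
list_3 = [1, 9, 2, 7, 5]
list_4 = [3, 2, 4, 1, 6]
lists = [list_1, list_2, list_3, list_4]

# zip(*lists) groups together the values that belong to the same point
columns = list(zip(*lists))
products = [reduce(operator.mul, col, 1) for col in columns]  # [30, 864, 48, 42, 240]

# Product divided by the sum of the same column: this reproduces Con_list
per_column = [round(p / sum(col), 2) for p, col in zip(products, columns)]
print(per_column)  # [2.73, 34.56, 3.69, 3.23, 12.0]

# Discrete conflation: every product divided by the sum of ALL products (1224)
total = sum(products)
conflation = [p / total for p in products]
print([round(c, 4) for c in conflation])  # [0.0245, 0.7059, 0.0392, 0.0343, 0.1961]
```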

How can I implement both equations in the photo in Python to get the Conflation of the input PDF distributions?

I previously found a Stack Overflow question about averaging lists, and the code was the following:

def average(l):
    llen = len(l)
    def divide(x):
        return x / llen
    # zip(*l) groups the i-th elements of all lists; list() turns the lazy map into a list in Python 3
    return list(map(divide, map(sum, zip(*l))))

I have been trying to recode this function to follow the equations above, but I can't find a way to get the conflated PDF for continuous distributions.
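One way to handle the continuous case without quad is to evaluate every PDF on the same grid and multiply the rows element by element. This is the same column-wise idea as zip(*l) in average(). The product is then divided by a numerical integral. This is a sketch. conflate_on_grid is a hypothetical helper, and the trapezoid rule is my choice of integrator. The result is only accurate if the grid covers essentially all of the product's mass.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def conflate_on_grid(pdf_values, grid):
    """Hypothetical helper: conflation of PDFs sampled on one shared grid.

    pdf_values: array-like of shape (n_dists, n_points), one row per PDF
    grid: 1-D array holding the n_points x-values
    """
    # f_1(x) * f_2(x) * ... * f_n(x) at every grid point
    product = np.prod(np.asarray(pdf_values, dtype=float), axis=0)
    # Divide by the (trapezoid-rule) integral so the curve has area 1
    return product / trapezoid(product, grid)

lb, ub = -10, 10
domain = np.arange(lb, ub, .01)
pdfs = [stats.norm.pdf(domain, 2, 1), stats.norm.pdf(domain, 2.5, 1.5)]

conflated = conflate_on_grid(pdfs, domain)
print(trapezoid(conflated, domain))  # 1.0 up to floating-point rounding
print(domain[np.argmax(conflated)])  # peak location, between the two input means
```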

Edit 1:

Based on the answer from @Josh Purtell, I rewrote the code. However, I keep getting the following error message:

Error Message:

Traceback (most recent call last):
  File "/tmp/sessions/c903d99d60f20c3b/main.py", line 72, in <module>
    graph=conflate_pdf(domain, dists,lb,ub)
  File "/tmp/sessions/c903d99d60f20c3b/main.py", line 58, in conflate_pdf
    denom = quad(prod_pdf, lb, ub, args=(dists))[0]
  File "/usr/local/lib/python3.6/dist-packages/scipy/integrate/quadpack.py", line 341, in quad
    points)
  File "/usr/local/lib/python3.6/dist-packages/scipy/integrate/quadpack.py", line 448, in _quad
    return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
TypeError: only size-1 arrays can be converted to Python scalars

Code:

import numpy as np
from scipy import stats
from scipy.integrate import quad

def prod_pdf(x,pdfs):
    prod=np.ones(pdfs[0].shape[0])
    for pdf in pdfs:
        prod=prod*pdf
    return prod

def conflate_pdf(x,dists,lb,ub):
    denom = quad(prod_pdf, lb, ub, args=(dists))[0]
    return prod_pdf(x,dists)/denom

lb=-10
ub=10
domain=np.arange(lb,ub,.01)

dist_1 = stats.norm.pdf(domain, 2,1)
dist_2 = stats.norm.pdf(domain, 2.5,1.5)
dist_3 = stats.norm.pdf(domain, 2.2,1.6)
dist_4 = stats.norm.pdf(domain, 2.4,1.3)
dist_5 = stats.norm.pdf(domain, 2.7,1.5)

dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
graph=conflate_pdf(domain, dists,lb,ub)

from matplotlib import pyplot as plt
plt.plot(domain, dist_1)
plt.plot(domain, dist_2)
plt.plot(domain, dist_3)
plt.plot(domain, dist_4)
plt.plot(domain, dist_5)
plt.plot(domain,graph)
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.show()

What in the code causes this error?
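A likely cause: quad calls the integrand with one float x at a time and expects one float back. This prod_pdf ignores x and always returns the whole 2000-element product array, which quad cannot convert to a scalar. If the PDFs only exist as sampled arrays, one option is a scalar-in, scalar-out integrand that interpolates the sampled product at x with np.interp. This is linear interpolation, so it is an approximation, and quad may warn about the many kinks. The helper name prod_pdf_at is hypothetical:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

lb, ub = -10, 10
domain = np.arange(lb, ub, .01)
pdf_arrays = [stats.norm.pdf(domain, 2, 1), stats.norm.pdf(domain, 4, 2)]

# Product of the sampled PDFs at each grid point: an array, not a function of x
sampled_product = np.prod(pdf_arrays, axis=0)

def prod_pdf_at(x):
    # Hypothetical helper: one float in, one float out, as quad requires
    return np.interp(x, domain, sampled_product)

denom_quad = quad(prod_pdf_at, lb, ub, limit=200)[0]
conflated = sampled_product / denom_quad

# Known closed form for two normal densities: N(2; mean 4, variance 1 + 4)
expected = stats.norm.pdf(2, 4, np.sqrt(1 + 4))
print(denom_quad, expected)  # the two values should agree closely
```

The closed form uses the standard identity: the integral of the product of two normal densities equals a normal density evaluated at the first mean, centred on the second mean, with the two variances added.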

Edit 2:

I rewrote the product function to go through the lists of distribution values element by element, instead of multiplying whole PDF arrays as in Edit 1. I still get the same error as in Edit 1.

Code:

import numpy as np
import scipy.stats as st
from scipy.integrate import quad

def prod_pdf(x,pdfs):
    prod=np.ones(np.array(pdfs)[0].shape)
    for pdf in pdfs:
        print(prod)
        for c,y in enumerate(pdf):
            prod[c]=prod[c]*y
        print('final:', prod)
    return prod

def conflate_pdf(x,dists,lb,ub):
    denom = quad(prod_pdf, lb, ub, args=(dists))[0]
    print('Denom: ',denom)
    print('product pdf: ', prod_pdf(x,dists))
    conflated_pdf=prod_pdf(x,dists)/denom
    print(conflated_pdf)
    return conflated_pdf

lb=-10
ub=10
domain=np.arange(lb,ub,.01)

dist_1 = st.norm.pdf(domain, 2,1)
dist_2 = st.norm.pdf(domain, 2.5,1.5)
dist_3 = st.norm.pdf(domain, 2.2,1.6)
dist_4 = st.norm.pdf(domain, 2.4,1.3)
dist_5 = st.norm.pdf(domain, 2.7,1.5)

from matplotlib import pyplot as plt
plt.plot(domain, dist_1, 'r')
plt.plot(domain, dist_2, 'g')
plt.plot(domain, dist_3, 'b')
plt.plot(domain, dist_4, 'y')
plt.plot(domain, dist_5, 'c')

dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
graph=conflate_pdf(domain, dists,lb,ub)


plt.plot(domain,graph, 'm')
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.show()

Edit 3:

I tried to run the following code (based on an answer from @Josh Purtell). The product function keeps returning the whole array instead of a single value, and it produces the same error message about size-1 arrays. See the following code with a portion of the output:

Code:

from scipy.integrate import quad
import scipy.stats as st
import numpy as np

def prod_pdf(x,dists):
    p_pdf=1
    print('Incoming Array:', p_pdf)
    for dist in dists:
        p_pdf=p_pdf*dist
        print('final:', p_pdf)
    return p_pdf

def conflate_pdf(x,dists,lb,ub):
    print('Input product pdf: ', prod_pdf(x,dists))
    denom = quad(prod_pdf, lb, ub, args=(dists,))[0]
    # denom = simps(prod_pdf)
    # denom = nquad(func=(prod_pdf), ranges=([lb, ub]), args=(dists,))[0]
    print('Denom: ', denom)
    conflated_pdf=prod_pdf(x,dists)/denom
    print('Conflated PDF: ', conflated_pdf)
    return conflated_pdf

lb=-10
ub=10
domain=np.arange(lb,ub,.01)

dist_1 = st.norm.pdf(domain, 2,1)
dist_2 = st.norm.pdf(domain, 2.5,1.5)
dist_3 = st.norm.pdf(domain, 2.2,1.6)
dist_4 = st.norm.pdf(domain, 2.4,1.3)
dist_5 = st.norm.pdf(domain, 2.7,1.5)

from matplotlib import pyplot as plt
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.plot(domain, dist_1, 'r', label='Dist. 1')
plt.plot(domain, dist_2, 'g', label='Dist. 2')
plt.plot(domain, dist_3, 'b', label='Dist. 3')
plt.plot(domain, dist_4, 'y', label='Dist. 4')
plt.plot(domain, dist_5, 'c', label='Dist. 5')

dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
print('distribution list: \n', dists)
graph=conflate_pdf(domain, dists,lb,ub)

plt.plot(domain,graph, 'm', label='Conflated Dist.')
plt.legend()
plt.show()

Here is a small portion of the output:

Incoming Array: 1
final: [2.14638374e-32 2.41991991e-32 2.72804284e-32 ... 6.41980576e-15
 5.92770938e-15 5.47278628e-15]
final: [4.75178372e-48 5.66328097e-48 6.74864868e-48 ... 7.03075979e-21
 6.27970218e-21 5.60806584e-21]
final: [2.80912097e-61 3.51131870e-61 4.38823989e-61 ... 1.32670185e-26
 1.14952951e-26 9.95834610e-27]
final: [1.51005552e-81 2.03116529e-81 2.73144352e-81 ... 1.76466623e-34
 1.46198598e-34 1.21092834e-34]
final: [1.09076800e-97 1.55234627e-97 2.20861552e-97 ... 3.72095218e-40
 2.98464396e-40 2.39335035e-40]
Input product pdf:  [1.09076800e-97 1.55234627e-97 2.20861552e-97 ... 3.72095218e-40
 2.98464396e-40 2.39335035e-40]
Incoming Array: 1
final: [2.14638374e-32 2.41991991e-32 2.72804284e-32 ... 6.41980576e-15
 5.92770938e-15 5.47278628e-15]
final: [4.75178372e-48 5.66328097e-48 6.74864868e-48 ... 7.03075979e-21
 6.27970218e-21 5.60806584e-21]
final: [2.80912097e-61 3.51131870e-61 4.38823989e-61 ... 1.32670185e-26
 1.14952951e-26 9.95834610e-27]
final: [1.51005552e-81 2.03116529e-81 2.73144352e-81 ... 1.76466623e-34
 1.46198598e-34 1.21092834e-34]
final: [1.09076800e-97 1.55234627e-97 2.20861552e-97 ... 3.72095218e-40
 2.98464396e-40 2.39335035e-40]

Following the same approach as in Edit 3, I edited the code so that it takes one value from each distribution. However, every call prints the same values. It does not move on to the next values in the lists, and the conflated distribution is a single value. See the following code with a portion of the output:

Code:

from scipy.integrate import quad
import scipy.stats as st
import numpy as np

def prod_pdf(x,dists):
    p_pdf=1
    print('Incoming Array:', p_pdf)
    for c,dist in enumerate(dists):
        p_pdf=p_pdf*dist[c]
        print('final:', p_pdf)
    return p_pdf

def conflate_pdf(x,dists,lb,ub):
    print('Input product pdf: ', prod_pdf(x,dists))
    denom = quad(prod_pdf, lb, ub, args=(dists,))[0]
    # denom = simps(prod_pdf)
    # denom = nquad(func=(prod_pdf), ranges=([lb, ub]), args=(dists,))[0]
    print('Denom: ', denom)
    conflated_pdf=prod_pdf(x,dists)/denom
    print('Conflated PDF: ', conflated_pdf)
    return conflated_pdf

lb=-10
ub=10
domain=np.arange(lb,ub,.01)

dist_1 = st.norm.pdf(domain, 2,1)
dist_2 = st.norm.pdf(domain, 2.5,1.5)
dist_3 = st.norm.pdf(domain, 2.2,1.6)
dist_4 = st.norm.pdf(domain, 2.4,1.3)
dist_5 = st.norm.pdf(domain, 2.7,1.5)

from matplotlib import pyplot as plt
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.plot(domain, dist_1, 'r', label='Dist. 1')
plt.plot(domain, dist_2, 'g', label='Dist. 2')
plt.plot(domain, dist_3, 'b', label='Dist. 3')
plt.plot(domain, dist_4, 'y', label='Dist. 4')
plt.plot(domain, dist_5, 'c', label='Dist. 5')

dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
print('distribution list: \n', dists)
graph=conflate_pdf(domain, dists,lb,ub)

plt.plot(domain,graph, 'm', label='Conflated Dist.')
plt.legend()
plt.show()

A portion of the output:

Incoming Array: 1
final: 2.1463837356630605e-32
final: 5.0231307782193034e-48
final: 3.266239495519432e-61
final: 2.187514996217005e-81
final: 1.979657878680375e-97
Incoming Array: 1
final: 2.1463837356630605e-32
final: 5.0231307782193034e-48
final: 3.266239495519432e-61
final: 2.187514996217005e-81
final: 1.979657878680375e-97
Denom:  3.95931575736075e-96
Incoming Array: 1
final: 2.1463837356630605e-32
final: 5.0231307782193034e-48
final: 3.266239495519432e-61
final: 2.187514996217005e-81
final: 1.979657878680375e-97
Conflated PDF:  0.049999999999999996
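The printed numbers are consistent with quad integrating a constant. This prod_pdf ignores x and uses the distribution index c as the array index. So it returns the same value, about 1.98e-97, on every call. The integral over [-10, 10] is therefore 20 times that value, and every conflated value is 1/20 = 0.05. A quick check, using the printed value as the constant (constant_prod_pdf is a hypothetical stand-in):

```python
from scipy.integrate import quad

# Last 'final:' value printed above; this prod_pdf returns it for every x
constant_product = 1.979657878680375e-97

def constant_prod_pdf(x):
    # Ignores x, just like indexing dist[c] with the distribution index c
    return constant_product

denom = quad(constant_prod_pdf, -10, 10)[0]
print(denom)                     # about 3.9593e-96, i.e. 20 * constant_product
print(constant_product / denom)  # about 0.05 = 1 / (ub - lb)
```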

Edit 4:

I implemented the following code and it seems to work. I also sorted out the problem with quad: if I change quad to fixed_quad and normalise the PDF list, I get the same result. Here is the code:

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from scipy.integrate import quad, fixed_quad

def user_prod_pdf(x,dists):
    p_list=[]
    p_pdf=1
    print('Incoming Array:', p_pdf)
    for dist in dists:
        print('Incoming Distribution Array:', dist.pdf(x))
        p_pdf=p_pdf*dist.pdf(x)
        print('Product PDF:', p_pdf)
        p_list.append(p_pdf)
    print('final Product PDF:', p_pdf)
    print('Product PDF list: ', p_list)
    return p_pdf

def user_conflate_pdf(x,dists,lb,ub):
    print('Input product pdf: ', user_prod_pdf(x,dists))
    denom = quad(user_prod_pdf, lb, ub, args=(dists,))[0]
    print('Denom: ', denom)
    conflated_pdf=user_prod_pdf(x,dists)/denom
    print('Conflated PDF: ', conflated_pdf)
    return conflated_pdf

def user_conflate_pdf_2(pdfs):
    """
    Compute conflation of given pdfs.

    [ARGS]
    - pdfs: PDFs numpy array of shape (n, x)
      where n is the number of PDFs
      and x is the variable space.

    [RETURN]
    A 1d-array of normalized conflated PDF.
    """
    # conflate
    conflation = np.array(pdfs).prod(axis=0)
    # normalize
    conflation /= conflation.sum()
    return conflation

def my_product_pdf(x,dists):
    p_list=[]
    p_pdf=1
    print('Incoming Array:', p_pdf)
    list_full_size=np.array(dists).shape
    print('Full list size: ', list_full_size)
    print('list size: ', list_full_size[0])
    for x in range(list_full_size[1]):
        p_pdf=1
        for y in range(list_full_size[0]):
            p_pdf=float(p_pdf)*dists[y][x]
            print('Product value: ', p_pdf)
        print('Product PDF:', p_pdf)
        p_list.append(p_pdf)
    print('final Product PDF:', p_pdf)
    print('Product PDF list: ', p_list)
    # return p_pdf
    return p_list
    # return np.array(p_list)

def my_conflate_pdf(x,dists,lb,ub):
    print('\n')
    # print('product pdf: ', prod_pdf(x,dists))
    print('product pdf: ', my_product_pdf(x,dists))
    denom = fixed_quad(my_product_pdf, lb, ub, args=(dists,), n=1)[0]
    print('Denom: ', denom)
    # conflated_pdf=prod_pdf(x,dists)/denom
    conflated_pdf=my_product_pdf(x,dists)/denom
    # conflated_pdf=[i / j for i,j in zip(my_product_pdf(x,dists), denom)]
    print('Conflated PDF: ', conflated_pdf)
    return conflated_pdf

lb=-10
ub=10
domain=np.arange(lb,ub,.01)

# dist_1 = st.norm(2,1)
# dist_2 = st.norm(2.5,1.5)
# dist_3 = st.norm(2.2,1.6)
# dist_4 = st.norm(2.4,1.3)
# dist_5 = st.norm(2.7,1.5)

# dist_1_pdf = st.norm.pdf(domain, 2,1)
# dist_2_pdf = st.norm.pdf(domain, 2.5,1.5)
# dist_3_pdf = st.norm.pdf(domain, 2.2,1.6)
# dist_4_pdf = st.norm.pdf(domain, 2.4,1.3)
# dist_5_pdf = st.norm.pdf(domain, 2.7,1.5)

# dist_1_pdf /= dist_1_pdf.sum()
# dist_2_pdf /= dist_2_pdf.sum()
# dist_3_pdf /= dist_3_pdf.sum()
# dist_4_pdf /= dist_4_pdf.sum()
# dist_5_pdf /= dist_5_pdf.sum()

dist_1 = st.norm(2,1)
dist_2 = st.norm(4,2)
dist_3 = st.norm(7,4)
dist_4 = st.norm(2.4,1.3)
dist_5 = st.norm(2.7,1.5)

dist_1_pdf = st.norm.pdf(domain, 2,1)
dist_2_pdf = st.norm.pdf(domain, 4,2)
dist_3_pdf = st.norm.pdf(domain, 7,4)
dist_4_pdf = st.norm.pdf(domain, 2.4,1.3)
dist_5_pdf = st.norm.pdf(domain, 2.7,1.5)

# dist_1_pdf /= dist_1_pdf.sum()
# dist_2_pdf /= dist_2_pdf.sum()
# dist_3_pdf /= dist_3_pdf.sum()
# dist_4_pdf /= dist_4_pdf.sum()
# dist_5_pdf /= dist_5_pdf.sum()

# User:
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("User Conflated PDF")
plt.plot(domain, dist_1_pdf, 'r', label='Dist. 1')
plt.plot(domain, dist_2_pdf, 'g', label='Dist. 2')
plt.plot(domain, dist_3_pdf, 'b', label='Dist. 3')
plt.plot(domain, dist_4_pdf, 'y', label='Dist. 4')
plt.plot(domain, dist_5_pdf, 'c', label='Dist. 5')

dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
user_graph=user_conflate_pdf(domain,dists,lb,ub)
print('Final Conflated PDF: ', user_graph)

# user_graph /= user_graph.sum()

plt.plot(domain, user_graph, 'm', label='Conflated PDF')
plt.legend()
plt.show()

# User 2:
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("User Conflated PDF 2")
plt.plot(domain, dist_1_pdf, 'r', label='Dist. 1')
plt.plot(domain, dist_2_pdf, 'g', label='Dist. 2')
plt.plot(domain, dist_3_pdf, 'b', label='Dist. 3')
plt.plot(domain, dist_4_pdf, 'y', label='Dist. 4')
plt.plot(domain, dist_5_pdf, 'c', label='Dist. 5')

dists=[dist_1_pdf, dist_2_pdf, dist_3_pdf, dist_4_pdf, dist_5_pdf]
user_graph=user_conflate_pdf_2(dists)
print('Final User Conflated PDF 2 : ', user_graph)

# user_graph /= user_graph.sum()

plt.plot(domain, user_graph, 'm', label='Conflated PDF')
plt.legend()
plt.show()

# My Code:
# from matplotlib import pyplot as plt
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("My Conflated PDF Code")
plt.plot(domain, dist_1_pdf, 'r', label='Dist. 1')
plt.plot(domain, dist_2_pdf, 'g', label='Dist. 2')
plt.plot(domain, dist_3_pdf, 'b', label='Dist. 3')
plt.plot(domain, dist_4_pdf, 'y', label='Dist. 4')
plt.plot(domain, dist_5_pdf, 'c', label='Dist. 5')

dists=[dist_1_pdf, dist_2_pdf, dist_3_pdf, dist_4_pdf, dist_5_pdf]
my_graph=my_conflate_pdf(domain,dists,lb,ub)
print('Final Conflated PDF: ', my_graph)

my_graph /= np.array(my_graph).sum()

# my_graph = inverse_normalise(my_graph)

plt.plot(domain, my_graph, 'm', label='Conflated PDF')
plt.legend()
plt.show()

# Conflated PDF:
print('User Conflated PDF: ', user_graph)
print('My Conflated PDF: ', np.array(my_graph))
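A note on the fixed_quad(..., n=1) call in my_conflate_pdf above. With n=1, fixed_quad evaluates the integrand once, at the midpoint of [lb, ub], with Gauss-Legendre weight 2. The result is therefore (ub - lb) times whatever the integrand returns there. my_product_pdf ignores x and returns the whole list of products, so the "integral" works out to (ub - lb) times the sum of the products. That constant factor is cancelled by the later my_graph /= ... .sum() line. A small check, using a hypothetical stand-in function and made-up values:

```python
import numpy as np
from scipy.integrate import fixed_quad

lb, ub = -10, 10
products = [0.5, 1.5, 2.0]  # hypothetical stand-in for the list my_product_pdf returns

def returns_whole_list(x, values):
    # Like my_product_pdf: ignores the quadrature node x entirely
    return values

denom, _ = fixed_quad(returns_whole_list, lb, ub, args=(products,), n=1)
print(denom)  # approximately 80.0 = (ub - lb) * sum(products)
print(np.isclose(denom, (ub - lb) * sum(products)))
```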

Here is the output:

My question: I understand that I need to normalise the PDF lists. But if I did not normalise the PDFs, how could I modify my conflation code to get the following plot?

These are the normalisation lines used to get the plot above with my conflation code:

# user_graph /= user_graph.sum()
# dist_1_pdf /= dist_1_pdf.sum()
# dist_2_pdf /= dist_2_pdf.sum()
# dist_3_pdf /= dist_3_pdf.sum()
# dist_4_pdf /= dist_4_pdf.sum()
# dist_5_pdf /= dist_5_pdf.sum()

Plot of my conflation code with no normalisation:

Solution

Disclaimer: there's a good chance I'm misunderstanding either you or the paper authors, in which case please suggest an edit to this answer.

Here is a trivial, not especially performant implementation of what I think conflation might look like:

##define pdfs for discrete RV X = {1,2,3,4}
import numpy as np

def mult_list(pdfs):
    prod=np.ones(pdfs[0].shape[0])
    for pdf in pdfs:
        prod=prod*pdf
    return prod

def conflate(pdfs):
    return mult_list(pdfs)/sum(mult_list(pdfs))

pdf_1=np.array([.25,.25,.25,.25])
pdf_2=np.array([.33,.33,.33,.00])
pdf_3=np.array([.25,.12,.13,.50])

print(conflate([pdf_1,pdf_2,pdf_3]))

which yields the resulting conflated pdf

>>> [0.5  0.24 0.26 0.  ]

which passes a cursory sniff test.
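Each input appears once in the numerator and once in the denominator. So multiplying any input by a positive constant cancels out. In other words, the conflation does not depend on whether the inputs were normalised first; only the final product needs normalising. Here is a quick check with the conflate function above, reproduced so that the snippet runs on its own (the rescaling constants are arbitrary):

```python
import numpy as np

def mult_list(pdfs):
    prod = np.ones(pdfs[0].shape[0])
    for pdf in pdfs:
        prod = prod * pdf
    return prod

def conflate(pdfs):
    product = mult_list(pdfs)
    return product / product.sum()

pdf_1 = np.array([.25, .25, .25, .25])
pdf_2 = np.array([.33, .33, .33, .00])
pdf_3 = np.array([.25, .12, .13, .50])

original = conflate([pdf_1, pdf_2, pdf_3])
# Rescale the inputs by arbitrary positive constants (they no longer sum to 1)
rescaled = conflate([4.0 * pdf_1, 0.5 * pdf_2, 10.0 * pdf_3])
print(np.allclose(original, rescaled))  # True
```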

On the continuous side of things, the above translates to

from scipy.integrate import quad
from scipy import stats
import numpy as np

def prod_pdf(x,dists):
    p_pdf=1
    for dist in dists:
        p_pdf=p_pdf*dist.pdf(x)
    return p_pdf

def conflate_pdf(x,dists,lb,ub):
    denom = quad(prod_pdf, lb, ub, args=(dists,))[0]  # quad calls prod_pdf with one float x at a time
    return prod_pdf(x,dists)/denom

dists=[stats.norm(2,1),stats.norm(4,2)]
lb=-10
ub=10
domain=np.arange(lb,ub,.01)
graph=conflate_pdf(domain,dists,lb,ub)

from matplotlib import pyplot as plt
plt.plot(domain,graph)
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.savefig("conflatedpdf.png")  # save before show(); after the window is closed the figure may be empty
plt.show()

which gives

As you can see, the distribution is not bimodal, just as one would hope.
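As a sanity check on the plot: a product of normal densities is proportional to another normal density. So the conflation of N(mu_i, sigma_i^2) should be normal, with precision 1/sigma^2 equal to the sum of 1/sigma_i^2 and mean equal to the precision-weighted average of the mu_i. This is the standard product-of-Gaussians identity. For stats.norm(2, 1) and stats.norm(4, 2), it gives variance 0.8 and mean 2.4. The quad-based conflate_pdf above should match this up to integration error. gaussian_conflation is a hypothetical helper:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def prod_pdf(x, dists):
    p_pdf = 1
    for dist in dists:
        p_pdf = p_pdf * dist.pdf(x)
    return p_pdf

def conflate_pdf(x, dists, lb, ub):
    denom = quad(prod_pdf, lb, ub, args=(dists,))[0]
    return prod_pdf(x, dists) / denom

def gaussian_conflation(mus, sigmas):
    # Closed form: precisions add, and the mean is the precision-weighted average
    precisions = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    var = 1.0 / precisions.sum()
    mu = var * np.sum(precisions * np.asarray(mus, dtype=float))
    return stats.norm(mu, np.sqrt(var))

lb, ub = -10, 10
domain = np.arange(lb, ub, .01)
dists = [stats.norm(2, 1), stats.norm(4, 2)]

numeric = conflate_pdf(domain, dists, lb, ub)
closed = gaussian_conflation([2, 4], [1, 2])
print(closed.mean(), closed.var())                   # about 2.4 and 0.8
print(np.max(np.abs(numeric - closed.pdf(domain))))  # small: integration error only
```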
