Implementation of a softmax activation function for neural networks (C/C++)

Problem description

I am using a Softmax activation function in the last layer of a neural network. But I have problems with a safe implementation of this function.

A naive implementation would be this one:

Vector y = mlp(x); // output of the neural network without softmax activation function
for(int f = 0; f < y.rows(); f++)
  y(f) = exp(y(f));
y /= y.sum();
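
To make the failure concrete, here is a minimal self-contained sketch of the naive version. It assumes std::vector<double> in place of the question's Vector type, and the name naiveSoftmax is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Naive softmax (hypothetical helper): exponentiate every component,
// then normalize by the sum. Uses std::vector<double> instead of Vector.
std::vector<double> naiveSoftmax(std::vector<double> y)
{
  double sum = 0.0;
  for (double& v : y) {
    v = std::exp(v);  // overflows to +inf once v exceeds about 709.78
    sum += v;
  }
  for (double& v : y)
    v /= sum;         // inf / inf evaluates to NaN
  return y;
}
```

For example, naiveSoftmax({1000.0, 999.0}) returns two NaN components: both exponentials overflow to inf, so the sum is inf as well, and inf / inf is NaN.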

This does not work very well for > 100 hidden nodes because y will contain NaN in many cases: if y(f) > 709, exp(y(f)) returns inf, and dividing inf by a sum that is itself inf yields NaN. I came up with this version:

Vector y = mlp(x); // output of the neural network without softmax activation function
for(int f = 0; f < y.rows(); f++)
  y(f) = safeExp(y(f), y.rows());
y /= y.sum();

where safeExp is defined as:

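#include <cmath>
#include <limits>

// Clamp x to log(DBL_MAX) / div, so exp(x) never exceeds DBL_MAX^(1/div);
// a sum of div such terms then stays finite.
double safeExp(double x, int div)
{
  static const double maxX = std::log(std::numeric_limits<double>::max());
  const double max = maxX / (double) div;
  if(x > max)
    x = max;
  return std::exp(x);
}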

This function limits the input of exp. It works in most cases, but not in all of them, and I did not manage to find out in which cases it fails. When I have 800 hidden neurons in the previous layer it does not work at all.

However, even if this worked, I would somehow "distort" the result of the ANN. Can you think of any other way to calculate the correct solution? Are there any C++ libraries or tricks that I can use to calculate the exact output of this ANN?
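
Both the distortion and one case the clamp cannot handle can be reproduced with a small sketch of the clamped version. It again assumes std::vector<double> instead of Vector; safeExp is copied from the question and clampedSoftmax is a hypothetical name.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// safeExp as given in the question: clamp x to log(DBL_MAX) / div, then exp.
double safeExp(double x, int div)
{
  static const double maxX = std::log(std::numeric_limits<double>::max());
  const double max = maxX / (double) div;
  if (x > max)
    x = max;
  return std::exp(x);
}

// Softmax built on safeExp (hypothetical helper), mirroring the second version.
std::vector<double> clampedSoftmax(std::vector<double> y)
{
  const int n = static_cast<int>(y.size());
  double sum = 0.0;
  for (double& v : y) {
    v = safeExp(v, n);  // only large inputs are clamped; small ones pass through
    sum += v;
  }
  for (double& v : y)
    v /= sum;
  return y;
}
```

clampedSoftmax({800.0, 700.0}) returns {0.5, 0.5}, because both inputs are clamped to log(DBL_MAX) / 2 ≈ 354.9, whereas the exact softmax puts almost all of the mass on the first output. Conversely, clampedSoftmax({-800.0, -900.0}) returns NaN: safeExp only clamps from above, so both exponentials underflow to 0 and 0 / 0 is NaN.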

Edit: The solution provided by Itamar Katz is:

Vector y = mlp(x); // output of the neural network without softmax activation function
double ymax = y.maxCoeff(); // maximal component of y (assumes Vector is an Eigen::VectorXd)
for(int f = 0; f < y.rows(); f++)
  y(f) = exp(y(f) - ymax);
y /= y.sum();
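
The same idea as a self-contained sketch over std::vector<double> (an assumption; stableSoftmax is a hypothetical name). Subtracting any constant c leaves the result unchanged, since e^(y_f - c) / Σ_j e^(y_j - c) = e^(y_f) / Σ_j e^(y_j). Choosing c = max_j y_j makes the largest exponent 0, so no term can overflow and the denominator is at least 1, which also rules out 0 / 0. Inputs are assumed finite and non-empty.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax (hypothetical helper): shift by the maximum first.
// The largest shifted input is exactly 0, so exp() never overflows and sum >= 1.
std::vector<double> stableSoftmax(std::vector<double> y)
{
  const double ymax = *std::max_element(y.begin(), y.end());  // y must be non-empty
  double sum = 0.0;
  for (double& v : y) {
    v = std::exp(v - ymax);
    sum += v;
  }
  for (double& v : y)
    v /= sum;
  return y;
}
```

With this version, stableSoftmax({1000.0, 999.0}) returns roughly {0.731, 0.269} instead of NaN.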

It really is mathematically the same, because the common factor exp(-ymax) cancels between numerator and denominator. In practice, however, some small values become 0 because of the limited floating-point precision. I wonder why nobody ever writes these implementation details down in textbooks.
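
The zeros come from underflow: the smallest positive double is about 4.9e-324, so exp(y(f) - ymax) rounds to 0 once y(f) - ymax drops below roughly -745. If the probabilities are later passed through a logarithm (for example in a cross-entropy loss), a common remedy is to compute log-softmax directly with the log-sum-exp trick, which stays finite. This is a standard alternative rather than something from the question; the sketch below assumes std::vector<double> and uses the hypothetical name logSoftmax.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// log-softmax via the log-sum-exp trick (hypothetical helper):
//   log p_f = (y_f - ymax) - log(sum_j exp(y_j - ymax))
// The result stays finite even when exp(y_f - ymax) itself underflows to 0.
std::vector<double> logSoftmax(std::vector<double> y)
{
  const double ymax = *std::max_element(y.begin(), y.end());  // y must be non-empty
  double sum = 0.0;
  for (double v : y)
    sum += std::exp(v - ymax);  // >= 1, because the maximum contributes exp(0)
  const double logSum = std::log(sum);
  for (double& v : y)
    v = (v - ymax) - logSum;
  return y;
}
```

For example, logSoftmax({0.0, -1000.0}) returns {0.0, -1000.0}, while the plain (even max-shifted) softmax gives exactly 0 for the second probability.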

Recommended answer