Hadoop Distribution Differences
Question
Can somebody outline the differences between the various Hadoop distributions available:
- Cloudera - http://www.cloudera.com/hadoop
- Yahoo - http://developer.yahoo.net/blogs/hadoop/
using the Apache Hadoop distro as a baseline.
Is there a good reason to use one of these distributions over the standard Apache Hadoop distro?
Recommended Answer
Disclaimer: I interned at Cloudera this summer (but some of my best friends are at Yahoo! :-))
The Yahoo distribution is a version of Hadoop 0.20 that they run (ran?) on some subset of their clusters. It includes a set of patches for stability, bug fixes, etc. It is a source release; it does not have admin-friendly features such as RPM or Debian packages.
The Cloudera distribution is packaged as RPMs and debs (the source is also available). This means you can get updates through your system's standard package manager. It also includes stability and bug-fix patches. It is actively maintained (not to say Yahoo's isn't; one could simply check on GitHub when they last updated it). It also packages Pig and Hive.
Cloudera's distribution of Hadoop 0.20 is in beta, while 0.18 is considered stable (more on this on the Cloudera blog). The 0.18 version also includes packages for Hive and Pig; for 0.20 you have to build them yourself (there aren't official releases of Pig or Hive that support 0.20 yet, although patches exist). There may well be significant overlap between the Cloudera and Yahoo versions of 0.20; both provide manifests, so you can check. The latest documentation for Cloudera's distros is at http://archive.cloudera.com
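Since both distributions publish patch manifests, checking their overlap comes down to a set comparison of the JIRA issues each one lists. A minimal Python sketch, assuming (hypothetically) that each manifest is plain text mentioning an Apache JIRA key such as `HADOOP-1234` for every applied patch; the real manifest formats may differ, so treat the regular expression and the helper names `jira_keys` and `compare_manifests` as illustrative assumptions:

```python
import re

# Assumption: every applied patch is identified in the manifest by an Apache
# JIRA key such as HADOOP-1234, HDFS-56 or MAPREDUCE-789. Real manifest
# formats may differ; adjust this pattern to match them.
JIRA_KEY = re.compile(r"\b(?:HADOOP|HDFS|MAPREDUCE)-\d+\b")


def jira_keys(manifest_text: str) -> set:
    """Return the set of distinct JIRA keys mentioned in a manifest."""
    return set(JIRA_KEY.findall(manifest_text))


def compare_manifests(a_text: str, b_text: str) -> dict:
    """Group patches into those shared by both manifests and those unique to each."""
    a, b = jira_keys(a_text), jira_keys(b_text)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__":
    # Hypothetical manifest excerpts; these keys are illustrative, not real patch lists.
    manifest_a = "HADOOP-1001 fix A\nHADOOP-1002 fix B\nHDFS-2001 fix C\n"
    manifest_b = "HADOOP-1002 fix B\nHDFS-2001 fix C\nMAPREDUCE-3001 fix D\n"
    result = compare_manifests(manifest_a, manifest_b)
    for group, keys in result.items():
        print(f"{group}: {len(keys)} patch(es): {', '.join(sorted(keys))}")
```

Calling `len(jira_keys(text))` on a single manifest gives a rough patch count for one distribution, under the same assumption about the manifest format.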
Yahoo does not provide support for their distribution; they provide their patched version as a service to the community, so the folks who are interested can build what Yahoo runs internally. Given the size of Yahoo clusters, that's a significant contribution, especially if you aren't a Hadoop developer who follows the JIRAs all the time. Cloudera supports their distribution commercially, as well as providing some community support via the Hadoop mailing lists and, for distro-specific issues, on their GetSatisfaction page.
Both differ considerably from the vanilla Apache distro, since they apply patches in between Apache releases (Cloudera's version of 0.20 has 60+ patches!).