How to serialize an object in Hadoop (to HDFS)
Question
I have a HashMap<String, ArrayList<Integer>>. I want to serialize my HashMap object (hmap) to an HDFS location and later deserialize it in the Mapper and Reducers to use it.
To serialize my HashMap object to HDFS I used normal Java object-serialization code as follows, but got an error (permission denied):
try
{
    FileOutputStream fileOut = new FileOutputStream("hashmap.ser");
    ObjectOutputStream out = new ObjectOutputStream(fileOut);
    out.writeObject(hm);
    out.close();
}
catch (Exception e)
{
    e.printStackTrace();
}
I got the following exception:
java.io.FileNotFoundException: hashmap.ser (Permission denied)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
at KMerIndex.createIndex(KMerIndex.java:121)
at MyDriverClass.formRefIndex(MyDriverClass.java:717)
at MyDriverClass.main(MyDriverClass.java:768)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Can someone please suggest or share sample code for how to serialize an object in Hadoop to HDFS?
Answer
Please try using SerializationUtils from Apache Commons Lang.
Below are the methods:
static Object clone(Serializable object) //Deep clone an Object using serialization.
static Object deserialize(byte[] objectData) //Deserializes a single Object from an array of bytes.
static Object deserialize(InputStream inputStream) //Deserializes an Object from the specified stream.
static byte[] serialize(Serializable obj) //Serializes an Object to a byte array for storage/serialization.
static void serialize(Serializable obj, OutputStream outputStream) //Serializes an Object to the specified stream.
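SerializationUtils wraps standard Java serialization, so the round trip it performs can be sketched with the JDK alone. The class name RoundTrip below is made up for illustration; the serialize/deserialize bodies mirror what the Commons Lang methods do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;

public class RoundTrip {
    // Serialize any Serializable object to a byte array
    // (what SerializationUtils.serialize does).
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Read the object back from the bytes and cast to the expected type
    // (what SerializationUtils.deserialize does).
    @SuppressWarnings("unchecked")
    static HashMap<String, ArrayList<Integer>> deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, ArrayList<Integer>>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, ArrayList<Integer>> hmap = new HashMap<>();
        hmap.put("key", new ArrayList<>(Arrays.asList(1, 5, 9)));

        byte[] bytes = serialize(hmap);   // these bytes are what you store
        HashMap<String, ArrayList<Integer>> back = deserialize(bytes);
        System.out.println(back.get("key"));   // prints [1, 5, 9]
    }
}
```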
While storing into HDFS you can store the byte[] that was returned from serialize. While getting the object back, you can type-cast it to the corresponding class (for example, a File object) and get it back.
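Note that the Permission denied error above comes from new FileOutputStream("hashmap.ser") writing to the local working directory rather than HDFS. To put the serialized bytes into HDFS itself, open the stream through Hadoop's FileSystem API. A minimal sketch, assuming the Commons Lang 2.x package (org.apache.commons.lang) and a made-up class name HdfsObjectStore; it is not runnable without a Hadoop installation:

```java
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import org.apache.commons.lang.SerializationUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsObjectStore {
    // Serialize the object to bytes and write them to an HDFS path.
    static void writeObject(Configuration conf, Path path, Serializable obj)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        try (FSDataOutputStream out = fs.create(path, true)) { // true = overwrite
            out.write(SerializationUtils.serialize(obj));
        }
    }

    // Open the HDFS file and deserialize the object from the stream.
    @SuppressWarnings("unchecked")
    static HashMap<String, ArrayList<Integer>> readObject(Configuration conf, Path path)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        try (FSDataInputStream in = fs.open(path)) {
            return (HashMap<String, ArrayList<Integer>>)
                    SerializationUtils.deserialize(in);
        }
    }
}
```

In a Mapper or Reducer you would call readObject from setup(), passing context.getConfiguration() and whatever HDFS path you wrote the file to.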
In my case, I stored a HashMap in an HBase column, retrieved it back in my mapper method as a HashMap, and it worked.
Surely, you can also do that in the same way here.
Another option is to use Apache Commons IO (org.apache.commons.io.FileUtils); but you will later need to copy this file to HDFS, since you want HDFS as the datastore.
FileUtils.writeByteArrayToFile(new File("pathname"), myByteArray);
Note: both jars, Apache Commons IO and Apache Commons Lang, are always available on a Hadoop cluster.