How to read multiple image files as input from hdfs in map-reduce?(如何在 map-reduce 中从 hdfs 读取多个图像文件作为输入?)
问题描述
private static String[] testFiles = new String[] {"img01.JPG","img02.JPG","img03.JPG","img04.JPG","img06.JPG","img07.JPG","img05.JPG"};
// private static String testFilespath = "/home/student/Desktop/images";
private static String testFilespath ="hdfs://localhost:54310/user/root/images";
//private static String indexpath = "/home/student/Desktop/indexDemo";
private static String testExtensive="/home/student/Desktop/images";
public static class MapClass extends MapReduceBase
implements Mapper<Text, Text, Text, Text> {
private Text input_image = new Text();
private Text input_vector = new Text();
@Override
public void map(Text key, Text value,OutputCollector<Text, Text> output,Reporter reporter) throws IOException {
System.out.println("CorrelogramIndex Method:");
String featureString;
int MAXIMUM_DISTANCE = 16;
AutoColorCorrelogram.Mode mode = AutoColorCorrelogram.Mode.FullNeighbourhood;
for (String identifier : testFiles) {
try (FileInputStream fis = new FileInputStream(testFilespath + "/" + identifier)) {
//Document doc = builder.createDocument(fis, identifier);
//FileInputStream imageStream = new FileInputStream(testFilespath + "/" + identifier);
BufferedImage bimg = ImageIO.read(fis);
AutoColorCorrelogram vd = new AutoColorCorrelogram(MAXIMUM_DISTANCE, mode);
vd.extract(bimg);
featureString = vd.getStringRepresentation();
double[] bytearray=vd.getDoubleHistogram();
System.out.println("image: "+ identifier + " " + featureString );
}
System.out.println(" ------------- ");
input_image.set(identifier);
input_vector.set(featureString);
output.collect(input_image, input_vector);
}
}
}
public static class Reduce extends MapReduceBase
implements Reducer<Text, Text, Text, Text> {
@Override
public void reduce(Text key, Iterator<Text> values,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
String out_vector="";
while (values.hasNext()) {
out_vector.concat(values.next().toString());
}
output.collect(key, new Text(out_vector));
}
}
static int printUsage() {
System.out.println("image_mapreduce [-m <maps>] [-r <reduces>] <input> <output>");
ToolRunner.printGenericCommandUsage(System.out);
return -1;
}
@Override
public int run(String[] args) throws Exception {
JobConf conf = new JobConf(getConf(), image_mapreduce.class);
conf.setJobName("image_mapreduce");
// the keys are words (strings)
conf.setOutputKeyClass(Text.class);
// the values are counts (ints)
conf.setOutputValueClass(Text.class);
conf.setMapperClass(MapClass.class);
// conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
List<String> other_args = new ArrayList<String>();
for(int i=0; i < args.length; ++i) {
try {
if ("-m".equals(args[i])) {
conf.setNumMapTasks(Integer.parseInt(args[++i]));
} else if ("-r".equals(args[i])) {
conf.setNumReduceTasks(Integer.parseInt(args[++i]));
} else {
other_args.add(args[i]);
}
} catch (NumberFormatException except) {
System.out.println("ERROR: Integer expected instead of " + args[i]);
return printUsage();
} catch (ArrayIndexOutOfBoundsException except) {
System.out.println("ERROR: Required parameter missing from " +
args[i-1]);
return printUsage();
}
}
FileInputFormat.setInputPaths(conf, other_args.get(0));
//FileInputFormat.setInputPaths(conf,new Path("hdfs://localhost:54310/user/root/images"));
FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));
JobClient.runJob(conf);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new image_mapreduce(), args);
System.exit(res);
}
}
`我正在编写一个程序,它将多个图像文件作为输入,存储在 hdfs &提取地图功能中的特征.如何指定在 FileInputStream(一些参数)中读取图像的路径?或者有什么方法可以读取多个图像文件?
`I am writing a program which takes multiple image files as input , stored in hdfs & extract the features in map function. How can I specify the path to read the image in FileInputStream(some parameters)? Or is there any way to read the multiple image files?
我想做的是:--以hdfs中的多个图像文件作为输入-- 在地图功能中提取特征.-- 迭代地减少.请帮助我编写代码或更好的方法.
What I want to do is: --Take multiple image files in hdfs as input -- extract features in map function. --reduce itearatively. Please help me in the code or better ways to do it.
推荐答案
考虑使用 HIPI 库 - 它将图像集合存储到 ImageBundle 中(这比将单个图像文件存储在 HDFS 中更有效).他们也有几个例子.
Look into using the HIPI library - it stores a collection of images into an ImageBundle (which is more efficient that storing the individual image files in HDFS). They have a couple of examples too.
至于您的代码,您需要指定您计划使用的输入和输出格式.没有当前的输入格式可以传递整个文件,但是您可以扩展 FileInputFormat 并创建一个 RecordReader 发出
对,其中键是文件名,值是图像文件的字节数.
As for your code, you need to specify what input and output formats you plan to use. There is no current input format that hands the entire file over, but you can just extend FileInputFormat and create a RecordReader that emits <Text, BytesWritable>
pairs, where the key is the filename, and the value is the bytes of the image file.
事实上 Hadoop - The Definitive Guide 提供了这种精确输入格式的示例:
In fact Hadoop - The Definitive Guide has an example of this exact input format:
这篇关于如何在 map-reduce 中从 hdfs 读取多个图像文件作为输入?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:如何在 map-reduce 中从 hdfs 读取多个图像文件作为输入?
- Java包名称中单词分隔符的约定是什么? 2022-01-01
- Spring Boot连接到使用仲裁器运行的MongoDB副本集 2022-01-01
- 将log4j 1.2配置转换为log4j 2配置 2022-01-01
- C++ 和 Java 进程之间的共享内存 2022-01-01
- Jersey REST 客户端:发布多部分数据 2022-01-01
- 从 finally 块返回时 Java 的奇怪行为 2022-01-01
- 如何使用WebFilter实现授权头检查 2022-01-01
- value & 是什么意思?0xff 在 Java 中做什么? 2022-01-01
- Safepoint+stats 日志,输出 JDK12 中没有 vmop 操作 2022-01-01
- Eclipse 插件更新错误日志在哪里? 2022-01-01