Process SequenceFile without Enabling Hadoop Platform

Recently I needed to read Hadoop's SequenceFile format without a running Hadoop platform. However, most examples show how to read and write a SequenceFile on top of a Hadoop installation. How can such files be read without a Hadoop cluster?
There is a workaround for this case.
1. Download the Hadoop binary distribution from the Hadoop site. On Linux/Unix, download it directly; on Windows, use the pre-built archive hadoop-common-2.2.0-bin (its source code is also available), created by Abhijit Ghosh.
2. Set the environment variable HADOOP_HOME to the directory where you extracted it (for example, /usr/local/hadoop on Unix or C:/hadoop-common-2.2.0-bin on Windows).
3. Append $HADOOP_HOME/bin to the end of the PATH environment variable (i.e. /usr/local/hadoop/bin on Unix, or C:/hadoop-common-2.2.0-bin/bin on Windows).
4. Write your program like this (note that you need to download hadoop-common 2.2 or later and put it on the classpath):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class ProcessSequenceFile {

	public static void readSequenceFile(String sequenceFileName) throws IOException {
		Configuration conf = new Configuration();
		// Use the local file system, so no HDFS cluster is needed.
		conf.set("fs.defaultFS", "file:///");
		Path file = new Path(sequenceFileName);
		// try-with-resources closes the reader even if reading fails.
		try (SequenceFile.Reader reader =
				new SequenceFile.Reader(conf, SequenceFile.Reader.file(file))) {
			// Create key and value objects from the classes recorded in the file header.
			// Writable is used instead of Text so that non-Text files can be read as well.
			Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
			Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
			while (reader.next(key, value)) {
				System.out.println("Key:" + key);
				System.out.println("=================");
				System.out.println(value);
			}
		}
	}

	public static void main(String[] args) {
		if (args.length < 1) {
			System.err.println("Usage: ProcessSequenceFile <sequence-file>");
			return;
		}
		try {
			// args[0] is the path of the SequenceFile to read.
			readSequenceFile(args[0]);
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}
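
If the program fails with "Failed to locate the winutils binary in the hadoop binary path", the settings from steps 2 and 3 are the first thing to check. Here is a minimal sketch of such a check that uses only the Java standard library. The class name HadoopEnvCheck and the method findProblems are hypothetical names of mine, not part of Hadoop, and the check covers only the two environment variables and, on Windows, the presence of bin/winutils.exe.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop): checks the settings from steps 2 and 3.
public class HadoopEnvCheck {

    // Returns human-readable problems; an empty list means the settings look consistent.
    static List<String> findProblems(String hadoopHome, String pathVar, String pathSeparator) {
        List<String> problems = new ArrayList<>();
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            problems.add("HADOOP_HOME is not set");
            return problems;
        }
        Path bin = Paths.get(hadoopHome, "bin").normalize();
        boolean onPath = false;
        if (pathVar != null) {
            for (String entry : pathVar.split(Pattern.quote(pathSeparator))) {
                if (entry.isEmpty()) {
                    continue;
                }
                try {
                    if (Paths.get(entry).normalize().equals(bin)) {
                        onPath = true;
                        break;
                    }
                } catch (InvalidPathException e) {
                    // Skip PATH entries that are not valid paths on this platform.
                }
            }
        }
        if (!onPath) {
            problems.add("PATH does not contain " + bin);
        }
        return problems;
    }

    public static void main(String[] args) {
        String hadoopHome = System.getenv("HADOOP_HOME");
        List<String> problems =
                findProblems(hadoopHome, System.getenv("PATH"), File.pathSeparator);
        // On Windows, Hadoop also expects winutils.exe inside the bin directory.
        boolean isWindows = System.getProperty("os.name").startsWith("Windows");
        if (isWindows && hadoopHome != null && !hadoopHome.isEmpty()
                && !Files.exists(Paths.get(hadoopHome, "bin", "winutils.exe"))) {
            problems.add("winutils.exe not found under HADOOP_HOME/bin");
        }
        if (problems.isEmpty()) {
            System.out.println("Environment looks consistent.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Run it with plain `java HadoopEnvCheck` in the same shell or IDE configuration that launches ProcessSequenceFile, so that both see the same environment variables.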

Enjoy it!

References:
1. Gnosis Runmination, “WIN7下运行hadoop程序报：Failed to locate the winutils binary in the hadoop binary path.” [Running a Hadoop program on Windows 7 reports: Failed to locate the winutils binary in the hadoop binary path.] [Online]. Available: http://www.cnblogs.com/zq-inlook/p/4386216.html
2. Stack Overflow, “Running Apache Hadoop 2.1.0 on Windows.” [Online]. Available: http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
3. Abhijit Ghosh, “ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path,” SrcCodes.com. [Online]. Available: http://www.srccodes.com/p/article/39/error-util-shell-failed-locate-winutils-binary-hadoop-binary-path
4. Hadoop, “Native Libraries Guide.” [Online]. Available: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/NativeLibraries.html#Native_Hadoop_Library

May 21, 2015 | hadoop, Java, Programming

About the Author

Allen