We are familiar with Hadoop Distributed File System shell operations such as copyFromLocal, copyToLocal, mv, cp, rm -r, etc. Here I explain how to perform these operations using the Java API. For now, I cover only the copyFromLocal and copyToLocal operations.
Here I used the Eclipse IDE for programming, installed on my Windows desktop machine. I have a Hadoop cluster, and the cluster machines and my desktop machine are on the same network.
First create a Java project, and inside it create a folder named conf. Copy the Hadoop configuration files (core-site.xml, mapred-site.xml, hdfs-site.xml) from your Hadoop installation into this conf folder.
Create another folder named source, which we will use as the input location, and put a text file inside it.
One thing to remember is that the source and destination locations must have appropriate permissions; otherwise reads/writes will be blocked.
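If the HDFS destination is locked down, the permissions can be opened up from the shell before running the programs; a quick sketch, assuming /user/training is the HDFS directory used below:

```shell
# give the owner full access and everyone else read/execute, recursively
hadoop fs -chmod -R 755 /user/training
```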
Copying a File from Local to HDFS
The equivalent shell command is
hadoop fs -copyFromLocal <local source> <HDFS destination>
package com.amal.hadoop;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
/**
* @author amalgjose
*
*/
public class CopyFromLocal {
	public static void main(String[] args) throws IOException {
		Configuration conf = new Configuration();
		// Load the cluster configuration copied into the conf folder
		conf.addResource(new Path("conf/core-site.xml"));
		conf.addResource(new Path("conf/mapred-site.xml"));
		conf.addResource(new Path("conf/hdfs-site.xml"));
		FileSystem fs = FileSystem.get(conf);
		Path sourcePath = new Path("source");       // local input folder
		Path destPath = new Path("/user/training"); // HDFS destination
		if (!fs.exists(destPath)) {
			System.out.println("No such destination exists: " + destPath);
			return;
		}
		fs.copyFromLocalFile(sourcePath, destPath);
	}
}
Copying a File from HDFS to Local
The equivalent shell command is
hadoop fs -copyToLocal <HDFS source> <local destination>
package com.amal.hadoop;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
/**
* @author amalgjose
*
*/
public class CopyToLocal {
	public static void main(String[] args) throws IOException {
		Configuration conf = new Configuration();
		// Load the cluster configuration copied into the conf folder
		conf.addResource(new Path("conf/core-site.xml"));
		conf.addResource(new Path("conf/mapred-site.xml"));
		conf.addResource(new Path("conf/hdfs-site.xml"));
		FileSystem fs = FileSystem.get(conf);
		Path sourcePath = new Path("/user/training"); // HDFS source
		Path destPath = new Path("destination");      // local output folder
		if (!fs.exists(sourcePath)) {
			System.out.println("No such source exists: " + sourcePath);
			return;
		}
		fs.copyToLocalFile(sourcePath, destPath);
	}
}
How do you copy a file from one HDFS directory to another HDFS directory on the same cluster in a distributed manner using a Java/Scala client?
It is easy: just invoke the DistCp class with the required arguments. You can either make a shell call or reuse the logic from the Hadoop source code.
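A minimal sketch of driving DistCp from Java, assuming the Hadoop 2.x distcp API from the hadoop-distcp artifact (in Hadoop 3 the DistCpOptions constructor was replaced by a builder) and illustrative HDFS paths:

```java
package com.amal.hadoop;

import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.tools.DistCpOptions;

public class DistCpExample {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// Load the cluster configuration copied into the conf folder
		conf.addResource(new Path("conf/core-site.xml"));
		conf.addResource(new Path("conf/mapred-site.xml"));
		conf.addResource(new Path("conf/hdfs-site.xml"));
		// Source and target are both HDFS directories (paths are illustrative);
		// the copy itself runs as a distributed MapReduce job on the cluster.
		DistCpOptions options = new DistCpOptions(
				Collections.singletonList(new Path("/user/training/input")),
				new Path("/user/training/backup"));
		DistCp distCp = new DistCp(conf, options);
		distCp.execute(); // submits the job and waits for completion
	}
}
```

Because DistCp launches a MapReduce job, this approach scales to large directories, unlike a single-threaded FileUtil.copy on the client.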