We are familiar with Hadoop Distributed File System (HDFS) shell operations such as copyFromLocal, copyToLocal, mv, cp, rm -r, etc.
Here I explain how to perform these operations using the Java API. For now, I am covering only the copyFromLocal and copyToLocal operations.
Here I used the Eclipse IDE, installed on my Windows desktop machine.
I have a Hadoop cluster. The cluster machines and my desktop machine are on the same network.
First, create a Java project and inside it create a folder named conf. Copy the Hadoop configuration files (core-site.xml, mapred-site.xml, hdfs-site.xml) from your Hadoop installation into this conf folder.
Create another folder named source, which we will use as the input location, and put a text file inside it.
One thing to remember: the source and destination locations must have the appropriate permissions, otherwise reads/writes will be blocked.
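For example, permissions on the HDFS side can be granted from the shell before running the programs below. The path and mode here are illustrative; use whatever directory and permissions fit your cluster:

```shell
# Allow the client user to write into the HDFS destination directory
hadoop fs -chmod 755 /user/training

# Verify the resulting permissions
hadoop fs -ls /user
```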
Copying a File from Local to HDFS
The command is
hadoop fs -copyFromLocal <localsrc> <hdfs-dst>
package com.amal.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * @author amalgjose
 */
public class CopyFromLocal {

    public static void main(String[] args) throws IOException {
        // Load the cluster configuration copied into the conf folder
        Configuration conf = new Configuration();
        conf.addResource(new Path("conf/core-site.xml"));
        conf.addResource(new Path("conf/mapred-site.xml"));
        conf.addResource(new Path("conf/hdfs-site.xml"));

        FileSystem fs = FileSystem.get(conf);
        Path sourcePath = new Path("source");
        Path destPath = new Path("/user/training");

        if (!fs.exists(destPath)) {
            System.out.println("No such destination exists: " + destPath);
            return;
        }
        fs.copyFromLocalFile(sourcePath, destPath);
    }
}
Copying a File from HDFS to Local
The command is
hadoop fs -copyToLocal <hdfs-src> <localdst>
package com.amal.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * @author amalgjose
 */
public class CopyToLocal {

    public static void main(String[] args) throws IOException {
        // Load the cluster configuration copied into the conf folder
        Configuration conf = new Configuration();
        conf.addResource(new Path("conf/core-site.xml"));
        conf.addResource(new Path("conf/mapred-site.xml"));
        conf.addResource(new Path("conf/hdfs-site.xml"));

        FileSystem fs = FileSystem.get(conf);
        Path sourcePath = new Path("/user/training");
        Path destPath = new Path("destination");

        if (!fs.exists(sourcePath)) {
            System.out.println("No such source exists: " + sourcePath);
            return;
        }
        fs.copyToLocalFile(sourcePath, destPath);
    }
}
How do you copy a file from one HDFS directory to another HDFS directory on the same cluster, in a distributed manner, using a Java/Scala client?
It is easy. Just invoke the DistCp class with the required arguments. You can either make a shell call or reuse the logic from the Hadoop source code.
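A minimal sketch of the Java route, assuming Hadoop 2.x where org.apache.hadoop.tools.DistCp implements the Tool interface. The class name DistCpExample and the two HDFS paths are hypothetical; DistCp parses the argument array exactly as the hadoop distcp shell command would:

```java
package com.amal.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.util.ToolRunner;

public class DistCpExample {

    public static void main(String[] args) throws Exception {
        // Load the cluster configuration, as in the earlier examples
        Configuration conf = new Configuration();
        conf.addResource(new Path("conf/core-site.xml"));
        conf.addResource(new Path("conf/mapred-site.xml"));
        conf.addResource(new Path("conf/hdfs-site.xml"));

        // Source and target HDFS directories (hypothetical paths)
        String[] distCpArgs = new String[] { "/user/training/source", "/user/training/target" };

        // ToolRunner hands the arguments to DistCp, which launches the
        // distributed copy as a MapReduce job on the cluster
        int rc = ToolRunner.run(conf, new DistCp(conf, null), distCpArgs);
        System.exit(rc);
    }
}
```

This runs the copy as a MapReduce job, so it scales with the cluster rather than streaming the data through a single client.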