Copy the file from the local file system to HDFS
hdfs dfs -copyFromLocal "C:\\Users\\labuser\\Desktop\\MCA54\\EncrypDataProject\\assets\\symca-students.csv" /mca54
Check whether the file has been copied to HDFS
hdfs dfs -ls /mca54
Check the Hadoop web user interface (NameNode status) to verify the file we just copied.
Now create a Java project (EncryptDataProject) using VS Code.
Copy the JAR files from the Hadoop installation (the hdfs, common, and mapreduce directories) and paste them into the lib folder of the Java project.
Java file EncryptStudentData.java
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class EncryptStudentData {

    public static class EncryptMapper extends Mapper<Object, Text, Text, NullWritable> {
        private Text encryptedLine = new Text();
        private static final int SHIFT = 3;

        // Caesar-shift each letter by SHIFT positions; obfuscate spaces
        private String encrypt(String input) {
            StringBuilder sb = new StringBuilder();
            for (char c : input.toCharArray()) {
                if (Character.isUpperCase(c)) {
                    char enc = (char) ((c - 'A' + SHIFT) % 26 + 'A');
                    sb.append(enc);
                } else if (Character.isLowerCase(c)) {
                    char enc = (char) ((c - 'a' + SHIFT) % 26 + 'a');
                    sb.append(enc);
                } else if (c == ' ') {
                    sb.append('_'); // Obfuscate spaces with an underscore
                } else {
                    sb.append(c); // Keep punctuation and digits as-is
                }
            }
            return sb.toString();
        }

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String encrypted = encrypt(line);
            encryptedLine.set(encrypted);
            context.write(encryptedLine, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Usage: EncryptStudentData <input dir> <output dir>");
            System.exit(-1);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Encrypt Student Data");
        job.setJarByClass(EncryptStudentData.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(EncryptMapper.class);
        job.setNumReduceTasks(0); // Map-only job: mapper output is written directly
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
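As a quick sanity check, the Caesar-shift logic can be exercised outside Hadoop with a small standalone class. This sketch copies the mapper's cipher verbatim; the sample record below is made up for illustration:

```java
public class EncryptCheck {
    private static final int SHIFT = 3;

    // Same cipher as EncryptMapper.encrypt(): shift letters by 3, map spaces to '_'
    static String encrypt(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append((char) ((c - 'A' + SHIFT) % 26 + 'A'));
            } else if (Character.isLowerCase(c)) {
                sb.append((char) ((c - 'a' + SHIFT) % 26 + 'a'));
            } else if (c == ' ') {
                sb.append('_');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical CSV line; digits and commas pass through unchanged
        System.out.println(encrypt("Alice Smith,22")); // prints Dolfh_Vplwk,22
    }
}
```

Note that the modulo arithmetic wraps letters at the end of the alphabet, so "xyz" becomes "abc".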
In Java, specifically within the Apache Hadoop framework, NullWritable is a special implementation of the Writable interface. It serves as a placeholder for a null value when a key or value is not required in the MapReduce paradigm.
Create an output folder inside the Java project.
Command to compile the EncryptStudentData.java file:
javac --release 8 -cp "lib/*" -d output "src/EncryptStudentData.java"
Command to create a JAR file for the EncryptStudentData program:
jar -cvf src/EncryptStudentData.jar -C output/ .
Command to run the Hadoop job:
hadoop jar C:\\Users\\labuser\\Desktop\\MCA54\\EncrypDataProject\\src\\EncryptStudentData.jar EncryptStudentData /mca54/symca-students.csv /mca54/output/EncryptStudentData
"C:....jar"
is the path of the jar file present in the local system
"/mca54/symca-students.csv"
is the path of the input files present on the hadoop server.
"/mca54/output/EncryptStudentData"
is the path of the output folder where i wish to upload all the output files on the hadoop server.
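To sanity-check the job's output after downloading it, the Caesar shift can be reversed locally. This is a sketch, not part of the lab: shifting by 26 − SHIFT undoes the cipher, and underscores are turned back into spaces (which assumes the original data contained no literal underscores):

```java
public class DecryptCheck {
    private static final int SHIFT = 3;

    // Reverse of the mapper's cipher: shift letters back and restore spaces
    static String decrypt(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append((char) ((c - 'A' + 26 - SHIFT) % 26 + 'A'));
            } else if (Character.isLowerCase(c)) {
                sb.append((char) ((c - 'a' + 26 - SHIFT) % 26 + 'a'));
            } else if (c == '_') {
                sb.append(' '); // underscores were obfuscated spaces
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decrypt("Dolfh_Vplwk,22")); // prints Alice Smith,22
    }
}
```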