1. Copy the file from the local file system to HDFS

    symca-students.csv

    hdfs dfs -copyFromLocal "C:\\Users\\labuser\\Desktop\\MCA54\\EncrypDataProject\\assets\\symca-students.csv" /mca54
    


  2. Verify that the file was copied to HDFS

    hdfs dfs -ls /mca54
    
  3. Open the Hadoop web interface (NameNode status page) and locate the file we just copied.

  4. Create a Java project named EncryptDataProject in VS Code

  5. Copy the JAR files from the Hadoop installation (hdfs > common and mapreduce directories) into the lib folder of the Java project

  6. Create the Java file EncryptStudentData.java in the src folder

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    import java.io.IOException;
    
    public class EncryptStudentData {
    
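        // Map-only mapper: emits each encrypted input line as the key, with no value.
        public static class EncryptMapper extends Mapper<Object, Text, Text, NullWritable> {
            // Reused across map() calls to avoid allocating a new Text per record.
            private final Text encryptedLine = new Text();
    
            // Caesar cipher shift applied to every letter.
            private static final int SHIFT = 3;
    
            // Shifts ASCII letters by SHIFT (wrapping within the alphabet) and replaces spaces with '_'.
            private String encrypt(String input) {
                StringBuilder sb = new StringBuilder(input.length());
                for (char c : input.toCharArray()) {
                    if (c >= 'A' && c <= 'Z') {
                        char enc = (char) ((c - 'A' + SHIFT) % 26 + 'A');
                        sb.append(enc);
                    } else if (c >= 'a' && c <= 'z') {
                        char enc = (char) ((c - 'a' + SHIFT) % 26 + 'a');
                        sb.append(enc);
                    } else if (c == ' ') {
                        sb.append('_'); // Obfuscate spaces with underscore
                    } else {
                        sb.append(c); // Keep digits, punctuation, and non-ASCII characters as is
                    }
                }
                return sb.toString();
            }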
    
            @Override
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
                String line = value.toString();
                String encrypted = encrypt(line);
                encryptedLine.set(encrypted);
                context.write(encryptedLine, NullWritable.get());
            }
        }
    
        public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.println("Usage: EncryptStudentData <input path> <output dir>");
                System.exit(-1);
            }
    
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "Encrypt Student Data");
            job.setJarByClass(EncryptStudentData.class);
    
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
            job.setMapperClass(EncryptMapper.class);
            job.setNumReduceTasks(0);  // Map-only job
    
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
    
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    

    NullWritable is a Hadoop Writable that carries no data: it is a singleton obtained with NullWritable.get() and serializes to zero bytes. It serves as a placeholder when a MapReduce key or value is not needed. Here the mapper writes each encrypted line as the key and NullWritable as the value, so the output files contain only the encrypted lines.
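
The cipher inside the mapper can be tried on a single line without a cluster. The sketch below is a standalone approximation, not part of the job: the class name CaesarSketch, the decrypt helper, and the sample row are all assumptions made for illustration. It applies the same shift of 3 to ASCII letters and the same space-to-underscore substitution as the mapper.

```java
// Standalone sketch of the mapper's cipher; class name and sample data are hypothetical.
class CaesarSketch {
    private static final int SHIFT = 3; // same shift value as the mapper

    // Shift ASCII letters forward by SHIFT, wrapping at the end of the alphabet;
    // replace spaces with '_' and leave everything else unchanged.
    static String encrypt(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                sb.append((char) ((c - 'A' + SHIFT) % 26 + 'A'));
            } else if (c >= 'a' && c <= 'z') {
                sb.append((char) ((c - 'a' + SHIFT) % 26 + 'a'));
            } else if (c == ' ') {
                sb.append('_');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical inverse (not in the original job): shift letters back by SHIFT
    // and turn '_' back into a space.
    static String decrypt(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                sb.append((char) ((c - 'A' - SHIFT + 26) % 26 + 'A'));
            } else if (c >= 'a' && c <= 'z') {
                sb.append((char) ((c - 'a' - SHIFT + 26) % 26 + 'a'));
            } else if (c == '_') {
                sb.append(' ');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "101,Zara Xavier,MCA"; // made-up sample row, not from the real CSV
        String encrypted = encrypt(line);
        System.out.println(encrypted);          // prints 101,Cdud_Adylhu,PFD
        System.out.println(decrypt(encrypted)); // prints 101,Zara Xavier,MCA
    }
}
```

Note that decrypt cannot tell an underscore that was originally a space from one that was already in the input, so the round trip is exact only for lines that contain no underscores.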

  7. Create an output folder inside the Java project, then compile and package the program

    1. Compile EncryptStudentData.java into the output folder

      javac --release 8 -cp "lib/*" -d output "src/EncryptStudentData.java"
      


    2. Package the compiled classes into a JAR file

      jar -cvf src/EncryptStudentData.jar -C output/ .
      


  8. Run the job with hadoop jar

    hadoop jar C:\\Users\\labuser\\Desktop\\MCA54\\EncrypDataProject\\src\\EncryptStudentData.jar EncryptStudentData /mca54/symca-students.csv /mca54/output/EncryptStudentData
    

    "C:....jar" is the path of the JAR file on the local system.

    "EncryptStudentData" is the name of the class whose main method is run.

    "/mca54/symca-students.csv" is the path of the input file in HDFS.

    "/mca54/output/EncryptStudentData" is the HDFS output directory where the job writes its result files. This directory must not already exist when the job starts.

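
As a rough model of how the command in step 8 reaches the program: everything after the JAR path and the class name is handed to main as args, so args[0] is the input path and args[1] is the output directory. The sketch below only illustrates that mapping; it is not Hadoop's launcher code. The class HadoopJarArgsSketch and its programArgs helper are hypothetical, and it assumes the JAR's manifest names no main class, which is why the class name appears on the command line.

```java
import java.util.Arrays;

// Simplified model of "hadoop jar <jar> <main class> <args...>"; names are hypothetical.
class HadoopJarArgsSketch {

    // Given the tokens that follow "hadoop jar", return the ones passed to main(String[] args).
    static String[] programArgs(String[] afterHadoopJar) {
        if (afterHadoopJar.length < 2) {
            throw new IllegalArgumentException("expected <jar> <main class> [args...]");
        }
        // Token 0 is the JAR path, token 1 is the main class; the rest go to the program.
        return Arrays.copyOfRange(afterHadoopJar, 2, afterHadoopJar.length);
    }

    public static void main(String[] ignored) {
        String[] tokens = {
            "EncryptStudentData.jar",           // JAR on the local system (path shortened here)
            "EncryptStudentData",               // class whose main method runs
            "/mca54/symca-students.csv",        // becomes args[0]: input path in HDFS
            "/mca54/output/EncryptStudentData"  // becomes args[1]: output directory in HDFS
        };
        String[] args = programArgs(tokens);
        System.out.println("input  = " + args[0]);
        System.out.println("output = " + args[1]);
    }
}
```

With exactly two program arguments, the length check at the top of main in EncryptStudentData passes and the job is configured with those two paths.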