Tuesday, May 28, 2013

Download a file from http URL using Java FileChannel - an efficient NIO implementation using Java 7

Java's NIO channels are generally a good choice for bulk IO because a channel can take advantage of OS-specific optimizations when dealing with files. An input stream can easily be wrapped in a ReadableByteChannel using the Channels.newChannel() static factory method, and its contents can then be transferred into a FileChannel.

package org.shunya.power.interview.design14;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.EnumSet;

public class RealHttpDownloader implements HttpDownloader {
    @Override
    public File download(URI uri, String fileName) throws IOException {
        Path path = Paths.get(fileName);
        long totalBytesRead = 0L;
        HttpURLConnection con = (HttpURLConnection) uri.resolve(fileName).toURL().openConnection();
        con.setReadTimeout(10000);
        con.setConnectTimeout(10000);
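        try (ReadableByteChannel rbc = Channels.newChannel(con.getInputStream());
             // TRUNCATE_EXISTING discards stale bytes when the target file already exists
             FileChannel fileChannel = FileChannel.open(path, EnumSet.of(StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE))) {
            totalBytesRead = fileChannel.transferFrom(rbc, 0, 1 << 22); // copies at most 4MB; a larger download is cut off at that size
            System.out.println("totalBytesRead = " + totalBytesRead);
            // no explicit close() calls are needed: try-with-resources closes both channels
        } catch (FileNotFoundException e) {
            // thrown, for example, when the server answers with 404
            e.printStackTrace();
        }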
        return path.toFile();
    }
}


package org.shunya.power.interview.design14;

import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URI;

public interface HttpDownloader {
    public File download(URI uri, String fileName) throws IOException;
}


FileChannel utilizes OS-specific optimizations and hence should, in general, provide better performance than buffered streams.

Efficient way to calculate the Checksum of a file using Java FileChannel

Sometimes we wish for the speed of C and the syntax of Java when doing an IO-intensive task in Java. Calculating a CRC is one such task: it requires an efficient implementation in order to give good performance.

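// needs java.io.File, java.io.FileInputStream, java.io.IOException,
// java.nio.MappedByteBuffer, java.nio.channels.FileChannel and java.util.zip.CRC32
public static long calculateCRC(File filename) {
     final int SIZE = 16 * 1024;
     try (FileInputStream in = new FileInputStream(filename)) {
         FileChannel channel = in.getChannel();
         CRC32 crc = new CRC32();
         // a single mapping is limited to Integer.MAX_VALUE bytes, so this works for files under 2GB;
         // map() rejects larger sizes instead of silently truncating them
         long length = channel.size();
         MappedByteBuffer mb = channel.map(FileChannel.MapMode.READ_ONLY, 0, length);
         byte[] bytes = new byte[SIZE];
         int nGet;
         while (mb.hasRemaining()) {
             nGet = Math.min(mb.remaining(), SIZE);
             mb.get(bytes, 0, nGet);
             crc.update(bytes, 0, nGet);
         }
         return crc.getValue();
     } catch (IOException e) {
         // FileNotFoundException is a subclass of IOException, so one handler covers both
         throw new RuntimeException("IO error while calculating the CRC of " + filename, e);
     }
 }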

Java's FileChannel can often provide much better performance than the BufferedInputStream and RandomAccessFile classes.

Monday, May 20, 2013

What is an Immutable Class ? Why would you choose it ? How would you make a class immutable ?


What is Immutable Object

When the state of an object cannot be changed after its construction, the object is called immutable.


Why do we need it

Immutable objects are inherently thread-safe, and thus help in writing multi-threaded code without much worry. Questions about immutability are usually aimed at multi-threaded programs: if someone is talking about immutability, then indirectly they are talking about a multi-threaded context. Immutable classes are easy to understand, as they possess a single state, which is controlled by their constructor. Immutable objects are good candidates for hash keys because their hash code can be cached and reused for better performance.


Which Objects should be Immutable

Immutable classes are ideal for representing values of ADTs (Abstract Data Types).
Joshua Bloch suggests that
"All classes should be designed to be immutable unless there is a specific reason not to do so"


Guidelines for Making a class Immutable

1. All fields should be declared final.
2. The class itself should be declared final so that derived classes cannot make it mutable.
3. The this reference should not be allowed to escape during object construction, such as in anonymous inner classes (for example, when adding an action listener).
4. Any fields that contain references to mutable objects (such as arrays, collections, StringBuffer, etc.)
i. Are private
ii. Are never returned or otherwise exposed to the caller
iii. Are the only references to the objects that they refer to
iv. Do not change the state of the referenced objects after construction
v. If mutable fields must be returned to the caller, then a defensive copy or a read-only view should be returned so that the caller's changes are not reflected in the inner data structure.

public List getList() {
     return Collections.unmodifiableList(list); // read-only view of the mutable field; the caller cannot modify the inner list through it
}
vi. If a mutable object is passed to the constructor (like an array), then the immutable class should first make a defensive copy of it and store a reference to the copy.
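
The guidelines above can be sketched in a small class. This is only an illustration: ImmutableRoster and its fields are hypothetical names that do not come from any library. It copies the incoming list in the constructor and hands out a read-only view from the getter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class used only for illustration.
final class ImmutableRoster {
    private final String team;
    private final List<String> members;

    ImmutableRoster(String team, List<String> members) {
        this.team = team;
        // Defensive copy: later changes to the caller's list do not leak in.
        this.members = new ArrayList<>(members);
    }

    String getTeam() {
        return team;
    }

    List<String> getMembers() {
        // Read-only view over the private copy; callers cannot mutate it.
        return Collections.unmodifiableList(members);
    }
}

class ImmutableRosterDemo {
    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("alice");
        ImmutableRoster roster = new ImmutableRoster("core", input);
        input.add("bob"); // changes only the caller's list
        System.out.println(roster.getMembers()); // prints [alice]
    }
}
```

Because the private list is never modified after construction and is never exposed directly, the view returned by getMembers() always shows the same contents.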

What is Double Checked Locking Problem in Multi-Threading ?



Double-Checked Locking Problem
In earlier times (prior to JDK 1.6), even a simple uncontended synchronized block was expensive, and that led many people to use double-checked locking (DCL) for lazy initialization code. The double-checked locking idiom tries to improve performance by avoiding synchronization on the common code path after the instance has been allocated. But DCL never worked reliably because of the limitations of the previous JMM (Java Memory Model).

The revised JMM (JDK 1.5 onwards) fixes this, provided the field is declared volatile. The classic form of the idiom, without volatile, looks like this:
public class Singleton {
    private Singleton() {}
    private static Singleton instance_ = null; // a global static variable that will hold the state
    public static Singleton instance() {
        // unsynchronized access to this field may see a partially constructed object
        // because of instruction reordering by the compiler or the cache
        if (instance_ == null) {
            synchronized(Singleton.class) {
                if (instance_ == null)
                    instance_ = new Singleton();
            }
        }
        return instance_;
    }
}
Without volatile, the JMM does not guarantee that this static singleton behaves as expected.


Why is the above idiom broken without volatile ?
DCL relies on unsynchronized use of the instance_ field. This appears harmless, but it is not. Suppose thread A is inside the synchronized block, creating a new Singleton instance and assigning it to instance_, while thread B is just entering the instance() method. Consider the effect of this initialization on memory: memory for the new Singleton object is allocated; the constructor for Singleton is called, initializing the member fields of the new object; and the instance_ field is assigned a reference to the newly created object. There are two possible scenarios now


• Suppose thread A has completed the initialization of instance_ and exits the synchronized block as thread B enters instance(). By this time, instance_ is fully initialized and thread A has flushed its local memory to main memory (write barrier). Singleton's member fields may refer to other objects stored in memory, which will also be flushed out. Thread B may see a valid reference to the newly created instance, but because it did not perform a read barrier, it could still see stale values of the instance's member fields.

• Since thread B is not executing inside a synchronized block, it may see these memory operations in a different order than the one thread A executes them in. It could be the case that B sees these events in the following order (and the compiler is also free to reorder the instructions like this): allocate memory, assign the reference to instance_, call the constructor. Suppose thread B comes along after the memory has been allocated and instance_ has been set, but before the constructor is called. It sees that instance_ is not null, skips the synchronized block, and returns a reference to a partially constructed Singleton! Needless to say, the result is neither expected nor desired.


Fixed double-checked Locking using volatile in new JMM (multi-threaded singleton pattern JDK 1.5)

The following code makes the helper volatile so as to stop the instruction reordering. This code will work with JDK 1.5 onwards only.
class Foo {
    private volatile Helper helper = null;
    public Helper getHelper() {
        if (helper == null) {

            synchronized(this) {
                if (helper == null)
                    helper = new Helper();
            }
        }
        return helper;
    }

}
If Helper is an immutable object, such that all of the fields of Helper are final, then double-checked locking will work without having to use volatile fields. The idea is that a reference to an immutable object (such as a String or an Integer) should behave in much the same way as an int or float; reading and writing references to immutable objects are atomic.


Alternatives to DCL
Nowadays the JVM is much smarter and the cost of a synchronized block relative to a volatile read is small, so it does not really make sense to use DCL for performance reasons. The easiest way to avoid the problems of DCL is not to use it: we can make the whole method synchronized instead of synchronizing a code block inside it.
Another option is to use eager initialization instead of lazy initialization by assigning the instance at creation time. Here is an example demonstrating eager initialization


class MySingleton {
    public static final Resource resource = new Resource(); // final, so the reference can never be reassigned
}


Using Initialization On Demand Holder idiom

Nested classes are not initialized until they are first referenced. This fact can be used to implement lazy initialization with a static nested class, as shown below

public class Something {
    private Something() {}
    private static class LazyHolder {
        private static final Something INSTANCE = new Something();
    }
    public static Something getInstance() {
        return LazyHolder.INSTANCE;
    }
}
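
To observe the laziness of the holder idiom, here is a small sketch. The class LazySingleton and its created flag are hypothetical names added only so that the moment of construction can be seen from outside.

```java
class LazySingleton {
    // Set when the instance is constructed; exists only to observe laziness.
    static volatile boolean created = false;

    private LazySingleton() {
        created = true;
    }

    private static class Holder {
        // Runs only when Holder is initialized, i.e. on the first getInstance() call.
        static final LazySingleton INSTANCE = new LazySingleton();
    }

    static LazySingleton getInstance() {
        return Holder.INSTANCE;
    }
}

class LazySingletonDemo {
    public static void main(String[] args) {
        // Touching the outer class does not initialize Holder.
        System.out.println("created before getInstance(): " + LazySingleton.created);
        LazySingleton first = LazySingleton.getInstance();
        System.out.println("created after getInstance(): " + LazySingleton.created);
        System.out.println("same instance: " + (first == LazySingleton.getInstance()));
    }
}
```

Thread safety comes from the JVM's class initialization, which runs the static initializer of Holder exactly once, so no explicit synchronization is needed in getInstance().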

And finally using Enum for Thread-Safe Singleton

public enum Singleton{
    INSTANCE;
}
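
An enum singleton can carry state and behaviour like any other class. The following is a minimal sketch; IdGenerator and nextId() are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical enum singleton that hands out increasing ids.
enum IdGenerator {
    INSTANCE;

    private final AtomicLong next = new AtomicLong();

    long nextId() {
        return next.incrementAndGet();
    }
}

class IdGeneratorDemo {
    public static void main(String[] args) {
        long first = IdGenerator.INSTANCE.nextId();
        long second = IdGenerator.INSTANCE.nextId();
        System.out.println(first + " then " + second);
        // Looking the constant up by name yields the very same object.
        System.out.println(IdGenerator.valueOf("INSTANCE") == IdGenerator.INSTANCE);
    }
}
```

The JVM guarantees that an enum constant is instantiated only once, and enum serialization preserves that guarantee, which is why this form needs no extra code.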

Friday, May 17, 2013

There is a stream of words which contains Anagrams. How would you print anagrams in a single bucket from that stream ?

Sort the letters of each word and check whether the sorted forms are equal. For example, abba, baab and abab all sort to aabb, so they should go to the same bucket.
A simple method to check if two Strings are anagrams

public boolean isAnagram(String s1, String s2) {
    char[] a1 = s1.toCharArray();
    char[] a2 = s2.toCharArray();
    Arrays.sort(a1);
    Arrays.sort(a2);
    // compare the sorted arrays directly instead of building two Strings first
    return Arrays.equals(a1, a2);
}
Algorithm
1) Use a HashMap with a String as key and a List<String> as value, where the list contains all anagrams of the key string.
2) For each word in the input array, create a key by sorting the word's letters and add the word to the list stored under that key, for example [aakk -> akka, akak]. If no list exists for the key yet, create a new one and put it in the map under the sorted word.
3) Print all strings from the list whose key is the sorted form of the input word.
Source Code

import java.util.*;

public class Anagrams {
    private static Map<String,List<String>> anagramsMap = new HashMap<>(100);
    public static void main(String[] args) {

        String[] input = {
            "akka", "akak", "baab", "baba", "bbaa"
        };

        for (String s: input) {
            char[] word = s.toCharArray();
            Arrays.sort(word);
            String key = String.valueOf(word);
            if (!anagramsMap.containsKey(key)) {
                anagramsMap.put(key, new ArrayList<String>());
            }

            anagramsMap.get(key).add(s);
        }

        System.out.println("anagramsMap = " + anagramsMap);
    }
}
Time Complexity
If we ignore the time consumed by sorting an individual word, the above approach takes O(n) time for n words. Otherwise, with words of length at most k, sorting each word costs O(k log k), so the overall time complexity is O(n * k log k).
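
If the per-word sort is a concern, one possible variation (not part of the original solution) is to build the key from letter counts, which takes O(k) per word. The sketch below assumes the input contains only lowercase letters a to z; AnagramBuckets, countKey and group are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnagramBuckets {
    // Builds a key such as "a2k2" from letter counts. Assumes lowercase a-z only.
    static String countKey(String word) {
        int[] counts = new int[26];
        for (char c : word.toCharArray()) {
            counts[c - 'a']++;
        }
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                key.append((char) ('a' + i)).append(counts[i]);
            }
        }
        return key.toString();
    }

    // Groups words so that all anagrams share one bucket.
    static Map<String, List<String>> group(String[] words) {
        Map<String, List<String>> buckets = new HashMap<>();
        for (String word : words) {
            String key = countKey(word);
            List<String> bucket = buckets.get(key);
            if (bucket == null) {
                bucket = new ArrayList<>();
                buckets.put(key, bucket);
            }
            bucket.add(word);
        }
        return buckets;
    }

    public static void main(String[] args) {
        String[] input = {"akka", "akak", "baab", "baba", "bbaa"};
        System.out.println(group(input));
    }
}
```

With this key the total cost is O(n * k), at the price of restricting the alphabet.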

What is difference between Callable and Runnable Interface ?

As per Java documentation 

"Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception."


public interface Callable<V> {
    V call() throws Exception;
}
To convert a Runnable to a Callable, use the following utility method provided by the Executors class

Callable<Object> callable = Executors.callable(task); // task is a Runnable

A Callable, however, cannot be passed directly to a Thread; it is normally executed through an ExecutorService as shown below.

result = exec.submit(aCallable).get();

Submitting a Callable to an ExecutorService returns a Future object, which represents the lifecycle of the task and provides methods to check whether the task has completed or been cancelled, to retrieve the result, and to cancel the task.

Here is the source for Future Interface


public interface Future<V> {
    boolean cancel(boolean mayInterruptIfRunning);
    boolean isCancelled();
    boolean isDone();
    V get() throws InterruptedException, ExecutionException;
    V get(long timeout, TimeUnit unit)
        throws InterruptedException, ExecutionException, TimeoutException;
}
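
Putting Callable, ExecutorService and Future together, a minimal sketch might look like the following. CallableDemo and sumTo are hypothetical names; the 5 second timeout is an arbitrary choice for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class CallableDemo {
    // Hypothetical task: sums 1..n on a worker thread and returns the result.
    static int sumTo(final int n) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = new Callable<Integer>() {
                @Override
                public Integer call() {
                    int sum = 0;
                    for (int i = 1; i <= n; i++) {
                        sum += i;
                    }
                    return sum;
                }
            };
            Future<Integer> future = exec.submit(task);
            // Block for at most 5 seconds while waiting for the result.
            return future.get(5, TimeUnit.SECONDS);
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumTo(100)); // prints 5050
    }
}
```

If call() throws, the exception reaches the caller wrapped in an ExecutionException when get() is invoked.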

Tuesday, May 14, 2013

What do you understand by Big O Notation, Why is it important in software development ?

What do you understand by Big O Notation?

Big O Notation is a mechanism used to measure the relative efficiency of algorithms in terms of space and time. It helps us understand how the execution time and memory requirements of an algorithm grow as a function of increasing input size. In this notation, O stands for the order of growth.

Constant O(1)
A program whose running time has a constant order of growth executes a fixed number of operations to finish the job, so its running time does not depend on N.


Linear O(N)
A program that spends a constant amount of time processing each piece of input data, so its running time is proportional to N.


Following are examples of Big O, in increasing order of magnitude.

#   Big O Notation   Name          Example
1.  O(1)             Constant      Lookup in a HashMap, checking whether a number is even or odd
2.  O(log n)         Logarithmic   Finding an item in a sorted array using binary search
3.  O(n)             Linear        Printing all elements of an array
4.  O(n log n)       Log-linear    Sorting using merge sort
5.  O(n^2)           Quadratic     Bubble sort
6.  O(2^n)           Exponential   Generating all subsets of an n-element set
7.  O(n!)            Factorial     Solving the Travelling Salesman Problem by brute force

Importance of Big O

We should always keep time efficiency in mind while designing an algorithm using existing data structures, otherwise there could be severe performance penalties for using the wrong data structure in a given scenario.

Base of Logarithm is irrelevant in Big O Notation

The base of the logarithm is not relevant to the order of growth, since logarithms with different constant bases differ only by a constant factor, so plain log N is used when referring to the order of growth. Note, however, that the base of an exponent does matter: 3^N grows much faster than 2^N, and the two are not within a constant factor of each other.
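
The constant-factor relationship can be checked numerically. In this small sketch (LogBaseDemo is a hypothetical name) the ratio between the base-2 and base-10 logarithms stays the same for every N, namely log2(10), which is roughly 3.32.

```java
class LogBaseDemo {
    // Base-2 logarithm via the change-of-base formula.
    static double log2(double n) {
        return Math.log(n) / Math.log(2);
    }

    // Ratio of the base-2 logarithm to the base-10 logarithm of n.
    static double ratio(double n) {
        return log2(n) / Math.log10(n);
    }

    public static void main(String[] args) {
        double[] sizes = {10, 1_000, 1_000_000};
        for (double n : sizes) {
            // The ratio does not depend on n.
            System.out.println("n = " + n + ", log2(n) / log10(n) = " + ratio(n));
        }
    }
}
```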

Time efficiency in Big O notation for few Java Collections

ArrayList (ignoring the time taken by array resize operation)
O(1) for add, size and get
O(n) for toString() method


PriorityQueue
O(1) for peek, element and size
O(log n) for offer, poll, remove() and add
O(n) for remove(Object) & contains(Object)


HashMap (with no collisions)
O(1) for get operation
O(1) for put operation


LinkedList
O(1) for removal at either end or through an iterator
O(1) for add & poll method
O(n) for toString() method


References
http://en.wikipedia.org/wiki/Big_O_notation

Monday, May 13, 2013

Is it possible to write a method in Java which swaps two int or Integer ?

The answer is no, at least not in a way that affects the caller's variables.

To understand why, you must know how parameter passing works in Java.
In case of primitive int
Parameters to a method are passed by value in Java. In case of primitive data types, a copy of the value is passed to the method, so any changes made inside the method are not reflected in the calling code.


In case of the Integer wrapper class
For objects, the reference to the object is copied by value into the called method. If we reassign these reference copies, the change is not visible to the code that called swap(x, y).
/** This code will never work as intended  **/
public void swap(Integer x, Integer y) {
    Integer tmp = x;
    x = y;
    y = tmp;
}


The only way to make this possible would be some kind of setter on the Integer class that modifies the underlying value. But Java makes all wrapper classes immutable, which also makes them thread-safe, so there is no way to swap two Integers through a method like this.


TIP-The called method can't change the caller's variable, although for object reference variables, the called method can change the state of the object the variable refers to.
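
The tip can be illustrated with a short sketch. SwapDemo and both method names are hypothetical: swapReferences reassigns only its own parameter copies, while swapElements changes the state of an array object that the caller also holds a reference to.

```java
class SwapDemo {
    // Reassigns only the local copies of the references; the caller sees no change.
    static void swapReferences(Integer x, Integer y) {
        Integer tmp = x;
        x = y;
        y = tmp;
    }

    // Changes the contents of the shared array; the caller sees the change.
    static void swapElements(int[] values, int i, int j) {
        int tmp = values[i];
        values[i] = values[j];
        values[j] = tmp;
    }

    public static void main(String[] args) {
        Integer a = 1;
        Integer b = 2;
        swapReferences(a, b);
        System.out.println(a + ", " + b); // prints 1, 2

        int[] pair = {1, 2};
        swapElements(pair, 0, 1);
        System.out.println(pair[0] + ", " + pair[1]); // prints 2, 1
    }
}
```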

What are Inheritance Strategies in JPA ?

JPA defines three inheritance strategies namely, SINGLE_TABLE, TABLE_PER_CLASS and JOINED.

Single table inheritance is the default, and table per class is optional, so not all JPA vendors support it. JPA also defines the mapped superclass concept through the @MappedSuperclass annotation. A mapped superclass is not a persistent class, but it allows a common persistable mapping to be defined for its subclasses.

1. Single Table Inheritance
In this strategy, a single table is used to store all the instances of the entire inheritance hierarchy. The table has a column for every attribute of every class in the hierarchy. A discriminator column identifies which class a particular row belongs to.

2. Table Per Class Inheritance
A table is defined for each concrete class in the inheritance hierarchy to store all the attributes of that class and all its superclasses.

3. Joined Table
This strategy replicates the object model in the data model. A table is created for each class in the hierarchy to store only the local attributes of that class.

Notes
Question - We want to extract common behavior of JPA entities into a superclass, but we do not want a table for that superclass. How would you achieve this ?
Answer - If we create a normal class as the superclass, then as per the JPA specification, the fields of that class are not persisted in the database tables. We need to create a superclass containing the common fields and then annotate it with @MappedSuperclass so that its fields are persisted in the subclass tables. A mapped superclass has no separate table defined for it.


References
http://en.wikibooks.org/wiki/Java_Persistence/Inheritance

What does Collections.unmodifiableCollection() do ? Is it safe to use the collection returned by this method in a multi-threading environment ?

Collections.unmodifiableCollection() returns an unmodifiable dynamic view of the underlying data structure. Any attempt, direct or via an iterator, to modify this view throws UnsupportedOperationException, but any changes made to the underlying data structure are reflected in the view.
This method is no substitute for other thread-safety techniques, because iterating over a collection through this view may throw ConcurrentModificationException if the original collection is structurally modified during the iteration.



For example, the following code will throw ConcurrentModificationException in the for loop.


import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class UnModifiableCollection {
    private List<String> names = new ArrayList<>();
    public void testConcurrency() {
        names.add("1");
        names.add("2");
        names.add("3");
        names.add("4");
        Collection<String> dynamicView = Collections.unmodifiableCollection(names);
        for (String s : dynamicView) { // throws ConcurrentModificationException on the 2nd iteration
            System.out.println("s = " + s);
            names.remove(0); // the culprit line modifying the underlying collection
        }
        }
    }

    public static void main(String[] args) {
        UnModifiableCollection test = new UnModifiableCollection();
        test.testConcurrency();
    }

}

 

Hence, external synchronization is a must if we are going to modify the underlying collection.
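
One common way to sidestep the problem in single-threaded code is to iterate over a snapshot instead of the live view. The sketch below is an illustration only: ViewVersusSnapshot and snapshotOf are hypothetical names, and in a multi-threaded program the copy itself would still have to be made under the same lock that guards the original list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class ViewVersusSnapshot {
    // Hypothetical helper: copies the list first, then wraps the copy as read-only.
    static Collection<String> snapshotOf(List<String> source) {
        return Collections.unmodifiableCollection(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("1", "2", "3", "4"));
        Collection<String> view = Collections.unmodifiableCollection(names);
        Collection<String> snapshot = snapshotOf(names);

        // Iterating the snapshot while changing the original list is safe.
        for (String s : snapshot) {
            names.remove(s);
        }
        System.out.println("view size = " + view.size());         // the view follows the original list
        System.out.println("snapshot size = " + snapshot.size()); // the snapshot is unchanged
    }
}
```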

Saturday, May 11, 2013

A Simple Collection of Complex Objects or the Complex Collection of Simple Objects, which one is better ?

This question seems quite abstract at first glance, and it is a little difficult to find the intent of the interviewer here.

The interviewer is definitely asking something about data structures and algorithms here.

"A Simple Collection of Complex Objects"
This means that very complex objects, with all kinds of business logic inside them, are stored in a simple collection such as a plain array.

"Complex Collection of Simple Objects"
The objects are simple in this case, but the collection holding them is complex in nature, like a self-balancing binary tree: for example, a red-black tree holding a collection of employee objects (each of which is just a POJO with minimal or no logic in it).

Complex collections are built to solve problems of performance & scalability. Using a red-black tree, we can expect logarithmic time for locating the next smaller sibling, which is not possible using a simple unsorted array of complex objects. So it is an open question for discussion with the interviewer: is he looking for performance & scalability, or for something else ?
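
As a rough sketch of that difference, java.util.TreeSet (which is backed by a red-black tree) finds the next smaller element in logarithmic time, while an unsorted array needs a full scan. NextSmallerDemo and its method names are hypothetical, and plain integers stand in for the employee objects.

```java
import java.util.Arrays;
import java.util.TreeSet;

class NextSmallerDemo {
    // Red-black tree lookup: O(log n). Returns null when nothing is smaller.
    static Integer nextSmallerInTree(TreeSet<Integer> sorted, int value) {
        return sorted.lower(value);
    }

    // Unsorted array: every element has to be inspected, O(n).
    static Integer nextSmallerInArray(int[] unsorted, int value) {
        Integer best = null;
        for (int candidate : unsorted) {
            if (candidate < value && (best == null || candidate > best)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TreeSet<Integer> tree = new TreeSet<>(Arrays.asList(30, 10, 20));
        int[] array = {30, 10, 20};
        System.out.println(nextSmallerInTree(tree, 25));   // prints 20
        System.out.println(nextSmallerInArray(array, 25)); // prints 20
    }
}
```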

Sunday, May 5, 2013

Things to take care of while landing your next dream Job

Some day or other we face a situation where the current relationship with the employer cannot work anymore, for various reasons. And then we look for a change. But we must carefully consider the following facts while we are on the move -

  1. Many big IT companies sell their configuration and support work in the name of development by offering huge remuneration benefits. If you are really looking for quality work, then prefer a first-hand recommendation for the project and company, which might give a better insight into the actual work. If that is not feasible, schedule an extra round with your new employer to find out what your typical day will look like after joining the company.
  2. In today's connected world, most companies work across geographies, and the work might require you to stay in the office at odd hours (anytime outside 8AM-5PM). Working at odd hours has long-term negative effects on our health and thus must be carefully evaluated. Ultimately, humans are not on earth just for work (Hindi has a better word for this work : Naukri).
  3. As our work experience grows, we expect some kind of growth in our work environment. Some people want to grow on the managerial side, and others might prefer a technical or individual role. So we must be firm on our decision while shifting jobs.
  4. Some projects are a real mess, full of chaos and bugs. It is always better to start your new job with work that is not yet in a mature phase. Otherwise we will lose the fun & creativity in the daily work. Reverse engineering is not a good way to learn a project's business. Moreover, no one loves cleaning up other people's mess.
  5. Good practices & methodologies tend to reduce the stress in a project, thus making our life easier. If a project's technology is too old, then there will be loads of boilerplate code, which might cause you concern. So make sure you inquire enough about your new project.