Saturday, May 16, 2015

Designing Serialized Form

Default Serialized Form

By merely adding implements Serializable to a Java class, we allow the Java Serialization runtime to serialize objects of that class into binary form. The runtime applies its default strategy to the serialization process, so the output is called the default serialized form.

public class FavoriteStockWatchList implements Serializable {

   private final int maxSize;

   // StockDetail class is also serializable
   private final Map<Integer, StockDetail> watchList; 

   //watchList and tempRanking are implementation detail
   private int tempRanking;

   public FavoriteStockWatchList(int maxSize) {
       this.maxSize = maxSize;
       this.watchList = new HashMap<>(maxSize);
   }

   void addStock(String code, int ranking) {
       if (ranking < 1 || ranking > maxSize) {
           throw new IllegalArgumentException("Invalid ranking.");
       }
       if (watchList.containsKey(ranking)) {
           pushDownRanking(ranking);
       }
       this.watchList.put(ranking, new StockDetail(code));
   }

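   void pushDownRanking(int ranking) {
       StockDetail stock = watchList.get(ranking);
       if (stock == null) {
           return; // nothing to push down at this ranking
       }
       // Free the next slot first, then recompute the target: the recursive
       // call overwrites the shared tempRanking field.
       pushDownRanking(ranking + 1);
       tempRanking = ranking + 1;
       if (tempRanking <= maxSize) {
           this.watchList.put(tempRanking, stock);
       }
   }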

   StockDetail getStockDetailByRanking(int ranking) {
       StockDetail stockDetail = watchList.get(ranking);
       // Calling update() method to retrieve statistic and
       // other detail and keep in stockDetail object.
       stockDetail.update();
       return stockDetail;
   }
}

The class above is a simple implementation of a serializable FavoriteStockWatchList. As you can see, apart from the added implements Serializable, the rest is ordinary code, and it works as coded. Moreover, a FavoriteStockWatchList object can be serialized to the default serialized form. "Perfect! Everything I want is working fine and I can happily release FavoriteStockWatchList to the public. Hey, implementing serialization is a piece of cake!" This is exactly how I felt the first time I touched serialization. However, my view changed when I dug deeper.
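To see the default serialized form in action, here is a minimal round-trip sketch. It is not the class above: WatchList is a simplified, hypothetical stand-in that stores only codes, the serialize/deserialize helpers are my own names, and "AAPL" is just an illustrative code. Only the java.io calls are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class DefaultFormDemo {

    // Hypothetical stand-in for the watch list. Every non-transient field,
    // including the internal Map, ends up in the default serialized form.
    static final class WatchList implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int maxSize;
        private final Map<Integer, String> codesByRanking = new HashMap<>();

        WatchList(int maxSize) { this.maxSize = maxSize; }
        void addStock(String code, int ranking) { codesByRanking.put(ranking, code); }
        String codeAt(int ranking) { return codesByRanking.get(ranking); }
        int maxSize() { return maxSize; }
    }

    // Writes the object graph to an in-memory byte array.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds an object from the bytes produced by serialize().
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        WatchList original = new WatchList(10);
        original.addStock("AAPL", 1);
        WatchList copy = (WatchList) deserialize(serialize(original));
        System.out.println(copy.codeAt(1) + " / " + copy.maxSize());
    }
}
```

No serialization code was written inside WatchList itself, which is exactly why the default form is so tempting.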

The core values of a list item are the code and the ranking, and it is up to the programmer to decide how to manage them. In the example above, we use a Map with an Integer key (representing the ranking) and a StockDetail as the value. So what is wrong with using a Map? It is not a matter of right or wrong, but of good or bad design. After serialization, the watchList Map becomes part of the serialized form, which exposes a private implementation detail to the public. Imagine that after some time we find that the Map is inefficient, or not appropriate at all, and we would like to replace it with a different implementation. We have no choice but to keep supporting the watchList Map for the old serialized forms in order to maintain backward compatibility. In short, serializing implementation details greatly reduces the flexibility to change the class in the future.

The second concern is performance. StockDetail is a serializable class that may in turn hold other serializable classes, such as a Statistic class. StockDetail not only holds the code, it also retrieves and keeps all the stock information every time getStockDetailByRanking() is called. Imagine we have 10 StockDetail objects in the list, each filled with all of that information. The impact is as follows:
  • Longer processing time
    • to serialize the derived fields in StockDetail
    • to traverse the class hierarchy of each serializable field and write its class description accordingly.
  • The serialized form becomes bulky because of the extra metadata it stores, which slows down transmission of the object over the network.
Therefore, we should accept the default serialized form only if all the instance fields are core values. Of course, the core values should be carefully designed as well. Serialization limits the flexibility of future change, so the make-it-work-and-refactor-later approach does not work well for serialization.
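The size impact can be checked with a rough sketch like the one below. FullDetail and CoreDetail are hypothetical stand-ins for StockDetail, the priceHistory array and its length of 1,000 are assumptions chosen only to make the difference visible, and serializedSize is my own helper name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class FormSizeDemo {

    // Hypothetical detail class whose fetched data is part of the default form.
    static final class FullDetail implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String code;
        private double[] priceHistory; // derived data, serialized by default

        FullDetail(String code) { this.code = code; }

        // Stand-in for update(): pretend 1,000 data points were fetched.
        void update() { priceHistory = new double[1_000]; }
    }

    // Same shape, but the derived data is excluded from the serialized form.
    static final class CoreDetail implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String code;
        private transient double[] priceHistory; // skipped by serialization

        CoreDetail(String code) { this.code = code; }

        void update() { priceHistory = new double[1_000]; }
    }

    // Serializes the object in memory and reports how many bytes were written.
    static int serializedSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        FullDetail full = new FullDetail("AAPL");
        full.update();
        CoreDetail core = new CoreDetail("AAPL");
        core.update();
        System.out.println("with derived data: " + serializedSize(full) + " bytes");
        System.out.println("core values only:  " + serializedSize(core) + " bytes");
    }
}
```

The exact numbers depend on the class names and the JDK, so none are quoted here, but the first figure has to include at least the 8,000 bytes of the double array.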

Custom Serialized Form

We can customize our FavoriteStockWatchList class to determine how its objects are serialized and deserialized. The output of serialization is still physically in binary form, but logically it is now called a custom serialized form. We can customize our class by applying the following techniques:
  • declare an instance field as transient so that it won't be serialized,
  • use the ObjectOutputStream.write<DataType>() methods to write core data into the serialized form,
  • use the ObjectInputStream.read<DataType>() methods to read core data from the serialized form.

public class FavoriteStockWatchList implements Serializable {

   private final int maxSize;

   // Marked as transient. Has to give up 'final' because readObject() reassigns it.
   // StockDetail no longer needs to be a serializable class.
   private transient Map<Integer, StockDetail> watchList;

   // Marked as transient. tempRanking is not a core value.
   private transient int tempRanking;

   public FavoriteStockWatchList(int maxSize) {
       this.maxSize = maxSize;
       this.watchList = new HashMap<>(maxSize);
   }

   private void writeObject(ObjectOutputStream out) 
       throws IOException {
       /** 
        * call default serialization process. This will serialize 
        * all non-transient and non-static instance fields as usual.
        * "maxSize", in this example. 
        */
       out.defaultWriteObject(); 
       out.writeInt(watchList.size()); // write list size
       for (Map.Entry<Integer, StockDetail> entry : watchList.entrySet()) {
           out.writeUTF(entry.getValue().getCode()); // write core value - code
           out.writeInt(entry.getKey()); // write core value - ranking
       }
   }

   private void readObject(ObjectInputStream in) 
       throws IOException, ClassNotFoundException {
       /**
        * Call default deserialization process. This will deserialize
        * all non-transient and non-static instance fields as usual.
        * "maxSize", in this example.
        */
       in.defaultReadObject();
       int listSize = in.readInt(); // read list size
       this.watchList = new HashMap<>(maxSize);
       for (int i=0; i<listSize; i++) {
           // read core value and add them as new stock to the list
           addStock(in.readUTF(), in.readInt()); 
       }
   }

   // other methods...
}

Above is a custom version of FavoriteStockWatchList. watchList and tempRanking have been declared as transient so that they are excluded from serialization. In other words, we have excluded the implementation details from the serialized form. Instead, we write the core values (code and ranking) into the serialized form in the writeObject() method. This greatly improves the room for future change in the class. For example, we could replace the Map with a List to manage the stock list, and this change won't break backward compatibility because the core values are still the same. You may object that this doesn't help if we change the core values. That is true: if a core value changes, for example if the integer ranking is replaced with an alphabetic ranking, then we really do have to handle the backward compatibility issue. That is why determining the correct core values is also very important when we design a serializable class.
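As a rough illustration of that claim, here is a sketch of a List-backed variant that writes the same sequence to the stream (maxSize, then the entry count, then a code and a ranking per entry). It is simplified on purpose: StockDetail is reduced to a plain String code, addStock does not push rankings down, and the explicit serialVersionUID is my assumption about how both versions would be kept compatible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ListBackedDemo {

    static final class WatchList implements Serializable {
        // Pinned so that a change of internals does not change the class version.
        private static final long serialVersionUID = 1L;

        private final int maxSize;

        // Internal representation is now a List: index = ranking - 1.
        private transient List<String> codes;

        WatchList(int maxSize) {
            this.maxSize = maxSize;
            this.codes = new ArrayList<>(Collections.nCopies(maxSize, (String) null));
        }

        // Simplified: overwrites the slot instead of pushing rankings down.
        void addStock(String code, int ranking) {
            if (ranking < 1 || ranking > maxSize) {
                throw new IllegalArgumentException("Invalid ranking.");
            }
            codes.set(ranking - 1, code);
        }

        String codeAt(int ranking) { return codes.get(ranking - 1); }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes maxSize
            int count = 0;
            for (String code : codes) {
                if (code != null) count++;
            }
            out.writeInt(count); // same layout as before: size first
            for (int i = 0; i < codes.size(); i++) {
                if (codes.get(i) != null) {
                    out.writeUTF(codes.get(i)); // core value - code
                    out.writeInt(i + 1);        // core value - ranking
                }
            }
        }

        private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
            in.defaultReadObject(); // restores maxSize
            codes = new ArrayList<>(Collections.nCopies(maxSize, (String) null));
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                addStock(in.readUTF(), in.readInt());
            }
        }
    }

    // Serializes and immediately deserializes the list in memory.
    static WatchList roundTrip(WatchList list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(list);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (WatchList) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        WatchList list = new WatchList(5);
        list.addStock("AAPL", 2);
        System.out.println(roundTrip(list).codeAt(2));
    }
}
```

Nothing about the List reaches the stream, so the reading side only ever sees the core values.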

From the performance perspective, since the complete StockDetail can be retrieved at runtime, we only write the code for each list item instead of the whole StockDetail. This significantly slims down the serialized form because fewer class descriptions and field values are written into it. Moreover, the processing time can be shortened because the runtime doesn't need to traverse unnecessary dependent classes to construct the object graph.

By the way, whether you are using the default or a custom serialized form, you also have to take care of security. Read Security in Serialization for more security best practices in serialization.

Proxy Serialized Form

By using the writeReplace() and readResolve() methods, we can generate a proxy serialized form instead of using the default or custom serialized form of the real object. Below is the proxy version of FavoriteStockWatchList.

public class FavoriteStockWatchList implements Serializable {

   private final int maxSize;

   // Can stay 'final'
   private final Map<Integer, StockDetail> watchList; 

   private int tempRanking;

   public FavoriteStockWatchList(int maxSize) {
       this.maxSize = maxSize;
       this.watchList = new HashMap<>(maxSize);
   }

   void addStock(String code, int ranking) {
       if (ranking < 1 || ranking > maxSize) {
           throw new IllegalArgumentException("Invalid ranking.");
       }
       if (watchList.containsKey(ranking)) {
           pushDownRanking(ranking);
       }
       this.watchList.put(ranking, new StockDetail(code));
   }

   private Object writeReplace() throws ObjectStreamException {
       return new CoreValueProxy(this); // write replacement object
   }

   private static class CoreValueProxy implements Serializable {

       private int maxSize; // core value

       private int[] rankings; // core value - ranking in list

       private String[] codes; // core value - code in list

       private CoreValueProxy(FavoriteStockWatchList instance) { 
           this.maxSize = instance.maxSize;
           Set<Map.Entry<Integer, StockDetail>> entrySet
               = instance.watchList.entrySet();
           int size = entrySet.size();
           rankings = new int[size];
           codes = new String[size];
           int i = 0;
           for (Map.Entry<Integer, StockDetail> entry : entrySet) {
               rankings[i] = entry.getKey();
               codes[i] = entry.getValue().getCode();
               i++;
           }
       }

       private Object readResolve() throws ObjectStreamException {
           // construct real object from serialized proxy form
           FavoriteStockWatchList instance
               = new FavoriteStockWatchList(maxSize);
           for (int i=0; i<codes.length; i++) {
               instance.addStock(codes[i], rankings[i]);
           }
           // return the real object
           return instance; 
       }
   }

   //other methods...
}

The proxy serialized form also has its own advantages. With the proxy pattern in mind, the programmer can more easily tell the core values apart from the implementation details. The proxy class holds the core values, and all the implementation details stay in the enclosing class. This is much easier to pick up, more straightforward, and more object-oriented compared to writing the core values directly to the stream in a custom serialized form. Besides, it also brings a security advantage. Read Security in Serialization for more detail.
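One piece of the security side can be sketched here. The description of this pattern in Effective Java also gives the enclosing class a readObject() method that always throws, so that a stream claiming to contain the real object directly is rejected. Below is a compact, hypothetical version of that idea: WatchList again stores plain String codes instead of StockDetail, and the roundTrip helper is my own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

final class ProxyGuardDemo {

    static final class WatchList implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int maxSize;
        private final Map<Integer, String> codesByRanking = new HashMap<>();

        WatchList(int maxSize) { this.maxSize = maxSize; }
        void addStock(String code, int ranking) { codesByRanking.put(ranking, code); }
        String codeAt(int ranking) { return codesByRanking.get(ranking); }

        // Serialization writes the proxy instead of this object.
        private Object writeReplace() { return new CoreValueProxy(this); }

        // Guard: a genuine stream never contains a WatchList itself,
        // so any attempt to deserialize one directly is refused.
        private void readObject(ObjectInputStream in) throws InvalidObjectException {
            throw new InvalidObjectException("Proxy required");
        }

        private static final class CoreValueProxy implements Serializable {
            private static final long serialVersionUID = 1L;
            private final int maxSize;
            private final int[] rankings;
            private final String[] codes;

            CoreValueProxy(WatchList source) {
                this.maxSize = source.maxSize;
                int size = source.codesByRanking.size();
                this.rankings = new int[size];
                this.codes = new String[size];
                int i = 0;
                for (Map.Entry<Integer, String> e : source.codesByRanking.entrySet()) {
                    rankings[i] = e.getKey();
                    codes[i] = e.getValue();
                    i++;
                }
            }

            // Deserialization swaps the proxy back for a real WatchList,
            // built through the normal constructor and addStock().
            private Object readResolve() {
                WatchList list = new WatchList(maxSize);
                for (int i = 0; i < codes.length; i++) {
                    list.addStock(codes[i], rankings[i]);
                }
                return list;
            }
        }
    }

    // Serializes and immediately deserializes the object in memory.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        WatchList list = new WatchList(5);
        list.addStock("AAPL", 1);
        WatchList copy = (WatchList) roundTrip(list);
        System.out.println(copy.codeAt(1));
    }
}
```

In a normal round trip the guard is never triggered, because the stream only ever holds a CoreValueProxy.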

In conclusion, DO NOT simply implement Serializable and accept the default serialized form without proper design. You won't see the impact in the short term, but you will feel the pain in the long term. Instead, spend more effort on understanding and designing the serializable class at the very beginning, in order to save the cost of supporting the mess in the long term.

Reference:
Book: Effective Java 2nd Edition by Joshua Bloch
