Sunday, June 14, 2015

Difference between Serializable and Externalizable

java.io.Serializable lets a Java object be converted into a binary serialized form and back again. java.io.Externalizable is an extension of Serializable, so it is meant to do the same thing. It is also worth mentioning that both run on the same serialization framework: we serialize a Serializable or Externalizable object by passing it to java.io.ObjectOutputStream, and deserialize it by reading it from java.io.ObjectInputStream. So, what is the difference between them? From my point of view, the primary distinction is the metadata that each of them captures and writes into the stream. Any other differences are derived from this.
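As a minimal sketch of that shared framework, the round trip below pushes an object through ObjectOutputStream and reads it back through ObjectInputStream. The RoundTripDemo and Point names are hypothetical and exist only for this illustration, and an in-memory byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RoundTripDemo {

    // Hypothetical class used only for this illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Serialize any Serializable or Externalizable object into a byte array.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object from the bytes produced by serialize().
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = (Point) deserialize(serialize(new Point(3, 4)));
        System.out.println(copy.x + "," + copy.y); // prints 3,4
    }
}
```

The same two helper methods would work unchanged for an Externalizable class, since both kinds of object go through writeObject and readObject.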

Processing difference during serialization runtime

Basically, both of them have to construct and write metadata into the stream. This metadata can then be read back to reconstitute the Java object during deserialization. The metadata includes a class description for each class in the class hierarchy of the Java object being serialized.

For Serializable:

Below is a diagram that illustrates the serialization process against a Serializable Java object.

When a Serializable object is being serialized, the serialization runtime traverses its class hierarchy from bottom to top and constructs a class description for each class in that hierarchy. The class description contains the following information.
  • Class Identity
    • Stream Unique ID/serialVersionUID
    • Fully Qualified Class Name
  • Serializable Fields Information
    • Number of fields
    • Name and type of fields
  • Others
    • Operational flags
All these class descriptions, a.k.a. the metadata of the target object, are written into the stream. After that, the class hierarchy is traversed again, this time to handle the serializable state at each class level, starting from the topmost serializable superclass and working down to the object's own class. For a particular class level, the serializable primitive fields are written into the stream first. Next, if an object field is found, that object becomes the next object to be serialized and goes through the same process, which repeats until no more object fields are found. At that point, the serialization runtime moves on to the next level down in the class hierarchy and repeats the same process.
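The class description that the runtime builds can be inspected through java.io.ObjectStreamClass. The sketch below assumes a hypothetical SerPerson class standing in for a Serializable class, and prints the class name, the serialVersionUID and the serializable fields that make up this metadata.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.Date;

public class DescriptorDemo {

    // Hypothetical stand-in for a Serializable class with three fields.
    static class SerPerson implements Serializable {
        private static final long serialVersionUID = 42L;
        int id;
        String name;
        Date dob;
    }

    // Build a one-line summary of the class description for the given type.
    static String describe(Class<?> type) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(type);
        StringBuilder sb = new StringBuilder();
        sb.append(desc.getName())
          .append(" uid=").append(desc.getSerialVersionUID())
          .append(" fields=").append(desc.getFields().length);
        for (ObjectStreamField field : desc.getFields()) {
            sb.append(' ').append(field.getName())
              .append(':').append(field.getType().getSimpleName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(SerPerson.class));
    }
}
```

Note that ObjectStreamClass.lookup returns null for a class that is not serializable, so describe is only meant for Serializable or Externalizable types.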

For Externalizable:

Below is a diagram that illustrates the serialization process against an Externalizable Java object.

When an Externalizable object is being serialized, the serialization runtime also traverses its class hierarchy from bottom to top to construct the class descriptions. However, the class description of an Externalizable object does not contain field information.
  • Class Identity
    • Stream Unique ID/serialVersionUID
    • Fully Qualified Class Name
  • Others
    • Operational flags
This is the crucial point that distinguishes Externalizable from Serializable. While Serializable requires field information in order to reconstruct the object's fields, Externalizable requires programmers to handle writing and reading the object's state themselves.
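For comparison, here is a small sketch that looks up the class description of an Externalizable class through java.io.ObjectStreamClass. ExtPoint is a hypothetical class for illustration only. The point is that the descriptor reports no fields even though the class declares two.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectStreamClass;

public class ExternalizableDescriptorDemo {

    // Hypothetical Externalizable class that declares two fields.
    public static class ExtPoint implements Externalizable {
        int x;
        int y;

        // Externalizable classes need a public no-arg constructor.
        public ExtPoint() {
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    // Number of fields recorded in the class description of the given type.
    static int describedFieldCount(Class<?> type) {
        return ObjectStreamClass.lookup(type).getFields().length;
    }

    public static void main(String[] args) {
        System.out.println(describedFieldCount(ExtPoint.class));
    }
}
```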

Programmatic difference

Serializable is a marker interface 

Serializable needs field information in order to reconstitute the fields of the target object. This is done in the background and does not require any effort from the programmer. Even a novice Java programmer can create a Serializable class merely by implementing the `Serializable` interface, and serialization just works like magic! Below is an example of a Serializable Person class.

import java.io.Serializable;
import java.util.Date;

public class Person implements Serializable {
    int id;
    String name;
    Date dob;
}

Externalizable is a contractual interface

On the other hand, Externalizable never captures field information. The question is, how does Externalizable reconstitute the fields of the target object? Externalizable does not do the magic that Serializable does. Instead, it delegates this task to the programmer, who determines which fields to serialize and deserialize. In other words, field data is written directly into, and read directly from, the stream by calling the Externalizable methods implemented by the class. Below is an example of an Externalizable Person class.

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.Date;

public class Person implements Externalizable {
    int id;
    String name;
    Date dob;

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
        out.writeObject(dob);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        id = in.readInt();
        name = in.readUTF();
        dob = (Date) in.readObject();
    }
}

As you can see, more lines of code have to be written to achieve the same thing that Serializable does. By the way, we can have Serializable and Externalizable classes (at different class levels) in the same class hierarchy of a target object. However, bear in mind that the existence of Externalizable at any level of the class hierarchy supersedes Serializable. This means that the serialization runtime will call the Externalizable methods instead of automatically serializing/deserializing the target object's fields.
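A small sketch of such a mixed hierarchy is shown below. Base and Child are hypothetical classes for this illustration: Base is Serializable, Child is Externalizable and deliberately writes only its own field, so the superclass field is not carried across the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class MixedHierarchyDemo {

    // Hypothetical Serializable superclass.
    static class Base implements Serializable {
        int baseValue;
    }

    // Hypothetical Externalizable subclass; it deliberately ignores baseValue.
    public static class Child extends Base implements Externalizable {
        int childValue;

        // Externalizable classes need a public no-arg constructor.
        public Child() {
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(childValue);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            childValue = in.readInt();
        }
    }

    // Serialize and immediately deserialize the given object in memory.
    static Child roundTrip(Child original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Child) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Child child = new Child();
        child.baseValue = 7;
        child.childValue = 9;

        Child copy = roundTrip(child);
        // baseValue falls back to its default because only writeExternal ran.
        System.out.println(copy.baseValue + " " + copy.childValue); // prints 0 9
    }
}
```

To keep baseValue, Child would have to write and read it explicitly in its own writeExternal and readExternal methods.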

Outcome difference

Size difference

The serialized form generated by Serializable is bigger than the one generated by Externalizable because it includes field information. The program below serializes a Person object, which can be an instance of either the Serializable or the Externalizable Person class given in the previous section.

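import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Date;

// The class name is only a wrapper for this example; it must live in the
// same package as Person because Person's fields are package-private.
public class PersonSizeDemo {
    private static final String FILENAME = "D:/person.ser";

    public static void main(String[] args) throws Exception {
        Person person = new Person();
        person.id = 123;
        person.name = "HauChee";
        person.dob = new Date(1429945971467L);

        serialize(person);

        // Print the size of the serialized form in bytes.
        System.out.println(Files.size(Paths.get(FILENAME)));
    }

    // Write the object to FILENAME through the serialization runtime.
    public static void serialize(Object obj) throws IOException {
        try (ObjectOutputStream out
          = new ObjectOutputStream(new FileOutputStream(FILENAME))) {
            out.writeObject(obj);
        }
    }
}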

The result: the Serializable serialized form is 174 bytes, which is bigger than the Externalizable serialized form at 118 bytes.
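The same comparison can be sketched without touching the file system by serializing into a byte array. SerPerson and ExtPerson are hypothetical nested copies of the two Person classes, so the byte counts will not match the 174 and 118 bytes above exactly (the class names differ), but the Serializable form should still come out larger.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

public class SizeComparisonDemo {

    // Hypothetical copy of the Serializable Person class.
    static class SerPerson implements Serializable {
        int id;
        String name;
        Date dob;
    }

    // Hypothetical copy of the Externalizable Person class.
    public static class ExtPerson implements Externalizable {
        int id;
        String name;
        Date dob;

        public ExtPerson() {
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(id);
            out.writeUTF(name);
            out.writeObject(dob);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            id = in.readInt();
            name = in.readUTF();
            dob = (Date) in.readObject();
        }
    }

    // Size in bytes of the serialized form of the given object.
    static int serializedSize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws Exception {
        SerPerson ser = new SerPerson();
        ser.id = 123;
        ser.name = "HauChee";
        ser.dob = new Date(1429945971467L);

        ExtPerson ext = new ExtPerson();
        ext.id = 123;
        ext.name = "HauChee";
        ext.dob = new Date(1429945971467L);

        System.out.println("Serializable:   " + serializedSize(ser) + " bytes");
        System.out.println("Externalizable: " + serializedSize(ext) + " bytes");
    }
}
```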

Processing speed difference

Serializable takes longer because the runtime has to inspect the object graph to construct the field information during serialization. It also takes time to digest the field information in order to reconstruct the objects during deserialization. All of this is done using Java Reflection, which is commonly known for its low performance.

Externalizable skips this kind of processing because it does not have the same obligation as Serializable. It is the programmer's responsibility to provide the implementation that writes field data into, and reads it from, the stream. The serialization runtime basically just makes normal method calls to the Externalizable methods. As a result, Externalizable performs better in terms of processing speed.

Tips:
Well, there is a way for Serializable to generate a smaller serialized form and catch up on speed: customize the Serializable serialized form. Read designing serialized form for more detail.
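As a rough sketch of what such customization can look like (the linked post covers the design side), the hypothetical CustomPerson below marks its fields transient and writes them by hand in the private writeObject and readObject methods that the serialization runtime looks for. It assumes name and dob are never null, and DefaultPerson is included only as a baseline for the size comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

public class CustomFormDemo {

    // Hypothetical class relying on the default serialized form.
    static class DefaultPerson implements Serializable {
        private static final long serialVersionUID = 1L;
        int id;
        String name;
        Date dob;
    }

    // Hypothetical class with a hand-written serialized form.
    static class CustomPerson implements Serializable {
        private static final long serialVersionUID = 1L;
        transient int id;
        transient String name;
        transient Date dob;

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();     // no field data here: every field is transient
            out.writeInt(id);
            out.writeUTF(name);
            out.writeLong(dob.getTime()); // a long instead of a full Date object
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            id = in.readInt();
            name = in.readUTF();
            dob = new Date(in.readLong());
        }
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DefaultPerson plain = new DefaultPerson();
        plain.id = 123;
        plain.name = "HauChee";
        plain.dob = new Date(1429945971467L);

        CustomPerson custom = new CustomPerson();
        custom.id = 123;
        custom.name = "HauChee";
        custom.dob = new Date(1429945971467L);

        System.out.println("default form: " + serialize(plain).length + " bytes");
        System.out.println("custom form:  " + serialize(custom).length + " bytes");
    }
}
```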

Difference in terms of side effects on the programmer

Serializable gives out the fish directly

Serializable is powerful enough to do all the hard work in the background. As a result, programmers tend to take it for granted and do not bother to learn more about it, because it works by default. They may not know that there are ways to optimize Serializable and use it properly. Moreover, the problems caused by improper use of Serializable do not show up at first. They always come after a few releases, when your classes have grown bigger (performance issues) or an old algorithm has to be changed (backward compatibility issues). Read designing serialized form to learn how to minimize the chances of running into such problems by designing your Serializable class.

Externalizable requires the programmer to do the fishing

Things do not work so simply with Externalizable. The programmer has to determine which fields need to be serialized and which do not. The good part is that the programmer is forced to know the fields of the target object's superclasses, because they will not be serialized by default. Most of the time, programmers ignore this part when using Serializable, which may cause unnecessary or duplicate data to be written into the stream. In short, to a greater or lesser degree, Externalizable gives programmers room to think about and design the desired serialized form.

Serializable or Externalizable?

So far, this post seems to make Externalizable the one to choose. It avoids unnecessary metadata processing and forces programmers to learn about and deal with data serialization. However, Externalizable has a drawback that should make you think twice before using it. A class that implements Externalizable has two additional public methods that are callable from the outside world. In fact, these methods are only needed by the serialization runtime and are not meant for other callers. An unintended call to these methods may change the state of the object. In the worst case, they could become a weak point open to attack.
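To make the risk concrete, the sketch below shows ordinary code calling readExternal directly on a live object. Account and overwriteState are hypothetical names for this illustration, and the forged input is simply an int written through an ObjectOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class PublicApiRiskDemo {

    // Hypothetical class whose balance has no public setter.
    public static class Account implements Externalizable {
        private int balance;

        public Account() {
        }

        public Account(int balance) {
            this.balance = balance;
        }

        public int getBalance() {
            return balance;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(balance);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            balance = in.readInt();
        }
    }

    // Any caller can do this; the Account itself is never serialized.
    static void overwriteState(Account target, int forgedBalance) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(forgedBalance);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            target.readExternal(in); // public method, callable from anywhere
        }
    }

    public static void main(String[] args) throws Exception {
        Account account = new Account(100);
        overwriteState(account, 999999);
        System.out.println(account.getBalance()); // prints 999999
    }
}
```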

I personally prefer and recommend Serializable. It can do what Externalizable does. It provides a few special private methods for customizing the serialized form to make it lighter and more efficient. Programmers who use Serializable can choose default serialization, custom serialization, or even a mix of both.

One last point that I would like to emphasize: using Externalizable does not mean that our serialized form is properly designed. If a programmer just writes whatever field data into the stream, most of the time this does not help minimize the impact of backward compatibility issues. In short, whether we use Serializable or Externalizable, we still need to design our serialized form properly for the sake of future maintenance.
