Friday, June 26, 2015

Java NIO Buffer

Buffer Introduction

Input/Output, or I/O for short, is about data or signal transmission. The traditional Java I/O classes, java.io.InputStream and java.io.OutputStream, have been fulfilling this purpose, but they hand us the raw binary data in a fixed-size byte array. The byte array is treated as a data container or temporary staging area, yet it is up to us to implement the way we manipulate it. Manipulating the byte array directly is not always a wise approach, especially when the data involves multi-byte characters. Read my other post, Encoding and Decoding, to know why.

java.nio.Buffer is one of the key abstractions of Java NIO, introduced in Java 1.4. While it retains the same notion of a fixed-size data container, it is much more capable, thanks to the points below.
  • It encapsulates the way the backing data is accessed: direct or non-direct.
  • It provides a rich set of operations for manipulating the backing data by making use of its buffer attributes.
  • The primitive-type buffer classes make data transfer easier, as it is done in the respective primitive data type's size.
There is no doubt that Buffer improves on the old way of dealing with byte arrays. Moreover, it is the basis of the other key abstractions of Java NIO. For example, java.nio.charset.Charset takes a Buffer for encoding or decoding, and java.nio.channels.Channel takes a Buffer for writing or reading. Therefore, Buffer is a good starting point for learning Java NIO.

Working with Buffer

To work with Buffer effectively, we must understand what the buffer attributes are and how they work.

Buffer attributes

Capacity: The maximum number of elements that the buffer can hold. Capacity is set when a buffer object is created; it cannot be negative and it never changes. Trying to write past the limit with a relative put results in java.nio.BufferOverflowException, and trying to read past it with a relative get results in java.nio.BufferUnderflowException.

Limit: The index of the first element that may not be read or written; it marks the end of data access. An absolute read or write at an index greater than or equal to the limit results in java.lang.IndexOutOfBoundsException. The initial limit equals the capacity. The limit cannot be negative and is never greater than the capacity.

Position: The index of the next element to be read or written. It acts like a moving pointer that increments automatically after every relative read or write operation, and the movement is in one direction only. The position cannot be negative and is never greater than the limit.

Mark: A captured index of the current position, so that the position pointer can later be reset to that captured index. Initially, the mark is undefined. An existing mark is discarded whenever the position moves to an index smaller than the mark. Trying to reset to an undefined mark results in java.nio.InvalidMarkException.

The rule below summarises the conditions mentioned above. The relationship between the attributes is invariant and always holds.

0 <= mark <= position <= limit <= capacity

The diagram below illustrates the default state of the buffer attributes in a newly created buffer object.





When data is written into the buffer, the position increments automatically, linearly and always in one direction. The same applies to read operations.
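As a quick sketch of this behaviour (assuming a freshly allocated ten-byte buffer), each relative put() or get() advances the position by one element:

```java
import java.nio.ByteBuffer;

public class PositionDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(10);
        System.out.println(buffer.position()); // 0 in a new buffer

        buffer.put((byte) 1);                  // relative write
        buffer.put((byte) 2);
        System.out.println(buffer.position()); // 2, advanced by each put()

        buffer.flip();                         // prepare for reading
        buffer.get();                          // relative read
        System.out.println(buffer.position()); // 1, advanced by the get()
    }
}
```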





Creating buffer

Each primitive data type has its respective buffer class, except boolean: ByteBuffer, CharBuffer, ShortBuffer, IntBuffer, LongBuffer, FloatBuffer, and DoubleBuffer.

None of them can be instantiated directly; they are all abstract classes. Instead, we create a specific primitive-type buffer object by calling the static factory method wrap() or allocate().

Wrapper buffers

wrap(<primitive-data-type>[] array)

This method creates a buffer object that wraps the given existing primitive-type array as its backing array. Below is an example of creating a byte-type buffer object. You can create other types of buffer objects by calling the wrap() method of the corresponding primitive buffer class.

public static void main(String[] args) {
   byte[] array = new byte[5];
   ByteBuffer byteBuffer = ByteBuffer.wrap(array); // array becomes the backing array

   array[3] = (byte) 30; // Modify the original array
   printBytes(byteBuffer); // Print buffer's data content

   byteBuffer.put(4, (byte) 90); // Modify buffer's data content
   printBytes(array); // Print original array
}

public static void printBytes(ByteBuffer byteBuffer) {
   System.out.print("Print buffer's data content: ");
   byteBuffer.clear();
   while (byteBuffer.hasRemaining()) {
       System.out.printf("%d ", byteBuffer.get());
   }
   System.out.println("");
}

public static void printBytes(byte[] bytes) {
   System.out.print("Print original array: ");
   for (byte b : bytes) {
       System.out.printf("%d ", b);
   }
   System.out.println("");
}

Print buffer's data content: 0 0 0 30 0
Print original array: 0 0 0 30 90
One important point to note is that the wrapped array is the buffer's backing array (also obtainable via the array() method). Modifications to this array modify the buffer's content, and vice versa.

Direct buffers

allocateDirect(int capacity)

This method exists only in ByteBuffer. A buffer created with it attempts to avoid copying the buffer's content to (or from) an intermediate buffer before (or after) each invocation of one of the underlying operating system's native I/O operations. A direct buffer therefore offers better I/O performance than a non-direct buffer. However, it comes with drawbacks.
  • Higher allocation and deallocation cost compared to a non-direct buffer.
  • The buffer content may reside outside of the normal garbage-collected heap, so its impact on the application's memory footprint might not be obvious.
Therefore, it is recommended to use a direct buffer only when switching from a non-direct buffer to a direct buffer gives your application a measurable performance improvement.
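A minimal sketch of the difference (buffer sizes here are arbitrary): isDirect() tells us which kind of buffer we have, and a direct buffer has no accessible backing array.

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        ByteBuffer nonDirect = ByteBuffer.allocate(1024);

        System.out.println(direct.isDirect());    // true
        System.out.println(nonDirect.isDirect()); // false

        // A direct buffer has no accessible backing array;
        // calling direct.array() would throw UnsupportedOperationException.
        System.out.println(nonDirect.hasArray()); // true
        System.out.println(direct.hasArray());    // false
    }
}
```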

View buffers

as<primitive-data-type>Buffer()

These methods, too, exist only in ByteBuffer. Each creates a view buffer of the respective primitive type. The backing data of a byte buffer is indexed in terms of bytes; turning it into a primitive-type view buffer allows the data to be indexed in terms of that primitive type's size. Below is an example of creating an integer view buffer from a byte buffer.

public static void main(String[] args) {
   ByteBuffer byteBuffer = ByteBuffer.allocate(16);
   // Write data into buffer. This will take up 4 initial bytes of the backing array.
   // After writing, the position now is 4.
   byteBuffer.put(new byte[] {1, 2, 3, 4}); 
   System.out.println("Print byteBuffer info: ");
   printInfo(byteBuffer);

   IntBuffer intBuffer = byteBuffer.asIntBuffer(); // create integer view buffer.
   System.out.println("Print intBuffer info: ");
   // The integer view buffer starts at the byte buffer's current position,
   // which is 4.
   // One int equals 4 bytes. Hence, the new capacity is (16 - 4) / 4 = 3.
   printInfo(intBuffer); 
   printIntBuffer(intBuffer);

   // Changes through the integer view buffer modify
   // the byte buffer's content, and vice versa.
   // The buffer attributes of the two buffers are independent.
   intBuffer.put(11);
   intBuffer.put(22);
   intBuffer.put(33);

   printByteBuffer(byteBuffer);
}

public static void printInfo(Buffer buffer) {
   System.out.println("> Capacity: " + buffer.capacity());
   System.out.println("> Position: " + buffer.position());
   System.out.println("> Limit: " + buffer.limit());
}

public static void printByteBuffer(ByteBuffer byteBuffer) {
   System.out.print("Print byteBuffer content: \n> ");
   byteBuffer.rewind();
   while (byteBuffer.hasRemaining()) {
       System.out.printf("%d ", byteBuffer.get());
   }
   byteBuffer.rewind();
   System.out.println("");
}

public static void printIntBuffer(IntBuffer intBuffer) {
   System.out.print("Print intBuffer content: \n> ");
   intBuffer.rewind();
   while (intBuffer.hasRemaining()) {
       System.out.printf("%d ", intBuffer.get());
   }
   intBuffer.rewind();
   System.out.println("");
}

Print byteBuffer info:
> Capacity: 16
> Position: 4
> Limit: 16
Print intBuffer info:
> Capacity: 3
> Position: 0
> Limit: 3
Print intBuffer content:
> 0 0 0
Print byteBuffer content:
> 1 2 3 4 0 0 0 11 0 0 0 22 0 0 0 33
Besides ByteBuffer, the other primitive-type buffer classes do not provide a method to create a direct buffer. However, this can be done indirectly by creating a view buffer from a direct ByteBuffer object.

ByteBuffer byteBuffer = ByteBuffer.allocateDirect(16); // direct byte buffer
IntBuffer intBuffer = byteBuffer.asIntBuffer(); // direct int buffer

Flipping

flip() - An operation normally used to make the buffer ready for reading after writing. In this operation, the limit is set to the current position and then the position is set to zero. If the mark is defined, it is discarded. The diagrams below demonstrate how this operation changes the buffer attributes.
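A small sketch (assuming an arbitrary ten-byte buffer) shows the attribute changes directly:

```java
import java.nio.ByteBuffer;

public class FlipDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(10);
        buffer.put((byte) 1).put((byte) 2).put((byte) 3);

        // Before flipping: position = 3, limit = 10
        System.out.println(buffer.position() + " " + buffer.limit());

        buffer.flip();

        // After flipping: position = 0, limit = 3 (ready for reading)
        System.out.println(buffer.position() + " " + buffer.limit());

        while (buffer.hasRemaining()) {
            System.out.print(buffer.get() + " "); // 1 2 3
        }
    }
}
```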

Before flipping.




After flipping.







Rewind

rewind() - An operation used to make the buffer ready for re-reading. In this operation, the limit is unchanged and the position is set to zero. If the mark is defined, it is discarded. The diagrams below demonstrate how this operation changes the buffer attributes.
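As a sketch (wrapping an arbitrary three-byte array), rewinding lets the same content be read twice:

```java
import java.nio.ByteBuffer;

public class RewindDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {5, 6, 7});

        while (buffer.hasRemaining()) {
            System.out.print(buffer.get() + " "); // 5 6 7
        }

        buffer.rewind(); // position back to 0, limit unchanged

        while (buffer.hasRemaining()) {
            System.out.print(buffer.get() + " "); // 5 6 7 again
        }
    }
}
```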

Before rewind.






After rewind.







Clear

clear() - An operation used to reset the buffer attributes to their default state, so the buffer is ready for read/write again. Note that this operation does not erase the content. In this operation, the limit is set to the capacity, the position is set to zero, and if the mark is defined, it is discarded. The diagrams below demonstrate how this operation changes the buffer attributes.
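A sketch (arbitrary five-byte buffer) confirming that clear() resets the attributes but leaves the content in place:

```java
import java.nio.ByteBuffer;

public class ClearDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(5);
        buffer.put((byte) 9).put((byte) 8);
        buffer.flip();   // position = 0, limit = 2

        buffer.clear();  // position = 0, limit = 5 (back to capacity)
        System.out.println(buffer.position()); // 0
        System.out.println(buffer.limit());    // 5

        // The old content is still there; clear() does not erase it.
        System.out.println(buffer.get(0));     // 9
    }
}
```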

Before clear.






After clear







Mark and reset

mark() - An operation that captures and remembers the current position.







reset() - An operation that sets the position back to the mark. The diagrams below demonstrate how this operation changes the buffer attributes.
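The mark/reset pair can be sketched as follows (wrapping an arbitrary four-byte array):

```java
import java.nio.ByteBuffer;

public class MarkResetDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {10, 20, 30, 40});

        System.out.println(buffer.get()); // 10, position is now 1
        buffer.mark();                    // remember position 1

        System.out.println(buffer.get()); // 20
        System.out.println(buffer.get()); // 30

        buffer.reset();                   // jump back to the marked position
        System.out.println(buffer.get()); // 20 again
    }
}
```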

Before reset.






After reset.







Compact

compact() - An operation used in a continuous series of buffer read and write operations. It serves the situation where the buffer content has not been completely read, but the next write needs to start. In this operation, the bytes between the buffer's current position and its limit, if any, are copied to the beginning of the buffer. That is, the byte at index p = position() is copied to index zero, the byte at index p + 1 to index one, and so forth, until the byte at index limit() - 1 is copied to index n = limit() - 1 - p. The buffer's position is then set to n + 1 and its limit is set to its capacity. The mark, if defined, is discarded.

public static void main(String[] args) {
   ByteBuffer byteBuffer = ByteBuffer.allocate(20);

   byte[][] dataToBeWrittenIntoBuffer = new byte[][]{
       {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}, 
       {11, 22, 33, 44, 55, 66, 77, 88, 99, 111}
   };
   byte[] dataToBeReadFromBuffer = new byte[5];
   for (int i = 0; i <= 1; i++) {
       System.out.println("Write data into buffer.");
       byteBuffer.put(dataToBeWrittenIntoBuffer[i]);
       printBytes(byteBuffer.array());
       byteBuffer.flip();

       System.out.println("Read data from buffer.");
       // Read partially. Only 5 elements are read.
       byteBuffer.get(dataToBeReadFromBuffer);

       System.out.println("Compact buffer.");
       byteBuffer.compact();
       printBytes(byteBuffer.array());
   }
}

public static void printBytes(byte[] bytes) {
   for (byte b : bytes) {
       System.out.printf("%d ", b);
   }
   System.out.println("");
}

Write data into buffer.
10 20 30 40 50 60 70 80 90 100 0 0 0 0 0 0 0 0 0 0
Read data from buffer.
Compact buffer.
60 70 80 90 100 60 70 80 90 100 0 0 0 0 0 0 0 0 0 0
Write data into buffer.
60 70 80 90 100 11 22 33 44 55 66 77 88 99 111 0 0 0 0 0
Read data from buffer.
Compact buffer.
11 22 33 44 55 66 77 88 99 111 66 77 88 99 111 0 0 0 0 0
The code example above always reads the buffer's content partially, leaving unread data behind. compact(), in this case, moves the unread data to the beginning of the buffer, so the subsequent write does not overwrite it, and the unread data can be read in the next read operation. As you can see from the result above, the data 10 to 100 has been fully read out of the buffer.

References:
http://docs.oracle.com/javase/7/docs/api/java/nio/Buffer.html
http://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html
http://howtodoinjava.com/2015/01/15/java-nio-2-0-working-with-buffers/
http://www.javaworld.com/article/2075575/core-java/core-java-master-merlin-s-new-i-o-classes.html
http://tutorials.jenkov.com/java-nio/index.html

Monday, June 15, 2015

Bitwise Operators

Using bitwise operators may look complicated, especially if we mix them up with the short-circuit logical operators (&&, ||). In fact, they work differently and serve different purposes. The coolest benefit bitwise operators offer is that they allow us to pack a set of flags/settings into a single byte. This was undoubtedly important in the days when memory was costly. Even today, some Java APIs still make use of bit fields, such as the configuration flags in java.util.regex.Pattern and the operation set in java.nio.channels.SelectionKey. Knowing how to use bitwise operators therefore allows us to manipulate these bit fields and use such APIs efficiently.

Bitwise OR |

Returns 0 only when both operands are 0; otherwise returns 1. This characteristic is commonly used to turn on a particular bit (that is, set a bit to 1) in a bit field.

Existing bit field 1 0 1 0
|
Bit Mask 1 1 0 0
=
Outcome 1 1 1 0

The outcome is the new bit field that replaces the existing one. As you can see from the new bit field's state, the second bit has been turned on.

Bitwise XOR ^

Returns 0 when both operands are the same; otherwise returns 1. This makes XOR an extension of bitwise OR: it can be used to turn on a bit, but it turns the bit off if it is currently on.

Existing bit field 1 0 1 0
^
Bit Mask 1 1 0 0
=
Outcome 0 1 1 0

Again, the outcome is the new bit field that replaces the existing one. The new bit field's state shows that the first bit has been turned off and the second bit turned on.

Bitwise AND &

Returns 1 only when both operands are 1; otherwise returns 0. This characteristic is commonly used to check the state of a bit.

Existing bit field 1 0 1 0
&
Bit Mask 1 1 0 0
=
Outcome 1 0 0 0

The outcome, in this case, is not meant to replace the existing bit field. Instead, it reveals the status of the bits of interest, based on the bit mask. In the example above, the outcome shows that the first bit in the bit field is currently on.
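The three truth tables above can be checked quickly in Java, a small sketch using the same 4-bit values:

```java
public class BitwiseTableDemo {
    public static void main(String[] args) {
        int field = 0b1010; // existing bit field
        int mask  = 0b1100; // bit mask

        System.out.println(Integer.toBinaryString(field | mask)); // 1110
        // toBinaryString drops leading zeros, so 0110 prints as 110
        System.out.println(Integer.toBinaryString(field ^ mask)); // 110
        System.out.println(Integer.toBinaryString(field & mask)); // 1000
    }
}
```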

Bitwise NOT ~

Inverts a bit: from 1 to 0, or 0 to 1 (~1 = 0, ~0 = 1). It is commonly used together with the bitwise AND operator to unset/turn off a bit.

Using Bitwise Operators in Java

The characteristics of the bitwise operators should be clear now. Below are examples of using them in Java. First of all, we need a set of numeric constants. There are a few ways to declare them.

Assigning Decimal Value

public static final byte RED = 1; //0000 0001
public static final byte ORANGE = 2; //0000 0010
public static final byte YELLOW = 4; //0000 0100
public static final byte GREEN = 8; //0000 1000
public static final byte BLUE = 16; //0001 0000
public static final byte INDIGO = 32; //0010 0000
public static final byte VIOLET = 64; //0100 0000

Using Bit Shift Operator

public static final byte RED = 2 >> 1; //0000 0001
public static final byte ORANGE = 2 << 0; //0000 0010
public static final byte YELLOW = 2 << 1; //0000 0100
public static final byte GREEN = 2 << 2; //0000 1000
public static final byte BLUE = 2 << 3; //0001 0000
public static final byte INDIGO = 2 << 4; //0010 0000
public static final byte VIOLET = 2 << 5; //0100 0000

Assigning Binary Value

public static final byte RED = 0b00000001;
public static final byte ORANGE = 0b00000010;
public static final byte YELLOW = 0b00000100;
public static final byte GREEN = 0b00001000;
public static final byte BLUE = 0b00010000;
public static final byte INDIGO = 0b00100000;
public static final byte VIOLET = 0b01000000;

All three sets of constants above give the same values; it is up to you to choose which one paints the clearest picture. One more important point to take away: the maximum number of constants depends on the width of the data type. I am using byte in the examples above; 1 byte equals 8 bits, and since the highest bit is the sign bit, the maximum positive value of a byte is 127. Therefore, we can only have 7 constants. In short, choose the data type that fits your needs.

Below is a code example that uses the bitwise operators to manipulate a bit field.

byte bitField = 0;

/**************
 * Bitwise OR 
 **************/
// Turn on ORANGE bit in bitField.
bitField |= ORANGE; 
// bitField becomes 00000010. We got ORANGE now. 
System.out.println(Integer.toBinaryString(bitField)); 

// Turn on more bits in bitField. 
bitField |= BLUE | VIOLET; 
// bitField becomes 01010010. We got VIOLET, BLUE and ORANGE now.
System.out.println(Integer.toBinaryString(bitField)); 

/**************
 * Bitwise XOR 
 **************/
// Turn on YELLOW bit in bitField.
bitField ^= YELLOW; 
// bitField becomes 01010110. We got VIOLET, BLUE, YELLOW, and ORANGE now.
System.out.println(Integer.toBinaryString(bitField)); 

// Turn off BLUE bit in bitField.
bitField ^= BLUE; 
// bitField becomes 01000110. We got VIOLET, YELLOW, and ORANGE now.
System.out.println(Integer.toBinaryString(bitField)); 

/**************
 * Bitwise AND 
 **************/
// Check if the desired bit flag is on.
boolean isExist = (bitField & ORANGE) == ORANGE; 
// isExist = true. ORANGE exists in the bitField.
System.out.println(isExist); 

/***********************
 * Bitwise NOT with AND 
 ***********************/
// Turn off VIOLET bit in bitField.
bitField &= ~VIOLET; 
// bitField becomes 00000110. We got YELLOW, and ORANGE now.
System.out.println(Integer.toBinaryString(bitField)); 

Disadvantages of using Bit Field

Although the bit field is a useful technique, it comes with the disadvantages below.
  • Hard to interpret a bit field by reading numbers.
  • Not type safe, as it uses constants as bit masks.
  • No easy way to iterate through the bit field. 
Since version 5, Java provides the java.util.EnumSet class, which can effectively replace the bit field technique and avoids all the bit field disadvantages mentioned above. We could use EnumSet to do the same things as in the Colour bit field example above. In addition, EnumSet provides the rich java.util.Set functionality, is type safe, and its performance is comparable to a bit field, as it is internally represented with a single long.

In conclusion, be familiar with the bit field technique so that we can use the older Java APIs (those that still require a bit field as a parameter) properly. But moving forward, use EnumSet in new methods to gain all the benefits it provides.
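As a sketch of that replacement, the Colour constants from the earlier examples can be rewritten as an enum (the Colour type here is hypothetical, mirroring those constants) and manipulated through an EnumSet instead of bitwise operators:

```java
import java.util.EnumSet;

public class EnumSetDemo {
    enum Colour { RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET }

    public static void main(String[] args) {
        // "Turn on" flags by adding them to the set.
        EnumSet<Colour> flags = EnumSet.of(Colour.ORANGE, Colour.BLUE, Colour.VIOLET);

        flags.remove(Colour.BLUE);  // turn a flag off
        flags.add(Colour.YELLOW);   // turn another flag on

        System.out.println(flags.contains(Colour.ORANGE)); // true

        // Unlike a raw bit field, the set prints readably
        // and can be iterated like any java.util.Set.
        System.out.println(flags); // [ORANGE, YELLOW, VIOLET]
    }
}
```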

References:
http://www.vipan.com/htdocs/bitwisehelp.html
Book: Effective Java Second Edition by Joshua Bloch

Sunday, June 14, 2015

Difference between Serializable and Externalizable

java.io.Serializable converts a Java object into binary serialized form and vice versa. java.io.Externalizable is an extension of Serializable and hence is meant to do the same thing. Not to forget, they run on the same serialization framework: we serialize a serializable or externalizable object by passing it to java.io.ObjectOutputStream, and deserialize it by reading it from java.io.ObjectInputStream. So, what is the difference between them? From my point of view, the primary distinction is the metadata that each captures and writes into the stream. Every other difference is actually derived from this.

Processing difference during serialization runtime

Basically, both of them have to construct and write metadata into the stream. This metadata can then be read back in order to re-constitute the Java object during the deserialization process. The metadata includes a class description for each class in the class hierarchy of the Java object being serialized.

For Serializable:

Below is a diagram that illustrates the serialization process against a Serializable Java object.

When a Serializable object is serialized, the serialization runtime traverses its class hierarchy from bottom to top and constructs a class description for each class in that hierarchy. Each class description contains the following information.
  • Class Identity
    • Stream Unique ID/serialVersionUID
    • Fully Qualified Class Name
  • Serializable Fields Information
    • Number of fields
    • Name and type of fields
  • Others
    • Operational flags
All these class descriptions, i.e. the metadata of the target object, are written into the stream. After that, the class hierarchy is traversed from bottom to top again, but this time to handle the serializable state at each class level. For a particular class level, the serializable primitive fields are written into the stream first. Next, if any object field is found, that object field becomes the next object to be serialized, and the same process happens to it; this repeats until no more object fields are found. At that point, the serialization runtime moves on to the upper level in the class hierarchy and repeats the same process.

For Externalizable:

Below is a diagram that illustrates the serialization process against an Externalizable Java object.

When an Externalizable object is serialized, the serialization runtime also traverses its class hierarchy from bottom to top to construct the class descriptions. However, the class description for an externalizable object does not contain field information.
  • Class Identity
    • Stream Unique ID/serialVersionUID
    • Fully Qualified Class Name
  • Others
    • Operational flags
This is the crucial point that distinguishes Externalizable from Serializable. While Serializable requires field information in order to reconstruct the object's fields, Externalizable requires the programmer to handle the writing and reading of the object's state.

Programmatic differences

Serializable is a marker interface 

Serializable needs field information in order to re-constitute the fields of the target object. This is done in the background and does not require any effort from the programmer. Even a novice Java programmer can create a Serializable class by merely implementing the Serializable interface, and serialization just works like magic! Below is an example of a Serializable Person class.

import java.io.Serializable;
import java.util.Date;

public class Person implements Serializable {
    int id;
    String name;
    Date dob;
}

Externalizable is a contractual interface

On the other hand, Externalizable never captures field information. The question is: how does Externalizable reconstitute the fields of the target object? Externalizable does not do the magic that Serializable does. Instead, it delegates this task to the programmer, who determines which fields to serialize and deserialize. In other words, field data is written directly into (and read directly from) the stream by the Externalizable methods implemented by the class. Below is an example of an Externalizable Person class.

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.Date;

public class Person implements Externalizable {
    int id;
    String name;
    Date dob;

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
        out.writeObject(dob);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        id = in.readInt();
        name = in.readUTF();
        dob = (Date) in.readObject();
    }
}

As you can see, more lines of code need to be written to achieve what Serializable does for free. By the way, we could have both Serializable and Externalizable classes (at different class levels) in the same class hierarchy of a target object. However, bear in mind that the presence of Externalizable at any level of the class hierarchy supersedes Serializable. This means that the serialization runtime will call the Externalizable methods instead of automatically serializing/deserializing the target object's fields.

Outcome differences

Size difference

Obviously, the serialized form generated by Serializable is bigger than that of Externalizable, because it includes field information. The program below serializes a Person object, where the object may be an instance of either the Serializable or the Externalizable Person class given in the previous sections.

private static final String FILENAME = "D:/person.ser";
  
public static void main(String[] args) throws Exception {
    Person person = new Person();
    person.id = 123;
    person.name = "HauChee";
    person.dob = new Date(1429945971467L);
    
    serialize(person);
    
    try (FileChannel channel = FileChannel.open(
      Paths.get(FILENAME), StandardOpenOption.READ)) {
        System.out.println(channel.size());
    }
}

public static void serialize(Object obj) throws IOException {
    try (ObjectOutputStream out
      = new ObjectOutputStream(new FileOutputStream(FILENAME))) {
        out.writeObject(obj);
    }
}

The result: the Serializable serialized form is 174 bytes, bigger than the Externalizable serialized form at 118 bytes. 

Processing speed difference

Serializable takes longer, as it must inspect the object graph to construct the field information during the serialization process, and then digest that field information to reconstruct the objects during the deserialization process. All of this is done using Java reflection, which is commonly known to be slow.

Externalizable skips this kind of processing because it does not have the same obligations as Serializable. It is the programmer's responsibility to provide the implementation that writes/reads field data into/from the stream; the serialization runtime basically just makes plain method calls on the Externalizable API. As a result, Externalizable performs better in terms of processing speed.

Tips:
Well, there is a way for Serializable to generate a smaller serialized form and catch up in speed. This can be done by customizing the Serializable serialized form. Read designing serialized form for more detail.

Differences in terms of side effects on the programmer

Serializable gives out the fish directly

Serializable is powerful enough to do all the hard work in the background. As a result, programmers tend to take it for granted. They tend not to learn more about Serializable because it works by default; they don't know that there are ways to optimize it and use it properly. Moreover, the problems brought on by improper use of Serializable don't surface at first. Problems always come a few releases later, when your classes have grown bigger (performance issues) or an old algorithm has to be changed (backward compatibility issues). Read designing serialized form to learn how to minimize the chances of such problems by designing your Serializable class. 

Externalizable requires the programmer to do the fishing

Things do not simply work with Externalizable. The programmer has to determine which fields need to be serialized and which do not. The good part is that the programmer is forced to know the fields of the target object's superclasses, because they would not be serialized by default. Most of the time, this part is ignored by programmers using Serializable, which may cause unnecessary or duplicate data to be written into the stream. In short, like it or not, Externalizable gives the programmer room to think about and design the desired serialized form. 

Serializable or Externalizable?

This post so far seems to make Externalizable the one to choose: it avoids unnecessary metadata processing and forces the programmer to learn about and deal with data serialization. However, Externalizable has a drawback that should make you think twice before using it. A class that implements Externalizable exposes two additional public methods to the outside world. These methods are only needed by the serialization runtime and are not meant for other callers; an unintended call to them may change the state of the object. In the worst case, they could become a weak point to be attacked.

I personally prefer and recommend Serializable. It can do what Externalizable can. It provides a few special private methods for customizing the serialized form to make it leaner and more efficient. A programmer who uses Serializable can choose default serialization, custom serialization, or even a mix of both. 

One last point I would like to emphasize: using Externalizable does not mean that our serialized form is properly designed. When a programmer just writes whatever field data into the stream, most of the time this does not help in minimizing backward compatibility issues. In short, whether using Serializable or Externalizable, we still need to properly design our serialized form for the sake of future maintenance.

Inheritance of Serializable Class

Does my subclass inherit serialVersionUID of its serializable parent class?

The signature of serialVersionUID (SUID) is as below:

<ANY-ACCESS-MODIFIER> static final long serialVersionUID = <value>;

We are free to use any access modifier for the SUID. Let's say we use the protected access modifier; will it be inherited by the subclass? Check out the following code example. Guess what the SUID value for the Employee class is.

class Person implements Serializable {
    protected final static long serialVersionUID = 1L;
}

class Employee extends Person {
}

By using serialver tool, we can inspect the SUID of Employee class.


"Employee" does not inherit the 1L SUID from its parent class; it is associated with a default SUID calculated by the Java serialization runtime. Although we are allowed to use any access modifier for the SUID, it is strongly recommended to use only the private access modifier, mainly because the SUID is not inherited by subclasses and is useless to them. Moreover, since it is declared final, its value cannot be changed once initialized, so there is no point declaring it public or package-private either. Read The Effect of serialVersionUID to know more about serialVersionUID.

Can I have a serializable subclass that extends a non-serializable parent class?

We can have a serializable subclass that extends a non-serializable parent class. Refer to the code example below:

class SerializationInheritanceExample {
    private static final String EMPLOYEE = "employee.ser";
    public static void main(String[] args) throws FileNotFoundException,
            IOException, ClassNotFoundException {
        
        Employee employee = new Employee();
        employee.name = "HauChee";
        employee.salary = 10000;
        
        try (ObjectOutputStream oos
                = new ObjectOutputStream(new FileOutputStream(EMPLOYEE))) {
            oos.writeObject(employee);
            oos.flush();
        }
        
        try (ObjectInputStream ois
                = new ObjectInputStream(new FileInputStream(EMPLOYEE))) {
            employee = (Employee) ois.readObject();
            // print Name: null, Salary: 10000
            System.out.printf("Name: %s, Salary: %.0f \n", 
                    employee.name, employee.salary);
        }
    }
}

class Person {
    String name;
}

class Employee extends Person implements Serializable {
    private static final long serialVersionUID = 1L;
    double salary;
}

Although "Employee" is an instance of "Person", but only "Employee's salary" field is serializable, not the name field which inherited from "Person". It is the serializable subclass's responsibility to saves and restores the non-serializable parent class's fields. This can be done by implementing special methods in Employee class. Java Serialization runtime will execute these special methods accordingly once it detected they are defined in the serializable class.

class Employee extends Person implements Serializable {
    
    private static final long serialVersionUID = 1L;
    
    double salary;
    
    // Special method to save fields into stream
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeObject(name); // save non-serializable name field
    }
    
    // Special method to restore fields from stream
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        name = (String) in.readObject(); // restore non-serializable name field
    }
}

Run SerializationInheritanceExample again, and this time it prints Name: HauChee, Salary: 10000. Everything works just fine. But there is actually a prerequisite for this to work: the non-serializable parent class MUST have an accessible no-arg constructor to initialize the class's state. In the code example above, we do not specify any constructor for the Person class, so it gets a default public no-arg constructor, which meets the prerequisite. Let's make some changes to our classes.

class Person {
    String name;
    Person(String name) {
        this.name = name;
    }
}

class Employee extends Person implements Serializable {
    private static final long serialVersionUID = 1L;
    double salary;
    Employee() {
        super(null);
    }
    Employee(String name) {
        super(name);
    }
}

With the changes above, deserialization will hit java.io.InvalidClassException, complaining that there is no valid constructor. This is because once we specify a constructor with arguments for "Person", the Java compiler no longer creates a default no-arg constructor for the Person class. This breaks the prerequisite and fails the deserialization process.
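One way to restore deserializability while keeping the parameterized constructor is to declare an explicit, accessible no-arg constructor in Person. The sketch below repeats the classes with that fix applied and round-trips an Employee in memory instead of through a file:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Person {
    String name;
    Person() { }                    // explicit no-arg constructor: runs during deserialization
    Person(String name) {
        this.name = name;
    }
}

class Employee extends Person implements Serializable {
    private static final long serialVersionUID = 1L;
    double salary;
    Employee(String name) {
        super(name);
    }
}

public class NoArgCtorFix {
    public static void main(String[] args) throws Exception {
        Employee e = new Employee("HauChee");
        e.salary = 10000;

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(e);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            Employee copy = (Employee) ois.readObject();
            // name is null because Person is non-serializable and this
            // version of Employee does not implement writeObject/readObject
            // to carry the field over.
            System.out.println(copy.name + ", " + copy.salary); // prints null, 10000.0
        }
    }
}
```

Deserialization now succeeds; to recover the name field as well, add the writeObject/readObject pair shown earlier.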

When and how to use readObjectNoData method?

A sender serializes an object based on a serializable class that does not extend a parent class. The serialized object is then sent to the receiver. The receiver deserializes this object based on the same SUID and the same class, but on the receiver side this class now extends a parent class. This has an impact on the deserialization result. readObjectNoData() is a special method for handling this kind of corner case. Below is the Employee class on the sender side.

class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    private double salary;
    public Employee(double salary) {
        this.salary = salary;
    }
}

By using the class definition above, the sender creates an "Employee" object with a salary value, then serializes and persists it into an employee.ser file. This file is then passed to the receiver. However, the same class on the receiver side has evolved: every "Person" object on the receiver side must now be associated with an id, and since "Employee" is a "Person", the same rule applies to "Employee". Below are the Employee and Person classes on the receiver side.

class Employee extends Person implements Serializable { //Employee has a parent now
    private static final long serialVersionUID = 1L;
    double salary; // package-private so DeserializeEmployeeApp can read it
    public Employee(String id, double salary) { //require id now
        super(id);
        this.salary = salary;
    }
}

class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    String id; // package-private so DeserializeEmployeeApp can read it
    public Person(String id) {
        if (id == null || id.length() == 0) { //enforcement, id is mandatory
            throw new IllegalArgumentException("Must provide id!");
        }
        this.id = id;
    }
}

If the receiver simply deserializes employee.ser as below,

class DeserializeEmployeeApp {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois
                = new ObjectInputStream(new FileInputStream("employee.ser"))) {
            Employee employee = (Employee) ois.readObject();
            System.out.printf("id: %s, salary: %.0f \n", // print id: null, salary: 1000
                    employee.id, employee.salary);
        }
    }
}

the receiver gets id: null, salary: 1000, which violates the mandatory id rule. In this case, the best candidate to handle the situation is the parent class, because it knows the rule it introduced and how to deal with the deserialized object properly. The parent class can do this by implementing the readObjectNoData() special method.

If the Person class chooses to assign a default id to the deserialized employee object, it can add the following method.

private void readObjectNoData() {
    this.id = "TEMP_ID";
}

You may be wondering: why not just implement a no-arg constructor in Person and assign a default value to the id field there? This does not help, because Person is also a serializable class. The Java Serialization runtime won't call its no-arg constructor to initialize the "Person" fields; instead, the "Person" fields are supposed to be re-constituted straight from the stream data. This is why readObjectNoData() comes in and plays the role of a constructor during deserialization.

If the Person class decides that there is no exceptional case for a "Person" instance without an id, not even a deserialized object, then it can throw java.io.InvalidObjectException in the readObjectNoData() method. In this way, the mandatory id invariant can be maintained. Read Security in Java Serialization to learn more.

private void readObjectNoData() {
    throw new InvalidObjectException("Id must not be null."); 
}

Reference:
http://docs.oracle.com/javase/7/docs/api/java/io/Serializable.html