Thursday, August 18, 2011

Java Serialization

An in-memory Java object cannot be written to a file or sent over a communication channel as-is. To make this possible, Java provides a mechanism called Serialization [ Specification ].

Let's take a dive into what it is:

The Default Way:

The java.io package has an interface named Serializable, which declares no methods. An interface with no methods is called a marker interface :), so Serializable is a marker interface. By implementing it we simply inform the JVM that instances of the class can be serialized or, in simpler words, persisted.

Observation 1: To be serializable, a class must implement Serializable, either directly or through its class hierarchy (a superclass or a super-interface).

Example:

package com.abc.tutor;

import java.io.Serializable;
public class PersistMe implements Serializable{
private static final long serialVersionUID = 1L;
}

Observation 2: All classes derived from a class that implements Serializable are serializable by default.

Example:

package com.abc.tutor;

public class PersistMe2 extends PersistMe{
private static final long serialVersionUID = 0L;
}

How to Persist:

The same java.io package has a class named ObjectOutputStream [java.io.ObjectOutputStream], which can be used to write objects to a file stream or a network stream. Its writeObject() method is what actually writes the object to the underlying stream.

The constructors of the ObjectOutputStream class are:

ObjectOutputStream() - A protected no-argument constructor, meant for subclasses that completely reimplement ObjectOutputStream.
ObjectOutputStream(OutputStream out) - Takes an OutputStream, which is an abstract class, so we can pass any concrete implementation of it to the constructor. The direct known subclasses are

ByteArrayOutputStream, FileOutputStream, FilterOutputStream,
ObjectOutputStream, PipedOutputStream.
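
Because the constructor accepts any OutputStream, the same writeObject() call can target memory instead of a file. The following is a minimal sketch of that idea using ByteArrayOutputStream; the class name InMemoryDemo and the helper toBytes() are made up for illustration and are not part of the examples in this post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical helper class, used only for illustration.
class InMemoryDemo {

    // Serializes a Serializable object into a byte array instead of a file.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into the buffer.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = toBytes("Mayank Awasthi"); // String implements Serializable
        System.out.println("Serialized size in bytes: " + bytes.length);
    }
}
```

Swapping the ByteArrayOutputStream for a FileOutputStream or a socket's output stream should leave the rest of the code unchanged.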

Let's take an example where we try to write our object to a file:

The class whose object we will persist is:

package com.abc.tutor;

import java.io.Serializable;
public class PersistMe implements Serializable {
    private static final long serialVersionUID = 1L;
    private String name;

    public PersistMe(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

and the class which will try persisting the object can be:

package com.abc.tutor;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
public class DoIt {
    public static void main(String[] args) {
        String fileName = "myFile.ser";
        PersistMe pme = new PersistMe("Mayank Awasthi");
        // try-with-resources closes both streams even if writeObject() fails
        try (FileOutputStream fos = new FileOutputStream(fileName);
             ObjectOutputStream out = new ObjectOutputStream(fos)) {
            out.writeObject(pme); // This method serializes the PersistMe object
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}

When the program finishes, we will see that a file named "myFile.ser" has been created, which contains the values of the object's fields. This way the state of the object is stored in a file.

How to get back the persisted Object:

The same java.io package has a class named ObjectInputStream [java.io.ObjectInputStream], which can be used to read the persisted object. Its readObject() method reads the persisted bytes and creates a live object that is a replica of the original object.

The constructors of the ObjectInputStream class are:
ObjectInputStream() - A protected no-argument constructor, meant for subclasses that completely reimplement ObjectInputStream.
ObjectInputStream(InputStream in) - Takes an InputStream, which is an abstract class, so we can pass any concrete implementation of it to the constructor. The direct known subclasses are
AudioInputStream, ByteArrayInputStream, FileInputStream, FilterInputStream,
ObjectInputStream, PipedInputStream, SequenceInputStream, StringBufferInputStream

Let's read the file we already persisted as "myFile.ser" in the example below:


package com.abc.tutor;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
public class GetItBack {
    public static void main(String[] args) {
        String fileName = "myFile.ser";
        // try-with-resources closes both streams even if readObject() fails
        try (FileInputStream fis = new FileInputStream(fileName);
             ObjectInputStream ois = new ObjectInputStream(fis)) {
            PersistMe pem = (PersistMe) ois.readObject(); // This method restores PersistMe
            System.out.print(pem.getName());
        } catch (IOException ex) {
            ex.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }
}

If you look at the output of the program, you will see that it prints: Mayank Awasthi, which is what we persisted :)....WOW !!!

Observation 3: The readObject() method is declared to return Object, so casting the result to the correct type is required.

Observation 4: The class file must be on the classpath, else readObject() throws ClassNotFoundException.

Observation 5: Observation 4 shows that only the state of the object (its field values) is persisted, not the class itself with its methods and field declarations; otherwise the class file would not be needed to restore the object.

Note: During restoration with readObject(), as in the example above, we never use the new keyword; we just cast the returned object. This leads us to another observation. [Remember: a constructor is always called when we create an object using the new keyword]

Observation 6: The readObject() method never invokes a constructor of the Serializable class.
So in the above case the constructor public PersistMe(String name) is not called when the object state is restored. (Only the no-argument constructor of the nearest non-serializable superclass runs, which here is Object.)
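
Observation 6 can be checked with a small experiment. This is a sketch under assumptions: the Greeter class, its static constructorCalls counter and the roundTrip() helper are invented for the demonstration, and the object is serialized to memory rather than to a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, used only for illustration.
class ConstructorDemo {

    static class Greeter implements Serializable {
        private static final long serialVersionUID = 1L;
        // Static fields belong to the class, so they are not serialized.
        static int constructorCalls = 0;
        private final String name;

        Greeter(String name) {
            constructorCalls++; // counts how often the constructor runs
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Serializes the object to memory and immediately restores it.
    static Greeter roundTrip(Greeter original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Greeter) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Greeter copy = roundTrip(new Greeter("Mayank Awasthi"));
        System.out.println(copy.getName());
        System.out.println("Constructor calls: " + Greeter.constructorCalls);
    }
}
```

The counter stays at 1 after the round trip: the only constructor call is the explicit new, yet the restored copy still carries the name.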

Transient Variables:

Sometimes it does not make sense to persist the value of a variable, for example a field of type Thread or Socket, or any other value that is meaningless outside the running program. In that case there is a way to tell the JVM to exclude these variables from being persisted.
To make such variables non-persistent you just need to mark them as transient.
So in the above example, if we mark the variable name as transient [private transient String name;], the name value will not be persisted.

Observation 7: Transient variables of a class are never persisted. On restore they get their default values (null, 0 or false).
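
A short sketch of Observation 7, assuming a made-up Account class with one ordinary field and one transient field (the class, field and helper names are illustrative only):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, used only for illustration.
class TransientDemo {

    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;               // persisted
        transient String password; // skipped by serialization

        Account(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    // Serializes the object to memory and immediately restores it.
    static Account roundTrip(Account original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Account) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account copy = roundTrip(new Account("mayank", "secret"));
        System.out.println("user = " + copy.user);         // user = mayank
        System.out.println("password = " + copy.password); // password = null
    }
}
```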

My Own Implementation:

There is a way to have the serialization runtime call readObject and writeObject methods that you implement as private methods in the class being persisted. They must have exactly these signatures:
private void writeObject(ObjectOutputStream out) throws IOException;
private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException;

Now, where to use these?

Well, say you want to repeat the work your constructor does when you create a new object. Since the constructor is not called on restore, you can do those operations in readObject instead.
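
As a sketch of that use, assume a made-up Person class whose constructor builds a derived fullName field. The field is transient, so readObject rebuilds it after calling defaultReadObject(), the standard ObjectInputStream method that restores the non-transient fields. All class and field names here are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, used only for illustration.
class CustomHooksDemo {

    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String firstName;
        private final String lastName;
        private transient String fullName; // derived, so not persisted

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.fullName = firstName + " " + lastName; // constructor work
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // write the non-transient fields as usual
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();                     // restore firstName and lastName
            this.fullName = firstName + " " + lastName; // redo the constructor work
        }

        String getFullName() {
            return fullName;
        }
    }

    // Serializes the object to memory and immediately restores it.
    static Person roundTrip(Person original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Person) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Person copy = roundTrip(new Person("Mayank", "Awasthi"));
        System.out.println(copy.getFullName()); // Mayank Awasthi
    }
}
```

Without the readObject hook, fullName would come back as null, because transient fields are skipped and the constructor does not run.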

Another use is to stop serialization in a subclass. There may be a case where the superclass is Serializable but we don't want our subclass to be serializable. We can achieve this by implementing the writeObject and readObject methods in the subclass and throwing NotSerializableException from them.

private void readObject(ObjectInputStream in) throws IOException
{
    throw new NotSerializableException();
}

private void writeObject(ObjectOutputStream out) throws IOException
{
    throw new NotSerializableException();
}
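
To see the effect, here is a minimal sketch with an invented Base/Blocked pair: Base is serializable, Blocked extends it and throws from both hooks. The canSerialize() helper is hypothetical and simply reports whether writeObject() succeeded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, used only for illustration.
class BlockedSubclassDemo {

    static class Base implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Inherits Serializable from Base but refuses to be serialized.
    static class Blocked extends Base {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException(Blocked.class.getName());
        }

        private void readObject(ObjectInputStream in) throws IOException {
            throw new NotSerializableException(Blocked.class.getName());
        }
    }

    // Returns true if the object can be written, false if it refuses.
    static boolean canSerialize(Object obj) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Base:    " + canSerialize(new Base()));    // Base:    true
        System.out.println("Blocked: " + canSerialize(new Blocked())); // Blocked: false
    }
}
```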

Traps:

1) Caching: Serialize an object, change its state, and serialize it again using the same instance of ObjectOutputStream: you will be surprised to see that the second serialized copy still holds the old values. The stream remembers every object it has already written and only writes a reference to it the second time. You need to call the reset() method before writing again so that the new state of the object is actually written.

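2) Versioning: Every serializable class has a version number, serialVersionUID, which is written along with the serialized object. The object can only be deserialized by a class with the same serialVersionUID; otherwise readObject() throws InvalidClassException. If the class does not declare one, the serialization runtime computes a default value from the details of the class, so changing and recompiling the class can change the version. So it is always advisable to declare the variable "serialVersionUID" with a specific value in your class, to keep the version number stable across recompilations.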
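
Both traps above can be observed in a small sketch. The Counter class and the helper names writeTwice() and versionOf() are invented for illustration; ObjectOutputStream.reset() and ObjectStreamClass.lookup(...).getSerialVersionUID() are the standard java.io calls.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical demo class, used only for illustration.
class TrapsDemo {

    static class Counter implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // Writes the same Counter twice, changing it in between,
    // then reads both copies back and returns their values.
    static int[] writeTwice(boolean resetBetweenWrites)
            throws IOException, ClassNotFoundException {
        Counter counter = new Counter();
        counter.value = 1;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(counter);
            counter.value = 2;
            if (resetBetweenWrites) {
                out.reset(); // make the stream forget already-written objects
            }
            out.writeObject(counter);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            Counter first = (Counter) in.readObject();
            Counter second = (Counter) in.readObject();
            return new int[] { first.value, second.value };
        }
    }

    // Returns the serialVersionUID the runtime will use for the class.
    static long versionOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(writeTwice(false))); // [1, 1]
        System.out.println(Arrays.toString(writeTwice(true)));  // [1, 2]
        System.out.println(versionOf(Counter.class));           // 42
    }
}
```

Without reset() the second read returns the very same object as the first one, which is why the change to value is lost.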

Tips:

1) Cloning: When cloning, you have to take care of the referenced objects that should be deep cloned, and write the code that does the deep copy yourself. If instead we serialize the object and then deserialize it, we get a deep copy automatically, provided every object in the graph is Serializable.

2) Make Unwanted or Derived Variables Transient: Mark as "transient" any variable that is not needed, or that can be derived from the serialized data, to get a smaller serialized footprint and a faster process.
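
The cloning tip above can be sketched as a generic helper. The name deepCopy() is an assumption made for this example, and the copy goes through an in-memory buffer; it only works when every object reachable from the original is Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical demo class, used only for illustration.
class DeepCopyDemo {

    // Deep-copies an object graph by serializing and deserializing it.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<ArrayList<String>> original = new ArrayList<>();
        original.add(new ArrayList<>());
        original.get(0).add("a");

        ArrayList<ArrayList<String>> copy = deepCopy(original);
        copy.get(0).add("b"); // modifies only the copy's inner list

        System.out.println(original); // [[a]]
        System.out.println(copy);     // [[a, b]]
    }
}
```

This is convenient but usually slower than a hand-written copy, so it is best kept for cases where simplicity matters more than speed.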

Summary
Java Serialization is a powerful and useful tool, but it should be used carefully.