Java Serialization in Depth
Preface: Java serialization can introduce both performance problems and security vulnerabilities when compared to alternatives.
While its built-in serialization algorithm gives you the ability to serialize any Serializable object, it should be carefully weighed against alternatives like Protocol Buffers and Jackson JSON/XML serialization.
For information on how JSON serialization works, check out Java Object Mapper, What it is, how it works.
Java is an object-oriented language. You define classes like this:
public class User {
    private String name;
    private String email;

    public void setName(String name) {
        this.name = name;
    }

    public String getName() {
        return this.name;
    }
}
and create objects like this:
User user = new User();
user.setName("Sam");
user.getName(); //Sam
While the Java runtime (JVM) understands this structure, other environments may not. What if you want to save a user to a database? What if you want to write that user to a file or transfer it over a network?
Serialization makes this possible. Serialization saves a Java object as a series of bytes that can be reconstructed back into the original object by another system. The process of writing an object to a series of bytes is serialization. The process of reading those bytes back into object form is deserialization.
This article takes an in-depth look at serialization, including what it is, how it works, and examples.
What is serialization?
Serialization is the process of converting Java objects to data formats other systems can interpret. When you serialize an object, you write it as a byte sequence that other systems can understand. For example, a database may not know what this means...
new User()
but it sure knows what this means...
0000000 edac 0500 7273 1500 6f63 2e6d 7865 6d61
0000010 6c70 2e65 6564 6f6d 552e 6573 6372 ca34
0000020 73ee 0345 029d 0200 004c 7008 7361 7773
0000030 726f 7464 1200 6a4c 7661 2f61 616c 676e
0000040 532f 7274 6e69 3b67 004c 7508 6573 6e72
0000050 6d61 7165 7e00 0100 7078 7070
000005c
Serialization makes this conversion between objects and byte streams possible.
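To make the object-to-bytes conversion concrete, here is a minimal sketch that serializes to an in-memory buffer instead of a file. The ByteStreamDemo and Note names are hypothetical and exist only for this illustration. Streams written by ObjectOutputStream begin with the magic number 0xACED followed by the stream version 0x0005; the "edac 0500" at the start of the dump above appears to be those same four bytes shown as byte-swapped 16-bit words.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not part of the article's project.
class ByteStreamDemo {

    // Minimal serializable type used only for this illustration.
    static class Note implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;

        Note(String text) {
            this.text = text;
        }
    }

    // Serialization: object -> bytes.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Deserialization: bytes -> object.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toBytes(new Note("hello"));

        // The first four bytes are the stream header: magic number, then version.
        System.out.printf("%02x %02x %02x %02x%n", bytes[0], bytes[1], bytes[2], bytes[3]);
        // prints: ac ed 00 05

        Note copy = (Note) fromBytes(bytes);
        System.out.println(copy.text); // prints: hello
    }
}
```

The same two helper methods work against any OutputStream or InputStream, so swapping the byte array for a file or a socket only changes the stream being wrapped.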
Serialization isn't specific to Java. Other languages like JavaScript and PHP have their own serialization mechanisms.
For example, this is how JavaScript deserializes JSON:
var obj = JSON.parse('{ "name":"Sam", "email":"sam@mail.com"}');
Why do we need serialization in Java?
Without serialization, you couldn't transfer data to other systems. For example, a web server couldn't send JSON to a web browser client. A Java entity couldn't be saved to a database without serialization. Serialization makes the transfer of data between Java and the outside world possible. It is the link between the POJO classes you write and their representations in a file, database, or server response.
Serialization makes it possible to save/persist the state of an object. Using serialization, an object's state can be stored outside the running program (for example, in a file) and restored later.
Also remember that reading files or other input from outside sources as Java objects (Strings, class instances, etc.) is possible because of serialization. Specifically, deserialization makes it possible to consume data from other systems and turn it into objects Java can work with.
Serialization in Java: How it works
The java.io package includes classes for serializing objects. You can serialize objects using the ObjectOutputStream...
FileOutputStream fos = new FileOutputStream("temp.out");
ObjectOutputStream oos = new ObjectOutputStream(fos);
User user = new User();
oos.writeObject(user);
oos.flush();
oos.close();
The writeObject() method starts the serialization process:
1) Metadata associated with the object's class is written to the byte stream. This metadata includes a description of the object's class and any of its serializable superclasses.
2) Data associated with the instance is written to the byte stream. This includes all non-static, non-transient fields.
3) Metadata associated with the object's members is written to the stream. This is the same as step 1 except applied to the object's members rather than the object itself.
4) Data associated with the object's members is written to the stream. This is the same as step 2 except applied to the object's members rather than the object itself.
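The split between class metadata and instance data described in these steps can be observed by searching the raw stream for known strings. The sketch below is an illustration only: the StreamContentsDemo, Person, and Employee names and the marker values are hypothetical, and scanning the bytes as text is just a convenient way to peek at the stream, not how a real reader parses it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the article's project.
class StreamContentsDemo {

    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "NAME-VALUE";               // instance field of the superclass
    }

    static class Employee extends Person {
        private static final long serialVersionUID = 1L;
        static String company = "STATIC-COMPANY"; // static: belongs to the class, not the instance
        String role = "ROLE-VALUE";               // instance field of the subclass
    }

    // Serializes obj and returns the bytes as text so they can be searched.
    // ISO-8859-1 maps every byte to exactly one char, so ASCII names survive intact.
    static String serializedText(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(obj);
        }
        return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        String text = serializedText(new Employee());

        // Metadata: class name, superclass name, and field names.
        System.out.println("class name:      " + text.contains("Employee"));
        System.out.println("superclass name: " + text.contains("Person"));
        System.out.println("field name:      " + text.contains("role"));

        // Data: instance field values are present, static field values are not.
        System.out.println("instance value:  " + text.contains("ROLE-VALUE"));
        System.out.println("inherited value: " + text.contains("NAME-VALUE"));
        System.out.println("static value:    " + text.contains("STATIC-COMPANY"));
    }
}
```

Running this should report true for every line except the static value, which is never written because static fields are not part of an instance's state.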
Notice how the ObjectOutputStream wraps a FileOutputStream. If you aren't familiar with OutputStreams or basic Java I/O, be sure to check out FileReader vs BufferedReader vs Scanner.
You can deserialize objects using the ObjectInputStream...
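FileInputStream fis = new FileInputStream("temp.out");
// try-with-resources closes oin (and the wrapped fis) when the block exits
try (ObjectInputStream oin = new ObjectInputStream(fis)) {
    User user = (User) oin.readObject();
} catch (IOException | ClassNotFoundException e) {
    //handle exception
}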
Notice how the same file is deserialized back into an instance of User using the readObject() method. When Java deserializes data from an input source (such as a file), it reads the metadata written during serialization to understand what type of class to reconstruct.
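Because the type information travels with the data, the reader does not need to say in advance which class it expects. The following sketch (TypeFromStreamDemo and its helper methods are hypothetical names) writes three unrelated objects to one stream and then reads them back as plain Object references, letting the stream metadata decide which class each one becomes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the article's project.
class TypeFromStreamDemo {

    // Writes each value to a single in-memory stream, in order.
    static byte[] write(Object... values) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            for (Object value : values) {
                oos.writeObject(value);
            }
        }
        return buffer.toByteArray();
    }

    // Reads `count` objects back without casting and records their runtime class names.
    static List<String> readTypeNames(byte[] bytes, int count)
            throws IOException, ClassNotFoundException {
        List<String> names = new ArrayList<>();
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            for (int i = 0; i < count; i++) {
                Object value = ois.readObject(); // type comes from the stream metadata
                names.add(value.getClass().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = write("Sam", 42, new ArrayList<String>());
        System.out.println(readTypeNames(bytes, 3));
        // prints: [java.lang.String, java.lang.Integer, java.util.ArrayList]
    }
}
```

The cast in (User) oin.readObject() is therefore only a compile-time convenience; the class that actually gets instantiated was already chosen by the stream.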
The Java Serializable Interface
The examples above only work if the User class implements the java.io.Serializable interface...
public class User implements Serializable {
Otherwise you will get an exception like:
Exception in thread "main" java.io.NotSerializableException: com.example.demo.User
This is because the Serializable interface marks a class for serialization. The Serializable interface is just a marker. It doesn't actually specify any methods. It simply tells Java "hey this can be serialized".
Only objects that implement Serializable can be written to streams. This requirement also applies to the objects a class references through its non-transient fields. For example:
public class User implements Serializable {
    private Address address;
}
If the Address class doesn't implement Serializable, then serializing a User whose address field is set fails with a NotSerializableException.
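A short sketch of this failure mode follows. The class names (MemberSerializationDemo, SafeUser) and the canSerialize helper are hypothetical; the nested User and Address here are simplified stand-ins for the classes discussed above. It also shows one common way out when the member class cannot be changed, which is marking the field transient.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not part of the article's project.
class MemberSerializationDemo {

    // Stand-in Address that does NOT implement Serializable.
    static class Address {
        String city = "Springfield";
    }

    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        Address address;           // non-transient reference to a non-serializable type
    }

    static class SafeUser implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Address address; // transient: skipped by the serialization algorithm
    }

    // Returns true if obj can be written, false if a NotSerializableException is thrown.
    static boolean canSerialize(Object obj) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        User user = new User();
        user.address = new Address();
        System.out.println(canSerialize(user));       // prints: false

        SafeUser safeUser = new SafeUser();
        safeUser.address = new Address();
        System.out.println(canSerialize(safeUser));   // prints: true

        // A null reference is written as null, so no Address is ever inspected.
        System.out.println(canSerialize(new User())); // prints: true
    }
}
```

Note that the check happens at runtime against the actual object graph, which is why a User with a null address still serializes.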
SerialVersionUID
Java associates a version ID with each Serializable class at runtime. During deserialization, this ID is used to verify that the sender and receiver of a serialized object have loaded compatible classes for that object. If the IDs don't match, deserialization fails with an InvalidClassException.
While Java will automatically compute an ID if one isn't declared, it is strongly recommended that you declare these values explicitly:
public class User implements Serializable {
    private static final long serialVersionUID = 7148561634028749725L;
    ...
Most IDEs make it easy to generate these values. You can also use the serialver utility to create these IDs from the command line.
The reason for declaring the field explicitly is that the auto-generated values are highly sensitive to class details that may vary between compiler implementations. Even a small change to a class can produce a different ID, which leads to unexpected InvalidClassExceptions during deserialization.
Long story short, declare serialVersionUID on all classes you want to serialize.
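If you want to see which ID Java will actually use for a class, the java.io.ObjectStreamClass descriptor exposes it. The sketch below uses hypothetical class names (SerialVersionUidDemo, Versioned, Unversioned) and a hypothetical uidOf helper; it reads back a declared ID and the computed default for a class that omits the field.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical demo class, not part of the article's project.
class SerialVersionUidDemo {

    static class Versioned implements Serializable {
        private static final long serialVersionUID = 7148561634028749725L;
        String name;
    }

    // No serialVersionUID declared: Java computes one from the class structure.
    static class Unversioned implements Serializable {
        String name;
    }

    // Returns the serialVersionUID the serialization runtime uses for the given class.
    static long uidOf(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            // lookup returns null for classes that are not serializable
            throw new IllegalArgumentException(type.getName() + " is not Serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Versioned.class)); // prints: 7148561634028749725

        // Computed default: the value depends on the class details,
        // so adding or removing a member can change it.
        System.out.println(uidOf(Unversioned.class));
    }
}
```

The second value is deliberately left without an expected output, since it is derived from the class structure rather than chosen by the author.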
Problems with Serialization?
Serialization can have unintended consequences. For example, serialization allows unintended access to non-transient private members. This means that even if you declare a member private, the serialization algorithm still writes its value to the stream, where anyone who can read the bytes can see it.
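The exposure of private state is easy to demonstrate. In this sketch the PrivateFieldLeakDemo and Credentials names, the field names, and the sample values are all made up for illustration. The private field has no getter at all, yet its value can be found in the serialized bytes, while the transient one cannot.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the article's project.
class PrivateFieldLeakDemo {

    static class Credentials implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String apiKey;        // private, but still written to the stream
        private final transient String pin; // private and transient: left out of the stream

        Credentials(String apiKey, String pin) {
            this.apiKey = apiKey;
            this.pin = pin;
        }
    }

    // Serializes obj and reports whether the given text appears in the raw bytes.
    static boolean streamContains(Object obj, String needle) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(obj);
        }
        String raw = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        return raw.contains(needle);
    }

    public static void main(String[] args) throws IOException {
        Credentials credentials = new Credentials("KEY-12345", "PIN-9876");

        System.out.println(streamContains(credentials, "KEY-12345")); // prints: true
        System.out.println(streamContains(credentials, "PIN-9876"));  // prints: false
    }
}
```

In other words, access modifiers protect a field from other Java code, not from whoever ends up holding the serialized form.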
Serialization also presents inconsistencies when serialVersionUID isn't explicitly defined. This is because objects are tied to classes. Classes tend to change over time. A serialized object may be unrecognized depending on what has changed.
While Java's serialization algorithm makes it possible to serialize any Serializable object, alternatives like Protocol Buffers and Jackson's JSON parser can be better choices for serializing objects.
Java Serialization Example
package com.example.demo;
import java.io.Serializable;
public class User implements Serializable {
    private static final long serialVersionUID = 7148561634028749725L;
    private String username;
    private transient String password;

    public void setUserName(String username) {
        this.username = username;
    }

    public String getUserName() {
        return this.username;
    }

    public void setPassword(String password) {
        this.password = password;
    }

    public String getPassword() {
        return this.password;
    }
}
package com.example.demo;
import java.io.*;
public class DemoApplication {
    public static void main(String[] args) throws IOException {
        // Serialize a User to temp.out
        FileOutputStream fos = new FileOutputStream("temp.out");
        ObjectOutputStream oos = new ObjectOutputStream(fos);
        User user = new User();
        user.setUserName("Sam");
        user.setPassword("password123");
        oos.writeObject(user);
        oos.flush();
        oos.close();

        // Deserialize the User back from temp.out
        FileInputStream fis = new FileInputStream("temp.out");
        ObjectInputStream oin = new ObjectInputStream(fis);
        try {
            User userFromFile = (User) oin.readObject();
            System.out.println(userFromFile.getUserName()); //Sam
            System.out.println(userFromFile.getPassword()); //null because transient field isn't serialized.
        } catch(Exception e) {
            //handle exception
        } finally {
            oin.close(); // also closes the wrapped FileInputStream
        }
    }
}
Notice how the User class implements the Serializable interface. Also notice how a serialVersionUID is explicitly defined for a serializable class.
Notice the use of transient. This keyword excludes the password from serialization. Notice how userFromFile.getPassword() returns null because of this.
See how ObjectOutputStream and ObjectInputStream are used to perform serialization/deserialization on streams of data (in this case a file).
Conclusion
Serialization is the process of writing objects to byte streams that other systems can understand. Serialization makes it possible to transfer data to other systems like web clients and datastores, and to save an object's state so it can be restored later.
Serialization isn't specific to Java. It's a more universal process used to transfer and reconstruct data structures in a platform-agnostic way. Serialization is important to Java because it translates Java POJOs to entities other systems can understand.