JVM
Li Wei
Introduction
Learning material: https://lisxpq12rl7.feishu.cn/wiki/ZaKnwhhhmiDu9ekUnRNcv2iNnof
Overview of the JVM
Basic Introduction
JVM stands for Java Virtual Machine. The name refers to a specification; an implementation of it is, in essence, a program that runs on a computer. Its responsibility is to execute Java bytecode files. The JVM interacts with the operating system rather than directly with the hardware; the OS handles the hardware interaction on its behalf.
Core Functions
Interpretation
Java source code is executed in three steps:
- Write a Java source file.
- Compile the source code into a Java bytecode file using the Java compiler (the javac command).
- Load and run the bytecode file with the Java Virtual Machine, which starts a new process.
The bytecode file contains bytecode instructions that the CPU cannot execute directly. The JVM translates these instructions at run time into machine code that the computer can run.
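As a minimal illustration of the three steps, the sketch below uses a hypothetical class named HelloJvm (the class name, method, and message are assumptions made for this example, not part of the original material). The comments show the commands that would compile and run it.

```java
// HelloJvm.java (hypothetical file name chosen for this illustration)
// Step 2: javac HelloJvm.java   -> produces HelloJvm.class (bytecode)
// Step 3: java HelloJvm         -> starts a JVM process that loads and runs the class
class HelloJvm {
    // A trivial method; javac turns its body into bytecode instructions.
    static String greeting(String name) {
        return "Hello, " + name;
    }

    public static void main(String[] args) {
        System.out.println(greeting("JVM"));
    }
}
```

Running javap -c HelloJvm on the compiled class prints the bytecode instructions that the JVM will execute.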
Memory Management
Within the JVM, memory for objects, methods, etc., is allocated automatically, and a built-in garbage collector reclaims objects that are no longer used. This is an advantage over languages like C/C++, where the programmer must manually write code to free objects. Forgetting to free an object in C/C++ leaves it occupying memory until the process exits (a memory leak). The JVM's automatic memory management reduces the difficulty of writing correct code.
Just‑In‑Time (JIT) Compilation
Performance analysis
If Java code is run without any optimizations, its performance is generally lower than that of C or C++.
The main reason is that, during execution, the JVM must interpret bytecode into machine code on the fly, a process that may be repeated many times and is therefore inefficient.
C and C++ programs, by contrast, are compiled ahead of time into executables that already contain machine code, so no runtime interpretation is needed, resulting in higher performance.
Why Java chooses the slower approach
The trade‑off is intentional: it enables cross‑platform execution. The same bytecode can run on any platform (operating system + CPU architecture) such as Windows or Linux. Each platform’s JVM interprets the bytecode into the appropriate machine code, fulfilling the “Write Once, Run Anywhere” goal.
Optimization
Since JDK 1.1, JIT compilation has been used to improve performance. When the JVM detects that a method—or even a loop—is “hot” (called very frequently), the JIT compiler optimizes that code and stores the resulting machine code in memory. The next time the code runs, the JVM fetches the compiled machine code directly, skipping interpretation and using the optimized version. This brings Java’s performance close to that of C/C++, and in some scenarios it can even surpass it (see the detailed explanation below).
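To observe hot-method detection in practice, one option is a small program that calls the same method many times. The sketch below is only an illustration: the class name JitWarmup, the method sumTo, and the iteration counts are assumptions, and the point at which a method counts as hot depends on the VM and its settings.

```java
// Compile with: javac JitWarmup.java
// Run with:     java -XX:+PrintCompilation JitWarmup
// (PrintCompilation is a HotSpot diagnostic flag; its output format varies by JDK version.)
class JitWarmup {
    // A small method that is likely to become "hot" when called many times.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        long checksum = 0;
        // Repeated calls give the JVM a chance to detect the method as hot
        // and hand it to the JIT compiler; the exact threshold is VM-specific.
        for (int call = 0; call < 20_000; call++) {
            checksum += sumTo(1_000);
        }
        System.out.println("checksum = " + checksum);
    }
}
```

With the flag enabled, HotSpot logs the methods it compiles, so sumTo should appear in the output once it has been called often enough.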
Structural Components
A rough diagram of the JVM architecture:
(diagram omitted)
A more detailed diagram:
(diagram omitted)
JVM vs. JRE vs. JDK
- JDK (Java SE Development Kit) – the standard development kit that provides all tools and resources needed to compile and run Java programs.
- JRE (Java Runtime Environment) – the runtime environment, consisting of the JVM and the core class libraries, that executes Java bytecode.
(illustration omitted)
Architectural Model
Two common instruction‑set architectures are stack‑based and register‑based. They differ in how instructions operate on and store data.
Stack‑Based Instruction Set Architecture
In a stack‑based ISA, operands are stored on a stack data structure, and instructions operate on the top elements of the stack. An instruction typically pops its operands from the stack and pushes the result back. Because the operand access is handled via a stack pointer, the instructions are relatively simple.
Example
Evaluating the expression (2 + 3) * 4 could be compiled into the following stack-based sequence:
push 2
push 3
iadd // add the two top values
push 4
imul // multiply the result by 4
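The sequence above can be executed by a toy stack machine. The following sketch is a simplified model written for this note, not how a real JVM is implemented; the class name ToyStackMachine and the textual instruction format are assumptions. Real JVM bytecode uses opcodes such as iconst_2, iadd, and imul.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack machine; the instruction names mirror the pseudo-instructions above.
class ToyStackMachine {
    static int run(String[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : program) {
            String[] parts = instruction.trim().split("\\s+");
            switch (parts[0]) {
                case "push":
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "iadd": {
                    // Operands are implicit: pop two values, push their sum.
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "imul": {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left * right);
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unknown instruction: " + instruction);
            }
        }
        return stack.pop(); // the result is whatever is left on top
    }

    public static void main(String[] args) {
        String[] program = {"push 2", "push 3", "iadd", "push 4", "imul"};
        System.out.println(run(program)); // prints 20
    }
}
```

Note that iadd and imul carry no operands of their own, which is what makes them zero-address instructions.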
Characteristics
- Simple design and implementation; suitable for resource‑constrained systems.
- Uses zero‑address instructions: the instruction set has no explicit address fields; operands are implicit (usually on the stack or in an accumulator register).
- One‑address instructions exist as well, where an address field specifies the operand.
- No special hardware support required, giving excellent portability across platforms.
Register‑Based Instruction Set Architecture
In a register-based ISA, operands reside in registers, and instructions operate directly on those registers. This requires more registers and longer instructions (each one names its operands), but it usually needs fewer instructions to do the same work and can run more efficiently.
Example
The same expression, (2 + 3) * 4, might be compiled into a register-based sequence:
load r1, #2
load r2, #3
add r3, r1, r2 // r3 = r1 + r2
load r4, #4
mul r5, r3, r4 // r5 = r3 * r4
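For comparison, the register-based sequence can be modelled the same way. This is again a hedged sketch: the class name ToyRegisterMachine, the use of a map as the register file, and the parsing rules are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Toy register machine with three-address instructions; register names and
// instruction format follow the pseudo-assembly above.
class ToyRegisterMachine {
    static int run(String[] program, String resultRegister) {
        Map<String, Integer> registers = new HashMap<>();
        for (String instruction : program) {
            String[] p = instruction.trim().split("[\\s,]+");
            switch (p[0]) {
                case "load": // load rX, #imm
                    registers.put(p[1], Integer.parseInt(p[2].substring(1)));
                    break;
                case "add":  // add rD, rA, rB  => rD = rA + rB
                    registers.put(p[1], registers.get(p[2]) + registers.get(p[3]));
                    break;
                case "mul":  // mul rD, rA, rB  => rD = rA * rB
                    registers.put(p[1], registers.get(p[2]) * registers.get(p[3]));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown instruction: " + instruction);
            }
        }
        return registers.get(resultRegister);
    }

    public static void main(String[] args) {
        String[] program = {
            "load r1, #2", "load r2, #3", "add r3, r1, r2",
            "load r4, #4", "mul r5, r3, r4"
        };
        System.out.println(run(program, "r5")); // prints 20
    }
}
```

Here every arithmetic instruction names its source and destination registers explicitly, in contrast to the implicit operands of the stack version.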
Characteristics
- Requires hardware support; less portable.
- Generally higher performance because register access is faster than memory access.
- Primarily uses one‑, two‑, or three‑address instructions.
Comparison
- Stack‑based ISAs are simpler and well‑suited for virtual machines or limited‑resource environments, but may need more instructions to accomplish the same work because of push/pop overhead.
- Register‑based ISAs are more complex but can manipulate data more directly, often yielding better efficiency.
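Java compilers generate stack-based bytecode because the language was designed for cross-platform portability: stack-based instructions make no assumptions about the number or layout of CPU registers, so the same bytecode can run on any CPU architecture. That would be harder to guarantee with an instruction set tied to one machine's registers.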
Lifecycle
The JVM lifecycle has three phases: Start, Run, and Exit.
Start
- When a Java program is launched, the bootstrap class loader creates an initial class (the class that contains the main method), which serves as the entry point of the JVM instance.
Run
- The main() method is the program's entry point; the main thread begins execution here.
- Inside the JVM there are two thread types: user threads and daemon threads. The JVM itself uses daemon threads, while main() and other application threads are user threads by default. Daemon threads are terminated automatically once all user threads have finished.
- Executing a Java program actually runs a Java Virtual Machine process.
- The JVM has two execution modes: Server and Client.
- Client mode uses a lightweight VM and starts quickly, which suits short-lived applications.
- Server mode uses a heavyweight VM, starts more slowly, and applies many optimizations; for long-running, steady-state workloads it usually outperforms Client mode.
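The difference between user and daemon threads can be seen with the standard Thread API. In the sketch below, the class name DaemonThreadDemo, the helper newDaemon, and the thread name are hypothetical; setDaemon and isDaemon are the actual java.lang.Thread methods.

```java
class DaemonThreadDemo {
    // Creates (but does not start) a background thread marked as a daemon.
    static Thread newDaemon(Runnable task) {
        Thread t = new Thread(task, "background-worker");
        t.setDaemon(true); // must be called before start()
        return t;
    }

    public static void main(String[] args) {
        Thread daemon = newDaemon(() -> {
            while (true) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        daemon.start();

        System.out.println("main is daemon?   " + Thread.currentThread().isDaemon());
        System.out.println("worker is daemon? " + daemon.isDaemon());
        // main (a user thread) returns here. Because only a daemon thread remains,
        // the JVM exits even though the worker loop never finishes on its own.
    }
}
```

If the setDaemon(true) call were removed, the worker would be a user thread and the JVM process would keep running after main returns.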
Exit
- Normally, the JVM exits once all user threads have terminated.
- Exit can occur because the program finishes normally, terminates due to an uncaught exception or error, or is forced by the operating system.
- A thread may also call Runtime.halt() or System.exit(), provided the security manager permits the operation.
JVM Specification
The Java Virtual Machine Specification defines the requirements for JVM implementations—not for the Java language itself. Consequently, the JVM can execute class files generated by other languages such as Groovy or Scala. The specification, published by Oracle, covers class file format, class and interface loading and initialization, the instruction set, and more.
Official website:
VM Comparison
| Name | Author | Supported Versions | Community Activity (GitHub stars) | Features | Typical Use Cases |
|---|---|---|---|---|---|
| HotSpot (Oracle JDK) | Oracle | All versions | High (closed source) | Widest adoption, stable, active community, JIT support | Default JVM for Oracle JDK |
| HotSpot (OpenJDK) | Oracle | All versions | Medium (16.1k) | Same as above, open source, default for OpenJDK | Projects needing OpenJDK‑based VM |
| GraalVM | Oracle | 11, 17, 19 (Enterprise) | High (18.7k) | Polyglot, high‑performance JIT & AOT | Microservices, cloud‑native, multi‑language |
| Dragonwell (Alibaba) | Alibaba | 8, 11, 17 (standard); 11, 17 (extended) | Low (3.9k) | OpenJDK‑based, performance, bug fixes, security, JWarmup, ElasticHeap, Wisp | E‑commerce, logistics, finance |
| Eclipse OpenJ9 (formerly IBM J9) | IBM | 8, 11, 17, 19, 20 | Low (3.1k) | High performance, scalable JIT & AOT | Microservices, cloud‑native |
Class Loading
Object Memory Access
Storage Layout
On the heap, a Java object's memory is divided into three sections: Header, Instance Data, and Padding.
Object Header
Ordinary objects have two components:
- **Mark Word** – stores runtime data such as hash code, GC generation age, lock state, owning thread, biased lock thread ID, biased lock timestamp, etc.
- **Klass Pointer** – points to the InstanceKlass structure in the method area.
Array objects include an additional 4-byte field in the header that records the array length (12 bytes total on a 32-bit VM).
Instance Data – the actual fields defined in the class (including inherited fields) are stored here.
Padding – filler bytes used to satisfy alignment requirements. On a 64‑bit system, HotSpot requires the object start address to be a multiple of 8 bytes, so the total object size must be a multiple of 8. If the instance data does not naturally align, padding is added.
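32-bit system example
- An int occupies 4 bytes, so an Integer object's size is: 8 bytes (header) + 4 bytes (instance data) = 12 bytes, padded to 16 bytes.
- For int[] arr = new int[10], the array's size is: 12 bytes (header including the length field) + 10 × 4 bytes (elements) = 52 bytes, padded to 56 bytes.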
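The header, instance data, and padding rules can be turned into a small size calculator. The sketch below assumes a 32-bit HotSpot layout (8-byte object header, 12-byte array header, 8-byte alignment); the class and method names are hypothetical, and the numbers are estimates derived from those assumptions rather than measurements.

```java
class ObjectSizeEstimate {
    static final int ALIGNMENT = 8;        // object sizes are rounded up to a multiple of 8
    static final int HEADER_32 = 8;        // assumed: 4-byte mark word + 4-byte class pointer
    static final int ARRAY_HEADER_32 = 12; // assumed: object header + 4-byte length field

    // Rounds size up to the next multiple of ALIGNMENT (this models the padding).
    static int alignUp(int size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    static int instanceSize(int instanceDataBytes) {
        return alignUp(HEADER_32 + instanceDataBytes);
    }

    static int arraySize(int length, int elementBytes) {
        return alignUp(ARRAY_HEADER_32 + length * elementBytes);
    }

    public static void main(String[] args) {
        // Integer holds one int field: 8 + 4 = 12, padded to 16.
        System.out.println("Integer:     " + instanceSize(4));
        // new int[10]: 12 + 40 = 52, padded to 56.
        System.out.println("new int[10]: " + arraySize(10, 4));
    }
}
```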
Compressed Oops (ordinary object pointers)
On a 64‑bit JVM, class pointers and object references normally occupy 8 bytes. To reduce memory consumption, the JVM can compress these pointers to 4 bytes (the feature is enabled by default and can be disabled with -XX:-UseCompressedOops).
The idea behind pointer compression is to scale the addressing unit. Instead of addressing each byte, the VM addresses 8‑byte chunks. As shown in the figure, a single address now points to an 8‑byte block rather than a single byte, allowing more data to be accessed with a smaller address size.
Implications
- Memory must still be aligned to 8‑byte boundaries, which can cause some space waste (HotSpot always aligns, even without compression).
- The addressable space with compressed pointers is limited to 2³⁵ bytes (≈ 32 GB). If the heap exceeds 32 GB, compression is automatically disabled. Without compression, a 64‑bit pointer can address 2⁶⁴ bytes (≈ 16 EB).
Comparison
- Uncompressed: 8‑byte pointer → 2⁶⁴ possible addresses → 16 EB addressable space.
- Compressed: 4‑byte pointer → 2³² possible addresses, each representing an 8‑byte block → 32 GB addressable space.
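The arithmetic behind these figures can be checked with a few lines of code. The sketch below models compression as a 3-bit shift relative to a heap base address; the class name, method names, and the base-plus-shift scheme as written here are a simplified assumption, not HotSpot's actual implementation.

```java
class CompressedOopsMath {
    static final int SHIFT = 3; // 8-byte alignment: the low 3 bits of an address are always 0

    // Expands a 32-bit compressed reference into a 64-bit address.
    static long decode(long heapBase, int compressed) {
        long unsigned = compressed & 0xFFFF_FFFFL; // treat the int as unsigned
        return heapBase + (unsigned << SHIFT);
    }

    // Compresses an 8-byte-aligned address into 32 bits.
    static int encode(long heapBase, long address) {
        return (int) ((address - heapBase) >>> SHIFT);
    }

    // 2^32 distinct values, each naming an 8-byte block: 2^35 bytes = 32 GB.
    static long maxAddressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        System.out.println(maxAddressableBytes() / gib + " GB addressable");
        long address = 31 * gib; // an aligned address near the top of the range
        System.out.println(decode(0, encode(0, address)) == address);
    }
}
```

The round trip only works for addresses that are multiples of 8, which is why alignment is a precondition for this kind of compression.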
Memory Alignment
Alignment also helps avoid false sharing in multi‑threaded environments.
Example
- Suppose a cache line contains data for objects A and B.
- When a thread writes to A, the entire cache line (including B's data) is invalidated in other cores' caches, forcing a reload of B's data even though B was not modified.
Solution: Align objects so that different objects never share the same cache line. After alignment, a cache line that is invalidated for A does not affect B.
Alignment requires each object’s size to be a multiple of 8 bytes, which may involve adding padding bytes and reordering fields. In HotSpot, each field’s offset (field address – object start) must be a multiple of the field’s size. For instance, a long id field must start at an offset that is a multiple of 8. Reordering fields to satisfy this is called field reordering and helps keep a field within a single cache line, improving cache‑line utilization.
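The offset rule can be sketched as a simple layout routine. This is a deliberately simplified model, not HotSpot's actual field layout algorithm: fields are placed largest first, gaps are not back-filled with smaller fields (a real VM may do so), and the 12-byte header in the example assumes a 64-bit VM with compressed pointers. All names are hypothetical.

```java
import java.util.Arrays;

class FieldLayoutSketch {
    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    // Assigns an offset to each field (sizes given largest first) so that every
    // offset is a multiple of the field's own size.
    static int[] offsets(int headerBytes, int[] sizesLargestFirst) {
        int[] result = new int[sizesLargestFirst.length];
        int cursor = headerBytes;
        for (int i = 0; i < sizesLargestFirst.length; i++) {
            cursor = alignUp(cursor, sizesLargestFirst[i]);
            result[i] = cursor;
            cursor += sizesLargestFirst[i];
        }
        return result;
    }

    // Total object size: end of the last field, padded to a multiple of 8.
    static int objectSize(int headerBytes, int[] sizesLargestFirst) {
        int end = headerBytes;
        int[] fieldOffsets = offsets(headerBytes, sizesLargestFirst);
        if (fieldOffsets.length > 0) {
            int last = fieldOffsets.length - 1;
            end = fieldOffsets[last] + sizesLargestFirst[last];
        }
        return alignUp(end, 8);
    }

    public static void main(String[] args) {
        int[] sizes = {8, 4, 1}; // e.g. a long, an int, and a byte field
        System.out.println(Arrays.toString(offsets(12, sizes))); // prints [16, 24, 28]
        System.out.println(objectSize(12, sizes));               // prints 32
    }
}
```

In this example the long field lands at offset 16 rather than 12, leaving a 4-byte gap after the header, which is the cost of keeping the 8-byte field on an 8-byte boundary.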
Actual Size
The MAT (Memory Analyzer Tool) provides a Dominator Tree view that shows the domination relationship among object instances.
Key properties:
- The subtree rooted at object A (all objects dominated by A) is A’s retained set, i.e., the deep heap contributed by A.
- If A dominates B, then A’s immediate dominator also dominates B.
- Edges in the dominator tree do not correspond one‑to‑one with edges in the ordinary object reference graph.
Shallow Heap – the memory occupied only by the object itself, excluding any objects it references.
Example: In JDK 7, a String consists of two int fields (8 bytes), a reference to a char[] (4 bytes on a 32-bit VM), an 8-byte header, and 4 bytes of padding, totaling 24 bytes. This size is independent of the actual length of the character array.
Retained Set – the set of objects that would be reclaimed together with a given object when the garbage collector frees it.
Deep Heap – the sum of the shallow heap of an object plus the shallow heaps of everything in its retained set. It measures how much memory would be freed if the object were collected. The calculation depends on the retained set.
Actual Object Size – the total shallow‑heap size of an object and all objects reachable from it; this is what we usually refer to as “the size of the object”.
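The difference between these measures is easiest to see on a small graph. The sketch below computes a retained set by comparing what is reachable with and without the target object, then sums shallow sizes to get the deep heap. The object names, the graph, and the byte sizes are invented for illustration, and this brute-force approach is not how MAT itself computes dominators.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical object graph: root -> A, root -> D, A -> B, A -> C, D -> C
class RetainedSizeSketch {
    static Map<String, List<String>> sampleGraph() {
        return Map.of("root", List.of("A", "D"), "A", List.of("B", "C"), "D", List.of("C"));
    }

    static Map<String, Integer> sampleShallowSizes() {
        return Map.of("root", 16, "A", 16, "B", 24, "C", 32, "D", 16);
    }

    // Objects reachable from the roots when 'excluded' (may be null) is treated as absent.
    static Set<String> reachable(Map<String, List<String>> refs,
                                 Collection<String> roots, String excluded) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        for (String root : roots) {
            if (!root.equals(excluded)) pending.push(root);
        }
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!seen.add(current)) continue; // already visited
            for (String next : refs.getOrDefault(current, List.of())) {
                if (!next.equals(excluded)) pending.push(next);
            }
        }
        return seen;
    }

    // Retained set: everything that becomes unreachable once 'target' is gone
    // (the target itself is included).
    static Set<String> retainedSet(Map<String, List<String>> refs,
                                   Collection<String> roots, String target) {
        Set<String> retained = new TreeSet<>(reachable(refs, roots, null));
        retained.removeAll(reachable(refs, roots, target));
        return retained;
    }

    // Deep heap: sum of the shallow sizes over the retained set.
    static long deepHeap(Map<String, List<String>> refs, Map<String, Integer> shallow,
                         Collection<String> roots, String target) {
        long total = 0;
        for (String name : retainedSet(refs, roots, target)) {
            total += shallow.get(name);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> roots = List.of("root");
        System.out.println(retainedSet(sampleGraph(), roots, "A")); // prints [A, B]
        System.out.println(deepHeap(sampleGraph(), sampleShallowSizes(), roots, "A")); // prints 40
    }
}
```

In this graph C is also referenced by D, so it is not in A's retained set: the deep heap of A is 40 bytes (A and B), while everything reachable from A (A, B, and C) adds up to 72 bytes.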
Object Access
When Java code accesses an object via a reference variable, the JVM must resolve that reference to the actual object instance. Two common mechanisms are handle‑based access and direct pointer access.
Handle access: The heap contains a handle pool. A reference stores the address of a handle; the handle, in turn, contains the real addresses of the object’s data and its type information.
- Pros: Objects can be moved (e.g., during GC) without changing every reference; only the handle’s internal pointer needs updating.
- Cons: Each access incurs an extra level of indirection, reducing performance.
Direct pointer access (used by HotSpot): The reference stores the actual object address directly.
- Pros: Faster access because there is no intermediate handle.
- Cons: When the object is moved (e.g., during compacting GC), every reference must be updated, which can be costly.
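The trade-off between the two mechanisms can be modelled by counting how many pointers must change when an object moves. The sketch below is a toy model only: addresses are plain integers, and the class names Handle and DirectRef are assumptions for illustration, not JVM data structures.

```java
import java.util.ArrayList;
import java.util.List;

class ObjectAccessSketch {
    // One slot in the handle pool; holds the object's current address.
    static final class Handle {
        int objectAddress;
        Handle(int objectAddress) { this.objectAddress = objectAddress; }
    }

    // A direct reference stores the object's address itself.
    static final class DirectRef {
        int objectAddress;
        DirectRef(int objectAddress) { this.objectAddress = objectAddress; }
    }

    // Handle access: references point at the handle, so a move is a single write.
    static int moveWithHandle(Handle handle, int newAddress) {
        handle.objectAddress = newAddress;
        return 1;
    }

    // Direct access: every reference holding the old address must be rewritten.
    static int moveWithDirectPointers(List<DirectRef> refs, int oldAddress, int newAddress) {
        int updates = 0;
        for (DirectRef ref : refs) {
            if (ref.objectAddress == oldAddress) {
                ref.objectAddress = newAddress;
                updates++;
            }
        }
        return updates;
    }

    // Returns {updates with handles, updates with direct pointers} when an
    // object that is referenced 'referenceCount' times gets moved.
    static int[] updateCounts(int referenceCount) {
        int oldAddress = 0x1000;
        int newAddress = 0x2000; // arbitrary illustrative addresses
        Handle handle = new Handle(oldAddress); // shared by all handle-based references
        List<DirectRef> directRefs = new ArrayList<>();
        for (int i = 0; i < referenceCount; i++) {
            directRefs.add(new DirectRef(oldAddress));
        }
        return new int[] {
            moveWithHandle(handle, newAddress),
            moveWithDirectPointers(directRefs, oldAddress, newAddress)
        };
    }

    public static void main(String[] args) {
        int[] counts = updateCounts(3);
        System.out.println("handle updates: " + counts[0] + ", direct updates: " + counts[1]);
    }
}
```

The model leaves out the other side of the trade-off: with handles, every field access has to read the handle first and then the object, which is the extra indirection listed under the cons.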
Object Creation
Lifecycle
An object’s lifecycle in Java consists of several stages:
Created
- Allocate memory for the object.
- Begin construction.
- Initialize static members from super‑class to subclass.
- Initialize superclass instance fields, recursively invoking superclass constructors.
- Initialize subclass instance fields and invoke subclass constructor.
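The initialization order listed above can be observed directly by logging each step. In the sketch below, the class names InitOrderDemo, Parent, and Child and the log messages are hypothetical; the ordering itself follows the Java language rules for class and instance initialization.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    // Records a step and returns a dummy value so it can be used in field initializers.
    static int log(String step) {
        LOG.add(step);
        return 0;
    }

    static class Parent {
        static int parentStatic = log("1. parent static field");
        int parentField = log("3. parent instance field");

        Parent() {
            log("4. parent constructor");
        }
    }

    static class Child extends Parent {
        static int childStatic = log("2. child static field");
        int childField = log("5. child instance field");

        Child() {
            // The implicit super() call runs before the child's own field initializers.
            log("6. child constructor");
        }
    }

    public static void main(String[] args) {
        new Child();
        LOG.forEach(System.out::println); // steps appear in the numbered order 1 to 6
    }
}
```

Creating a second Child would repeat only steps 3 to 6, because static members are initialized once, when the class itself is initialized.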
In Use – the object is held by at least one strong reference.
Invisible – program execution has moved beyond the object’s scope; no strong references remain.
Unreachable – the object is not reachable from any GC root (no strong references).
Collected
- The garbage collector prepares to reclaim the object’s memory.
- If the object overrides finalize(), that method is invoked before reclamation.
Finalized – the finalize() method has completed.
(content truncated)
Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.