Diagnosing frequent FullGC caused by AviatorEvaluator class loading leak

Title: Investigating Frequent Full GCs Caused by AviatorEvaluator Class‑Loader Memory Leaks

The purpose of a bug post‑mortem is to identify the root cause, capture viable solutions, and summarize lessons learned. It should be documented in a way that is reproducible, shareable, and reviewable, helping individuals and teams avoid repeated pitfalls, improve issue‑resolution efficiency, and provide traceable references for future knowledge accumulation and improvement.
Reproducible: includes step‑by‑step instructions so others can run it.
Shareable: written like a tutorial article so others can understand it.
Reviewable: serves as study notes for quick future reference.

Background

Date: September 15 2025

Project/Module: Ranking scheduled tasks

Scenario: Three scheduled tasks were written. After the last one was started, its execution time was unusually long. Investigation traced the slowdown to frequent Full GCs.

Symptom Investigation

The service was deployed in the Beta environment. After starting the scheduled task, the execution time was excessively long; normally a scheduled task should finish within a minute.

Based on this observation, the initial hypothesis was that the task’s workload was huge: each run queried 18 k live rooms, and the tagging process for each room involved two RPC calls. Each task took about 800 ms, the thread‑pool was tiny, and the rejection policy CallerRuns forced the caller to keep executing tasks, further extending the duration.

Monitoring confirmed this suspicion, so the thread‑pool parameters were tuned. The metrics looked better, but the total execution time remained long.

We then consulted the scheduled‑task support team, who suggested GC‑related slowness.

Checking the JVM dashboard revealed very frequent Full GCs, many lasting over 500 ms—clearly a serious issue.

In the lower‑left corner of the chart there is a Metaspace GC threshold duration metric that we had initially missed (no instrumentation).

From the graph it seemed memory wasn’t continuously growing, so a leak was unlikely. My first guess was that the data volume was too large or that RPC latency was causing excessive memory pressure, but that was quickly ruled out: 18 k records is tiny (tens of megabytes), and the same API had been called by other scheduled jobs without issue. Hence a large queue does not equal high memory usage.

The next step was to dump the heap for deeper analysis.

The dominator tree showed that the top‑ranked objects were class loaders, which puzzled me. Lacking experience, I didn’t know how else to pinpoint the root cause, so I also inspected other high‑retained‑size objects but still found no clues.

At the same time I generated a GC log:

Each Full GC was triggered by Metaspace exhaustion. This was odd because Metaspace was configured to 512 MB—how could it run out?

Returning to Arthas, I saw Metaspace usage hovering around 80 %.

I reminded myself what Metaspace stores: class metadata, static fields, and class loaders. Following an AI‑generated hint, I investigated the class loader and discovered that each instance of the Aviator expression engine creates a unique Java class (the class name includes a random suffix such as Expr_abc123) and loads it with a custom class loader (e.g., AviatorClassLoader).

After fixing the issue, the dashboard showed no more Full GCs:

Arthas also reported that the number of class‑loader instances returned to the expected level.

Why didn’t the problem appear in production?

Sampling a few production machines revealed that, although Full GCs were absent, Metaspace and class‑loader usage exhibited the same pattern.

Analyzing the scheduled‑task execution pattern and environment differences explained this: the production environment runs on 52 machines—far more than the test environment. In test, the task runs every minute; in production it runs every five minutes, but it is idempotent within an hour, meaning getCalculateScore is invoked only once per hour per machine. This spreads the Metaspace pressure across many instances.

Indeed, two hours after the investigation began, one production machine did experience a Full GC caused by Metaspace shortage.

GC log excerpt:

(log omitted)

Root‑Cause Analysis

Root cause: The Aviator expression engine was compiling expressions without enabling caching, repeatedly creating identical class loaders and generated classes, which eventually exhausted Metaspace and triggered Full GCs.

Mechanisms Involved

JVM Metaspace and class‑loading basics:
Since Java 8, the JVM stores class metadata in native‑memory Metaspace instead of the permanent generation. When Metaspace runs low, the JVM performs a Full GC to try to reclaim unused class data, but this process is often costly and only partially effective.

Class loaders play a key role: each loader maintains its own set of loaded classes. Only when a class loader becomes unreachable can its classes be reclaimed. Hence, a class‑loader leak directly leads to Metaspace pressure—a large number of loader instances prevents class unloading.
AviatorEvaluator’s dynamic compilation:
AviatorEvaluator is a high‑performance expression engine that compiles string expressions into Java bytecode, achieving near‑native execution speed. When AviatorEvaluator.compile(expression) is invoked, the engine:
1. Parses the expression into an AST.
2. Transforms the AST into Java bytecode, generating a concrete Java class that implements a specific interface.
3. Instantiates a dedicated AviatorClassLoader to load this generated class.
4. Returns a CompiledExpression wrapping the new class instance.
This design is clever because each compiled expression becomes a real Java class, eliminating interpretation overhead. However, it also creates a potential memory‑leak scenario—each compilation produces a new class and a new class loader.
How the problem manifested:
In the business logic, a scheduled task iterates over all live rooms and, for each host, calculates a shop score. The calculation calls getCalculateScore, which internally uses AviatorEvaluator.compile(computeRule).execute(paramMap) to evaluate a dynamic expression.

Although computeRule only has a few distinct values, the code calls the compile method on every iteration. Aviator treats each call as a brand‑new compilation request, unable to recognize that the expression has already been compiled. Consequently, it repeats the full compilation pipeline: new AST → new bytecode → new class loader → new class.

Repeated executions of the scheduled task cause this to happen thousands of times. Each compilation leaves a class loader and its generated class in Metaspace, and Young GC cannot reclaim them. Over time, Metaspace fills up, prompting frequent Full GCs.

Reproduction Steps

Run the task with expression caching disabled and observe GC activity in the console.
Enable caching by changing line 54 to double result = (double) AviatorEvaluator.compile(computeRule, **true** ).execute(paramMap);; Full GCs should disappear.

To see class‑loading events more clearly, you can add the following JVM options:

-verbose:class – prints class load and unload events.
-XX:+TraceClassLoading – prints class‑load information only.
-XX:+TraceClassUnloading – prints class‑unload information only.

Lessons Learned

Insufficient investigation experience: I focused narrowly on Full GC counts/durations and ignored other memory‑space metrics, missing correlations that could have led to a quicker diagnosis.
Tool proficiency gaps: I was not fully comfortable with Arthas commands and features.
Over‑reliance on production code: Assuming that code working in production guarantees correctness led me to overlook environment‑specific factors.

Overall, the debugging journey was winding. My limited experience narrowed the analysis scope, resulting in a lot of wasted effort. Nevertheless, the solo investigation taught me valuable skills—up up!

Debugging Timeline (summary)

Three scheduled tasks were written; after the last one started, execution time ballooned.
The task tags each live room, uses multithreading, and each sub‑task makes two RPC calls. The thread‑pool appeared saturated, but RPC latency was only 30‑40 ms, so it wasn’t the bottleneck.
The rejection policy CallerRuns made me suspect task overload; I switched to abort and logged warnings, but the slowness persisted.
Support suggested pod health issues (network/GC). JVM monitoring showed frequent Full GCs (~1 s each).
Heap dump revealed LaunchedUrlClassLoader dominating the dominator tree, though the task code didn’t reference it. GC logs confirmed Metaspace‑shortage‑driven Full GCs.
Turning off the task didn’t stop Full GCs, so I disabled other tasks one by one, eventually pinpointing a shop‑score calculation task.
Arthas showed Metaspace usage around 80 % while heap usage was only ~20 %. Using the classloader command, I discovered that AviatorClassLoader instances matched the number of generated classes one‑to‑one.
The root cause was identified: the expression engine compiled without caching, causing endless class‑loader creation. Enabling caching eliminated Full GCs.
I wondered why this didn’t affect production. Monitoring the past 7 days showed no Full GCs online. Sampling a few production nodes revealed the same Metaspace pattern but less severe. Production runs the task every 5 minutes with hourly idempotency, spreading the load across ~50 machines, whereas the test environment runs it every minute on only a few machines. This distribution mitigated the pressure.
Two hours later, a production node did experience a Full GC due to Metaspace, confirming the analysis. After deploying the fix, class‑loader counts and memory usage returned to normal.

References

https://juejin.cn/post/7280429214594514999
AI‑driven troubleshooting approach:

Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.

Diagnosing frequent FullGC caused by AviatorEvaluator class loading leak

Title: Investigating Frequent Full GCs Caused by AviatorEvaluator Class‑Loader Memory Leaks

Background

Symptom Investigation

Why didn’t the problem appear in production?

Root‑Cause Analysis

Mechanisms Involved

Reproduction Steps

Lessons Learned

Debugging Timeline (summary)

References

Sources & References

Keep reading

The Agent Economy: How RunbookHermes Is Reshaping Personal Productivity

Browser Agents Explained: How FinGPT Drives a Web Browser Autonomously

How ChatGPT Turns Market Data into Trading Signals in Real Time