Distributed Scheduling

Li Wei

December 27, 2025 · 8 min read

Overview

Distributed task scheduling is the process of allocating and managing tasks within a distributed system. It lets the system spread tasks across multiple nodes according to demand and available resources, improving processing capacity, reliability, and scalability.

Implementation frameworks

Standalone

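  • Timer – the built‑in java.util.Timer class, which runs java.util.TimerTask instances. It can run a task once after a delay or at a given time, or repeat it at a fixed rate or with a fixed delay. It cannot express calendar‑style schedules such as cron expressions, and all tasks share a single background thread. It is rarely used.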
  • ScheduledExecutorService – also provided by the JDK; it is a thread‑pool‑based scheduler. Each scheduled task is assigned to a thread from the pool, so tasks run concurrently and do not interfere with each other.
  • Spring Task – the task framework bundled with Spring since version 3.0. It is easy to configure and feature‑rich; if you are running a single‑machine setup, consider using Spring’s scheduler first.
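The sketch below illustrates the JDK option. It uses ScheduledExecutorService.scheduleAtFixedRate to run a task on a small thread pool until the task has fired a given number of times. The class and method names (FixedRateDemo, runTimes) are made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative class name; uses only JDK APIs. */
class FixedRateDemo {

    /** Runs a task every periodMillis until it has fired `times` times, then returns the run count. */
    static int runTimes(int times, long periodMillis) {
        // A small pool: each scheduled task borrows a pool thread when it is due.
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(times);

        ScheduledFuture<?> handle = pool.scheduleAtFixedRate(() -> {
            // Executions of one fixed-rate task never overlap, so this check-then-act is safe.
            if (runs.get() < times) {
                runs.incrementAndGet();   // business logic would go here
                done.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            done.await(5, TimeUnit.SECONDS);   // bounded wait for the requested number of runs
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            handle.cancel(false);              // stop future executions
            pool.shutdown();
        }
        return runs.get();
    }
}
```

Executions of a single fixed‑rate task never overlap, so the check‑and‑increment inside the task does not race with itself. Separate tasks scheduled on the same pool can still run in parallel.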

Distributed

  • Quartz – the de‑facto standard for Java scheduled jobs. Quartz focuses on timing rather than data processing, so it lacks a workflow tailored to data‑driven tasks. Although Quartz can achieve high availability via a database, it does not provide distributed parallel scheduling.
  • TBSchedule – an early open‑source distributed scheduler from Alibaba. The code is somewhat outdated and uses Timer instead of a thread pool, which is known to have shortcomings when handling exceptions. Its job type is limited to a simple fetch‑process pattern, and documentation is severely lacking.
  • elastic‑job – a flexible distributed scheduler developed by Dangdang (a Chinese e‑commerce platform). It offers many powerful features, uses ZooKeeper for coordination, supports high availability, sharding, and cloud deployment.
  • Saturn – a distributed scheduling platform built by Vipshop (a Chinese online retailer). It is based on elastic‑job v1 and can be easily deployed in Docker containers. Within Vipshop it runs on more than 350 nodes, handling over 40 million scheduled executions per day. Management and statistics are also strong points.
  • xxl‑job – a lightweight distributed scheduling framework released in 2015 by Xu Xueli, a Meituan‑Dianping employee. Its core design goals are fast development, a gentle learning curve, a lightweight footprint, and easy extensibility.

[Figure: Comparison chart]

XXL‑JOB

Basic Introduction

The xxl‑job framework is primarily used for handling distributed scheduled tasks. It consists of a scheduler (admin console) and executors.

  • Scheduling module (admin console):

    • Manages scheduling information and issues scheduling requests according to configuration; it does not contain business code. Decoupling the scheduler from the tasks improves system availability and stability, and the scheduler’s performance is no longer limited by the task modules.
    • Provides a visual, simple, and dynamic UI for creating, updating, deleting tasks, GLUE development, and alerts. All changes take effect immediately. It also supports monitoring of execution results and logs, as well as executor failover.
  • Execution module (executor):

    • Receives scheduling requests and runs the task logic. The task module focuses solely on execution, making development and maintenance easier and more efficient.
    • Handles execution, termination, and log requests from the scheduler.

[Figure: from the official documentation]

The scheduler and executors are deployed separately and communicate via RPC. The scheduler serves as a platform that manages scheduling data and sends execution requests, while the executors run the actual business logic.
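The sketch below makes this decoupling concrete. A toy executor and a toy scheduler run in one JVM but communicate only over HTTP on the loopback interface. They use the JDK's built‑in com.sun.net.httpserver.HttpServer and HttpURLConnection. The /run path, the plain‑text body carrying a JobHandler name, and the class name RemoteTriggerDemo are assumptions for illustration, not XXL‑JOB's actual protocol.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: a toy executor and a toy scheduler that only talk over HTTP. */
class RemoteTriggerDemo {

    /** Executor side: an embedded HTTP endpoint that "runs" whatever JobHandler name it receives. */
    static HttpServer startExecutor() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        // "/run" and the plain-text body are assumptions, not XXL-JOB's real wire format.
        server.createContext("/run", exchange -> {
            String handlerName = readAll(exchange.getRequestBody());
            byte[] reply = ("ran " + handlerName).getBytes(StandardCharsets.UTF_8); // stand-in for business logic
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    /** Scheduler side: holds no business code, it only sends the trigger request. */
    static String trigger(int port, String handlerName) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://127.0.0.1:" + port + "/run").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(handlerName.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Start an executor on a free port, trigger it once, then shut it down. */
    static String roundTrip(String handlerName) {
        HttpServer server = null;
        try {
            server = startExecutor();
            return trigger(server.getAddress().getPort(), handlerName);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }
}
```

Because the scheduler only knows an address and a handler name, the executor can be redeployed or scaled without touching the scheduler.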

Hands‑on Setup

  • Environment preparation

    Requirements:

    • Maven 3+
    • JDK 1.8+
    • MySQL 5.7+
  • Source code download (repository: https://github.com/xuxueli/xxl-job)

    Directory layout:

    • doc – documentation
    • xxl-job-admin – scheduler (admin console) source code
    • xxl-job-core – common JAR dependencies
    • xxl-job-executor-samples – sample executor project (you can develop directly here or adapt an existing project into an executor)
  • Initialize the database

    • Run the SQL script doc/db/tables_xxl_job.sql to create the database and tables.

    Table descriptions:

    • xxl_job_group – executor information table; maintains executor instances.
    • xxl_job_info – scheduling metadata table; stores job group, name, address, executor, parameters, alert email, etc.
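    • xxl_job_lock – job scheduling lock table. The scheduler locks a row in it so that, in a clustered deployment, only one scheduler instance dispatches jobs at a time.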
    • xxl_job_log – scheduling log table; records history such as scheduling result, execution result, parameters, machine, executor, etc.
    • xxl_job_log_report – log reporting table; used by the admin console’s reporting pages.
    • xxl_job_logglue – GLUE log table; keeps GLUE update history for version rollback.
    • xxl_job_registry – executor registry; maintains online executor and scheduler address information.
    • xxl_job_user – system user table.
  • Scheduler configuration

    • Configuration file: src/main/resources/application.properties
    • (Details omitted for brevity)
  • Deploy the scheduler

    • Run xxl-job-admin.
    • Access URL: http://localhost:8080/xxl-job-admin
      Default login: admin / 123456.
      [Figure: admin console after logging in]
  • Executor configuration

    • Dependency import (details omitted)

    • Configuration notes
      The XxlJobConfig class in the executor project reads the settings from the properties file and uses them to build an XxlJobSpringExecutor bean.

    • Bean mode (jobHandler defined on the executor side; tasks are matched to handlers via the JobHandler attribute) – create a new task:

      • Class mode: One Java class per task
        Extend the abstract class IJobHandler and implement execute(). IJobHandler also defines init() and destroy().

        import com.xxl.job.core.handler.IJobHandler;

        public class MyJobHandler extends IJobHandler {
            @Override
            public void execute() throws Exception {
                // business logic here; init() and destroy() can also be overridden
            }
        }
        

        Register the handler with XxlJobExecutor.registJobHandler("JobHandler name", new MyJobHandler()).

      • Add the executor instance in the admin console.

      • In the task management module, set the Cron expression, routing strategy, etc.

      • Start both the admin module and the executor module. In this example two executor instances are configured with the round‑robin (ROUND) routing strategy, so successive triggers alternate between the two instances.
        [Figure: executor output]

    • Method mode (each task is a method on a Spring bean) – one method per task
      Annotate the job method with @XxlJob(value = "custom JobHandler name", init = "JobHandler init method", destroy = "JobHandler destroy method"). The value must match the JobHandler name configured for the task in the admin console. init and destroy are optional and name the handler's initialization and cleanup methods.

      • Add the executor instance in the admin console.
      • Add the scheduled task in the admin console’s task management page.
      • Start the admin and executor modules.
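Both modes produce the same result: the executor keeps a map from JobHandler name to something it can invoke, and it resolves each incoming trigger by name. The sketch below models that with plain JDK reflection. JobHandler, @JobMethod, HandlerRegistry, UpperCaseJob and DemoJobs are hypothetical stand‑ins for IJobHandler, @XxlJob and the executor's internal registry; XXL‑JOB's real classes have different signatures.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a handler base class such as IJobHandler (class mode). */
abstract class JobHandler {
    abstract String execute(String param) throws Exception;
}

/** Hypothetical stand-in for a method-level annotation such as @XxlJob (method mode). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface JobMethod {
    String value();   // the JobHandler name
}

/** Hypothetical executor-side registry: JobHandler name -> handler. */
class HandlerRegistry {
    private final Map<String, JobHandler> handlers = new HashMap<>();

    /** Class mode: register a handler object under an explicit name. */
    void register(String name, JobHandler handler) {
        if (handlers.putIfAbsent(name, handler) != null) {
            throw new IllegalStateException("duplicate JobHandler name: " + name);
        }
    }

    /** Method mode: find annotated methods on a bean and wrap each one as a handler. */
    void scan(Object bean) {
        for (Method method : bean.getClass().getDeclaredMethods()) {
            JobMethod annotation = method.getAnnotation(JobMethod.class);
            if (annotation == null) {
                continue;
            }
            method.setAccessible(true);
            register(annotation.value(), new JobHandler() {
                @Override
                String execute(String param) throws Exception {
                    return String.valueOf(method.invoke(bean, param));
                }
            });
        }
    }

    /** An incoming trigger names a JobHandler; look it up and run it. */
    String trigger(String name, String param) {
        JobHandler handler = handlers.get(name);
        if (handler == null) {
            throw new IllegalArgumentException("no JobHandler registered as " + name);
        }
        try {
            return handler.execute(param);
        } catch (Exception e) {
            throw new IllegalStateException("job " + name + " failed", e);
        }
    }
}

/** Class mode example: one class per task. */
class UpperCaseJob extends JobHandler {
    @Override
    String execute(String param) {
        return param.toUpperCase();
    }
}

/** Method mode example: one annotated method per task on an ordinary object. */
class DemoJobs {
    @JobMethod("echoJobHandler")
    String echo(String param) {
        return "echo:" + param;
    }
}
```

Rejecting duplicate names at registration time surfaces a misconfiguration at startup rather than when a trigger arrives.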

Note:
xxl.job.admin.addresses must be set to the scheduler (admin console) address; if the scheduler is clustered, list all addresses separated by commas. The executor uses this address for heartbeat registration and task‑result callbacks. Leaving it empty disables automatic registration.
All executors in the same cluster must share the same xxl.job.executor.appname value; the registry uses this to discover online executor instances for each cluster.
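The role of xxl.job.executor.appname can be sketched as a registry keyed by appname. Each heartbeat refreshes an (appname, address) entry. A periodic sweep drops addresses that have gone silent and rebuilds each group's address list. This is an in‑memory stand‑in for the xxl_job_registry and xxl_job_group tables, with the current time passed in so it can be tested. The class name and the example appname and addresses are hypothetical. The 90‑second cutoff used below follows the figure given under Execution Principle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the xxl_job_registry / xxl_job_group tables. */
class ExecutorRegistry {
    private final long deadTimeoutMillis;
    // appname -> (executor address -> time of last heartbeat, in ms)
    private final Map<String, Map<String, Long>> lastBeat = new HashMap<>();

    ExecutorRegistry(long deadTimeoutMillis) {
        this.deadTimeoutMillis = deadTimeoutMillis;
    }

    /** Executor heartbeat: insert or refresh the (appname, address) entry. */
    void heartbeat(String appname, String address, long nowMillis) {
        lastBeat.computeIfAbsent(appname, k -> new TreeMap<>()).put(address, nowMillis);
    }

    /** Scheduler sweep: drop silent executors, then rebuild each appname's address list. */
    Map<String, List<String>> sweep(long nowMillis) {
        Map<String, List<String>> groups = new HashMap<>();
        for (Map.Entry<String, Map<String, Long>> app : lastBeat.entrySet()) {
            app.getValue().values().removeIf(beat -> nowMillis - beat > deadTimeoutMillis);
            groups.put(app.getKey(), new ArrayList<>(app.getValue().keySet()));
        }
        return groups;
    }
}
```

With a 30‑second heartbeat and a 90‑second cutoff, an executor has to miss about three consecutive heartbeats before it drops out of its group. Executors that use a different appname end up in a different group, which is why all instances of one cluster must share the same value.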

Adding a New Task

In a distributed scheduling system, the scheduler manages and distributes tasks. The following settings are key when adding a task:

  • Routing strategies (available when executors are clustered):

    • FIRST – always selects the first machine in the cluster.
    • LAST – always selects the last machine.
    • ROUND – round‑robin distribution across available nodes for load balancing.
    • RANDOM – randomly picks an online machine.
    • CONSISTENT_HASH – uses consistent hashing to bind a task to a specific machine, ensuring even distribution.
    • LEAST_FREQUENTLY_USED – selects the machine with the lowest usage frequency.
    • LEAST_RECENTLY_USED – selects the machine that has been idle the longest.
    • FAILOVER – checks heartbeats in order; the first responsive machine becomes the target executor.
    • BUSYOVER – checks for idle status in order; the first idle machine becomes the target executor.
    • SHARDING_BROADCAST – broadcasts the task to all machines in the cluster, automatically passing shard parameters. Useful for operations that must run on every node (e.g., data synchronization).
  • Task timeout – customizable per task; if a task exceeds its timeout it will be forcibly interrupted.

  • Failure retry count – customizable; when a task fails, it will automatically retry up to the configured number of times.
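Several of the routing strategies are simple enough to sketch directly. The code below is a simplified, hypothetical model; the Routers class and its methods are not XXL‑JOB's API.

- FIRST and LAST index into the address list.
- ROUND keeps one counter per job.
- CONSISTENT_HASH places each address on an MD5‑based hash ring with 100 virtual nodes. Both of these are choices made for this sketch.
- ownsItem shows how an executor can use the shard index and total it receives under SHARDING_BROADCAST to select its share of the data.

The LFU, LRU, FAILOVER and BUSYOVER strategies are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified, hypothetical routers; not XXL-JOB's actual implementations. */
class Routers {

    static String first(List<String> addresses) {
        return addresses.get(0);
    }

    static String last(List<String> addresses) {
        return addresses.get(addresses.size() - 1);
    }

    // ROUND: one counter per job, so each job cycles through the executors independently.
    private static final Map<Integer, AtomicInteger> counters = new ConcurrentHashMap<>();

    static String round(int jobId, List<String> addresses) {
        int n = counters.computeIfAbsent(jobId, k -> new AtomicInteger()).getAndIncrement();
        return addresses.get(Math.floorMod(n, addresses.size()));
    }

    // CONSISTENT_HASH: put each address on a ring 100 times (virtual nodes), then route
    // a job to the first ring position at or after the job's own hash (wrapping around).
    static String consistentHash(int jobId, List<String> addresses) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String address : addresses) {
            for (int i = 0; i < 100; i++) {
                ring.put(hash(address + "#VN" + i), address);
            }
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(String.valueOf(jobId)));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // SHARDING_BROADCAST: every executor runs the job with (index, total) and keeps only its share.
    static boolean ownsItem(long itemId, int shardIndex, int shardTotal) {
        return Math.floorMod(itemId, (long) shardTotal) == shardIndex;
    }

    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            // First 4 digest bytes as an unsigned 32-bit ring position.
            return ((long) (d[3] & 0xFF) << 24) | ((long) (d[2] & 0xFF) << 16)
                    | ((long) (d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Virtual nodes smooth out the distribution: with only one ring position per address, a few unlucky hash values could send most jobs to the same machine.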
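Task timeout and failure retry can be sketched with the JDK alone:

- Each attempt runs on a worker thread.
- Future.get bounds how long the caller waits.
- cancel(true) interrupts an attempt that overruns.
- A failed or timed‑out attempt is retried up to the configured count.

TimeoutRetryRunner is a hypothetical helper that only illustrates the semantics described above. It does not mirror how XXL‑JOB divides these responsibilities between the scheduler and the executor.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: run a job with a per-attempt timeout and a bounded number of retries. */
class TimeoutRetryRunner {

    /** Returns the attempt number that succeeded, or -1 if every attempt failed or timed out. */
    static int run(Callable<Void> job, long timeoutMillis, int retryCount) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            for (int attempt = 1; attempt <= 1 + retryCount; attempt++) {
                Future<Void> future = worker.submit(job);
                try {
                    future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return attempt;                       // finished in time without throwing
                } catch (TimeoutException e) {
                    future.cancel(true);                  // interrupt the overrunning attempt
                } catch (ExecutionException e) {
                    // the job threw; fall through to the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();   // caller gave up waiting
                    return -1;
                }
            }
            return -1;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

Interruption only stops code that responds to it, such as blocking calls or loops that check Thread.interrupted(). A task that ignores interrupts keeps its worker thread busy, so any "forcible" timeout in Java depends on cooperative task code.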

Execution Principle

Execution flow

  1. Executor registration and discovery

    • Involves two tables:
      • xxl_job_registry – executor instance table, storing each instance's address and last heartbeat time.
      • xxl_job_group – executor group table; its address list holds the online instances for each appname.
    • Each executor runs a registry thread that sends a heartbeat to the scheduler every 30 seconds. The scheduler records each heartbeat by inserting or refreshing the executor's row in xxl_job_registry. The scheduler also runs a monitor thread that, every 30 seconds, removes from xxl_job_registry any instance whose last heartbeat is more than 90 seconds old, then refreshes the address lists in xxl_job_group.
  2. Scheduler invokes executors

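    • The scheduler runs a continuous scheduling loop.

    • At the start of each pass it disables auto‑commit on its database connection.

    • It then takes a MySQL pessimistic lock (a SELECT ... FOR UPDATE on the xxl_job_lock table). When several scheduler instances are deployed, this ensures only one of them dispatches jobs at a time.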

    • The scheduler reads xxl_job_info, the table that stores scheduled‑task metadata. Its trigger_next_time column holds each task's next trigger time. The scheduler pre‑reads every enabled task whose next trigger time is no later than 5 seconds from now, then handles each one according to how that time compares with the current time:

      1. Overdue by more than 5 s (currentTime – nextTriggerTime > 5 s): the run is skipped and trigger_next_time is recalculated from the current time.
      2. Overdue by 5 s or less: the task is handed to a trigger thread pool for immediate execution, and trigger_next_time is recalculated from the current time.
      3. If, after case 2, the new next trigger time again falls within the next 5 seconds, the task is also pushed into the timing wheel and trigger_next_time is advanced once more.
    • Some tasks are not yet due but will be within the next 5 seconds. These are pushed into the timing wheel slot for their trigger second, and trigger_next_time is advanced to the following occurrence. A separate ring thread ticks once per second and fires the tasks in the current slot.

    • The transaction is then committed, releasing the exclusive lock.

  3. Executor actions

    • Upon receiving a scheduling request, the executor puts the request into the appropriate task’s waiting queue.
    • Executor worker threads pull scheduling info from the queue, run the business logic, and place the result into a shared result queue (each task has its own worker thread and queue).
    • A dedicated callback thread periodically batches results from the result queue and notifies the scheduler.
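The pre‑read cases in step 2 can be modeled in a few lines. The sketch below is a single‑threaded, in‑memory approximation that makes these assumptions:

- A fixed interval stands in for the cron expression.
- The "query" is a filter over a list.
- The timing wheel is a map from second‑of‑minute (0 to 59) to job ids.

PreReadScheduler, Job, scan, pushToRing and tick are hypothetical names. Locking and persistence are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-threaded model of one pre-read pass plus a 60-slot time ring. */
class PreReadScheduler {
    static final long PRE_READ_MS = 5_000;

    /** Minimal job row: a fixed interval stands in for the cron expression. */
    static final class Job {
        final int id;
        final long intervalMs;
        long nextTriggerTime;

        Job(int id, long intervalMs, long nextTriggerTime) {
            this.id = id;
            this.intervalMs = intervalMs;
            this.nextTriggerTime = nextTriggerTime;
        }
    }

    final Map<Integer, List<Integer>> ring = new HashMap<>();   // second of minute -> job ids
    final List<Integer> firedNow = new ArrayList<>();           // triggered directly by the scan
    final List<Integer> skipped = new ArrayList<>();            // overdue by more than 5 s

    void scan(List<Job> jobs, long now) {
        for (Job job : jobs) {
            if (job.nextTriggerTime > now + PRE_READ_MS) {
                continue;   // outside the pre-read window: the query would not return it
            }
            if (now > job.nextTriggerTime + PRE_READ_MS) {
                // Case 1: overdue by more than 5 s -> skip, recompute from now.
                skipped.add(job.id);
                job.nextTriggerTime = nextAfter(job, now);
            } else if (now > job.nextTriggerTime) {
                // Case 2: overdue by at most 5 s -> trigger now, recompute from now.
                firedNow.add(job.id);
                job.nextTriggerTime = nextAfter(job, now);
                // Case 3: the new time is again inside the window -> hand it to the ring as well.
                if (now + PRE_READ_MS > job.nextTriggerTime) {
                    pushToRing(job);
                }
            } else {
                // Not yet due but within 5 s -> the ring fires it on time.
                pushToRing(job);
            }
        }
    }

    /** Put the job in the slot for its trigger second, then advance to the following occurrence. */
    private void pushToRing(Job job) {
        int second = (int) ((job.nextTriggerTime / 1000) % 60);
        ring.computeIfAbsent(second, k -> new ArrayList<>()).add(job.id);
        job.nextTriggerTime = nextAfter(job, job.nextTriggerTime);
    }

    /** Next multiple of the interval strictly after `from` (stand-in for cron evaluation). */
    private static long nextAfter(Job job, long from) {
        return (from / job.intervalMs + 1) * job.intervalMs;
    }

    /** One tick of the ring thread: remove and return the jobs in the current second's slot. */
    List<Integer> tick(long now) {
        List<Integer> due = ring.remove((int) ((now / 1000) % 60));
        return due == null ? new ArrayList<>() : due;
    }
}
```

The 5‑second pre‑read lets one database pass cover the next few seconds of work. The timing wheel then fires each task close to its exact second without querying the table again.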
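The executor actions in step 3 follow a producer/consumer layout. The sketch below gives each job its own queue and worker thread and funnels results into one shared queue. callbackBatch plays the part of the callback thread by draining whatever results have accumulated. ExecutorSide and its methods are hypothetical. Kill requests, logging and the actual callback to the scheduler are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical executor model: a queue and worker thread per job, one shared result queue. */
class ExecutorSide {
    private final Map<Integer, BlockingQueue<String>> triggerQueues = new HashMap<>();
    private final BlockingQueue<String> resultQueue = new LinkedBlockingQueue<>();
    private final Function<String, String> businessLogic;

    ExecutorSide(Function<String, String> businessLogic) {
        this.businessLogic = businessLogic;
    }

    /** A scheduling request arrives: queue it for that job's worker, starting the worker if needed. */
    synchronized void onTrigger(int jobId, String param) {
        BlockingQueue<String> queue = triggerQueues.get(jobId);
        if (queue == null) {
            BlockingQueue<String> newQueue = new LinkedBlockingQueue<>();
            Thread worker = new Thread(() -> work(jobId, newQueue), "job-worker-" + jobId);
            worker.setDaemon(true);   // do not keep the JVM alive in this demo
            worker.start();
            triggerQueues.put(jobId, newQueue);
            queue = newQueue;
        }
        queue.add(param);
    }

    /** Worker loop: runs this job's triggers one at a time and publishes each result. */
    private void work(int jobId, BlockingQueue<String> queue) {
        try {
            while (true) {
                String param = queue.take();
                resultQueue.add("job " + jobId + ": " + businessLogic.apply(param));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // exit quietly when interrupted
        }
    }

    /** One pass of the callback thread: wait briefly for a result, then drain everything queued. */
    List<String> callbackBatch(long timeoutMillis) {
        List<String> batch = new ArrayList<>();
        try {
            String first = resultQueue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                resultQueue.drainTo(batch);   // a real executor would send this batch to the scheduler
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }
}
```

Because each job has a single worker, runs of the same job execute in arrival order, while different jobs proceed independently. Batching the callbacks keeps the number of requests to the scheduler low when many jobs finish close together.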

Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.
