Rocket Job. Ruby's missing batch system

Directory Monitoring

A common task in many batch processing systems is watching for new files and queuing jobs to process them. DirmonJob is a job designed for exactly this task.

By default DirmonJob runs every 5 minutes, looking for new files that match its configured entries, called DirmonEntries. These entries can be managed programmatically or via the Rocket Job Web Interface (RJMC), the web management interface for Rocket Job.

Example: creating a DirmonEntry

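entry = RocketJob::DirmonEntry.create!(
  pattern:           '/path_to_monitor/*',  # files to look for (Dir.glob syntax)
  job_class_name:    'MyFileProcessJob',    # job class to start for each new file
  archive_directory: '/exports/archive'     # files are moved here before processing
)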

A newly created DirmonEntry is disabled and must be enabled before DirmonJob will start processing it:

entry.enable!

An enabled DirmonEntry can also be disabled:

entry.disable!
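The enable/disable lifecycle can be pictured with a small plain-Ruby sketch. Entry and entries_to_scan are hypothetical stand-ins, not Rocket Job's API; the real DirmonEntry is a Mongoid document. The sketch only models the idea that DirmonJob scans enabled entries and skips the rest.

```ruby
# Hypothetical plain-Ruby stand-in for a DirmonEntry (not Rocket Job's model).
# A new entry starts out disabled, matching the behaviour described above.
Entry = Struct.new(:pattern, :job_class_name, :enabled) do
  def enable!
    self.enabled = true
  end

  def disable!
    self.enabled = false
  end
end

# Only enabled entries are considered when scanning for new files.
def entries_to_scan(entries)
  entries.select(&:enabled)
end

csv_entry = Entry.new('/path_to_monitor/*.csv', 'MyFileProcessJob', false)
xml_entry = Entry.new('/path_to_monitor/*.xml', 'MyXmlJob', false) # hypothetical job

csv_entry.enable!
p entries_to_scan([csv_entry, xml_entry]).map(&:job_class_name)
```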

The attributes of DirmonEntry:

pattern
- Wildcard path to search for files in. For details on valid path values, see: http://ruby-doc.org/core-2.2.2/Dir.html#method-c-glob
- Examples:
  - input_files/process1/*.csv*
  - input_files/process2/**/*
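Because patterns follow Ruby's Dir.glob syntax, their behaviour can be checked directly with Dir.glob. The sketch below builds a throw-away directory tree with made-up file names and shows that *.csv* matches names containing .csv in a single folder (including variants such as .csv.gz), while **/* also descends into subfolders. glob_examples is a hypothetical helper; keeping only regular files is an assumption of this sketch about how matched directories should be treated.

```ruby
require 'fileutils'
require 'tmpdir'

# Creates a temporary tree and returns what each example pattern matches,
# as paths relative to the temporary root.
def glob_examples
  Dir.mktmpdir do |root|
    %w[
      input_files/process1/a.csv
      input_files/process1/b.csv.gz
      input_files/process1/notes.txt
      input_files/process2/x.dat
      input_files/process2/2024/y.dat
    ].each do |rel|
      path = File.join(root, rel)
      FileUtils.mkdir_p(File.dirname(path))
      File.write(path, '')
    end

    {
      # '*' only matches within one directory level
      csv: Dir.glob('input_files/process1/*.csv*', base: root).sort,
      # '**/' recurses; directories also match, so keep regular files only
      recursive: Dir.glob('input_files/process2/**/*', base: root)
                   .select { |rel| File.file?(File.join(root, rel)) }
                   .sort
    }
  end
end

p glob_examples
```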
job_class_name
- Name of the job class to start
arguments
- Any user-supplied arguments for the method invocation. All keys must be UTF-8 strings. The values can be any valid BSON type:
  - Integer
  - Float
  - Time (UTC)
  - String (UTF-8)
  - Array
  - Hash
  - True
  - False
  - Mongoid::StringifiedSymbol
  - nil
  - Regular Expression
  - Note: Date is not supported; convert it to a UTC Time
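A minimal sketch of preparing arguments before passing them to a DirmonEntry. dirmon_arguments is a hypothetical helper, not part of Rocket Job: it turns keys into strings and converts Date and DateTime values into UTC Time values, since Date cannot be stored. Only top-level values are converted; nested hashes and arrays pass through unchanged.

```ruby
require 'date'

# Hypothetical helper: string keys, and Date/DateTime converted to UTC Time.
def dirmon_arguments(args)
  args.each_with_object({}) do |(key, value), out|
    out[key.to_s] =
      case value
      when DateTime then value.to_time.utc # DateTime is a Date subclass, check it first
      when Date     then Time.utc(value.year, value.month, value.day) # midnight UTC
      else value
      end
  end
end

p dirmon_arguments(report_date: Date.new(2024, 3, 31), batch_size: 500)
```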
properties
- Any job properties to set.
  - Example: override the default job priority:

{ priority: 45 }

archive_directory
- Directory to move each file into before its job is started. Moving the file before it is processed ensures it is not picked up again on the next scan. If no archive_directory is supplied, the file is moved to a folder called ‘_archive’ in the same folder as the file itself. If the path above is a relative path, the relative path structure is preserved when the file is moved to the archive directory.
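The default archiving behaviour can be sketched in plain Ruby. archive_before_processing is a hypothetical helper, not Rocket Job's implementation: it moves the file into the given directory, or into an _archive folder beside the file when none is given, and returns the new path. It does not model how relative archive_directory values are resolved.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: move the file out of the monitored folder *before*
# any processing starts, so the next scan no longer sees it.
def archive_before_processing(file_path, archive_directory = nil)
  archive_dir = archive_directory || File.join(File.dirname(file_path), '_archive')
  FileUtils.mkdir_p(archive_dir)
  archived_path = File.join(archive_dir, File.basename(file_path))
  FileUtils.mv(file_path, archived_path)
  archived_path
end
```

Because the move happens first, a later scan of the original folder no longer finds the file, which is what prevents it from being processed twice.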

Starting the directory monitor

The directory monitor job only needs to be started once per installation by running the following code:

RocketJob::Jobs::DirmonJob.create!

DirmonJob is a scheduled job that runs every 5 minutes by default. Once created, its cron_schedule can be changed at any time via the Rocket Job Web Interface (RJMC).

For example, to run DirmonJob every minute, override the cron schedule when creating it:

RocketJob::Jobs::DirmonJob.create!(cron_schedule: "*/1 * * * * UTC")
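The cron_schedule value is a standard five-field cron expression (minute, hour, day of month, month, day of week) followed by a time zone. In the minute field, */N fires every N minutes starting at minute 0, so */5 fires at minutes 0, 5, ..., 55 and */1 fires every minute. The hypothetical helper below understands only * and */N; it is an illustration, not the parser Rocket Job uses.

```ruby
# Hypothetical helper: expand a cron minute field ("*" or "*/N")
# into the minutes of the hour at which the schedule fires.
def cron_minutes(field)
  case field
  when '*' then (0..59).to_a
  when %r{\A\*/(\d+)\z} then (0..59).step(Regexp.last_match(1).to_i).to_a
  else raise ArgumentError, "unsupported minute field: #{field}"
  end
end

minute_field = '*/5 * * * * UTC'.split.first # => "*/5"
p cron_minutes(minute_field)
```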

The default priority for DirmonJob is 40. Lower numbers run first, so to increase its priority, lower the value:

RocketJob::Jobs::DirmonJob.create!(
  cron_schedule: "*/5 * * * * UTC",
  priority:      25
)
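Rocket Job processes lower priority values first, which is why changing DirmonJob from 40 to 25 raises its priority. next_job below is a hypothetical model of a worker choosing among queued jobs, not Rocket Job code: the lowest priority value wins, and ties are broken by creation order, which is an assumption of this sketch. Job names and values are made up.

```ruby
# Hypothetical queued job: lower priority number = picked up sooner.
QueuedJob = Struct.new(:name, :priority, :created_at)

# Pick the job with the lowest priority value; on a tie, the oldest one.
def next_job(queue)
  queue.min_by { |job| [job.priority, job.created_at] }
end

queue = [
  QueuedJob.new('ReportJob', 50, 1),
  QueuedJob.new('DirmonJob', 40, 2),
  QueuedJob.new('ImportJob', 45, 0)
]
p next_job(queue).name
```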

Once DirmonJob has been started, its priority and check interval can be changed at any time as follows:

RocketJob::Jobs::DirmonJob.first.update_attributes(
  cron_schedule: "*/5 * * * * UTC",
  priority:      20
)

High Availability

After each scan, DirmonJob automatically schedules a new instance of itself to run at the next scheduled time. If the scan succeeds, the current job instance then destroys itself.

This avoids a long-running directory monitor process that constantly polls folders for changes. More importantly, it avoids the single point of failure typical of earlier directory monitoring solutions: each time DirmonJob runs and scans the paths for new files, it may be running on a different worker. If a worker is removed or shut down, DirmonJob is not affected; it simply runs on another worker instance.

There can only be one DirmonJob instance queued or running at a time.
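The self-rescheduling pattern can be simulated in a few lines. SelfReschedulingMonitor is hypothetical and not Rocket Job code: worker selection is modeled as simple round-robin, whereas in practice any available worker may pick up the next run. What it shows is that each successful run queues its successor and then removes itself, so exactly one instance is ever queued, and removing a worker does not break the cycle.

```ruby
# Hypothetical simulation of a job that reschedules itself after each run.
class SelfReschedulingMonitor
  attr_reader :queue, :history

  def initialize(workers)
    @workers = workers.dup
    @queue   = [{ run_number: 1 }] # the single initial instance
    @history = []                  # which worker handled each run
  end

  # One scheduled run, picked up by a worker (round-robin in this sketch).
  def run_once
    job    = @queue.first
    worker = @workers[(job[:run_number] - 1) % @workers.size]
    @history << worker
    @queue << { run_number: job[:run_number] + 1 } # schedule the next run
    @queue.delete(job)                             # success: destroy itself
  end

  def remove_worker(name)
    @workers.delete(name)
  end
end

monitor = SelfReschedulingMonitor.new(%w[worker-1 worker-2])
3.times { monitor.run_once }
p monitor.history # => ["worker-1", "worker-2", "worker-1"]
```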

If an exception occurs while running DirmonJob, a failed job instance will remain in the job list for problem determination. The failed job cannot be restarted and should be destroyed when no longer needed.