10 Aug / 2010

Ruby on Rails CRONjobs

Here are some tips about setting up cron jobs in a Rails app.

My Rails applications use cron jobs frequently. I use them to:

  • refresh data from the Internet, for example to download book
    cover images
  • go through logs and produce summaries like ‘most viewed books
    today’ or ‘most favorited items’. There are too many user_logs
    to go through efficiently and calculate stats on the fly, which
    is why I use ‘summary tables’.

Running Rails Code from the Command Line

Here is a Ruby script that accesses the ‘Book’ model:

# myscripts/update-stats.rb
# update_stats is a method on the application's own Book model
Book.find(:all).each do |book|
  book.update_stats
end

For this to work, the script needs the Rails environment. We can
give any script the full Rails environment (access to models, etc.)
by running it through ‘script/runner’.
From your Rails application directory:

./script/runner  myscripts/update-stats.rb

‘script/runner’ reads the appropriate configuration files
and provides a Rails environment for the script.

By default the environment is ‘development’. To use another
environment, pass it with -e:

  ./script/runner  -e staging myscripts/update-stats.rb

Preventing Scripts Stepping Over Each Other

Some scripts run once a day, some once an hour, and some every
few minutes. The cron daemon kicks off each script at the
specified interval. Let's say we run ‘update-stats.rb’ every
5 minutes. What happens if one run takes longer than 5 minutes
(say, because the database is slow or there is just too much data
to process)? Another instance of the script is kicked off 5 minutes
later, before the first one has had a chance to finish. This can
have unintended consequences, such as incorrect stats.

We don’t want this to happen. We want to prevent a script from
running if another instance of the same script is already running.

  • We can space the runs sufficiently far apart. For
    example, if a script runs once a day and takes a few hours, it
    is unlikely to overlap with the next run.
  • A script can set a flag in the database saying it is
    running. At startup the script checks the flag, and if it is
    set, the script refuses to run. But this is not reliable: if the
    script dies in the middle (exception / out of memory), the flag
    stays set and no future runs are possible.
  • A script can write a marker file to the file system
    (‘script_A_IN_PROGRESS’). But this suffers the same fate: if
    the script dies unexpectedly, the file is ORPHANED, and no run
    will start until the file is manually removed.
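To make the weakness of the marker-file approach concrete, here is a minimal Ruby sketch. The helper `run_with_marker` and the marker path are hypothetical. `File::EXCL` makes the create fail if the marker already exists, and the `ensure` block removes the marker after a normal finish or an exception, but nothing removes it if the process is killed outright (for example by SIGKILL or the out-of-memory killer).

```ruby
require "tmpdir"

# Hypothetical marker-file guard, shown only to illustrate its weakness.
def run_with_marker(marker)
  # File::EXCL makes the open fail with Errno::EEXIST if the marker already exists.
  File.open(marker, File::WRONLY | File::CREAT | File::EXCL) { |f| f.puts(Process.pid) }
rescue Errno::EEXIST
  :already_running
else
  begin
    yield
    :done
  ensure
    # Runs after a normal finish or an exception, but NOT when the process
    # is SIGKILLed or OOM-killed; then the marker is orphaned.
    File.delete(marker)
  end
end

Dir.mktmpdir do |dir|
  marker = File.join(dir, "script_A_IN_PROGRESS")
  p run_with_marker(marker) { puts "working..." }  # prints "working..." then :done
  File.write(marker, "12345\n")                     # leftover marker from a crashed run
  p run_with_marker(marker) { puts "never runs" }   # prints :already_running
end
```

The second call plays the role of the next cron run after a crash: the leftover marker blocks it until someone deletes the file by hand.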

So what we need is a SIMPLE, FOOL-PROOF way to indicate a
script is running.

Enter FILE_LOCKS.

On most UNIX/LINUX systems, a process can LOCK a
file. The lock can be shared or exclusive. An exclusive lock
is held until the process releases it, closes the file, or
terminates. During that time, no other process can get a lock
on the file. The beauty is that even if the process dies (of an
exception or memory exhaustion), the lock is released. The
operating system takes care of it.
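The claim that the OS releases the lock when its holder dies is easy to check. This is a small sketch, assuming a Unix-like system where Ruby's `fork` is available; `crash_release_demo` is a hypothetical name. A child process takes an exclusive lock and is then killed with SIGKILL, so none of its cleanup code runs. The parent is refused the lock while the child is alive and gets it once the child is gone.

```ruby
require "tempfile"

# Sketch: the OS drops a flock() lock when its holder dies, even abruptly.
# Assumes a Unix-like system where Process.fork is available.
def crash_release_demo(path)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    held = File.open(path)
    held.flock(File::LOCK_EX)  # child takes the exclusive lock
    writer.write("locked")     # tell the parent it has the lock
    writer.close
    sleep                      # hold the lock until killed
  end
  writer.close
  reader.read                  # returns at EOF, i.e. after the child has locked and closed its end
  reader.close

  mine = File.open(path)
  blocked = mine.flock(File::LOCK_EX | File::LOCK_NB) == false

  Process.kill("KILL", pid)    # simulate a crash: no ensure blocks or cleanup code run
  Process.wait(pid)            # reap the child; the kernel has closed its files

  acquired = mine.flock(File::LOCK_EX | File::LOCK_NB) == 0
  mine.close
  { blocked_while_child_alive: blocked, acquired_after_child_died: acquired }
end

Tempfile.create("lock-demo") do |tmp|
  p crash_release_demo(tmp.path)
end
```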

So here is the process:

  • when a process starts up, it tries to acquire an exclusive
    lock on a file
  • if locking is successful, go ahead
  • if not, exit
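The three steps above can be sketched as a reusable Ruby helper. This is an illustration, not the author's code: `with_exclusive_lock` is a hypothetical name, and it locks a separate lock file (created if missing) instead of the script file itself, which lets several scripts share one lock if needed. Even within one process, a second `File.open` of the same file counts as a separate lock holder, which is what the usage lines rely on.

```ruby
require "tmpdir"

# Hypothetical helper: run the block only if an exclusive, non-blocking lock on
# lock_path can be taken; otherwise return :locked without running it.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |file|
    # flock returns 0 on success (truthy) and false if another holder has the lock.
    return :locked unless file.flock(File::LOCK_EX | File::LOCK_NB)

    yield  # we hold the lock; closing the file at the end of this block releases it
  end
end

Dir.mktmpdir do |dir|
  lock = File.join(dir, "update-stats.lock")
  with_exclusive_lock(lock) do
    puts "got the lock, working..."
    # A second "instance" asking for the same lock while it is still held:
    p with_exclusive_lock(lock) { puts "never printed" }  # prints :locked
  end
end
```

Because the lock belongs to the open file, it is released when the `File.open` block closes the file, or by the OS if the process dies first.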

Here is how to do it. It is essentially a ONE LINER,
SO SIMPLE 🙂

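# keep a reference to the File object: if it were garbage collected,
# the file would be closed and the lock silently released
lock_file = File.new(__FILE__)
if lock_file.flock(File::LOCK_EX | File::LOCK_NB) == false
  puts "*** can't lock file, another instance of script running?  exiting"
  exit 1
end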

# the rest of the script
...

Parameters explained

  • __FILE__ : the script tries to get an exclusive lock on its own source file.
  • File::LOCK_EX : obtain an exclusive lock.
  • File::LOCK_NB : non-blocking. If the lock cannot be acquired, flock returns false immediately instead of waiting.
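To see these flags and return values in action, here is a sketch that opens one temporary file three times (three independent opens, so `flock` treats them as separate lock holders) and mixes in `File::LOCK_SH`, the shared lock mentioned earlier. `flock_results` is a hypothetical helper; `flock` returns 0 on success and false when `LOCK_NB` is set and the lock is unavailable.

```ruby
require "tempfile"

# Open one file three times and report what flock returns for each request.
def flock_results(path)
  files = Array.new(3) { File.open(path) }  # three independent opens of the same file
  [
    files[0].flock(File::LOCK_SH | File::LOCK_NB),  # shared lock: granted
    files[1].flock(File::LOCK_SH | File::LOCK_NB),  # another shared lock: also granted
    files[2].flock(File::LOCK_EX | File::LOCK_NB),  # exclusive: refused while shared locks exist
  ]
ensure
  files&.each(&:close)
end

Tempfile.create("flock-flags") { |tmp| p flock_results(tmp.path) }  # prints [0, 0, false]
```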

 

Keeping Tabs on CRONTABs

The beauty of CRON jobs is that once
you set them up correctly, you can forget about them. They
dutifully run at the specified times and do their jobs. It is
still a good idea to produce some logging, so once in a while
you can check that each job is alive and well.

Here is some simple logging. It prints timestamps when the
job starts and finishes, along with how long it took. It also
prints a line of ‘====’ so we can separate the output of each
run.

## run this with script/runner

t1 = Time.now
puts "================"
puts t1.strftime("%Y%m%d-%H%M%S") + " : " + __FILE__ + " starting..."

# hold a reference so the File (and its lock) lives as long as the script
lock_file = File.new(__FILE__)
if lock_file.flock(File::LOCK_EX | File::LOCK_NB) == false
  puts "*** can't lock file, another instance of script running?  exiting"
  exit 1
end

# do the processing...
# ...

t2 = Time.now
puts t2.strftime("%Y%m%d-%H%M%S") + " : " + __FILE__ + " finished  #{t2 - t1} secs"
puts "================"


This is the template I use for my cron jobs.
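The same template can be packaged as a helper that wraps any block. This is a sketch only: `log_run` and its `out:` parameter are hypothetical, and it measures elapsed time with `Process::CLOCK_MONOTONIC` rather than subtracting two `Time.now` values, so a clock adjustment during the run cannot skew the duration.

```ruby
# Hypothetical helper packaging the logging template around any block.
def log_run(name, out: $stdout)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  out.puts "================"
  out.puts "#{Time.now.strftime('%Y%m%d-%H%M%S')} : #{name} starting..."

  result = yield

  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  out.puts "#{Time.now.strftime('%Y%m%d-%H%M%S')} : #{name} finished  #{elapsed.round(2)} secs"
  out.puts "================"
  result
end

log_run(__FILE__) do
  # do the processing...
end
```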

 

Wrapping them in a SHELL script

I usually wrap the Ruby scripts in a shell script, because it
saves me some typing 🙂

Here is a sample script:

#!/bin/bash

## lives in RAILS_APP/myscripts

if [ -z "$1" ]
then
    echo "usage : $0 <environment>"
    echo "eg : $0 staging"
    exit 1
fi
environment=$1

## resolve RAILS_APP/myscripts and RAILS_APP/script, whatever the current directory
mydir=$(readlink -f "$(dirname "$0")")
scriptdir=$(readlink -f "$mydir/../script")

"$scriptdir/runner" -e "$environment" "$mydir/update-stats.rb"

See how it figures out the script directory? This way you can
run the wrapper from any directory and it will work correctly;
it doesn’t depend on you running it from the Rails application
directory. Make sure you have a version of ‘readlink’ that
supports the -f option. (Linux distros come with the right one;
Mac OS X users can get it from MacPorts.)
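Where `readlink -f` is not available, the same resolution can be done in Ruby with `File.realpath`, which also resolves symlinks and `..`. This is a hedged sketch: `runner_path` is a hypothetical helper, and it assumes the RAILS_APP/myscripts and RAILS_APP/script layout from the comments in the wrapper.

```ruby
# Hypothetical Ruby equivalent of the readlink -f lines in the wrapper.
# Assumes the wrapper lives in RAILS_APP/myscripts and runner in RAILS_APP/script.
def runner_path(wrapper)
  mydir     = File.realpath(File.dirname(wrapper))             # readlink -f "$(dirname "$0")"
  scriptdir = File.realpath(File.join(mydir, "..", "script"))  # readlink -f "$mydir/../script"
  File.join(scriptdir, "runner")
end
```

Called as `runner_path(__FILE__)` from a Ruby wrapper in RAILS_APP/myscripts, it gives the same answer from any working directory. Unlike `readlink -f`, `File.realpath` raises `Errno::ENOENT` when a path component does not exist.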

 

Setting up the CRON

Before setting up the CRON job, make sure the script runs.

  • Make sure the shell script is executable (chmod 755 script.sh).
  • If you are using a shell script, run it by hand to make sure
    it works.
  • Also run it from a directory other than the Rails
    application directory.
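The checklist above can be partly automated. A minimal sketch, assuming a hypothetical `smoke_test` helper: it refuses a script without the execute bit, then runs it from another directory (the filesystem root by default) via the `chdir:` option of `system`, and returns true only if the script exits with status 0.

```ruby
# Hypothetical pre-flight check for a cron wrapper script.
def smoke_test(script, *args, from: "/")
  raise "#{script} is not executable (try chmod 755)" unless File.executable?(script)

  # [script, script] runs the file directly (no shell), with argv[0] = script.
  # chdir: runs it from a directory other than the Rails app directory.
  system([script, script], *args, chdir: from)  # true only if it exits with status 0
end

# e.g. smoke_test("/var/www/myapp_prod/current/myscripts/update-stats.sh", "staging")
```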

 

$ crontab -e

opens your editor so you can enter cron jobs.

Here are a couple of entries:

  # run every 5 minutes
  */5 * * * *   /var/www/myapp_prod/current/myscripts/update-stats.sh production >> /var/www/myapp_prod/update-stats.log 2>&1

  # runs at 1am
  0 1 * * *   /var/www/myapp_prod/current/db-backup.sh

Note the shorthand syntax (*/5); I love it. It means every 5
minutes, and beats writing

  0,5,10,15,20,25,30,35,40,45,50,55 * * * * /var/www/myapp_prod/current/myscripts/update-stats.sh production >> /var/www/myapp_prod/update-stats.log 2>&1
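A quick way to confirm what */5 expands to is a tiny Ruby sketch. `expand_step` is a hypothetical helper that handles only the `*/N` form of the minute field; it is not a cron parser.

```ruby
# Expand a cron minute field of the form "*/N" into the minutes it matches.
# Sketch only: other cron syntax (lists, ranges, "a-b/N") is not handled.
def expand_step(field)
  raise ArgumentError, "only */N is supported" unless field.start_with?("*/")

  step = Integer(field.delete_prefix("*/"))
  (0..59).step(step).to_a
end

p expand_step("*/5")  # prints [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
```

Comparing its output with a hand-written list is an easy way to catch a typo in the long form.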

Also note that I redirect the output to a log file, so I can check
it later if required, and that I send both stderr and stdout to
the same file (2>&1).
That's it.

Sujee Maniyam
Sujee is a founder, principal at Elephant Scale where he provides consulting and training on Big Data technologies


Copyright 2015 Sujee Maniyam (