10 Aug / 2010
Ruby on Rails CRONjobs
Here are some tips on setting up cron jobs in a Rails app.
My Rails applications use cron jobs frequently. I use them to
- refresh data from the Internet, for example downloading book cover images
- go through logs and produce summaries like ‘most viewed books today’, ‘most favorited items’, etc. (There are too many user_logs rows to go through efficiently and calculate stats dynamically. That is why I use ‘summary tables’.)
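As a rough illustration of the summary idea, here is a minimal plain-Ruby sketch. The log rows, the field names and the `top_viewed` helper are all made up for this example; a real job would read from the user_logs table and write the result into a summary table.

```ruby
# Hypothetical log rows; in the real app these would come from the user_logs table.
LOGS = [
  { :book_id => 1, :action => "view" },
  { :book_id => 2, :action => "view" },
  { :book_id => 1, :action => "view" },
  { :book_id => 3, :action => "favorite" },
  { :book_id => 1, :action => "view" },
  { :book_id => 2, :action => "view" }
]

# Count 'view' entries per book and return the top `limit` [book_id, count] pairs.
def top_viewed(logs, limit)
  counts = Hash.new(0)
  logs.each do |row|
    counts[row[:book_id]] += 1 if row[:action] == "view"
  end
  # Sort by count descending, then by book_id so ties come out in a stable order.
  counts.sort_by { |book_id, count| [-count, book_id] }.first(limit)
end

top_viewed(LOGS, 2).each do |book_id, count|
  puts "book #{book_id}: #{count} views"
end
```

The cron job does this counting once per run, so the pages that show ‘most viewed books’ only have to read a handful of precomputed rows.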
Running Rails code from the command line
Here is a Ruby script that accesses the ‘Book’ model:
Book.find(:all).each do |book|
  book.update_stats
end
For this to work, the script needs the ‘Rails environment’. We can give any script the full Rails environment (access to models, etc.) by running it through the Ruby script ‘script/runner’.
From your Rails directory:
./script/runner myscripts/update-stats.rb
‘script/runner’ reads the appropriate configuration files and provides a Rails environment for the script.
By default the environment is ‘development’. To use another environment:
./script/runner -e staging myscripts/update-stats.rb
Preventing Scripts from Stepping Over Each Other
Some scripts run once a day, some run once an hour, and some run every few minutes. The cron process kicks off the script at the specified intervals. Let’s say we run ‘update-stats.rb’ every 5 minutes. What happens if one of the runs takes longer than 5 minutes (because, say, the database is slow or there is just too much data to consume)? Another instance of the script is kicked off 5 minutes later, before the first one has had a chance to finish. This can have unintended consequences, like incorrect stats.
We don’t want this to happen. We want to prevent a script from running if another instance of the same script is still running.
- We can space the runs sufficiently far apart. For example, if a script runs once a day and takes a few hours, it is highly unlikely to run into the next run.
- A script can set a flag in the database to say that it is running. At the start, the script checks the flag; if it is set, the script refuses to run. But this is not reliable: if the script dies in the middle (exception / out of memory), the flag stays set and no future run is possible.
- A script can write a file to the file system (‘script_A_IN_PROGRESS’). But this suffers the same fate: if the script dies unexpectedly, the file is ORPHANED, and no script will run until the file is manually removed.
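To make the weakness of the marker-file approach concrete, here is a small sketch. The temporary directory and the `run_with_marker` helper are invented for illustration. An `ensure` block cleans up after an ordinary exception, but nothing in the script gets to run if the process is killed outright (kill -9, out of memory), and then the marker stays behind.

```ruby
require "tmpdir"

# Hypothetical marker file in a throwaway directory, only for this demonstration.
MARKER = File.join(Dir.mktmpdir, "script_A_IN_PROGRESS")

# Run the block unless the marker exists. Returns :skipped or :ran.
def run_with_marker(marker)
  return :skipped if File.exist?(marker)
  File.open(marker, "w") { |f| f.puts Process.pid }
  begin
    yield
  ensure
    # Runs after normal completion and after exceptions,
    # but NOT if the process is killed with SIGKILL.
    File.delete(marker) if File.exist?(marker)
  end
  :ran
end

first = run_with_marker(MARKER) { :work }

# Simulate an earlier run that was killed before it could clean up:
File.open(MARKER, "w") { |f| f.puts 12345 }
second = run_with_marker(MARKER) { :work }

puts first    # ran
puts second   # skipped, and it stays that way until someone deletes the file
```

Note that there is also a small window between the existence check and the file creation in which two instances could both decide to run.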
So what we need is a SIMPLE, FOOL-PROOF way to indicate a
script is running.
Enter FILE_LOCKS.
On most UNIX/LINUX systems, a process can LOCK a file. The lock can be a shared lock or an exclusive lock. An exclusive lock stays in place until the process releases it, closes the file, or terminates. During that time, no other process can get an exclusive lock on the file. The beauty is that even if the process dies (of an exception, memory exhaustion), the lock is released. The operating system takes care of it.
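The claim that the lock disappears together with the process can be checked with a small experiment. This is only a sketch: it assumes a system with `fork` and a native flock (Linux, BSD, Mac OS X), and the temporary lock file is made up for the demo. A child process takes the lock and is then killed with SIGKILL, so it gets no chance to clean up.

```ruby
require "tempfile"

# Hypothetical lock file, created only for this demonstration.
tmp = Tempfile.new("demo-lock")
lock_path = tmp.path

reader, writer = IO.pipe

# Child process: take the exclusive lock, report it, then hang around.
pid = fork do
  reader.close
  held = File.open(lock_path)
  held.flock(File::LOCK_EX)
  writer.puts "locked"
  writer.close
  sleep 60
end

writer.close
reader.gets                      # wait until the child holds the lock

probe = File.open(lock_path)
while_child_alive = probe.flock(File::LOCK_EX | File::LOCK_NB)

Process.kill("KILL", pid)        # the child cannot run any cleanup code
Process.wait(pid)
after_child_died = probe.flock(File::LOCK_EX | File::LOCK_NB)

puts while_child_alive.inspect   # false: the child still held the lock
puts after_child_died.inspect    # 0: the lock went away with the process
```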
So here is the process:
- when a process starts up, it tries to acquire an exclusive lock on a file
- if locking is successful, go ahead
- if not, exit
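Here is how to do this. It takes just a few lines, and it is SO SIMPLE 🙂
# keep the File object in a variable: if it were garbage collected,
# the file would be closed and the lock released
lock_file = File.new(__FILE__)
if lock_file.flock(File::LOCK_EX | File::LOCK_NB) == false
  puts "*** can't lock file, another instance of script running? exiting"
  exit 1
end
# the rest of the script ...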
Parameters explained
- __FILE__ : the path of the script itself, so the process tries to get an exclusive lock on its own source file.
- File::LOCK_EX : obtain an exclusive lock.
- File::LOCK_NB : non-blocking. If the lock cannot be acquired, flock returns false right away rather than blocking.
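To see what flock returns with these flags, here is a minimal sketch that opens the same file twice within one process. It assumes a native flock (Linux, BSD, Mac OS X), where each separate open of a file competes for the lock; the temporary file is just a stand-in for the script file.

```ruby
require "tempfile"

tmp = Tempfile.new("demo-lock")   # hypothetical lock file for the demo

first  = File.open(tmp.path)
second = File.open(tmp.path)

got_first  = first.flock(File::LOCK_EX | File::LOCK_NB)    # the lock is free
got_second = second.flock(File::LOCK_EX | File::LOCK_NB)   # already taken

first.flock(File::LOCK_UN)                                  # release explicitly
got_retry  = second.flock(File::LOCK_EX | File::LOCK_NB)   # free again

puts got_first.inspect    # 0
puts got_second.inspect   # false
puts got_retry.inspect    # 0
```

Success is reported as 0, which is truthy in Ruby, so comparing against false (or testing with `unless`) is the right check.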
Keeping Tabs on CRONTABs
The beauty of CRON jobs is that once you set them up correctly, you can forget about them. They dutifully run at the specified times and do their jobs. It is still a good idea to produce some logging, so that once in a while you can check that the job is alive and well.
Here is some simple logging. It prints a timestamp when the job starts and when it finishes, along with how much time it took. We print some ‘====’ so we can separate the output of each run.
## run this with script/runner
t1 = Time.now
puts "================"
puts t1.strftime("%Y%m%d-%H%M%S") + " : " + __FILE__ + " starting..."

# keep the File object in a variable so the lock is not lost to garbage collection
lock_file = File.new(__FILE__)
if lock_file.flock(File::LOCK_EX | File::LOCK_NB) == false
  puts "*** can't lock file, another instance of script running? exiting"
  exit 1
end

# do the processing...
# ...

t2 = Time.now
puts t2.strftime("%Y%m%d-%H%M%S") + " : " + __FILE__ + " finished #{t2 - t1} secs"
puts "================"
This is the template I use for my cronjobs.
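If several scripts share this logging, one option is to pull it into a small helper. The following is only a sketch: `with_job_log` is a name invented here, not something Rails provides, and it writes to any IO-like object so it can be tried without cron.

```ruby
require "stringio"

# Hypothetical helper: wraps a job with the start/finish lines from the template.
def with_job_log(name, out = $stdout)
  started = Time.now
  out.puts "================"
  out.puts "#{started.strftime('%Y%m%d-%H%M%S')} : #{name} starting..."
  yield
ensure
  # The finish line is printed even if the job raised an exception.
  finished = Time.now
  out.puts "#{finished.strftime('%Y%m%d-%H%M%S')} : #{name} finished #{finished - started} secs"
  out.puts "================"
end

# Usage: capture the output in a string instead of printing it directly.
log = StringIO.new
with_job_log("update-stats.rb", log) { sleep 0.01 }
puts log.string
```

Unlike the template, the helper prints the finish line even when the job fails, so a crashed run does not leave an unmatched ‘starting...’ line in the log.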
Wrapping them in a SHELL script
I usually like to wrap the Ruby scripts in a shell script, because it saves me some typing 🙂
Here is a sample script:
#!/bin/bash
## lives in RAILS_APP/myscripts
if [ -z "$1" ]
then
    echo "usage : $0 <environment>"
    echo "eg : $0 staging"
    exit 1
fi
environment=$1
mydir=$(readlink -f "$(dirname "$0")")
scriptdir=$(readlink -f "$mydir/../script")   ## get RAILS_APP/script
"$scriptdir/runner" -e "$environment" "$mydir/update-stats.rb"
See how it figures out the script directory? This way, you can run this script from any directory and it will work correctly. It doesn’t depend on you running the script from the Rails application directory. Make sure your version of ‘readlink’ supports the -f option we use. (Linux distros come with the right one; Mac OS X users can get it from MacPorts.)
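If depending on ‘readlink -f’ is a problem, the same path arithmetic can be done in Ruby itself. This is a sketch, not part of the wrapper above: `runner_path_for` is a made-up helper, and `File.expand_path` only normalizes the path, it does not resolve symlinks the way ‘readlink -f’ does.

```ruby
# Hypothetical helper: given the path of a script that lives in
# RAILS_APP/myscripts, return the path of RAILS_APP/script/runner.
def runner_path_for(script_file)
  mydir = File.dirname(File.expand_path(script_file))
  File.expand_path("../script/runner", mydir)
end

puts runner_path_for("/var/www/myapp_prod/current/myscripts/update-stats.rb")
# prints /var/www/myapp_prod/current/script/runner
```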
Setting up the CRON
Before setting up the CRON job, make sure the script runs.
- Make sure the shell script is executable (chmod 755 script.sh).
- If you are using a shell script, run it by hand to make sure it works.
- Also run it from a directory other than the Rails application dir.
$ crontab -e
will open your editor so you can enter cron jobs.
Here are a couple of entries:
# run every 5 minutes
*/5 * * * * /var/www/myapp_prod/current/myscripts/update-stats.sh production >> /var/www/myapp_prod/update-stats.log 2>&1
# runs at 1am
0 1 * * * /var/www/myapp_prod/current/db-backup.sh
Note: I love the shorthand syntax (*/5). It means every 5 minutes. Beats writing
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /var/www/myapp_prod/current/myscripts/update-stats.sh production >> /var/www/myapp_prod/update-stats.log 2>&1
Also note that I redirect the output to a log file, so I can check it later if required. I am sending both stderr and stdout to the same file (2>&1).
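For the curious, the */5 shorthand can be expanded with a few lines of Ruby. This is only an illustration of what the step syntax means for the minute field; `expand_minute_step` is a made-up helper and handles nothing but the ‘*/N’ form.

```ruby
# Expand a cron minute step such as "*/5" into the explicit minute list.
# Only the "*/N" form is handled; real cron syntax has more cases.
def expand_minute_step(field)
  match = field.match(/\A\*\/(\d+)\z/)
  raise ArgumentError, "unsupported field: #{field}" unless match
  step = match[1].to_i
  (0...60).step(step).to_a
end

puts expand_minute_step("*/5").join(",")
# prints 0,5,10,15,20,25,30,35,40,45,50,55
```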
That’s it.

1 Comment:
By Luis Castillo 01 May 2012
Thansk, this tutorial help me a lot!!!