Scheduling scripts with CRON

Setting up cron to run something at a specific interval on some random web server, in a controlled way, is a very typical task every server administrator will face during their career. Most of the time you say “Ok, this will be a breeze”, then forget the same 3 things over and over again and mess it up a couple of times before actually getting it done properly. Unless you do this every day, in which case this post will be useless; but if you are like me and only do it once or twice a month, it will be quite handy. Let’s get to it:

First, let’s create a bash script for cron to run. This will be our cronjob:

#!/bin/bash

# Reload the Apache configuration, logging a timestamp first
echo "Reloading apache at $(date)"
service apache2 reload

exit 0

Let’s save it as myscript.cron and give it execute permissions with chmod +x myscript.cron
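
Before handing it over to cron, it’s worth running the script once by hand (as root, since reloading Apache requires elevated privileges) to confirm it behaves:

chmod +x myscript.cron
sudo ./myscript.cron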

Now let’s add it to the crontab file for the root user. Obviously, log in as the root user first, and then run crontab -e

Let’s say we want to run it daily at 02:00, so the crontab entry should read:

0 2 * * * /path/to/my/script/myscript.cron
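
For reference, the five leading fields are minute, hour, day of month, month and day of week, in that order:

# m h dom mon dow  command
# 0 2  *   *   *   /path/to/my/script/myscript.cron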

Save the crontab and you are ready to go. Remember that if the script lives on a filesystem mounted with noexec, you will need to launch a shell process like bash or sh and pass the script file as an argument for it to be executed, in which case the execute bit of the file is completely useless.
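
On such a mount the crontab entry would look like this instead:

0 2 * * * /bin/bash /path/to/my/script/myscript.cron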

We can check whether the job is being run by the CRON daemon by looking at /var/log/syslog
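
To cut through the noise, grep for the daemon’s entries (on systemd-based distributions, journalctl can be used instead):

grep CRON /var/log/syslog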

But what about the output from the echo call in the script? And what if you need the entire output of the service binary? In that case, the easiest way is to log to a separate file by redirecting the output of the script with standard shell redirection, from inside the crontab entry itself:

0 2 * * * /path/to/my/script/myscript.cron >> /path/to/my/script/myscript.log 2>&1

Now you are ready to go: your script’s output will be automatically appended to the given log file.

And what if we need to execute a binary at a certain interval instead of a fixed time and date? Let’s say every 5 minutes. Well, in that case we can tweak the script a little, and while we are at it, let’s make it more customizable by using variables for the target binary and its argument string:

#!/bin/bash

# Binary to execute and its arguments
mybin="$(which php)"
myargs="-f cron.php"

echo "BEGIN $0 at $(date)"

# $myargs is deliberately unquoted so it splits into separate arguments
$mybin $myargs

echo "END $0 at $(date)"
exit 0

As you can see, we can simply duplicate the script and customize the mybin and myargs variables for any other binary we want to schedule.
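
For instance, a duplicated copy that schedules a hypothetical Python maintenance script (the path is purely illustrative) would only need those two lines changed:

mybin="$(which python3)"
myargs="/opt/myapp/maintenance.py"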

And in the crontab file we can do:

*/5 * * * * /path/to/my/script/myscript.cron >> /path/to/my/script/myscript.log 2>&1

The little change from 0 2 to */5 * will suffice: the script will now be executed at a “fixed” 5 minute interval. And yes, I quote “fixed” because it is not guaranteed to be executed at exactly 5 minute intervals; its actual execution time depends entirely on the CRON daemon and its inner workings, but generally speaking you can get away with this and call it a day.
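
For reference, a few other common patterns for the schedule fields:

# */5 * * * *    every 5 minutes
# 0 */2 * * *    at minute 0 of every second hour
# 30 1 * * 0     at 01:30 every Sunday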

Since the script now executes the php binary, we should run it as the www-data user. For this you will need to move the crontab entry from the root user’s crontab to the one for www-data:

crontab -u www-data -e
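
You can verify the entry landed in the right place by listing that user’s crontab:

crontab -u www-data -l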

Don’t forget that the log directory must also exist and be owned, or at least writable, by the www-data user, or the script will fail when run by the CRON daemon. Let’s create the directory and change its owner accordingly:

# Logs directory
mkdir /path/to/my/script/logs
chown -R www-data:www-data /path/to/my/script/logs

So far we have a script that logs to a specified file owned by the user that runs it, and is executed by the CRON daemon at a specified interval or a hardcoded date and time. As I said, this should suffice for most scenarios.


But wait! What if the mybin binary takes too long to finish? Then script executions can effectively overlap, since the CRON daemon doesn’t check whether the task is still running before spawning another instance at the next trigger. This could easily be avoided by using flock, but since it relies on an external binary and the logic is pretty easy to implement, we are going to do this all by hand from within the script itself.
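
For the record, if flock is available, the whole anti-overlap dance can live in the crontab entry itself; the -n flag makes it give up immediately instead of queueing behind the running instance:

*/5 * * * * flock -n /path/to/my/lockfile/myscript.lock /path/to/my/script/myscript.cron >> /path/to/my/script/myscript.log 2>&1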

#!/bin/bash

mybin="$(which php)"
myargs="-f cron.php"
lockfile="/path/to/my/lockfile/myscript.lock"

echo "BEGIN $0 at $(date)"

if [ -f "$lockfile" ]; then
    # The first field of the lockfile is the PID of the instance that created it
    pid=$(awk '{print $1}' "$lockfile")
    if ps -p "$pid" > /dev/null 2>&1; then
        echo "ERROR: The last background task is still running. Skipping trigger to avoid overlapping."
        echo "END $0 at $(date) due to a previously triggered task still running"
        exit 1
    fi

    echo "WARNING: The last background task did not remove the lockfile at exit. Doing it now."
    rm "$lockfile"
fi

# Record our PID, a timestamp and the script name in the lockfile
echo -e "$$\t$(date +'%D\t%H:%M:%S')\t$0" > "$lockfile"
result="$?"

if [ "$result" -ne 0 ]; then
    echo "ERROR: Cannot create lockfile $lockfile. $result"
    echo "END $0 at $(date) due to critical error"
    exit 2
fi

$mybin $myargs

rm "$lockfile"

echo "END $0 at $(date) normally"
exit 0

This script checks whether a lockfile already exists before executing the actual payload, and skips the trigger if one is found. It also has some pretty simple logic to check whether a present lockfile is orphaned, and if so, deletes it and continues normally. This covers unexpected script exits due to errors, termination or system power loss.
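
A quick way to exercise the orphan-handling path is to plant a lockfile with a PID that (almost certainly) doesn’t exist and run the script by hand; the 99999 below is just a stand-in for a dead PID:

echo "99999" > /path/to/my/lockfile/myscript.lock
./myscript.cron
# Expect the WARNING about the stale lockfile, followed by a normal run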

Since we are now using lockfiles, we need to ensure /path/to/my/lockfile/ exists and is also owned or writable by the www-data user. Let’s do this as well:

# Locks directory
mkdir /path/to/my/script/locks
chown -R www-data:www-data /path/to/my/script/locks

If you have several cronjobs writing to this directory or the logs one, running under different user accounts, you will need to tweak the permissions accordingly, or even make the directories globally writable, but beware that this is strongly discouraged.
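
A safer middle ground is a shared group plus the setgid bit, so files created inside the directories inherit the group; the cronjobs group here is purely hypothetical:

groupadd cronjobs
usermod -aG cronjobs www-data
chgrp cronjobs /path/to/my/script/logs /path/to/my/lockfile
chmod 2775 /path/to/my/script/logs /path/to/my/lockfile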


One last thing… What if the binary at $mybin is stuck? If a background task is expected to run for at most 5 minutes and ends up running for more than 2 hours, then something must be wrong. Or maybe not; more on that later. For now, let’s tweak the script one more time so it can kill an already running instance if it has been running for longer than expected:

#!/bin/bash

mybin="$(which php)"
myargs="-f cron.php"
lockfile="/path/to/my/lockfile/myscript.lock"
maxetime=7200

echo "BEGIN $0 at $(date)"

if [ -f "$lockfile" ]; then
    pid=$(awk '{print $1}' "$lockfile")
    if ps -p "$pid" > /dev/null 2>&1; then
        # etimes is the elapsed time of the process, in seconds
        etime=$(ps -p "$pid" -o etimes= | awk '{print $1}')
        if [ "$etime" -lt "$maxetime" ]; then
            echo "ERROR: The last background task is still running. Skipping trigger to avoid overlapping."
            echo "END $0 at $(date) due to a previously triggered task still running"
            exit 1
        fi

        echo "WARNING: The last background task timed out at $etime seconds. Killing it before continuing..."
        kill "$pid"
    else
        echo "WARNING: The last background task did not remove the lockfile at exit. Doing it now."
    fi

    rm "$lockfile"
fi

echo -e "$$\t$(date +'%D\t%H:%M:%S')\t$0" > "$lockfile"
result="$?"

if [ "$result" -ne 0 ]; then
    echo "ERROR: Cannot create lockfile $lockfile. $result"
    echo "END $0 at $(date) due to critical error"
    exit 2
fi

$mybin $myargs

rm "$lockfile"

echo "END $0 at $(date) normally"
exit 0

Here the maxetime variable is introduced; it indicates the maximum number of seconds that a previous instance of the binary is allowed to run.
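
Keep in mind that kill sends SIGTERM by default, which a truly wedged process may simply ignore. Where a guaranteed kill matters, the plain kill call can be replaced with a small escalation; a minimal sketch:

kill "$pid"           # Ask nicely first (SIGTERM)
sleep 5               # Give it a few seconds to clean up
if ps -p "$pid" > /dev/null 2>&1; then
    kill -9 "$pid"    # Still alive: force it (SIGKILL)
fi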

This last version of our script looks a lot better than the four-liner we started with. But as I said, it should only be used when your binary has a limited time frame for doing its work. Ideally you should study your case and choose whichever version best suits the binary you are trying to schedule. If there is a chance for its runtime to grow beyond the scheduled interval, then you definitely should implement an anti-overlapping technique like the one described here. And you should implement a killing mechanism for stuck processes ONLY in those scenarios where you can afford killing an old instance and starting again from scratch, and where you can deal with potentially damaging any file that wasn’t synced to disk.
