Scheduling Tasks with Cron Jobs
Cron Jobs are used for scheduling tasks to run on the server. They’re most commonly used for automating system maintenance or administration. However, they are also relevant to web application development. There are many situations when a web application may need certain tasks to run periodically. Today we are going to explore the fundamentals of Cron Jobs.
Definitions
First let’s familiarize ourselves with the terms related to this subject.
“Cron” is a time-based job scheduler in Unix-like operating systems (Linux, FreeBSD, Mac OS, etc.). The jobs or tasks it runs are referred to as “Cron Jobs”.
There is a cron “daemon” that runs on these systems. A daemon is a program that runs in the background all the time, usually initiated by the system. This cron daemon is responsible for launching these cron jobs on schedule.
The schedule resides in a configuration file named “crontab”. That’s where all the tasks and their timers are listed.
Why Use Cron Jobs?
Server admins have been using cron jobs for a long time. But since the target audience of this article is web developers, let’s look at a few use cases of cron jobs that are relevant in this area:
- If you have a membership site, where accounts have expiration dates, you can schedule cron jobs to regularly deactivate or delete accounts that are past their expiration dates.
- You can send out daily newsletter e-mails.
- If you have summary tables (or materialized views) in your database, they can be regularly updated with a cron job. For example you may store every web page hit in a table, but another summary table may contain daily traffic summaries.
- You can expire and erase cached data files at a regular interval.
- You can auto-check your website content for broken links and have a report e-mailed to yourself regularly.
- You can schedule long-running tasks, such as encoding videos or sending out mass e-mails, to run from a command line script rather than from a web script.
- You can even perform something as simple as fetching your most recent Tweets, to be cached in a text file.
Syntax
Here is a simple cron job:
10 * * * * /usr/bin/php /www/virtual/username/cron.php > /dev/null 2>&1
There are two main parts:
- The first part is “10 * * * *”. This is where we schedule the timer.
- The rest of the line is the command as it would run from the command line.
The command itself in this example has three parts:
- “/usr/bin/php”. PHP scripts usually are not executable by themselves, so we need to run them through the PHP parser.
- “/www/virtual/username/cron.php”. This is just the path to the script.
- “> /dev/null 2>&1”. This part handles the output of the script. More on this later.
Timing Syntax
This is the first part of the cron job string, as mentioned above. It determines how often and when the cron job is going to run.
It consists of five parts:
- minute
- hour
- day of month
- month
- day of week
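Each position takes a number in a fixed range: minute 0-59, hour 0-23, day of month 1-31, month 1-12 and day of week 0-6, where 0 is Sunday.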
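To make the five positions concrete, here is a minimal shell sketch that splits a crontab line into its timing fields and its command. The function name split_cron_line is made up for this illustration and is not part of cron itself.

```shell
# split_cron_line LINE  (hypothetical helper, for illustration only)
# Prints the five timing fields and the command, one per line.
split_cron_line() {
    # read assigns one whitespace-separated word per variable; the last
    # variable receives the rest of the line, so the command keeps its spaces.
    # No filename expansion happens here, so each "*" stays a literal asterisk.
    read -r minute hour day_of_month month day_of_week command <<EOF
$1
EOF
    printf 'minute=%s\n' "$minute"
    printf 'hour=%s\n' "$hour"
    printf 'day_of_month=%s\n' "$day_of_month"
    printf 'month=%s\n' "$month"
    printf 'day_of_week=%s\n' "$day_of_week"
    printf 'command=%s\n' "$command"
}

split_cron_line '10 * * * * /usr/bin/php /www/virtual/username/cron.php > /dev/null 2>&1'
```

Everything after the fifth field, including the output redirection, ends up in the command part.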
Asterisk
Quite often, you will see an asterisk (*) instead of a number. This represents all possible numbers for that position. For example, an asterisk in the minute position would make the job run every minute.
We need to look at a few examples to fully understand this syntax.
Examples:
This cron job will run every minute, all the time:
* * * * * [command]
This cron job will run at minute zero, every hour (i.e. an hourly cron job):
0 * * * * [command]
This is also an hourly cron job, but it runs at minute 15 instead (i.e. 00:15, 01:15, 02:15 etc.):
15 * * * * [command]
This will run once a day, at 2:30am:
30 2 * * * [command]
This will run once a month, on the second day of the month at midnight (i.e. January 2nd 12:00am, February 2nd 12:00am etc.):
0 0 2 * * [command]
This will run on Mondays, every hour (i.e. 24 times in one day, but only on Mondays):
0 * * * 1 [command]
You can use multiple numbers separated by commas. This will run three times every hour, at minutes 0, 10 and 20:
0,10,20 * * * * [command]
A slash specifies a step. This will run 12 times per hour, i.e. every 5 minutes:
*/5 * * * * [command]
A dash specifies a range. This will run once every hour from 5:00am to 10:00am inclusive (i.e. 5:00am, 6:00am and so on up to 10:00am):
0 5-10 * * * [command]
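The comma, slash and dash forms can be read as a small matching rule applied to one field at a time. The shell function below is only an illustrative sketch (field_matches is a made-up name, and this is not how any particular cron daemon is implemented). It assumes plain decimal numbers without leading zeros, and it treats "*/n" as "divisible by n", which fits fields that start at 0, such as minute and hour.

```shell
# field_matches SPEC VALUE  (hypothetical helper, for illustration only)
# Succeeds (exit status 0) when VALUE is selected by the cron field SPEC.
# Handles "*", "*/step", "low-high", single numbers and comma lists of these.
field_matches() {
    spec=$1
    value=$2
    while [ -n "$spec" ]; do
        part=${spec%%,*}                # first comma-separated item
        case $spec in
            *,*) spec=${spec#*,} ;;     # drop that item and its comma
            *)   spec= ;;               # that was the last item
        esac
        case $part in
            '*')
                return 0 ;;
            '*/'*)
                step=${part#??}         # drop the leading "*/"
                [ $((value % step)) -eq 0 ] && return 0 ;;
            *-*)
                low=${part%-*}
                high=${part#*-}
                [ "$value" -ge "$low" ] && [ "$value" -le "$high" ] && return 0 ;;
            *)
                [ "$value" -eq "$part" ] && return 0 ;;
        esac
    done
    return 1
}

# Example: would a job with "*/5" in the minute field run at minute 35?
if field_matches '*/5' 35; then
    echo "runs at minute 35"
fi
```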
Also there is a special keyword that will let you run a cron job every time the server is rebooted:
@reboot [command]
Setting Up and Managing Cron Jobs
There are a few different ways to create and manage your cron jobs.
Editing the Crontab
Running this command will open the crontab in a text editor so you can edit its contents. On many systems that editor is vi by default, and it can usually be changed through the EDITOR or VISUAL environment variable:
crontab -e
If you end up in vi, it helps to be familiar with its basic commands, as it works quite differently from most other text editors you might have used.
If you would just like to see the existing crontab without editing it, you can run this command:
crontab -l
To delete the contents of the crontab (typically without any confirmation prompt):
crontab -r
Loading a File
You can write all of your cron jobs into a file and then push it into the crontab:
crontab cron.txt
Be careful, because this will overwrite all existing cron jobs with this file's contents, without warning.
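Because loading a file replaces everything, one cautious pattern is to keep a backup and to build the new crontab from the existing one. The sketch below shows the idea. The function name merge_job and the backup file name are made up for this example, and it assumes your crontab command accepts "-" to read from standard input, which common implementations do (check "man crontab" on your system).

```shell
# merge_job NEW_LINE  (hypothetical helper, for illustration only)
# Reads an existing crontab on standard input and prints it, followed by
# NEW_LINE unless an identical line is already present.
merge_job() {
    new_line=$1
    found=0
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
        [ "$line" = "$new_line" ] && found=1
    done
    [ "$found" -eq 1 ] || printf '%s\n' "$new_line"
}

# Possible usage (commented out so the sketch does not touch a real crontab):
#   crontab -l > crontab.backup
#   crontab -l | merge_job '30 2 * * * /usr/bin/php /www/virtual/username/cron.php' | crontab -
```

Running the pipeline twice leaves the crontab unchanged the second time, since the line is only appended when it is missing.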
Comments
You can add comments by starting a line with the # character.
# This cron job does something very important
10 * * * * /usr/bin/php /www/virtual/username/cron.php > /dev/null 2>&1
Setting the E-mail
By default, the output of each cron job is sent via e-mail, unless you discard it or redirect it to a file. The MAILTO setting lets you choose which e-mail address receives it:
MAILTO="username@example.com"
# This cron job does something very important
10 * * * * /usr/bin/php /www/virtual/username/cron.php > /dev/null 2>&1
Using the PHP Parser
CGI scripts are executable by default, but PHP scripts are not. They need to run through the PHP parser. That’s why we need to put the path to the parser before the path of the script.
* * * * * /usr/bin/php [path to php script]
On some systems the parser lives in another location, such as “/usr/local/bin/php”. To find out, you can try running this on the command line:
which php
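If you prefer to look the path up from a script, the shell built-in "command -v" does a similar job to "which". The following is a small sketch; find_interpreter is a made-up name. Keep in mind that cron jobs typically run with a much shorter PATH than your login shell, which is one reason crontab entries use absolute paths.

```shell
# find_interpreter NAME  (hypothetical helper, for illustration only)
# Prints the path of NAME if it can be found on the PATH, otherwise fails.
find_interpreter() {
    command -v "$1"
}

if php_path=$(find_interpreter php); then
    echo "Use this path in the crontab: $php_path"
else
    echo "php was not found on the PATH of this shell" >&2
fi
```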
Handling the Output
If you do not handle the output of the cron script, cron will send it as an e-mail to your user account on the server.
Discarding Output
If you put “> /dev/null 2>&1” at the end of the cron job command (or any command), the output will be discarded.
The greater-than sign (>) is used for redirecting output. “/dev/null” is like a black hole for output. Anything that goes there is ignored by the system.
The “2>&1” part redirects STDERR (error output) to wherever STDOUT (normal output) currently points. Because STDOUT has already been sent to “/dev/null”, the errors end up there as well.
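The effect of each piece can be checked in a shell. In this sketch, noisy is a made-up stand-in for a cron script that writes to both streams, and the variable names are only for illustration.

```shell
# noisy writes one line to STDOUT and one line to STDERR.
noisy() {
    echo "normal output"
    echo "error output" >&2
}

# The outer "2>&1" lets $( ) capture whatever escapes the inner redirections.
only_stdout_discarded=$( { noisy > /dev/null; } 2>&1 )
both_discarded=$( { noisy > /dev/null 2>&1; } 2>&1 )

# Order matters: here STDERR is pointed at the old STDOUT before STDOUT moves.
wrong_order=$(noisy 2>&1 > /dev/null)

echo "with '> /dev/null' alone: [$only_stdout_discarded]"
echo "with '> /dev/null 2>&1':  [$both_discarded]"
echo "with '2>&1 > /dev/null':  [$wrong_order]"
```

Only the second form silences the command completely. In the other two, the error line still gets through, and in a cron job it would still be e-mailed.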
Outputting to a File
To store the cron output in a file, use the greater-than sign (>) again:
10 * * * * /usr/bin/php /www/virtual/username/cron.php > /var/log/cron.log
That will overwrite the output file on every run. If you would like to append the output to the end of the file instead, use a double greater-than sign (>>):
10 * * * * /usr/bin/php /www/virtual/username/cron.php >> /var/log/cron.log
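Note that ">" and ">>" on their own only capture STDOUT. If you also want errors and a timestamp in the log, one option is a small wrapper along these lines. This is a sketch, not a standard tool: run_logged is a made-up name and the log format is arbitrary.

```shell
# run_logged LOGFILE COMMAND [ARGS...]  (hypothetical wrapper, for illustration)
# Appends the command's STDOUT and STDERR to LOGFILE, with a timestamped
# start line and the exit status, then returns that exit status.
run_logged() {
    log_file=$1
    shift
    status=0
    {
        printf '[%s] starting: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
        "$@" || status=$?
        printf '[%s] finished with exit status %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status"
    } >> "$log_file" 2>&1
    return "$status"
}

# Possible usage from a wrapper script called by cron:
#   run_logged /var/log/cron.log /usr/bin/php /www/virtual/username/cron.php
```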
Executable Scripts
Normally you need to specify the parser at the beginning of the command as we have been doing. But there is actually a way to make your PHP scripts executable from the command line like a CGI script.
All you need to do is add the path to the parser as the first line of the script:
#!/usr/local/bin/php
<?php

echo "hello world\n";

// ...

?>
Also make sure the file is executable by setting the proper permissions with chmod (for example, chmod 755).
When you have an executable script, the cron job can be shorter like this:
10 * * * * /www/virtual/username/hello.php
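The same two ingredients, an interpreter line and the execute permission, can be tried out safely in a temporary directory. In this sketch a tiny shell script stands in for the PHP file so that it runs without PHP installed. The file name is made up, and it assumes the temporary directory allows executing files.

```shell
# Create a small script in a temporary directory.
work_dir=$(mktemp -d)
script="$work_dir/hello.sh"

cat > "$script" <<'EOF'
#!/bin/sh
echo "hello world"
EOF

# Without any execute bit the file cannot be run directly.
chmod 644 "$script"
[ -x "$script" ] || echo "not executable yet"

# 755 = owner may read/write/execute, everyone else may read/execute.
chmod 755 "$script"
output=$("$script")
echo "$output"

rm -rf "$work_dir"
```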
Preventing Cron Job Collision
In some cases you may have cron jobs that run frequently, and you may not want them to collide when a run takes longer than the interval between runs.
For example, you may have a cron job running every minute. Yet, every once in a while it may take longer than one minute to run. This can cause another instance of the same cron script to start running before the previous one finishes. You can create too many busy processes this way and possibly crash the server if they keep slowing each other down, causing even more processes to pile up over time.
This problem can be addressed via file locking, and more specifically the non-blocking (LOCK_NB) type of file locks. (If you are not familiar with file locking, I suggest you read about it first.)
You can add this code to the cron job script:
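// Open (or create) the lock file; mode 'c' does not truncate an existing file.
$fp = fopen('/tmp/lock.txt', 'c');

// Try to take an exclusive lock without waiting for it.
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    echo 'Unable to obtain lock';
    exit(1);
}

/* ... */

// Closing the file releases the lock.
fclose($fp);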
With a regular file lock, the flock() function call would block the script while another lock exists, and the script would resume once that lock is released. However, with a non-blocking lock, such as in the code above, the function call does not stop the script; it immediately returns FALSE if there is an existing lock. So in this case, we can immediately exit the script when we see there is an existing lock, which indicates that another cron job is currently running.
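If the cron job is a shell script rather than PHP, a similar "give up immediately" guard can be sketched with a lock directory. This is a different technique from flock(): it relies on mkdir either creating the directory or failing, without ever waiting. The lock path and the function name run_exclusively are made up for this example.

```shell
# Hypothetical lock location; any path writable by the cron user would do.
LOCK_DIR=${LOCK_DIR:-/tmp/my-cron-job.lock}

# run_exclusively COMMAND [ARGS...]  (hypothetical helper, for illustration)
# Runs COMMAND only if no other instance currently holds the lock directory.
run_exclusively() {
    # mkdir creates the directory or fails right away, so only one process
    # can win, and a losing process never blocks.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo 'Unable to obtain lock' >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    rmdir "$LOCK_DIR"       # release the lock
    return "$status"
}

# Possible usage inside the cron script:
#   run_exclusively /usr/bin/php /www/virtual/username/cron.php
```

One trade-off to be aware of: if the script is killed before it reaches rmdir, the directory stays behind and has to be removed by hand, whereas a flock() lock disappears automatically when the process ends.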
Blocking Web Access to Cron Jobs
When you write a cron job in a web scripting language like PHP, you may want to make sure that nobody can execute it by just loading it from their browser. One easy option would be to store these scripts outside of your web folder. However, this may not be practical or preferable for some developers, if they want to keep their cron job scripts right within their web application folders.
If you put all of the cron job scripts in one folder, you can block access by putting this line in an .htaccess file in that folder (this is the Apache 2.2 syntax; Apache 2.4 uses “Require all denied” instead):
deny from all
Or you can also deny access to scripts on an individual basis by putting this line at the beginning:
if (isset($_SERVER['REMOTE_ADDR'])) die('Permission denied.');
This will ensure that, when the script is accessed from the web, it will abort immediately.