UPDATE: Thanks so much for all the feedback! I’m going to look at using flock as well, and I’ll write that up soon.
Imagine you have a script that archives a bunch of data by copying it to another box. You use cron to schedule that script to run every hour, because normally the script finishes in about thirty minutes.
But every so often, maybe when your application gets really popular, the cron job takes more than an hour. Maybe it takes three hours this one time.
And during that time, cron starts up two more copies of your script. That can cause all sorts of havoc, where two or more scripts each try to modify the same file, for example.
In this scenario, you need a way to prevent those second and third (and maybe fourth and fifth, etc) scripts from starting as long as one is already going.
It would be very helpful if, when the script started, it first checked whether another instance was already running. If one is already running, then this new script should just immediately exit. But if no other instance is running, then this script should get to work.
Here’s a simple method for doing that:
1. When the script starts, the first thing it does is look for a file named something like /tmp/myscript.pid.
2. If that file exists, then the script reads that file. The file holds a process ID (pid). The script now checks whether any process with that pid is running.
3. If there is not a process running with this pid, then probably what happened was the old script crashed without cleaning up this pid file. So, this script should get to work. But if there is a process running with that pid, then there is already a running instance of this script, and so this script should just immediately exit. There’s a tiny risk with this approach that I’ll discuss at the end of this post.
4. Depending on what happened in step 3, the script should exit at this point, or it should get to work. Before the script gets to the real work though, it should write its own process ID into /tmp/myscript.pid.
That’s the pseudocode. Now here are two Python functions to help make it happen:
import os

def pid_is_running(pid):
    """
    Return pid if pid is still going.

    >>> import os
    >>> mypid = os.getpid()
    >>> mypid == pid_is_running(mypid)
    True
    >>> pid_is_running(1000000) is None
    True
    """
    try:
        # Signal 0 sends nothing; it only checks that the pid can be signalled.
        os.kill(pid, 0)
    except PermissionError:
        # The process exists but belongs to another user, so it is running.
        return pid
    except OSError:
        return
    else:
        return pid

def write_pidfile_or_die(path_to_pidfile):
    if os.path.exists(path_to_pidfile):
        with open(path_to_pidfile) as f:
            pid = int(f.read())
        if pid_is_running(pid):
            print("Sorry, found a pidfile! Process {0} is still running.".format(pid))
            raise SystemExit
        else:
            # Stale pidfile left behind by a crashed run.
            os.remove(path_to_pidfile)
    with open(path_to_pidfile, 'w') as f:
        f.write(str(os.getpid()))
    return path_to_pidfile
And here’s a trivial script that does nothing but check for a pidfile and then sleep for a few seconds:
import time

if __name__ == '__main__':
    write_pidfile_or_die('/tmp/pidfun.pid')
    time.sleep(5) # placeholder for the real work
    print('process {0} finished work!'.format(os.getpid()))
Try running this in two different terminals, and you’ll see that the second process immediately exits as long as the first process is still running.
In the worst case, this isn’t perfect
Imagine that the first process started up and the operating system gave it process ID 99. Then imagine that the process crashed without cleaning up its pidfile. Now imagine that some completely different process started up, and the operating system happened to recycle process ID 99 and give it to the new process.
Now, when our cron job comes around and starts up a new instance of our script, the script will read the pid file and check for a running process with process ID 99. In this scenario, the script will be misled and will shut down, even though no copy of our script is actually running.
So, what to do?
Well, first of all, understand this is an extremely unlikely scenario. But if you want to prevent this from happening, I suggest you make two tweaks:
1. Do your absolute best to clean up that pidfile. For example, use Python’s atexit module or the sys.excepthook hook to make sure that the pid file is gone when the script ends.
2. Write more than just the process ID into the pid file. For example, you can use ps and then write the process name to the pid file. Then change how you check if the process exists. In addition to checking for a running process with the same pid, check for the same pid and the same data returned from ps for that process.
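Here’s a rough sketch of what those two tweaks might look like together. All of the function names below (process_name, remove_pidfile, write_pidfile_with_name, pidfile_owner_is_running) and the lookup parameter are made up for illustration, and the sketch assumes a ps that accepts the POSIX-style -p and -o comm= options:

```python
import atexit
import os
import subprocess


def process_name(pid):
    """Return the command name ps reports for pid, or None if ps finds nothing.

    Assumes a POSIX-style ps; "comm=" prints the command name with no header.
    """
    result = subprocess.run(
        ["ps", "-p", str(pid), "-o", "comm="],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() or None


def remove_pidfile(path_to_pidfile):
    """Delete the pidfile, ignoring the case where it is already gone."""
    try:
        os.remove(path_to_pidfile)
    except FileNotFoundError:
        pass


def write_pidfile_with_name(path_to_pidfile, lookup=process_name):
    """Write 'pid name' to the pidfile and register cleanup for normal exit."""
    pid = os.getpid()
    with open(path_to_pidfile, "w") as f:
        f.write("{0} {1}".format(pid, lookup(pid)))
    atexit.register(remove_pidfile, path_to_pidfile)


def pidfile_owner_is_running(path_to_pidfile, lookup=process_name):
    """Return True only if the recorded pid is alive AND still has the recorded name."""
    try:
        with open(path_to_pidfile) as f:
            pid_text, _, recorded_name = f.read().strip().partition(" ")
        pid = int(pid_text)
    except (FileNotFoundError, ValueError):
        return False  # no pidfile, or contents we cannot parse
    return lookup(pid) == recorded_name
```

Keep in mind that atexit handlers only run on a normal interpreter exit. They won’t run if the process is killed by a signal that Python doesn’t handle, which is exactly why the second check is worth having. Also, the command name alone may be as generic as "python", so comparing the full command line (ps -o args=) would probably tell processes apart better.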
Check back soon and I’ll likely whip up a simple library that offers a context manager that handles all of this, including the extreme case described above.
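In the meantime, here’s a minimal sketch of what such a context manager might look like. The name pidfile is hypothetical, and this version only covers the basic check plus guaranteed cleanup when the with-block ends; it does not address the recycled-pid case:

```python
import contextlib
import os


def pid_is_running(pid):
    """Return True if a process with this pid exists (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks whether the pid can be signalled
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


@contextlib.contextmanager
def pidfile(path_to_pidfile):
    """Hold a pidfile for the duration of a with-block, then remove it."""
    try:
        with open(path_to_pidfile) as f:
            old_pid = int(f.read())
    except (FileNotFoundError, ValueError):
        old_pid = None  # no pidfile, or one with unreadable contents
    if old_pid is not None and pid_is_running(old_pid):
        raise SystemExit("Process {0} is still running.".format(old_pid))
    with open(path_to_pidfile, "w") as f:
        f.write(str(os.getpid()))
    try:
        yield path_to_pidfile
    finally:
        # Runs whether the block finished normally or raised an exception.
        os.remove(path_to_pidfile)
```

You would wrap the real work in something like with pidfile('/tmp/myscript.pid'): and the pidfile goes away even if the work raises an exception. It still won’t be removed if the process is killed outright.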