Signal Handling

Most queuing systems send a signal to a user job some time before it runs out of CPU time. If the job does not catch the signal, the default action is taken; for SIGTERM and SIGXCPU that action is to terminate the job, with no chance to clean up. If the signal is caught (by including a signal handler in your code), cleanup actions such as checkpointing can be carried out first. Note that when a signal arrives, normal execution is interrupted and the signal handler is called immediately. This may not be the best time to checkpoint. It is often better for the signal handler simply to set a global variable (a common block or external variable) as a flag telling the program to checkpoint at the next convenient point, e.g. the end of the current iteration.

You must write a subroutine (called a signal handler) that writes or triggers the checkpoint. You then pass the address of this subroutine to signal(), along with the signal number (usually SIGTERM = 15, and also SIGXCPU = 24; the numeric values are platform-dependent, so check your system's signal.h). The call to signal() should be near the beginning of your program.
Try man -s 3c signal or man -s 3f signal to find out more about signals. You will also need to check the queuing system documentation to see which signals it sends.
Fortran example:
      subroutine sig_handler(sig)
      implicit none
      integer sig
C     checkpoint code goes here
      write (*,*) 'SIGTERM Caught'
      call exit(1)
      end

      program main
      implicit none
      external sig_handler
      integer old_handler, signal, i
C     install sig_handler for SIGTERM (15) and SIGXCPU (24)
      old_handler=signal(15,sig_handler,-1)
      i=signal(24,sig_handler,-1)
C     Calculation goes here
      i=0
      do while (.true.)
         i=i+1
      enddo
      stop
      end

C example:

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    void sighand(int sig)
    {
        /* checkpoint code goes here */
        /* Note: printf() and exit() are not async-signal-safe; the
           flag-based approach described earlier is the safer pattern. */
        printf("Signal %d caught\n", sig);
        exit(1);
    }

    int main(void)
    {
        signal(SIGTERM, sighand);
        signal(SIGXCPU, sighand);
        /* calculation code goes here */
        for (;;)
            ;
    }