Last edited: July 2018
resource_monitor is Copyright (C) 2013 The University of Notre Dame.
All rights reserved.
This software is distributed under the GNU General Public License.
See the file COPYING for details.
resource_monitor generates up to three log files: a JSON-encoded summary file with the maximum value of each resource used and the time at which it occurred, a time series that shows the resources used at given time intervals, and a list of the files that were opened during execution.
Additionally, resource_monitor may be set to produce measurement snapshots according to events in some files (e.g., when a file is created or deleted, or when a regular expression pattern appears in the file). Maximum resource limits can be specified in a file, or as a string given at the command line. If a resource goes over its specified limit, the monitor terminates the task and reports which resources exceeded their limits.
In systems that support it, resource_monitor wraps some libc functions to obtain a better estimate of the resources used.
% resource_monitor -O mymeasurements -- ls
      This will generate the file mymeasurements.summary, describing the resource usage of the command "ls".
      % resource_monitor -O mymeasurements --with-time-series --with-inotify -- ls
      This will generate three files describing the resource usage of
      the command "ls". These files are mymeasurements.summary,
      mymeasurements.series, and mymeasurements.files.
    By default, measurements are taken every second, and each time an event
    occurs, such as a file being opened or a process forking or exiting. We
    can specify the output names and the sampling interval:
      % resource_monitor -O log-sleep -i 2 -- sleep 10
      The previous command will monitor "sleep 10" at two-second
      intervals, and will generate the
      files log-sleep.summary, log-sleep.series,
      and log-sleep.files.
      The monitor assumes that the application monitored is not interactive. To
      change this behaviour use the -f switch:
      % resource_monitor -O mybash -f -- /bin/bash
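      When the monitor is launched from a script, invocations like the ones above can be assembled programmatically. The following is a minimal Python sketch and not part of resource_monitor itself: the helper name build_monitor_command and its parameters are hypothetical, and it uses only the options shown above (-O, -i, -f, --with-time-series, --with-inotify).

```python
import shlex


def build_monitor_command(output_prefix, command, interval=None,
                          time_series=False, inotify=False,
                          interactive=False):
    """Return an argv list for resource_monitor (hypothetical helper).

    Only the options that appear in the examples above are used.
    """
    argv = ["resource_monitor", "-O", output_prefix]
    if interval is not None:
        argv += ["-i", str(interval)]      # sampling interval, in seconds
    if time_series:
        argv.append("--with-time-series")  # request the time-series log
    if inotify:
        argv.append("--with-inotify")      # flag used above for the three-file run
    if interactive:
        argv.append("-f")                  # the monitored command is interactive
    # Everything after "--" is the command to monitor.
    return argv + ["--"] + list(command)


if __name__ == "__main__":
    argv = build_monitor_command("log-sleep", ["sleep", "10"], interval=2)
    print(shlex.join(argv))
```

      The resulting list can be passed to subprocess.run on a machine where resource_monitor is installed.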
      The summary file is JSON encoded and has the following fields:

command:                  the command line given as an argument
start:                    time at start of execution, since the epoch
end:                      time at end of execution, since the epoch
exit_type:                one of "normal", "signal" or "limit" (a string)
signal:                   number of the signal that terminated the process
                          Only present if exit_type is signal
cores:                    maximum number of cores used
cores_avg:                number of cores as cpu_time/wall_time
exit_status:              final status of the parent process
max_concurrent_processes: the maximum number of processes running concurrently
total_processes:          count of all of the processes created
wall_time:                duration of execution, end - start
cpu_time:                 user+system time of the execution
virtual_memory:           maximum virtual memory across all processes
memory:                   maximum resident size across all processes
swap_memory:              maximum swap usage across all processes
bytes_read:               amount of data read from disk
bytes_written:            amount of data written to disk
bytes_received:           amount of data read from network interfaces
bytes_sent:               amount of data written to network interfaces
bandwidth:                maximum bandwidth used
total_files:              total maximum number of files and directories of
                          all the working directories in the tree
disk:                     size of all working directories in the tree
limits_exceeded:          resources over the limit with -l, -L options (JSON object)
peak_times:               seconds from start when a maximum occurred (JSON object)
snapshots:                list of intermediate measurements, identified by
                          snapshot_name (JSON object)
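      As an illustration of how the fields above relate to each other, the sketch below reads a summary and recomputes cores_avg as cpu_time/wall_time. It is an assumption, not something stated above, that a measured value may be stored either as a plain number or as a [value, unit] pair like those used in the limits file; the hypothetical helper field_value accepts both. The sample values are invented.

```python
import json


def field_value(summary, name):
    """Return the numeric value of a summary field.

    Assumption: a field is either a bare number or a [value, unit] pair.
    """
    raw = summary[name]
    if isinstance(raw, (list, tuple)):
        return raw[0]
    return raw


def average_cores(summary):
    """Recompute cores_avg as cpu_time / wall_time."""
    wall = field_value(summary, "wall_time")
    if wall == 0:
        return 0.0
    return field_value(summary, "cpu_time") / wall


def describe_exit(summary):
    """Describe how the task ended, following the exit_type field."""
    exit_type = summary["exit_type"]
    if exit_type == "signal":
        # "signal" is only present when exit_type is "signal".
        return "terminated by signal {}".format(summary["signal"])
    if exit_type == "limit":
        exceeded = ", ".join(sorted(summary.get("limits_exceeded", {})))
        return "limits exceeded: {}".format(exceeded)
    return "exited normally with status {}".format(summary["exit_status"])


if __name__ == "__main__":
    # Invented sample; a real summary would be loaded from a file such as
    # mymeasurements.summary with json.load().
    sample = json.loads("""
    {
        "command": "ls",
        "exit_type": "normal",
        "exit_status": 0,
        "wall_time": [2.0, "s"],
        "cpu_time": [1.0, "s"]
    }
    """)
    print(average_cores(sample))
    print(describe_exit(sample))
```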
      The time-series log has a row per time sample. For each row, the columns have the following meaning:
      
wall_clock                the sample time, since the epoch, in microseconds
cpu_time                  accumulated user + kernel time, in microseconds
cores                     current number of cores used
max_concurrent_processes  concurrent processes at the time of the sample
virtual_memory            current virtual memory size, in MB
memory                    current resident memory size, in MB
swap_memory               current swap usage, in MB
bytes_read                accumulated number of bytes read, in bytes
bytes_written             accumulated number of bytes written, in bytes
bytes_received            accumulated number of bytes received, in bytes
bytes_sent                accumulated number of bytes sent, in bytes
bandwidth                 current bandwidth, in bps
total_files               current number of files and directories, across all
                          working directories in the tree
disk                      current size of working directories in the tree, in MB
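      The column list above can be used to post-process a time-series log. The following Python sketch takes the column order from that list; the whitespace separator and the '#' comment convention are assumptions about the file layout, the function names are hypothetical, and the sample rows are invented.

```python
COLUMNS = [
    "wall_clock", "cpu_time", "cores", "max_concurrent_processes",
    "virtual_memory", "memory", "swap_memory", "bytes_read",
    "bytes_written", "bytes_received", "bytes_sent", "bandwidth",
    "total_files", "disk",
]

# Invented sample rows, one per time sample.
SAMPLE = """\
# invented sample data
1530000000000000 0      1 1 100 10 0 0    0 0 0 0 5 1
1530000001000000 500000 1 2 120 25 0 4096 0 0 0 0 5 1
1530000002000000 900000 1 1 110 15 0 4096 0 0 0 0 6 1
"""


def parse_series(text):
    """Parse time-series text into a list of dicts, one per sample.

    Assumptions: whitespace-separated numeric columns in the order of
    COLUMNS; lines that are empty or start with '#' are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = [float(v) for v in line.split()]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def peak_memory_mb(rows):
    """Largest resident memory seen in any sample, in MB."""
    return max(row["memory"] for row in rows)


def elapsed_seconds(rows):
    """Span between first and last sample; wall_clock is in microseconds."""
    return (rows[-1]["wall_clock"] - rows[0]["wall_clock"]) / 1e6


if __name__ == "__main__":
    rows = parse_series(SAMPLE)
    print(len(rows), peak_memory_mb(rows), elapsed_seconds(rows))
```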
limits.json:
{
    "wall_time": [3600, "s"],
    "swap_memory": [5, "GB"]
}
resource_monitor -O output --limits-file=limits.json -- myapp
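      A limits file in the form shown above can also be generated from a script. This is a minimal Python sketch; the function name limits_document is hypothetical, and only the resource names and units that appear in the example are used. Which other units the monitor accepts is not covered here.

```python
import json


def limits_document(limits):
    """Return the text of a limits file.

    `limits` maps a resource name to a (value, unit) pair, mirroring the
    [value, unit] layout of the example above.
    """
    document = {}
    for name, (value, unit) in limits.items():
        if value < 0:
            raise ValueError("negative limit for {}".format(name))
        document[name] = [value, unit]
    return json.dumps(document, indent=4) + "\n"


if __name__ == "__main__":
    text = limits_document({
        "wall_time": (3600, "s"),
        "swap_memory": (5, "GB"),
    })
    # Write the result where the monitor will look for it.
    with open("limits.json", "w") as f:
        f.write(text)
    print(text, end="")
```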
 snapshots.json:
{
    "my.log":
        {
            "events":[
                {
                    "label":"file-created",
                    "on-creation":true
                },
                {
                    "label":"started",
                    "on-pattern":"^# START"
                },
                {
                    "label":"end-of-start",
                    "on-pattern":"^# PROCESSING"
                },
                {
                    "label":"end-of-processing",
                    "on-pattern":"^# ANALYSIS"
                },
                {
                    "label":"file-deleted",
                    "on-deletion":true
                }
            ]
        }
}
resource_monitor -O output --snapshots-file=snapshots.json -- myapp 
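Before running a long job, it can help to check which "on-pattern" events a given log would trigger. The sketch below is only an approximation written in Python: it assumes patterns are matched line by line, uses Python's re module rather than the monitor's own regular expression engine (which may differ), and records each label only the first time it matches. The function name triggered_labels and the sample log lines are hypothetical.

```python
import re

# The "on-pattern" events from the snapshots.json example above.
EVENTS = [
    {"label": "file-created", "on-creation": True},
    {"label": "started", "on-pattern": "^# START"},
    {"label": "end-of-start", "on-pattern": "^# PROCESSING"},
    {"label": "end-of-processing", "on-pattern": "^# ANALYSIS"},
    {"label": "file-deleted", "on-deletion": True},
]


def triggered_labels(events, log_lines):
    """Return labels of 'on-pattern' events in the order they first match.

    Assumption: each pattern is applied to one line at a time with
    re.search; creation and deletion events are ignored here.
    """
    compiled = [(e["label"], re.compile(e["on-pattern"]))
                for e in events if "on-pattern" in e]
    fired = []
    for line in log_lines:
        for label, pattern in compiled:
            if label not in fired and pattern.search(line):
                fired.append(label)
    return fired


if __name__ == "__main__":
    # Invented contents for my.log.
    log = ["# START", "loading input", "# PROCESSING", "# ANALYSIS"]
    print(triggered_labels(EVENTS, log))
```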
Snapshots are included in the output summary file as an array of JSON objects under the key snapshots. Additionally, each snapshot is written to a file output.snapshot.N, where N is 0, 1, 2, ... For other examples, please see the resource_monitor man page.
      In Work Queue, enabling monitoring on a queue with:
q = work_queue_create(port);
work_queue_enable_monitoring(q, "some-log-file", /* kill tasks on exhaustion */ 1);
      wraps every task with the monitor, and appends all generated
      summary files into the file some-log-file. Currently
      only summary reports are generated from Work Queue.
      Consider, for example, the following Condor submit file:
universe = vanilla
executable = matlab
arguments = -r "run script.m"
output = matlab.output
transfer_input_files=script.m
should_transfer_files = yes
when_to_transfer_output = on_exit
log = condor.matlab.logfile
queue
      This can be rewritten, for example, as:
universe = vanilla
executable = resource_monitor
arguments = -O matlab-resources --limits-file=limits.json -- matlab -r "run script.m"
output = matlab.output
transfer_input_files=script.m,limits.json,/path/to/resource_monitor
transfer_output_files=matlab-resources.summary
should_transfer_files = yes
when_to_transfer_output = on_exit
log = condor.matlab.logfile
queue
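      The rewrite above follows a mechanical pattern: the monitor becomes the executable, the original command moves after "--" in the arguments, and the limits file, the monitor binary, and the summary are added to the transferred files. The Python sketch below applies that pattern to simple "key = value" lines only. The function name wrap_with_monitor is hypothetical, and the sketch assumes the original file already has executable, arguments, and transfer_input_files lines.

```python
def wrap_with_monitor(submit_text, output_prefix, limits_file, monitor_path):
    """Rewrite a submit description so the job runs under resource_monitor.

    Hypothetical helper: handles only simple `key = value` lines like the
    ones in the example above.
    """
    entries = []
    for line in submit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            entries.append((key.strip(), value.strip()))
        else:
            entries.append((line.strip(), None))  # e.g. the final "queue"

    settings = {k: v for k, v in entries if v is not None}
    # The original command line, to be placed after "--".
    original = (settings["executable"] + " " + settings["arguments"]).strip()

    out = []
    for key, value in entries:
        if value is None:
            out.append(key)
        elif key == "executable":
            out.append("executable = resource_monitor")
        elif key == "arguments":
            out.append("arguments = -O {} --limits-file={} -- {}".format(
                output_prefix, limits_file, original))
        elif key == "transfer_input_files":
            out.append("transfer_input_files = {},{},{}".format(
                value, limits_file, monitor_path))
            out.append("transfer_output_files = {}.summary".format(
                output_prefix))
        else:
            out.append("{} = {}".format(key, value))
    return "\n".join(out)


if __name__ == "__main__":
    original_submit = "\n".join([
        "universe = vanilla",
        "executable = matlab",
        'arguments = -r "run script.m"',
        "output = matlab.output",
        "transfer_input_files=script.m",
        "should_transfer_files = yes",
        "when_to_transfer_output = on_exit",
        "log = condor.matlab.logfile",
        "queue",
    ])
    print(wrap_with_monitor(original_submit, "matlab-resources",
                            "limits.json", "/path/to/resource_monitor"))
```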