Condor Changelog

What's new in Condor 7.9.0 Dev

Sep 11, 2012
  • New Features:
  • Machine slots can now be configured to identify and divide customized local resources. Jobs may then request these resources. See section 3.12.8 for details. (Ticket #2905).
  • Condor now supports caching of ClassAds to reduce its memory footprint. This feature is experimental and is currently disabled by default. It can be enabled by setting the new configuration variable ENABLE_CLASSAD_CACHING to True. (Ticket #2541). (Ticket #3127).
  • condor_status now returns the condor_schedd ClassAd directly from the condor_schedd daemon, if both options -direct and -schedd are given on the command line. (Ticket #2492).
  • The new -status and -echo command line options to condor_wait cause it to show job start and terminate information, and to print events to stdout. (Ticket #2926).
  • Added a DEBUG logging level output flag D_CATEGORY, which causes Condor to include the logging level flags in effect for each line of logged output. (Ticket #2712).
  • condor_status and condor_q each have a new -autoformat option to make some output format specifications easier than with the existing -format option. See the condor_status and condor_q manual pages for details. (Ticket #2941).
  • Enhanced the ClassAd log system to report the log line number on parse failures, and improved the ability to detect parse failures closer to the point of corruption. (Ticket #2934).
  • Added an -evaluate option to condor_config_val, which causes the configured value queried from a given daemon to be evaluated with respect to that daemon's ClassAd. (Ticket #856).
  • Added code to condor_dagman, such that a VARS assignment in a top-level DAG is applied to splices. (Ticket #1780).
  • Condor now uses libraries from Globus 5.2.1. (Ticket #2838).
  • When authenticating Condor daemons with GSI and configuration variable GSI_DAEMON_NAME is undefined, Condor checks that the server name in the certificate matches the host name that the client is connecting to. When GSI_DAEMON_NAME is defined, the old behavior is preserved: only certificates matching GSI_DAEMON_NAME pass the authentication step, and no host name check is performed. The behavior of the host name check may be further controlled with the new configuration variables GSI_SKIP_HOST_CHECK and GSI_SKIP_HOST_CHECK_CERT_REGEX. (Ticket #1605).
  • Added a new capability to condor_submit that allows recursive macros in submit description files. This facility allows variables to be updated recursively. Before this capability was added, a recursive definition would send condor_submit into an infinite loop of macro expansion that would fill up memory. See section 10 for details. (Ticket #406).
  • A DAGMan restriction has been removed. It is now permitted to define a log command using a macro within a node job's submit description file. (Ticket #2428).
  • To enforce the dependencies of a DAG, DAGMan now watches only the default node user log of the condor_dagman job for events. DAGMan requests the condor_schedd and condor_shadow daemons to write each event to this default log, in addition to writing to a log specified by the node job. condor_dagman writes POST script terminate events only to its default log; these terminate events are not written to the user log. This behavior can be turned off by setting the configuration variable DAGMAN_ALWAYS_USE_NODE_LOG to False. For correct behavior, DAGMAN_ALWAYS_USE_NODE_LOG should be set to False if condor_dagman version 7.9.0 or later is submitting jobs to an older version of a condor_schedd daemon or of a condor_submit executable. (Ticket #2807).
  • condor_submit has a new -interactive option for platforms other than Windows, which schedules and runs a job that provides a shell prompt on the execute machine. Documentation of this feature is not yet available. (Ticket #3088).
  • Configuration Variable and ClassAd Attribute Additions and Changes:
  • The new configuration variables MACHINE_RESOURCE_NAMES and MACHINE_RESOURCE_ (see section 3.3.10 for both) identify and specify the use of customized local machine resources. (Ticket #2905).
  • The new configuration variable ENABLE_CLASSAD_CACHING controls whether the new caching feature of ClassAds is used. The default value is False. (Ticket #3127).
  • The new configuration variable CLASSAD_LOG_STRICT_PARSING controls whether ClassAd log files such as the job queue log are read with strict parse checking on ClassAd expressions. (Ticket #3069).
  • The default value for configuration variable USE_PROCD is now True for the condor_master daemon. This means that by default the condor_master will start a condor_procd daemon to be used by it and all other daemons on that machine. (Ticket #2911).
  • There is a new configuration variable used by the condor_starter. If STARTER_RLIMIT_AS is set to an integer value, the condor_starter will use the setrlimit() system call with the RLIMIT_AS parameter to limit the virtual memory size of each process in the user job. The value of this configuration variable is a ClassAd expression, evaluated in the context of both the machine and job ClassAds, where the machine ClassAd is the my ClassAd, and the job ClassAd is the target ClassAd. (Ticket #1663).
  • New configuration variables were added to the condor_schedd to define statistics that count subsets of jobs. These variables have the form SCHEDD_COLLECT_STATS_BY_, and should be defined by a ClassAd expression that evaluates to a string. See section 3.3.11 for the complete definition. The optional configuration variable of the form SCHEDD_EXPIRE_STATS_BY_ can be used to set an expiration time, in seconds, for each set of statistics. (Ticket #2862).
  • The new batch_queue submit description file command and new job ClassAd attribute BatchQueue specify which job queue to use for grid universe jobs of type pbs, lsf, and sge. (Ticket #2996).
  • The new configuration variable GSI_SKIP_HOST_CHECK is a boolean that controls whether a check is performed during GSI authentication of a Condor daemon. When set to the default value of False, the check is not skipped, so the daemon host name must match the host name in the daemon's certificate, unless otherwise exempted by values of GSI_DAEMON_NAME or GSI_SKIP_HOST_CHECK_CERT_REGEX. When True, this check is skipped, and hosts will not be rejected due to a mismatch of certificate and host name. (Ticket #1605).
  • The new configuration variable GSI_SKIP_HOST_CHECK_CERT_REGEX may be set to a regular expression. GSI certificates of Condor daemons whose subject name is matched in full by this regular expression are not required to have a matching daemon host name and certificate host name. The default is an empty regular expression, which will not match any certificates, even if they have an empty subject name. (Ticket #1605).
  • Bugs Fixed:
  • Fixed a bug in which usage of cgroups incorrectly included the page cache in the maximum memory usage. This bug fix is also included in Condor version 7.8.2. (Ticket #3003).
  • The EC2 GAHP will now respect the value of the environment variable X509_CERT_DIR and the configuration variable GSI_DAEMON_TRUSTED_CA_DIR for all secure connections. (Ticket #2823).
  • Condor will avoid selecting down (disabled) network interfaces. Previously Condor could select a down interface over an up (active) interface. (Ticket #2893).
  • Made the logic in the condor_negotiator that computes submitter limits properly aware of the configuration variable NEGOTIATOR_CONSIDER_PREEMPTION. (Ticket #2952).
  • Condor no longer back-dates file modification times by 3 minutes when transferring job input files into the job spool directory or the job execute directory. (Ticket #2423).
  • Fixed a bug in which the use of a pipe in the configuration file on Windows platforms would cause a visible console window to show up whenever the configuration was read. (Ticket #1534).
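
The interaction of GSI_DAEMON_NAME, GSI_SKIP_HOST_CHECK, and GSI_SKIP_HOST_CHECK_CERT_REGEX described above can be sketched as a small decision function. This is only an illustration of the documented order of checks, not Condor's implementation. The function name, its parameters, the treatment of GSI_DAEMON_NAME as a list of exact subject names, and the case-insensitive host comparison are all assumptions made for the example.

```python
import re


def daemon_cert_accepted(cert_subject, cert_host, connect_host,
                         gsi_daemon_name=None,
                         skip_host_check=False,
                         skip_host_check_cert_regex=""):
    """Hypothetical sketch of the host name check described in the notes."""
    # Old behavior: when GSI_DAEMON_NAME is defined, only certificates
    # matching it pass, and no host name check is performed.
    # (Assumption: modeled here as a list of exact subject names.)
    if gsi_daemon_name is not None:
        return cert_subject in gsi_daemon_name

    # GSI_SKIP_HOST_CHECK = True skips the host name check entirely.
    if skip_host_check:
        return True

    # A subject matched in full by the regex is exempt from the check.
    # The default empty regex matches nothing, not even an empty subject.
    if skip_host_check_cert_regex and re.fullmatch(
            skip_host_check_cert_regex, cert_subject):
        return True

    # Otherwise the certificate host name must match the host name the
    # client is connecting to (assumption: compared case-insensitively).
    return cert_host.lower() == connect_host.lower()
```

For instance, with all three variables left at their defaults, a certificate issued for one host is rejected when the client connects to a different host name.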

New in Condor 7.8.3 (Sep 11, 2012)

  • New Features:
  • The libcondorapi library for reading and writing job event logs is again available as a shared library on Linux and Mac OS platforms. Since Condor 7.5.x, it had only been available as a static library. (Ticket #3047).
  • Configuration Variable and ClassAd Attribute Additions and Changes:
  • To avoid the output of an unnecessary DAGMan error message, the value of DAGMAN_LOG_ON_NFS_IS_ERROR is ignored when both CREATE_LOCKS_ON_LOCAL_DISK and ENABLE_USERLOG_LOCKING are True. (Ticket #3087).
  • Bugs Fixed:
  • Fixed a bug in which usage of cgroups incorrectly included the page cache in the maximum memory usage. This bug fix is also included in Condor version 7.9.0. (Ticket #3003).
  • Jobs from a hook to fetch work, where the hook is defined by configuration variable _HOOK_FETCH_WORK, now correctly receive dynamic slots from a partitionable slot instead of claiming the entire partitionable slot. (Ticket #2819).
  • Fixed a bug in which a slot might become stuck in the Preempting state when a condor_startd is configured with a hook to fetch work, as defined by _HOOK_FETCH_WORK. (Ticket #3076).
  • Fixed a bug that caused Condor to transfer a job's input files from the execute machine back to the submit machine as if they were output files. This would happen if the job's input files were stored in Condor's spool directory, which occurs if the job was submitted via Condor-C or via condor_submit with the -spool or -remote options. (Ticket #2406).
  • Fixed a bug that could cause the first grid-type cream jobs destined for a particular CREAM server to never be submitted to that server. This bug was probably introduced in Condor version 7.6.5. (Ticket #3054).
  • Fixed several problems with the XML parsing class ClassAdXMLParser in the ClassAds library: Several methods named ParseClassAd() were declared, but never implemented. (Ticket #3049). The parser silently dropped leading white space in string values. (Ticket #3042). The parser could go into an infinite loop or leak memory when reading a malformed ClassAd XML document. (Ticket #3045).
  • Fixed a bug that prevented the -f command line option to condor_history from being recognized. The -f option was being interpreted as -forward. At least four letters are now required for the -forward option (-forw) to prevent ambiguity. (Ticket #3044).
  • In the 7.7 series, the implementation of the condor_history -backwards option, which is the default ordering for reading the history file, did not work on Windows platforms. This has been fixed. (Ticket #3055).
  • Fixed a bug that caused an invalid proxy to be delegated when refreshing the job's X.509 proxy when configuration variable DELEGATE_JOB_GSI_CREDENTIALS_LIFETIME was set to 0. (Ticket #3059).
  • Fixed a bug in which DAGMan did not account properly for jobs being suspended and then unsuspended. (Ticket #3108).
  • condor_dagman now takes note of job reconnect failed events (event code 24) in the user log, for counting idle jobs. (Ticket #3189).
  • Job IDs generated by NorduGrid ARC 12.05 and above are now properly recognized. (Ticket #3062).
  • Fixed a bug in which Condor would not mark grid-type nordugrid jobs as Running due to variation in the format of the job status value. NorduGrid ARC job statuses of the form INLRMS: ? are now properly recognized both with and without the space after the colon. (Ticket #3118).
  • The condor_gridmanager now properly handles X.509 proxy files that are specified in the job ClassAd with a relative path name. (Ticket #3027).
  • Fixed a bug that caused daemon names, as set in configuration variables such as STARTD_NAME, containing a period character to be ignored. (Ticket #3172).
  • Fixed a bug that prevented the condor_schedd from removing old execute directories for local universe jobs on start up. (Ticket #3176).
  • The condor_defrag daemon sometimes scheduled fewer draining attempts than specified. (Ticket #3199).
  • Fixed a bug that could cause the condor_gridmanager to crash if a grid universe job's X.509 user certificate did not contain an e-mail address. (Ticket #3203).
  • Fixed a bug introduced in Condor version 7.7.5 that caused multiple condor_schedd daemons running on the same machine to share the job queue with each other, due to the way in which the default value of configuration variable JOB_QUEUE_LOG was set. (Ticket #3196).
  • Fixed a bug that could cause condor_q to not print all jobs when it thought it was querying an old condor_schedd daemon. (Ticket #3206).
  • Fixed a bug that could cause a job's standard output and standard error files to be written in the job's initial working directory, despite the submit description file's specification to write them to a different directory. This would happen when the file transfer mechanism was used, the execution machine was running Condor version 7.7.1 or earlier, and either Condor's security negotiation was disabled or the configuration variable SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION was set to True. (Ticket #3208).
  • The log message generated when the EXECUTE directory is missing is now more helpful. (Ticket #3194).
  • The load average was incorrect for non-English versions on Windows platforms. This has been fixed for Windows Vista and more recent versions. (Ticket #3182).
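
The condor_history fix above relies on requiring a minimum prefix length before an abbreviated option is accepted. A minimal sketch of that idea follows; the helper name matches_option and its signature are hypothetical and are not taken from the Condor sources.

```python
def matches_option(arg, full_name, min_len):
    """Return True if arg abbreviates -full_name with at least min_len letters.

    Hypothetical helper: it only illustrates minimum-length prefix matching.
    """
    if not arg.startswith("-"):
        return False
    word = arg[1:]
    # The abbreviation must be long enough and must be a prefix of the name.
    return len(word) >= min_len and full_name.startswith(word)


# With a minimum of four letters, -f is no longer mistaken for -forward.
is_forward = matches_option("-forw", "forward", 4)
is_forward_short = matches_option("-f", "forward", 4)
```

Under this rule, -forw and -forward select the forward ordering, while -f falls through to be handled as its own option.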

New in Condor 7.6.1 (Jun 10, 2011)

  • Bugs Fixed:
  • In some cases, condor_q -analyze failed to provide detailed analysis of the job's requirements expression when the expression contained ClassAd function calls.
  • Fixed a segmentation fault bug introduced in Condor version 7.5.5, in the checkpoint and restart of jobs using compressed checkpoint images under the standard universe. By default, Condor will not compress checkpoints under the standard universe. Jobs which do not compress their checkpoints were immune to this bug. Compressed checkpoints are only available in 32-bit versions of Condor. Generation of checkpoints in 64-bit versions of Condor is unaffected.
  • In Condor version 7.6.0, the condor_schedd would create a spool directory for every job. The previous, correct behavior has now been restored, which is to create a spool directory only when needed.
  • Fixed a bug introduced in Condor version 7.5.2, that caused the condor_negotiator daemon to crash if any machine ClassAds contained cyclical attribute references.
  • Fixed a bug that caused usage by nice_user jobs to be charged to the user directly rather than to nice-user.user. This bug was introduced in the 7.5 series.

New in Condor 7.3.0 (Feb 25, 2009)

  • This release is incompatible when communicating with previous versions of Condor if CCB is enabled or if PRIVATE_NETWORK_NAME is configured.
  • Updated the DRMAA version. This new version is compliant with GFD.133, the DRMAA 1.0 grid recommendation standard. Three new functions were added to meet the specification's requirements, and several bugs were fixed.
  • New Features:
  • Added support for using any recognized script as an executable in a submit file on Windows. For more information, please see section 6.2.6.
  • Improved support for private networks: Added CCB, the Condor Connection Broker. It is similar in functionality to GCB, the Generic Connection Broker, but it has several advantages, including ease of use and working on Windows as well as Unix platforms. GCB continues to work, but we may remove it some time in the 7.3 development series. The main missing feature in CCB at the moment that prevents it from replacing GCB, is support for connectivity from one private network to another. CCB only works when connecting from a public network to a private one. For example, jobs may be sent from a condor_schedd on the public Internet to condor_startd daemons on a private network, if the condor_startd daemons are configured to use a CCB server that is accessible to the condor_schedd daemon. However, if the condor_schedd daemon is on one private network and the condor_startd daemons are on a different private network, CCB does not help. For more information on CCB, see section 3.7.3.
  • Added support for CPU affinity on Linux platforms.
  • Added support for the condor_q -better-analyze option on Windows.
  • Added WANT_HOLD. When PREEMPT becomes true, if WANT_HOLD is true, the job is put on hold for the reason (optionally) specified by WANT_HOLD_REASON and WANT_HOLD_SUBCODE. These policy expressions are evaluated by the execute machine. As usual, the job owner may specify periodic_release and/or periodic_remove expressions to react to specific hold states automatically.
  • Added the ClassAd function debug(). See section 4.1.1 for the details of this function.
  • The condor_schedd can now use MD5 check sums to avoid storing multiple copies of the same executable in its SPOOL directory. Note that this feature only affects executables sent to the condor_schedd via the copy_to_spool command within a submit description file.
  • Reduced the number of sleeps condor_dagman does to maintain log file consistency when a DAG uses multiple user logs for node jobs. DAGMan now does one sleep per submit cycle, instead of one sleep for each submit.
  • Added the -import_env command-line flag to condor_submit_dag. This explicitly puts the submitter's environment into the .condor.sub file.
  • Optimized the removal of large numbers of jobs. Previously, removal of tens of thousands of jobs caused the condor_schedd daemon to consume a lot of CPU time for several minutes.
  • Reduced memory usage by the condor_shadow daemon. Since there is one condor_shadow process per running job, this helps increase the number of running jobs that a submit machine can handle. Under Linux 2.6, we found that running 10,000 jobs from a single submit machine requires about 10 GBytes of system RAM. We also found that running more than 10,000 simultaneous jobs requires a 64-bit submit machine. On a 32-bit Linux platform, kernel memory is exhausted, regardless of how much additional RAM the system has.
  • Reduced the memory usage of the condor_collector daemon, when UPDATE_COLLECTOR_WITH_TCP = True.
  • Configuration Variable Additions and Changes:
  • The new configuration variable OPEN_VERB_FOR__FILES allows the default interpreter for scripts with an extension EXT to be changed. For more information, please see section 6.2.6.
  • The new configuration variable CCB_ADDRESS configures a daemon to use one or more CCB servers to allow communication with Condor components outside of the private network.
  • The new configuration variable MAX_FILE_DESCRIPTORS (on Unix platforms only) specifies the required file descriptor limit for a Condor daemon. File descriptors are a system resource used for open files and for network connections. Condor daemons that make many simultaneous network connections may require an increased number of file descriptors. For example, see the documentation on CCB for information on its file descriptor requirements.
  • The new configuration variables ENFORCE_CPU_AFFINITY and SLOTx_CPU_AFFINITY on Linux platforms allow for Condor to lock slots to given CPUs.
  • The new configuration variable DEBUG_TIME_FORMAT allows a custom specification for the format of the time printed at the start of each line in a daemon's log file. See section 3.3.4 for the complete definition of this variable.
  • The new configuration variable SHARE_SPOOLED_EXECUTABLES is a boolean value that determines whether the condor_schedd daemon will use MD5 check sums to avoid storing multiple copies of the same executable in the SPOOL directory. The default setting is True.
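
The idea behind SHARE_SPOOLED_EXECUTABLES, using an MD5 check sum to avoid storing the same executable twice, can be sketched as follows. This is a simplified illustration under stated assumptions, not the condor_schedd's actual logic. The function names, the "exe-" file naming scheme, and the flat spool layout are hypothetical.

```python
import hashlib
import os
import shutil


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spool_executable(src_path, spool_dir):
    """Copy src_path into spool_dir unless identical content is already there.

    Hypothetical sketch: the spooled file is named after its check sum, so
    two submissions of the same executable share one stored copy.
    """
    os.makedirs(spool_dir, exist_ok=True)
    dest = os.path.join(spool_dir, "exe-" + md5_of_file(src_path))
    if not os.path.exists(dest):
        shutil.copyfile(src_path, dest)
    return dest
```

Naming the stored copy after its check sum makes the existence test double as the duplicate test, which keeps the sketch short. A production implementation would also need to handle concurrent writers and removal of copies that are no longer referenced.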

New in Condor 7.2.1 (Feb 21, 2009)

  • New Features:
  • Condor now has a clipped port to i386 Debian 5.0 (Lenny).
  • Added standard universe support for gfortran.
  • Added support for standard output and standard error to be greater than 2 Gigabytes.
  • Configuration Variable Additions and Changes:
  • The configuration variable JAVA_MAXHEAP_ARGUMENT now defaults to the value -Xmx1024m. The installation process of Condor resets this value to UNDEFINED in the local configuration file, if the detected JVM is not from Sun Microsystems.
  • A new feature has been added to the condor_master that makes it possible to append to the DC_DAEMON_LIST configuration variable, instead of overwriting it. To take advantage of this, place the plus character ('+') as the first character in the DC_DAEMON_LIST definition.
  • The new configuration variable DAGMAN_COPY_TO_SPOOL controls whether the condor_dagman binary gets copied to the spool directory when a DAG is submitted. See section 3.3.25 for details.
  • Added -version and -help command line options to condor_submit_dag.
  • Bugs Fixed:
  • Fixed a bug in the condor_collector which could cause it to hang indefinitely while reading network input in rare conditions.
  • Fixed a bug in condor_chirp for Windows which was causing it to crash on invocation.
  • Fixed a bug in the Windows condor_mail program, which was causing it to become unresponsive when run. If left running, the application also increased its memory consumption.
  • Fixed a bug that could cause the condor_schedd to never evaluate periodic expressions.
  • Fixed a bug on Unix platforms where condor_configure would provide incorrect defaults for the JAVA_MAXHEAP_ARGUMENT attribute in the installed configuration files. The current default for Sun Java JVMs is -Xmx1024m.
  • Fixed a bug on Unix platforms where condor_configure would imply that using the Unix user root or UID 0 for the -owner option is acceptable. It is not, and condor_configure would then complain that it could not find user root in the password file.
  • Fixed a bug on Unix platforms where condor_configure would emit errors about not being able to execute ldd when installing Condor on the Mac OS X 10.5 platform. condor_configure now correctly detects shared library requirements when installing the Condor binaries on the Mac OS X 10.5 platform.
  • Fixed a bug where execute-side daemons started before the condor_credd would fail to match with Windows jobs with run_as_owner set. This condition persisted until the execute-side daemons were either restarted or reconfigured.
  • Fixed a problem affecting the Job Router and Condor-C. When jobs spool input files, they enter a temporary hold state, which could trigger actions by a naive periodic remove or release expression. Periodic expressions are no longer evaluated when in this temporary hold state, which has the hold reason "Spooling input data files".
  • The example init script condor.boot.generic erroneously claimed that the condor_master would begin sending SIGKILL to child processes after 20 seconds if SIGQUIT (the fast shutdown) failed. The condor_master will actually wait $(SHUTDOWN_FAST_TIMEOUT) seconds, a value that currently defaults to 300 seconds.
  • Environment variable names are now properly treated as case-insensitive on Windows. The most common symptom of this bug was the inability to specify a custom PATH environment variable for a job from its submit description file.
  • Changed condor_submit -debug to issue a warning when ignoring environment variables. This occurs with getenv = True set in a submit description file.
  • Fixed a long-standing memory leak in the SOAP interface. This caused the leak of a few hundred bytes of memory for each connection. This could eventually have caused the condor_schedd daemon to crash.
  • Fixed Job Router hooks so that their output is properly propagated where appropriate.
  • Implemented a fix for the condor_startd that prevents it from crashing if the user specified the configuration variable NUM_SLOTS_TYPE_N, without also specifying SLOT_TYPE_N.
  • The sample configuration files now correctly set the default universe to vanilla. This default has been true since 7.2.0, but was not reflected in the sample configuration files.
  • Fixed a bug that incorrectly set the value of the job ClassAd attribute RequestMemory to be 1024 times its correct size due to a mismatch in units; the attribute RequestMemory is given in Mbytes, while the attribute ImageSize is given in Kbytes.
  • Fixed a memory leak in condor_dagman that leaked a small amount of memory for each job submitted.
  • Fixed a bug that was causing the network mask to be advertised as a Condor sinful string, rather than a dotted-quad.
  • Fixed a handle leak in the condor_procd on Windows.
  • Additions and Changes to the Manual:
  • Added a FAQ entry for Windows describing how machines with misconfigured performance counters may cause the condor_procd to crash.
  • Added a manual page for the command condor_router_history.
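
The RequestMemory fix above comes down to a unit conversion: ImageSize is given in Kbytes, while RequestMemory is given in Mbytes. The sketch below shows the conversion and the size of the error when it is skipped. The function name is hypothetical, and rounding up to a whole Mbyte is an assumption; the notes do not state the exact formula Condor uses.

```python
def request_memory_mb(image_size_kb):
    """Convert a size in Kbytes to whole Mbytes.

    Assumption: partial Mbytes are rounded up, so a job is never assigned
    less memory than its image size.
    """
    return (image_size_kb + 1023) // 1024


image_size_kb = 2048                          # a 2 Mbyte image, in Kbytes
correct = request_memory_mb(image_size_kb)    # value in Mbytes
buggy = image_size_kb                         # Kbytes misread as Mbytes
ratio = buggy // correct                      # the factor of 1024 in the notes
```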