MySQL Cluster Changelog

New in MySQL Cluster 7.6.7 (Jul 27, 2018)

  • Bugs Fixed":
  • ndbinfo Information Database: It was possible following a restart for (sometimes incomplete) fallback data to be used in populating the ndbinfo.processes table, which could lead to rows in this table with empty process_name values. Such fallback data is no longer used for this purpose. (Bug #27985339)
  • NDB Client Programs: The executable file host_info is no longer used by ndb_setup.py. This file, along with its parent directory share/mcc/host_info, has been removed from the NDB Cluster distribution. In addition, installer code relating to an unused dojo.zip file was removed. (Bug #90743, Bug #27966467, Bug #27967561)
  • MySQL NDB ClusterJ: ClusterJ could not be built from source using JDK 9. (Bug #27977985)
  • Local checkpoints did not always handle DROP TABLE operations correctly. (Bug #27926532)
  • During the execution of CREATE TABLE ... IF NOT EXISTS, the internal open_table() function calls ha_ndbcluster::get_default_num_partitions() implicitly whenever open_table() finds out that the requested table already exists. In certain cases, get_default_num_partitions() was called without the associated thd_ndb object being initialized, leading to failure of the statement with MySQL error 157 Could not connect to storage engine. Now get_default_num_partitions() always checks for the existence of this thd_ndb object, and initializes it if necessary.
  • Functionality Added or Changed:
  • As part of ongoing work to improve handling of local checkpoints and minimize the occurrence of issues relating to Error 410 (REDO log overloaded) during LCPs, NDB now implements adaptive LCP control, which moderates LCP scan priority and LCP writes according to redo log usage.

New in MySQL Cluster 7.6.6 (Jun 6, 2018)

  • Functionality Added or Changed:
  • When performing an NDB backup, the ndbinfo.logbuffers table now displays information regarding buffer usage by the backup process on each data node. This is implemented as rows reflecting two new log types in addition to REDO and DD-UNDO. One of these rows has the log type BACKUP-DATA, which shows the amount of data buffer used during backup to copy fragments to backup files. The other row has the log type BACKUP-LOG, which displays the amount of log buffer used during the backup to record changes made after the backup has started. One each of these log_type rows is shown in the logbuffers table for each data node in the cluster. Rows having these two log types are present in the table only while an NDB backup is currently in progress. (Bug #25822988)
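  • For example, while a backup is in progress the new rows can be inspected with a query along these lines (a sketch; the columns assumed here are part of the standard logbuffers layout):

      SELECT node_id, log_type, total, used
        FROM ndbinfo.logbuffers
       WHERE log_type IN ('BACKUP-DATA', 'BACKUP-LOG');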
  • Added the --logbuffer-size option for ndbd and ndbmtd, for use in debugging with a large number of log messages. This controls the size of the data node log buffer; the default (32K) is intended for normal operations. (Bug #89679, Bug #27550943)
  • The previously experimental shared memory (SHM) transporter is now supported in production. SHM works by transporting signals through writing them into memory, rather than on a socket. NDB already attempts to use SHM automatically between data nodes and API nodes sharing the same host. To enable explicit shared memory connections, set the UseShm SHM configuration parameter to true. When explicitly defining shared memory as the connection method, it is also necessary to identify the nodes at either end of the connection (NodeId1 and NodeId2 parameters), and to provide a shared memory key (ShmKey). In addition, to improve performance, it is also possible to set a spin time (ShmSpinTime) for the SHM transporter.
  • Configuration of SHM is otherwise similar to that of the TCP transporter; see NDB Cluster Shared-Memory Connections for additional information. A configuration sketch follows below.
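  • A minimal config.ini sketch of an explicitly defined shared memory connection; the node IDs shown (1 for a data node, 51 for an API node on the same host) and the ShmKey value are examples only, and exact parameter placement should be checked against the SHM transporter documentation:

      [shm]
      NodeId1=1         # data node end of the connection
      NodeId2=51        # API node end of the connection
      ShmKey=2001       # key identifying the shared memory segment
      ShmSpinTime=200   # optional spin time for the SHM transporter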
  • The SPJ kernel block now takes into account, when evaluating a join request, whether at least some of the tables are inner joined. This means that SPJ can eliminate requests for rows or ranges as soon as it becomes known that a preceding request did not return any results for a parent row. This saves both the data nodes and the SPJ block from having to handle requests and result rows which never take part in a result row from an inner join.
  • The poll receiver which NDB uses to read from sockets, execute messages from the sockets, and wake up other threads now offloads wakeup of other threads to a new thread that wakes up the other threads on request, and otherwise simply sleeps. This improves the scalability of a single cluster connection by keeping the receive thread from becoming overburdened by tasks including wakeup of other threads.
  • Bugs Fixed:
  • Important Change; NDB Client Programs: ndb_top ignored short forms of command-line options, and did not in all cases handle malformed long options correctly.
  • References: See also: Bug #88236, Bug #20733646.
  • NDB Cluster APIs: A previous fix for an issue, in which the failure of multiple data nodes during a partial restart could cause API nodes to fail, did not properly check the validity of the associated NdbReceiver object before proceeding. Now in such cases an invalid object triggers handling for invalid signals, rather than a node failure. (Bug #25902137)
  • References: This issue is a regression of: Bug #25092498.
  • NDB Cluster APIs: Incorrect results, usually an empty result set, were returned when setBound() was used to specify a NULL bound. This issue appears to have been caused by a problem in gcc, limited to cases using the old version of this method (which does not employ NdbRecord), and is fixed by rewriting the problematic internal logic in the old implementation. (Bug #89468, Bug #27461752)
  • NDB Cluster APIs: Released NDB API objects are kept in one or more Ndb_free_list structures for later reuse. Each list also keeps track of all objects seized from it, and makes sure that these are eventually released back to it. In the event that the internal function NdbScanOperation::init() failed, it was possible for an NdbApiSignal already allocated by the NdbOperation to be leaked. Now in such cases, NdbScanOperation::release() is called to release any objects allocated by the failed NdbScanOperation before it is returned to the free list.
  • This fix also handles a similar issue with NdbOperation::init(), where a failed call could also leak a signal. (Bug #89249, Bug #27389894)
  • NDB Client Programs: ndb_top did not support a number of options common to most NDB programs.
  • In addition, ndb_top now supports a --socket option (short form -S) for specifying a socket file to use for the connection. (Bug #86614, Bug #26236298)
  • MySQL NDB ClusterJ: ClusterJ quit unexpectedly as there was no error handling in the scanIndex() function of the ClusterTransactionImpl class for a null returned to it internally by the scanIndex() method of the ndbTransaction class. (Bug #27297681, Bug #88989)
  • In some circumstances, when a transaction was aborted in the DBTC block, there remained links to trigger records from operation records which were not yet reference-counted, but when such an operation record was released the trigger reference count was still decremented. (Bug #27629680)
  • An NDB online backup consists of data, which is fuzzy, and a redo and undo log. To restore to a consistent state it is necessary to ensure that the log contains all of the changes spanning the capture of the fuzzy data portion and beyond to a consistent snapshot point. This is achieved by waiting for a GCI boundary to be passed after the capture of data is complete, but before stopping change logging and recording the stop GCI in the backup's metadata.
  • At restore time, the log is replayed up to the stop GCI, restoring the system to the state it had at the consistent stop GCI. A problem arose when, under load, it was possible to select a GCI boundary which occurred too early and did not span all the data captured. This could lead to inconsistencies when restoring the backup; these could be noticed as broken constraints or corrupted BLOB entries.
  • Now the stop GCI is chosen so that it spans the entire duration of the fuzzy data capture process, so that the backup log always contains all data within a given stop GCI. (Bug #27497461)
  • References: See also: Bug #27566346.
  • For NDB tables, when a foreign key was added or dropped as a part of a DDL statement, the foreign key metadata for all parent tables referenced should be reloaded in the handler on all SQL nodes connected to the cluster, but this was done only on the mysqld on which the statement was executed. Due to this, any subsequent queries relying on foreign key metadata from the corresponding parent tables could return inconsistent results. (Bug #27439587)
  • References: See also: Bug #82989, Bug #24666177.
  • ANALYZE TABLE used excessive amounts of CPU on large, low-cardinality tables. (Bug #27438963)
  • Queries using very large lists with IN were not handled correctly, which could lead to data node failures. (Bug #27397802)
  • A data node overload could in some situations lead to an unplanned shutdown of the data node, which led to all data nodes disconnecting from the management and API nodes.
  • This was due to a situation in which API_FAILREQ was not the last received signal prior to the node failure.
  • As part of this fix, the transaction coordinator's handling of SCAN_TABREQ signals for an ApiConnectRecord in an incorrect state was also improved. (Bug #27381901)
  • References: See also: Bug #47039, Bug #11755287.
  • In a two-node cluster, when the node having the lowest ID was started using --nostart, API clients could not connect, failing with Could not alloc node id at HOST port PORT_NO: No free node id found for mysqld(API). (Bug #27225212)
  • Changing MaxNoOfExecutionThreads without an initial system restart led to an unplanned data node shutdown. (Bug #27169282)
  • References: This issue is a regression of: Bug #26908347, Bug #26968613.
  • Race conditions sometimes occurred during asynchronous disconnection and reconnection of the transporter while other threads concurrently inserted signal data into the send buffers, leading to an unplanned shutdown of the cluster.
  • As part of the work fixing this issue, the internal templating function used by the Transporter Registry when it prepares a send is refactored to use likely-or-unlikely logic to speed up execution, and to remove a number of duplicate checks for NULL. (Bug #24444908, Bug #25128512)
  • References: See also: Bug #20112700.
  • ndb_restore sometimes logged data file and log file progress values much greater than 100%. (Bug #20989106)
  • The internal function BitmaskImpl::setRange() set one bit fewer than specified. (Bug #90648, Bug #27931995)
  • It was not possible to create an NDB table using PARTITION_BALANCE set to FOR_RA_BY_LDM_X_2, FOR_RA_BY_LDM_X_3, or FOR_RA_BY_LDM_X_4. (Bug #89811, Bug #27602352)
  • References: This issue is a regression of: Bug #81759, Bug #23544301.
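  • For reference, PARTITION_BALANCE is one of the NDB_TABLE options set through a table comment; a hypothetical example creating a table with one of the values listed above:

      CREATE TABLE t1 (
          id  INT PRIMARY KEY,
          val VARCHAR(32)
      ) ENGINE=NDB
        COMMENT="NDB_TABLE=PARTITION_BALANCE=FOR_RA_BY_LDM_X_2";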
  • Adding a [tcp] or [shm] section to the global configuration file for a cluster with multiple data nodes caused default TCP connections to be lost to the node using the single section. (Bug #89627, Bug #27532407)
  • As a result of the reuse of code intended for send threads when performing an assist send, all of the local release send buffers were released to the global pool, which caused the intended level of the local send buffer pool to be ignored. Now send threads and assisting worker threads follow their own policies for maintaining their local buffer pools. (Bug #89119, Bug #27349118)
  • When sending priority A signals, we now ensure that the number of pending signals is explicitly initialized. (Bug #88986, Bug #27294856)
  • In a MySQL Cluster with one MySQL Server configured to write a binary log, failure occurred when creating and using an NDB table with non-stored generated columns. The problem arose only when the product was built with debugging support. (Bug #86084, Bug #25957586)
  • ndb_restore --print_data --hex did not print trailing 0s of LONGVARBINARY values. (Bug #65560, Bug #14198580)
  • When the internal function ha_ndbcluster::copy_fk_for_offline_alter() checked dependent objects on a table from which it was supposed to drop a foreign key, it did not perform any filtering for foreign keys, making it possible for it to attempt retrieval of an index or trigger instead, leading to a spurious Error 723 (No such table).

New in MySQL Cluster 7.6.6 Dev (Jun 1, 2018)

  • Functionality Added or Changed:
  • When performing an NDB backup, the ndbinfo.logbuffers table now displays information regarding buffer usage by the backup process on each data node. This is implemented as rows reflecting two new log types in addition to REDO and DD-UNDO. One of these rows has the log type BACKUP-DATA, which shows the amount of data buffer used during backup to copy fragments to backup files. The other row has the log type BACKUP-LOG, which displays the amount of log buffer used during the backup to record changes made after the backup has started. One each of these log_type rows is shown in the logbuffers table for each data node in the cluster. Rows having these two log types are present in the table only while an NDB backup is currently in progress. (Bug #25822988)
  • Added the --logbuffer-size option for ndbd and ndbmtd, for use in debugging with a large number of log messages. This controls the size of the data node log buffer; the default (32K) is intended for normal operations. (Bug #89679, Bug #27550943)
  • The previously experimental shared memory (SHM) transporter is now supported in production. SHM works by transporting signals through writing them into memory, rather than on a socket. NDB already attempts to use SHM automatically between data nodes and API nodes sharing the same host. To enable explicit shared memory connections, set the UseShm SHM configuration parameter to true. When explicitly defining shared memory as the connection method, it is also necessary to identify the nodes at either end of the connection (NodeId1 and NodeId2 parameters), and to provide a shared memory key (ShmKey). In addition, to improve performance, it is also possible to set a spin time (ShmSpinTime) for the SHM transporter.
  • Configuration of SHM is otherwise similar to that of the TCP transporter; see NDB Cluster Shared-Memory Connections for additional information.
  • The SPJ kernel block now takes into account, when evaluating a join request, whether at least some of the tables are inner joined. This means that SPJ can eliminate requests for rows or ranges as soon as it becomes known that a preceding request did not return any results for a parent row. This saves both the data nodes and the SPJ block from having to handle requests and result rows which never take part in a result row from an inner join.
  • Note: When upgrading from NDB 7.6.5 or earlier, you should be aware that this optimization depends on both API client and data node functionality, and so is not available until all of these have been upgraded.
  • The poll receiver which NDB uses to read from sockets, execute messages from the sockets, and wake up other threads now offloads wakeup of other threads to a new thread that wakes up the other threads on request, and otherwise simply sleeps. This improves the scalability of a single cluster connection by keeping the receive thread from becoming overburdened by tasks including wakeup of other threads.
  • Bugs Fixed:
  • Important Change; NDB Client Programs: ndb_top ignored short forms of command-line options, and did not in all cases handle malformed long options correctly.
  • NDB Cluster APIs: A previous fix for an issue, in which the failure of multiple data nodes during a partial restart could cause API nodes to fail, did not properly check the validity of the associated NdbReceiver object before proceeding. Now in such cases an invalid object triggers handling for invalid signals, rather than a node failure. (Bug #25902137)
  • References: This issue is a regression of: Bug #25092498.
  • NDB Cluster APIs: Incorrect results, usually an empty result set, were returned when setBound() was used to specify a NULL bound. This issue appears to have been caused by a problem in gcc, limited to cases using the old version of this method (which does not employ NdbRecord), and is fixed by rewriting the problematic internal logic in the old implementation. (Bug #89468, Bug #27461752)
  • NDB Cluster APIs: Released NDB API objects are kept in one or more Ndb_free_list structures for later reuse. Each list also keeps track of all objects seized from it, and makes sure that these are eventually released back to it. In the event that the internal function NdbScanOperation::init() failed, it was possible for an NdbApiSignal already allocated by the NdbOperation to be leaked. Now in such cases, NdbScanOperation::release() is called to release any objects allocated by the failed NdbScanOperation before it is returned to the free list.
  • This fix also handles a similar issue with NdbOperation::init(), where a failed call could also leak a signal. (Bug #89249, Bug #27389894)
  • NDB Client Programs: ndb_top did not support a number of options common to most NDB programs. In addition, ndb_top now supports a --socket option (short form -S) for specifying a socket file to use for the connection. (Bug #86614, Bug #26236298)
  • MySQL NDB ClusterJ: ClusterJ quit unexpectedly as there was no error handling in the scanIndex() function of the ClusterTransactionImpl class for a null returned to it internally by the scanIndex() method of the ndbTransaction class. (Bug #27297681, Bug #88989)
  • In some circumstances, when a transaction was aborted in the DBTC block, there remained links to trigger records from operation records which were not yet reference-counted, but when such an operation record was released the trigger reference count was still decremented. (Bug #27629680)
  • An NDB online backup consists of data, which is fuzzy, and a redo and undo log. To restore to a consistent state it is necessary to ensure that the log contains all of the changes spanning the capture of the fuzzy data portion and beyond to a consistent snapshot point. This is achieved by waiting for a GCI boundary to be passed after the capture of data is complete, but before stopping change logging and recording the stop GCI in the backup's metadata.
  • At restore time, the log is replayed up to the stop GCI, restoring the system to the state it had at the consistent stop GCI. A problem arose when, under load, it was possible to select a GCI boundary which occurred too early and did not span all the data captured. This could lead to inconsistencies when restoring the backup; these could be noticed as broken constraints or corrupted BLOB entries.
  • Now the stop GCI is chosen so that it spans the entire duration of the fuzzy data capture process, so that the backup log always contains all data within a given stop GCI. (Bug #27497461)
  • References: See also: Bug #27566346.
  • For NDB tables, when a foreign key was added or dropped as a part of a DDL statement, the foreign key metadata for all parent tables referenced should be reloaded in the handler on all SQL nodes connected to the cluster, but this was done only on the mysqld on which the statement was executed. Due to this, any subsequent queries relying on foreign key metadata from the corresponding parent tables could return inconsistent results. (Bug #27439587)
  • References: See also: Bug #82989, Bug #24666177.
  • ANALYZE TABLE used excessive amounts of CPU on large, low-cardinality tables. (Bug #27438963)
  • Queries using very large lists with IN were not handled correctly, which could lead to data node failures. (Bug #27397802)
  • A data node overload could in some situations lead to an unplanned shutdown of the data node, which led to all data nodes disconnecting from the management and API nodes.
  • This was due to a situation in which API_FAILREQ was not the last received signal prior to the node failure.
  • As part of this fix, the transaction coordinator's handling of SCAN_TABREQ signals for an ApiConnectRecord in an incorrect state was also improved. (Bug #27381901)
  • References: See also: Bug #47039, Bug #11755287.
  • In a two-node cluster, when the node having the lowest ID was started using --nostart, API clients could not connect, failing with Could not alloc node id at HOST port PORT_NO: No free node id found for mysqld(API). (Bug #27225212)
  • Changing MaxNoOfExecutionThreads without an initial system restart led to an unplanned data node shutdown. (Bug #27169282)
  • References: This issue is a regression of: Bug #26908347, Bug #26968613.
  • Race conditions sometimes occurred during asynchronous disconnection and reconnection of the transporter while other threads concurrently inserted signal data into the send buffers, leading to an unplanned shutdown of the cluster.
  • As part of the work fixing this issue, the internal templating function used by the Transporter Registry when it prepares a send is refactored to use likely-or-unlikely logic to speed up execution, and to remove a number of duplicate checks for NULL. (Bug #24444908, Bug #25128512)
  • References: See also: Bug #20112700.
  • ndb_restore sometimes logged data file and log file progress values much greater than 100%. (Bug #20989106)
  • The internal function BitmaskImpl::setRange() set one bit fewer than specified. (Bug #90648, Bug #27931995)
  • It was not possible to create an NDB table using PARTITION_BALANCE set to FOR_RA_BY_LDM_X_2, FOR_RA_BY_LDM_X_3, or FOR_RA_BY_LDM_X_4. (Bug #89811, Bug #27602352)
  • References: This issue is a regression of: Bug #81759, Bug #23544301.
  • Adding a [tcp] or [shm] section to the global configuration file for a cluster with multiple data nodes caused default TCP connections to be lost to the node using the single section. (Bug #89627, Bug #27532407)
  • As a result of the reuse of code intended for send threads when performing an assist send, all of the local release send buffers were released to the global pool, which caused the intended level of the local send buffer pool to be ignored. Now send threads and assisting worker threads follow their own policies for maintaining their local buffer pools. (Bug #89119, Bug #27349118)
  • When sending priority A signals, we now ensure that the number of pending signals is explicitly initialized. (Bug #88986, Bug #27294856)
  • In a MySQL Cluster with one MySQL Server configured to write a binary log, failure occurred when creating and using an NDB table with non-stored generated columns. The problem arose only when the product was built with debugging support. (Bug #86084, Bug #25957586)
  • ndb_restore --print_data --hex did not print trailing 0s of LONGVARBINARY values. (Bug #65560, Bug #14198580)
  • When the internal function ha_ndbcluster::copy_fk_for_offline_alter() checked dependent objects on a table from which it was supposed to drop a foreign key, it did not perform any filtering for foreign keys, making it possible for it to attempt retrieval of an index or trigger instead, leading to a spurious Error 723 (No such table).

New in MySQL Cluster 7.6.5 Dev (Apr 21, 2018)

  • Bugs Fixed:
  • NDB Client Programs: On Unix platforms, the Auto-Installer failed to stop the cluster when ndb_mgmd was installed in a directory other than the default. (Bug #89624, Bug #27531186)
  • NDB Client Programs: The Auto-Installer did not provide a mechanism for setting the ServerPort parameter. (Bug #89623, Bug #27539823)
  • Writing of LCP control files was not always done correctly, which in some cases could lead to an unplanned shutdown of the cluster. This fix adds the requirement that upgrades from NDB 7.6.4 (or earlier) to this release (or a later one) include initial node restarts. (Bug #26640486)

New in MySQL Cluster 7.5.10 (Apr 20, 2018)

  • Bugs Fixed:
  • NDB Cluster APIs: A previous fix for an issue, in which the failure of multiple data nodes during a partial restart could cause API nodes to fail, did not properly check the validity of the associated NdbReceiver object before proceeding. Now in such cases an invalid object triggers handling for invalid signals, rather than a node failure. (Bug #25902137)
  • NDB Cluster APIs: Incorrect results, usually an empty result set, were returned when setBound() was used to specify a NULL bound. This issue appears to have been caused by a problem in gcc, limited to cases using the old version of this method (which does not employ NdbRecord), and is fixed by rewriting the problematic internal logic in the old implementation. (Bug #89468, Bug #27461752)
  • In some circumstances, when a transaction was aborted in the DBTC block, there remained links to trigger records from operation records which were not yet reference-counted, but when such an operation record was released the trigger reference count was still decremented. (Bug #27629680)
  • ANALYZE TABLE used excessive amounts of CPU on large, low-cardinality tables. (Bug #27438963)
  • Queries using very large lists with IN were not handled correctly, which could lead to data node failures. (Bug #27397802)
  • A data node overload could in some situations lead to an unplanned shutdown of the data node, which led to all data nodes disconnecting from the management and API nodes. This was due to a situation in which API_FAILREQ was not the last received signal prior to the node failure. As part of this fix, the transaction coordinator's handling of SCAN_TABREQ signals for an ApiConnectRecord in an incorrect state was also improved. (Bug #27381901)
  • In a two-node cluster, when the node having the lowest ID was started using --nostart, API clients could not connect, failing with Could not alloc node id at HOST port PORT_NO: No free node id found for mysqld(API). (Bug #27225212)
  • Race conditions sometimes occurred during asynchronous disconnection and reconnection of the transporter while other threads concurrently inserted signal data into the send buffers, leading to an unplanned shutdown of the cluster. As part of the work fixing this issue, the internal templating function used by the Transporter Registry when it prepares a send is refactored to use likely-or-unlikely logic to speed up execution, and to remove a number of duplicate checks for NULL. (Bug #24444908, Bug #25128512)
  • ndb_restore sometimes logged data file and log file progress values much greater than 100%. (Bug #20989106)
  • As a result of the reuse of code intended for send threads when performing an assist send, all of the local release send buffers were released to the global pool, which caused the intended level of the local send buffer pool to be ignored. Now send threads and assisting worker threads follow their own policies for maintaining their local buffer pools. (Bug #89119, Bug #27349118)
  • When sending priority A signals, we now ensure that the number of pending signals is explicitly initialized. (Bug #88986, Bug #27294856)
  • ndb_restore --print_data --hex did not print trailing 0s of LONGVARBINARY values. (Bug #65560, Bug #14198580)

New in MySQL Cluster 7.5.9 (Jan 17, 2018)

  • Bugs Fixed:
  • NDB Replication: On an SQL node not being used for a replication channel with sql_log_bin=0, it was possible, after creating and populating an NDB table, for a table map event to be written to the binary log for the created table with no corresponding row events. This led to problems when this log was later used by a slave cluster replicating from the mysqld where this table was created.
  • A query against the INFORMATION_SCHEMA.FILES table returned no results when it included an ORDER BY clause. (Bug #26877788)
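  • A query of the kind affected might look like the following (a sketch; the issue was triggered by the presence of any ORDER BY clause):

      SELECT FILE_NAME, FILE_TYPE, TABLESPACE_NAME
        FROM INFORMATION_SCHEMA.FILES
       ORDER BY FILE_NAME;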
  • During a restart, DBLQH loads redo log part metadata for each redo log part it manages, from one or more redo log files. Since each file has a limited capacity for metadata, the number of files which must be consulted depends on the size of the redo log part. These files are opened, read, and closed sequentially, but the closing of one file occurs concurrently with the opening of the next.
  • In certain circumstances where multiple Ndb objects were being used in parallel from an API node, the block number extracted from a block reference in DBLQH was the same as that of a SUMA block even though the request was coming from an API node. Due to this ambiguity, DBLQH mistook the request from the API node for a request from a SUMA block and failed. This is fixed by checking node IDs before checking block numbers. (Bug #88441, Bug #27130570)
  • When the duplicate weedout algorithm was used for evaluating a semi-join, the result had missing rows. (Bug #88117, Bug #26984919)
  • A table used in a loose scan could be used as a child in a pushed join query, leading to possibly incorrect results. (Bug #87992, Bug #26926666)
  • When representing a materialized semi-join in the query plan, the MySQL Optimizer inserted extra QEP_TAB and JOIN_TAB objects to represent access to the materialized subquery result. The join pushdown analyzer did not properly set up its internal data structures for these, leaving them uninitialized instead. This meant that later usage of any item objects referencing the materialized semi-join accessed an uninitialized tableno column when accessing a 64-bit tableno bitmask, possibly referring to a point beyond its end, leading to an unplanned shutdown of the SQL node. (Bug #87971, Bug #26919289)
  • When a data node was configured for locking threads to CPUs, it failed during startup with Failed to lock tid. This was a side effect of a fix for a previous issue, which disabled CPU locking based on the version of the available glibc. The specific glibc issue being guarded against is encountered only in response to an internal NDB API call (Ndb_UnlockCPU()) not used by data nodes (and which can be accessed only through internal API calls). The current fix enables CPU locking for data nodes and disables it only for the relevant API calls when an affected glibc version is used. (Bug #87683, Bug #26758939)
  • The NDBFS block's OM_SYNC flag is intended to make sure that all FSWRITEREQ signals used for a given file are synchronized, but was ignored by platforms that do not support O_SYNC, meaning that this feature did not behave properly on those platforms. Now the synchronization flag is used on those platforms that do not support O_SYNC. (Bug #76975, Bug #21049554)

New in MySQL Cluster 7.5.8 (Oct 18, 2017)

  • Bugs Fixed:
  • Replication: With GTIDs generated for incident log events, MySQL error code 1590 (ER_SLAVE_INCIDENT) could not be skipped using the --slave-skip-errors=1590 startup option on a replication slave. (Bug #26266758)
  • Errors in parsing NDB_TABLE modifiers could cause memory leaks. (Bug #26724559)
  • Added DUMP code 7027 to facilitate testing of issues relating to local checkpoints. For more information, see DUMP 7027. (Bug #26661468)
  • A previous fix intended to improve logging of node failure handling in the transaction coordinator included logging of transactions that could occur in normal operation, which made the resulting logs needlessly verbose. Such normal transactions are no longer written to the log in such cases. (Bug #26568782)
  • Due to a configuration file error, CPU locking capability was not available on builds for Linux platforms. (Bug #26378589)
  • Some DUMP codes used for the LGMAN kernel block were incorrectly assigned numbers in the range used for codes belonging to DBTUX. These have now been assigned symbolic constants and numbers in the proper range (10001, 10002, and 10003). (Bug #26365433)
  • Node failure handling in the DBTC kernel block consists of a number of tasks which execute concurrently, and all of which must complete before TC node failure handling is complete. This fix extends logging coverage to record when each task completes, and which tasks remain.
  • Error 899 Rowid already allocated... was reported under certain conditions involving high loads on a cluster having 4 or more data nodes. This occurred when a row ID was available at the primary but not available at the backup; a transaction that seized the row ID at the primary failed to seize it at the backup since the ID was unavailable.
  • NDB Cluster did not compile successfully when the build used WITH_UNIT_TESTS=OFF. (Bug #86881, Bug #26375985)
  • A potential hundredfold signal fan-out when sending a START_FRAG_REQ signal could lead to a node failure due to a job buffer full error in start phase 5 while trying to perform a local checkpoint during a restart. (Bug #86675, Bug #26263397)
  • Compilation of NDB Cluster failed when using -DWITHOUT_SERVER=1 to build only the client libraries. (Bug #85524, Bug #25741111)

New in MySQL Cluster 7.5.6 (Apr 11, 2017)

  • Bugs Fixed:
  • Partitioning: The output of EXPLAIN PARTITIONS displayed incorrect values in the partitions column when run on an explicitly partitioned NDB table having a large number of partitions.
  • CPU usage by the DBDIH master block in the data node's main thread could approach 100% toward the end of a local checkpoint in certain cases where the database had a very large number of fragment replicas. This is fixed by reducing the frequency and range of fragment queue checking during an LCP. (Bug #25443080)
  • The ndb_print_backup_file utility failed when attempting to read from a backup file when the backup included a table having more than 500 columns. (Bug #25302901)
  • Multiple data node failures during a partial restart of the cluster could cause API nodes to fail. This was due to expansion of an internal object ID map by one thread, thus changing its location in memory, while another thread was still accessing the old location, leading to a segmentation fault in the latter thread. The internal map() and unmap() functions in which this issue arose have now been made thread-safe. (Bug #25092498)
  • During the initial phase of a scan request, the DBTC kernel block sends a series of DIGETNODESREQ signals to the DBDIH block in order to obtain dictionary information for each fragment to be scanned. If DBDIH returned DIGETNODESREF, the error code from that signal was not read, and Error 218 Out of LongMessageBuffer was always returned instead. Now in such cases, the error code from the DIGETNODESREF signal is actually used. (Bug #85225, Bug #25642405)
  • There existed the possibility of a race condition between schema operations on the same database object originating from different SQL nodes; this could occur when one of the SQL nodes was late in releasing its metadata lock on the affected schema object or objects in such a fashion as to appear to the schema distribution coordinator that the lock release was acknowledged for the wrong schema change. This could result in incorrect application of the schema changes on some or all of the SQL nodes or a timeout with repeated waiting max ### sec for distributing... messages in the node logs due to failure of the distribution protocol. (Bug #85010, Bug #25557263)
  • When a foreign key was added to or dropped from an NDB table using an ALTER TABLE statement, the parent table's metadata was not updated, which made it possible to execute invalid alter operations on the parent afterwards.
  • Until you can upgrade to this release, you can work around this problem by running SHOW CREATE TABLE on the parent immediately after adding or dropping the foreign key; this statement causes the table's metadata to be reloaded. (Bug #82989, Bug #24666177)
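  • A sketch of this workaround, using hypothetical tables parent and child with hypothetical columns id and parent_id:

      ALTER TABLE child
          ADD CONSTRAINT fk_child_parent
          FOREIGN KEY (parent_id) REFERENCES parent (id);

      -- Running SHOW CREATE TABLE on the parent forces its metadata to be reloaded,
      -- so that later ALTER operations on parent are validated against current metadata.
      SHOW CREATE TABLE parent;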
  • Transactions on NDB tables with cascading foreign keys returned inconsistent results when the query cache was also enabled, due to the fact that mysqld was not aware of child table updates. This meant that results for a later SELECT from the child table were fetched from the query cache, which at that point contained stale data. This is fixed in such cases by adding all children of the parent table to an internal list to be checked by NDB for updates whenever the parent is updated, so that mysqld is now properly informed of any updated child tables that should be invalidated from the query cache. (Bug #81776, Bug #23553507)
  • NDB Disk Data: Stale data from NDB Disk Data tables that had been dropped could potentially be included in backups due to the fact that disk scans were enabled for these. To prevent this possibility, disk scans are now disabled—as are other types of scans—when taking a backup. (Bug #84422, Bug #25353234)
  • NDB Disk Data: In some cases, setting dynamic in-memory columns of an NDB Disk Data table to NULL was not handled correctly. (Bug #79253, Bug #22195588)
  • NDB Cluster APIs: When signals were sent while the client process was receiving signals such as SUB_GCP_COMPLETE_ACK and TC_COMMIT_ACK, these signals were temporarily buffered in the send buffers of the clients which sent them. If not explicitly flushed, the signals remained in these buffers until the client woke up again and flushed its buffers. Because no attempt was made to enforce an upper limit on how long a signal could remain unsent in the local client buffers, this could lead to timeouts and other misbehavior in the components waiting for these signals. In addition, the fix for a previous, related issue likely made this situation worse by removing client wakeups during which the client send buffers could have been flushed. The current fix moves responsibility for flushing messages sent by the receivers to the receiver (the poll_owner client). This means that it is no longer necessary to wake up all clients merely to have them flush their buffers. Instead, the poll_owner client (which is already running) flushes the send buffers of whatever was sent while delivering signals to the recipients. (Bug #22705935)
  • CMake now detects whether a GCC 5.3.0 loop optimization bug occurs and attempts a workaround if so. (Bug #25253540)

New in MySQL Cluster 7.5.5 (Jan 18, 2017)

  • Bugs Fixed:
  • Packaging: NDB Cluster Auto-Installer RPM packages for SLES 12 failed due to a dependency on python2-crypto instead of python-pycrypto. (Bug #25399608)
  • Packaging: The RPM installer for the MySQL Cluster auto-installer package had a dependency on python2-crypt instead of python-crypt. (Bug #24924607)
  • Microsoft Windows: Installation failed when the Auto-Installer (ndb_setup.py) was run on a Windows host that used Swedish as the system language. This was due to system messages being issued using the cp1252 character set; when these messages contained characters that did not map directly to 7-bit ASCII (such as the ä character in Tjänsten ... startar), conversion to UTF-8—as expected by the Auto-Installer web client—failed.
  • This fix has been tested only with Swedish as the system language, but should work for Windows systems set to other European languages that use the cp1252 character set. (Bug #83870, Bug #25111830)
  • ndb_restore did not restore tables having more than 341 columns correctly. This was due to the fact that the buffer used to hold table metadata read from .ctl files was of insufficient size, so that only part of the table descriptor could be read from it in such cases. This issue is fixed by increasing the size of the buffer used by ndb_restore for file reads. (Bug #25182956)
  • No traces were written when ndbmtd received a signal in any thread other than the main thread, due to the fact that all signals were blocked for other threads. This issue is fixed by the removal of SIGBUS, SIGFPE, SIGILL, and SIGSEGV signals from the list of signals being blocked. (Bug #25103068)
  • The rand() function was used to produce a unique table ID and table version needed to identify a schema operation distributed between multiple SQL nodes, relying on the assumption that rand() would never produce the same numbers on two different instances of mysqld. It was later determined that this is not the case, and that in fact it is very likely for the same random numbers to be produced on all SQL nodes.
  • This fix removes the usage of rand() for producing a unique table ID or version, and instead uses a sequence in combination with the node ID of the coordinator. This guarantees uniqueness until the counter for the sequence wraps, which should be sufficient for this purpose.
  • The effects of this duplication could be observed as timeouts in the log (for example NDB create db: waiting max 119 sec for distributing) when restarting multiple mysqld processes simultaneously or nearly so, or when issuing the same CREATE DATABASE or DROP DATABASE statement on multiple SQL nodes. (Bug #24926009)
  • The ndb_show_tables utility did not display type information for hash maps or fully replicated triggers. (Bug #24383742)
  • Long message buffer exhaustion when firing immediate triggers could result in row ID leaks; this could later result in persistent RowId already allocated errors (NDB Error 899). (Bug #23723110)
  • The NDB Cluster Auto-Installer did not show the user how to force an exit from the application (CTRL+C). (Bug #84235, Bug #25268310)
  • The NDB Cluster Auto-Installer failed to exit when it was unable to start the associated service. (Bug #84234, Bug #25268278)
  • The NDB Cluster Auto-Installer failed when the port specified by the --port option (or the default port 8081) was already in use. Now in such cases, when the required port is not available, the next 20 ports are tested in sequence, with the first one available being used; only if all of these are in use does the Auto-Installer fail. (Bug #84233, Bug #25268221)
  • Multiple instances of the NDB Cluster Auto-Installer were not detected. This could lead to inadvertent multiple deployments on the same hosts, stray processes, and similar issues. This issue is fixed by having the Auto-Installer create a PID file (mcc.pid), which is removed upon a successful exit. (Bug #84232, Bug #25268121)
  • When a parent NDB table in a foreign key relationship was updated, the update cascaded to a child table as expected, but the change was not cascaded to a child table of this child table (that is, to a grandchild of the original parent).
  • For example, suppose table child is a child of table parent, and table grandchild is a child of table child (and thus a grandchild of parent). In this scenario, a change to column col1 of parent cascaded to ref1 in table child, but was not always propagated in turn to ref2 in table grandchild. (Bug #83743, Bug #25063506)
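  • A hypothetical schema sketch of this parent/child/grandchild chain is shown below. NDB places restrictions on foreign keys (for example, ON UPDATE CASCADE may not reference the parent table's primary key), so the unique keys here stand in for whatever keys the affected schema actually used:

      CREATE TABLE parent (
          pk   INT PRIMARY KEY,
          col1 INT NOT NULL UNIQUE
      ) ENGINE=NDB;

      CREATE TABLE child (
          pk   INT PRIMARY KEY,
          ref1 INT NOT NULL UNIQUE,
          FOREIGN KEY (ref1) REFERENCES parent (col1) ON UPDATE CASCADE
      ) ENGINE=NDB;

      CREATE TABLE grandchild (
          pk   INT PRIMARY KEY,
          ref2 INT,
          KEY (ref2),
          FOREIGN KEY (ref2) REFERENCES child (ref1) ON UPDATE CASCADE
      ) ENGINE=NDB;

      -- With the fix, an update of parent.col1 cascades to child.ref1 and on to grandchild.ref2.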
  • The NDB binlog injector thread used an injector mutex to perform two important tasks:
  • Protect against client threads creating or dropping events whenever the injector thread waited for pollEvents().
  • Maintain access to data shared by the injector thread with client threads.
  • The first of these could hold the mutex for long periods of time (on the order of 10ms), while locking it again extremely quickly. This could keep it from obtaining the lock for data access (“starved”) for unnecessarily great lengths of time. To address these problems, the injector mutex has been refactored into two—one to handle each of the two tasks just listed. It was also found that initialization of the binlog injector thread held the injector mutex in several places unnecessarily, when only local thread data was being initialized and sent signals with condition information when nothing being waited for was updated. These unneeded actions have been removed, along with numerous previous temporary fixes for related injector mutex starvation issues. (Bug #83676, Bug #83127, Bug #25042101, Bug #24715897)
  • When a data node running with StopOnError set to 0 underwent an unplanned shutdown, the automatic restart performed the same type of start as the previous one. In the case where the data node had previously been started with the --initial option, this meant that an initial start was performed, which in cases of multiple data node failures could lead to loss of data. This issue also occurred whenever a data node shutdown led to generation of a core dump. A check is now performed to catch all such cases, and to perform a normal restart instead.
  • In addition, in cases where a failed data node was unable prior to shutting down to send start phase information to the angel process, the shutdown was always treated as a startup failure, also leading to an initial restart. This issue is fixed by adding a check to execute startup failure handling only if a valid start phase was received from the client. (Bug #83510, Bug #24945638)
  • When ndbmtd was built on Solaris/SPARC with version 5.3 of the GNU tools, data nodes using the resulting binary failed during startup. (Bug #83500, Bug #24941880)
  • MySQL Cluster failed to compile using GCC 6. (Bug #83308, Bug #24822203)
  • When a data node was restarted, the node was first stopped, and then, after a fixed wait, the management server assumed that the node had entered the NOT_STARTED state, at which point, the node was sent a start signal. If the node was not ready because it had not yet completed stopping (and was therefore not actually in NOT_STARTED), the signal was silently ignored.
  • To fix this issue, the management server now checks to see whether the data node has in fact reached the NOT_STARTED state before sending the start signal.
  • CMake now avoids configuring the -fexpensive-optimizations option for GCC versions for which the option triggers faulty shift-or optimizations. (Bug #24947597, Bug #83517)

New in MySQL Cluster 7.5.3 RC (Jul 8, 2016)

  • Functionality Added or Changed:
  • Added three new tables to the ndbinfo information database to provide running information about locks and lock attempts in an active MySQL Cluster. These tables, with brief descriptions, are listed here; an example query follows the list:
  • cluster_locks: Current lock requests which are waiting for or holding locks; this information can be useful when investigating stalls and deadlocks. Analogous to cluster_operations.
  • locks_per_fragment: Counts of lock claim requests, and their outcomes per fragment, as well as total time spent waiting for locks successfully and unsuccessfully. Analogous to operations_per_fragment and memory_per_fragment.
  • server_locks: Subset of cluster transactions—those running on the local mysqld, showing a connection ID per transaction. Analogous to server_operations.
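  • The new tables can be queried like any other ndbinfo tables; for example, a quick first look when investigating a stall might be the following (a sketch; see the ndbinfo documentation for the exact column sets):

      SELECT * FROM ndbinfo.cluster_locks;
      SELECT * FROM ndbinfo.locks_per_fragment;
      SELECT * FROM ndbinfo.server_locks;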
  • Important Change: It is now possible to set READ_BACKUP for an existing table online using an SQL statement such as ALTER TABLE ... ALGORITHM=INPLACE, COMMENT="NDB_TABLE=READ_BACKUP=1". See Setting NDB_TABLE options in table comments, for further information about the READ_BACKUP option. (Bug #80858, Bug #23001617)
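  • A concrete form of the statement mentioned above, using a hypothetical table t1:

      ALTER TABLE t1 ALGORITHM=INPLACE, COMMENT="NDB_TABLE=READ_BACKUP=1";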
  • Bugs Fixed:
  • The ndbinfo cpustat_1sec and cpustat_20sec tables did not provide any history information. (Bug #23520271)
  • During shutdown, the mysqld process could sometimes hang after logging NDB Util: Stop ... NDB Util: Wakeup. (Bug #23343739)
  • During expansion or reduction of a hash table, allocating a new overflow page in the DBACC kernel block caused the data node to fail when it was out of index memory. This could sometimes occur after a large table had increased or decreased very rapidly in size. (Bug #23304519)
  • When tracking time elapsed from sending a SCAN_FRAGREQ signal to receiving a corresponding SCAN_FRAGCONF, the assumption was made in the DBTC kernel block that a SCAN_FRAGCONF can occur only after sending a SCAN_FRAGREQ or SCAN_NEXTREQ signal, which is not always the case: It is actually possible that a local query handler can, immediately after sending a SCAN_FRAGCONF, send an additional SCAN_FRAGCONF signal upon reporting that the scan is closed. This problem is fixed by ensuring that the timer value is initialized each time before use. (Bug #81907, Bug #23605817, Bug #23306695)
  • Table indexes were listed in the output of ndb_desc in a nondeterministic order that could vary between platforms. Now these indexes are ordered by ID in the output. (Bug #81763, Bug #23547742)
  • Following a restart of the cluster, the first attempt to read from any of the ndbinfo cpustat, cpustat_50ms, cpustat_1sec, or cpustat_20sec tables generated a warning to the effect that columns were missing from the table. Subsequently, the thread_sleeping and spin_time columns were found to be missing from each of these tables. (Bug #81681, Bug #23514557)
  • Using a ThreadConfig parameter value with a trailing comma led to an assertion. (Bug #81588, Bug #23344374)
  • Cluster API: The scan lock takeover issued by NdbScanOperation::lockCurrentTuple() did not set the operation type for the takeover operation. (Bug #23314028)

New in MySQL Cluster 7.4.11 (Jun 1, 2016)

  • Functionality Added or Changed:
  • Cluster API: Added the Ndb::setEventBufferQueueEmptyEpoch() method, which makes it possible to enable queuing of empty events (event type TE_EMPTY). (Bug #22157845)
  • Bugs Fixed:
  • Important Change: The minimum value for the BackupDataBufferSize data node configuration parameter has been lowered from 2 MB to 512 KB. The default and maximum values for this parameter remain unchanged. (Bug #22749509)
  • Microsoft Windows: Performing ANALYZE TABLE on a table having one or more indexes caused ndbmtd to fail with an InvalidAttrInfo error due to signal corruption. This issue occurred consistently on Windows, but could also be encountered on other platforms. (Bug #77716, Bug #21441297)
  • During node failure handling, the request structure used to drive the cleanup operation was not maintained correctly when the request was executed. This led to inconsistencies that were harmless during normal operation, but these could lead to assertion failures during node failure handling, with subsequent failure of additional nodes. (Bug #22643129)
  • The previous fix for a lack of mutex protection for the internal TransporterFacade::deliver_signal() function was found to be incomplete in some cases. (Bug #22615274)
  • References: This issue is a regression of: Bug #77225, Bug #21185585.
  • Compilation of MySQL with Visual Studio 2015 failed in ConfigInfo.cpp, due to a change in Visual Studio's handling of spaces and concatenation. (Bug #22558836, Bug #80024)
  • When setup of the binary log as an atomic operation on one SQL node failed, this could trigger a state in other SQL nodes in which they appeared to detect the SQL node participating in schema change distribution, whereas it had not yet completed binary log setup. This could in turn cause a deadlock on the global metadata lock when the SQL node still retrying binary log setup needed this lock, while another mysqld had taken the lock for itself as part of a schema change operation. In such cases, the second SQL node waited for the first one to act on its schema distribution changes, which it was not yet able to do. (Bug #22494024)
  • Duplicate key errors could occur when ndb_restore was run on a backup containing a unique index. This was due to the fact that, during restoration of data, the database can pass through one or more inconsistent states prior to completion, such an inconsistent state possibly having duplicate values for a column which has a unique index. (If the restoration of data is preceded by a run with --disable-indexes and followed by one with --rebuild-indexes, these errors are avoided.)
  • Added a check for unique indexes in the backup which is performed only when restoring data, and which does not process tables that have explicitly been excluded. For each unique index found, a warning is now printed. (Bug #22329365)
  • Restoration of metadata with ndb_restore -m occasionally failed with the error message Failed to create index... when creating a unique index. While diagnosing this problem, it was found that the internal error PREPARE_SEIZE_ERROR (a temporary error) was reported as an unknown error. Now in such cases, ndb_restore retries the creation of the unique index, and PREPARE_SEIZE_ERROR is reported as NDB Error 748 Busy during read of event table. (Bug #21178339)
  • References: See also: Bug #22989944.
  • When setting up event logging for ndb_mgmd on Windows, MySQL Cluster tries to add a registry key to HKEY_LOCAL_MACHINE, which fails if the user does not have access to the registry. In such cases ndb_mgmd logged the error Could neither create or open key, which is not accurate and which can cause confusion for users who may not realize that file logging is available and being used. Now in such cases, ndb_mgmd logs a warning Could not create or access the registry key needed for the application to log to the Windows EventLog. Run the application with sufficient privileges once to create the key, or add the key manually, or turn off logging for that application. An error (as opposed to a warning) is now reported in such cases only if there is no available output at all for ndb_mgmd event logging. (Bug #20960839)
  • NdbDictionary metadata operations had a hard-coded 7-day timeout, which proved to be excessive for short-lived operations such as retrieval of table definitions. This could lead to unnecessary hangs in user applications which were difficult to detect and handle correctly. To help address this issue, timeout behaviour is modified so that read-only or short-duration dictionary interactions have a 2-minute timeout, while schema transactions of potentially long duration retain the existing 7-day timeout.
  • Such timeouts are intended as a safety net: In the event of problems, these return control to users, who can then take corrective action. Any reproducible issue with NdbDictionary timeouts should be reported as a bug. (Bug #20368354)
  • Optimization of signal sending by buffering signals and sending them periodically, or when the buffer became full, could cause SUB_GCP_COMPLETE_ACK signals to be excessively delayed. Such signals are sent for each node and epoch, with a minimum interval of TimeBetweenEpochs; if they are not received in time, the SUMA buffers can overflow as a result. The overflow caused API nodes to be disconnected, leading to current transactions being aborted due to node failure. This condition made it difficult for long transactions (such as altering a very large table) to be completed. Now in such cases, the ACK signal is sent without being delayed. (Bug #18753341)
  • An internal function used to validate connections failed to update the connection count when creating a new Ndb object. This had the potential to create a new Ndb object for every operation validating the connection, which could have an impact on performance, particularly when performing schema operations. (Bug #80750, Bug #22932982)
  • When an SQL node was started, and joined the schema distribution protocol, another SQL node, already waiting for a schema change to be distributed, timed out during that wait. This was because the code incorrectly assumed that the new SQL node would also acknowledge the schema distribution even though the new node joined too late to be a participant in it.
  • As part of this fix, printouts of schema distribution progress now always print the more significant part of a bitmask before the less significant; formatting of bitmasks in such printouts has also been improved. (Bug #80554, Bug #22842538)
  • MySQL Cluster did not compile correctly with Microsoft Visual Studio 2015, due to a change from previous versions in the VS implementation of the _vsnprintf() function. (Bug #80276, Bug #22670525)
  • When setting CPU spin time, the value was needlessly cast to a boolean internally, so that setting it to any nonzero value yielded an effective value of 1. This issue, as well as the fix for it, apply both to setting the SchedulerSpinTimer parameter and to setting spintime as part of a ThreadConfig parameter value. (Bug #80237, Bug #22647476)
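  • For illustration, both ways of setting a spin time mentioned above now take effect as given; a hypothetical config.ini fragment (values are examples only):

      [ndbd default]
      SchedulerSpinTimer=400

      # or, when ThreadConfig is used, per thread type:
      ThreadConfig=ldm={count=4,spintime=400},tc={count=2,spintime=100},main={count=1},rep={count=1}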
  • Processing of local checkpoints was not handled correctly on Mac OS X, due to an uninitialized variable. (Bug #80236, Bug #22647462)
  • A logic error in an if statement in storage/ndb/src/kernel/blocks/dbacc/DbaccMain.cpp rendered useless a check for determining whether ZREAD_ERROR should be returned when comparing operations. This was detected when compiling with gcc using -Werror=logical-op. (Bug #80155, Bug #22601798)
  • References: This issue is a regression of: Bug #21285604.
  • The ndb_print_file utility failed consistently on Solaris 9 for SPARC. (Bug #80096, Bug #22579581)
  • Builds with the -Werror and -Wextra flags (as for release builds) failed on SLES 11. (Bug #79950, Bug #22539531)
  • When using CREATE INDEX to add an index on either of two NDB tables sharing circular foreign keys, the query succeeded but a temporary table was left on disk, breaking the foreign key constraints. This issue was also observed when attempting to create an index on a table in the middle of a chain of foreign keys—that is, a table having both parent and child keys, but on different tables. The problem did not occur when using ALTER TABLE to perform the same index creation operation; and subsequent analysis revealed unintended differences in the way such operations were performed by CREATE INDEX.
  • To fix this problem, we now make sure that operations performed by a CREATE INDEX statement are always handled internally in the same way and at the same time that the same operations are handled when performed by ALTER TABLE or DROP INDEX. (Bug #79156, Bug #22173891)
  • NDB failed to ignore index prefixes on primary and unique keys, causing CREATE TABLE and ALTER TABLE statements using them to be rejected. (Bug #78441, Bug #21839248)
  • Cluster API: Executing a transaction with an NdbIndexOperation based on an obsolete unique index caused the data node process to fail. Now the index is checked in such cases, and if it cannot be used the transaction fails with an appropriate error. (Bug #79494, Bug #22299443)
  • Integer overflow could occur during client handshake processing, leading to a server exit. (Bug #22722946)
  • For busy servers, client connection or communication failure could occur if an I/O-related system call was interrupted. The mysql_options() C API function now has a MYSQL_OPT_RETRY_COUNT option to control the number of retries for interrupted system calls. (Bug #22336527)
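
As an illustration of this option, the following is a minimal client-side sketch (the retry count shown is an arbitrary example value, and connection details and error handling are omitted):

    #include <mysql.h>

    int main(void)
    {
      MYSQL *conn = mysql_init(NULL);
      /* Example only: retry interrupted I/O-related system calls up to 3 times. */
      unsigned int retries = 3;
      mysql_options(conn, MYSQL_OPT_RETRY_COUNT, &retries);
      /* ... mysql_real_connect() and normal client usage would follow here ... */
      mysql_close(conn);
      return 0;
    }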

New in MySQL Cluster 7.5.1 M2 (Mar 18, 2016)

  • FUNCTIONALITY ADDED OR CHANGED:
  • Scans have been improved by replacing the DIH_SCAN_GET_NODES_REQ signal, formerly used for communication between the DBTC and DBDIH kernel blocks in NDB, with the DIGETNODESREQ signal, which supports direct execution and allows for the elimination of DIH_SCAN_GET_NODES_REF and DIH_SCAN_GET_NODES_CONF, as well as for DIH_SCAN_TAB_REQ and DIH_SCAN_TAB_COMPLETE_REP signals to use EXECUTE_DIRECT. This enables higher scalability of data nodes when used for scan operations by decreasing the use of CPU resources for scan operations by an estimated five percent in some cases. This should also improve response times, which could help prevent issues with overload of the main threads. As part of these changes, scans made in the BACKUP kernel block have also been improved and made more efficient.
  • The following improvements have been made to event buffer reporting in the cluster log:
  • Each report now identifies the API node that sent it. 
  • The fields shown in the report have been improved and expanded. Percentages are now better specified and used only when appropriate. For improved clarity, the apply_epoch and latest_epoch fields have been renamed to latest_consumed_epoch and latest_buffered_epoch, respectively. The ID of the Ndb object serving as the source of the report is now shown, as is the reason for making the report (as the report_reason field).
  • The frequency of unnecessary reporting has been reduced by limiting reports to those showing only significant changes in event buffer usage.
  • The MGM API adds a new NDB_LE_EventBufferStatus2 event type to handle the additional information provided by the new event buffer reporting. The NDB_LE_EventBufferStatus event type used in older versions of MySQL Cluster is now deprecated, and will eventually be removed. For more information, see Event Buffer Reporting in the Cluster Log, as well as The ndb_logevent Structure.

  • BUGS FIXED:
  • Important Change: The minimum value for the BackupDataBufferSize data node configuration parameter has been lowered from 2 MB to 512 KB. The default and maximum values for this parameter remain unchanged. (Bug #22749509)
  • During node failure handling, the request structure used to drive the cleanup operation was not maintained correctly when the request was executed. This led to inconsistencies that were harmless during normal operation, but these could lead to assertion failures during node failure handling, with subsequent failure of additional nodes. (Bug #22643129)
  • The previous fix for a lack of mutex protection for the internal TransporterFacade::deliver_signal() function was found to be incomplete in some cases. (Bug #22615274)
  • When setup of the binary log as an atomic operation on one SQL node failed, this could trigger a state in other SQL nodes in which they appeared to detect the SQL node participating in schema change distribution, whereas it had not yet completed binary log setup. This could in turn cause a deadlock on the global metadata lock when the SQL node still retrying binary log setup needed this lock, while another mysqld had taken the lock for itself as part of a schema change operation. In such cases, the second SQL node waited for the first one to act on its schema distribution changes, which it was not yet able to do. (Bug #22494024)
  • Duplicate key errors could occur when ndb_restore was run on a backup containing a unique index. This was due to the fact that, during restoration of data, the database can pass through one or more inconsistent states prior to completion, such an inconsistent state possibly having duplicate values for a column which has a unique index. (If the restoration of data is preceded by a run with --disable-indexes and followed by one with --rebuild-indexes, these errors are avoided.) Added a check for unique indexes in the backup which is performed only when restoring data, and which does not process tables that have explicitly been excluded. For each unique index found, a warning is now printed. (Bug #22329365)
  • When setting up event logging for ndb_mgmd on Windows, MySQL Cluster tries to add a registry key to HKEY_LOCAL_MACHINE, which fails if the user does not have access to the registry. In such cases ndb_mgmd logged the error Could neither create or open key, which is not accurate and which can cause confusion for users who may not realize that file logging is available and being used. Now in such cases, ndb_mgmd logs a warning Could not create or access the registry key needed for the application to log to the Windows EventLog. Run the application with sufficient privileges once to create the key, or add the key manually, or turn off logging for that application. An error (as opposed to a warning) is now reported in such cases only if there is no available output at all for ndb_mgmd event logging. (Bug #20960839)
  • NdbDictionary metadata operations had a hard-coded 7-day timeout, which proved to be excessive for short-lived operations such as retrieval of table definitions. This could lead to unnecessary hangs in user applications which were difficult to detect and handle correctly. To help address this issue, timeout behavior is modified so that read-only or short-duration dictionary interactions have a 2-minute timeout, while schema transactions of potentially long duration retain the existing 7-day timeout. Such timeouts are intended as a safety net: In the event of problems, these return control to users, who can then take corrective action. Any reproducible issue with NdbDictionary timeouts should be reported as a bug. (Bug #20368354)
  • Optimization of signal sending, by buffering signals and sending them periodically or when the buffer became full, could cause SUB_GCP_COMPLETE_ACK signals to be excessively delayed. Such signals are sent for each node and epoch, with a minimum interval of TimeBetweenEpochs; if they are not received in time, the SUMA buffers can overflow as a result. The overflow caused API nodes to be disconnected, leading to current transactions being aborted due to node failure. This condition made it difficult for long transactions (such as altering a very large table) to be completed. Now in such cases, the ACK signal is sent without being delayed. (Bug #18753341)
  • MySQL Cluster did not compile correctly with Microsoft Visual Studio 2015, due to a change from previous versions in the VS implementation of the _vsnprintf() function. (Bug #80276, Bug #22670525)
  • When setting CPU spin time, the value was needlessly cast to a boolean internally, so that setting it to any nonzero value yielded an effective value of 1. This issue, as well as the fix for it, apply both to setting the SchedulerSpinTimer parameter and to setting spintime as part of a ThreadConfig parameter value. (Bug #80237, Bug #22647476)
  • Processing of local checkpoints was not handled correctly on Mac OS X, due to an uninitialized variable. (Bug #80236, Bug #22647462)
  • A logic error in an if statement in storage/ndb/src/kernel/blocks/dbacc/DbaccMain.cpp rendered useless a check for determining whether ZREAD_ERROR should be returned when comparing operations. This was detected when compiling with gcc using -Werror=logical-op. (Bug #80155, Bug #22601798)
  • Suppressed a CMake warning that was caused by use of an incorrectly quoted variable name. (Bug #80066, Bug #22572632)
  • When using CREATE INDEX to add an index on either of two NDB tables sharing circular foreign keys, the query succeeded but a temporary table was left on disk, breaking the foreign key constraints. This issue was also observed when attempting to create an index on a table in the middle of a chain of foreign keys—that is, a table having both parent and child keys, but on different tables. The problem did not occur when using ALTER TABLE to perform the same index creation operation; and subsequent analysis revealed unintended differences in the way such operations were performed by CREATE INDEX. To fix this problem, we now make sure that operations performed by a CREATE INDEX statement are always handled internally in the same way and at the same time that the same operations are handled when performed by ALTER TABLE or DROP INDEX. (Bug #79156, Bug #22173891)
  • The PortNumber SCI, SHM, and TCP configuration parameters, which were deprecated in MySQL 5.1.3, have now been removed and are no longer accepted in configuration files. This change does not affect the PortNumber management node configuration parameter, whose behavior remains unaltered. (Bug #77405, Bug #21280456)
  • Compilation of MySQL with Visual Studio 2015 failed in ConfigInfo.cpp, due to a change in Visual Studio's handling of spaces and concatenation.
  • For busy servers, client connection or communication failure could occur if an I/O-related system call was interrupted. The mysql_options() C API function now has a MYSQL_OPT_RETRY_COUNT option to control the number of retries for interrupted system calls. (Bug #22336527)
  • References: See also Bug #22389653.

New in MySQL Cluster 7.4.10 (Jan 29, 2016)

  • Fixes a major regression in performance during restarts found in MySQL Cluster NDB 7.4.8 which also affected MySQL Cluster NDB 7.4.9. Users of previous releases of MySQL Cluster can and should bypass the 7.4.8 and 7.4.9 releases when performing an upgrade, and upgrade directly to MySQL Cluster NDB 7.4.10 or later.

New in MySQL Cluster 7.4.9 (Jan 19, 2016)

  • Functionality Added or Changed:
  • Important Change: Previously, the NDB scheduler always optimized for speed against throughput in a predetermined manner (this was hard coded); this balance can now be set using the SchedulerResponsiveness data node configuration parameter. This parameter accepts an integer in the range of 0-10 inclusive, with 5 as the default. Higher values provide better response times relative to throughput. Lower values provide increased throughput, but impose longer response times. (Bug #78531, Bug #21889312)
  • Added the tc_time_track_stats table to the ndbinfo information database. This table provides time-tracking information relating to transactions, key operations, and scan operations performed by NDB. (Bug #78533, Bug #21889652)
  • Cluster Replication: Normally, RESET SLAVE causes all entries to be deleted from the mysql.ndb_apply_status table. This release adds the ndb_clear_apply_status system variable, which makes it possible to override this behavior. This variable is ON by default; setting it to OFF keeps RESET SLAVE from purging the ndb_apply_status table. (Bug #12630403)
  • Bugs Fixed:
  • Important Change: Users can now set the number and length of connection timeouts allowed by most NDB programs with the --connect-retries and --connect-retry-delay command line options introduced for the programs in this release. For ndb_mgm, --connect-retries supersedes the existing --try-reconnect option. (Bug #57576, Bug #11764714)
  • In debug builds, a WAIT_EVENT while polling caused excessive logging to stdout. (Bug #22203672)
  • When executing a schema operation such as CREATE TABLE on a MySQL Cluster with multiple SQL nodes, it was possible for the SQL node on which the operation was performed to time out while waiting for an acknowledgement from the others. This could occur when different SQL nodes had different settings for --ndb-log-updated-only, --ndb-log-update-as-write, or other mysqld options affecting binary logging by NDB. This happened due to the fact that, in order to distribute schema changes between them, all SQL nodes subscribe to changes in the ndb_schema system table, and that all SQL nodes are made aware of each other's subscriptions by subscribing to TE_SUBSCRIBE and TE_UNSUBSCRIBE events. The names of events to subscribe to are constructed from the table names, adding REPL$ or REPLF$ as a prefix. REPLF$ is used when full binary logging is specified for the table. The issue described previously arose because different values for the options mentioned could lead to different events being subscribed to by different SQL nodes, meaning that all SQL nodes were not necessarily aware of each other, so that the code that handled waiting for schema distribution to complete did not work as designed. To fix this issue, MySQL Cluster now treats the ndb_schema table as a special case and enforces full binary logging at all times for this table, independent of any settings for mysqld binary logging options. (Bug #22174287, Bug #79188)
  • Attempting to create an NDB table having greater than the maximum supported combined width for all BIT columns (4096) caused data node failure when these columns were defined with COLUMN_FORMAT DYNAMIC. (Bug #21889267)
  • Creating a table with the maximum supported number of columns (512) all using COLUMN_FORMAT DYNAMIC led to data node failures. (Bug #21863798)
  • In certain cases, a cluster failure (error 4009) was reported as Unknown error code. (Bug #21837074)
  • For a timeout in GET_TABINFOREQ while executing a CREATE INDEX statement, mysqld returned Error 4243 (Index not found) instead of the expected Error 4008 (Receive from NDB failed). The fix for this bug also fixes similar timeout issues for a number of other signals that are sent to the DBDICT kernel block as part of DDL operations, including ALTER_TAB_REQ, CREATE_INDX_REQ, DROP_FK_REQ, DROP_INDX_REQ, INDEX_STAT_REQ, DROP_FILE_REQ, CREATE_FILEGROUP_REQ, DROP_FILEGROUP_REQ, CREATE_EVENT, WAIT_GCP_REQ, DROP_TAB_REQ, and LIST_TABLES_REQ, as well as several internal functions used in handling NDB schema operations. (Bug #21277472)
  • When using ndb_mgm STOP -f to force a node shutdown, even when it triggered a complete shutdown of the cluster, it was possible to lose data when a sufficient number of nodes were shut down, triggering a cluster shutdown, and the timing was such that SUMA handovers had been made to nodes already in the process of shutting down. (Bug #17772138)
  • The internal NdbEventBuffer::set_total_buckets() method calculated the number of remaining buckets incorrectly. This caused any incomplete epoch to be prematurely completed when the SUB_START_CONF signal arrived out of order. Any events belonging to this epoch arriving later were then ignored, and so effectively lost, which resulted in schema changes not being distributed correctly among SQL nodes. (Bug #79635, Bug #22363510)
  • Compilation of MySQL Cluster failed on SUSE Linux Enterprise Server 12. (Bug #79429, Bug #22292329)
  • Schema events were appended to the binary log out of order relative to non-schema events. This was caused by the fact that the binlog injector did not properly handle the case where schema events and non-schema events were from different epochs. This fix modifies the handling of events from the two schema and non-schema event streams such that events are now always handled one epoch at a time, starting with events from the oldest available epoch, without regard to the event stream in which they occur. (Bug #79077, Bug #22135584, Bug #20456664)
  • When executed on an NDB table, ALTER TABLE ... DROP INDEX made changes to an internal array referencing the indexes before the index was actually dropped, and did not revert these changes in the event that the drop was not completed. One effect of this was that, after attempting to drop an index on which there was a foreign key dependency, the expected error referred to the wrong index, and subsequent attempts using SQL to modify indexes of this table failed. (Bug #78980, Bug #22104597)
  • NDB failed during a node restart due to the status of the current local checkpoint being set but not as active, even though it could have other states under such conditions. (Bug #78780, Bug #21973758)
  • ndbmtd checked for signals being sent only after a full cycle in run_job_buffers, which is performed for all job buffer inputs. Now this is done as part of run_job_buffers itself, which avoids executing for extended periods of time without sending to other nodes or flushing signals to other threads. (Bug #78530, Bug #21889088)
  • Disk Data: A unique index on a column of an NDB table is implemented with an associated internal ordered index, used for scanning. While dropping an index, this ordered index was dropped first, followed by the drop of the unique index itself. This meant that, when the drop was rejected due to (for example) a constraint violation, the statement was rejected but the associated ordered index remained deleted, so that any subsequent operation using a scan on this table failed. We fix this problem by causing the unique index to be removed first, before removing the ordered index; removal of the related ordered index is no longer performed when removal of a unique index fails. (Bug #78306, Bug #21777589)
  • Cluster Replication: While the binary log injector thread was handling failure events, it was possible for all NDB tables to be left indefinitely in read-only mode. This was due to a race condition between the binlog injector thread and the utility thread handling events on the ndb_schema table, and to the fact that, when handling failure events, the binlog injector thread places all NDB tables in read-only mode until all such events are handled and the thread restarts itself. When the binlog injector thread receives a group of one or more failure events, it drops all other existing event operations and expects no more events from the utility thread until it has handled all of the failure events and then restarted itself. However, it was possible for the utility thread to continue attempting binary log setup while the injector thread was handling failures and thus attempting to create the schema distribution tables as well as event subscriptions on these tables. If the creation of these tables and event subscriptions occurred during this time, the binlog injector thread's expectation that there were no further event operations was never met; thus, the injector thread never restarted, and NDB tables remained in read-only mode as described previously. To fix this problem, the Ndb object that handles schema events is now definitely dropped once the ndb_schema table drop event is handled, so that the utility thread cannot create any new events until after the injector thread has restarted, at which time, a new Ndb object for handling schema events is created. (Bug #17674771, Bug #19537961, Bug #22204186, Bug #22361695)
  • Cluster API: The binlog injector did not work correctly with TE_INCONSISTENT event type handling by Ndb::nextEvent(). (Bug #22135541)
  • Cluster API: Ndb::pollEvents() and pollEvents2() were slow to receive events, being dependent on other client threads or blocks to perform polling of transporters on their behalf. This fix allows a client thread to perform its own transporter polling when it has to wait in either of these methods. Introduction of transporter polling also revealed a problem with missing mutex protection in the ndbcluster_binlog handler, which has been added as part of this fix. (Bug #20957068, Bug #22224571, Bug #79311)
  • Cluster API: Garbage collection is performed on several objects in the implementation of NdbEventOperation, based on which GCIs have been consumed by clients, including those that have been dropped by Ndb::dropEventOperation(). In this implementation, the assumption was made that the global checkpoint index (GCI) is always monotonically increasing, although this is not the case during an initial restart, when the GCI is reset. This could lead to event objects in the NDB API being released prematurely or not at all, in the latter case causing a resource leak. To prevent this from happening, the NDB event object's implementation now tracks, internally, both the GCI and the generation of the GCI; the generation is incremented whenever the node process is restarted, and this value is now used to provide a monotonically increasing sequence. (Bug #73781, Bug #21809959)

New in MySQL Cluster 7.4.7 (Jul 14, 2015)

  • Functionality Added or Changed:
  • Deprecated MySQL Cluster node configuration parameters are now indicated as such by ndb_config --configinfo --xml. For each parameter currently deprecated, the corresponding tag in the XML output now includes the attribute deprecated="true". (Bug #21127135)
  • ClusterJ: Under high workload, it was possible to overload the direct memory used to back domain objects, because direct memory is not garbage collected in the same manner as objects allocated on the heap. Two strategies have been added to the ClusterJ implementation: first, direct memory is now pooled, so that when the domain object is garbage collected, the direct memory can be reused by another domain object. Additionally, a new user-level method, release(instance), has been added to the Session interface, which allows users to release the direct memory before the corresponding domain object is garbage collected. See the description for release(T) for more information. (Bug #20504741)
  • Bugs Fixed:
  • Important Change; Cluster API: The Ndb::getHighestQueuedEpoch() method returned the greatest epoch in the event queue instead of the greatest epoch found after calling pollEvents2(). (Bug #20700220)
  • Important Change; Cluster API: Ndb::pollEvents() is now compatible with the TE_EMPTY, TE_INCONSISTENT, and TE_OUT_OF_MEMORY event types introduced in MySQL Cluster NDB 7.4.3. For detailed information about this change, see the description of this method in the MySQL Cluster API Developer Guide. (Bug #20646496)
  • Important Change; Cluster API: Added the method Ndb::isExpectingHigherQueuedEpochs() to the NDB API to detect when additional, newer event epochs were detected by pollEvents2(). The behavior of Ndb::pollEvents() has also been modified such that it now returns NDB_FAILURE_GCI (equal to ~(Uint64) 0) when a cluster failure has been detected. (Bug #18753887)
  • After restoring the database metadata (but not any data) by running ndb_restore --restore_meta (or -m), SQL nodes would hang while trying to SELECT from a table in the database to which the metadata was restored. In such cases the attempt to query the table now fails as expected, since the table does not actually exist until ndb_restore is executed with --restore_data (-r). (Bug #21184102)
  • When a great many threads opened and closed blocks in the NDB API in rapid succession, the internal close_clnt() function synchronizing the closing of the blocks waited an insufficiently long time for a self-signal indicating potential additional signals needing to be processed. This led to excessive CPU usage by ndb_mgmd, and prevented other threads from opening or closing other blocks. This issue is fixed by changing the function polling call to wait on a specific condition to be woken up (that is, when a signal has in fact been executed). (Bug #21141495)
  • Previously, multiple send threads could be invoked for handling sends to the same node; these threads then competed for the same send lock. While the send lock blocked the additional send threads, work threads could be passed to other nodes. This issue is fixed by ensuring that new send threads are not activated while there is already an active send thread assigned to the same node. In addition, a node already having an active send thread assigned to it is no longer visible to other, already active, send threads; that is, such a node is no longer added to the node list when a send thread is currently assigned to it. (Bug #20954804, Bug #76821)
  • Queueing of pending operations when the redo log was overloaded (DefaultOperationRedoProblemAction API node configuration parameter) could lead to timeouts when data nodes ran out of redo log space (P_TAIL_PROBLEM errors). Now when the redo log is full, the node aborts requests instead of queuing them. (Bug #20782580)
  • An NDB event buffer can be used with an Ndb object to subscribe to table-level row change event streams. Users subscribe to an existing event; this causes the data nodes to start sending event data signals (SUB_TABLE_DATA) and epoch completion signals (SUB_GCP_COMPLETE) to the Ndb object (a sketch of starting such a subscription follows this list). SUB_GCP_COMPLETE_REP signals can arrive for execution in a concurrent receiver thread before completion of the internal method call used to start a subscription. Execution of SUB_GCP_COMPLETE_REP signals depends on the total number of SUMA buckets (sub data streams), but this may not yet have been set, leading to the present issue, when the counter used for tracking the SUB_GCP_COMPLETE_REP signals (TOTAL_BUCKETS_INIT) was found to be set to erroneous values. Now TOTAL_BUCKETS_INIT is tested to be sure it has been set correctly before it is used. (Bug #20575424)
  • NDB statistics queries could be delayed by the error delay set for ndb_index_stat_option (default 60 seconds) when the index that was queried had been marked with internal error. The same underlying issue could also cause ANALYZE TABLE to hang when executed against an NDB table having multiple indexes where an internal error occurred on one or more but not all indexes. Now in such cases, any existing statistics are returned immediately, without waiting for any additional statistics to be discovered. (Bug #20553313, Bug #20707694, Bug #76325)
  • The multi-threaded scheduler sends to remote nodes either directly from each worker thread or from dedicated send threads, depending on the cluster's configuration. This send might transmit all, part, or none of the available data from the send buffers. While there remained pending send data, the worker or send threads continued trying to send in a loop. The actual size of the data sent in the most recent attempt to perform a send is now tracked, and used to detect lack of send progress by the send or worker threads. When no progress has been made, and there is no other work outstanding, the scheduler takes a 1 millisecond pause to free up the CPU for use by other threads. (Bug #18390321)
  • On startup, API nodes (including mysqld processes running as SQL nodes) waited to connect with data nodes that had not yet joined the cluster. Now they wait only for data nodes that have actually already joined the cluster. In the case of a new data node joining an existing cluster, API nodes still try to connect with the new data node within HeartbeatIntervalDbApi milliseconds. (Bug #17312761)
  • In some cases, the DBDICT block failed to handle repeated GET_TABINFOREQ signals after the first one, leading to possible node failures and restarts. This could be observed after setting a sufficiently high value for MaxNoOfExecutionThreads and low value for LcpScanProgressTimeout. (Bug #77433, Bug #21297221)
  • Client lookup for delivery of API signals to the correct client by the internal TransporterFacade::deliver_signal() function had no mutex protection, which could cause issues such as timeouts encountered during testing, when other clients connected to the same TransporterFacade. (Bug #77225, Bug #21185585)
  • It was possible to end up with a lock on the send buffer mutex when send buffers became a limiting resource, due either to insufficient send buffer resource configuration, problems with slow or failing communications such that all send buffers became exhausted, or slow receivers failing to consume what was sent. In this situation worker threads failed to allocate send buffer memory for signals, and attempted to force a send in order to free up space, while at the same time the send thread was busy trying to send to the same node or nodes. All of these threads competed for taking the send buffer mutex, which resulted in the lock already described, reported by the watchdog as Stuck in Send. This fix was made in two parts.
  • Cluster API: The pollEvents2() method now waits indefinitely for events when a negative value is used for the time argument. (Bug #20762291)
  • Cluster API: NdbEventOperation::isErrorEpoch() incorrectly returned false for the TE_INCONSISTENT table event type (see The Event::TableEvent Type). This caused a subsequent call to getEventType() to fail. (Bug #20729091)
  • Cluster API: Creation and destruction of Ndb_cluster_connection objects by multiple threads could make use of the same application lock, which in some cases led to failures in the global dictionary cache. To alleviate this problem, the creation and destruction of several internal NDB API objects have been serialized. (Bug #20636124)
  • Cluster API: A number of timeouts were not handled correctly in the NDB API.
  • Cluster API: When an Ndb object created prior to a failure of the cluster was reused, the event queue of this object could still contain data node events originating from before the failure. These events could reference “old” epochs (from before the failure occurred), which in turn could violate the assumption made by the nextEvent() method that epoch numbers always increase. This issue is addressed by explicitly clearing the event queue in such cases. (Bug #18411034)
  • ClusterJ: When used with Java 1.7 or higher, ClusterJ might cause the Java VM to crash when querying tables with BLOB columns, because NdbDictionary::createRecord calculates the wrong size needed for the record. Subsequently, when ClusterJ called NdbScanOperation::nextRecordCopyOut, the data overran the allocated buffer space. With this fix, ClusterJ checks the size calculated by NdbDictionary::createRecord and uses the value for the buffer size, if it is larger than the value ClusterJ itself calculates. (Bug #20695155)
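
Referring back to the event buffer entry above (Bug #20575424), the following is a minimal NDB API sketch of starting such a table event subscription. It assumes that an event named my_event has already been created on the table using NdbDictionary::Event, and it omits error handling:

    #include <NdbApi.hpp>

    // Sketch only: subscribe to a previously created table event ("my_event" is an
    // assumed name). Once execute() succeeds, the data nodes begin sending
    // SUB_TABLE_DATA and SUB_GCP_COMPLETE signals to this Ndb object.
    void subscribe_example(Ndb *ndb)
    {
      NdbEventOperation *op = ndb->createEventOperation("my_event");
      if (op == NULL)
        return;                                  // event does not exist, or other error
      NdbRecAttr *post = op->getValue("id");     // post-change value of column "id"
      NdbRecAttr *pre  = op->getPreValue("id");  // pre-change value of column "id"
      (void) post; (void) pre;                   // read these after the event is received
      if (op->execute() != 0)                    // start the subscription
        ndb->dropEventOperation(op);
    }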

New in MySQL Cluster 7.4.6 (Apr 14, 2015)

  • Bugs Fixed:
  • During backup, loading data from one SQL node followed by repeated DELETE statements on the tables just loaded from a different SQL node could lead to data node failures. (Bug #18949230)
  • When an instance of NdbEventBuffer was destroyed, any references to GCI operations that remained in the event buffer data list were not freed. Now these are freed, and items from the event buffer data list are returned to the free list when purging GCI containers. (Bug #76165, Bug #20651661)
  • When a bulk delete operation was committed early to avoid an additional round trip while also returning the number of affected rows, and the commit failed with a timeout error, the SQL node performed no verification that the transaction was in the Committed state. (Bug #74494, Bug #20092754)

New in MySQL Cluster 7.4.5 (Mar 21, 2015)

  • Bugs Fixed:
  • It was found during testing that problems could arise when the node registered as the arbitrator disconnected or failed during the arbitration process. In this situation, the node requesting arbitration could never receive a positive acknowledgement from the registered arbitrator; this node also lacked a stable set of members and could not initiate selection of a new arbitrator. Now in such cases, when the arbitrator fails or loses contact during arbitration, the requesting node immediately fails rather than waiting to time out. (Bug #20538179)
  • DROP DATABASE failed to remove the database when the database directory contained a .ndb file which had no corresponding table in NDB. Now, when executing DROP DATABASE, NDB performs a check specifically for leftover .ndb files, and deletes any that it finds. (Bug #20480035)
  • The maximum failure time calculation used to ensure that normal node failure handling mechanisms are given time to handle survivable cluster failures (before global checkpoint watchdog mechanisms start to kill nodes due to GCP delays) was excessively conservative, and neglected to consider that there can be at most number_of_data_nodes / NoOfReplicas node failures before the cluster can no longer survive. Now the value of NoOfReplicas is properly taken into account when performing this calculation. (Bug #20069617, Bug #20062754)
  • During a node restart, if there was no global checkpoint completed between the START_LCP_REQ for a local checkpoint and its LCP_COMPLETE_REP, it was possible for a comparison of the LCP ID sent in the LCP_COMPLETE_REP signal with the internal value SYSFILE->latestLCP_ID to fail. (Bug #76113, Bug #20631645)
  • When sending LCP_FRAG_ORD signals as part of master takeover, it is possible that the master is not synchronized with complete accuracy in real time, so that some signals must be dropped. During this time, the master can send a LCP_FRAG_ORD signal with its lastFragmentFlag set even after the local checkpoint has been completed. This enhancement causes this flag to persist until the start of the next local checkpoint, which causes these signals to be dropped as well. This change affects ndbd only; the issue described did not occur with ndbmtd. (Bug #75964, Bug #20567730)
  • When reading and copying transporter short signal data, it was possible for the data to be copied back to the same signal with overlapping memory. (Bug #75930, Bug #20553247)
  • NDB node takeover code made the assumption that there would be only one takeover record when starting a takeover, based on the further assumption that the master node could never perform copying of fragments. However, this is not the case in a system restart, where a master node can have stale data and so need to perform such copying to bring itself up to date. (Bug #75919, Bug #20546899)
  • Cluster API: A scan operation, whether it is a single table scan or a query scan used by a pushed join, stores the result set in a buffer. The maximum size of this buffer is calculated and preallocated before the scan operation is started. This buffer may consume a considerable amount of memory; in some cases we observed a 2 GB buffer footprint in tests that executed 100 parallel scans with 2 single-threaded (ndbd) data nodes. This memory consumption was found to scale linearly with additional fragments.


New in MySQL Cluster 7.4.4 (Feb 26, 2015)

  • MySQL Cluster NDB 7.4.4 is a new release of MySQL Cluster, based on MySQL Server 5.6 and including features under development for version 7.4 of the NDB storage engine, as well as fixing a number of recently discovered bugs in previous MySQL Cluster releases.
  • This release also incorporates all bugfixes and changes made in previous MySQL Cluster releases, as well as all bugfixes and feature changes which were added in mainline MySQL 5.6 through MySQL 5.6.23.

New in MySQL Cluster 7.4.3 RC (Jan 22, 2015)

  • FUNCTIONALITY ADDED OR CHANGED:
  • Important Change; Cluster API: This release introduces an epoch-driven Event API for the NDB API that supersedes the earlier GCI-based model. The new version of this API also simplifies error detection and handling, and monitoring of event buffer memory usage has been improved. 
New event handling methods for Ndb and NdbEventOperation added by this change include NdbEventOperation::getEventType2(), pollEvents2(), nextEvent2(), getHighestQueuedEpoch(), getNextEventOpInEpoch2(), getEpoch(), isEmptyEpoch(), and isErrorEpoch. The pollEvents(), nextEvent(), getLatestGCI(), getGCIEventOperations(), isConsistent(), isConsistentGCI(), getEventType(), getGCI(), isOverrun(), hasError(), and clearError() methods are deprecated beginning with the same release. 
Some (but not all) of the new methods act as replacements for deprecated methods; not all of the deprecated methods map to new ones. The Event Class provides information as to which old methods correspond to new ones. 
Error handling using the new API is no longer handled using dedicated hasError() and clearError() methods, which are now deprecated as previously noted. To support this change, TableEvent now supports the values TE_EMPTY (empty epoch), TE_INCONSISTENT (inconsistent epoch), and TE_OUT_OF_MEMORY (event buffer out of memory). 
Event buffer memory management has also been improved with the introduction of the get_eventbuffer_free_percent(), set_eventbuffer_free_percent(), and get_eventbuffer_memory_usage() methods. Memory buffer usage can now be represented in applications using the EventBufferMemoryUsage data structure, and checked from MySQL client applications by reading the ndb_eventbuffer_free_percent system variable. 
For more information, see the detailed descriptions for the Ndb and NdbEventOperation methods listed. See also The Event::TableEvent Type, as well as The EventBufferMemoryUsage Structure. 
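
To make the new interface more concrete, the following is a minimal, hedged sketch of an event consumer written against the epoch-driven methods listed above; the polling interval is an arbitrary example value, and handling of row data and errors is reduced to comments:

    #include <NdbApi.hpp>

    // Sketch only: poll for buffered epochs with pollEvents2() and drain them with
    // nextEvent2(), classifying exceptional epochs via getEventType2().
    void consume_events(Ndb *ndb)
    {
      for (;;)
      {
        int res = ndb->pollEvents2(1000);        // wait up to 1000 ms for buffered epochs
        if (res < 0)
          break;                                 // polling failed
        if (res == 0)
          continue;                              // nothing buffered yet

        NdbEventOperation *op;
        while ((op = ndb->nextEvent2()) != NULL)
        {
          switch (op->getEventType2())
          {
          case NdbDictionary::Event::TE_EMPTY:         // empty epoch: nothing to apply
            break;
          case NdbDictionary::Event::TE_INCONSISTENT:  // exceptional epochs: resynchronize
          case NdbDictionary::Event::TE_OUT_OF_MEMORY:
            break;
          case NdbDictionary::Event::TE_INSERT:
          case NdbDictionary::Event::TE_UPDATE:
          case NdbDictionary::Event::TE_DELETE:
            // apply the row change carried by this event operation
            break;
          default:
            break;
          }
        }
      }
    }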

  • Additional logging is now performed of internal states occurring during system restarts such as waiting for node ID allocation and master takeover of global and local checkpoints. (Bug #74316, Bug #19795029)

  • Added the operations_per_fragment table to the ndbinfo information database. Using this table, you can now obtain counts of operations performed on a given fragment (or fragment replica). Such operations include reads, writes, updates, and deletes, scan and index operations performed while executing them, and operations refused, as well as information relating to rows scanned on and returned from a given fragment replica. This table also provides information about interpreted programs used as attribute values, and values returned by them. 

  • Cluster API: Two new example programs, demonstrating reads and writes of CHAR, VARCHAR, and VARBINARY column values, have been added to storage/ndb/ndbapi-examples in the MySQL Cluster source tree. For more information about these programs, including source code listings, see NDB API Simple Array Example, and NDB API Simple Array Example Using Adapter. 

  • BUGS FIXED:
  • The global checkpoint commit and save protocols can be delayed by various causes, including slow disk I/O. The DIH master node monitors the progress of both of these protocols, and can enforce a maximum lag time during which the protocols are stalled by killing the node responsible for the lag when it reaches this maximum. This DIH master GCP monitor mechanism did not perform its task more than once per master node; that is, it failed to continue monitoring after detecting and handling a GCP stop. (Bug #20128256)

  • When running mysql_upgrade on a MySQL Cluster SQL node, the expected drop of the performance_schema database on this node was instead performed on all SQL nodes connected to the cluster. (Bug #20032861)

  • The warning shown when an ALTER TABLE ALGORITHM=INPLACE ... ADD COLUMN statement automatically changes a column's COLUMN_FORMAT from FIXED to DYNAMIC now includes the name of the column whose format was changed. (Bug #20009152, Bug #74795)

  • The local checkpoint ScanFrag watchdog and the global checkpoint monitor can each exclude a node when it is too slow when participating in their respective protocols. This exclusion was implemented by simply asking the failing node to shut down, which in case this was delayed (for whatever reason) could prolong the duration of the GCP or LCP stall for other, unaffected nodes. 
To minimize this time, an isolation mechanism has been added to both protocols whereby any other live nodes forcibly disconnect the failing node after a predetermined amount of time. This allows the failing node the opportunity to shut down gracefully (after logging debugging and other information) if possible, but limits the time that other nodes must wait for this to occur. Now, once the remaining live nodes have processed the disconnection of any failing nodes, they can commence failure handling and restart the related protocol or protocols, even if the failed node takes an excessively long time to shut down. (Bug #19858151)

  • The matrix of values used for thread configuration when applying the setting of the MaxNoOfExecutionThreads configuration parameter has been improved to align with support for greater numbers of LDM threads. See Multi-Threading Configuration Parameters (ndbmtd), for more information about the changes. (Bug #75220, Bug #20215689)

  • When a new node failed after connecting to the president but not to any other live node, then reconnected and started again, a live node that did not see the original connection retained old state information. This caused the live node to send redundant signals to the president, causing it to fail. (Bug #75218, Bug #20215395)

  • In the NDB kernel, it was possible for a TransporterFacade object to reset a buffer while the data contained by the buffer was being sent, which could lead to a race condition. (Bug #75041, Bug #20112981)

  • mysql_upgrade failed to drop and recreate the ndbinfo database and its tables as expected. (Bug #74863, Bug #20031425)

  • Due to a lack of memory barriers, MySQL Cluster programs such as ndbmtd did not compile on POWER platforms. (Bug #74782, Bug #20007248)

  • In spite of the presence of a number of protection mechanisms against overloading signal buffers, it was still in some cases possible to do so. This fix adds block-level support in the NDB kernel (in SimulatedBlock) to make signal buffer overload protection more reliable than when implementing such protection on a case-by-case basis. (Bug #74639, Bug #19928269)

  • Copying of metadata during local checkpoints caused node restart times to be highly variable which could make it difficult to diagnose problems with restarts. The fix for this issue introduces signals (including PAUSE_LCP_IDLE, PAUSE_LCP_REQUESTED, and PAUSE_NOT_IN_LCP_COPY_META_DATA) to pause LCP execution and flush LCP reports, making it possible to block LCP reporting at times when LCPs during restarts become stalled in this fashion. (Bug #74594, Bug #19898269)

  • When a data node was restarted from its angel process (that is, following a node failure), it could be allocated a new node ID before failure handling was actually completed for the failed node. (Bug #74564, Bug #19891507)

  • In NDB version 7.4, node failure handling can require completing checkpoints on up to 64 fragments. (This checkpointing is performed by the DBLQH kernel block.) The requirement for master takeover to wait for completion of all such checkpoints led in such cases to excessive length of time for completion. 
To address these issues, the DBLQH kernel block can now report that it is ready for master takeover before it has completed any ongoing fragment checkpoints, and can continue processing these while the system completes the master takeover. (Bug #74320, Bug #19795217)

  • Local checkpoints were sometimes started earlier than necessary during node restarts, while the node was still waiting for copying of the data distribution and data dictionary to complete. (Bug #74319, Bug #19795152)

  • The check to determine when a node was restarting and so know when to accelerate local checkpoints sometimes reported a false positive. (Bug #74318, Bug #19795108)

  • Values in different columns of the ndbinfo tables disk_write_speed_aggregate and disk_write_speed_aggregate_node were reported using differing multiples of bytes. Now all of these columns display values in bytes. 
In addition, this fix corrects an error made when calculating the standard deviations used in the std_dev_backup_lcp_speed_last_10sec, std_dev_redo_speed_last_10sec, std_dev_backup_lcp_speed_last_60sec, and std_dev_redo_speed_last_60sec columns of the ndbinfo.disk_write_speed_aggregate table. (Bug #74317, Bug #19795072)

  • Recursion in the internal method Dblqh::finishScanrec() led to an attempt to create two list iterators with the same head. This regression was introduced during work done to optimize scans for version 7.4 of the NDB storage engine. (Bug #73667, Bug #19480197)

  • Cluster API: It was possible to delete an Ndb_cluster_connection object while there remained instances of Ndb using references to it. Now the Ndb_cluster_connection destructor waits for all related Ndb objects to be released before completing. (Bug #19999242)
References: See also Bug #19846392.
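
As context for this fix, a minimal sketch of the shutdown ordering an NDB API application should still follow (release all Ndb instances, then the Ndb_cluster_connection) is shown below; the connection string and database name are placeholders:

    #include <NdbApi.hpp>

    int main()
    {
      ndb_init();
      Ndb_cluster_connection *conn =
          new Ndb_cluster_connection("localhost:1186");  // placeholder connect string
      conn->connect();
      conn->wait_until_ready(30, 0);

      Ndb *ndb = new Ndb(conn, "test");                  // placeholder database name
      ndb->init();

      // ... application work using ndb ...

      delete ndb;   // release all Ndb objects first ...
      delete conn;  // ... then the cluster connection; its destructor now waits for
                    // any remaining Ndb objects to be released before completing.
      ndb_end(0);
      return 0;
    }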

  • ClusterJ: ClusterJ reported a segmentation violation when an application closed a session factory while some sessions were still active. This was because MySQL Cluster allowed an Ndb_cluster_connection object to be deleted while some Ndb instances were still active, which might result in the usage of null pointers by ClusterJ. This fix stops that happening by preventing ClusterJ from closing a session factory when any of its sessions are still active. (Bug #19846392)


New in MySQL Cluster 7.3.8 (Jan 22, 2015)

  • FUNCTIONALITY ADDED OR CHANGED:
  • Performance: Recent improvements made to the multithreaded scheduler were intended to optimize the cache behavior of its internal data structures, with members of these structures placed such that those local to a given thread do not overflow into a cache line which can be accessed by another thread. Where required, extra padding bytes are inserted to isolate cache lines owned (or shared) by other threads, thus avoiding invalidation of the entire cache line if another thread writes into a cache line not entirely owned by itself. This optimization improved MT Scheduler performance by several percent.
It has since been found that the optimization just described depends on the global instance of struct thr_repository starting at a cache line aligned base address, as well as on the compiler not rearranging or adding extra padding to the scheduler struct; it was also found that these prerequisites were not guaranteed (or even checked). Thus this cache line optimization previously worked only when g_thr_repository (that is, the global instance) happened to be cache line aligned by accident. In addition, on 64-bit platforms, the compiler added extra padding words in struct thr_safe_pool such that attempts to pad it to a cache line aligned size failed.
The current fix ensures that g_thr_repository is constructed on a cache line aligned address, and the constructors have been modified so as to verify cache line aligned addresses where these are assumed by design (a generic sketch of this kind of alignment follows this list).
Results from internal testing show improvements in MT Scheduler read performance of up to 10% in some cases, following these changes. (Bug #18352514)
  • Cluster API: Two new example programs, demonstrating reads and writes of CHAR, VARCHAR, and VARBINARY column values, have been added to storage/ndb/ndbapi-examples in the MySQL Cluster source tree. For more information about these programs, including source code listings, see NDB API Simple Array Example, and NDB API Simple Array Example Using Adapter. 
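
Referring back to the cache line optimization entry above (Bug #18352514), the following generic C++ sketch (not the actual NDB source) illustrates the kind of alignment and padding being enforced; the 64-byte cache line size and the structure names are assumptions made for illustration:

    #include <cstddef>

    static const std::size_t CACHE_LINE = 64;             // assumed cache line size

    // Per-thread data padded to a full cache line to avoid false sharing.
    struct alignas(CACHE_LINE) PerThreadSlot {
      unsigned long counter;                               // thread-local member
      char pad[CACHE_LINE - sizeof(unsigned long)];        // explicit padding
    };

    // Shared repository whose base address must be cache line aligned.
    struct alignas(CACHE_LINE) Repository {
      PerThreadSlot slots[8];
    };

    // Compile-time check analogous to the run-time verification added by the fix.
    static_assert(alignof(Repository) == CACHE_LINE,
                  "repository must be cache line aligned");

    Repository g_repository;   // global instance; alignas fixes its base alignment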

  • BUGS FIXED:
  • The global checkpoint commit and save protocols can be delayed by various causes, including slow disk I/O. The DIH master node monitors the progress of both of these protocols, and can enforce a maximum lag time during which the protocols are stalled by killing the node responsible for the lag when it reaches this maximum. This DIH master GCP monitor mechanism did not perform its task more than once per master node; that is, it failed to continue monitoring after detecting and handling a GCP stop. (Bug #20128256)
  • When running mysql_upgrade on a MySQL Cluster SQL node, the expected drop of the performance_schema database on this node was instead performed on all SQL nodes connected to the cluster. (Bug #20032861)
  • Online reorganization when using ndbmtd data nodes and with binary logging by mysqld enabled could sometimes lead to failures in the TRIX and DBLQH kernel blocks, or to silent data corruption. (Bug #19903481)
  • The local checkpoint ScanFrag watchdog and the global checkpoint monitor can each exclude a node when it is too slow when participating in their respective protocols. This exclusion was implemented by simply asking the failing node to shut down, which in case this was delayed (for whatever reason) could prolong the duration of the GCP or LCP stall for other, unaffected nodes. 
To minimize this time, an isolation mechanism has been added to both protocols whereby any other live nodes forcibly disconnect the failing node after a predetermined amount of time. This allows the failing node the opportunity to shut down gracefully (after logging debugging and other information) if possible, but limits the time that other nodes must wait for this to occur. Now, once the remaining live nodes have processed the disconnection of any failing nodes, they can commence failure handling and restart the related protocol or protocols, even if the failed node takes an excessively long time to shut down. (Bug #19858151)
  • A watchdog failure resulted from a hang while freeing a disk page in TUP_COMMITREQ, due to use of an uninitialized block variable. (Bug #19815044, Bug #74380)
  • Multiple threads crashing led to multiple sets of trace files being printed and possibly to deadlocks. (Bug #19724313)
  • When a client retried, against a new master, a schema transaction that had previously failed against the previous master while the latter was restarting, the lock obtained by this transaction on the new master prevented the previous master from progressing past start phase 3 until the client was terminated and the resources held by it were cleaned up. (Bug #19712569, Bug #74154)
  • When using the NDB storage engine, the maximum possible length of a database or table name is 63 characters, but this limit was not always strictly enforced. This meant that a statement such as CREATE DATABASE, DROP DATABASE, or ALTER TABLE RENAME using a name having 64 characters could cause the SQL node on which it was executed to fail. Now such statements fail with an appropriate error message. (Bug #19550973)
  • When a new data node started, API nodes were allowed to attempt to register themselves with the data node for executing transactions before the data node was ready. This forced the API node to wait an extra heartbeat interval before trying again. To address this issue, a number of HA_ERR_NO_CONNECTION errors (Error 4009) that could be issued during this time have been changed to Cluster temporarily unavailable errors (Error 4035), which should allow API nodes to use new data nodes more quickly than before. As part of this fix, some errors which were incorrectly categorized have been moved into the correct categories, and some errors which are no longer used have been removed. (Bug #19524096, Bug #73758)
  • When executing very large pushdown joins involving one or more indexes each defined over several columns, it was possible in some cases for the DBSPJ block (see The DBSPJ Block) in the NDB kernel to generate SCAN_FRAGREQ signals that were excessively large. This caused data nodes to fail when these could not be handled correctly, due to a hard limit in the kernel on the size of such signals (32K). This fix bypasses that limitation by breaking up SCAN_FRAGREQ data that is too large for one such signal, and sending the SCAN_FRAGREQ as a chunked or fragmented signal instead. (Bug #19390895)
  • ndb_index_stat sometimes failed when used against a table containing unique indexes. (Bug #18715165)
  • Queries against tables containing a CHAR(0) column failed with ERROR 1296 (HY000): Got error 4547 'RecordSpecification has overlapping offsets' from NDBCLUSTER. (Bug #14798022)
  • In the NDB kernel, it was possible for a TransporterFacade object to reset a buffer while the data contained by the buffer was being sent, which could lead to a race condition. (Bug #75041, Bug #20112981)
  • mysql_upgrade failed to drop and recreate the ndbinfo database and its tables as expected. (Bug #74863, Bug #20031425)
  • Due to a lack of memory barriers, MySQL Cluster programs such as ndbmtd did not compile on POWER platforms. (Bug #74782, Bug #20007248)
  • In some cases, when run against a table having an AFTER DELETE trigger, a DELETE statement that matched no rows still caused the trigger to execute. (Bug #74751, Bug #19992856)
  • A basic requirement of the NDB storage engine's design is that the transporter registry not attempt to receive data (TransporterRegistry::performReceive()) from and update the connection status (TransporterRegistry::update_connections()) of the same set of transporters concurrently, due to the fact that the updates perform final cleanup and reinitialization of buffers used when receiving data. Changing the contents of these buffers while reading or writing to them could lead to "garbage" or inconsistent signals being read or written. 
During the course of work done previously to improve the implementation of the transporter facade, a mutex intended to protect against the concurrent use of the performReceive() and update_connections() methods on the same transporter was inadvertently removed. This fix adds a watchdog check for concurrent usage. In addition, update_connections() and performReceive() calls are now serialized together while polling the transporters. (Bug #74011, Bug #19661543)
  • ndb_restore failed while restoring a table which contained both a built-in conversion on the primary key and a staging conversion on a TEXT column. 
During staging, a BLOB table is created with a primary key column of the target type. However, a conversion function was not provided to convert the primary key values before loading them into the staging blob table, which resulted in corrupted primary key values in the staging BLOB table. While moving data from the staging table to the target table, the BLOB read failed because it could not find the primary key in the BLOB table. 
Now all BLOB tables are checked to see whether there are conversions on primary keys of their main tables. This check is done after all the main tables are processed, so that conversion functions and parameters have already been set for the main tables. Any conversion functions and parameters used for the primary key in the main table are now duplicated in the BLOB table. (Bug #73966, Bug #19642978)
  • Corrupted messages to data nodes sometimes went undetected, causing a bad signal to be delivered to a block which aborted the data node. This failure in combination with disconnecting nodes could in turn cause the entire cluster to shut down. 
To keep this from happening, additional checks are now made when unpacking signals received over TCP, including checks for byte order, compression flag (which must not be used), and the length of the next message in the receive buffer (if there is one). 
Whenever two consecutive unpacked messages fail the checks just described, the current message is assumed to be corrupted. In this case, the transporter is marked as having bad data and no more unpacking of messages occurs until the transporter is reconnected. In addition, an entry is written to the cluster log containing the error as well as a hex dump of the corrupted message. (Bug #73843, Bug #19582925)
  • Transporter send buffers were not updated properly following a failed send. (Bug #45043, Bug #20113145)
  • ndb_restore --print_data truncated TEXT and BLOB column values to 240 bytes rather than 256 bytes. 
Disk Data: When a node acting as a DICT master fails, the arbitrator selects another node to take over in place of the failed node. During the takeover procedure, which includes cleaning up any schema transactions which are still open when the master failed, the disposition of the uncommitted schema transaction is decided. Normally this transaction be rolled back, but if it has completed a sufficient portion of a commit request, the new master finishes processing the commit. Until the fate of the transaction has been decided, no new TRANS_END_REQ messages from clients can be processed. In addition, since multiple concurrent schema transactions are not supported, takeover cleanup must be completed before any new transactions can be started. 
A similar restriction applies to any schema operations which are performed in the scope of an open schema transaction. The counter used to coordinate schema operation across all nodes is employed both during takeover processing and when executing any non-local schema operations. This means that starting a schema operation while its schema transaction is in the takeover phase causes this counter to be overwritten by concurrent uses, with unpredictable results. 
The scenarios just described were handled previously using a pseudo-random delay when recovering from a node failure. Now we check whether the new master has rolled forward or back any schema transactions remaining after the failure of the previous master, and we avoid starting new schema transactions or performing operations within old ones until takeover processing has cleaned up after the abandoned transaction. (Bug #19874809, Bug #74503)
  • Disk Data: When a node acting as DICT master fails, it is still possible to request that any open schema transaction be either committed or aborted by sending this request to the new DICT master. In this event, the new master takes over the schema transaction and reports back on whether the commit or abort request succeeded. In certain cases, it was possible for the new master to be misidentified—that is, the request was sent to the wrong node, which responded with an error that was interpreted by the client application as an aborted schema transaction, even in cases where the transaction could have been successfully committed, had the correct node been contacted. (Bug #74521, Bug #19880747)
  • Cluster Replication: When an NDB client thread made a request to flush the binary log using statements such as FLUSH BINARY LOGS or SHOW BINLOG EVENTS, this caused not only the most recent changes made by this client to be flushed, but all recent changes made by all other clients to be flushed as well, even though this was not needed. This behavior caused unnecessary waiting for the statement to execute, which could lead to timeouts and other issues with replication. Now such statements flush the most recent database changes made by the requesting thread only. 
As part of this fix, the status variables Ndb_last_commit_epoch_server, Ndb_last_commit_epoch_session, and Ndb_slave_max_replicated_epoch, originally implemented in MySQL Cluster NDB 7.4, are also now available in MySQL Cluster NDB 7.3. For descriptions of these variables, see MySQL Cluster Status Variables; for further information, see MySQL Cluster Replication Conflict Resolution. (Bug #19793475)
  • Cluster Replication: It was possible using wildcards to set up conflict resolution for an exceptions table (that is, a table named using the suffix $EX), which should not be allowed. Now when a replication conflict function is defined using wildcard expressions, these are checked for possible matches so that, in the event that the function would cover an exceptions table, it is not set up for this table. (Bug #19267720)
  • Cluster API: It was possible to delete an Ndb_cluster_connection object while there remained instances of Ndb using references to it. Now the Ndb_cluster_connection destructor waits for all related Ndb objects to be released before completing (see the sketch following this entry). (Bug #19999242)
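A minimal teardown sketch in the NDB API, using a placeholder connect string and omitting most error handling, illustrates the ordering that avoids the problem (and what the fixed destructor now tolerates):

    #include <NdbApi.hpp>

    int main()
    {
        ndb_init();

        // "mgmhost:1186" is a placeholder connect string.
        Ndb_cluster_connection *conn = new Ndb_cluster_connection("mgmhost:1186");
        if (conn->connect(4 /* retries */, 5 /* delay */, 1 /* verbose */) != 0 ||
            conn->wait_until_ready(30, 0) < 0)
            return 1;

        Ndb *ndb = new Ndb(conn, "test");
        ndb->init();

        /* ... use the Ndb object ... */

        // Release every Ndb instance before deleting the connection; with this
        // fix the Ndb_cluster_connection destructor also waits for any Ndb
        // objects still referencing it rather than leaving them dangling.
        delete ndb;
        delete conn;

        ndb_end(0);
        return 0;
    }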
  • Cluster API: The buffer allocated by an NdbScanOperation for receiving scanned rows was not released until the NdbTransaction owning the scan operation was closed. This could lead to excessive memory usage in an application where multiple scans were created within the same transaction, even if these scans were closed at the end of their lifecycle, unless NdbScanOperation::close() was invoked with the releaseOp argument equal to true. Now the buffer is released whenever the cursor navigating the result set is closed with NdbScanOperation::close(), regardless of the value of this argument (see the sketch following this entry). (Bug #75128, Bug #20166585)
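The affected pattern, several scans opened and closed inside a single transaction, can be sketched as follows; the table name t1 and column c1 are hypothetical, and error handling is trimmed:

    #include <NdbApi.hpp>

    // Assumes an initialized Ndb object.
    void scan_repeatedly(Ndb *ndb)
    {
        NdbTransaction *trans = ndb->startTransaction();

        for (int i = 0; i < 100; i++)
        {
            NdbScanOperation *scan = trans->getNdbScanOperation("t1");
            scan->readTuples(NdbOperation::LM_CommittedRead);
            NdbRecAttr *val = scan->getValue("c1");

            if (trans->execute(NdbTransaction::NoCommit) == 0)
            {
                while (scan->nextResult(true) == 0)
                {
                    /* consume val->u_32_value(), etc. */
                }
            }

            // Formerly the receive buffer was freed only when the second
            // (releaseOp) argument was true; it is now freed whenever the
            // cursor is closed.
            scan->close(false /* forceSend */, true /* releaseOp */);
        }

        ndb->closeTransaction(trans);
    }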
  • ClusterJ: The com.mysql.clusterj.tie class emitted a logging message at the INFO level for every query, which was unnecessary and affected the performance of applications that used ClusterJ. (Bug #20017292)
  • ClusterJ: ClusterJ reported a segmentation violation when an application closed a session factory while some sessions were still active. This was because MySQL Cluster allowed an Ndb_cluster_connection object to be deleted while some Ndb instances were still active, which might result in the use of null pointers by ClusterJ. This fix prevents ClusterJ from closing a session factory while any of its sessions are still active. (Bug #19846392)
  • ClusterJ: The following errors were logged at the SEVERE level; they are now logged at the NORMAL level, as they should be:
  • Duplicate primary key 
  • Duplicate unique key 
  • Foreign key constraint error: key does not exist 
  • Foreign key constraint error: key exists
  • A number of problems relating to the fired triggers pool have been fixed, including the following issues:
  • When the fired triggers pool was exhausted, NDB returned Error 218 (Out of LongMessageBuffer). A new error code 221 is added to cover this case. 
  • An additional, separate case in which Error 218 was wrongly reported now returns the correct error. 
  • Setting low values for MaxNoOfFiredTriggers led to an error when no memory was allocated if there was only one hash bucket. 
  • An aborted transaction now releases any fired trigger records it held. Previously, these records were held until its ApiConnectRecord was reused by another transaction. 
  • In addition, for the Fired Triggers pool in the internal ndbinfo.ndb$pools table, the high value always equalled the total, because all records were momentarily seized when initializing them. Now the high value shows the maximum following completion of initialization. 


New in MySQL Cluster 7.3.7 (Oct 21, 2014)

  • Changes:
  • MySQL Cluster NDB 7.3.7 is a new release of MySQL Cluster, based on MySQL Server 5.6 and including features from version 7.3 of the NDB storage engine, as well as fixing a number of recently discovered bugs in previous MySQL Cluster releases.
  • Obtaining MySQL Cluster NDB 7.3. MySQL Cluster NDB 7.3 source code and binaries can be obtained from http://dev.mysql.com/downloads/cluster/.
  • For an overview of changes made in MySQL Cluster NDB 7.3, see MySQL Cluster Development in MySQL Cluster NDB 7.3.
  • This release also incorporates all bugfixes and changes made in previous MySQL Cluster releases, as well as all bugfixes and feature changes which were added in mainline MySQL 5.6 through MySQL 5.6.21.

New in MySQL Cluster 7.3.5 (Apr 8, 2014)

  • Functionality Added or Changed:
  • Handling of LongMessageBuffer shortages and statistics has been improved as follows:
  • The default value of LongMessageBuffer has been increased from 4 MB to 64 MB.
  • When this resource is exhausted, a suitable informative message is now printed in the data node log describing possible causes of the problem and suggesting possible solutions.
  • LongMessageBuffer usage information is now shown in the ndbinfo.memoryusage table. See the description of this table for an example and additional information.
  • Bugs Fixed:
  • Important Change: The server system variables ndb_index_stat_cache_entries and ndb_index_stat_update_freq, which had been deprecated in a previous MySQL Cluster release series, have now been removed. (Bug #11746486, Bug #26673)
  • When an ALTER TABLE statement changed table schemas without causing a change in the table's partitioning, the new table definition did not copy the hash map from the old definition, but used the current default hash map instead. However, the table data was not reorganized according to the new hash map, which made some rows inaccessible using a primary key lookup if the two hash maps had incompatible definitions. To keep this situation from occurring, any ALTER TABLE that entails a hash map change now triggers a reorganization of the table. In addition, when copying a table definition in such cases, the hash map is now also copied. (Bug #18436558)
  • When certain queries generated signals having more than 18 data words prior to a node failure, such signals were not written correctly in the trace file. (Bug #18419554)
  • The ServerPort and TcpBind_INADDR_ANY configuration parameters were not included in the output of ndb_mgmd --print-full-config. (Bug #18366909)
  • After dropping an NDB table, neither the cluster log nor the output of the REPORT MemoryUsage command showed that the IndexMemory used by that table had been freed, even though the memory had in fact been deallocated. This issue was introduced in MySQL Cluster NDB 7.3.2. (Bug #18296810)
  • -DWITH_NDBMTD=0 did not function correctly, which could cause the build to fail on platforms such as ARM and Raspberry Pi which do not define the memory barrier functions required to compile ndbmtd. (Bug #18267919)
  • ndb_show_tables sometimes failed with the error message Unable to connect to management server and immediately terminated, without providing the underlying reason for the failure. To provide more useful information in such cases, this program now also prints the most recent error from the Ndb_cluster_connection object used to instantiate the connection (see the sketch following this entry). (Bug #18276327)
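For reference, a minimal sketch of how a client can obtain the same information through the NDB API, assuming a version that provides the Ndb_cluster_connection::get_latest_error() and get_latest_error_msg() accessors and using a placeholder connect string:

    #include <cstdio>
    #include <NdbApi.hpp>

    int main()
    {
        ndb_init();
        Ndb_cluster_connection conn("mgmhost:1186");   // placeholder

        // Retry 4 times, 5 seconds apart, with verbose output enabled.
        if (conn.connect(4, 5, 1) != 0)
        {
            // The underlying reason, which ndb_show_tables now also prints.
            std::printf("connect failed: error %d: %s\n",
                        conn.get_latest_error(),
                        conn.get_latest_error_msg());
            ndb_end(0);
            return 1;
        }

        ndb_end(0);
        return 0;
    }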
  • The block threads managed by the multi-threading scheduler communicate by placing signals in an out queue or job buffer which is set up between all block threads. This queue has a fixed maximum size, such that when it is filled up, the worker thread must wait for the consumer to drain the queue. In a highly loaded system, multiple threads could end up in a circular wait lock due to full out buffers, such that they were preventing each other from performing any useful work. This condition eventually led to the data node being declared dead and killed by the watchdog timer. To fix this problem, we detect situations in which a circular wait lock is about to begin, and cause buffers which are otherwise held in reserve to become available for signal processing by queues which are highly loaded. (Bug #18229003)
  • The ndb_mgm client START BACKUP command (see Commands in the MySQL Cluster Management Client) could experience occasional random failures when a ping was received prior to an expected BackupCompleted event. Now the connection established by this command is not checked until it has been properly set up. (Bug #18165088)
  • An issue found when compiling the MySQL Cluster software for Solaris platforms could lead to problems when using ThreadConfig on such systems. (Bug #18181656)
  • When creating a table with a foreign key referencing an index in another table, it sometimes appeared possible to create the foreign key even if the order of the columns in the indexes did not match, due to the fact that an appropriate error was not always returned internally. This fix improves the error used internally to work in most cases; however, it is still possible for this situation to occur in the event that the parent index is a unique index. (Bug #18094360)
  • Updating parent tables of foreign keys used excessive scan resources and so required unusually high settings for MaxNoOfLocalScans and MaxNoOfConcurrentScans. (Bug #18082045)
  • Dropping a nonexistent foreign key on an NDB table (using, for example, ALTER TABLE) appeared to succeed. Now in such cases, the statement fails with a relevant error message, as expected. (Bug #17232212)
  • Data nodes running ndbmtd could stall while performing an online upgrade of a MySQL Cluster containing a great many tables from a version prior to NDB 7.2.5 to version 7.2.5 or later. (Bug #16693068)
  • Replication: Log rotation events could cause group_relay_log_pos to be moved forward incorrectly within a group. This meant that, when the transaction was retried, or if the SQL thread was stopped in the middle of a transaction following one or more log rotations (such that the transaction or group spanned multiple relay log files), part or all of the group was silently skipped. This issue has been addressed by correcting a problem in the logic used to avoid touching the coordinates of the SQL thread when updating the log position as part of a relay log rotation; previously, it was possible to update the SQL thread's coordinates when not using a multithreaded slave, even in the middle of a group. (Bug #18482854)
  • Cluster Replication: A slave in MySQL Cluster Replication now monitors the progression of epoch numbers received from its immediate upstream master, which can both serve as a useful check on the low-level functioning of replication, and provide a warning in the event replication is restarted accidentally at an already-applied position.
  • Cluster API: When an NDB API client application received a signal with an invalid block or signal number, NDB provided only a very brief error message that did not accurately convey the nature of the problem. Now in such cases, appropriate printouts are provided when a bad signal or message is detected. In addition, the message length is now checked to make certain that it matches the size of the embedded signal. (Bug #18426180)
  • Cluster API: Refactoring that was performed in MySQL Cluster NDB 7.3.4 inadvertently introduced a dependency in Ndb.hpp on a file that is not included in the distribution, which caused NDB API applications to fail to compile. The dependency has been removed. (Bug #18293112, Bug #71803)
  • Cluster API: An NDB API application sends a scan query to a data node; the scan is processed by the transaction coordinator (TC). The TC forwards a LQHKEYREQ request to the appropriate LDM, and aborts the transaction if it does not receive a LQHKEYCONF response within the specified time limit. After the transaction is successfully aborted, the TC sends a TCROLLBACKREP to the NDBAPI client, and the NDB API client processes this message by cleaning up any Ndb objects associated with the transaction.
  • Cluster API: The example ndbapi-examples/ndbapi_blob_ndbrecord/main.cpp included an internal header file (ndb_global.h) not found in the MySQL Cluster binary distribution. The example now uses stdlib.h and string.h instead of this file (see the snippet following this entry). (Bug #18096866, Bug #71409)
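The change to the example's includes amounts to the following (a sketch of the relevant lines only):

    /* Formerly the example relied on the internal header: */
    /* #include <ndb_global.h> */

    /* It now uses the standard headers instead: */
    #include <stdlib.h>
    #include <string.h>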
  • Cluster API: When Dictionary::dropTable() attempted (as a normal part of its internal operations) to drop an index used by a foreign key constraint, this led to data node failure. Now in such cases, dropTable() drops all foreign keys on the table being dropped, whether this table acts as a parent table, child table, or both (see the sketch following this entry). (Bug #18069680)
  • References: See also Bug #17591531.
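A minimal sketch of dropping such a table through the Dictionary interface; the table name child is hypothetical, and with this fix the call also removes any foreign keys in which the table participates before dropping it:

    #include <NdbApi.hpp>

    // Assumes an initialized Ndb object.
    int drop_with_foreign_keys(Ndb *ndb)
    {
        NdbDictionary::Dictionary *dict = ndb->getDictionary();

        // dropTable() now drops foreign keys on "child" (whether it is the
        // parent or the child table) instead of causing a data node failure.
        if (dict->dropTable("child") != 0)
        {
            /* inspect dict->getNdbError() for the reason */
            return -1;
        }
        return 0;
    }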
  • Cluster API: ndb_restore could sometimes report Error 701 System busy with other schema operation unnecessarily when restoring in parallel. (Bug #17916243)

New in MySQL Cluster 7.3.4 (Feb 6, 2014)

  • Incorporates all bug fixes and changes made in previous MySQL Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.6 through MySQL 5.6.15.

New in MySQL Cluster 7.3.3 (Dec 4, 2013)

  • FUNCTIONALITY ADDED OR CHANGED:
  • The length of time a management node waits for a heartbeat message from another management node is now configurable using the HeartbeatIntervalMgmdMgmd management node configuration parameter added in this release. The connection is considered dead after 3 missed heartbeats. The default value is 1500 milliseconds, or a timeout of approximately 6000 ms. (Bug #17807768, Bug #16426805)
  • The MySQL Cluster Auto-Installer now generates a my.cnf file for each mysqld in the cluster before starting it. For more information, see Using the MySQL Cluster Auto-Installer. (Bug #16994782)
  • BLOB and TEXT columns are now reorganized by the ALTER ONLINE TABLE ... REORGANIZE PARTITION statement. (Bug #13714148)
  • BUGS FIXED:
  • ndbmemcache:
  • When an attempt to start memcached with a cache_size larger than the available memory and with preallocate=true failed, the error message provided only a numeric code, and did not indicate what the actual source of the error was. (Bug #17509293, Bug #70403)
  • The CMakeLists.txt files for ndbmemcache wrote into the source tree, preventing out-of-source builds. (Bug #14650456)
  • ndbmemcache did not handle passed-in BINARY values correctly.
  • Performance:
  • In a number of cases found in various locations in the MySQL Cluster codebase, unnecessary iterations were performed; this was caused by failing to break out of a repeating control structure after a test condition had been met. This community-contributed fix removes the unneeded repetitions by supplying the missing breaks. (Bug #16904243, Bug #69392, Bug #16904338, Bug #69394, Bug #16778417, Bug #69171, Bug #16778494, Bug #69172, Bug #16798410, Bug #69207, Bug #16801489, Bug #69215, Bug #16904266, Bug #69393)
  • Packaging:
  • Portions of the documentation specific to MySQL Cluster and the NDB storage engine were not included when installing from RPMs. (Bug #16303451)
  • File system errors occurring during a local checkpoint were not always handled correctly and could cause an LCP to hang with no obvious cause. Now in such cases, these errors always cause the node to fail. Note that the LQH block always shuts down the node when a local checkpoint fails; the change here is to make this node failure occur more quickly and to make the original file system error more visible. (Bug #19691443)
  • Trying to restore to a table having a BLOB column in a different position from that of the original one caused ndb_restore --restore_data to fail. (Bug #17395298)
  • It was not possible to start MySQL Cluster processes created by the Auto-Installer on a Windows host running freeSSHd. (Bug #17269626)
  • ndb_restore could abort during the last stages of a restore using attribute promotion or demotion into an existing table. This could happen if a converted attribute was nullable and the backup had been run on an active database. (Bug #17275798)
  • ALTER ONLINE TABLE ... REORGANIZE PARTITION failed when run against a table having or using a reference to a foreign key. (Bug #17036744, Bug #69619)
  • The DBUTIL data node block is now less strict about the order in which it receives certain messages from other nodes. (Bug #17052422)
  • TUPKEYREQ signals are used to read data from the tuple manager block (DBTUP), and are used for all types of data access, especially for scans which read many rows. A TUPKEYREQ specifies a series of 'columns' to be read, which can be either single columns in a specific table, or pseudocolumns, two of which—READ_ALL and READ_PACKED—are aliases to read all columns in a table, or some subset of these columns. Pseudocolumns are used by modern NDB API applications as they require less space in the TUPKEYREQ to specify columns to be read, and can return the data in a more compact (packed) format.
  • This fix moves the creation and initialization of on-stack Signal objects to only those pseudocolumn reads which need to EXECUTE_DIRECT to other block instances, rather than for every read. In addition, the size of an on-stack signal is now varied to suit the requirements of each pseudocolumn, so that only reads of the INDEX_STAT pseudocolumn now require initialization (and 3KB memory each time this is performed). (Bug #17009502)
  • A race condition could sometimes occur when trying to lock receive threads to cores. (Bug #17009393)
  • RealTimeScheduler did not work correctly with data nodes running ndbmtd. (Bug #16961971)
  • Results from joins using a WHERE clause with an ORDER BY ... DESC clause were not sorted properly; the DESC keyword in such cases was effectively ignored. (Bug #16999886, Bug #69528)
  • The Windows error ERROR_FILE_EXISTS was not recognized by NDB, which treated it as an unknown error. (Bug #16970960)
  • Maintenance and checking of parent batch completion in the SPJ block of the NDB kernel was reimplemented. Among other improvements, the completion state of all ancestor nodes in the tree are now preserved. (Bug #16925513)
  • Dropping a column, which was not itself a foreign key, from an NDB table having foreign keys failed with ER_TABLE_DEF_CHANGED. (Bug #16912989)
  • The LCP fragment scan watchdog periodically checks for lack of progress in a fragment scan performed as part of a local checkpoint, and shuts down the node if there is no progress after a given amount of time has elapsed. This interval, formerly hard-coded as 60 seconds, can now be configured using the LcpScanProgressTimeout data node configuration parameter added in this release.
  • This configuration parameter sets the maximum time the local checkpoint can be stalled before the LCP fragment scan watchdog shuts down the node. The default is 60 seconds, which provides backward compatibility with previous releases.
  • You can disable the LCP fragment scan watchdog by setting this parameter to 0. (Bug #16630410)
  • Added the ndb_error_reporter options --connection-timeout, which makes it possible to set a timeout for connecting to nodes, --dry-scp, which disables scp connections to remote hosts, and --skip-nodegroup, which skips all nodes in a given node group. (Bug #16602002)
  • ndb_mgm treated backup IDs provided to ABORT BACKUP commands as signed values, so that backup IDs greater than 2^31 wrapped around to negative values. This issue also affected out-of-range backup IDs, which wrapped around to negative values instead of causing errors as expected in such cases. The backup ID is now treated as an unsigned value, and ndb_mgm now performs proper range checking for backup ID values greater than MAX_BACKUPS (2^32). (Bug #16585497, Bug #68798)
  • After issuing START BACKUP id WAIT STARTED, if id had already been used for a backup ID, an error caused by the duplicate ID occurred as expected, but following this, the START BACKUP command never completed. (Bug #16593604, Bug #68854)
  • When trying to specify a backup ID greater than the maximum allowed, the value was silently truncated. (Bug #16585455, Bug #68796)
  • The unexpected shutdown of another data node as a starting data node received its node ID caused the latter to hang in Start Phase 1. (Bug #16007980)
  • The NDB receive thread waited unnecessarily for additional job buffers to become available when receiving data. This caused the receive mutex to be held during this wait, which could result in a busy wait when the receive thread was running with real-time priority.
  • This fix also handles the case where a negative return value from the initial check of the job buffer by the receive thread prevented further execution of data reception, which could possibly lead to communication blockage or to underutilization of the configured ReceiveBufferMemory. (Bug #15907515)
  • When the available job buffers for a given thread fell below the critical threshold, the internal multi-threading job scheduler waited for job buffers for incoming rather than outgoing signals to become available, which meant that the scheduler waited the maximum timeout (1 millisecond) before resuming execution. (Bug #15907122)
  • SELECT ... WHERE ... LIKE from an NDB table could return incorrect results when using engine_condition_pushdown=ON. (Bug #15923467, Bug #67724)
  • Setting lower_case_table_names to 1 or 2 on Windows systems caused ALTER TABLE ... ADD FOREIGN KEY statements against tables with names containing uppercase letters to fail with Error 155, No such table: '(null)'. (Bug #14826778, Bug #67354)
  • When a node fails, the Distribution Handler (DBDIH kernel block) takes steps together with the Transaction Coordinator (DBTC) to make sure that all ongoing transactions involving the failed node are taken over by a surviving node and either committed or aborted. Transactions taken over which are then committed belong in the epoch that is current at the time the node failure occurs, so the surviving nodes must keep this epoch available until the transaction takeover is complete. This is needed to maintain ordering between epochs.
  • A problem was encountered in the mechanism intended to keep the current epoch open which led to a race condition between this mechanism and that normally used to declare the end of an epoch. This could cause the current epoch to be closed prematurely, leading to failure of one or more surviving data nodes. (Bug #14623333, Bug #16990394)
  • When using dynamic listening ports for accepting connections from API nodes, the port numbers were reported to the management server serially. This required a round trip for each API node, causing the time required for data nodes to connect to the management server to grow linearly with the number of API nodes. To correct this problem, each data node now reports all dynamic ports at once. (Bug #12593774)
  • Formerly, the node used as the coordinator or leader for distributed decision making between nodes (also known as the DICT manager—see The DBDICT Block) was indicated in the output of the ndb_mgm client SHOW command as the “master” node, although this node has no relationship to a master server in MySQL Replication. (It should also be noted that it is not necessary to know which node is the leader except when debugging NDBCLUSTER source code.) To avoid possible confusion, this label has been removed, and the leader node is now indicated in SHOW command output using an asterisk (*) character. (Bug #11746263, Bug #24880)
  • ndb_error_reporter did not support the --help option. (Bug #11756666, Bug #48606)
  • ABORT BACKUP in the ndb_mgm client (see Commands in the MySQL Cluster Management Client) took an excessive amount of time to return (approximately as long as the backup would have required to complete, had it not been aborted), and failed to remove the files that had been generated by the aborted backup. (Bug #68853, Bug #17719439)
  • Program execution failed to break out of a loop after meeting a desired condition in a number of internal methods, performing unneeded work in all cases where this occurred. (Bug #69610, Bug #69611, Bug #69736, Bug #17030606, Bug #17030614, Bug #17160263)
  • Attribute promotion and demotion when restoring data to NDB tables using ndb_restore --restore_data with the --promote-attributes and --lossy-conversions options has been improved as follows: Columns of types CHAR, and VARCHAR can now be promoted to BINARY and VARBINARY, and columns of the latter two types can be demoted to one of the first two. Note that converted character data is not checked to conform to any character set. Any of the types CHAR, VARCHAR, BINARY, and VARBINARY can now be promoted to TEXT or BLOB. When performing such promotions, the only other sort of type conversion that can be performed at the same time is between character types and binary types.
  • Cluster Replication:
  • Trying to use a stale .frm file and encountering a mismatch between table definitions could cause mysqld to make errors when writing to the binary log. (Bug #17250994)
  • Replaying a binary log that had been written by a mysqld from a MySQL Server distribution (and not from a MySQL Cluster distribution), and that contained DML statements, on a MySQL Cluster SQL node could lead to failure of the SQL node. (Bug #16742250)
  • Cluster API:
  • The Event::setTable() method now supports a pointer or a reference to a table as its required argument. If a null table pointer is used, the method now returns -1 to make it clear that this is what has occurred (see the sketch following this entry). (Bug #16329082)
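A short sketch of guarding against a null table pointer when defining an event; the event name my_event and the table t1 are hypothetical, and error handling is trimmed:

    #include <NdbApi.hpp>

    // Assumes an initialized Ndb object.
    int define_event(Ndb *ndb)
    {
        NdbDictionary::Dictionary *dict = ndb->getDictionary();
        const NdbDictionary::Table *tab = dict->getTable("t1");

        NdbDictionary::Event ev("my_event");

        // setTable() accepts either a pointer or a reference; with a null
        // pointer it now returns -1 instead of failing later.
        if (ev.setTable(tab) == -1)
            return -1;

        ev.addTableEvent(NdbDictionary::Event::TE_ALL);
        return dict->createEvent(ev);
    }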