Regression tests and fix for issue 116 #142

Open
wants to merge 1 commit into base: 5.6

Conversation

crispweed

(Unnecessary certification failures on TOI DDLs)

It looks like the provider API doesn't permit TOI replication without any keys, which is what we would ideally do here, so we instead just supply a table key that should not match any existing table (see the sketch after the list below).

The regression tests and fix target 3 specific table creation code paths (and do not address ALTER statements or other potential related issues):

  1. Basic CREATE TABLE statements
  2. CREATE TABLE LIKE, where the source table is a temporary table
  3. CREATE TABLE LIKE, where the source table is not a temporary table
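
For concreteness, a rough sketch of the workaround (not the actual patch): build a table key for a name that should never match a real table and pass it to the provider's TOI entry point, since the API rejects an empty key set. Declarations in the style of wsrep API v25 are assumed from wsrep_api.h, and the helper name is made up:

/* Hypothetical helper sketching the dummy-key workaround described
 * above; types and to_execute_start() follow wsrep API v25. */
#include <string.h>
#include "wsrep_api.h"

static wsrep_status_t toi_begin_with_dummy_key(wsrep_t*           provider,
                                               wsrep_conn_id_t    conn_id,
                                               const char*        db,
                                               const wsrep_buf_t* query,
                                               wsrep_trx_meta_t*  meta)
{
    /* A table name that should not match any existing table. */
    static const char dummy_table[] = "WSREP_NONEXISTANT_TABLE";

    /* A wsrep key is an ordered list of parts; table-level keys are
     * conventionally (database, table). */
    wsrep_buf_t parts[2] = {
        { db,          strlen(db)              },
        { dummy_table, sizeof(dummy_table) - 1 }
    };
    wsrep_key_t key = { parts, 2 };

    /* TOI replication requires at least one key, so supply the dummy
     * one instead of the real table's key. */
    return provider->to_execute_start(provider, conn_id,
                                      &key, 1,   /* one key           */
                                      query, 1,  /* one action buffer */
                                      meta);
}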

@philip-galera
Contributor

Hello. Thank you very much for pushing the tests. I have a few questions:

1. What about CREATE TABLE ... SELECT? Does it matter that the SELECT part involves tables?

2. Is it possible that concurrent execution of the code continues to cause conflicts, not on the original table name, but rather on the new, made-up WSREP_NONEXISTANT_TABLE?

@crispweed
Author

Hi Philip,

Regarding CREATE TABLE AS SELECT: this is not addressed by the pull request currently, since, as far as I can make out, it doesn't actually appear to be set up correctly for replication.

Try doing the following on a test cluster, for example:

create table t1 (f1 integer) ENGINE=InnoDB;
create table t2 as (select * from t1);

Regarding your question about potential new conflicts on WSREP_NONEXISTANT_TABLE, I don't think this is an issue, because this could only ever happen between CREATE TABLE statements, which are TOI anyway. (Definitely something to be careful about, though, if ever generalising this approach. Fixing the provider to support transactions without any keys would avoid this kind of concern and be generally cleaner, if that is possible!)

@temeo
Contributor

temeo commented Jun 5, 2015

Unfortunately this fix is not safe. With the following MTR test

--source include/galera_cluster.inc
--source include/have_innodb.inc
--source include/have_debug_sync.inc
--source suite/galera/include/galera_have_debug_sync.inc

SET GLOBAL wsrep_debug=ON;
SET GLOBAL wsrep_sync_wait=0;
SET GLOBAL wsrep_retry_autocommit=0;

#
# Make sure that there are at least two slave threads on node2,
# create debug sync point on apply monitor
#
--connection node_2
SET GLOBAL wsrep_debug=ON;
SET GLOBAL wsrep_provider_options = 'debug=1';
SET SESSION wsrep_sync_wait=0;
--let $wsrep_slave_threads_orig = `SELECT @@wsrep_slave_threads`
SET GLOBAL wsrep_slave_threads = 2;
SET GLOBAL wsrep_provider_options = 'debug=1;dbug=d,apply_monitor_slave_enter_sync';

#
# Issue CREATE TABLE on node1 and wait until it arrives at apply monitor
# sync point on node2
#
--connection node_1
CREATE TABLE t1 (f1 INTEGER) ENGINE=InnoDB;

--connection node_2
SELECT * FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_debug_sync_waiters';
--let $wait_condition = SELECT VARIABLE_VALUE = 'apply_monitor_slave_enter_sync' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_debug_sync_waiters';
--source include/wait_condition.inc

#
# Change sync point to commit monitor and issue INSERT on node1
#

SET GLOBAL wsrep_provider_options = 'dbug=';
SET GLOBAL wsrep_provider_options = 'dbug=d,commit_monitor_slave_enter_sync';

#
# Issue INSERT on node1
#

--connection node_1
INSERT INTO t1 VALUES (1);


#
# Wait until insert reaches commit monitor
#
--connection node_2
--let $wait_condition = SELECT VARIABLE_VALUE = 'apply_monitor_slave_enter_sync commit_monitor_slave_enter_sync' FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_debug_sync_waiters';
--source include/wait_condition.inc

SET GLOBAL wsrep_provider_options = 'dbug=';
SET GLOBAL wsrep_provider_options = 'signal=commit_monitor_slave_enter_sync';
SET GLOBAL wsrep_provider_options = 'signal=apply_monitor_slave_enter_sync';

--let $wait_condition = SELECT COUNT(*) = 1 FROM t1
--source include/wait_condition.inc

--eval SET GLOBAL wsrep_slave_threads = $wsrep_slave_threads_orig;

--connection node_1
DROP TABLE t1;

and the following patch to galera:

diff --git a/galera/src/replicator_smm.hpp b/galera/src/replicator_smm.hpp
index a3dc977..fe31fba 100644
--- a/galera/src/replicator_smm.hpp
+++ b/galera/src/replicator_smm.hpp
@@ -280,14 +280,19 @@ namespace galera
 #ifdef GU_DBUG_ON
             void debug_sync(gu::Mutex& mutex)
             {
+                unlock();
+                mutex.unlock();
                 if (trx_.is_local())
                 {
-                    unlock();
-                    mutex.unlock();
                     GU_DBUG_SYNC_WAIT("apply_monitor_enter_sync");
-                    mutex.lock();
-                    lock();
                 }
+                else
+                {
+                    GU_DBUG_SYNC_WAIT("apply_monitor_slave_enter_sync");
+                }
+                mutex.lock();
+                lock();
+
             }
 #endif // GU_DBUG_ON

@@ -357,14 +362,18 @@ namespace galera
 #ifdef GU_DBUG_ON
             void debug_sync(gu::Mutex& mutex)
             {
+                unlock();
+                mutex.unlock();
                 if (trx_.is_local())
                 {
-                    unlock();
-                    mutex.unlock();
                     GU_DBUG_SYNC_WAIT("commit_monitor_enter_sync");
-                    mutex.lock();
-                    lock();
                 }
+                else
+                {
+                    GU_DBUG_SYNC_WAIT("commit_monitor_slave_enter_sync");
+                }
+                mutex.lock();
+                lock();
             }
 #endif // GU_DBUG_ON

it is possible to crash slave reliably with error

2015-06-05 12:19:18 21439 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1146, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 0 seqno: 2)
2015-06-05 12:19:18 21439 [ERROR] Slave SQL: Error executing row event: 'Table 'test.t1' doesn't exist', Error_code: 1146
2015-06-05 12:19:18 21439 [Warning] WSREP: RBR event 3 Write_rows apply warning: 1146, 2
2015-06-05 12:19:18 21439 [Warning] WSREP: galera/src/replicator_smm.cpp:apply_trx_ws():78: Failed to apply app buffer: seqno: 2, status: 1
         at galera/src/trx_handle.cpp:apply():351
         at galera/src/replicator_smm.cpp:apply_trx_ws():39

This indicates that dependency calculation in certification on the galera side is not able to figure out the dependency between a TOI CREATE TABLE and a following autocommit transaction on that table.
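
To see why that breaks, here is a toy model (assumed for illustration, not galera's actual certification code) of how a key-based certification index drives both conflict detection and dependency calculation: with a made-up key on the CREATE TABLE, the following INSERT finds no matching entry and picks up no dependency on the DDL.

// Toy model of key-based dependency tracking; an assumption, not
// galera's implementation.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using seqno_t = std::int64_t;

struct CertIndex {
    std::map<std::string, seqno_t> last_writer; // key -> seqno of last write set

    // Returns the seqno this write set depends on: the highest seqno
    // that previously wrote any of the same keys. Also records the keys.
    seqno_t certify(const std::vector<std::string>& keys, seqno_t seqno) {
        seqno_t depends_on = -1;
        for (const auto& k : keys) {
            auto it = last_writer.find(k);
            if (it != last_writer.end() && it->second > depends_on)
                depends_on = it->second;
            last_writer[k] = seqno;
        }
        return depends_on;
    }
};

int main() {
    CertIndex cert;
    // With the proposed fix, CREATE TABLE t1 replicates a key for the
    // made-up table, not for t1:
    cert.certify({"test/WSREP_NONEXISTANT_TABLE"}, /*seqno=*/1);
    // The following INSERT references t1 and finds no matching key, so
    // depends_on stays -1: the applier is free to run it in parallel
    // with the still-in-progress CREATE TABLE, which is exactly the
    // crash shown above.
    seqno_t dep = cert.certify({"test/t1"}, /*seqno=*/2);
    return dep == -1 ? 0 : 1;
}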

@crispweed
Author

Ok, thanks Teemu!
So the table key is used not just for detecting 'collisions' with other transactions' write sets, but also for determining dependencies between TOI and non-TOI transactions?

@ayurchen
Member

ayurchen commented Jun 5, 2015

Unfortunately, yes. It looks like this hackish fix does not actually work and we'll need to devise a proper one. But I feel that it will have to be done at the cost of complicating the wsrep API.
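
Purely as an illustration of what "complicating the wsrep API" might mean (nothing like this exists in wsrep_api.h; every name here is hypothetical), one conceivable shape is a key kind that orders appliers without ever failing certification, so that CREATE TABLE could replicate its real table key safely:

/* Hypothetical only: not part of wsrep_api.h. */
#include <stddef.h>
#include "wsrep_api.h"

typedef enum wsrep_key_kind_hyp {
    WSREP_KEY_HYP_CONFLICTING,   /* fails certification on a match and
                                    orders appliers (today's behaviour) */
    WSREP_KEY_HYP_ORDERING_ONLY  /* orders appliers on a match, but never
                                    causes a certification failure */
} wsrep_key_kind_hyp_t;

/* A TOI entry point taking the extra key kind; again, hypothetical. */
typedef wsrep_status_t (*wsrep_to_execute_start_hyp_t)(
    wsrep_t*                wsrep,
    wsrep_conn_id_t         conn_id,
    const wsrep_key_t*      keys,
    size_t                  keys_num,
    wsrep_key_kind_hyp_t    key_kind,
    const struct wsrep_buf* action,
    size_t                  count,
    wsrep_trx_meta_t*       meta);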
