Commits


vinyl test


replication: append NOP as the last tx row Since we stopped sending local space operations in replication, the last tx row has to be global in order to preserve tx boundaries on replica. If the last row happens to be a local one, replica will never receive the tx end marker, yielding the following errors: `ER_UNSUPPORTED: replication does not support interleaving transactions`. In order to fix the problem append a global NOP row at the tx end if it happens to end on a local row. Follow-up #4114 Closes #4928 Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>


wal: fix tx boundaries In order to preserve transaction boundaries in replication protocol, wal assigns each tx row a transaction sequence number (tsn). Tsn is equal to the lsn of the first transaction row. Starting with commit 7eb4650eecf1ac382119d0038076c19b2708f4a1, local space requests are assigned a special replica id, 0, and have their own lsns. These operations are not replicated. If a transaction starting with a local space operation ends up in the WAL, it gets a tsn equal to the lsn of the local space request. Then, during replication, when such a transaction is replicated, the local space request is omitted, and replica receives a global part of the transaction with a seemingly random tsn, yielding an ER_PROTOCOL error: "Transaction id must be equal to LSN of the first row in the transaction". Assign tsn as equal to the lsn of the first global row in the transaction to fix the problem, and assign tsn as before for fully local transactions. Follow-up #4114 Part-of #4928 Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>


applier: fix tx boundary check for half-applied txns In case there are 2 "new" instances, running tarantool 2.2+, master and replica, and one "old" instance, running an earlier tarantool version, in a full-mesh cluster, it may happen that the "new" replica receives part of a tx from an "old" instance, and the remaining part from a "new" instance. Since "new" instances preserve tx boundaries, "new" replica would skip the tx remains assuming it has already applied the full tx if it has applied the first tx row. This leads to gaps in "new" replica's WAL and to skipping the remaining part of the tx forever. Fix this behaviour to apply the full tx even if it's beginning is already applied in mixed clusters. Closes #5125


test: fix flaky box/net.box_readahead_gh-3958 test Issue: [014] --- box/net.box_readahead_gh-3958.result Mon Jun 15 15:33:23 2020 [014] +++ box/net.box_readahead_gh-3958.reject Tue Jun 16 02:24:04 2020 [014] @@ -46,6 +46,7 @@ [014] ... [014] test_run:wait_log('default', 'readahead limit is reached', 1024, 0.1) [014] --- [014] +- readahead limit is reached [014] ... [014] s:drop() [014] --- [014] [014] Last 15 lines of Tarantool Log file [Instance "box"][/tarantool/test/var/014_box/box.log]: [014] 2020-06-16 02:24:03.792 [5585] main/121/console/unix/: I> set 'read_only' configuration option to false [014] 2020-06-16 02:24:03.834 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached [014] 2020-06-16 02:24:03.835 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached [014] 2020-06-16 02:24:03.835 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached [014] 2020-06-16 02:24:03.951 [5585] main/121/console/unix/: space.h:336 E> ER_NO_SUCH_INDEX_ID: No index #1 is defined in space '_space' [014] 2020-06-16 02:24:04.180 [5585] main/121/console/unix/: I> set 'readahead' configuration option to 128 [014] 2020-06-16 02:24:04.183 [5585] main/121/console/unix/: I> set 'readahead' configuration option to 102400 [014] 2020-06-16 02:24:04.189 [5585] main/453/console/unix/: I> set 'readahead' configuration option to 16320 Found that the root cause of the issue, was the previously run test 'box/net.box_call_blocks_gh-946.test.lua' on the same worker, in this case the log output mistakenly checked by wait_log/grep_log test_run function, which finds the grepping string in the log of the previous test. To avoid of it the tests can be swapped in worker running queue and in this case both tests pass, check swapped log output: 2020-06-17 10:57:39.881 [69372] main C> entering the event loop 2020-06-17 10:57:39.896 [69372] main/119/console/unix/: I> set 'readahead' configuration option to 128 2020-06-17 10:57:39.898 [69372] main/119/console/unix/: I> set 'readahead' configuration option to 102400 2020-06-17 10:57:40.003 [69372] main/156/console/unix/: I> set 'readahead' configuration option to 16320 2020-06-17 10:57:40.053 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached 2020-06-17 10:57:40.056 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached 2020-06-17 10:57:40.056 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached 2020-06-17 10:57:40.058 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached 2020-06-17 10:57:40.058 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached 2020-06-17 10:57:40.061 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached 2020-06-17 10:57:40.061 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached 2020-06-17 10:57:40.062 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached 2020-06-17 10:57:40.062 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached 2020-06-17 10:57:40.063 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached 2020-06-17 10:57:40.067 [69372] main C> got signal 15 - Terminated Also found that 'readahead' issue from the first test blocks its printing to log file due to suppressed. To fix this issue the default server must be restarted at the very start of the test. Closes #5082