Commits
- Commit:
b7b095a1563f47105aa86b80b1fd8f8fe5df9915
- From:
- Sergey Bronnikov <sergeyb@tarantool.org>
- Date:
httpc: replace ibuf_alloc with xibuf_alloc
The value returned by `ibuf_alloc` is not checked for NULL, so if the
allocation fails, NULL is passed to `memcpy()`. The patch fixes that by
replacing `ibuf_alloc` with the `xibuf_alloc` macro, which never returns
NULL.
Found by Svace.
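The idea behind an allocator that never returns NULL can be sketched as
follows. This is a minimal illustration, not Tarantool's `xibuf_alloc`
definition: `alloc_or_abort` and `copy_bytes` are hypothetical names, and
plain `malloc()` stands in for the ibuf allocator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not Tarantool's xibuf_alloc): allocate `size`
 * bytes or terminate the process, so callers never see NULL.
 */
static void *
alloc_or_abort(size_t size)
{
	assert(size > 0);
	void *ptr = malloc(size);
	if (ptr == NULL) {
		fprintf(stderr, "can't allocate %zu bytes\n", size);
		abort();
	}
	return ptr;
}

/* Copy `len` bytes of `src` into freshly allocated memory. */
static char *
copy_bytes(const char *src, size_t len)
{
	/* The result is never NULL, so memcpy() is safe without a check. */
	char *dst = alloc_or_abort(len);
	memcpy(dst, src, len);
	return dst;
}
```

With such a wrapper the failure is handled in one place, and call sites
like `copy_bytes` stay free of per-call NULL checks.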
NO_CHANGELOG=codehealth
NO_DOC=codehealth
NO_TEST=codehealth
- Commit:
4a866f64d64c610a3c8441835fee3d8dda5eca71
- From:
- Astronomax <fgfgfb93@gmail.com>
- Via:
- Serge Petrenko <35663196+sergepetrenko@users.noreply.github.com>
- Date:
limbo: speed up synchronous transaction queue processing
This patch optimizes the process of collecting ACKs from replicas for
synchronous transactions. Before this patch, collecting confirmations
was slow in some cases. There was a possible situation where it was
necessary to go through the entire limbo again every time the next ACK
was received from the replica. This was especially noticeable in the
case of a large number of parallel synchronous requests.
For example, in the 1mops_write bench with parameters --fibers=6000
--ops=1000000 --transaction=1, performance increases by 13-18 times on
small clusters of 2-4 nodes and 2 times on large clusters of 31 nodes.
Closes #9917
NO_DOC=performance improvement
NO_TEST=performance improvement
- Commit:
58f3c93b660499e85f08a4f63373040bcae28732
- From:
- Astronomax <fgfgfb93@gmail.com>
- Via:
- Serge Petrenko <35663196+sergepetrenko@users.noreply.github.com>
- Date:
vclock: introduce `vclock_nth_element` and `vclock_count_ge`
Two new vclock methods have been added: `vclock_nth_element` and
`vclock_count_ge`.
* `vclock_nth_element` takes n and returns whatever element would occur in
nth position if vclock were sorted. This method is very useful for
synchronous replication because it can be used to find out the lsn of the
last confirmed transaction - it's simply the result of calling this
method with argument {vclock_size - replication_synchro_quorum} (provided
that vclock_size >= replication_synchro_quorum; otherwise no transaction
has been confirmed yet).
* `vclock_count_ge` takes lsn and returns the number of components whose
value is greater than or equal to lsn. This can be useful to understand
how many replicas have already received a transaction with a given lsn.
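The two operations described above can be sketched on a plain array of
LSNs. This is only an illustration under assumptions: the function names,
the array representation, the ascending 0-based ordering, and the -1
"nothing confirmed" return value are all hypothetical, and the real
`struct vclock` implementation may work differently (for example, without
a full sort).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on the number of components in the sketch. */
enum { SKETCH_MAX_COMPONENTS = 32 };

static int
cmp_lsn(const void *a, const void *b)
{
	int64_t x = *(const int64_t *)a;
	int64_t y = *(const int64_t *)b;
	return (x > y) - (x < y);
}

/* Element that would be at 0-based position n if lsns were sorted ascending. */
static int64_t
lsn_nth_element(const int64_t *lsns, int size, int n)
{
	assert(size > 0 && size <= SKETCH_MAX_COMPONENTS);
	assert(n >= 0 && n < size);
	int64_t sorted[SKETCH_MAX_COMPONENTS];
	memcpy(sorted, lsns, (size_t)size * sizeof(*lsns));
	qsort(sorted, (size_t)size, sizeof(*sorted), cmp_lsn);
	return sorted[n];
}

/* Number of components whose value is greater than or equal to lsn. */
static int
lsn_count_ge(const int64_t *lsns, int size, int64_t lsn)
{
	int count = 0;
	for (int i = 0; i < size; i++)
		count += lsns[i] >= lsn;
	return count;
}

/* LSN acknowledged by at least `quorum` components, or -1 if there is none. */
static int64_t
confirmed_lsn(const int64_t *lsns, int size, int quorum)
{
	if (quorum <= 0 || size < quorum)
		return -1;
	return lsn_nth_element(lsns, size, size - quorum);
}
```

For components {10, 7, 9} and a quorum of 2, position 3 - 2 = 1 of the
sorted sequence {7, 9, 10} holds 9, and exactly two components have a
value of at least 9.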
Part of #9917
NO_CHANGELOG=Will be added in another commit
NO_DOC=internal
- Commit:
f0f9647d8b80a2bad56b9462fff28b28af2203aa
- From:
- Astronomax <fgfgfb93@gmail.com>
- Via:
- Serge Petrenko <35663196+sergepetrenko@users.noreply.github.com>
- Date:
replication: prohibit roll back due to `replication_synchro_timeout`
To better match the canonical Raft design, this patch prohibits automatic
transaction rollback due to `replication.synchro_timeout`. A new compat
option has been added for this purpose. The compat option is named
`compat.replication_synchro_timeout` and is `'old'` by default. When set
to 'new', the `replication.synchro_timeout` option has slightly different
semantics. With these semantics, transactions are no longer rolled back on
this timeout; `replication.synchro_timeout` is used only to wait for
confirmation in promote/demote and gc-checkpointing. If some transaction
in limbo does not commit within `replication_synchro_timeout`,
the corresponding operation (promote/demote or gc-checkpointing) can be
aborted automatically (in this respect, the option behaves the same as
before). If 'old' is set, the option has the same semantics as before.
So that the code can tell whether the
`compat.replication_synchro_timeout` option is set to 'old' or 'new',
a special Boolean tweak `replication_synchro_timeout_enabled` was
introduced.
Note that PROMOTE and DEMOTE can still roll back a transaction. Only
rollback by timeout has been prohibited.
Closes #7486
@TarantoolBot document
Title: new compat option: 'compat.replication_synchro_timeout'
Product: Tarantool
Since: 3.3
Root document: New page - https://www.tarantool.io/en/doc/latest/reference/reference_lua/compat/replication_synchro_timeout/
The `compat` module allows you to choose between:
* the old behavior: unconfirmed synchronous transactions are rolled back
after a `replication.synchro_timeout`.
* and the new behavior: A synchronous transaction can remain in the synchro
queue indefinitely until it reaches a quorum of confirmations.
`replication.synchro_timeout` is used only to wait for confirmation
in promote/demote and gc-checkpointing. If some transaction in limbo
does not commit within `replication_synchro_timeout`,
the corresponding operation (promote/demote or gc-checkpointing)
can be aborted automatically.
- Commit:
e319c21ca3520de83de9faec1b722dc5da5d776f
- From:
- Astronomax <fgfgfb93@gmail.com>
- Via:
- Serge Petrenko <35663196+sergepetrenko@users.noreply.github.com>
- Date:
limbo: introduce limits on synchro queue
Two new fields have been added to the limbo structure: the `size` counter
and the `max_size` limit (both in bytes). The corresponding configuration
parameter, `replication.synchro_queue_max_size`, has been added as well.
The counter is increased on every enqueued `txn_limbo_entry`, and
decreased once an entry leaves the `txn_limbo.queue`. Also, the
`approx_len` field has been added to the `txn_limbo_entry` structure,
so that at the time of adding/deleting an entry to the queue, we have
access to the size of the corresponding entry in the journal.
This limitation only applies to the master queue. Once the size of master
queue reaches the maximum value, txn_limbo blocks incoming requests until
some of the transactions in the queue have a quorum of confirmations and
there is free space.
This limitation does not apply during the recovery process. Otherwise,
tarantool could fail while processing the xlog files if the limbo queue
size exceeded `replication.synchro_queue_max_size`, and the user would have
to pick a suitable value of the `replication.synchro_queue_max_size`
option in order to recover from their xlogs.
The size limit isn't strict, i.e. if there's at least one free byte, the
whole entry fits and no blocking is involved.
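The accounting and the non-strict limit described above can be sketched
as follows. This is a simplified model rather than the actual `txn_limbo`
code: `struct queue_sketch` and the function names are hypothetical, a
rejected add stands in for blocking the incoming request, and treating 0
as "unlimited" is taken from the option description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the size accounting kept by the limbo. */
struct queue_sketch {
	int64_t size;     /* bytes currently queued */
	int64_t max_size; /* limit in bytes, 0 means unlimited */
};

/* The queue is full only when no free byte is left. */
static bool
queue_is_full(const struct queue_sketch *q)
{
	return q->max_size > 0 && q->size >= q->max_size;
}

/*
 * Account for an entry of approx_len bytes. The limit is not strict:
 * with at least one free byte the whole entry is accepted.
 * Returns false where the real queue would block the request.
 */
static bool
queue_try_add(struct queue_sketch *q, int64_t approx_len)
{
	if (queue_is_full(q))
		return false;
	q->size += approx_len;
	return true;
}

/* Release the bytes of an entry that leaves the queue. */
static void
queue_remove(struct queue_sketch *q, int64_t approx_len)
{
	assert(q->size >= approx_len);
	q->size -= approx_len;
}
```

With a 100-byte limit and 99 bytes queued, a 50-byte entry is still
accepted and the size grows to 149; further entries wait until enough
entries leave the queue to bring the size back below the limit.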
Part of #7486
NO_CHANGELOG=Will be added in another commit
@TarantoolBot document
Title: new configuration option: 'replication.synchro_queue_max_size'
Product: Tarantool
Since: 3.3
Root document: https://www.tarantool.io/en/doc/latest/reference/configuration/configuration_reference/
`replication.synchro_queue_max_size` limits the total size of
transactions in the master synchronous queue.
`replication.synchro_queue_max_size` is measured in number of bytes to
be written (0 means unlimited, which was the default behaviour before).
This option affects only the behavior of the master, and defaults to
16 megabytes.
When `replication.synchro_queue_max_size` is set on the master node,
tarantool discards new transactions that try to enter the queue after the
limit is reached. If a transaction is discarded, the user gets the error
message "The synchronous transaction queue is full".
This limitation does not apply during the recovery process.
The current synchro queue size can be obtained from
`box.info.synchro.queue.size`:
```lua
tarantool> box.info.synchro
---
- queue:
    owner: 1
    size: 60
    busy: false
    len: 1
    term: 2
  quorum: 2
...
```
[box-info-synchro] https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_info/synchro/