Commit Briefs

b7b095a156 Sergey Bronnikov

httpc: replace ibuf_alloc with xibuf_alloc (ligurio/gh-xxxx-httpc-xibuf_alloc, origin/ligurio/gh-xxxx-httpc-xibuf_alloc)

The value returned by `ibuf_alloc` is not checked for NULL, so if the allocation fails, NULL is passed to `memcpy()`. The patch fixes that by replacing `ibuf_alloc` with the `xibuf_alloc` macro, which never returns NULL. Found by Svace.

NO_CHANGELOG=codehealth
NO_DOC=codehealth
NO_TEST=codehealth
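The difference between a fallible allocator and a never-NULL wrapper can be sketched as follows. This is a minimal illustration, not Tarantool's code: `buf_reserve`, `xbuf_reserve` and `copy_payload` are hypothetical names, and aborting on allocation failure is an assumption about how such a wrapper might guarantee a non-NULL result.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fallible allocator: may return NULL on failure. */
void *
buf_reserve(size_t size)
{
	return malloc(size);
}

/*
 * Hypothetical never-NULL wrapper. Assumption: an allocation
 * failure is treated as fatal, so callers never see NULL.
 */
void *
xbuf_reserve(size_t size)
{
	void *ptr = buf_reserve(size);
	if (ptr == NULL) {
		fprintf(stderr, "out of memory: %zu bytes\n", size);
		abort();
	}
	return ptr;
}

/*
 * Copy a payload into a fresh NUL-terminated buffer.
 * No NULL check is needed before memcpy() because
 * xbuf_reserve() either succeeds or aborts.
 */
char *
copy_payload(const char *data, size_t len)
{
	char *dst = xbuf_reserve(len + 1);
	memcpy(dst, data, len);
	dst[len] = '\0';
	return dst;
}
```

With the plain `buf_reserve`, the same `memcpy()` call would need an explicit NULL check first; the wrapper moves that check into one place.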


4a866f64d6 Serge Petrenko

limbo: speed up synchronous transaction queue processing

This patch optimizes the collection of ACKs from replicas for synchronous transactions. Before this patch, collecting confirmations could be slow: in some cases the entire limbo had to be traversed again every time the next ACK was received from a replica. This was especially noticeable with a large number of parallel synchronous requests. For example, in the 1mops_write bench with parameters --fibers=6000 --ops=1000000 --transaction=1, performance increases 13-18 times on small clusters of 2-4 nodes and 2 times on large clusters of 31 nodes.

Closes #9917

NO_DOC=performance improvement
NO_TEST=performance improvement


58f3c93b66 Serge Petrenko

vclock: introduce `vclock_nth_element` and `vclock_count_ge`

Two new vclock methods have been added: `vclock_nth_element` and `vclock_count_ge`.

* `vclock_nth_element` takes n and returns the element that would occupy the nth position if the vclock were sorted. This method is very useful for synchronous replication because it can be used to find the lsn of the last confirmed transaction: it is simply the result of calling this method with the argument {vclock_size - replication_synchro_quorum} (provided that vclock_size >= replication_synchro_quorum; otherwise no transaction has been confirmed yet).
* `vclock_count_ge` takes an lsn and returns the number of components whose value is greater than or equal to that lsn. This is useful for finding out how many replicas have already received a transaction with a given lsn.

Part of #9917

NO_CHANGELOG=Will be added in another commit
NO_DOC=internal
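The two operations described above can be sketched over a plain array of LSNs. This is not the actual vclock implementation: the names `lsn_nth_element`, `lsn_count_ge` and `confirmed_lsn` are hypothetical, the position n is assumed to be 0-based in ascending order, and the copy-and-sort approach is chosen only for clarity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort(): ascending order of int64_t values. */
static int
lsn_cmp(const void *a, const void *b)
{
	int64_t x = *(const int64_t *)a;
	int64_t y = *(const int64_t *)b;
	return (x > y) - (x < y);
}

/*
 * Return the value that would be at 0-based position n if the
 * array were sorted in ascending order, or -1 if n is out of
 * range or memory cannot be allocated.
 */
int64_t
lsn_nth_element(const int64_t *lsns, size_t count, size_t n)
{
	if (n >= count)
		return -1;
	int64_t *sorted = malloc(count * sizeof(*sorted));
	if (sorted == NULL)
		return -1;
	memcpy(sorted, lsns, count * sizeof(*sorted));
	qsort(sorted, count, sizeof(*sorted), lsn_cmp);
	int64_t result = sorted[n];
	free(sorted);
	return result;
}

/* Count the components whose value is >= lsn. */
size_t
lsn_count_ge(const int64_t *lsns, size_t count, int64_t lsn)
{
	size_t result = 0;
	for (size_t i = 0; i < count; i++) {
		if (lsns[i] >= lsn)
			result++;
	}
	return result;
}

/*
 * LSN acknowledged by at least `quorum` components, i.e. the
 * element at position (count - quorum). Returns -1 when the
 * quorum is zero or cannot be reached with `count` components.
 */
int64_t
confirmed_lsn(const int64_t *lsns, size_t count, size_t quorum)
{
	if (quorum == 0 || count < quorum)
		return -1;
	return lsn_nth_element(lsns, count, count - quorum);
}
```

For the components {10, 7, 9} and a quorum of 2, the sorted order is 7, 9, 10, so position 3 - 2 = 1 holds 9: two components have reached lsn 9, which `lsn_count_ge` confirms.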


f0f9647d8b Serge Petrenko

replication: prohibit roll back due to `replication_synchro_timeout`

To better match the canonical Raft design, this patch prohibits automatic transaction rollback due to `replication.synchro_timeout`. A new compat option has been added for this purpose. It is named `compat.replication_synchro_timeout` and is 'old' by default. When it is set to 'new', the `replication.synchro_timeout` option has slightly different semantics: transactions are no longer rolled back on this timeout, and `replication.synchro_timeout` is used only to wait for confirmation in promote/demote and gc-checkpointing. If some transaction in the limbo did not have time to commit within `replication_synchro_timeout`, the corresponding operation (promote/demote or gc-checkpointing) can be aborted automatically; in this respect, the behavior of the option is the same as before. If 'old' is set, the option has the same semantics as before.

To make it possible to tell from the code whether `compat.replication_synchro_timeout` is set to 'old' or 'new', a special Boolean tweak `replication_synchro_timeout_enabled` was introduced.

Note that PROMOTE and DEMOTE can still roll back a transaction. Only rollback by timeout has been prohibited.

Closes #7486

@TarantoolBot document
Title: new compat option: 'compat.replication_synchro_timeout'
Product: Tarantool
Since: 3.3
Root document: New page - https://www.tarantool.io/en/doc/latest/reference/reference_lua/compat/replication_synchro_timeout/

The `compat` module allows you to choose between:

* the old behavior: unconfirmed synchronous transactions are rolled back after `replication.synchro_timeout`.
* the new behavior: a synchronous transaction can remain in the synchro queue indefinitely until it reaches a quorum of confirmations. `replication.synchro_timeout` is used only to wait for confirmation in promote/demote and gc-checkpointing. If some transaction in the limbo did not have time to commit within `replication_synchro_timeout`, the corresponding operation (promote/demote or gc-checkpointing) can be aborted automatically.
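The two behaviors can be summarized as a pair of small predicates. This is only a sketch of the semantics described above, not code from the patch: the function names are hypothetical, and it assumes that the Boolean flag is true when the compat option is 'old' and false when it is 'new'.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Should an unconfirmed synchronous transaction be rolled back?
 * Assumption: timeout_enabled == true corresponds to the 'old'
 * compat behavior, false to the 'new' one.
 */
bool
txn_rolls_back_on_timeout(bool timeout_enabled, double waited,
			  double timeout)
{
	/* 'new': the transaction stays queued regardless of time. */
	if (!timeout_enabled)
		return false;
	/* 'old': roll back once the timeout has elapsed. */
	return waited >= timeout;
}

/*
 * Should a promote/demote or gc-checkpointing wait be aborted?
 * The timeout applies to these operations in both modes.
 */
bool
operation_wait_aborts(double waited, double timeout)
{
	return waited >= timeout;
}
```

The point of the split is that the compat option changes only the first predicate; waiting in promote/demote and gc-checkpointing is bounded by the timeout in both modes.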


e319c21ca3 Serge Petrenko

limbo: introduce limits on synchro queue

Two new fields have been added to the limbo structure: the `size` counter and the `max_size` limit (both in bytes), along with the corresponding configuration parameter, `replication.synchro_queue_max_size`. The counter is increased for every enqueued `txn_limbo_entry` and decreased once an entry leaves `txn_limbo.queue`. The `approx_len` field has also been added to the `txn_limbo_entry` structure so that the journal size of an entry is available when the entry is added to or removed from the queue.

This limitation applies only to the master queue. Once the size of the master queue reaches the maximum value, txn_limbo blocks incoming requests until some of the queued transactions gather a quorum of confirmations and space is freed.

This limitation does not apply during recovery. Otherwise tarantool could fail while processing xlog files if the limbo queue size exceeded `replication.synchro_queue_max_size`, and the user would have to find a suitable value of `replication.synchro_queue_max_size` in order to recover from their xlogs.

The size limit isn't strict: if there is at least one free byte, the whole entry fits and no blocking is involved.

Part of #7486

NO_CHANGELOG=Will be added in another commit

@TarantoolBot document
Title: new configuration option: 'replication.synchro_queue_max_size'
Product: Tarantool
Since: 3.3
Root document: https://www.tarantool.io/en/doc/latest/reference/configuration/configuration_reference/

`replication.synchro_queue_max_size` puts a limit on the total size of transactions in the master synchronous queue. It is measured in the number of bytes to be written (0 means unlimited, which was the default behaviour before). This option affects only the behavior of the master and defaults to 16 megabytes.

When `replication.synchro_queue_max_size` is set on the master node, tarantool discards new transactions that try to enter the queue after the limit is reached. If a transaction is discarded, the user gets the error message "The synchronous transaction queue is full". This limitation does not apply during the recovery process.

The current synchro queue size can be read from `box.info.synchro.queue.size`:

```lua
tarantool> box.info.synchro
---
- queue:
    owner: 1
    size: 60
    busy: false
    len: 1
    term: 2
  quorum: 2
...
```

[box-info-synchro] https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_info/synchro/
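The size accounting and the non-strict limit described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the actual limbo code: `struct synchro_queue` and its functions are hypothetical, and a rejected push stands in for whatever the real queue does when it is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the queue's size accounting. */
struct synchro_queue {
	size_t size;      /* bytes currently queued */
	size_t max_size;  /* limit in bytes, 0 means unlimited */
	bool is_recovery; /* the limit is ignored during recovery */
};

/* The queue is full once size reaches the limit. */
bool
synchro_queue_is_full(const struct synchro_queue *q)
{
	if (q->is_recovery || q->max_size == 0)
		return false;
	return q->size >= q->max_size;
}

/*
 * Account for a new entry of approx_len bytes. The limit is not
 * strict: if at least one byte is free, the whole entry fits,
 * so size may temporarily exceed max_size.
 */
bool
synchro_queue_push(struct synchro_queue *q, size_t approx_len)
{
	if (synchro_queue_is_full(q))
		return false;
	q->size += approx_len;
	return true;
}

/* Release the bytes of an entry that leaves the queue. */
void
synchro_queue_pop(struct synchro_queue *q, size_t approx_len)
{
	assert(q->size >= approx_len);
	q->size -= approx_len;
}
```

With a limit of 100 bytes, two 60-byte entries are both accepted (the second one fits because 40 bytes were still free), after which the queue holds 120 bytes and rejects further entries until one of them leaves.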