Performance of importwallet

+2 votes

Hi.

I am working on an HA cluster at the moment:

1. Two servers (master / slave) are connected and start with the same wallet.dat.

2. The master periodically checks whether the output of dumpwallet has changed.

3. If it has, the master sends the dump to the slave.

4. The slave runs "importwallet".
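
A minimal sketch of the change check in step 2. It assumes the dump is a plain-text file in which lines starting with "#" are header comments (such lines usually carry the creation time, so they would differ between two dumps of an unchanged wallet). The function name dump_fingerprint and the sample key lines are hypothetical placeholders; creating the dump and sending it to the slave are left out.

```python
import hashlib

def dump_fingerprint(dump_text: str) -> str:
    """Hash only the key lines of a wallet dump.

    Assumption: lines starting with '#' are comments that may change
    between two dumps of the same wallet, so they are ignored.
    """
    key_lines = sorted(
        line.strip()
        for line in dump_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
    return hashlib.sha256("\n".join(key_lines).encode("utf-8")).hexdigest()

# Placeholder dumps: same wallet at two times, then one key added
old = "# Created on 2018-05-03T07:00:00Z\nKEY_A 2018-05-01T00:00:00Z label=\n"
same = "# Created on 2018-05-03T08:00:00Z\nKEY_A 2018-05-01T00:00:00Z label=\n"
grown = same + "KEY_B 2018-05-03T07:30:00Z label=\n"

print(dump_fingerprint(old) == dump_fingerprint(same))   # unchanged wallet
print(dump_fingerprint(old) == dump_fingerprint(grown))  # a key was added
```

The master would only send the dump when the fingerprint differs from the one it sent last time.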

 

The problem here is that importwallet rescans the complete blockchain.
Rescanning all blocks is quite slow, and the MultiChain RPC interface does not respond while the rescan is running.
 

For example:

- 60 GB blocks folder
- importwallet with one new private key has been running for 2 hours and is still going
- machine is an EC2 m5.xlarge
- RPC is not responsive

 

Here is the debug.log output:

2018-05-03 07:03:26 Skipping import of 1CNpcJb2xBQ15LgNQCKjFBYAZGXabMXXuHZTrh (key already present)
2018-05-03 07:03:26 Skipping import of 19uWPgKbkn4uGBuf3RQFgpUL71XWoghzAgnMJj (key already present)
2018-05-03 07:03:26 Importing 1TaY4J9DsfMBcbY8mTusciKJfnTgD5G1RD3x2c...
2018-05-03 07:03:26 Skipping import of 177Myj4T8aqYrJJ6ZrDP6md85GLzu7g9BYYRYm (key already present)
2018-05-03 07:03:26 Rescanning all 50954 blocks
2018-05-03 08:25:16 Still rescanning. At block 1. Progress=1.000000
2018-05-03 08:26:48 Still rescanning. At block 15. Progress=1.000000
2018-05-03 08:27:57 Still rescanning. At block 30. Progress=1.000000
2018-05-03 08:29:16 Still rescanning. At block 32. Progress=1.000000
2018-05-03 08:31:06 Still rescanning. At block 35. Progress=1.000000

First of all, it took nearly 1.5 hours to actually start scanning. (?)
Then it seems to need about 10 seconds to scan a single block. (Maybe this is OK, since there are about 5000 transactions in a single block.)
So it would take nearly a week to rescan all 50954 blocks.
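
The estimate can be reproduced from the log itself. This is only a rough extrapolation: it takes the first and last "Still rescanning" lines above and assumes the scan rate stays constant for the rest of the chain, which gives roughly 10 seconds per block and about six days in total.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (timestamp, block height) from the first and last progress lines in the log
first = (datetime.strptime("2018-05-03 08:25:16", FMT), 1)
last = (datetime.strptime("2018-05-03 08:31:06", FMT), 35)
total_blocks = 50954  # from "Rescanning all 50954 blocks"

elapsed = (last[0] - first[0]).total_seconds()
seconds_per_block = elapsed / (last[1] - first[1])

# Assumption: the rate observed on the first blocks holds for the whole chain
estimated_days = seconds_per_block * total_blocks / 86400

print(f"{seconds_per_block:.1f} s/block, about {estimated_days:.1f} days in total")
```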

 

My problem is that I cannot use importprivkey with rescan=false, since I do not know whether the address has already been used. That's because I only sync the wallet periodically. I am also not able to intercept calls like "getnewaddress".

 

Maybe an option could be a parameter for importwallet (and also importprivkey) that tells MultiChain how far into the past it should search for transactions involving an address.

For example, if I do an hourly sync, only the last 500 blocks or so would need to be rescanned for the new address.
(I also have timestamp information in the dumpwallet file itself.)

for example:

importwallet dump.dat 500
to rescan the last 500 blocks for all new addresses

or 
importprivkey ABC123 true 500
to rescan last 500 blocks for key ABC123
(but even this would still take over an hour)

 

But having a cluster with an HA node that takes about 6 days to catch up with a single address is quite bad.

Same holds true for backup and restore. If after a server crash I need to restore the blockchain, it seems to be reasonable to reindex / rescan the restored chain. But this also shouldn't take a week.

 

In general I am a little concerned about ever being forced to do a reindex / rescan on a huge blockchain, since it would take several days for the node to become operational again.

Do you have similar experiences?

 

asked May 3, 2018 by Alexoid
I just tested restarting the node.
It takes only a few seconds until the node is up and running (without any rescan or reindex). This seems to be OK.
Would this work:

1. slave runs normally
2. slave gets new private keys
3. slave executes "pause mining,incoming"
4. slave executes "setlastblock X", where X == (currentBlock - 200)
5. slave executes importprivkey "ABC123" "" false  <- we can be sure the address has no transactions yet, since we are now 200 blocks back
6. slave executes "resume mining,incoming"
7. slave catches up and also gets all address transactions in the new 200 blocks

I am not sure if "setlastblock" works without reindexing. MultiChain throws away (rewinds) transactions, but it doesn't throw them away in the index, does it?
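
The command sequence from the steps above can be sketched as a small planning function. This is only an illustration: the helper name plan_selective_import is hypothetical, the command names and arguments are taken from the list as written, and actually sending them to the node over JSON-RPC (and checking each result) is left out.

```python
def plan_selective_import(current_height, privkey, rewind=200):
    """Return the RPC calls from the steps above as (method, params) pairs.

    Assumption: the key was not used before block (current_height - rewind).
    """
    target = max(current_height - rewind, 0)  # never rewind past genesis
    return [
        ("pause", ["mining,incoming"]),
        ("setlastblock", [target]),
        ("importprivkey", [privkey, "", False]),  # empty label, rescan=false
        ("resume", ["mining,incoming"]),
    ]

# "ABC123" is the placeholder key used in the steps above
for method, params in plan_selective_import(50954, "ABC123"):
    print(method, params)
```

Keeping the plan separate from the transport makes it easy to log or review the exact calls before running them against a live node.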

2 Answers

+1 vote

So there are two issues here - the time it takes to rescan the blockchain for a new address added via importaddress, and the time it takes to reindex the blockchain after a backup and restore of the blocks only, without any of the up-to-date state derived from those blocks.

We're aware of the rescan issue for addresses, and have also considered the idea you suggest of only rescanning a certain number of blocks going back. We just haven't implemented it yet. I'm also not sure if the setlastblock idea you mention in the comments will work; I will get back to you on that shortly.

In terms of full reindexing after restoring, I think the answer here is to back up and restore the entire node state, rather than just the blocks. If you only back up the blocks, this is equivalent in the world of regular databases to only backing up a database's transaction log, and not its tables and indexes. The consequence is that restoring the database state means replaying that entire transaction log.

answered May 3, 2018 by MultiChain
Thanks for your reply. I am mostly interested in the importprivkey part, since this would help with running a cluster and would avoid the full backup and restore scenario most of the time.

In a first test it seemed that the "setlastblock" worked. But maybe I did something wrong. Would be great if you could validate this.
Hi.
I found this code, that would maybe exactly address my problem:
https://github.com/MultiChain/multichain/blob/master/src/rpc/rpcdump.cpp#L392

Is there any way to jump into the "} else {" section here and do a selective rescanning?
The good news is I can confirm that the setlastblock workaround should work fine.

But there's no way to safely use the branch you mention: it is a remnant of the old wallet implementation, which MultiChain has now replaced.
Hi.
Thanks for the information.
But doesn't this branch just use the same function, ScanForWalletTransactions, with the only difference being that it scans not from genesis but from some later block?
Yes, it looks like it, but you cannot activate this branch without modifying MultiChain's source code.
OK, so this would be a feature request to add a parameter to "importwallet" that lets you do one of the following:
- provide the number of blocks to scan backwards
or
- provide a boolean that says whether it should use the timestamp in the wallet dump file to automatically calculate how many blocks to scan backwards (use the else branch)

This would solve nearly all of the cluster / wallet-sync problems and would be cleaner and less error-prone than the "setlastblock" stuff.

What do you think?
Thanks for the suggestion, I'll ask about these possibilities and get back to you.
So we had some discussion about this. The new scalable MultiChain wallet format does not have this timestamp. And even the timestamp in the old legacy format from Bitcoin Core is not updated for imported addresses, so it's not entirely reliable as a basis for rescanning.

There are two viable options: The first is a parameter for the address importing APIs which specifies how far back rescanning goes. This means we rely on the user to be accurate.

The second is that if the chain has anyone-can-receive=false and anyone-can-receive-empty=false, then we know an address cannot be used until it has been granted permissions. So we can extract the start block from the permissions database.

Any thoughts?
Hi.

Thank you for your reply.

I think the second option is too restrictive for the user. We do not want to force these settings.

The first option sounds good and should solve the problem. Yes, the user must be accurate here, but anyone who uses this parameter should know what they are doing.

It would also be good to have this parameter not just on importprivkey but on the importwallet method too. This would make clustering quite easy, since the nodes just periodically send the wallet files. On importing, the node knows which addresses are new and does a "back-scan" for all of them using the given parameter. So a member of such a cluster only needs to keep track of when the last wallet file was sent to a specific cluster member. From that, the cluster member can calculate how many blocks to rescan.

Of course there could be a problem in the worst case, but then you know what happened and can, as a last resort, do a full reindex on that cluster member. This should happen very rarely.

Just as feedback: for the consumer of this parameter it might be more convenient to specify a "timestamp" or a number of "seconds" to scan back, rather than a number of blocks, so the user does not have to do the timestamp-to-block calculation. For example, if you sync the cluster every 2 hours, you could just set this parameter to 2 hours back, without needing to know anything about block times.
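
For illustration, the timestamp-to-block calculation a user would otherwise have to do might look like the sketch below. The function name blocks_to_rescan, the safety factor, and the 15-second block time in the example are assumptions; the real block interval depends on the chain's parameters and on how regularly blocks are actually mined.

```python
import math

def blocks_to_rescan(window_seconds, target_block_time, safety_factor=1.5):
    """Estimate how many recent blocks cover a time window.

    window_seconds:    how far back the last wallet sync happened
    target_block_time: assumed average seconds between blocks
    safety_factor:     margin for blocks arriving faster than the target
    """
    if window_seconds < 0 or target_block_time <= 0:
        raise ValueError("window must be >= 0 and block time must be > 0")
    return math.ceil(window_seconds * safety_factor / target_block_time)

# Sync every 2 hours, assumed 15-second blocks, 50% margin
print(blocks_to_rescan(2 * 3600, 15))
```

An estimate like this can undershoot if blocks were mined much faster than assumed, which is one reason comparing a timestamp against the block headers directly is more robust.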


What do you think?

Greetings,
Alex
Thanks for the feedback - will post an update on this shortly.
OK, so it looks like we're going to offer selective rescanning for importaddress, importprivkey and importwallet in 1.0.5 and 2.0 alpha 3. You'll be able to specify this in three ways: (a) starting block number, (b) number of recent blocks to rescan, (c) a timestamp which is compared against block headers.
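
To make the three options concrete, here is a hypothetical sketch of how each one could be resolved to a block height at which rescanning starts. It is not MultiChain's implementation or API: the function name, the keyword arguments, and the representation of block headers as a list of timestamps (assumed to be non-decreasing) are all assumptions made for the example.

```python
from bisect import bisect_left

def rescan_start_height(header_times, *, start_block=None,
                        recent_blocks=None, timestamp=None):
    """Resolve one of three rescan specifications to a start height.

    header_times[h] is the header timestamp of block h.
    Assumption: timestamps never decrease with height.
    """
    given = [v for v in (start_block, recent_blocks, timestamp) if v is not None]
    if len(given) != 1 or not header_times:
        raise ValueError("need block headers and exactly one option")
    tip = len(header_times) - 1
    if start_block is not None:        # (a) starting block number
        return min(max(start_block, 0), tip)
    if recent_blocks is not None:      # (b) number of recent blocks
        return max(tip - recent_blocks + 1, 0)
    # (c) first block whose header time is not earlier than the timestamp
    return min(bisect_left(header_times, timestamp), tip)

times = [100, 110, 120, 130, 140]  # made-up header timestamps for blocks 0..4
print(rescan_start_height(times, start_block=3))
print(rescan_start_height(times, recent_blocks=2))
print(rescan_start_height(times, timestamp=115))
```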
This is great news. :-)
Thank you, guys. This will help a lot in providing and managing a performant MultiChain cluster, and also in other use cases that are based on importing.
0 votes

Version 1.0.5 of MultiChain has just been released, which allows partial rescanning for all import* APIs.

answered Jun 7, 2018 by MultiChain
...