Nodes with mining permissions are not mining at all: a running MultiChain network stopped connecting to its peers

I had a test chain running without any problems for a few weeks, with three mining nodes and two non-mining nodes, and now I cannot get it running again. Two of the mining nodes went down because their machines needed to be rebooted. For a few minutes the chain ran on the one remaining mining node, which was never rebooted, and then that last mining node also stopped mining on its own for no apparent reason.

I tried bringing the Genesis node back into the chain and reindexing the nodes, but none of the nodes will mine or connect to each other. After three weeks, the MultiChain blockchain is dead. How do you get things back up when none of the mining nodes, including Genesis, will mine again? I added all the nodes to each other again, but getpeerinfo returns nothing: no node connects to any other node. The entire blockchain just fell apart and will not restart.
asked Sep 20, 2016 by dtarsio

1 Answer


The last mining node probably stopped mining because of the chain's mining-diversity rules, which prevent a single node from mining too many of the most recent blocks on its own.
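To make the diversity rule concrete, here is a minimal sketch. It assumes the rule works as follows (an assumption to check against the MultiChain documentation, especially the rounding): compute spacing = ceil(permitted miners × mining-diversity), and a node may not mine the next block if it mined any of the previous spacing − 1 blocks. The names `lookback` and `can_mine_next` are hypothetical, not part of MultiChain.

```python
import math
from fractions import Fraction


def lookback(permitted_miners: int, diversity: float) -> int:
    """How many recent blocks a miner must be absent from (assumed rule)."""
    # Fraction(str(...)) keeps 0.3 * 4 exactly 6/5 instead of a float approximation.
    spacing = math.ceil(Fraction(str(diversity)) * permitted_miners)
    return max(spacing - 1, 0)


def can_mine_next(recent_miners: list[str], me: str,
                  permitted_miners: int, diversity: float) -> bool:
    """recent_miners lists the miners of the latest blocks, newest last."""
    n = lookback(permitted_miners, diversity)
    # Guard n == 0: recent_miners[-0:] would return the whole list.
    window = recent_miners[-n:] if n > 0 else []
    return me not in window


# Only node "A" is still online and it mined the latest block.
print(can_mine_next(["A"], "A", permitted_miners=4, diversity=0.3))
```

Under this assumed rule, with the four mining-permitted nodes mentioned later in the thread and mining-diversity = 0.3, the lookback is 1, so a lone surviving miner cannot sign two consecutive blocks and the chain stalls until another miner returns.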

But what do you mean by "putting genesis back into the chain"? The way to recover from this situation would be to bring up another of the mining nodes with -reindex=-1, but it is crucial not to overwrite the wallet files, because they contain the keys that give the node mining permission. Are your IP addresses also the same? If not, you'll need to call addnode to help the nodes find each other.
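Because the wallet holds the mining keys, one cautious habit is to copy the wallet files somewhere safe before reindexing or restoring anything. A minimal sketch, assuming the node is stopped and that the keys live in `wallet.dat` and/or a `wallet/` directory inside the chain directory (both file names are assumptions; check your own chain directory). `backup_wallet` is a hypothetical helper, not a MultiChain tool.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_wallet(chain_dir: Path, backup_root: Path) -> list[Path]:
    """Copy a node's wallet files to a timestamped folder before reindexing."""
    # Assumed names: wallet.dat, plus a wallet/ directory in some versions.
    sources = [chain_dir / n for n in ("wallet.dat", "wallet")
               if (chain_dir / n).exists()]
    if not sources:
        raise FileNotFoundError(f"no wallet files found in {chain_dir}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    dest = backup_root / f"wallet-backup-{stamp}"
    dest.mkdir(parents=True)
    copied = []
    for src in sources:
        target = dest / src.name
        if src.is_dir():
            shutil.copytree(src, target)
        else:
            shutil.copy2(src, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

The copy goes to a separate folder rather than back into the chain directory, so a later restore or reindex cannot silently overwrite the only copy of the keys.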

answered Sep 21, 2016 by MultiChain
Keep in mind that this same set of nodes ran fine for around three weeks with a lot of test transactions, and the block count was over 100,000. All the nodes were talking to each other before the crash. It went down because a few of the mining nodes had to be rebooted. I rebooted one of them because I had encrypted its wallet, and multichaind forces you to restart it after wallet encryption.
  
I have tried everything you suggested. I have five nodes, all with different IPs, four of which have mining permissions. I mentioned Genesis because that node was offline, and I brought it back online after the crash to see if it would help recover the blockchain.

Using multichaind, I tried setting -reindex=1 and -miningrequirespeers=0 on three of the nodes. The -miningrequirespeers override does not seem to do anything, and I think its behavior really needs to be checked. Then I used addnode to try to link all the nodes together again, but they still do not see their peers and none of them will start mining again.

Params.dat file variables:
mining-diversity = 0.3
mining-requires-peers = true
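To confirm that every node is running with the same values for these settings, they can be read directly from each node's params.dat. A minimal sketch, assuming the `key = value  # comment` line layout that params.dat uses; `parse_params` is a hypothetical helper, not part of MultiChain.

```python
def _convert(value: str):
    """Turn 'true'/'false' and numeric strings into Python values."""
    if value in ("true", "false"):
        return value == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # leave anything else as a string


def parse_params(text: str) -> dict:
    """Parse params.dat-style text into a dict, ignoring '#' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue  # blank lines and section-title comments
        key, value = (part.strip() for part in line.split("=", 1))
        params[key] = _convert(value)
    return params


sample = "mining-diversity = 0.3\nmining-requires-peers = true\n"
print(parse_params(sample))
```

Running it over each node's copy of params.dat and comparing the resulting dictionaries is a quick way to rule out a node that was started with different chain parameters.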

I also tried adding another node and granting it mining permissions. The new node connects to Genesis, but like the others it will not mine, and after addnode it will not connect to any of the other nodes in the network.

Not being able to recover from a blockchain crash like this is a real concern for putting MultiChain into a live environment. Mining nodes going down, or even all nodes going down, is a realistic scenario, so the question is how we recover.
OK, thanks for the comprehensive response. We will need to look at this in detail in order to understand what happened.

If you're willing to help, can you please stop all the nodes, then zip up their blockchain directories (~/.multichain/[chain-name] on each), and send everything to multichain.debug at gmail dot com. It might need to be one archive per email message, depending on the size. It would also be helpful if you could indicate for each archive the IP address of the server it was running on, so we can match the peers tracked by each node with the contents of the other nodes.

If the archives end up too large to send by email, we can also start with just the debug.log files from each node, and see what we learn from those.
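Packaging each node's data as requested can be scripted so that every archive carries the IP of the server it came from. A minimal sketch, assuming the nodes are already stopped and the chain directory is the `~/.multichain/[chain-name]` folder mentioned above; `archive_node` is a hypothetical helper, and `logs_only=True` covers the debug.log-only fallback.

```python
import zipfile
from pathlib import Path


def archive_node(chain_dir: Path, server_ip: str, out_dir: Path,
                 logs_only: bool = False) -> Path:
    """Zip one (stopped) node's chain directory, named after its server IP."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Putting the IP in the file name lets each archive be matched to its server.
    archive = out_dir / f"{chain_dir.name}-{server_ip.replace(':', '_')}.zip"
    pattern = "debug.log" if logs_only else "*"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(chain_dir.rglob(pattern)):
            if path.is_file():
                # Store paths relative to the parent so entries start with the chain name.
                zf.write(path, path.relative_to(chain_dir.parent))
    return archive
```

Keep `out_dir` outside the chain directory; otherwise the archive being written would be picked up by the directory walk.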
Sure, no problem. I will send you the files.
I think I see what's causing the blockchain to lock up. I had about seven nodes running for several weeks without a problem, and then everything just locked up. I knew something must have changed to bring the whole thing down, and then I realized what I had changed just before the lockup: I had started encrypting each wallet. After encryption, the node prompts for a restart, and after restarting it never connects to the chain again.

To verify this strange behavior, I started a new chain with Genesis and three mining nodes. I let it mine past the initial setup blocks and then encrypted one of the wallets. After I restarted that node, sure enough, it no longer connected to the other nodes and no longer mined. I tried adding the other nodes again with addnode, but it's just a dead wallet that cannot see any other nodes. Give it a try on your end; I think you will see what's going on. Something that happens when I encrypt a wallet locks it out of the chain.
I unlocked the wallet for 10 minutes, and then it does finally find nodes and start mining again, but as soon as it locks, it stops mining. I am not sure whether this is normal operation, but it seems odd that a locked encrypted wallet will not mine.
That makes perfect sense, because mining requires blocks to be signed (using a key in the wallet) and connecting to another node requires a challenge message to be signed (again, using a key in the wallet). So if the wallet is encrypted and not currently unlocked, the node can't do either of these things. Will discuss with the team how we can provide more warnings about this in future, but the basic problem will remain - a node can't sign things if its wallet is locked.
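The explanation above can be illustrated with a toy model. This is not MultiChain's implementation: `ToyWallet`, `WalletLocked` and `node_can_mine_and_connect` are hypothetical names, and the "signature" is a placeholder string. It only shows that once an unlock timeout (such as the 10-minute unlock described earlier) expires, block signing and the peer challenge fail together.

```python
import time
from typing import Callable, Optional


class WalletLocked(Exception):
    """Raised when a signature is requested while the wallet is locked."""


class ToyWallet:
    """Toy encrypted wallet: signing works only while it is unlocked."""

    def __init__(self, passphrase: str,
                 clock: Callable[[], float] = time.monotonic):
        self._passphrase = passphrase
        self._clock = clock
        self._unlocked_until: Optional[float] = None  # None means locked

    def unlock(self, passphrase: str, timeout_s: float) -> None:
        if passphrase != self._passphrase:
            raise ValueError("wrong passphrase")
        self._unlocked_until = self._clock() + timeout_s

    def is_locked(self) -> bool:
        return self._unlocked_until is None or self._clock() >= self._unlocked_until

    def sign(self, message: bytes) -> str:
        if self.is_locked():
            raise WalletLocked("wallet is locked; cannot sign")
        return f"sig({message.hex()})"  # placeholder, not a real signature


def node_can_mine_and_connect(wallet: ToyWallet) -> bool:
    """Both operations need a signature, so both fail on a locked wallet."""
    try:
        wallet.sign(b"block-header")    # signing a mined block
        wallet.sign(b"peer-challenge")  # answering a peer's handshake challenge
        return True
    except WalletLocked:
        return False
```

In this model the node alternates between working and stalled exactly as the unlock window opens and closes, which matches the behavior reported in the thread.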
Yup, that is what I thought. At least I found out what caused the chain to stop functioning.
...