Multichain Cluster stops mining

+2 votes

Hi

We discovered an issue yesterday whereby our test cluster stopped mining. It had been running for about a week without issue beforehand. After a number of failed attempts to get the nodes mining again, we cleared it down and started it up fresh. After a short space of time the issue occurred again.

Our working theory at the minute is:

Our automated tests create new addresses/keys per run and were assigning these new addresses mining permission.  The only mining parameter we had changed on our setup was admin-consensus-mine which was set to 0.  Is it possible that all these additional addresses/keys on the node were affecting the rules governing mining, specifically related to the mining-diversity setting?

I have now tuned mining-diversity down to zero and am keeping an eye on the newly created cluster.


A couple of questions:

Do you think this was the issue? If so, some guidance is probably needed on how to configure a cluster's mining parameters relative to the number of nodes.

Secondly, how could we have recovered from this? In the end we cleared down the cluster because we couldn't get it mining. All transactions were just sitting in the mempool. If this happened in a production environment, blowing it away is not really an option :)

Assuming the issue was caused by these additional mining permissions, would revoking them have recovered the cluster?

If this is a misconfiguration issue rather than a bug/problem, some sort of log or message from the nodes would have been really useful given the amount of time we spent trying to resolve the issue.

If you think this shouldn't have been an issue, do you have any other suggestions as to what the problem might have been? I backed up the volume of one of the nodes and should be able to start it up again to investigate further.

 

Marty

asked Sep 6, 2017 by marty
edited Sep 6, 2017 by marty

1 Answer

0 votes

The mining-diversity setting defines the proportion of permissioned miners who have to participate in a round-robin pattern in order to render a chain (or fork) valid. This is the mechanism for Byzantine Fault Tolerance, or preventing minority control over the network. So if you are assigning mining permissions to new addresses, but the private keys for those new addresses aren't owned by any active online nodes, then you would indeed expect mining to freeze up once there are no longer enough actively mining addresses to reach the mining-diversity setting.
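
To make the freeze concrete, here is a minimal Python sketch of the round-robin rule. It is not MultiChain's actual code. It assumes the rule works roughly as follows: spacing is the number of permitted miners multiplied by mining-diversity, rounded up, and a block is invalid if its miner also mined one of the previous spacing-1 blocks. The function names (spacing, can_mine, blocks_until_stall) are made up for illustration, and other chain parameters are ignored.

```python
import math


def spacing(permitted_miners: int, mining_diversity: float) -> int:
    """Assumed rule: permitted miners * mining-diversity, rounded up."""
    return math.ceil(permitted_miners * mining_diversity)


def can_mine(miner, recent_miners, permitted_miners, mining_diversity):
    """True if `miner` did not mine any of the previous spacing-1 blocks."""
    s = spacing(permitted_miners, mining_diversity)
    if s <= 1:
        # No waiting period applies, so any permitted miner may mine.
        return True
    return miner not in recent_miners[-(s - 1):]


def blocks_until_stall(online_miners, permitted_miners, mining_diversity,
                       max_blocks=1000):
    """Simulate mining by the online miners only.

    Returns the number of blocks mined before no online miner is eligible,
    or None if the chain is still progressing after `max_blocks` blocks.
    """
    chain = []  # miner of each block, oldest first
    for _ in range(max_blocks):
        eligible = [m for m in online_miners
                    if can_mine(m, chain, permitted_miners, mining_diversity)]
        if not eligible:
            return len(chain)
        chain.append(eligible[0])
    return None


# Three online nodes, but tests have granted mine permission to 10 addresses.
print(blocks_until_stall(["a", "b", "c"], 10, 0.5))  # -> 3

# Same three nodes when only 4 addresses hold mine permission.
print(blocks_until_stall(["a", "b", "c"], 4, 0.5))   # -> None (keeps mining)
```

Under these assumptions, each extra address granted mine permission raises the spacing, while the number of addresses that can actually produce blocks stays the same. Once spacing-1 reaches the number of online mining addresses, every online miner is always inside its waiting period and no further block can be produced.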

In terms of how to recover from this problem, there is a rollback mechanism in MultiChain, which (by design) requires significant cooperation between admins and mining nodes. We'll be documenting this as part of more general production and maintenance guidelines over the coming months.

answered Sep 7, 2017 by MultiChain
If we had revoked the mining permissions, would this have got the nodes going again, or would those revokes never get actioned because the associated transactions would never get mined? :)
The rules for who can mine a block are based on the transactions in the previous blocks only. So I'm afraid this would not help in your case.
I believe I have this issue, but cannot be sure.
@Multichain Could you please point me to the rollback mechanism?
...