In a network where all nodes are generating transactions, why does one node perform much better than the others?

+3 votes
I have set up a small network of 5 nodes. I am making all 5 of them generate transactions at their maximum capability, using a load testing tool for a duration of 5 minutes. In all of my experiments, one node outperforms the others with around 100-110 tps, while all the other nodes achieve less than 15 tps. I have made sure that all the nodes start generating transactions at exactly the same time. All the nodes have exactly the same specs, and the same programs are running on them during the experiment. I don't understand why this is happening.
asked Oct 26, 2022 by maheen.ayesha
Which MultiChain version are you using?
multichain v2.1.2

1 Answer

0 votes
This is probably related to one node doing all the mining, and being run at saturated capacity so it doesn't get a chance to allow transactions from other nodes. (It also seems like you are running these nodes on weak cloud instances?) Anyway, version 2.3.1 which is coming out shortly should address this scenario.
answered Nov 2, 2022 by MultiChain
I am running MultiChain on desktop PCs with 16 GB RAM and 11th-gen Core i7 CPUs. Communication is over LAN. I have also given all 5 nodes permission to mine. I am logging all the blocks mined and I can see different nodes mining blocks. My confusion is this: if all the nodes are running at their maximum capacity, why don't the other nodes get a chance? Shouldn't all of them be affected and show mid-range throughput? Right now I get one node with very high throughput and the others with very low throughput.
Thanks, are you on Windows or Linux?
All PCs are on Windows 10.
OK, we know that Windows performance is not as good as Linux. Please wait for 2.3.1 to come out next week, it improves parallel processing of transactions and blocks, and then assess the performance again.
I was looking at this discussion regarding our issue.
We have mining diversity set to 0.3 with anyone-can-mine=false, and all 5 nodes in the network have permission to mine. We just updated to v2.3.1. We have enabled blocknotify with getblockinfo logging. We run load tests in 5 minute intervals on all nodes simultaneously, but our results are very weird. The admin node always has high performance, around 80-90 tps. Only two nodes are mining blocks: one of these has been giving a constant 3.5 tps and the other has throughput in the 5-10 tps range. The other two nodes are not mining any blocks and show around 4-7 tps. Why is there so much variation in the results when all the nodes are started at exactly the same time and have the same specs? We want to see load distribution, with all the nodes mining and sharing the load while maintaining mid-range throughput.
The mining can be explained easily – by default the network will stick to a small group of miners, to minimize clashes. See the details about mining-turnover on this page:

Regarding the throughput, can you please tell me what these transactions contain? Are they small asset transfers or publishing large pieces of data?
We are sending a constant 1 KB of text data in a simple HTTP request. My question is still why one node is outperforming the others, and why it is most often the admin node. I understand that only a small subset of nodes is mining, but despite that, the performance should be comparable. Instead, one node has a large edge and the rest are choked. I want to avoid this situation.
OK, a few more questions to help diagnose this:

a) Are all the nodes connecting to each other, or is there a bottleneck where all connections go through one node, perhaps due to firewalls? (check getpeerinfo)

b) Are you measuring tps in terms of the rate at which each node's API returns from the command which creates the transactions, or in terms of how quickly the transactions are subsequently confirmed on the chain?

c) Are the nodes all publishing from different private keys / addresses, or have you been sharing private keys?
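
Check (a) can be done mechanically once the getpeerinfo output has been collected from every node. The following is only a sketch: `missing_links` is a hypothetical helper, not part of MultiChain, and it assumes you have gathered the `addr` field ("ip:port") of each getpeerinfo entry into a dictionary keyed by the node's own IP.

```python
from itertools import permutations


def missing_links(peers_by_node):
    """Return (node, peer) pairs where node has no direct connection to peer.

    peers_by_node maps each node's IP to the list of `addr` strings
    ("ip:port") taken from that node's getpeerinfo output.
    This input layout is an assumption made for the sketch.
    """
    # Strip the port so that peers are compared by IP only.
    seen = {
        node: {addr.rsplit(":", 1)[0] for addr in addrs}
        for node, addrs in peers_by_node.items()
    }
    return [
        (node, peer)
        for node, peer in permutations(sorted(seen), 2)
        if peer not in seen[node]
    ]
```

An empty list means every node sees every other node directly. Any pairs returned point to traffic being relayed through a third node.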
a) Yes all nodes are connected peer to peer. All 5 nodes display the ip addresses of the other 4 nodes(except itself) in getpeerinfo.

b) We are measuring tps in terms of the API response. We send API requests using JMeter, and each request has the same 1 KB of text data. JMeter logs the API response time of each request. We get the latency and tps values from there.

c) Yes, all nodes are using different addresses and different private keys. We checked with listaddresses and then dumpprivkey. All were different.
Thanks, I've discussed this with the development team. We have a few theories, you can say if any of them is true:

a) Some nodes are subscribed to the stream and some are not (the subscribed ones can be slower)
b) Some nodes have a full memory pool, see getmempoolinfo output
c) The nodes are not syncing, check that getblockcount matches

You can also see if there are any hints in debug.log as to why this is happening.
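
Checks (b) and (c) can be scripted over JSON-RPC so that all nodes are sampled at about the same moment. This is a minimal sketch under assumptions: the URLs and credentials are placeholders you would take from each node's configuration, and `rpc_call`, `snapshot` and `out_of_sync` are hypothetical helper names. Only the getblockcount and getmempoolinfo commands come from the discussion above.

```python
import base64
import json
import urllib.request


def rpc_call(url, user, password, method, params=()):
    """Send one JSON-RPC request to a node and return its `result` field."""
    body = json.dumps({"id": 1, "method": method, "params": list(params)}).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["result"]


def snapshot(url, user, password):
    """Collect the block height and mempool size of one node."""
    return {
        "blocks": rpc_call(url, user, password, "getblockcount"),
        "mempool": rpc_call(url, user, password, "getmempoolinfo")["size"],
    }


def out_of_sync(snapshots):
    """Return the names of nodes whose block count is below the highest seen."""
    tip = max(s["blocks"] for s in snapshots.values())
    return sorted(name for name, s in snapshots.items() if s["blocks"] < tip)
```

Calling `snapshot` for each node during a test run and passing the results to `out_of_sync` shows whether any node is falling behind while under load.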

For our information, it would also be helpful to understand *exactly* what you're doing in these transactions. Which API are you calling and what are its parameters?
Thank you for these suggestions. We are basically using the publish API. We have not created a separate stream, so we pass in "root", a key, and our JSON object. This API request is sent automatically from JMeter, and we run JMeter for 5 minutes on all nodes simultaneously.
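
For readers following along, a request of this kind might look like the sketch below. It is an assumption, not the poster's actual JMeter payload: it uses the MultiChain 2.x convention of wrapping a JSON item as `{"json": ...}`, and `publish_request` is a hypothetical helper name.

```python
import json


def publish_request(key, obj, stream="root", request_id=1):
    """Build the JSON-RPC body for a publish call carrying a JSON item."""
    return json.dumps(
        {
            "id": request_id,
            "method": "publish",
            # Stream name, item key, then the item data wrapped as a JSON object.
            "params": [stream, key, {"json": obj}],
        }
    )
```

The returned string is what a load tool would send as the HTTP POST body to the node's RPC port.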

a) We have not created a separate stream. We use root, and I think it already has permissions to write, read and publish. None of the nodes are subscribed to any other stream.

b) We use blocknotify and run getmempoolinfo and getlastblockinfo whenever blocknotify is triggered. The getmempoolinfo results show that the size starts from 0, gradually increases, and then goes back to 0. The values are different on each node, but all go down to 0. We start the next experiment only after it has gone back to 0 on all nodes.

c) We ran getlastblockinfo and we got the same value on all nodes. However, in our results of getlastblockinfo, there are some differences. Some blocks are skipped or out of order in some nodes.

For the debug file I am not sure what to look for. Mostly it says commit transaction.
Also our mining diversity is set to 0.3, mining turnover and mine-empty-round is 0. Anyone-can-mine is also false.
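
As a side note on these settings, the arithmetic below sketches why only two nodes may end up mining. It assumes the round-robin rule as described in the MultiChain documentation (multiply the number of permitted miners by mining-diversity and round up to get the spacing; a miner may not have mined any of the previous spacing-1 blocks). The function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def mining_spacing(permitted_miners, mining_diversity):
    """Spacing = permitted miners times mining-diversity, rounded up."""
    # Fraction(str(...)) avoids float artefacts such as 10 * 0.3 > 3.
    return ceil(Fraction(str(mining_diversity)) * permitted_miners)


def min_active_miners(permitted_miners, mining_diversity):
    """Smallest set of miners that can keep the chain valid by rotating."""
    return max(1, mining_spacing(permitted_miners, mining_diversity))
```

With 5 permitted miners and a diversity of 0.3, the spacing is 2, so two nodes alternating blocks already satisfy the constraint.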
Thanks. I assume nothing else is running on these systems which is slowing them down, i.e. you've checked the Windows Task Manager / Performance Monitor / etc...? Would you be interested in trying it out on Linux instead, or are you committed to using Windows?
Yes, no other programs are running on these PCs, only MultiChain, our backend server and JMeter.
It will not be feasible for us to move to Linux at this stage. We are open to any suggestions to improve performance on this setup.
I also wanted to ask about parallel processing of v2.3.1. We have not noticed any changes. Does it need to be enabled specially?
Thanks for your reply. We are still mystified by this. Some more ideas from the team:

a) Perhaps the set of unspent transactions is larger on some nodes than others, which can slow down transaction creation. You can check this by running "listunspent 0" on each node and seeing the number of items in the result.

b) Please confirm all nodes are subscribed to the root stream, so the difference cannot be explained in terms of this subscription status. Look for the 'subscribed' field in the response from the liststreams command.

c) Please try running multichaind with the extra parameter -logcommittx=0 – this will reduce the amount of logging and this could make a difference depending on disk performance and disk driver configuration.

d) Please confirm using an activity monitor / task manager on every Windows computer that there is no process which is taking up a lot of CPU or memory. Background processes can sometimes cause surprises, especially on Windows.
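
Points (a) and (b) are easy to compare side by side once the output of "listunspent 0" and liststreams has been saved from each node. The sketch below assumes those outputs have already been parsed from JSON into Python lists. `subscription_status` and `summarize` are hypothetical helpers, and the input layout is an assumption.

```python
def subscription_status(liststreams_result, stream="root"):
    """Return the 'subscribed' field for the named stream, or None if absent."""
    for entry in liststreams_result:
        if entry.get("name") == stream:
            return entry.get("subscribed")
    return None


def summarize(results_by_node, stream="root"):
    """Map each node to (number of unspent outputs, subscribed flag).

    results_by_node maps a node name to a dict holding the parsed
    "listunspent" and "liststreams" results for that node.
    """
    return {
        node: (
            len(results["listunspent"]),
            subscription_status(results["liststreams"], stream),
        )
        for node, results in results_by_node.items()
    }
```

A node whose unspent count is far larger than the rest, or whose subscription flag differs, would be the first place to look.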