Node consuming 4-10% of CPU doing nothing?

+3 votes
Hi

I am using "MultiChain 1.0 Daemon (protocol 10008)".

I am running a multichain node on two servers that are identical in hardware and setup.

On one server, multichaind consumes 4-10% of the CPU constantly. On the other it consumes 0.5-1% of CPU constantly.

I enabled "-debug" on the server where multichaind consumes 4-10%. However, the debug log just contains an endless repetition of this:

2019-01-23 08:26:34 received: pong (8 bytes) peer=1
2019-01-23 08:26:35 mcnet: received header: ping
2019-01-23 08:26:35 mcnet: complete message: ping, peer=1
2019-01-23 08:26:35 mchn: RECV: ping, peer=1
2019-01-23 08:26:35 received: ping (8 bytes) peer=1
2019-01-23 08:26:35 sending: pong mchn: SEND: pong
2019-01-23 08:26:35 (8 bytes) peer=1
2019-01-23 08:26:44 sending: ping mchn: SEND: ping
2019-01-23 08:26:44 (8 bytes) peer=1
2019-01-23 08:26:44 mcnet: received header: pong
2019-01-23 08:26:44 mcnet: complete message: pong, peer=1
2019-01-23 08:26:45 mchn: RECV: pong, peer=1
2019-01-23 08:26:45 received: pong (8 bytes) peer=1
2019-01-23 08:26:45 mcnet: received header: ping
2019-01-23 08:26:45 mcnet: complete message: ping, peer=1
2019-01-23 08:26:45 mchn: RECV: ping, peer=1
2019-01-23 08:26:45 received: ping (8 bytes) peer=1
2019-01-23 08:26:45 sending: pong mchn: SEND: pong
2019-01-23 08:26:45 (8 bytes) peer=1
2019-01-23 08:26:54 sending: ping mchn: SEND: ping
2019-01-23 08:26:54 (8 bytes) peer=1
2019-01-23 08:26:54 mcnet: received header: pong
2019-01-23 08:26:54 mcnet: complete message: pong, peer=1
2019-01-23 08:26:55 mchn: RECV: pong, peer=1
2019-01-23 08:26:55 received: pong (8 bytes) peer=1
2019-01-23 08:26:55 mcnet: received header: ping
2019-01-23 08:26:55 mcnet: complete message: ping, peer=1

 

There is basically zero application activity on the server (e.g. the blockchain is not really used for anything).

So, what are those 4-10% of CPU doing?

Top:

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                        
11356 multich+  20   0  701196  40120  11768 S  6.7  2.0   0:47.55 multichaind
asked Jan 23, 2019 by Morten

1 Answer

+2 votes
The most likely explanation is that the first server is building blocks to add to the blockchain. Is that a possible explanation in your case? In any event before proceeding any further we recommend upgrading to the latest release of MultiChain 1.0.x, since you seem to be running quite an old version?
answered Jan 23, 2019 by MultiChain
I doubt this is the case. I have another server running the exact same multichaind version (1.0.0), in a standalone node setup, and there multichaind consumes only 0-1% of the CPU. The hardware setup is also the same.

So, I doubt it is the multichaind version, or the block building, that is causing multichaind to consume so much CPU in the two-server setup mentioned in my first post. Unless multichaind consumes considerably more CPU in a multi-node setup?
We've identified another issue where nodes can waste CPU cycles trying to connect to previously known nodes that have since disappeared. Is this a possible explanation of what you're seeing? If so it will be fixed in the next maintenance release, of both 1.0.x and 2.0 betas.
I updated to version 1.0.7 and this seems to have improved things a bit. The first server now "only" constantly consumes 2.7-5.3% of CPU, and the second server now "only" constantly consumes 0.3-1.% of CPU.

Still, this is while there is zero activity on the blockchain from the application's point of view, e.g. there are application queries against the blockchain.

> We've identified another issue where nodes can waste CPU cycles trying to connect to previously known nodes that have since disappeared. Is this a possible explanation of what you're seeing?

I doubt it, because there no "previously known nodes that have since disappeared". There has always only been two nodes, one on each server.

Of course, I can live with 3-5% of CPU being constantly used. Just curious to understand why this is needed when there is no application queries against the blockchain, and thus nothing to mine (I think).
Did you say there *was* querying against the node taking place, even if this is not blockchain activity? If so this would explain the CPU usage, which is due to processing and responding to the API requests.
No, there are NO queries taking place, at least not from my application. I don't know if there are other queries being made, such as administration queries between the nodes.
It seems that CPU usage is increasing. After a week of running with the latest version of Multichaind (1.0.7), I have this constant CPU load:

* On node 1: 8% - 13%
* On node 2: 0% - 2%

Is there a log that can tell me what is going on, e.g. what the multichaind process is doing?
Would you mind please testing the same setup on MultiChain 2.0 beta 2, which fixes a CPU consumption issue we're aware of, and letting me know if it makes a difference?

Also if you want more debugging, run multichaind with these additional options:

-debug=mchn -debug=mcblock -debug=mcapi -debug=mcminer

Then you'll see much more in the debug.log file.
It is currently not an option to update to Multichain 2.0 beta 2 in this (production) environment.

I added the debug flags. It shows various activity, but not much. There are plenty of periods up to 10-20 seconds each where nothing is written to the log, but the CPU load is still 3%-5%. (The CPU load apparently dropped a bit from yesterday after I restarted multichaind with the debug flags.) So, it seems that there is multichaind activity that is not written to the log.
OK. This could also be background compaction of LevelDB database tables.
If so, then why would such compaction only consume so many CPU resources on node 1, and not on node 2 which is completely identical to node 1?
Are these servers exact the same specification as each other? If so, perhaps one is subcribed to some streams that the other is not? Or one is initiating more transactions than the other? The difference in LevelDB activity could be caused by any number of things to do with the data stored by a node.
Yes, they have the exact same specifications.

Why would stream subscriptions matter if there are no transactions being made?
Because if a node is subscribed to a stream, it will be building up a lot more LevelDB database records, and that in turn could lead to more background compaction of that LevelDB database, even if no new transactions are coming in. This is a result of how LevelDB works - using a "log structured merge tree". This compaction will finish at some point.

https://en.wikipedia.org/wiki/Log-structured_merge-tree
There is the same number of stream subscriptions on the two nodes, so I doubt this is a LevelDB compaction issue because the constant CPU load is much higher on node 1 than on node 2.

But if it is a LevelCB compaction issue then LevelDB compaction must be extremely CPU inefficient: We have about multichain 10 transactions per day, yet CPU load is constantly around 10% all-day round. (It seems that this constant CPU load is actually increasing over time; it was 3-5% when I restarted multichaind about 10 days ago. Now it is 10%.)
We are discussing internally and will get back to you shortly.
We still think this is the reconnection problem that has already been fixed in 2.0. Would you be willing to test an unofficial release of 1.0.x to see?
The issue is in a production environment, so I am not too keen to test unofficial releases there :) And perhaps the issue cannot be reproduced if I do a similar setup (because if the issue can be replicated on any any two-node setup you probably would have caught this issue a long time ago).

But is there no way to enable logging to see if this is a reconnection issue?

Alternatively, if version 2.0 is fully backwards-compatible with my version 1.0 blockchains (and is installed in the same way by just overriding the old binaries with new ones), then I am willing to go to 2.0 directly, if needed.
The problem is that there is no logging of the specific cause of this issue that can be activated. Overall if this is not proving a huge problem it sounds like you should wait for the official 1.0.8 release, which includes this fix.
Ok, I will. Can you give me an approximate time estimate for v1.0.8 being available?
We expect in the next month or so.
Version 1.0.8 has now been released with a fix for this issue.
Thank you, this seems to have solved the problem. The multichaind CPU load is now <1% when there is no activity, as expected.
Great, thanks for the update.
...