Node consuming 4-10% of CPU doing nothing?

Hi

I am using "MultiChain 1.0 Daemon (protocol 10008)".

I am running a multichain node on two servers that are identical in hardware and setup.

On one server, multichaind consumes 4-10% of the CPU constantly. On the other it consumes 0.5-1% of CPU constantly.

I enabled "-debug" on the server where multichaind consumes 4-10%. However, the debug log just contains an endless repetition of this:

2019-01-23 08:26:34 received: pong (8 bytes) peer=1
2019-01-23 08:26:35 mcnet: received header: ping
2019-01-23 08:26:35 mcnet: complete message: ping, peer=1
2019-01-23 08:26:35 mchn: RECV: ping, peer=1
2019-01-23 08:26:35 received: ping (8 bytes) peer=1
2019-01-23 08:26:35 sending: pong mchn: SEND: pong
2019-01-23 08:26:35 (8 bytes) peer=1
2019-01-23 08:26:44 sending: ping mchn: SEND: ping
2019-01-23 08:26:44 (8 bytes) peer=1
2019-01-23 08:26:44 mcnet: received header: pong
2019-01-23 08:26:44 mcnet: complete message: pong, peer=1
2019-01-23 08:26:45 mchn: RECV: pong, peer=1
2019-01-23 08:26:45 received: pong (8 bytes) peer=1
2019-01-23 08:26:45 mcnet: received header: ping
2019-01-23 08:26:45 mcnet: complete message: ping, peer=1
2019-01-23 08:26:45 mchn: RECV: ping, peer=1
2019-01-23 08:26:45 received: ping (8 bytes) peer=1
2019-01-23 08:26:45 sending: pong mchn: SEND: pong
2019-01-23 08:26:45 (8 bytes) peer=1
2019-01-23 08:26:54 sending: ping mchn: SEND: ping
2019-01-23 08:26:54 (8 bytes) peer=1
2019-01-23 08:26:54 mcnet: received header: pong
2019-01-23 08:26:54 mcnet: complete message: pong, peer=1
2019-01-23 08:26:55 mchn: RECV: pong, peer=1
2019-01-23 08:26:55 received: pong (8 bytes) peer=1
2019-01-23 08:26:55 mcnet: received header: ping
2019-01-23 08:26:55 mcnet: complete message: ping, peer=1

There is basically zero application activity on the server (e.g. the blockchain is not really used for anything).

So, what are those 4-10% of CPU doing?

Top:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11356 multich+ 20 0 701196 40120 11768 S 6.7 2.0 0:47.55 multichaind

asked Jan 23, 2019 by Morten

1 Answer

The most likely explanation is that the first server is building blocks to add to the blockchain. Is that a possible explanation in your case? In any event before proceeding any further we recommend upgrading to the latest release of MultiChain 1.0.x, since you seem to be running quite an old version?

answered Jan 23, 2019 by MultiChain

I doubt this is the case. I have another server running the exact same multichaind version (1.0.0), in a standalone node setup, and there multichaind consumes only 0-1% of the CPU. The hardware setup is also the same.

So, I doubt it is the multichaind version, or the block building, that is causing multichaind to consume so much CPU in the two-server setup mentioned in my first post. Unless multichaind consumes considerably more CPU in a multi-node setup?

commented Feb 4, 2019 by ilsundal

We've identified another issue where nodes can waste CPU cycles trying to connect to previously known nodes that have since disappeared. Is this a possible explanation of what you're seeing? If so it will be fixed in the next maintenance release, of both 1.0.x and 2.0 betas.

commented Feb 4, 2019 by MultiChain

I updated to version 1.0.7 and this seems to have improved things a bit. The first server now "only" constantly consumes 2.7-5.3% of CPU, and the second server now "only" constantly consumes 0.3-1.% of CPU.

Still, this is while there is zero activity on the blockchain from the application's point of view, e.g. there are application queries against the blockchain.

> We've identified another issue where nodes can waste CPU cycles trying to connect to previously known nodes that have since disappeared. Is this a possible explanation of what you're seeing?

I doubt it, because there no "previously known nodes that have since disappeared". There has always only been two nodes, one on each server.

Of course, I can live with 3-5% of CPU being constantly used. Just curious to understand why this is needed when there is no application queries against the blockchain, and thus nothing to mine (I think).

commented Feb 12, 2019 by ilsundal

Did you say there *was* querying against the node taking place, even if this is not blockchain activity? If so this would explain the CPU usage, which is due to processing and responding to the API requests.

commented Feb 12, 2019 by MultiChain

No, there are NO queries taking place, at least not from my application. I don't know if there are other queries being made, such as administration queries between the nodes.

commented Feb 13, 2019 by ilsundal

It seems that CPU usage is increasing. After a week of running with the latest version of Multichaind (1.0.7), I have this constant CPU load:

* On node 1: 8% - 13%
* On node 2: 0% - 2%

Is there a log that can tell me what is going on, e.g. what the multichaind process is doing?

commented Feb 18, 2019 by ilsundal

Would you mind please testing the same setup on MultiChain 2.0 beta 2, which fixes a CPU consumption issue we're aware of, and letting me know if it makes a difference?

Also if you want more debugging, run multichaind with these additional options:

-debug=mchn -debug=mcblock -debug=mcapi -debug=mcminer

Then you'll see much more in the debug.log file.

commented Feb 18, 2019 by MultiChain

It is currently not an option to update to Multichain 2.0 beta 2 in this (production) environment.

I added the debug flags. It shows various activity, but not much. There are plenty of periods up to 10-20 seconds each where nothing is written to the log, but the CPU load is still 3%-5%. (The CPU load apparently dropped a bit from yesterday after I restarted multichaind with the debug flags.) So, it seems that there is multichaind activity that is not written to the log.

commented Feb 19, 2019 by ilsundal

OK. This could also be background compaction of LevelDB database tables.

commented Feb 19, 2019 by MultiChain

If so, then why would such compaction only consume so many CPU resources on node 1, and not on node 2 which is completely identical to node 1?

commented Feb 20, 2019 by ilsundal

Are these servers exact the same specification as each other? If so, perhaps one is subcribed to some streams that the other is not? Or one is initiating more transactions than the other? The difference in LevelDB activity could be caused by any number of things to do with the data stored by a node.

commented Feb 21, 2019 by MultiChain

Yes, they have the exact same specifications.

Why would stream subscriptions matter if there are no transactions being made?

commented Feb 25, 2019 by ilsundal

Because if a node is subscribed to a stream, it will be building up a lot more LevelDB database records, and that in turn could lead to more background compaction of that LevelDB database, even if no new transactions are coming in. This is a result of how LevelDB works - using a "log structured merge tree". This compaction will finish at some point.

https://en.wikipedia.org/wiki/Log-structured_merge-tree

commented Feb 26, 2019 by MultiChain

There is the same number of stream subscriptions on the two nodes, so I doubt this is a LevelDB compaction issue because the constant CPU load is much higher on node 1 than on node 2.

But if it is a LevelCB compaction issue then LevelDB compaction must be extremely CPU inefficient: We have about multichain 10 transactions per day, yet CPU load is constantly around 10% all-day round. (It seems that this constant CPU load is actually increasing over time; it was 3-5% when I restarted multichaind about 10 days ago. Now it is 10%.)

commented Feb 27, 2019 by ilsundal

We are discussing internally and will get back to you shortly.

commented Mar 4, 2019 by MultiChain

We still think this is the reconnection problem that has already been fixed in 2.0. Would you be willing to test an unofficial release of 1.0.x to see?

commented Mar 4, 2019 by MultiChain

The issue is in a production environment, so I am not too keen to test unofficial releases there :) And perhaps the issue cannot be reproduced if I do a similar setup (because if the issue can be replicated on any any two-node setup you probably would have caught this issue a long time ago).

But is there no way to enable logging to see if this is a reconnection issue?

Alternatively, if version 2.0 is fully backwards-compatible with my version 1.0 blockchains (and is installed in the same way by just overriding the old binaries with new ones), then I am willing to go to 2.0 directly, if needed.

commented Mar 4, 2019 by ilsundal

The problem is that there is no logging of the specific cause of this issue that can be activated. Overall if this is not proving a huge problem it sounds like you should wait for the official 1.0.8 release, which includes this fix.

commented Mar 5, 2019 by MultiChain

Ok, I will. Can you give me an approximate time estimate for v1.0.8 being available?

commented Mar 5, 2019 by ilsundal

We expect in the next month or so.

commented Mar 6, 2019 by MultiChain

Version 1.0.8 has now been released with a fix for this issue.

commented Mar 21, 2019 by MultiChain

Thank you, this seems to have solved the problem. The multichaind CPU load is now <1% when there is no activity, as expected.

commented Apr 10, 2019 by ilsundal

Great, thanks for the update.

commented Apr 11, 2019 by MultiChain

Most popular tags

Node consuming 4-10% of CPU doing nothing?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Related questions

Most popular tags

Node consuming 4-10% of CPU doing nothing?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.