Node rpc completely freezes (rpcssl)

Hi.

I have the following problem (version 1.0.1):

1. First node created

2. Second node connects to first node

3. First node's rpc works fine (also with ssl)

4. Second node does not respond to any rpc calls (even locally on the machine using curl or multichain-cli)

5. Second node can only be killed with kill -9

6. After restart of second node, rpc is available

Last log entries of the second node (the logs look nearly the same on the first node):

2017-10-09 09:00:58 mchn: Connection from 1PbBy4UD7g3nv19DiodWQ51fZGQtXesaLPnNsn received on peer=11 in verackack (18.194.126.228:44442)
2017-10-09 09:00:58 mchn: Parameter set from peer=11 verified
2017-10-09 09:01:06 MultiChainMiner: Block Found - 007bc5d6629b672be74f91c586ab0be04a9077024818fa68c5b33985d083b18f, prev: 00a2f13f18dfa9c0aa13abe98c2a45eb0ee2dcf0a3f29ec9abaf1d8d35ad0966, height: 114, txs: 2
2017-10-09 09:01:06 UpdateTip: new best=007bc5d6629b672be74f91c586ab0be04a9077024818fa68c5b33985d083b18f height=114 log2_work=14.84549 tx=126 date=2017-10-09 09:01:06 progress=1.000000 cache=0
2017-10-09 09:01:07 ResendWalletTransactions()

asked Oct 9, 2017 by Alexoid
edited Oct 9, 2017 by Alexoid


Without ssl, there is no problem on the nodes. What I also get from time to time when ssl is enabled:

The very first rpc call after a longer idle time results in:
Post https://18.XX.XX.228:8000: read tcp 10.XX.XX.86:47470->18.XX.XX.228:8000: read: connection reset by peer

All following calls work fine again. _Only sometimes this freezes the node_

I will stop my investigation here until you have further questions...

commented Oct 10, 2017 by Alexoid

Many thanks for doing this investigation. It does indeed look like an ssl issue. We will try to reproduce this behavior and find a solution.

commented Oct 10, 2017 by Michael

Just a last comment that may help:

When the rpc port was no longer responsive from the local machine (using curl), I could still contact it from "outside" using the actual application that consumes the rpc api.

To me it looks like this: there are open connections that still work, but the node does not have any capacity to open new connections...

For example, after about 10 minutes, "lsof -i" still shows 5 connections to port 8000:

multichai 1457 ubuntu    4u IPv4 688746      0t0 TCP ip-XX-XX-XX-XX.eu-central-1.compute.internal:afs3-fileserver->ec2-XX-XX-XX.eu-central-1.compute.amazonaws.com:59026 (ESTABLISHED)
multichai 1457 ubuntu   19u IPv6 688791      0t0 TCP ip-XX-XX-XX-XX.eu-central-1.compute.internal:8000->ec2-XX-XX-XX-XX.eu-central-1.compute.amazonaws.com:29782 (ESTABLISHED)
multichai 1457 ubuntu   24u IPv6 688742      0t0 TCP *:8000 (LISTEN)
multichai 1457 ubuntu   25u IPv6 688743      0t0 TCP *:afs3-fileserver (LISTEN)
multichai 1457 ubuntu   26u IPv4 688744      0t0 TCP *:afs3-fileserver (LISTEN)
multichai 1457 ubuntu   36u IPv6 693541      0t0 TCP ip-XX-XX-XX-XX.eu-central-1.compute.internal:8000->ec2-XX-XX-XX-XX.eu-central-1.compute.amazonaws.com:31593 (ESTABLISHED)
multichai 1457 ubuntu   37u IPv6 689010      0t0 TCP ip-XX-XX-XX-XX.eu-central-1.compute.internal:8000->ec2-XX-XX-XX-XX.eu-central-1.compute.amazonaws.com:41480 (ESTABLISHED)
multichai 1457 ubuntu   38u IPv6 697229      0t0 TCP ip-XX-XX-XX-XX.eu-central-1.compute.internal:8000->ec2-XX-XX-XX-XX.eu-central-1.compute.amazonaws.com:8126 (ESTABLISHED)

But after a while a connection was no longer possible even from the application (maybe the application's keep-alive was triggered and the new connections failed because the old ones were still open and not closed properly).

Good luck! :-)

commented Oct 10, 2017 by Alexoid
edited Oct 10, 2017 by Alexoid

Morning.

Seems to be quite clear now:

Multichain is not closing the open connections by itself.

I am using a Go application that is running on a cloud deployment. Normally the client should close the connection after a timeout (keep-alive). But this does not seem to work (maybe firewall issues etc.).

Nevertheless, if the client never closes the connection, multichain doesn't close it either. And if there are 5 connections open, no new connections will be processed by multichain.

Maybe multichain should actively close connections after a timeout? Or even have a flag to disable keep-alive?

Otherwise open idle connections would block the complete rpc api.

<EDIT>
Sorry, just found -rpckeepalive. I will try that...

commented Oct 11, 2017 by Alexoid
edited Oct 11, 2017 by Alexoid

This seems to "fix" the problem. multichain immediately closes the connections and does not get stuck anymore.

But shouldn't the server side also close idle connections after a period of time?
That does not seem to happen.

commented Oct 11, 2017 by Alexoid

1 Answer

Thank you again for doing this investigation.

It looks like the problem is not related to -rpcssl; we can now simulate the "stuck RPC" behaviour without -rpcssl too. Probably, without SSL your client closes the connection automatically.

Actually, what you see is intended behaviour. MultiChain has 4 threads responding to RPC requests (can be overridden by -rpcthreads). If -rpckeepalive=1 and you run more than 4 clients and none of them closes its connection, the 5th connection will fail.

We will set the default value of -rpckeepalive to 0 in the next version of MultiChain.
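
The effect can be illustrated with a toy model in Go. This is not MultiChain's code: `rpcPool` is a hypothetical type that only mimics "a fixed number of workers, each tied to one open connection until that connection closes".

```go
package main

import "fmt"

// rpcPool models a fixed number of RPC worker threads. Each accepted
// connection occupies one worker until the connection is closed.
type rpcPool struct {
	slots chan struct{}
}

func newRPCPool(threads int) *rpcPool {
	return &rpcPool{slots: make(chan struct{}, threads)}
}

// accept tries to hand a new connection to a free worker without blocking.
// It reports false when every worker is tied up by an open connection.
func (p *rpcPool) accept() bool {
	select {
	case p.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees the worker that was serving a connection.
func (p *rpcPool) release() {
	<-p.slots
}

func main() {
	pool := newRPCPool(4) // 4 threads, as described above

	// Five keep-alive clients connect and none of them disconnects.
	for i := 1; i <= 5; i++ {
		fmt.Printf("connection %d accepted: %v\n", i, pool.accept())
	}

	// One client finally closes its connection, which frees a worker.
	pool.release()
	fmt.Println("after one close, new connection accepted:", pool.accept())
}
```

In this model the first four connections are accepted, the fifth is not, and a new one gets through only after an existing connection is released.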

In the meantime you can either start multichaind with -rpckeepalive=0 or play with -rpcthreads or close connections after every request.
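
For the third option, a Go client can ask for the connection to be closed after each request by setting `Close` on the request. The sketch below only builds and prints such a request. `newRPCRequest` is a hypothetical helper, and the URL, user name and password are placeholders to be replaced with the values of your own node.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// rpcCall is the JSON-RPC request body.
type rpcCall struct {
	Method string        `json:"method"`
	Params []interface{} `json:"params"`
	ID     int           `json:"id"`
}

// newRPCRequest builds a JSON-RPC POST request that is marked to close
// its connection as soon as the response has been read.
func newRPCRequest(url, user, password, method string, params ...interface{}) (*http.Request, error) {
	if params == nil {
		params = []interface{}{} // encode as [] rather than null
	}
	body, err := json.Marshal(rpcCall{Method: method, Params: params, ID: 1})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, password)
	req.Header.Set("Content-Type", "application/json")
	// Do not keep the connection alive after this request.
	req.Close = true
	return req, nil
}

func main() {
	// Placeholder endpoint and credentials (assumptions for illustration).
	req, err := newRPCRequest("http://127.0.0.1:8000", "multichainrpc", "secret", "getinfo")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL, "close after response:", req.Close)

	// To send it against a running node:
	//   resp, err := http.DefaultClient.Do(req)
	//   if err == nil { defer resp.Body.Close() }
}
```

Setting `DisableKeepAlives: true` on the client's `http.Transport` has a similar effect for every request made through that client.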

answered Oct 17, 2017 by Michael
