Sporadically receiving connection error when doing JSON RPC requests

+2 votes
Hello,

I made a web application that sends JSON RPC requests to a MultiChain node on the same server (localhost). The requests work correctly most of the time, but in about 4% of cases the request fails, and the only error I have available is what the standard .NET WebRequest library tells me: "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host." This should mean that, for some reason, MultiChain is closing the connection while in the process of sending back a response.
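
For context, each call is just a JSON RPC POST to the node's RPC port, roughly like the sketch below. This is only a minimal sketch of the shape of the request, not my actual code; the port, credentials and request body are placeholders:

    // Minimal sketch of the kind of request involved, not the actual application code.
    // The RPC port, credentials and method are placeholders.
    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    class RpcCallSketch
    {
        static string Call(string method, string paramsJson)
        {
            var request = (HttpWebRequest)WebRequest.Create("http://127.0.0.1:4352/");
            request.Method = "POST";
            request.ContentType = "application/json";
            request.Credentials = new NetworkCredential("multichainrpc", "rpc-password");

            byte[] body = Encoding.UTF8.GetBytes(
                "{\"method\":\"" + method + "\",\"params\":" + paramsJson + ",\"id\":1}");
            request.ContentLength = body.Length;
            using (Stream s = request.GetRequestStream())
                s.Write(body, 0, body.Length);

            // This is where the "connection was forcibly closed" error surfaces,
            // while reading the response back from the node.
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
                return reader.ReadToEnd();
        }
    }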

I've been trying to debug this for a while without success. It can happen with any JSON RPC request and generally happens when there are a bunch of requests in a short time (right now I'm testing with a 1-second delay between requests, and it fails around the tenth consecutive request).

I don't know what's happening or how to debug it, since it seems to be a problem on the MultiChain side, and one that happens only a small percentage of the time.

I tested on MultiChain Community 2.0.3, but this also happened on 2.0.2.

What can I do to further investigate this situation?
Any idea on why it could be happening?
asked Oct 23, 2019 by mgiacomellijsb

1 Answer

0 votes

If the problem is occurring when you are sending a lot of requests at the same time, the likely explanation is that some of these requests are taking a while, and your MultiChain node is not able to answer some new incoming connections because all the RPC threads are used up. Three suggestions that might help:

  • Improve the performance of the server (maybe this is a light cloud instance?)
  • Use the rpcthreads runtime parameter to increase the number of RPC commands that can be processed in parallel.
  • Increase the timeout parameters on the client side (see the sketch after this list).
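
To make the last two points concrete, something along these lines (a sketch only: the chain name, port and numeric values are placeholders; rpcthreads is the runtime parameter mentioned above):

    // Node side: raise the number of RPC worker threads when starting the node,
    // e.g. (chain name and value are placeholders):
    //     multichaind chain1 -rpcthreads=64
    //
    // Client side: raise the .NET timeouts (values in milliseconds for WebRequest).
    using System;
    using System.Net;
    using System.Net.Http;

    class TimeoutSketch
    {
        static void Configure()
        {
            var request = (HttpWebRequest)WebRequest.Create("http://127.0.0.1:4352/");
            request.Timeout = 300000;           // total request timeout, default is 100 s
            request.ReadWriteTimeout = 300000;  // timeout for reading the response stream

            var client = new HttpClient { Timeout = TimeSpan.FromMinutes(5) };  // HttpClient equivalent
        }
    }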
answered Oct 26, 2019 by MultiChain
Thank you for the answer. I can't find any information in the documentation about the rpcthreads parameter. What is the default value? What is the maximum?
If you run multichaind with no parameters you'll see a full list of all runtime parameters it supports.
As I said in the edit above:
I tried setting it to 200, but consecutive requests (not parallel) still fail.
I checked the client timeout, but it's 100 seconds by default, so that's definitely not the cause, since the failure happens almost immediately.
It feels like MultiChain can't keep up with consecutive incoming requests over a short time, even though my virtual machine (Windows) seems able to handle the workload.
OK, please clarify the virtualization setup, i.e. (a) what is virtualized inside what (Windows inside Windows?), and (b) relative to that setup, where the node is running and where the requests are coming from.
It's a Windows Server 2012 virtual machine running on the Azure cloud.
The node is on that machine, and the web application making the requests is on the same machine, so the JSON RPC requests never leave the machine and are all internal.
Thanks - I've forwarded this to the team and will see if they have any ideas.
Faced the same problem when using version 1.0.8. Had to add retry code in .NET as a workaround. Mine was running on an Azure VM with Ubuntu 16.04.
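The retry wrapper was roughly of this shape (a simplified sketch, not the exact code; sendRpc stands in for whatever actually performs the JSON RPC call):

    using System;
    using System.IO;
    using System.Net;
    using System.Threading;

    class RetrySketch
    {
        // Retry the call a few times when the connection is dropped mid-response.
        static string CallWithRetry(Func<string> sendRpc, int maxAttempts = 3)
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    return sendRpc();
                }
                catch (Exception ex) when (attempt < maxAttempts &&
                                           (ex is IOException || ex is WebException))
                {
                    // "Unable to read data from the transport connection..." shows up
                    // as an IOException, often wrapped in a WebException.
                    Thread.Sleep(200 * attempt);  // simple linear back-off before retrying
                }
            }
        }
    }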
We discussed this internally some more. It's not a problem that has been reported before, so it seems more likely to be an internal networking issue on the virtual machine you are using, rather than something specific to MultiChain. Could you check whether any kind of rate limiting is being applied, either at the operating system level or by Azure itself?
For example see this documentation:

https://docs.microsoft.com/en-us/azure/virtual-network/virtual-machine-network-throughput

Maybe localhost traffic is still counted towards these limits?
For clarification: I checked and received a reply from Microsoft (please refer to the feedback section below the link mentioned above) confirming that internal traffic within the same host does not count towards these limits.
OK. And just to confirm you're connecting to the node's API using localhost/127.0.0.1 rather than the server's external IP address?
I can confirm that; every connection goes to localhost/127.0.0.1 and the relevant port.
OK, thanks for your replies. The last thing I would ask you to try is some other method of sending the requests to the MultiChain node, instead of .NET and the webrequest library. For example you could use Apache JMeter to send a simple command to the node in a loop. That should help isolate whether the problem is in the calling library or in the operating system or node.
I tried Advanced REST Client directly on the server to send a bunch of getinfo requests to MultiChain, and I haven't seen any issues even after repeatedly spamming the send button.
I'll try the following things next:
- try to recreate the specific request in ARC and spam it (it's a stream insertion with some values)
- try another HTTP client implementation instead of the .NET WebRequest library (for instance RestSharp instead of System.Net.Http); see the sketch below
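For the second point, I'm thinking of something along these lines (written against the classic pre-v107 RestSharp API, so names may differ in newer versions; the port, credentials and stream/key names are placeholders):

    using RestSharp;
    using RestSharp.Authenticators;

    class RestSharpSketch
    {
        static string PublishItem()
        {
            var client = new RestClient("http://127.0.0.1:4352/");
            client.Authenticator = new HttpBasicAuthenticator("multichainrpc", "rpc-password");

            var request = new RestRequest("/", Method.POST);
            request.AddJsonBody(new
            {
                method = "publish",   // the stream insertion mentioned above
                @params = new object[] { "stream1", "key1", new { json = new { value = 42 } } },
                id = 1
            });

            IRestResponse response = client.Execute(request);
            return response.Content;
        }
    }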
Just to add: what I am observing from the debug log, after adding the -debug=mcapi flag, is that the client encountered the error while MultiChain did not receive any input from the client. @mgiacomellijsb can you verify whether that's the same in your case?

@mgiacomellijsb, I've also changed the .NET WebRequest to HttpClient but am still facing the same issue (roughly the pattern sketched below). Let me know your findings for RestSharp.
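
By "changed to HttpClient" I mean roughly this pattern, with placeholder port and credentials, not the exact code:

    using System;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Text;

    class HttpClientSketch
    {
        // Reuse a single HttpClient instance rather than creating one per request.
        static readonly HttpClient Client = new HttpClient();

        static string Call(string jsonBody)
        {
            var request = new HttpRequestMessage(HttpMethod.Post, "http://127.0.0.1:4352/");
            request.Headers.Authorization = new AuthenticationHeaderValue(
                "Basic",
                Convert.ToBase64String(Encoding.ASCII.GetBytes("multichainrpc:rpc-password")));
            request.Content = new StringContent(jsonBody, Encoding.UTF8, "application/json");

            HttpResponseMessage response = Client.SendAsync(request).Result;
            return response.Content.ReadAsStringAsync().Result;
        }
    }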
Hey all, after a long time I managed to find time to do some more tests, and the theory was right: 200 consecutive requests sent directly to MultiChain through Postman work without a hitch; the same can't be said for the same number of requests made through the .NET WebRequest library.
Now I'll try switching to another library and see if that's what's causing the issues.
@mgiacomellijsb Will be looking forward to your findings. Thanks.
I can't believe it but I managed to solve the issue (apparently) once and for all and I didn't even do anything.
Basically, on the LucidOcean GitHub there was an open pull request that fixed the issue: https://github.com/LucidOcean/multichain/pull/31

It was a weird timeout lease thing. I just did a stress test with over 400 consecutive calls in a very short timespan and they all went through correctly.
MultiChain is not bugged; it's just that the LucidOcean library did the web request poorly, and if you apply that patch the issue is solved.
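
For anyone who can't apply the PR directly: I haven't dug into exactly what the patch changes, but "timeout lease" in .NET usually points at the ServicePoint keep-alive/connection-lease settings, so the fix is probably of this general shape (an illustration of that idea only, not the contents of the PR; the port is a placeholder):

    using System;
    using System.Net;

    class ConnectionLeaseSketch
    {
        static void Configure()
        {
            // Pooled keep-alive connections can be closed by the server while the
            // client still considers them usable, which then surfaces as
            // "connection was forcibly closed by the remote host" on the next call.
            ServicePoint sp = ServicePointManager.FindServicePoint(new Uri("http://127.0.0.1:4352/"));
            sp.ConnectionLeaseTimeout = 60 * 1000;  // recycle connections after 60 s
            sp.MaxIdleTime = 60 * 1000;             // drop idle connections promptly

            // A cruder alternative is request.KeepAlive = false on each HttpWebRequest,
            // at the cost of opening a new connection per call.
        }
    }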
Excellent findings! Thanks.
...