Multichain CPU usage for high send-rates

+1 vote

This is a follow-up to a previous thread about dropped connections ( https://www.multichain.com/qa/7815/multichain-benchmarking-with-nodejs-wrapper ). That thread was getting lengthy, so I'm starting a new one with some concrete observations this time.

As mentioned there, I was predominantly getting two errors: ETIMEDOUT and ECONNRESET.

For ETIMEDOUT, it turned out that the client itself was dropping requests, so they never reached the peer and the error callback fired. I'm guessing a single Node.js process can't handle send rates above 500 tps, so this is clearly a client problem.

Previously, I only got ECONNRESET at much higher send rates. So this time I used two clients, each with a send rate it can handle on its own (to avoid the ETIMEDOUT problem), and the ECONNRESET errors started again. I ran tcpdump to check whether the requests were actually sent, and they were present on both the client and peer machines, so no problem there. However, -debug=mcapi doesn't show them, so somewhere after being received the requests are dropped, either by MultiChain or by the OS (since tcpdump shows the requests arriving, I'm guessing the former). This definitely rules out any client-side issue for this error.
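For reference, a capture along these lines is enough to see whether the requests hit the wire (assuming the RPC port is 999, as in the dumps further down):

sudo tcpdump -i any -A 'tcp port 999'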

You had initially pointed out (in the previous thread) that the CPU might be loaded beyond its capacity, so I ran htop and took some screenshots to log the CPU usage. Unsurprisingly, the resets were happening under high load, particularly when one of the cores hit 60%+ and the multichaind process as a whole was at 100%+.

Some Screenshots of peer CPU - https://imgur.com/a/R1zC0

Is this kind of CPU usage normal? And if all parameters are kept the same, is a more powerful system the only way to handle this load?

MultiChain version: v1.0.2

Command: multichaind $CHAINNAME -debug=mcapi -printtoconsole -autosubscribe=streams -blocknotify="script.sh %s"

Blockchain params are default except target-block-time is set to 2 seconds
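For reference, a sketch of where that lives (target-block-time is fixed in the chain's params.dat and must be set before the chain is first started):

# in ~/.multichain/$CHAINNAME/params.dat, edited before the first launch of multichaind
target-block-time = 2                # target seconds between blocks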

CPU: Intel(R) Core(TM) i7-4790 @ 3.60GHz (4 cores, 8 threads with Hyper-Threading)

RAM: 16GB

 

asked Feb 28, 2018 by amolpednekar
edited Feb 28, 2018 by amolpednekar

1 Answer

0 votes

Yes, if you drive the node at full transaction capacity you will see this CPU usage.

The kind of connection resetting you're seeing may still be happening at the OS level, due to the way the connections are being established. We've seen similar things with the curl library against Apache at high throughput. So please try the PHP code we provided in the comment linked below, to see if that stops these connection reset errors when MultiChain is under high load:

https://www.multichain.com/qa/7815/multichain-benchmarking-with-nodejs-wrapper?show=8565#c8565

answered Feb 28, 2018 by MultiChain
Thanks for the reply. I tried your PHP code (had to learn it a bit :P ), ran it in a for loop up to 10000 iterations, and it works perfectly.
Here it is if you guys want to double check - https://gist.github.com/amolpednekar/ef3548aa4c28e7fed77eba95ead2f43b

But this code sends requests synchronously! This is something I had asked about in the previous thread too, in the last comment (https://www.multichain.com/qa/7815/multichain-benchmarking-with-nodejs-wrapper?show=8566#c8566).
PS: multichain-cli does the same thing.

If I force Node.js to send requests synchronously, that works fine too, but it caps the number of requests I can send, because the next request only goes out after the current one completes. (Also, in a real-world scenario that won't be the case, and I want to flood the peer with requests.)
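To make the difference concrete, here's a minimal sketch of the two patterns using Node's built-in http module (the host, port, and credentials are placeholders, not my actual setup):

const http = require('http');

function publish(i) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({
      method: 'publish',
      params: ['mystream', 'key' + i, 'AB'],
      id: i,
      jsonrpc: '2.0'
    });
    const req = http.request({
      host: '10.244.5.41',        // placeholder peer address
      port: 999,                  // placeholder rpcport
      method: 'POST',
      auth: 'username:password',  // placeholder rpcuser:rpcpassword
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
        'Connection': 'close'     // disable HTTP/1.1 keep-alive
      }
    }, (res) => {
      res.resume();               // drain the response
      res.on('end', resolve);
    });
    req.on('error', reject);      // ETIMEDOUT / ECONNRESET surface here
    req.end(body);
  });
}

// Synchronous pattern: one request in flight at a time, so the rate is capped.
async function sendSequentially(n) {
  for (let i = 0; i < n; i++) await publish(i);
}

// Flood pattern: fire all requests at once; this is where the resets appear.
function flood(n) {
  return Promise.allSettled(Array.from({ length: n }, (_, i) => publish(i)));
}

flood(1000).then(results =>
  console.log(results.filter(r => r.status === 'rejected').length, 'failed'));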

Apart from that, I don't see any difference between the requests sent by this code and by my Node.js code; both are standard HTTP POSTs. The only difference I saw was the HTTP version (1.0 vs 1.1), and the only relevant behavioral change there is persistent connections in 1.1, which I'm already disabling in Node.js by setting `Connection: close`.

Via tcpdump:

PHP request

POST / HTTP/1.0
Host: 10.244.5.41:999
Content-Type: application/json
Content-Length: 57
Connection: close
Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

{"method":"publish","params":["mystream","key9991","AB"]}

Node.js request

POST / HTTP/1.1
Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
Content-Type: application/json
Accept: application/json
Content-Length: 77
host: 10.244.5.41:999
Connection: close

{"method":"publish","params":["mystream","key0","AB"],"id":1,"jsonrpc":"2.0"}
You can run multiple instances of that PHP script in parallel to test asynchronous API requests.

While the TCP dumps look similar, there are other aspects of how network connections and ports are opened and closed that you won't see there.
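For example, something along these lines, where publish-loop.php stands in for whatever you named the script from the gist:

for i in $(seq 1 8); do
    php publish-loop.php &
done
wait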
Oh, I see. I'll try running a lot of instances, but I'm not sure how well that can replicate the async behavior I want.

Can you shed any light on what exactly the difference is, or on how PHP does this, so that I can replicate the behavior in Node.js?
Any updates on this?
We don't know exactly; we just know that we've observed similar behavior with curl in PHP vs raw PHP sockets.
Thanks.

Another update: I noticed that if I increase the "rpcthreads" parameter to a high value, say 5000 or more (depending on my send rate and total load), I no longer get the ECONNRESET error (keep-alive = 0 or 1 doesn't seem to make a difference). The higher I set this value, the more async requests it can handle.
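For reference, the parameter can be passed on the command line or put in the chain's multichain.conf; a sketch:

multichaind $CHAINNAME -rpcthreads=5000 -debug=mcapi -printtoconsole

or in ~/.multichain/$CHAINNAME/multichain.conf:

rpcthreads=5000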

As soon as I set it back to the default (4) or any value lower than my load, the errors crop up again. Any idea why this is happening? (PS: This is still with the Node.js library I have been using, and I didn't make any OS-level network changes.)

With this change, I don't get request drops even at 100% CPU usage. Is this a valid way to handle request floods?
I guess it's a valid workaround - you'll just have a lot of threads running in MultiChain, holding on to these unclosed connections. If it doesn't slow your node down too much or use too much memory, then I can't see any reason not to do it.
...