MultiChain benchmarking with Node.js wrapper

+1 vote
I was running a MultiChain benchmark test, using this Node.js wrapper ( https://github.com/scoin/multichain-node ), as listed on your developer page, to make the calls.

When I try a fairly high number of requests (say, publishing to a stream at 500 transactions per second or higher, on a single peer), I get "connect ETIMEDOUT" or "connect ECONNRESET" errors from Node. I tried setting a higher value for the timeout in the library code, but the same problem occurs. What is causing the issue here: the library itself, or do I need to configure the blockchain differently to avoid this problem? I'm using default parameters at the moment (except anyone-can-connect=true, but no additional peers were connected for my tests). Does the peer drop the HTTP request if the send rate is very high?

Multichain version is 1.0-release
asked Dec 11, 2017 by amolpednekar
edited Dec 11, 2017 by amolpednekar

1 Answer

+1 vote

It sounds like a simple case of pushing the node beyond its capacity to process transactions on your setup. We see around 1000 tx/sec for basic publish operations, but this can be slowed down by (a) your CPU performance, (b) the size of the items being published, (c) your node being subscribed to the stream and (d) a larger number of UTXOs in the node's wallet.

answered Dec 12, 2017 by MultiChain
Thanks for the reply.

I noticed that the CPU utilization (using the 'top' utility) for the multichaind process shoots up to 90-100% intermittently at certain transaction rates (say, 200 and upwards).
I'm setting autosubscribe=true, and each publish item is fairly small, e.g. { account_1 : 50 } and so on for 'n' items.
It hits these high utilization rates even faster if I enable transaction events.

So, 1) Is this causing the connection breaks?
2) How can I view the UTXOs for my wallet? Using 'getwalletinfo' showed me "utxocount: 1".
3) Would any changes to the default blockchain parameters potentially give me a performance improvement on this hardware?
The connection breaks will probably be caused by too many API threads waiting for access to a lock, and this in turn is caused by the high level of node activity.

If getwalletinfo shows "utxocount: 1" then the UTXO count is certainly not your problem.

If you set a shorter block time, that might help somewhat, since it will take less time to process each block, but it will not have much effect on the overall throughput.

So your best bet is still to stop subscribing to the stream on the same node that is writing to it, and to use a more powerful CPU.

Also, just to verify: you didn't write your benchmark in such a way that each API request starts a new operating system process? If so, that will be your main source of overhead. You need a single running process that repeatedly sends requests to the MultiChain API.
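For illustration, here is a minimal Node.js sketch of that pattern, using only the built-in http module; the connection settings (HOST, PORT, USER, PASS), the stream name 'stream1' and the item count are placeholders to adapt to your own chain:

const http = require('http');

// Placeholder connection settings -- substitute your chain's rpcport, rpcuser and rpcpassword.
const HOST = '127.0.0.1', PORT = 6824, USER = 'multichainrpc', PASS = 'your-rpc-password';

// One keep-alive agent, so every request reuses the same process and connection pool.
const agent = new http.Agent({ keepAlive: true });

function rpcCall(method, params) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({ id: Date.now(), method, params });
    const req = http.request({
      host: HOST, port: PORT, method: 'POST', path: '/', agent,
      auth: USER + ':' + PASS,
      headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
    }, res => {
      let data = '';
      res.on('data', chunk => { data += chunk; });
      res.on('end', () => resolve(JSON.parse(data)));
    });
    req.on('error', reject);
    req.end(body);
  });
}

// Publish repeatedly from this single long-lived process.
async function run() {
  for (let i = 1; i <= 1000; i++) {
    const reply = await rpcCall('publish', ['stream1', 'key' + i, '31']);
    if (reply.error) console.error('publish failed:', reply.error);
  }
}

run().catch(console.error);

The point is that one process (and, with the keep-alive agent, one small pool of TCP connections) is reused for every call, rather than a new operating system process per request.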
Thanks for the explanation.

I will try it on a better system to see how much difference it makes. Stream subscription is necessary as I am doing read tests too, but will keep that in mind.

And no, I don't start a new process for each request. A single script calls the API 'n' times based on my test config. In any case, that shouldn't cause a problem as I'm using a separate machine to send the API requests to the peer.
@multichain - Update

I upgraded from a 2 vCPU virtual machine, where my limit was 1000 txns @ 100 tps (beyond which I used to get the connection reset issue), to an actual hardware machine with the following specs:

The peer and the client (sending requests to the peer from a different machine) are SEPARATE machines with the following hardware -

Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
16GB RAM
500GB HDD
OS: Ubuntu 14.04 LTS

Even with the major upgrade, I could only reach a maximum of 2750 txns @ 275 tps (basically, 10-second tests). On the peer machine, CPU usage for the "multichaind" process starts crossing 100% at that point, and it fails (ECONNRESET/ETIMEDOUT) outright at 3000 txns @ 300 tps. (With walletnotify ON.)

Is this the expected performance?
The issue is probably -walletnotify, since this forks a new process for every transaction notification. Please try with this option off and see what you get.
Yeah, that seems to be it; if I turn -walletnotify off, I can reach upwards of 500 tps.

I need events because what I'm doing is logging the time and txn id of a publish stream action, then logging the time again when I get the event for the same txn id on the client side, and computing the difference.
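For context, here is a rough sketch of this kind of measurement; the connection settings, the stream name 'stream1', the listener port 8700 and the assumption that the notification script forwards the txid (walletnotify's %s) with curl are all placeholders, not my exact harness:

const http = require('http');

// Placeholder connection settings for the node's JSON-RPC port.
const HOST = '127.0.0.1', PORT = 6824, USER = 'multichainrpc', PASS = 'your-rpc-password';
const sentAt = new Map();   // txid -> timestamp recorded when publish returned

function rpcCall(method, params) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({ id: Date.now(), method, params });
    const req = http.request({
      host: HOST, port: PORT, method: 'POST', path: '/', auth: USER + ':' + PASS,
      headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
    }, res => {
      let data = '';
      res.on('data', c => { data += c; });
      res.on('end', () => resolve(JSON.parse(data)));
    });
    req.on('error', reject);
    req.end(body);
  });
}

// Publish one item and remember when its txid came back ('stream1' is a placeholder).
async function publishAndNote(key) {
  const reply = await rpcCall('publish', ['stream1', key, '31']);
  if (!reply.error) sentAt.set(reply.result, Date.now());   // result is the txid
}

// Tiny listener: the notification script is assumed to run something like
//   curl http://client-host:8700/$1
// so the request path carries the txid of the transaction that triggered the event.
http.createServer((req, res) => {
  const txid = req.url.slice(1);
  if (sentAt.has(txid)) {
    console.log(txid, 'seen after', Date.now() - sentAt.get(txid), 'ms');
    sentAt.delete(txid);
  }
  res.end('ok');
}).listen(8700);

publishAndNote('key1').catch(console.error);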

The current implementation of events seems kind of crude, also considering that an event is generated TWICE for each transaction.

Is there any way I can efficiently get around this? Opening up a port for events or something? I don't think polling is a good option.
Another update: when I run a 60-second test WITHOUT NOTIFICATIONS, say a total of 24000 transactions at 400 txs per second, I still get the ECONNRESET/ETIMEDOUT error, in spite of CPU usage hovering around 30-50% and never going above 80% at any point during the test.
First, the next release will include an option to set a notification script that is only triggered when a transaction is new, so that will already save half of the slowdown effect you're experiencing.

As for your second note, the CPU usage will not go to 100% on a multicore CPU because there are mutex locks inside the MultiChain codebase that prevent it from fully exploiting all the cores. As for what is limiting your throughput, it could be the size of the stream items, or the number of UTXOs in the wallet (check utxocount in getwalletinfo).
Hey, thanks for the continuous support.

@single-event-transactions, that's great, yeah. I started using blocknotify instead to get a list of all confirmed transactions, so that kind of brought the scripts issue under some control.

@second-note: I kept checking the UTXO count continually; it was 1 at all times.
Then I added the debug=mcapi runtime parameter and checked the debug.log file to see if there's a corresponding error.

After the failures started showing on my client, the debug log had this:
"mcapi: API request failure: publish, code: -708"
Looking at your API error page, it states

"-708 : RPC_ENTITY_NOT_FOUND :The asset or stream reference in an API request or transaction was not found."

What exactly does this mean? After some point, are my requests being sent without the stream name, or getting corrupted? So is it a client issue / node wrapper issue?
We can't say for sure what the cause is; we'd recommend debugging the API request payload to see what is actually being sent.
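For reference, a correct publish call carries a JSON-RPC body like the one printed by this small sketch ('stream1' and 'key1' are placeholders), so you can compare it against what the wrapper actually puts on the wire and check the error field in each reply:

// Print the exact JSON-RPC body a publish call should carry, for comparison with
// what tcpdump or the wrapper's own logging shows.
const payload = JSON.stringify({
  id: Date.now(),
  method: 'publish',
  params: ['stream1', 'key1', '31']   // stream name, key, hex-encoded data
});
console.log(payload);
// Expected on the wire: a POST to http://host:rpcport/ with Basic auth and this body.
// A -708 reply means the node could not find the stream named in the first parameter.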
Hi.

So I was unable to recreate that error, but the send issue persists. The debug.log doesn't seem to show any failures, which is puzzling.

Here is my entire setup, if that makes things clear.

I'm running the multichain process inside a docker container.

Stream publish request information:

The key is : "Key+i", where i is the ith send request, and it can go from "key1" upto "key24000"; and the value is fixed, is "31".
I am sending a total of 24000 transactions, at 400 transactions per second. Though I should mention that a lower number of transactions for the same send-rate does succeed.
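In outline, the send loop looks something like the sketch below (the connection settings and the stream name 'stream1' are placeholders, not my actual harness): a timer fires once a second and launches the next batch of 400 publish calls.

const http = require('http');

// Placeholder connection settings.
const HOST = '127.0.0.1', PORT = 6824, USER = 'multichainrpc', PASS = 'your-rpc-password';
const TOTAL = 24000, PER_SECOND = 400;

function rpcCall(method, params) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({ id: Date.now(), method, params });
    const req = http.request({
      host: HOST, port: PORT, method: 'POST', path: '/', auth: USER + ':' + PASS,
      headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
    }, res => {
      let data = '';
      res.on('data', c => { data += c; });
      res.on('end', () => resolve(JSON.parse(data)));
    });
    req.on('error', reject);
    req.end(body);
  });
}

let i = 0;
const timer = setInterval(() => {
  // Fire the next batch of 400 publishes; keys run key1 .. key24000, value fixed at "31".
  for (let n = 0; n < PER_SECOND && i < TOTAL; n++) {
    const key = 'key' + (++i);
    rpcCall('publish', ['stream1', key, '31'])
      .catch(err => console.error(key, 'failed:', err.code || err.message));
  }
  if (i >= TOTAL) clearInterval(timer);
}, 1000);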

The UTXO count is also not a problem; it's "1". And I tested with just one node.

I even tried running the client (where I send requests from) and the node on the same machine, to make it a localhost connection; still the same, so that rules out a network issue.

CPU usage seems to stay well within limits (35-50%).

I queried the node via the CLI to check in case the request was going through, but no, the stream on the node doesn't have that key at all, so it seems to be dropping the entire request altogether.

It's a very bounded setup, so I'm not able to put my finger on what the issue is. Does the node have a specific RPC send limit, or some other parameter which could be affecting the test?

Should I simply count these transactions as "failed"?

Thanks.
How many API requests are you sending concurrently? Generally 2 is the sweet spot in terms of performance, and this will prevent requests from failing.
I keep flooding the node with requests and log the response times per txid as they come. I don't have logic that sends the next request only when one of the previous requests (in this case, '2') completes, if that's what you mean. (This is what happens in something like ApacheBench, as far as I understand.)

So my question here is, why the value 2? Does the peer have some kind of request queue size beyond which it will simply drop requests from the client? (If yes, what is this size?) Because the key-value pairs for which the publish requests fail aren't logged at all by the peer (using debug=mcapi).

Also, wouldn't a continuous flood of requests be the case in real world scenarios?
There is an rpcthreads runtime parameter which controls how many incoming RPC requests can be handled simultaneously. By default it is 4, but it can be increased if you wish by adding something like -rpcthreads=16 on the command line. In any event, the reason for suggesting 2 is that beyond this you will not get any performance increase from sending requests in parallel.
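As an illustration, here is a minimal client-side sketch of that approach: a fixed pool of 2 workers, each sending its next publish only after its previous one has been answered, so at most 2 requests are ever in flight. The connection settings, the stream name 'stream1' and the totals are placeholders:

const http = require('http');

// Placeholder connection settings.
const HOST = '127.0.0.1', PORT = 6824, USER = 'multichainrpc', PASS = 'your-rpc-password';
const TOTAL = 24000, CONCURRENCY = 2;

function rpcCall(method, params) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({ id: Date.now(), method, params });
    const req = http.request({
      host: HOST, port: PORT, method: 'POST', path: '/', auth: USER + ':' + PASS,
      headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
    }, res => {
      let data = '';
      res.on('data', c => { data += c; });
      res.on('end', () => resolve(JSON.parse(data)));
    });
    req.on('error', reject);
    req.end(body);
  });
}

let next = 0;

// Each worker pulls the next key only once its previous request has completed.
async function worker() {
  while (next < TOTAL) {
    const key = 'key' + (++next);
    const reply = await rpcCall('publish', ['stream1', key, '31']);
    if (reply.error) console.error(key, reply.error);
  }
}

Promise.all(Array.from({ length: CONCURRENCY }, worker)).then(() => console.log('done'));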
I understand why it's 2, but my point here is that even if performance saturates beyond this point, it should still not DROP the RPC connection.

I tried setting the rpcthreads parameter; it didn't seem to make any difference.

This is how I used it, just to confirm that I am using it properly: multichaind CHAINNAME -autosubscribe=streams -rpcthreads=16 -blocknotify="/root/server/event-listener-websockets/script.sh %s"
Have you checked whether the connection is dropped due to a timeout on the Node.js side?
I have tried setting timeouts on the Node.js side, but those timeout callbacks don't get executed; it goes directly to the "error" callback, where I abort the request.
I have modified https://github.com/scoin/multichain-node/blob/development/lib/client.js (the library listed on the MultiChain site) to https://gist.github.com/amolpednekar/0c64b35fb11cba13e95be07204023f40

It seems to bypass all the timeout callbacks and go straight to error. Doesn't this indicate a peer issue?
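For reference, this is roughly the kind of wiring I mean, as a simplified sketch with placeholder connection settings and stream name (not the exact gist): the 'timeout' callback never fires in my runs, while the 'error' callback reports ECONNRESET/ETIMEDOUT.

const http = require('http');

// Placeholder connection settings.
const HOST = '127.0.0.1', PORT = 6824, USER = 'multichainrpc', PASS = 'your-rpc-password';

const body = JSON.stringify({ id: Date.now(), method: 'publish',
                              params: ['stream1', 'key1', '31'] });

const req = http.request({
  host: HOST, port: PORT, method: 'POST', path: '/', auth: USER + ':' + PASS,
  headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
}, res => {
  let data = '';
  res.on('data', c => { data += c; });
  res.on('end', () => console.log('reply:', data));
});

// Socket-level inactivity timeout: only fires if the connection goes idle for 30s.
req.setTimeout(30000, () => {
  console.error('timeout callback');
  req.abort();
});

// ECONNRESET / ETIMEDOUT from the operating system arrive here instead.
req.on('error', err => console.error('error callback:', err.code));

req.end(body);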
We don't know enough about this Node.js library to say. But when we try executing five long API requests in parallel using multichain-cli when MultiChain is run with rpcthreads=1, we see one request neatly waiting for another to complete, without an error.
I've noticed that the problem appears only when there is a very large set of requests, so you might not be able to see it with 5.

I ran a test with 1 client, 1 node, sending a total of 5000 transactions (publish) at 500 transactions per second. It succeeds.

When I do it with 2 clients targeting the same node, EACH sending a total of 5000 transactions (publish) at 500 transactions per second, the ECONNRESET issues show up again.

I think this experiment indicates that the issue is not with a specific client (a Node.js problem), but appears when a single peer is bombarded with transactions, or when a high number of requests from a single client queue up.

Does JSON-RPC 2.0 over HTTP have some kind of connection limit? If not, this has got to be a peer-level check; I can't imagine it being anything else.
We've discussed this internally and still don't know whether this issue is within MultiChain or Node.js. I guess the way to tell is: when you are seeing these ECONNRESET problems in Node.js, send off a few API requests with multichain-cli. If multichain-cli waits patiently until the API request is answered, it's not a MultiChain problem. If multichain-cli fails before a response is received, it probably is a MultiChain problem.
Hi, sorry for the late response, I was working on something else in the meanwhile.

I shall do that cli-test and get back.

Another thing I wanted to mention is that when I start the multichaind daemon directly on the host machine, it can handle many more requests than when I run it inside a Docker container with the same runtime and blockchain parameters. The Docker image just contains a simple installation of MultiChain 1.0.2.

Have you guys come across an issue where Docker (version 17.09.0-ce, build afdb6d4) brings in some bottleneck? Perhaps the rpcthreads setting doesn't get applied, or something similar? (There's no way for me to actually see this.)
I'm afraid we're not aware of anything like this. Is it possible that the Docker environment has some kind of limit or throttle on CPU usage?
So I tried the CLI approach that you guys suggested, and I am able to publish, but I don't think that means the problem is with the library, because even after I get the ETIMEDOUT/ECONNRESET issues, it's only for some keys, and it continues publishing successfully for subsequent keys.

I tried a completely different NodeJS library ( https://github.com/Tilkal/multichain-api ) and ended up with the same errors.

Just to confirm that these requests are actually being sent correctly from the client to the node, I ran the tcpdump tool; all the requests are sent and well formed.

One thing I noticed is that when I had debug=mcapi enabled, printing to the console, it usually scrolls very fast as the requests come in, but then it suddenly hangs/lags, and that is when I start getting these errors. The debug output itself shows absolutely no errors and continues normally after a while. That lag and the subsequent errors make me think it's a MultiChain issue, as does the fact that the problem gets better if you move the node to a better system (keeping the client machine as it is). The main problem is that the node doesn't log anything about the dropped requests, which makes it difficult to pinpoint the error.

Edit 2 - You guys might be able to replicate this by simply starting a node, using any Node.js wrapper (https://github.com/scoin/multichain-node/ ) (or any other async lib), and running the publish command inside a for loop for a large number of iterations (like 10k).
Thanks for the update. I've forwarded this to the dev team and we'll update asap.
To circle back... we still believe this is a consequence of the library you're using rather than MultiChain. You might want to confirm this by running the same sort of test (in terms of HTTP requests per second) against an Apache web server running locally, delivering some tiny pixel.

Here's some PHP code you should be able to adapt to demonstrate that MultiChain does not have a problem with a transaction rate of 1000 tps:

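// Assumed to be defined elsewhere: $host, $port, $user and $password for the node's RPC
// credentials, plus $method (e.g. 'publish') and $params for the call being benchmarked.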
$url='http://'.$host.':'.$port.'/';
$strUserPass64= base64_encode($user.':'.$password);

$payload=json_encode(array(
    'id' => time(),
    'method' => $method,
    'params' => $params,
));

$header='Content-Type: application/json'."\r\n".
    'Content-Length: '.strlen($payload)."\r\n".
    'Connection: close'."\r\n".
    'Authorization: Basic '.$strUserPass64."\r\n";

$options = array(
    'http' => array(
        'header'  => $header,
        'method'  => 'POST',
        'content' => $payload
    )
);

$context  = stream_context_create($options);
$response = file_get_contents($url, false, $context);
Thanks for trying, I will take another look at the node library and its parameters.

I wanted to ask you guys whether using an asynchronous library to send requests might cause issues compared to a synchronous library, since many requests remain in a pending state as more keep coming, and that might be causing the timeout problems/connection breaks?

Because PHP is synchronous, and even multichain-cli is synchronous, as far as I know.
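To illustrate the difference I mean, here is a sketch (connection settings and the stream name 'stream1' are placeholders): the synchronous style waits for each reply before sending the next, the way PHP or multichain-cli would, while the asynchronous style fires everything immediately and leaves thousands of requests pending.

const http = require('http');

// Placeholder connection settings.
const HOST = '127.0.0.1', PORT = 6824, USER = 'multichainrpc', PASS = 'your-rpc-password';

function rpcCall(method, params) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({ id: Date.now(), method, params });
    const req = http.request({
      host: HOST, port: PORT, method: 'POST', path: '/', auth: USER + ':' + PASS,
      headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
    }, res => {
      let data = '';
      res.on('data', c => { data += c; });
      res.on('end', () => resolve(JSON.parse(data)));
    });
    req.on('error', reject);
    req.end(body);
  });
}

// "Synchronous" style, like PHP or multichain-cli: one request in flight at a time.
async function sequential(total) {
  for (let i = 1; i <= total; i++) {
    await rpcCall('publish', ['stream1', 'key' + i, '31']);
  }
}

// "Asynchronous" style: everything is fired immediately and left pending.
function flood(total) {
  for (let i = 1; i <= total; i++) {
    rpcCall('publish', ['stream1', 'key' + i, '31']).catch(e => console.error(e.code));
  }
}

sequential(100).then(() => console.log('sequential batch done')).catch(console.error);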
...