Multichaind seed node couldn't connect from 1.0 alpha 22

+4 votes

Using the same two nodes I repeated the "Create a MultiChain private blockchain in 90 seconds" till 1.0-alpha-21. 

From 1.0-alpha-22 I get an error such this:

Error: Couldn't connect to the seed node 10.10.20.90 on port 6737 - please check multichaind is running at that address and that your firewall settings allow incoming connections.

My nodes are two clones of LXC container with the following kernel:

root@multichain-01:~/multichain-1.0-alpha-22# uname -a
Linux multichain-01 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux

Is there something to change in configuration from alpha-22 ?

Regards,

 

 

asked Jan 27, 2017 by Nanni

1 Answer

+1 vote

First I would recommend upgrading to the latest version, in case that is the problem (there was an old bug relating to node handshaking that appeared under certain circumstances).

If the issue persists, it is likely to be related to firewalls or port forwarding on your own setup. You can check this by trying to telnet from one container to the appropriate peer-to-peer (not RPC) port on the other, and seeing if the connection is successfully made.

answered Jan 30, 2017 by MultiChain
Sorry, I didn't say. I started from alpha 27, then I went back to 26, ...

I checked again with a different OS (Centos), and the result is the same.

Here follows my way to check it.

I made two nodes and configure their network configurations:

lxc-create -t download -n multichain-centos7 -- -d centos -r 7 -a amd64

On both nodes:

curl "http://www.multichain.com/download/multichain-1.0-alpha-27.tar.gz" > multichain-1.0-alpha-27.tar.gz
curl "http://www.multichain.com/download/multichain-1.0-alpha-21.tar.gz" > multichain-1.0-alpha-21.tar.gz
tar xvzf multichain-1.0-alpha-27.tar.gz
tar xvzf multichain-1.0-alpha-21.tar.gz

@node1
cd multichain-1.0-alpha-27
./multichain-util create x
./multichaind x -daemon

@node2
cd multichain-1.0-alpha-27
./multichaind x@10.162.3.116:4353

I get:

Error: Couldn't connect to the seed node 10.162.3.116 on port 4353 - please check multichaind is running at that address and that your firewall settings allow incoming connections.

@node1
./multichain-cli x stop
cd ../multichain-1.0-alpha-21
./multichain-util create y
vi /root/.multichain/y/params.dat # to use the same port
./multichaind y -daemon

@node2
cd ../multichain-1.0-alpha-21
./multichaind y@10.162.3.116:4353

I get:

multichain-cli y grant 12Bga9EpffBabx56MZwJB2ZYW6P6KnLaUMRgZ2 connect
multichain-cli y grant 12Bga9EpffBabx56MZwJB2ZYW6P6KnLaUMRgZ2 connect,send,receive
OK, thanks. This looks like a bug. Can you please retry the alpha 27 versions, but running multichaind on both nodes with these extra runtime parameters:

-debug=mchn -debug=mchnminor -debug=net

Then can you please look in the debug.log files in the blockchain directory on each node, and post what you see relating to this connection attempt (you can look based on the timestamps).
Here follows what you asked. I can only see a "Permission denied" on first running node when the second tries to connect.

Thank you

@node1 10.10.30.10

2017-02-01 16:18:33 Connection attempt from: 10.10.30.11:45699
2017-02-01 16:18:33 Added connection peer=1
2017-02-01 16:18:33 mchn: RECV: version, peer=1
2017-02-01 16:18:33 received: version (106 bytes) peer=1
2017-02-01 16:18:33 send version message: version 70002, blocks=181, us=10.10.30.10:4249, peer=1
2017-02-01 16:18:33 sending: version mchn: SEND: version
2017-02-01 16:18:33 (106 bytes) peer=1
2017-02-01 16:18:33 mchn: Sending minimal parameter set to 10.10.30.11:45699
2017-02-01 16:18:33 sending: verack mchn: SEND: verack
2017-02-01 16:18:33 (291 bytes) peer=1
2017-02-01 16:18:33 receive version message: /MultiChain:0.1.0.7/: version 70002, blocks=-1, us=10.10.30.10:4249, peer=1
2017-02-01 16:18:33 Added time data, samples 2, offset +0 (+0 minutes)
2017-02-01 16:18:34 mchn: RECV: verackack, peer=1
2017-02-01 16:18:34 received: verackack (141 bytes) peer=1
2017-02-01 16:18:34 mchn: Connection from 1ZrcvZAy68KUwFefm2kZHnUib3USU6VU1D1HEo received on peer=1 in verackack (10.10.30.11:45699)
2017-02-01 16:18:34 mchn: Permission denied for address 1ZrcvZAy68KUwFefm2kZHnUib3USU6VU1D1HEo received from peer=1
2017-02-01 16:18:34 mchn: Sending minimal parameter set to 10.10.30.11:45699
2017-02-01 16:18:34 sending: verack mchn: SEND: verack
2017-02-01 16:18:34 (291 bytes) peer=1
2017-02-01 16:18:36 socket closed
2017-02-01 16:18:36 disconnecting peer=1
2017-02-01 16:18:36 disconnect flag set


@node2 10.10.30.11

2017-02-01 16:18:33 MultiChain version build 1.0 alpha 27 protocol 10007 (2016-12-22 15:31:29 +0200)
2017-02-01 16:18:33 Using OpenSSL version OpenSSL 1.0.1h 5 Jun 2014
2017-02-01 16:18:33 Using BerkeleyDB version Berkeley DB 4.8.30: (April  9, 2010)
2017-02-01 16:18:33 Default data directory /root/.multichain/x
2017-02-01 16:18:33 Using data directory /root/.multichain/x
2017-02-01 16:18:33 Using config file /root/.multichain/x/x.conf
2017-02-01 16:18:33 Using at most 125 connections (65536 file descriptors available)
2017-02-01 16:18:33 Using 8 threads for script verification
2017-02-01 16:18:33 Using wallet wallet.dat
2017-02-01 16:18:33 init message: Verifying wallet...
2017-02-01 16:18:33 CDBEnv::Open : LogDir=/root/.multichain/x/database ErrorFile=/root/.multichain/x/db.log
2017-02-01 16:18:33 init message: Initializing multichain...
2017-02-01 16:18:33 nFileVersion = 100000
2017-02-01 16:18:33 Keys: 2 plaintext, 0 encrypted, 2 w/ metadata, 2 total
2017-02-01 16:18:33 mchn: Parameter set is not complete - starting paramset discovery thread...
2017-02-01 16:18:33 init message: Loading addresses...
2017-02-01 16:18:33 Loaded 0 addresses from peers.dat  0ms
2017-02-01 16:18:33 Added connection peer=0
2017-02-01 16:18:33 AddLocal(10.10.30.11:8571,1)
2017-02-01 16:18:33 Discover: IPv4 eth0: 10.10.30.11
2017-02-01 16:18:33 AddLocal([fe80::225b:2aff:fef2:116c]:8571,1)
2017-02-01 16:18:33 Discover: IPv6 eth0: fe80::225b:2aff:fef2:116c
2017-02-01 16:18:33 dnsseed thread start
2017-02-01 16:18:33 Loading addresses from DNS seeds (could take a while)
2017-02-01 16:18:33 dumpaddr thread start
2017-02-01 16:18:33 opencon thread start
2017-02-01 16:18:33 net: Trying to connect to 10.10.30.10:4249 (by address)
2017-02-01 16:18:33 net: trying connection 10.10.30.10:4249 lastseen=0.0hrs
2017-02-01 16:18:33 msghand thread start
2017-02-01 16:18:33 0 addresses found from DNS seeds
2017-02-01 16:18:33 addcon thread start
2017-02-01 16:18:33 mchn: Sending query to seed node 10.10.30.10:4249
2017-02-01 16:18:33 net thread start
2017-02-01 16:18:33 Added connection peer=1
2017-02-01 16:18:33 dnsseed thread exit
2017-02-01 16:18:33 send version message: version 70002, blocks=-1, us=10.10.30.11:8571, peer=1
2017-02-01 16:18:33 sending: version mchn: SEND: version
2017-02-01 16:18:33 (106 bytes) peer=1
2017-02-01 16:18:33 mchn: Connected to seed node 10.10.30.10:4249
2017-02-01 16:18:33 net: Connection to [::]:0 established
2017-02-01 16:18:33 mchn: RECV: version, peer=1
2017-02-01 16:18:33 received: version (106 bytes) peer=1
2017-02-01 16:18:33 mchn: RECV: verack, peer=1
2017-02-01 16:18:33 received: verack (291 bytes) peer=1
2017-02-01 16:18:33 mchn: Connection from 1XCMHLbmoVCKZdwPZkas4rfzErTbbdd5VJ7zsR received on peer=1 in verack
2017-02-01 16:18:33 mchn: Cannot find address having connect permission, trying default key
2017-02-01 16:18:33 sending: verackack mchn: SEND: verackack
2017-02-01 16:18:33 (141 bytes) peer=1
2017-02-01 16:18:34 mchn: RECV: verack, peer=1
2017-02-01 16:18:34 received: verack (291 bytes) peer=1
2017-02-01 16:18:34 mchn: Connection from 1XCMHLbmoVCKZdwPZkas4rfzErTbbdd5VJ7zsR received on peer=1 in verack
2017-02-01 16:18:34 mchn: Successfully stored parameter set received from 10.10.30.10:4249
2017-02-01 16:18:34 mchn: Query completed, waiting for seed node to disconnect
2017-02-01 16:18:36 disconnect flag set
2017-02-01 16:18:36 disconnecting peer=1
2017-02-01 16:18:36 mchn: Successfully disconnected from seed node
2017-02-01 16:18:36 dumpaddr thread stop
2017-02-01 16:18:36 msghand thread interrupt
2017-02-01 16:18:36 addcon thread interrupt
2017-02-01 16:18:36 opencon thread interrupt
2017-02-01 16:18:36 net thread interrupt
2017-02-01 16:18:36 mchn: Exited from paramset discovery thread
2017-02-01 16:18:36 mchn: Minimal blockchain parameter set is created, default address: 1ZrcvZAy68KUwFefm2kZHnUib3USU6VU1D1HEo
2017-02-01 16:18:36 Shutdown: In progress...
2017-02-01 16:18:36 StopNode()
2017-02-01 16:18:36 Flushed 0 addresses to peers.dat  52ms
2017-02-01 16:18:36 Node stopped
2017-02-01 16:18:36 Shutdown: done
I have also seen this issue on rhel 7 linux version with alpha 26. But if you actually go to the debug.log file, you see that node has successfully connected and there it publishes the address for the connect or connect,send,receive permissions. And then if you grant the permission, the node is able to connect successfully.

And I have seen this without even enabling the debug params.
Is this a log for failed attempt? It looks perfectly normal. "please check multichaind is running at that address and that your firewall settings allow incoming connections" error should be accompanied by couple of additional rows in the log.

Furthermore, it is the log for the second attempt.

Sometimes "please check multichaind is running..." error may be wrong (we'll fix this in the next version) and the real reason is "cannot disconnect from seed node". But again, in this case you should see log message if -debug=mchn flag is set. In this case amanc is right - you successfully connected and you have address.

Can you please post logs for the first and/or second failed attempt with "-debug=mchn -debug=mchnminor -debug=net" runtime parameters?
I repeated the test many times as follows:

@node1 10.10.30.10

[root@multichain-centos7 multichain-1.0-alpha-27]# ./multichain-cli x stop; rm -fr ../.multichain/x/; ./multichain-util create x -default-network-port=1234; ./multichaind x -debug=mchn -debug=mchnminor -debug=net -daemon; tail -f ../.multichain/x/debug.log

@node2 10.10.30.11

[root@multichain-centos7-01 multichain-1.0-alpha-27]# rm -fr ../.multichain/x/; ./multichaind x@10.10.30.10:1234 -debug=mchn -debug=mchnminor -debug=net

I mostly got:

Error: Couldn't connect to the seed node 10.10.30.10 on port 1234 - please check multichaind is running at that address and that your firewall settings allow incoming connections.

with the log sent yesterday and sometimes with the following logs:

@node1
2017-02-02 08:58:46 mchn: PROCESSMESSAGE: INVALID MESSAGESTART version peer=2

@node2
2017-02-02 08:58:46 net: Trying to connect to 10.10.30.10:4769 (by address)
2017-02-02 08:58:46 mchn: Sending query to seed node 10.10.30.10:4769
2017-02-02 08:58:46 net: trying connection 10.10.30.10:4769 lastseen=0.0hrs
2017-02-02 08:58:46 Added connection peer=1
2017-02-02 08:58:46 send version message: version 70002, blocks=-1, us=10.10.30.11:8571, peer=1
2017-02-02 08:58:46 sending: version mchn: SEND: version
2017-02-02 08:58:46 (106 bytes) peer=1
2017-02-02 08:58:46 mchn: Connected to seed node 10.10.30.10:4769
2017-02-02 08:58:46 net: Connection to [::]:0 established
2017-02-02 08:58:56 mchn: Could not establish connection with seed node - timeout
2017-02-02 08:58:56 mchn: Query completed, waiting for seed node to disconnect
2017-02-02 08:59:07 mchn: Could not disconnect from seed node - timeout

I rarely also got a "Blockchain successfully initialized."
First of all "mchn: Could not establish connection with seed node - timeout" is the message I was looking in the previous logs.


Now I see a problem. Either node1 or node2 uses old configuration (i.e. before "rm -fr".

I was not able to reproduce "PROCESSMESSAGE: INVALID MESSAGESTART", but I was able to reproduce "please check multichaind is running..."

I don't see a problem with the command you use for node2 (I hope directory is completely removed when you start multichaind). And I suppose you don't have multichaind running when you remove directory.

But I do see the problem with node1: "./multichain-cli x stop" doesn't stop multichaind immediately, it just sends "stop signal" to multichaind. It may take time for multichaind to receive and process this signal. You cannot remove directory while multichaind is running, at least not entire directory. If directory is not removed, multichain-util cannot create new chain. It returns error, but your script ignores it. When you start multichaind, it either starts with old configuration or doesn't start at all (with error message) - this results in the error message on node2.

To avoid this please make sure multichaind is not running after "./multichain-cli x stop" before  "rm -fr ../.multichain/x/"

If this doesn't help, please post full logs of node1 and node2. node2's log is short. node1's log should be also not very long if it run for several seconds.
I added a 30 secs sleep between stop and remove.

I repeated the tests a dozen times and never got the INVALID MESSAGESTART again, but only once I got the grants.

Comments are limited to 8000 characters, I sent you the complete logs for one of the remaining cases as personal messages.
Thank you for reporting this. Unfortunately we cannot reproduce it (at least without LXC containers), so resolving this issue may take some time.

It looks like the problem is on node2 - for some reason it doesn't disconnect from node1. As I suspected connection was broken by timeout and you got wrong message (we'll fix it in the next version). But connection was properly established, relevant information transferred and node2 got valid address and is ready to connect if this address would have connect permission. Unfortunately multichaind doesn't return this address in its output, but you can see it in the logs. We'll fix this too.

But first, we'll try to resolve "disconnect" issue. Can you confirm that the issue doesn't appear in previous versions of MultiChain, say Alpha 26? Please note: you should run Alpha 26 on both nodes, because Alpha 26 cannot connect to nodes running blockchain created by Alpha27 (unless you explicitely specified <protocol-version> parameter in multichain-util)
...