getpeerinfo returns empty list intermittently

+1 vote

Hi 

I am new to MultiChain and trying to understand an issue I am facing. I am running a MultiChain network with the first node (master node) in public cloud A and a second node in public cloud B. When node B comes up, getpeerinfo returns an empty list most of the time.

nw-398-932-3: getpeerinfo

{"method":"getpeerinfo","params":[],"id":"40170048-1597761326","chain_name":"nw-398-932-3"}

[
]

Restarting node B several times helps. getblockchaininfo, however, gives the same result on both nodes. DNS resolution works throughout without any issues, and routing seems fine as well, since the connection does work after a couple of restarts. Any suggestions for troubleshooting this further would be helpful. Thank you.

asked Aug 18 by avinashbo
It looks like the connection is dropped soon after downloading the blocks:

2020-08-19 10:05:28 mchn: Connection from 15DLa5vGdCvf1BiVo2dHDNqhkV1RGdMJK9xXTd received on peer=4 in verackack (X.X.X.X:7447)
2020-08-19 10:05:28 mchn: Parameter set from peer=4 verified
2020-08-19 10:05:28 sending: getaddr mchn: SEND: getaddr
2020-08-19 10:05:28 (0 bytes) peer=4
2020-08-19 10:05:28 sending: ping mchn: SEND: ping
2020-08-19 10:05:28 (8 bytes) peer=4
2020-08-19 10:05:28 initial getheaders (69) to peer=4 (startheight:70)
2020-08-19 10:05:28 sending: getheaders mchn: SEND: getheaders
2020-08-19 10:05:28 (581 bytes) peer=4
2020-08-19 10:05:28 sending: addr mchn: SEND: addr
2020-08-19 10:05:28 (31 bytes) peer=4
2020-08-19 10:05:28 mchn: RECV: getaddr, peer=4
2020-08-19 10:05:28 received: getaddr (0 bytes) peer=4
2020-08-19 10:05:28 mchn: Sent 13 known addresses
2020-08-19 10:05:28 mchn: RECV: ping, peer=4
2020-08-19 10:05:28 received: ping (8 bytes) peer=4
2020-08-19 10:05:28 sending: pong mchn: SEND: pong
2020-08-19 10:05:28 (8 bytes) peer=4
2020-08-19 10:05:28 mchn: RECV: getheaders, peer=4
2020-08-19 10:05:28 received: getheaders (581 bytes) peer=4
2020-08-19 10:05:28 getheaders 70 to 0000000000000000000000000000000000000000000000000000000000000000 from peer=4
2020-08-19 10:05:28 sending: headers mchn: SEND: headers
2020-08-19 10:05:28 (82 bytes) peer=4
2020-08-19 10:05:28 mchn: RECV: addr, peer=4
2020-08-19 10:05:28 received: addr (481 bytes) peer=4
2020-08-19 10:05:28 mchn: received addr: 16
2020-08-19 10:05:28 mchn: RECV: pong, peer=4
2020-08-19 10:05:28 received: pong (8 bytes) peer=4
2020-08-19 10:05:28 mchn: RECV: headers, peer=4
2020-08-19 10:05:28 received: headers (82 bytes) peer=4
2020-08-19 10:05:28 mchn: Synced with node 4 on block 70 - requesting mempool
2020-08-19 10:05:28 sending: mempool mchn: SEND: mempool
2020-08-19 10:05:28 (0 bytes) peer=4
2020-08-19 10:05:28 mchn: Synced with seed node on block 70
2020-08-19 10:05:28 mchn: Disconnecting seed node
I will ask the team about this and let you know.
Thank you. Sharing an observation on my end.

When we restart the node as below, it works smoothly with no more disconnections (the catch is that we cannot use -connect for the initial connection):

multichaind ${CHAIN_NAME} -connect=${MASTERNODE_HOSTNAME}:${MASTERNODE_PORT} \
                    --datadir=/datadir -txindex -initprivkey=${PRIVKEY} -shrinkdebugfilesize -printtoconsole

The usual command, used when the node connects to the network for the first time (this is the one that disconnects after a while):

multichaind ${CHAIN_NAME}@${MASTERNODE_HOSTNAME}:${MASTERNODE_PORT} \
                    --datadir=/datadir -txindex -initprivkey=${PRIVKEY} -shrinkdebugfilesize -printtoconsole

Both nodes are running behind public load balancers on cloud providers. After the disconnection, I noticed that internal IPs are being used in the peer discovery process (netstat shows SYN_SENT connections to them), which obviously won't work. Is there a way to reconfigure peer discovery to use public IPs or names?

1 Answer

0 votes
It is difficult to see from the logs exactly what is going on, but here are some principles which may help you understand the problem:

1. MultiChain nodes have a database of peers identified by IP:port (not by host name), so once DNS resolution is done, the IP is used. If a node was connected to a specific IP:port once, the peer is stored in the database. In your log example, X.X.X.X:7447 is stored in the peer database.

2. MultiChain tries to connect to different peers in random order until it has several connections. Peers it was connected to recently have high priority; peers it has failed to connect to many times have low priority.

3. MultiChain always tries to connect to the addresses specified in "-connect" and to the seed address (multichaind ${CHAIN_NAME}@${MASTERNODE_HOSTNAME}:${MASTERNODE_PORT} in your case).

4. MultiChain disconnects from the seed address when the nodes are in sync AND there is at least one other connection (last rows in the log). So the last two rows are perfectly normal, but getpeerinfo should return results at this point.
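
To check principle 4 in practice, a rough sketch like the one below counts the peers in a getpeerinfo response. It assumes the JSON output contains one "addr" key per peer and that multichain-cli is on the PATH. The function name count_peers is only illustrative.

```shell
# Count peer entries in getpeerinfo JSON read from stdin.
# Assumption: each peer object carries exactly one "addr" key.
count_peers() {
    # "|| true" keeps the pipeline clean when grep finds no match.
    { grep -o '"addr"' || true; } | wc -l | tr -d ' '
}

# Usage against a live node (not run here):
#   multichain-cli nw-398-932-3 getpeerinfo | count_peers
echo '[]' | count_peers
```

If the count is zero right after the "Disconnecting seed node" line, the node had no other connection to fall back on.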

 

So, there are several directions you can check:

1. Some nodes (or the seed) are busy (or have enough connections to other nodes) and don't respond, or even drop connections.

2. Nodes are restarted with different IPs. As a result, the peer database is full of IPs which no longer exist or don't respond on the specific port.

3. X.X.X.X is the IP of the load balancer, and when a node tries to connect to X.X.X.X:7447, the load balancer doesn't reroute the request properly.
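
For directions 2 and 3, it can help to see whether the addresses a node is trying to reach are internal ones that cannot be routed between clouds. The sketch below is one way to label host:port entries as private or public. The helper name is_private_ipv4 and the two sample addresses are made up for illustration, and only the RFC 1918 IPv4 ranges are checked.

```shell
# Succeed if the argument is an RFC 1918 private IPv4 address.
is_private_ipv4() {
    case "$1" in
        10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}

# Label each host:port entry (sample values, not taken from the thread).
for addr in 10.0.3.17:7447 203.0.113.9:7447; do
    ip="${addr%:*}"   # strip the port
    if is_private_ipv4 "$ip"; then
        echo "$ip private"
    else
        echo "$ip public"
    fi
done
```

In practice the address list would come from the peer entries or from the netstat output on the node rather than from hard-coded values.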
answered Aug 23 by Michael
Thank you very much for the reply. I would like to ask a follow-up question. Can we pass both the seed node and -connect to multichaind when connecting to the network for the first time? Like so:

multichaind ${CHAIN_NAME}@${MASTERNODE_HOSTNAME}:${MASTERNODE_PORT} -connect=${MASTERNODE_HOSTNAME}  --datadir=/datadir -txindex -initprivkey=${PRIVKEY} -shrinkdebugfilesize -printtoconsole

I could not make this work.

About point 4, I could only see peer=4 in the logs up to the point of disconnection, which made me expect the connection to hold. But I will try all the directions mentioned. Thanks again.
I was able to fix it by using the -onlynet option. The issue seems to arise from the use of IPv6 during name resolution in peer discovery. Thank you for the support with this.
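
For readers hitting the same problem, the fix might look like the sketch below. The thread does not say which value was passed to -onlynet; ipv4 is assumed here because the problem was traced to IPv6. The helper name node_start_cmd and the sample host name are hypothetical. The other flags are copied from the first-connection command earlier in the thread, with -initprivkey left out.

```shell
# Print the first-connection command with peer networking limited to IPv4.
# Assumption: -onlynet=ipv4 is the value used; the thread names only the option.
node_start_cmd() {
    chain="$1"; host="$2"; port="$3"
    echo "multichaind ${chain}@${host}:${port} -onlynet=ipv4 --datadir=/datadir -txindex -shrinkdebugfilesize -printtoconsole"
}

# Review the printed command before running it on the node.
node_start_cmd "nw-398-932-3" "masternode.example.com" "7447"
```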
...