Yes, the update is in the plan; we will most probably do it today, as it is a drop-in upgrade.
There is no fixed pattern to the network issue, as far as I have been able to find.
My question was: after the network issue occurred and the servers were not
communicating, why was a communication error still being shown once everything
came back online?
For example, yesterday evening the secondary of one shard was not
reachable and there was a heartbeat issue.
But once the secondary was back up and reachable, rs.status() gave different output on
the primary and on the secondary.
The primary was still showing the secondary as unavailable, but the replica set
status on the secondary looked fine.
We tried opening the port from the primary to the secondary, and the connection was working.
The issue was fixed only when we stepped the primary down, restarted it, and then let it become primary again.
Similarly, last time the issue went away when we restarted the whole cluster.
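For reference, the step-down workaround described above can be reproduced from the mongo shell roughly like this (the 120-second window is just an example value):

```
// On the stale primary, in the mongo shell:
rs.stepDown(120)   // step down and decline re-election for 120 seconds

// Then restart the mongod process outside the shell
// (e.g. via your service manager) and, once it has
// rejoined the set, confirm member states:
rs.status()
```

Note that rs.stepDown() closes the current shell connection, so you may need to reconnect before running rs.status().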
I have attached the rs.status() output from both servers here.
On Monday, 9 May 2016 12:02:38 UTC+5:30, Kevin Adistambha wrote:
This issue was not consistent; sometimes we see it on a mongos and
sometimes on a replica set.
3 config servers, 4 mongos. Yes, all servers showed the same issue.
The network connection from the primary to the replica was down for some time, and we
restored it. But the primary could not connect to the restored secondary.
We restarted the whole cluster and then this error was gone.
From what I have seen so far, I believe the underlying issue is network
connectivity/configuration within your deployment. My understanding so far:
- all servers are experiencing the same issue intermittently
- the issue seems to be spread across the whole cluster (i.e.
sometimes on mongos and other times on individual mongod)
- network connectivity issues within a replica set
- error messages in the form of “HostNotFound: unable to resolve DNS
for host” or “Couldn’t get a connection within the time limit”
- restarting the cluster seems to solve the problem for a while
(likely due to the refresh of the DNS cache)
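If DNS is the culprit, one quick sanity check is to list the hostnames the replica set members are configured with; each of those names must resolve from every other member and from every mongos. For example, in the mongo shell:

```
// Print the hostname:port each member is configured with;
// these are the names that must resolve in DNS cluster-wide.
rs.conf().members.forEach(function (m) {
    print(m._id + ": " + m.host);
});
```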
All these signs seem to point to an issue in your network setup
(e.g. DNS configuration, network hardware, etc.) and not in your MongoDB
deployment. The output of sh.status() doesn't show anything notable;
it seems to indicate that the cluster is operating normally, the config
servers are consistent with each other (which is vital to the operation of
a sharded cluster), and cluster balancing seems to operate normally as well.
One more thing: we also confirmed the dbhash of all the config servers, and
they all matched.
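For anyone following along, the dbhash consistency check mentioned above can be run against each config server and the results compared; matching hashes mean the config metadata is identical. A sketch:

```
// Run on each config server (mongo shell) and compare the
// returned "md5" field across all three servers.
db.getSiblingDB("config").runCommand({ dbHash: 1 })
```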
Is there a pattern to this issue? For example, did you observe these
network-related errors happening more often at particular times, under
particular load, etc.?
On another note, I would recommend that you upgrade to the latest release in
the 3.2 series, which is currently 3.2.6, as it contains
bugfixes and improvements.