Re: HOw to avoid Initial sync with dropping all databases

From: Metikov Vadim <metikov.vadim@xxxxxxxxx>
To: mongodb-user <mongodb-user@xxxxxxxxxxxxxxxx>
Date: Mon, 25 Apr 2016 18:30:31 -0700 (PDT)
Why ads?
Mongod version is 2.6.11 and now it is a part of sharded cluster. Shard 
number 1 (unistore-1) consist of Primary , two arbiters and one secondary 
that has been restored from dump.

вторник, 26 апреля 2016 г., 4:05:48 UTC+5 пользователь Tim Hawkins написал:

What version and topology?
On 26 Apr 2016 01:34, "Metikov Vadim" <metiko...@xxxxxxxxx <javascript:>> 
wrote:

Hello there!

We have very big replicaset with 7 TB of data. Once apon a time we have 
noticed data corruption. After that all initial sync' can't be completed.
There are 3500 datafiles like grid_fs.XXXX. Ititial sync on new member 
breaks on 12th datafile and new member promoted to state SECONDARY without 
data (having only 12/3500 files).
At that moment we haven't health node. After last crashed secondary it 
was recovered by copying datafiles from primary and it have ame broken 
datafiles.
We have dumped all the data from those secondary with "mongodump 
--repair". But after restoring new member by mongorestore tool it wants to 
make initial sync with dropping all datafiles.
How to avoid this behavior?

Oplog size is 221GB, from 10th of march, but data dumped in april

Here are mongod log after adding it to replicaset:

2016-04-25T21:46:07.346+0500 ***** SERVER RESTARTED *****

2016-04-25T21:46:07.359+0500 [DataFileSync] BackgroundJob starting: 
DataFileSync

2016-04-25T21:46:07.359+0500 [DataFileSync] flushing diag log

2016-04-25T21:46:07.359+0500 shardKeyTest passed

2016-04-25T21:46:07.359+0500 isInRangeTest passed

2016-04-25T21:46:07.359+0500 shardObjTest passed

2016-04-25T21:46:07.359+0500 [initandlisten] MongoDB starting : pid=9972 
port=27017 dbpath=/var/lib/                                                 
                                                                            
             mongodb 64-bit host=russia

2016-04-25T21:46:07.359+0500 [initandlisten] db version v2.6.12

2016-04-25T21:46:07.359+0500 [initandlisten] git version: 
d73c92b1c85703828b55c2916a5dd4ad46535f6a

2016-04-25T21:46:07.359+0500 [initandlisten] build info: Linux 
build5.ny.cbi.10gen.cc 2.6.32-431.3.1                                 
                                                                            
                             .el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 
x86_64 BOOST_LIB_VERSION=1_49

2016-04-25T21:46:07.359+0500 [initandlisten] allocator: tcmalloc

2016-04-25T21:46:07.359+0500 [initandlisten] options: { config: 
"/etc/mongod.conf", diaglog: 7, net:                                       
                                                                            
                        { port: 27017 }, replication: { replSet: 
"unistore-1" }, storage: { dbPath: "/var/lib/mongodb" }, s                 
                                                                            
                                             ystemLog: { destination: 
"file", logAppend: true, path: "/var/log/mongodb/mongod.log", verbosity: 1 
                                                                            
                                                              } }

2016-04-25T21:46:07.360+0500 [initandlisten] flushing directory 
/var/lib/mongodb

2016-04-25T21:46:07.388+0500 [initandlisten] journal 
dir=/var/lib/mongodb/journal

2016-04-25T21:46:07.388+0500 [initandlisten] recover : no journal files 
present, no recovery needed

2016-04-25T21:46:07.389+0500 [initandlisten] flushing directory 
/var/lib/mongodb/journal

2016-04-25T21:46:07.390+0500 [initandlisten] flushing directory 
/var/lib/mongodb/journal

2016-04-25T21:46:07.390+0500 [initandlisten] enter repairDatabases (to 
check pdfile version #)

2016-04-25T21:46:07.397+0500 [initandlisten]    grid_fs

2016-04-25T21:46:07.397+0500 [initandlisten] opening db:  grid_fs

2016-04-25T21:46:07.731+0500 [initandlisten]    local

2016-04-25T21:46:07.731+0500 [initandlisten] opening db:  local

2016-04-25T21:46:07.739+0500 [initandlisten] done repairDatabases

2016-04-25T21:46:07.739+0500 [initandlisten] opening db:  admin

2016-04-25T21:46:07.746+0500 [initandlisten] query admin.system.roles 
planSummary: EOF ntoreturn:0 n                                             
                                                                            
                 toskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 
numYields:0 locks(micros) W:6541 r:61 nreturned:0                           
                                                                            
                                    reslen:20 0ms

2016-04-25T21:46:07.746+0500 [ClientCursorMonitor] BackgroundJob 
starting: ClientCursorMonitor

2016-04-25T21:46:07.746+0500 [PeriodicTaskRunner] BackgroundJob 
starting: PeriodicTaskRunner

2016-04-25T21:46:07.746+0500 [TTLMonitor] BackgroundJob starting: 
TTLMonitor

2016-04-25T21:46:07.746+0500 [IndexRebuilder] BackgroundJob starting: 
IndexRebuilder

2016-04-25T21:46:07.747+0500 [initandlisten] opening db:  local

2016-04-25T21:46:07.755+0500 [initandlisten] create collection 
local.startup_log { size: 10485760, capped: true }

2016-04-25T21:46:07.755+0500 [initandlisten] command local.$cmd command: 
create { create: "startup_log", size: 10485760, capped: true } ntoreturn:1 
keyUpdates:0 numYields:0  reslen:75 7ms

2016-04-25T21:46:07.755+0500 [initandlisten] insert local.startup_log 
ninserted:1 keyUpdates:0 numYields:0  0ms

2016-04-25T21:46:07.755+0500 [IndexRebuilder] opening db:  grid_fs

2016-04-25T21:46:07.756+0500 [rsStart] replSet beginning startup...

2016-04-25T21:46:07.756+0500 [rsStart] loadConfig() local.system.replset

2016-04-25T21:46:07.907+0500 [rsStart] query local.system.replset 
planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 
keyUpdates:0 numYields:0 locks(micros) r:276 nreturned:1 reslen:260 0ms

2016-04-25T21:46:07.907+0500 [ConnectBG] BackgroundJob starting: 
ConnectBG

2016-04-25T21:46:07.911+0500 [rsStart] getMyAddrs(): [127.0.0.1] 
[IP.IP.IP.107] [IP.IP.IP.1] [10                                             
                                                                            
                 .10.10.100] [192.168.0.15] [10.1.0.1] [::1] 
[fd19:6c87:964:0:230:48ff:fed5:eccc] [fe80::230:48ff:fed                   
                                                                            
                                           5:eccc%br0] 
[fe80::ecb9:12ff:fe84:632b%lxcbr0]

2016-04-25T21:46:07.911+0500 [rsStart] getallIPs("IP.IP.IP.106"): 
[IP.IP.IP.106]

2016-04-25T21:46:07.911+0500 [ConnectBG] BackgroundJob starting: 
ConnectBG

2016-04-25T21:46:07.912+0500 [ConnectBG] BackgroundJob starting: 
ConnectBG

2016-04-25T21:46:07.913+0500 [rsStart] replSet REMOVED

2016-04-25T21:46:07.913+0500 [rsStart] replSet info self not present in 
the repl set configuration:

2016-04-25T21:46:07.913+0500 [rsStart] { _id: "unistore-1", version: 
140, members: [ { _id: 18, host : "IP.IP.IP.73:30000", arbiterOnly: true }, 
{ _id: 2, host: "IP.IP.IP.106:27017", priority: 2.0 }  , { _id: 27, host: "
79.110.251.109:37017", arbiterOnly: true } ] }

2016-04-25T21:46:07.913+0500 [rsStart] loadConfig() local.system.replset

2016-04-25T21:46:07.915+0500 [rsStart] query local.system.replset 
planSummary: COLLSCAN ntoreturn:1                                           
                                                                            
                    ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 
numYields:0 locks(micros) r:107 nreturned:1 resl                           
                                                                            
                                   en:260 0ms

2016-04-25T21:46:07.915+0500 [rsStart] replSet info Couldn't load config 
yet. Sleeping 20sec and wil                                                 
                                                                            
             l try again.

2016-04-25T21:46:07.916+0500 [IndexRebuilder] checking complete

2016-04-25T21:46:27.916+0500 [rsStart] query local.system.replset 
planSummary: COLLSCAN ntoreturn:1                                           
                                                                            
                    ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 
numYields:0 locks(micros) r:107 nreturned:1 resl                           
                                                                            
                                   en:260 0ms

2016-04-25T21:46:27.916+0500 [rsStart] replSet info Couldn't load config 
yet. Sleeping 20sec and wil                                                 
                                                                            
             l try again.

2016-04-25T21:46:47.916+0500 [rsStart] query local.system.replset 
planSummary: COLLSCAN ntoreturn:1                                           
                                                                            
                    ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 
numYields:0 locks(micros) r:85 nreturned:1 resle                           
                                                                            
                                   n:260 0ms

2016-04-25T21:46:47.916+0500 [rsStart] trying to 
contact IP.IP.IP.106:27017

2016-04-25T21:46:47.916+0500 [ConnectBG] BackgroundJob starting: 
ConnectBG

2016-04-25T21:46:48.070+0500 [rsStart] trying to 
contact IP.IP.IP.109:37017

2016-04-25T21:46:48.070+0500 [ConnectBG] BackgroundJob starting: 
ConnectBG

2016-04-25T21:46:48.071+0500 [rsStart] trying to 
contact IP.IP.IP.73:30000

2016-04-25T21:46:48.072+0500 [ConnectBG] BackgroundJob starting: 
ConnectBG

2016-04-25T21:46:48.280+0500 [rsStart] getMyAddrs(): [127.0.0.1] 
[IP.IP.IP.107] [IP.IP.IP.1] [10                                             
                                                                            
                 .10.10.100] [192.168.0.15] [10.1.0.1] [::1] 
[fd19:6c87:964:0:230:48ff:fed5:eccc] [fe80::230:48ff:fed                   
                                                                            
                                           5:eccc%br0] 
[fe80::ecb9:12ff:fe84:632b%lxcbr0]

2016-04-25T21:46:48.281+0500 [rsStart] getallIPs("IP.IP.IP.107"): 
[IP.IP.IP.107]

2016-04-25T21:46:48.281+0500 [rsStart] replSet I am IP.IP.IP.107:27017

2016-04-25T21:46:48.281+0500 [rsHealthPoll] BackgroundJob starting: 
rsHealthPoll

2016-04-25T21:46:48.281+0500 [rsHealthPoll] BackgroundJob starting: 
rsHealthPoll

2016-04-25T21:46:48.281+0500 [rsStart] replSet got config version 141 
from a remote, saving locally

2016-04-25T21:46:48.281+0500 [rsHealthPoll] BackgroundJob starting: 
rsHealthPoll

2016-04-25T21:46:48.281+0500 [rsStart] replSet info saving a newer 
config version to local.system.replset: { _id: "unistore-1", version: 141, 
members: [ { _id: 18, host: "IP.IP.IP.73:30000", arbiterOnly: true }, { 
_id: 2, host: "IP.IP.IP.106:27017", priority: 2.0 }, { _id: 27, host: 
"IP.IP.IP.109:37017", arbiterOnly: true }, { _id: 21, host: 
"IP.IP.IP.107:27017", priority: 10.0 } ] }

2016-04-25T21:46:48.281+0500 [rsStart] replSet saveConfigLocally done

2016-04-25T21:46:48.281+0500 [rsStart] replSet info Couldn't load config 
yet. Sleeping 20sec and will try again.

2016-04-25T21:47:07.370+0500 [DataFileSync] flushing mmaps took 11ms 
 for 4933 files

2016-04-25T21:47:07.370+0500 [DataFileSync] flushing diag log

2016-04-25T21:47:07.747+0500 [TTLMonitor] query admin.system.indexes 
query: { expireAfterSeconds: { $exi                                         
                                                                            
                 sts: true } } planSummary: EOF ntoreturn:0 ntoskip:0 
nscanned:0 nscannedObjects:0 keyUpdates:0 numYields                         
                                                                            
                                 :0 locks(micros) r:62 nreturned:0 
reslen:20 0ms

2016-04-25T21:47:07.747+0500 [TTLMonitor] query grid_fs.system.indexes 
query: { expireAfterSeconds: { $e                                           
                                                                            
               xists: true } } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 
nscanned:10 nscannedObjects:10 keyUpdates:0                                 
                                                                            
                          numYields:0 locks(micros) r:97 nreturned:0 
reslen:20 0ms

2016-04-25T21:47:07.747+0500 [TTLMonitor] query local.system.indexes 
query: { expireAfterSeconds: { $exi                                         
                                                                            
                 sts: true } } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 
nscanned:3 nscannedObjects:3 keyUpdates:0 numY                             
                                                                            
                             ields:0 locks(micros) r:38 nreturned:0 
reslen:20 0ms

2016-04-25T21:47:07.748+0500 [clientcursormon] mem (MB) res:74 
virt:20163493

2016-04-25T21:47:07.748+0500 [clientcursormon]  mapped (incl journal 
view):20163016

2016-04-25T21:47:08.281+0500 [rsStart] query local.system.replset 
planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 
keyUpdates:0 numYields:0 locks(micros) r:102 nreturned:1 reslen:326 0ms

2016-04-25T21:47:08.281+0500 [rsStart] trying to 
contact IP.IP.IP.106:27017

2016-04-25T21:47:08.282+0500 [rsStart] trying to 
contact IP.IP.IP.109:37017

2016-04-25T21:47:08.283+0500 [rsStart] trying to 
contact IP.IP.IP.73:30000

2016-04-25T21:47:08.496+0500 [rsStart] replSet I am IP.IP.IP.107:27017

2016-04-25T21:47:08.498+0500 [rsHealthPoll] BackgroundJob starting: 
rsHealthPoll

2016-04-25T21:47:08.498+0500 [rsHealthPoll] BackgroundJob starting: 
rsHealthPoll

2016-04-25T21:47:08.498+0500 [rsHealthPoll] BackgroundJob starting: 
rsHealthPoll

2016-04-25T21:47:08.498+0500 [rsStart] replSet STARTUP2

2016-04-25T21:47:08.498+0500 [rsMgr] BackgroundJob starting: rsMgr

2016-04-25T21:47:08.498+0500 [rsHealthPoll] replSet 
member IP.IP.IP.106:27017 is up

2016-04-25T21:47:08.498+0500 [rsMgr] replSet total number of votes is 
even - add arbiter or give one member an extra vote

2016-04-25T21:47:08.498+0500 [rsGhostSync] BackgroundJob starting: 
rsGhostSync

2016-04-25T21:47:08.498+0500 [rsSync] replSet initial sync pending

2016-04-25T21:47:08.498+0500 [SyncSourceFeedbackThread] BackgroundJob 
starting: SyncSourceFeedbackThread

2016-04-25T21:47:08.499+0500 [rsSync] replSet initial sync need a member 
to be primary or secondary to do our initial sync

2016-04-25T21:47:08.499+0500 [rsHealthPoll] replSet 
member IP.IP.IP.106:27017 is now in state PRIMARY

2016-04-25T21:47:08.500+0500 [rsHealthPoll] replSet 
member IP.IP.IP.73:30000 is up

2016-04-25T21:47:08.500+0500 [rsHealthPoll] replSet 
member IP.IP.IP.73:30000 is now in state ARBITER

2016-04-25T21:47:10.498+0500 [rsHealthPoll] replSet 
member IP.IP.IP.109:37017 is up

2016-04-25T21:47:10.498+0500 [rsHealthPoll] replSet 
member IP.IP.IP.109:37017 is now in state ARBITER

2016-04-25T21:47:10.499+0500 [ConnectBG] BackgroundJob starting: 
ConnectBG

2016-04-25T21:47:24.499+0500 [rsSync] replSet initial sync pending

2016-04-25T21:47:24.499+0500 [rsSync] replSet syncing 
to: IP.IP.IP.106:27017

2016-04-25T21:47:24.499+0500 [ConnectBG] BackgroundJob starting: 
ConnectBG

2016-04-25T21:47:24.500+0500 [rsSync] Database::_addNamespaceToCatalog 
ns: local.replset.minvalid

2016-04-25T21:47:24.500+0500 [rsSync] ExtentManager::increaseStorageSize 
ns:local.replset.minvalid desiredSize:8192 fromFreeList: 0 eloc: 0:239d000

2016-04-25T21:47:24.501+0500 [rsSync] Database::_addNamespaceToCatalog 
ns: local.replset.minvalid.$_id_

2016-04-25T21:47:24.501+0500 [rsSync] build index on: 
local.replset.minvalid properties: { v: 1, key: {  _id: 1 }, name: "_id_", 
ns: "local.replset.minvalid" }

2016-04-25T21:47:24.501+0500 [rsSync] local.replset.minvalid: clearing 
plan cache - collection info cache reset

2016-04-25T21:47:24.501+0500 [rsSync] allocating new extent

2016-04-25T21:47:24.501+0500 [rsSync] ExtentManager::increaseStorageSize 
ns:local.replset.minvalid.$_id_desiredSize:131072 fromFreeList: 0 eloc: 
0:239f000

2016-04-25T21:47:24.501+0500 [rsSync]    added index to empty collection

2016-04-25T21:47:24.501+0500 [rsSync] local.replset.minvalid: clearing 
plan cache - collection info cache reset

2016-04-25T21:47:24.501+0500 [rsSync] replSet initial sync drop all 
databases

2016-04-25T21:47:24.508+0500 [rsSync] dropAllDatabasesExceptLocal 2

2016-04-25T21:47:24.508+0500 [rsSync] dropDatabase grid_fs

2016-04-25T21:47:24.524+0500 [rsSync] lsn set 40381

2016-04-25T21:47:24.541+0500 [rsSync] removeJournalFiles

2016-04-25T21:47:24.542+0500 [rsSync] flushing directory 
/var/lib/mongodb/journal

2016-04-25T21:47:24.543+0500 [rsSync] removeJournalFiles end

2016-04-25T21:47:26.501+0500 [ConnectBG] BackgroundJob starting: 
ConnectBG

2016-04-25T21:47:27.121+0500 [ConnectBG] BackgroundJob starting: 
ConnectBG

-- 
You received this message because you are subscribed to the Google Groups 
"mongodb-user"
group.
 
For other MongoDB technical support options, see: 
https://docs.mongodb.org/manual/support/
--- 
You received this message because you are subscribed to the Google Groups 
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an 
email to mongodb-user...@xxxxxxxxxxxxxxxx <javascript:>.
To post to this group, send email to mongod...@xxxxxxxxxxxxxxxx 
<javascript:>.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/mongodb-user/b13a23f4-3442-4c3a-a8c8-7dbb1b1caee1%40googlegroups.com ;
<https://groups.google.com/d/msgid/mongodb-user/b13a23f4-3442-4c3a-a8c8-7dbb1b1caee1%40googlegroups.com?utm_medium=email&utm_source=footer>
.
For more options, visit https://groups.google.com/d/optout.



-- 
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: https://docs.mongodb.org/manual/support/
--- 
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@xxxxxxxxxxxxxxxx.
To post to this group, send email to mongodb-user@xxxxxxxxxxxxxxxx.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/c2b91678-7a82-420a-8330-7167da484cf2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Why ads?