I have also tried the variants that are commented out in the code, but all of them are equally slow. For a collection of about 2 GB (100,000 rows and 1,000 columns), the write takes around 6 hours (holy moly :/) on a cluster of 3 machines, each with 12 cores and 72 GB of RAM (using all the cores in the Spark cluster). The MongoDB server is also running on one of these machines.
I am not sure whether I am doing this correctly. Any pointers on how to optimize this code would be really helpful.
You received this message because you are subscribed to the Google Groups "mongodb-user"