I am using PySpark to analyse a BSON file. When I run my program on a
decompressed BSON file, it runs perfectly fine.
However, running the same program on the same file, now compressed, gives
me an empty RDD.
I am using mongo-java-driver.jar 3.2.2, mongo-hadoop-spark.jar 1.5.2,
pymongo_spark, and pymongo 3.2.2.
The deployed Spark version is 1.6.1, and Hadoop is 2.6.4.
I am aware that the current library does not support splitting compressed
BSON files; however, in my opinion, it should still work with a single
split. I have hundreds of these files to analyse, so decompressing all of
them does not seem like a viable option.
Can anyone please point me in the right direction?
You received this message because you are subscribed to the Google Groups "mongodb-user"