[destination-mongodb] Out of memory error when copying from temporary to permanent collection #48851
Open
Labels: area/connectors (Connector related issues), autoteam, community, needs-triage, team/connectors-python, type/bug (Something isn't working)
Connector Name
destination-mongodb
Connector Version
0.2.0
At what step did the error happen?
During the sync
Relevant information
Hello,
I am reporting an out-of-memory error that occurs in the MongoDB destination connector.
The error happens at the end of a synchronisation job, when the MongoDB destination connector copies all the synchronised records from a temporary collection to a permanent one.
I believe this error is caused by the copyTable function of the MongoDB connector: link to source code
All the documents to be copied are collected into a single in-memory list before being inserted into the new collection, without any batching. With enough documents, this list alone can exhaust the available memory (the Java heap, as the log below shows).
This is consistent with my experience: the error happens when I synchronise a high volume of data.
I suggest fixing this by adding batching. The following code snippet uses a batch size that is fixed in terms of number of documents:
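A minimal sketch of count-based batching in Java is below. It is an illustration, not the connector's code: the class BatchedCopy, the method copyInBatches, and the BATCH_SIZE of 1,000 are hypothetical. It pulls documents from an Iterator and hands them to an insert callback in chunks, so at most one batch is held in memory at a time. With the MongoDB Java sync driver, the source could presumably be the cursor from find().iterator() on the temporary collection and the callback insertMany on the permanent collection, but that wiring is an assumption about how copyTable might be adapted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchedCopy {

    // Hypothetical batch size; a real value would need tuning against document size and heap.
    static final int BATCH_SIZE = 1_000;

    /**
     * Pulls items from source and passes them to insertBatch in chunks of at most
     * batchSize, so only one chunk is held in memory at a time.
     * Returns the number of batches flushed.
     */
    static <T> int copyInBatches(Iterator<T> source, Consumer<List<T>> insertBatch, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int flushed = 0;
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                insertBatch.accept(batch);
                flushed++;
                // Start a fresh list in case the callback keeps a reference to the old one.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) { // flush the final, partially filled batch
            insertBatch.accept(batch);
            flushed++;
        }
        return flushed;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2_500; i++) {
            docs.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        int batches = copyInBatches(docs.iterator(), b -> sizes.add(b.size()), BATCH_SIZE);
        System.out.println(batches + " batches: " + sizes); // 3 batches: [1000, 1000, 500]
    }
}
```

If a driver cursor is used as the source, it should be closed once the copy finishes, for example with try-with-resources.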
This would already be an improvement over the current behavior.
Ideally, though, batches would be bounded by memory size rather than by number of documents, since the size of each MongoDB document may vary.
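A size-bounded variant could be sketched as follows, again as a hypothetical illustration rather than the connector's code (SizeBoundedBatcher, copyInSizedBatches and maxBatchBytes are made-up names). Instead of counting documents, it adds up an estimated byte size per document and flushes the current batch before adding a document that would push it past maxBatchBytes. A single document larger than the limit goes out in a batch of its own. The size estimator is a parameter: the demo uses string length, and for BSON documents one option might be to measure each document's encoded size, at the cost of an extra encoding pass per document.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

public class SizeBoundedBatcher {

    /**
     * Groups items into batches whose estimated total size stays within maxBatchBytes,
     * except that an oversized item is sent alone. Returns the number of batches flushed.
     */
    static <T> int copyInSizedBatches(Iterator<T> source,
                                      ToLongFunction<T> estimateBytes,
                                      long maxBatchBytes,
                                      Consumer<List<T>> insertBatch) {
        if (maxBatchBytes <= 0) {
            throw new IllegalArgumentException("maxBatchBytes must be positive");
        }
        List<T> batch = new ArrayList<>();
        long batchBytes = 0;
        int flushed = 0;
        while (source.hasNext()) {
            T item = source.next();
            long size = estimateBytes.applyAsLong(item);
            // Flush first if this item would overflow a non-empty batch.
            if (!batch.isEmpty() && batchBytes + size > maxBatchBytes) {
                insertBatch.accept(batch);
                flushed++;
                batch = new ArrayList<>();
                batchBytes = 0;
            }
            batch.add(item);
            batchBytes += size;
        }
        if (!batch.isEmpty()) { // flush whatever remains
            insertBatch.accept(batch);
            flushed++;
        }
        return flushed;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("aaaa", "bbbb", "cc", "dddddddddd", "e");
        List<List<String>> batches = new ArrayList<>();
        int n = copyInSizedBatches(docs.iterator(), s -> s.length(), 8, batches::add);
        System.out.println(n + " " + batches); // 4 [[aaaa, bbbb], [cc], [dddddddddd], [e]]
    }
}
```

Combining both limits (flush on whichever of count or size is reached first) would also be a reasonable design, since a size estimate alone may be inaccurate.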
Relevant log output
```
2024-12-05 20:00:13 replication-orchestrator INFO Stream Status Update Received: orders - COMPLETE
2024-12-05 20:00:13 replication-orchestrator INFO Updating status: orders - COMPLETE
2024-12-05 20:00:17 destination INFO i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):80 Airbyte message consumer: succeeded.
2024-12-05 20:00:17 destination INFO i.a.i.d.m.MongodbRecordConsumer(close):90 Migration finished with no explicit errors. Copying data from tmp tables to permanent
2024-12-05 20:09:11 destination INFO Malformed non-Airbyte record (connectionId = 9bb79ad9-f258-4d97-bbe6-03f691541e9e): Terminating due to java.lang.OutOfMemoryError: Java heap space
2024-12-05 20:09:12 replication-orchestrator INFO Destination finished successfully — exiting read dest...
2024-12-05 20:09:12 replication-orchestrator INFO readFromDestination: exception caught
2024-12-05 20:09:12 replication-orchestrator INFO readFromDestination: done. (writeToDestFailed:false, dest.isFinished:true)
2024-12-05 20:09:12 replication-orchestrator INFO Closing StateCheckSumCountEventHandler
```