-
Hi, can anyone share tips or channel setups that would allow me to read in large CSV files (5-10 GB) and insert those records into a SQL database? I have a File Reader that is technically working, but even a smaller CSV (1 GB) with 100k records takes an abnormally long time, so I am worried about the genuinely large files. My settings are:
Source Transformer has the following JavaScript code:

    var sql = "INSERT INTO [dbo].[test_table] SELECT ...";
    var dbConn = DatabaseConnectionFactory.createDatabaseConnection('net.sourceforge.jtds.jdbc.Driver', 'jdbc:jtds:sqlserver://...');
    dbConn.executeUpdate(sql);

Am I doing this correctly, or is there a more efficient way people are using when dealing with large CSVs?
-
My tip is don't use Mirth for this. It's far too slow. Use native command-line database utilities for large data imports and exports. That would be bcp in your case with MSSQL.
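For reference, a minimal bcp invocation might look like the line below. The database, table, server, and file names are placeholders; -T uses Windows authentication (swap in -U/-P for a SQL login), -c loads character data, -t, sets a comma field terminator, and -b commits in 10,000-row batches:

```
bcp MyDatabase.dbo.test_table in bigfile.csv -S localhost -T -c -t, -b 10000
```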
-
To attempt to give a Mirth answer: you're running into two bottlenecks. First, E4X gets exponentially slower the more children you add to a single element (rows, in the case of delimited text). You have a few choices here, in increasing amounts of work.
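As a sketch of the cheaper end of that spectrum (assumptions: the raw payload is reachable via connectorMessage.getRawData(), and a naive comma split is acceptable, i.e. no quoted fields), you can sidestep E4X by parsing the raw text yourself:

```javascript
// Sketch: parse the delimited payload with plain string operations instead of
// walking msg's E4X children. getRawData() and the naive split are assumptions.
var raw = connectorMessage.getRawData();   // full CSV payload as a string
var lines = raw.split(/\r?\n/);
for (var i = 0; i < lines.length; i++) {
    if (lines[i].length == 0) continue;    // skip blank/trailing lines
    var fields = lines[i].split(',');      // naive split; no quoted-field handling
    // ... hand 'fields' to your insert logic instead of reading from msg ...
}
```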
Once you are able to get the data, it looks like it is possible to use JDBC batch inserts to speed up the process compared to inserting each row individually; see the sketch below.
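A minimal sketch of that idea from a Mirth JavaScript transformer, assuming the whole file arrives as one message, 'rows' is your already-parsed array of fields, and the jTDS URL, credentials, table, columns, and 1000-row flush size are all placeholders:

```javascript
// Sketch: JDBC PreparedStatement batching instead of one executeUpdate per row.
var dbConn = DatabaseConnectionFactory.createDatabaseConnection(
    'net.sourceforge.jtds.jdbc.Driver',
    'jdbc:jtds:sqlserver://localhost:1433/mydb', 'user', 'password');
try {
    var conn = dbConn.getConnection();     // drop to the raw java.sql.Connection
    conn.setAutoCommit(false);
    var ps = conn.prepareStatement(
        'INSERT INTO dbo.test_table (col1, col2) VALUES (?, ?)');
    for (var i = 0; i < rows.length; i++) {
        ps.setString(1, rows[i][0]);
        ps.setString(2, rows[i][1]);
        ps.addBatch();
        if ((i + 1) % 1000 == 0) ps.executeBatch();  // flush every 1000 rows
    }
    ps.executeBatch();                     // flush the remainder
    conn.commit();
    ps.close();
} finally {
    dbConn.close();
}
```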
Additionally, the Microsoft driver has an option for rewriting batch operations (with limitations) to use the Bulk Copy API. I imagine that if your data supports this use case it would be even faster. https://docs.microsoft.com/en-us/sql/connect/jdbc/use-bulk-copy-api-batch-insert-operation?view=sql-server-ver15

If you are using Mirth batch processing, you could store your database connection in the globalChannelMap between messages and build up your batch insert across multiple messages. Then you could execute the batch after every so many rows and again at the end of the Mirth batch; a sketch follows.
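A sketch of that globalChannelMap pattern with one message per CSV row. The driver URL, credentials, table, column paths, and the 1000-row flush threshold are all assumptions, and cleanup of the long-lived connection at the end of the batch is left out:

```javascript
// Reuse one statement across the batch's messages via the globalChannelMap.
var ps = globalChannelMap.get('bulkStmt');
if (!ps) {
    // useBulkCopyForBatchInsert asks the Microsoft driver to route the batch
    // through the Bulk Copy API (subject to the documented limitations).
    var conn = DatabaseConnectionFactory.createDatabaseConnection(
        'com.microsoft.sqlserver.jdbc.SQLServerDriver',
        'jdbc:sqlserver://localhost:1433;databaseName=mydb;useBulkCopyForBatchInsert=true',
        'user', 'password').getConnection();
    ps = conn.prepareStatement('INSERT INTO dbo.test_table (col1, col2) VALUES (?, ?)');
    globalChannelMap.put('bulkStmt', ps);
}
ps.setString(1, msg['row']['column1'].toString());  // column paths are assumptions
ps.setString(2, msg['row']['column2'].toString());
ps.addBatch();

var count = (globalChannelMap.get('rowCount') || 0) + 1;
globalChannelMap.put('rowCount', count);
// Flush every 1000 rows, and once more when the batch reader signals the end.
if (count % 1000 == 0 || sourceMap.get('batchComplete')) {
    ps.executeBatch();
}
```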