I knew Open XML was a good thing but I was still pleasantly surprised by this one: we have achieved a 300 to 1 performance improvement on a data export job by migrating from a Microsoft Excel automation solution to Open XML. Here are the details:
Our client wanted to extract data from a SQL Server database to perform analysis in Microsoft Excel.
The client wanted the export to be driven from a management system that already provided a user-friendly interface for selecting columns as well as filtering and sorting rows.
Option 1 – Comma Separated Values
The first idea that comes to mind when exporting data for Excel is to generate a file with comma separated values (file extension/acronym: CSV).
The problem with this approach is that the results depend on the client’s internationalization configuration. A file which opens just fine on a computer configured for the U.S. might give unspecified results when opened from a European desktop having different settings for currency formats and list separators.
Option 2 – Microsoft Excel Automation
Microsoft Excel Automation is a means of controlling this application from scripts which can essentially mimic a user entering data on a sheet. The scripts are written in Visual Basic for application (VBA), a fairly easy language to master for entry-level programmers and enlightened power users.
On the plus side, you get a genuine Microsoft Excel file which will open correctly on any desktop. You also get to specify cell formatting and to structuring the data any way you want.
On the minus side, this method is excruciatingly slow. In our tests, exporting 4,000 rows of 10 columns took more than 10 minutes, which wasn’t acceptable for the client.
Option 3 – Open XML
Open XML is the native document format for Office 2007. Excel Open XML files are easily recognisable by their “.XLSX” extension.
The solution implemented by the Xpertdoc team involves creating a template Excel file and then injecting data in its XML structure using a DotNet program.
The same export that took 10 minutes in Microsoft Excel Automation now completes in less than two seconds. Overall the solution is also more robust, easier to maintain and it does not require Microsoft Excel to be installed on the system (good for server-side operation).