Description
Excel is a popular spreadsheet application by Microsoft. It is read by FME as non-geometric (or non-spatial) data. When you select an Excel file (.xls) for input, you should go into the settings and select the desired worksheet (table).
The Microsoft Excel Reader/Writer works through ADO.
ADO is used to directly read and write Microsoft based technology using
ActiveX Data Objects (ADO) interfaces, rather than going through ODBC. The result is improved functionality and performance (in some cases, the improvement is dramatic).
For Excel, ADO specifically permits:
-
Handling of Mixed Types
-
Drop Table First option in writer
Known Issues
Mixed-type Columns
The default ADO settings for Excel have a high probability of misidentifying column types and silently dropping data. For example, if the first 8 data rows of a column contain numbers, any non-numeric data that follows will not be read. Modern versions of FME try to lessen the likelihood of this occurring, and there are further steps users can take to improve the situation.
The general algorithm used by ADO to determine the type of an Excel column is as follows:
-
Scan the first X rows of the spreadsheet (possibly skipping a header), where X is the TypeGuessRows value specified in the registry. If TypeGuessRows is 0, all rows are scanned. TypeGuessRows is initially set to 8 and may not be larger than 16.
-
If only a single data type is seen during the scan, that is the data type chosen for the column. All data in the column not compatible with the chosen type will be ignored.
-
If multiple data types are seen during the scan, the ImportMixedTypes registry value is consulted (because FME specifies IMEX=1 on the connection string).
-
ImportMixedTypes = Text (default) results in FME choosing varchar as the column type, and all data in the column being read as text.
-
ImportMixedTypes = Majority Type results in FME choosing the type that represents the majority of the data, with a preference towards numeric types in the case of ties. All data in the column not compatible with the chosen type will be ignored.
Engines
There are multiple engines available for reading Excel files with ADO. ‘Jet’ only considers the data in a column when performing type scanning. ‘ACE’ (
Access Connectivity Engine) considers both the data and the data formatting when performing type scanning. In practice, this means ACE is more likely to choose the safer type of ‘text’ for columns.
How Versions of FME Differ
FME 2009 and earlier only support the Jet engine. FME 2010 supports both Jet and ACE, and defaults to ACE. In order to resolve concurrent reading/writing issues, FME 2011 changed the default engine back to Jet but retained support for ACE if Jet is not present or the file extension is .xlsx or .xlsb. All of these versions of FME are vulnerable to the “first 8 rows” scenario described above, but because ACE is more likely to choose text for column types, there are files that will be correctly read in FME 2010 but not FME 2011.
As of FME 2012 beta 12055, FME temporarily modifies the registry settings that affect the Jet engine such that TypeGuessRows=0 and ImportMixedTypes=Text. This will ensure mixed-type columns are identified as text and no data is lost on read. ACE continues to work as before, but warnings are produced if incorrect registry settings are detected.
External Resources
Registry Keys
-
HKLM\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
-
HKLM\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel
Q+A
Q) FME seems to read more Excel features than I have records in the spreadsheet - why?A) The library we use to read Excel files will return a record for any rows that have null values - the problem is that on opening a spreadsheet in Excel it is not easily apparent whether a cell is null or non-existent.
The safest way to avoid this problem is when you are using Excel make sure any data deletions are carried out by right-clicking the row number and choosing 'delete';i.e. delete the row. If you simply select the cells within the row and press the delete key you risk deleting the cell contents but leaving the row intact - it is these rows which will be read by FME as blank records.
In order to remove these extra features a
Tester can be used.