Description

The NeighborFinder locates the nearest 'candidate' feature to a 'base' feature and copies the candidate's attributes over to the base feature.



 

NeighborFinder and Format Attributes

Problems may occur when the base and candidate features have different geometry types.

For example, when using point features as the base and line features as the candidate, all attributes of the nearest candidate are copied to the base feature. This includes 'format attributes' such as fme_geometry. So after processing, the base point features have a geometry type of mif_polyline (for example) when they ought to be mif_point.
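The effect can be pictured with a small sketch. This is not FME code: features are modelled as plain Python dicts, and the attribute names, values and the copy_candidate_attributes helper are all hypothetical, chosen only to show how a blanket copy overwrites the base feature's format attributes and how excluding them avoids the problem.

```python
# Hypothetical illustration only: features are plain dicts, not FME objects.
# The prefixes below are an assumption about how format attributes are named.
FORMAT_PREFIXES = ("fme_", "mif_")

def copy_candidate_attributes(base, candidate, keep_base_format=True):
    """Return a copy of `base` with the candidate's attributes merged in."""
    merged = dict(base)
    for name, value in candidate.items():
        # Optionally skip format attributes so the base keeps its own geometry type.
        if keep_base_format and name.startswith(FORMAT_PREFIXES):
            continue
        merged[name] = value
    return merged

base = {"mif_type": "mif_point", "address": "12 Main St"}
candidate = {"mif_type": "mif_polyline", "road_name": "Main St"}

naive = copy_candidate_attributes(base, candidate, keep_base_format=False)
safe = copy_candidate_attributes(base, candidate)

print(naive["mif_type"])  # the point now claims to be a polyline
print(safe["mif_type"])   # the point keeps its own geometry type
```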

 

Candidates Only Mode

The NeighborFinder has a special "candidates only" mode for when the incoming data is all in one group, rather than two groups (base/candidate).

This mode is activated by simply making a connection to the CANDIDATE port only, and ignoring the BASE port. Then all CANDIDATEs will be compared with all other CANDIDATEs, but will not be compared to themselves.
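As a rough picture of what candidates only mode does, the sketch below compares every point in a single group with every other point, skipping the comparison of a point with itself. It is a brute-force illustration on simple (x, y) tuples, not the transformer's actual algorithm, and the nearest_other function name is made up for this example.

```python
import math

def nearest_other(points):
    """For each point, return the index of the nearest *other* point (or None)."""
    result = []
    for i, (xi, yi) in enumerate(points):
        best, best_dist = None, math.inf
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue  # a feature is never compared to itself
            d = math.hypot(xi - xj, yi - yj)
            if d < best_dist:
                best, best_dist = j, d
        result.append(best)
    return result

points = [(0, 0), (1, 0), (5, 5)]
print(nearest_other(points))  # [1, 0, 1]
```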

 

Performance Tip

The greater the max distance, and the more features that fall within it, the longer the process will take. Basically we create a list of features to consider and compare them, vertex by vertex, candidate against base, to find the shortest distance between the two features. This is a computationally intensive step.


To improve performance you may get better results by using more than one NeighborFinder.


For example, suppose you believe most candidates will be within 10m of the base features, but want to set a maximum of 1000m to be absolutely sure.

 

  • Set a NeighborFinder with a 10m max distance
  • Any UNMATCHED_BASE features go into a second NeighborFinder with a 1000m max distance
     

That way the majority of base features are only compared against a much smaller subset of candidates.


The effect is greatest when most candidates are very close, but some may be very far.
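The two-step arrangement can be sketched as follows. This is a simplified model, not the transformer's implementation: a bounding-box filter stands in for the "list of features to consider", a point-to-point distance stands in for the expensive vertex-by-vertex comparison, and the function names are hypothetical. The comparisons counter shows how few detailed comparisons are needed when most bases match in the first pass.

```python
import math

def shortlist(base, candidates, max_dist):
    # Cheap filter: candidates whose x and y are both within max_dist of the base.
    return [c for c in candidates
            if abs(c[0] - base[0]) <= max_dist and abs(c[1] - base[1]) <= max_dist]

def find_neighbor(base, candidates, max_dist):
    """Return (nearest candidate or None, number of detailed comparisons)."""
    considered = shortlist(base, candidates, max_dist)
    best, best_d = None, math.inf
    for c in considered:
        d = math.hypot(c[0] - base[0], c[1] - base[1])  # stands in for the costly step
        if d <= max_dist and d < best_d:
            best, best_d = c, d
    return best, len(considered)

def two_pass(bases, candidates, near=10.0, far=1000.0):
    matches, unmatched, comparisons = {}, [], 0
    for b in bases:
        hit, n = find_neighbor(b, candidates, near)     # first NeighborFinder
        comparisons += n
        if hit is None:                                 # UNMATCHED_BASE only
            hit, n = find_neighbor(b, candidates, far)  # second NeighborFinder
            comparisons += n
        if hit is None:
            unmatched.append(b)
        else:
            matches[b] = hit
    return matches, unmatched, comparisons

bases = [(0.0, 0.0), (500.0, 0.0), (5000.0, 5000.0)]
candidates = [(3.0, 4.0), (400.0, 0.0), (900.0, 0.0)]
matches, unmatched, comparisons = two_pass(bases, candidates)
print(matches, unmatched, comparisons)
```

In this toy data the first base is matched after a single detailed comparison; only the bases that fail the 10m pass pay for the wider 1000m search.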

 

Example

The attached workspaces show examples of the NeighborFinder transformer.


This example demonstrates a typical use for this transformer, and also how the Candidates First setting can be applied to improve performance.
 

Scenario

Here we have two source datasets: one of residences (addresses) in GeoMedia format, and one of city landmarks in AutoCAD DWG. We wish to find the nearest landmark to each address, with a maximum distance to the landmark of one mile (5280 feet).
 

Workspace

This workspace uses the standard FME sample dataset.


Example 1 workspace
Above: The workspace. Approximately 13,000 addresses are being processed against 110 landmarks.

 

Results

The results are output according to whether or not a match was found. A second workspace (Example 2) was created that included a Bufferer and Dissolver to show the 5280 foot boundary and prove the results are correct.
 

Filtering

User-added image
Above: Red addresses are inside the one mile limit and are matches. Grey addresses are outside and unmatched. The blue points are the landmark features.

 

Attributes

A second result is that the neighbor's attributes are automatically copied onto the base feature referencing it.


User-added image
Above: On querying this address we find that it is 383ft from the nearest landmark, which is Windsor Village.

 

Visualization

One interesting point is that a couple of matched addresses do appear to fall outside the boundary in the above image. That's because, for purposes of speed*, the buffer Interpolation Angle setting was left at 22.5 degrees, which resulted in only a crude approximation of a circle. This shows two things: a) the processing is using the true distance, so the result is correct; and b) you shouldn't be complacent about default values in the settings dialog. If you set the buffer angle to 1 degree, the results look a lot better.


User-added image
Above: Although this looks poor, the results are actually good.
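To get a feel for how crude the default is, here is a back-of-envelope calculation. It assumes (and this is only an assumption about how the buffer is built) that the Interpolation Angle is the arc angle between consecutive buffer vertices and that those vertices lie on the true circle, so the straight edges cut inside it. Under that assumption the worst-case shortfall at the middle of each edge is r × (1 − cos(angle / 2)).

```python
import math

def max_shortfall(radius, angle_deg):
    """Greatest distance by which an inscribed polygon edge falls inside the true circle."""
    half_angle = math.radians(angle_deg) / 2.0
    return radius * (1.0 - math.cos(half_angle))

for angle in (22.5, 1.0):
    shortfall = max_shortfall(5280, angle)
    print(f"{angle:>5} degrees: up to {shortfall:.1f} ft inside the true circle")
```

With a 5280 ft radius this gives roughly 100 ft at 22.5 degrees, against a fraction of a foot at 1 degree, which would be consistent with a few correctly matched addresses appearing just outside the drawn boundary.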

 

Candidates First

When set to "Yes", the Candidates First setting informs the transformer that Candidate features will arrive first. Consequently, when the first base feature arrives the Candidates port is closed. Now the transformer can process Base features immediately, without having to cache them in case some candidates are yet to arrive.

The benefit of this is improved performance: the process will be faster and use less memory.
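The difference between the two modes can be modelled with a toy stream of (kind, point) tuples. This is a sketch of the idea only, not the transformer's internals: the run function and its return values are invented for illustration. With candidates_first set, each base is processed the moment it arrives and nothing is cached; without it, every base has to be held until the whole stream has been read.

```python
import math

def nearest(base, candidates):
    # Returns None when there are no candidates at all.
    return min(candidates,
               key=lambda c: math.hypot(c[0] - base[0], c[1] - base[1]),
               default=None)

def run(stream, candidates_first):
    """Return (results, peak number of cached bases, ignored candidates)."""
    candidates, cached_bases, results = [], [], []
    port_closed, peak_cached, ignored = False, 0, 0
    for kind, pt in stream:
        if kind == "candidate":
            if port_closed:
                ignored += 1            # arrived after the first base: ignored
            else:
                candidates.append(pt)
        elif candidates_first:
            port_closed = True          # first base closes the candidate port
            results.append((pt, nearest(pt, candidates)))
        else:
            cached_bases.append(pt)     # must wait: more candidates may follow
            peak_cached = max(peak_cached, len(cached_bases))
    for pt in cached_bases:             # regular mode processes at the very end
        results.append((pt, nearest(pt, candidates)))
    return results, peak_cached, ignored

stream = [("candidate", (0, 0)), ("candidate", (10, 0)),
          ("base", (1, 0)), ("base", (9, 0)), ("candidate", (1, 1))]
print(run(stream, candidates_first=True))
print(run(stream, candidates_first=False))
```

Note the late candidate at the end of the stream: in this model it is counted as ignored when candidates_first is set, which mirrors the behaviour described under Caution.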


User-added image
Above: The NeighborFinder settings dialog.


However, setting it to Yes does require that candidates arrive first. The user controls Candidate/Base order by rearranging dataset readers in the Navigator pane. The uppermost reader should be the candidates reader as it will be read first, and therefore arrive first at the transformer. Right-click a reader to find options for moving it up/down the list.


User-added image
Above: The Navigator Pane. Note how Landmarks (our Candidates) is the uppermost dataset.

 

Log Files

At first the Candidates First option appears to be slowing things down, taking 11.4 seconds to read the data, rather than 7.4:
 

  • Candidates First = No
2008-08-07 12:27:25|   7.4|  1.0|INFORM|Reading source feature # 10000
  • Candidates First = Yes
2008-08-07 12:29:10|  11.4|  2.1|INFORM|Reading source feature # 10000



But this is misleading and you shouldn't be taken in. The Candidates First = Yes translation is not just reading the data, but processing it as it goes. Therefore it is almost complete. The Candidates First = No translation has only read the data and cached it to disk. It must still do the processing.

 

  • Candidates First = No
2008-08-07 12:27:31|  13.3|  0.0|INFORM|Translation was SUCCESSFUL with 0 warning(s) (0 feature(s)/0 coordinate(s) output)
2008-08-07 12:27:31|  13.3|  0.0|INFORM|FME Session Duration: 13.2 seconds. (CPU: 12.0s user, 0.5s system)
2008-08-07 12:27:31|  13.3|  0.0|INFORM|END - ProcessID: 4368, peak process memory usage: 96308 kB, current process memory usage: 58552 kB.
  • Candidates First = Yes
2008-08-07 12:29:12|  12.7|  0.0|INFORM|Translation was SUCCESSFUL with 0 warning(s) (0 feature(s)/0 coordinate(s) output)
2008-08-07 12:29:12|  12.7|  0.0|INFORM|FME Session Duration: 12.8 seconds. (CPU: 11.5s user, 0.5s system)
2008-08-07 12:29:12|  12.7|  0.0|INFORM|END - ProcessID: 6940, peak process memory usage: 56208 kB, current process memory usage: 48244 kB.


 

That's more like it! The Candidates First = Yes translation has finished first. More importantly, its peak memory usage (56,208 kB) is only about 58% of the regular translation's (96,308 kB).
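If you want to make this comparison routinely, the peak memory figure can be pulled out of the END line of each log. The snippet below is a small sketch that relies only on the wording of the two log lines quoted above; the peak_memory_kb helper is a made-up name, and other FME versions may word the line differently.

```python
import re

PEAK_RE = re.compile(r"peak process memory usage: (\d+) kB")

def peak_memory_kb(log_text):
    """Return the peak process memory in kB from an FME log, or None if not found."""
    match = PEAK_RE.search(log_text)
    return int(match.group(1)) if match else None

log_no = ("2008-08-07 12:27:31|  13.3|  0.0|INFORM|END - ProcessID: 4368, "
          "peak process memory usage: 96308 kB, current process memory usage: 58552 kB.")
log_yes = ("2008-08-07 12:29:12|  12.7|  0.0|INFORM|END - ProcessID: 6940, "
           "peak process memory usage: 56208 kB, current process memory usage: 48244 kB.")

ratio = peak_memory_kb(log_yes) / peak_memory_kb(log_no)
print(f"Candidates First = Yes used {ratio:.0%} of the regular peak memory")
```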
 

Caution

It's important to use this setting only when you are sure of the feature order.

Firstly, the candidates input port is closed the moment the first base feature arrives, therefore any subsequent candidates will be ignored.

 

2008-08-07 13:30:42|   9.0|  0.0|WARN  |NeighborFinder(ProximityFactory): Extra Candidate feature(s) encountered and ignored


Above: This is the log message that indicates ignored candidate features.


Secondly, if there are no candidates at all (i.e. a base feature is the first to arrive) then the transformer will NOT halt the translation. It will carry on and ALL base features will be classed as unmatched. This is not the same behaviour as the Clipper, which will stop the translation should there be no Clippers.

 

2008-08-07 13:42:08|   9.1|  0.0|STATS |NeighborFinder(ProximityFactory): Input Summary:  12292 Base feature(s), 0 Candidate features(s)


Above: You will always get a summary that will tell you exactly how many bases and candidates were used, but as a statistic and not a warning.
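Because neither situation stops the translation, it can be worth scanning the log afterwards. The sketch below is one possible check, built only on the wording of the two log messages quoted above (other FME versions may phrase them differently); the check_neighborfinder_log function is a hypothetical helper, not part of FME.

```python
import re

SUMMARY_RE = re.compile(
    r"Input Summary:\s+(\d+) Base feature\(s\), (\d+) Candidate features?\(s\)")
IGNORED_TEXT = "Extra Candidate feature(s) encountered and ignored"

def check_neighborfinder_log(lines):
    """Return a list of human-readable problems found in the log lines."""
    problems = []
    for line in lines:
        if IGNORED_TEXT in line:
            problems.append("candidates arrived after the first base and were ignored")
        match = SUMMARY_RE.search(line)
        if match and int(match.group(2)) == 0:
            problems.append(f"{match.group(1)} base feature(s) but no candidates")
    return problems

log_lines = [
    "2008-08-07 13:30:42|   9.0|  0.0|WARN  |NeighborFinder(ProximityFactory): "
    "Extra Candidate feature(s) encountered and ignored",
    "2008-08-07 13:42:08|   9.1|  0.0|STATS |NeighborFinder(ProximityFactory): "
    "Input Summary:  12292 Base feature(s), 0 Candidate features(s)",
]
for problem in check_neighborfinder_log(log_lines):
    print(problem)
```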
 

Summary

Although this example only demonstrates a small difference in speed, the source datasets were relatively small. The larger the datasets (particularly the Base features) the greater the difference. In fact at some point it is going to make the difference between a translation succeeding or failing due to a lack of memory, and that's why this setting is so useful.

* OK, the real reason is that I plain forgot the Bufferer setting and blamed the bad output on the Viewer. As usual PEBKAC!