Overview

This transformer arose from a question on the FMETalk user group:

"I'm trying to render a hot spot map of crimes. Using other GIS systems, I can easily do a Kernel Density type analysis which will create contours of dense areas. I'd like to do this in FME, but am struggling! Any thoughts please?"

FME does have a DensityCalculator transformer, but this is a little different. Basically, we need to calculate density (or clustering) by taking each point and counting how many similar points lie within a given distance of it.

Input

This transformer accepts as input any set of point features.

User-added image
Above: In my test case I used a set of address points.

You can see where there are clusters of addresses, but can we write this to a format for better visualization?

Output

Yes we can! The output from this transformer can be either a set of contours, or a surface feature. You could write this to many formats, including ones capable of true 3D visualization.

User-added image
Above: 3D PDF output opened in Adobe Viewer. This is showing both surface and contours, though of course you could just use one or the other.


One issue with the output is that it can look fairly flat, even in the densest areas. To counter this, the Z values are all exaggerated by a factor of 10.

Methodology

The method is very simple (See attachment: ClusterModeller.zip). For each point, I just find the number of neighbors within a given radius and use that value as the Z in a DEM/Surface Model. The trick is to find the radius that gives a good result. This transformer lets the user enter that value, or calculates one automatically based on the average distance between the input points.
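Outside of FME, the neighbor-counting idea can be sketched in a few lines of Python. This is only an illustration of the method described above, not the transformer itself: the function name `neighbor_counts` is made up for the example, it uses a brute-force comparison of every pair of points, and it assumes a point does not count as its own neighbor.

```python
import math

def neighbor_counts(points, radius):
    """For each (x, y) point, count the other points within `radius`.

    Brute force (every pair is compared), so only suitable for small inputs.
    Assumption: a point is not counted as its own neighbor.
    """
    counts = []
    for i, (xi, yi) in enumerate(points):
        n = 0
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                n += 1
        counts.append(n)
    return counts

# Three points close together and one far away.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(neighbor_counts(points, radius=1.5))  # [2, 2, 2, 0]
```

The three clustered points each see two neighbors, while the isolated point sees none. Those counts are what would become the Z values of the surface.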

Source Data

The source data in my example comes from the FME Sample Dataset (address points for Austin, TX) and from the City of Vancouver's excellent open data catalogue (heritage sites for Vancouver, BC).

Detailed Description

Here's the transformer as a whole. You can see how it is made up of two main parts.

User-added image
Above: The custom transformer definition.


Let's take a closer look at the section in the blue bookmark, as this is where all the real work takes place.

User-added image
Above: The processing section.


This starts out with a NeighborFinder transformer. Rather than the normal Base/Candidate use, we're using the NeighborFinder in a special mode in which the input is both Base AND Candidate. We set a maximum radius for finding neighbors (what I call 'tolerance') and set a list attribute to store the results.


The ListElementCounter is then used to find how many entries are in the list - i.e. how many neighbors cluster around this point, within the defined tolerance. Unmatched points (those with no neighbors) are set to 1.


The 3DForcer transformer simply sets the Z coordinate of each point to this value, and the Scaler multiplies it by 10 to emphasize the results.
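As a rough Python equivalent of these two steps (purely a sketch, with the hypothetical name `to_3d`), the neighbor count becomes the Z value, points with no neighbors are given a count of 1, and the result is multiplied by the exaggeration factor:

```python
def to_3d(points, counts, exaggeration=10):
    """Turn (x, y) points plus neighbor counts into (x, y, z) points.

    Points with no neighbors (count 0) are given a count of 1, as described
    above, and every count is multiplied by the exaggeration factor.
    """
    return [
        (x, y, max(count, 1) * exaggeration)
        for (x, y), count in zip(points, counts)
    ]

print(to_3d([(0, 0), (5, 5)], [3, 0]))  # [(0, 0, 30), (5, 5, 10)]
```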


Finally, the SurfaceModeller turns the points and their Z values into a digital elevation model (DEM) and uses that to create a set of contours and a TIN surface.
The contours are simple line features compatible with almost any format. The surface is a true 3D feature, compatible with formats supporting 3D (PDF, Geodatabase, 3DS, etc).


You might be asking, "what does the red section of the workspace do?". That's where we calculate a tolerance if the user did not set one.

User-added image
Above: The tolerance calculation section.


Here the NeighborFinder is again used, but this time we want to find the distance between each point and its nearest neighbor.


Then a StatisticsCalculator and an ExpressionEvaluator are used to calculate a tolerance value based on the average distance between points.


The FeatureMerger then attaches that tolerance value back onto the original features.
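The same idea can be sketched in Python. The exact expression used in the ExpressionEvaluator is not given here, so the multiplier `factor` below is a placeholder assumption for illustration, as is the function name `auto_tolerance`:

```python
import math

def auto_tolerance(points, factor=2.0):
    """Derive a search radius from the mean nearest-neighbor distance.

    `factor` is a hypothetical multiplier chosen for this example; the
    expression actually used in the transformer may be different.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from this point to its closest other point.
        nearest.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        ))
    mean_distance = sum(nearest) / len(nearest)
    return factor * mean_distance

# Nearest-neighbor distances are 1, 1 and 4, so the mean is 2.
print(auto_tolerance([(0, 0), (1, 0), (5, 0)]))  # 4.0
```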

Limitations

The limitations are fairly obvious, but I should point them out anyway:

  • I've no idea if this method is really suitable for modelling clusters. It's just something I thought of that seems to produce a good result.
  • One thing this definitely does not do is prove whether the clusters are statistically significant. They could just be random patterns.
  • The tolerance calculation is convenient when you aren't sure about your data, but again it's just a method which seems to produce an acceptable result. You could experiment with values to get a better result.

Updates

Since I took the screenshots above, I made the following updates:
  • Let the user set the Z exaggeration factor.

User-added image

  • Write a message to the log window reporting the tolerance being calculated.

User-added image

Future Updates

As this isn't for a particular project, I don't really plan to make further updates. However, I could:

  • Calculate a contour interval in a similar way to the automatic tolerance calculation.
  • Calculate a statistical significance figure. I'm thinking...
    • Create the same number of points as the input, in the same X/Y extents, but in random locations.
    • Create a surface from the random locations.
    • Calculate the volume of the clustered surface minus the volume of the random surface.
    • The closer the value is to zero, the less clustered the original data is.
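
To give a feel for the random-comparison idea, here is a simplified Python sketch. It is not a real significance test, and it swaps in a shortcut: instead of building two surfaces and subtracting their volumes, it compares the sum of the neighbor counts (the values that would become the Z heights). All function names are invented for the example.

```python
import math
import random

def total_neighbors(points, radius):
    """Sum of neighbor counts over all points (a stand-in for surface volume)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                total += 1
    return total

def random_points_like(points, seed=None):
    """Same number of points, same X/Y extents, but random locations."""
    rng = random.Random(seed)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return [
        (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        for _ in points
    ]

def clustering_score(points, radius, seed=None):
    """Observed total minus the total for one random layout.

    Values near zero suggest the input is no more clustered than random.
    """
    baseline = random_points_like(points, seed)
    return total_neighbors(points, radius) - total_neighbors(baseline, radius)

points = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(clustering_score(points, radius=1.5, seed=1))
```

A single random layout is a noisy baseline, so in practice it would make sense to repeat the comparison with many random layouts and look at the spread of scores.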