Overview
This transformer arose from a question on the FMETalk user group:
"I'm trying to render a hot spot map of crimes. Using other GIS systems, I can easily do a Kernel Density type analysis which will create contours of dense areas. I'd like to do this in FME, but am struggling! Any thoughts please?"
FME does have a DensityCalculator transformer, but it does something a little different. Here we need to calculate density (or clustering) point by point: take each point and assess how many similar points lie within the surrounding area.
Input
This transformer accepts as input any set of point features.
Above: In my test case I used a set of address points.
You can see where there are clusters of addresses, but can we write this to a format for better visualization?
Output
Yes we can! The output from this transformer can be either a set of contours, or a surface feature. You could write this to many formats, including ones capable of true 3D visualization.
Above: 3D PDF output opened in Adobe Viewer. This is showing both surface and contours, though of course you could just use one or the other.
The one issue with the output is that it can look fairly flat, even in the densest areas. To counter this, the Z values are all exaggerated by a factor of 10.
Methodology
The method is very simple (see attachment: ClusterModeller.zip). I just find the number of neighbors within a given radius for each point, and use that value as the Z in a DEM/Surface Model. The trick is to find the best radius to give a good result. This transformer lets the user enter that value, or will calculate one automatically based on the average distance between the input points.
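Outside of FME, the core idea can be sketched in a few lines of Python. This is only an illustration, not the transformer itself: the function name neighbor_counts is hypothetical, it uses a brute-force comparison of every pair of points, and it assumes a point is not counted as its own neighbor.

```python
import math

def neighbor_counts(points, radius):
    """For each (x, y) point, count the OTHER points within `radius`."""
    counts = []
    for i, (xi, yi) in enumerate(points):
        n = 0
        for j, (xj, yj) in enumerate(points):
            # Skip the point itself; compare straight-line distance to the radius.
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                n += 1
        counts.append(n)
    return counts

# Three points close together and one isolated point.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
counts = neighbor_counts(points, radius=1.5)
print(counts)
```

Each count would then become the Z value of its point. A larger radius smooths the result, while a smaller one leaves more points with no neighbors at all.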
Source Data
The source data in my example comes from the FME Sample Dataset (address points for Austin, TX) and from the City of Vancouver's excellent open data catalogue (heritage sites for Vancouver, BC).
Detailed Description
Here's the transformer as a whole. You can see how it is made up of two main parts.
Above: The custom transformer definition.
Let's take a closer look at the section in the blue bookmark, as this is where all the real work takes place.
Above: The processing section.
This starts out with a NeighborFinder transformer. Rather than the normal Base/Candidate use, we're using the NeighborFinder in a special mode in which the input is both Base AND Candidate. We set a maximum radius for finding neighbors (what I call 'tolerance') and set a list attribute to store the results.
The ListElementCounter is then used to find how many entries are in the list, i.e. how many neighbors cluster around this point within the defined tolerance. Unmatched points (those with no neighbors) are set to 1.
The 3DForcer transformer simply sets the Z coordinate of each point to this value, and the Scaler multiplies it by 10 to emphasize the results.
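To make the counting and scaling steps concrete, here is a rough Python equivalent of what happens between the ListElementCounter and the Scaler. It is a sketch under assumptions: the function name to_3d and its parameters are hypothetical, and the neighbor counts are taken as already computed.

```python
def to_3d(points, counts, exaggeration=10.0):
    """Turn 2D points plus neighbor counts into exaggerated 3D points."""
    result = []
    for (x, y), n in zip(points, counts):
        # Points with no neighbors are set to 1, as in the transformer,
        # then the value is multiplied by the exaggeration factor.
        z = max(n, 1) * exaggeration
        result.append((x, y, z))
    return result

# One point with three neighbors, one with none.
print(to_3d([(0, 0), (5, 5)], [3, 0]))
```

The resulting (x, y, z) points are what a surface-building step would then consume.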
Finally, the SurfaceModeller turns the points and their Z values into a digital elevation model (DEM) and uses that to create a set of contours and a TIN surface.
The contours are simple line features compatible with almost any format. The surface is a true 3D feature, compatible with formats supporting 3D (PDF, Geodatabase, 3DS, etc).
You might be asking, "what does the red section of the workspace do?" That's where we calculate a tolerance if the user did not set one.
Above: The tolerance calculation section.
Here the NeighborFinder is again used, but this time we want to find the distance between each point and its nearest neighbor.
Then a StatisticsCalculator and an ExpressionEvaluator are used to calculate a tolerance value based on the average distance between points.
The FeatureMerger then attaches that tolerance value back onto the original features.
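The tolerance calculation could look something like the following Python sketch. The article does not give the exact expression used in the ExpressionEvaluator, so the multiplier here (factor) is purely an assumption for illustration, as is the function name auto_tolerance. The input needs at least two points.

```python
import math

def auto_tolerance(points, factor=2.0):
    """Derive a search radius from the mean nearest-neighbor distance.

    `factor` is a hypothetical multiplier; the real transformer's
    expression is not documented here.
    """
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from this point to its closest other point.
        d = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest.append(d)
    mean_nearest = sum(nearest) / len(nearest)
    return factor * mean_nearest

points = [(0, 0), (2, 0), (6, 0)]
print(auto_tolerance(points, factor=1.0))
```

In this example the nearest-neighbor distances are 2, 2 and 4, so the mean is 8/3 before the multiplier is applied.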
Limitations
There are some obvious limitations, but I should probably point them out:
- I've no idea if this method is really suitable for modelling clusters. It's just something I thought of that seems to produce a good result.
- One thing this definitely does not do is prove whether the clusters are statistically significant. They could just be random patterns.
- The tolerance calculation is convenient when you aren't sure about your data, but again it's just a method which seems to produce an acceptable result. You could experiment with values to get a better result.
Updates
Since I took the screenshots above, I made the following updates:
- Get user input on the Z exaggeration factor.
- Write a message to the log window about what tolerances are being calculated.
Future Updates
As this isn't for a particular project I don't really plan to make further updates. However, I could:
- Calculate a contour interval in a similar way to the automatic tolerance calculation.
- Calculate a statistical significance figure. I'm thinking...
  - Create the same number of points as the input, in the same X/Y extents, but in random locations.
  - Create a surface from the random locations.
  - Calculate the volume of the clustered surface minus the random surface.
  - The closer the value is to zero, the less clustered the original data is.
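The random-comparison idea above could be prototyped roughly as follows. Note that this sketch swaps in a simpler measure: instead of building two surfaces and subtracting their volumes, it compares the mean neighbor count of the real points with that of random points in the same extents. All names (mean_neighbor_count, clustering_excess, seed) are hypothetical, and the result is only an indicator, not a proper significance test.

```python
import math
import random

def mean_neighbor_count(points, radius):
    """Average number of other points within `radius` of each point."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                total += 1
    return total / len(points)

def clustering_excess(points, radius, seed=0):
    """Observed mean neighbor count minus that of random points.

    Random points are drawn uniformly within the X/Y extents of the input.
    Values near zero suggest little clustering beyond a random pattern.
    """
    rng = random.Random(seed)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    random_points = [
        (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        for _ in points
    ]
    observed = mean_neighbor_count(points, radius)
    baseline = mean_neighbor_count(random_points, radius)
    return observed - baseline

# A tight cluster in the middle of a 100 x 100 area, plus four corner points.
cluster = [(50 + 0.05 * i, 50) for i in range(20)]
corners = [(0, 0), (100, 0), (0, 100), (100, 100)]
print(clustering_excess(cluster + corners, radius=5.0))
```

Because a single random draw is noisy, repeating the comparison with several seeds and looking at the spread would give a more trustworthy picture.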