crownsegmentr is an R package with a C++ implementation of the AMS3D
algorithm (Ferraz et. al, 2016) for tree crown
segmentation in airborne lidar data. For a general description of how
the tree segmentation works, see the documentation of the segment_tree_crowns
generic. Pseudo code of the AMS3D algorithm is listed below.
This is a basic example which shows you how to segment a normalized point cloud and visualize the result:
# Load a point cloud of some trees included in the lidR package
point_cloud <- lidR::readLAS(system.file(
"extdata/MixedConifer.laz",
package = "lidR"
))
# Segment a normalized point cloud
segmented_point_cloud <- crownsegmentr::segment_tree_crowns(
point_cloud,
crown_diameter_to_tree_height = 0.25,
crown_height_to_tree_height = 0.5
)
# Generate random crown colors
crown_colors <- lidR::pastel.colors(
n = length(unique(segmented_point_cloud@data[["crown_id"]]))
)
# Plot the segmented crown bodies
lidR::plot(
segmented_point_cloud,
color = "crown_id",
pal = crown_colors,
nbreaks = length(crown_colors),
size = 3,
axis = TRUE
)The package provides four functions: The central function segment_tree_crowns(),
the preprocessing functions watershed_diameter_raster()
and li_diameter_raster(), and
the postprocessing function remove_small_trees().
This is the central function of the package, which implements the AMS3D-Algorithm. It takes a point cloud and returns the same point cloud with an additional attribute called “crown_id” (or differently, depending on user specification). In addition, the user must specify the two arguments crown_diameter_to_tree_height and crown_length_to_tree_height, to determine the size of the search kernel depending on the estimated shapes of the trees. Those values can either be given as single numbers, or as raster, which allows to account for different tree shapes in different parts of the point cloud. They can also be set to 0, if non-zero values are given for crown_diameter_constant and crown_length_constant, respectively. For details see the documentation of the function, or the pseudo code of the algorithm.
These two preprocessing functions do the same thing, but in different
ways: They take a point cloud, and do a preliminary segmentation of tree
crowns with a quick algorithm (lidR::li2012() or
lidR::watershed()). Then, they return a raster of local
crown_diameter_to_tree_height ratios, based on the preliminary
segmentation. This raster can then be used as input for
segment_tree_crowns. Usually, this improves the segmentation accuracy of
the AMS3D algorithm. However, in cases where the preliminary
segmentation performs poorly, it may be worse than specifying reasonable
fixed numbers for crown_diameter_to_tree_height. The watershed variant
requires the Bioconductor package EBImage. If you have it
available, watershed_diameter_raster() is recommended.
Otherwise, li_diameter_raster() offers a good alternative.
Both functions can also be called from within
segment_tree_crowns(), by specifying
crown_diameter_to_tree_height = “li2012” or
crown_diameter_to_tree_height = “watershed”. In these
cases, the functions will be called with default parameters. In order to
vary parameters, the function must be called manually.
The AMS3D-Algorithm often yields numerous smaller crown clusters in
the lower part of the point cloud, especially if the point cloud is thin
in its lower parts, or the constants for crown diameter and crown length
are small. To reduce such over-segmentation, you can post-process the
segmentation result with remove_small_trees(). Crown ID’s
whose crowns have diameters or heights below user-defined thresholds
will be set to NA.
The AMS3D algorithm works best for airborne laser scanning (ALS) point clouds with densities between 5 and 20 points per m². Typically, point densities higher than this do not improve the accuracy of the segmentation, while substantially increasing the computation time: The processing time increases roughly quadratically with the density of the input point cloud. For high-density point clouds, we recommend thinning the point cloud before segmentation, or using other algorithms.
The code base ofsegment_tree_crowns() is split into an R
“front-end” and a C++ “back-end”. The other three functions are less
complex, and implemented in one R file each.
The function segment_tree_crowns is an S4 generic ,
i.e. it can handle point cloud data stored in different data types and
behave differently according to that type. More specifically, the
generic function chooses one out of several so called “methods” based on
the input data type. There are methods for
data.frames/data.tables,lidR::LAS
objects, andlidR::LAScatalogs.While the other functions (li_diameter_raster(),
watershed_diameter_raster() and
remove_small_trees()) are also implemented as S4 generics,
they can only take point clouds of type lidR::LAS.
The data type specific methods deal with specifics of their data
type. The actual segmentation is done by the internal function segment_tree_crowns_core,
which is used by the data.frame/data.table and
lidR::LAS methods. The lidR::LAScatalog method
internally calls the lidR::LAS method.
This function performs the segmentation by first calling the C++
back-end to calculate centroids and by then clustering the terminal
centroids with the DBSCAN algorithm (as implemented in the dbscan::dbscan
function). It takes point cloud data in the tabular form of a
data.frame or data.table and returns a list
with at most three elements. The first element always contains a vector
of crown IDs with one ID for each point (i.e. row) in the input data.
The second and third elements are optional and contain coordinates of
terminal and prior centroids together with crown IDs and (row) indices
of the points they belong to.
This method just calls the core function and binds the returned crown IDs to the input table. It returns this extended table and, if requested, also the terminal centroids or all centroids returned by the core function.
Similar to the data.frame/data.table method
in that it extends the input object with a crown ID attribute and, if
requested, returns the centroids as separate lidR::LAS
objects. The centroid objects are assigned the metadata of the input
object.
For context: The lidR R
package offers a framework for processing point clouds of large
areas, possibly stored in multiple files and referenced by so called LAScatalogs.
LAScatalogs organize point clouds in adjacent chunks which
are processed individually. The chunks each get a buffer area around
them so that edge effects can be accounted for. In the case of
individual tree segmentation, edge effects would be that tree crowns are
cut off at the edge of a chunk when not using a buffer. Parallel
processing of the chunks is also supported.
The segment_tree_crowns method for
lidR::LAScatalogs internally defines a function which
segments a chunk of a LAScatalog. This function is then
applied to all chunks of the LAScatalog provided by the
user. The “chunk function” first passes the chunk to the
lidR::LAS method. Afterwards, it excludes both tree crowns
and unsegmented points in the buffer area. Since crown IDs overlap
across chunks (the IDs in each chunk start at 1), the chunk function
also calculates unique replacements for the crown IDs based on the
apices’ absolute coordinates.
extract_coordinate_values
and collect_scale_n_offset_of_LAScatalog_filesThe back-end is a small C++ library which implements the AMS3D algorithm. The core functionality can be found in:
namespace ams3d: Functionality for calculating a
point’s terminal centroid with the AMS3D algorithm (header), andnamespace spatial: a facade to the Boost
Geometry library which provides e.g. the spatial index used for
finding points inside cylinders (header).There is also some R interface code outside of any namespace called
ams3d_R_interface (header).Note: The namespaces ams3d and
spatial contain internal functions, classes, etc. which
should not be used in other namespaces. These internal components are
indicated by an underscore at the beginning of their name.
This code contains the C++ functions that are callable from R. They
are not contained in any namespace, since this is a requirement of the
Rcpp
package which does the actual exposition to R. The functions are
calculate_centroids_normalized (source) for processing
normalized point clouds,calculate_centroids_terraneous (source) for processing
non-normalized point clouds, andcalculate_centroids_flexible (source) for processing
both normalized and non-normalized while possibly also using rasters for
the crown diameter and crown height to tree height parameters.They are all doing basically the same thing, which is looping over
points they get from R and call the functionality exposed by
ams3d and spatial to calculate terminal
centroids for these points.
In addition, there are also some helper functions in the
namespace ams3d_R_interface_util (source).
The ams3d_R_interface is the only part of the C++ code
which calls R-specific functions (a.o. it manages a progress bar
provided by the R package progress).
By separating the core functionality from the R-specific C++ code it is
possible to use the core functionality with other C++ code when not
using R.
This namespace only exposes two functions (header):
calculate_terminal_centroidcalculate_all_centroidsThey do exactly the same thing, i.e. calculate the centroids of a
point, except that the *_plus_centroids variant also
returns the prior centroids which were calculated during the process.
Both functions are overloaded three times:
crown_diameter_to_tree_height and
crown_height_to_tree_height arguments (source).There is also an internal _Kernel class (source) that models the cylinder used to
find points in the neighborhood of a point or centroid. It also contains
the logic to calculate the weighted centroid of all points inside the
cylinder. A _Kernel object is instantiated with a point or
centroid and the crown-diameter-to-tree-height and
crown-height-to-tree-height parameters. It features only one public
method: calculate_centroid_in( point_cloud ). Cylinder
dimensions are calculated in the constructor and the centroid
calculation logic is implemented in a few private methods.
There is one more very small internal namespace in ams3d
called _math_functions that contains the gaussian and
epanechnikov functions used for weighing the points inside a cylinder
during centroid calculation.
This namespace mainly exposes functionality of and based on the C++
library Boost Geometry
for dealing with point data. Most of this functionality consists of data
types (header) but there are
also some functions and one functor (header and source). Additionally, there is some
raster functionality (header) and a few custom
iterators (header)
for inserting points into the spatial index provided by the Boost
Geometry library (an R*-tree).
The data types are:
coordinate_t for coordinate values,distance_t for distances,point_2d_t and point_3d_t for 2D and 3D
points,index_for_3d_points_t for R*-tree index structures,
andbox_t for 3D boxes.A few simple functions of boost::geometry are directly
forwarded:
distance( geometry1, geometry2 ) returns the distance
between two geometric objects,get_x( point ), get_y( point ), and
get_z( point ) return the respective coordinate value of a
point, andget_xy_point( point ) returns a point’s xy-coordinates
as a 2D point.Exposed functions with own logic are
get_points_intersecting_vertical_cylinder( <cylinder dimensions and an index structure> )
which searches an R*-tree index andweighted_mean_of( points, weights ) which calculates a
weighted average position of a collection of points.In addition there is an internal functor class called
_within_xy_distance_functor whose objects are needed for
queries to the R*-tree index. Functors are function objects,
i.e. objects with a ()-operator, making it possible to use
such an object like a function.
For handling raster data in the C++ back-end, a small set of three raster classes was set up (header):
I_Raster an interface class which defines methods that
every raster object should have.Raster a normal raster class which implements the
methods demanded by I_Raster.Single_value_pseudo_raster also implements the methods
demanded by I_Raster but internally stores just one value.
It can return this value faster than a Raster object could
if it was used for storing just one value.The raster classes were set up like this to make it possible to pass
either a single value or an actual raster to the same function argument.
Without this it would be necessary to code every combination of function
parameters where the ground height and the
crown_diameter_to_tree_height and
crown_height_to_tree_height parameters can be either a
single value or a raster of values. With the I_Raster class
it is possible to pass either Single_value_pseudo_raster or
Raster objects to the same function parameter of type
I_Raster.
The spatial index provided by the Boost Geometry library can be
constructed with a packing algorithm that results in faster queries. In
order to use this feature, the index constructor needs to be passed all
of the points it should use in the form of a begin and end iterator. In
order to skip points with non-finite coordinate values and points below
certain heights, custom iterators for
std::vector< point_3d_t > (header) are used by a
few public functions to set up spatial indices:
create_index_of_finite( points, min_height ) skips
points with non-finite coordinate values and points below
min_heightcreate_index_of_above_ground( points, min_height_above_ground, ground_height_grid )
same as above but does not assume a normalized point cloud and instead
reads ground heights from a rastercreate_index_of_above_ground( points, min_height_above_ground_grid, ground_height_grid )
same as above but reads minimum above-ground heights also from a
rasterThere may be a few syntax constructs in the C++ code which appear unfamiliar to R users. This section gives the rationale for some of these constructs. Most of it is based on information found at the website learncpp.com.
{}There are a few different ways to create and assign a value to objects in C++:
int foo = 0; // copy initializationint foo(0); // direct initializationint foo{0}; // list initializationstd::vector<int> foos{1, 2, 3}; // list initializationAccording to learncpp.com,
list initialization is the preferred option. However, as you can see in
the examples, it works a little bit differently for array-like objects
like e.g. std::vectors or Rcpp::Lists. So if
you want to create a vector with a certain size instead of with some
elements, you need to use direct initialization instead,
e.g. Rcpp::NumericVector foos( <size of the vector> );
for Rcpp vectors.
Instances (i.e. objects) of a class are initialized by constructors. Constructors are basically functions without a return value which internally assign values to the properties of an object. According to learncpp.com, the most direct way to do these assignments is to use member initializer lists. The syntax of these initializer lists looks like this:
some_class (
<arguments to constructor>
):
property_1{ <initial value> },
property_2{ <initial value> } // ...and so on
{
<the actual body of the constructor with possibly more code>
}
# outer loop over the points in the point cloud
for each point in the point cloud:
find_mode_of( point )
# inner loop to find the mode of a single point
find_mode_of( point ):
# this assignment is needed in the loop below
current_centroid = point
# "move" towards the nearest mode by calculating a new centroid at the
# location of the previous centroid and repeating this until the
# centroids converge
do:
former_centroid = current_centroid
current_centroid = calculate_weighted_mean_of (
points_in_neighborhood_of( former_centroid )
)
while( distance_of( former_centroid, current_centroid ) > very_small
AND number_of_iterations < too_many )
# the terminal centroid is returned as approximation to the mode
return( current_centroid )
# the neighborhood of a point is defined by a crown-sized cylinder
points_in_neighborhood_of( point ):
return( points_in( vertical_cylinder_at( point ) ) )
# the cylinder's size is calculated using its above-ground height and the
# four main parameters to the algorithm
vertical_cylinder_at( point ):
cylinder = new cylinder (
height = above_ground_height_of( point ) * cl_2_th + cl_c
diameter = above_ground_height_of( point ) * cd_2_th + cd_c
)
return( upper_three_quarters_of( cylinder ) )
# The numbers "cl_2_th", "cd_2_th", "cl_c" and "cd_c" are parameters to the
# algorithm and stand for "crown length to tree height",
# "crown diameter to tree height", "crown length constant" and
# "crown diameter constant".
points_in( cylinder ):
<use a spatial index for finding the points in cylinder>
calculate_weithed_mean_of( points ):
return( weighted_average_position_of (
points,
weights_of( points )
) )
weights_of( points ):
return( for each point in points:
exp( -5 * relative_horizontal_distance_of_cylinder_center_to( point )^2 )
* ( 1 - relative_vertical_distance_of_cylinder_center_to( point )^2 )
)
# horizontal: gaussian profile
# vertical : epanechnikov profile
Ferraz, A.; Saatchi, S.; Mallet, C. & Meyer, V., (2016) “Lidar detection of individual tree size in tropical forests”, In: Remote Sensing of Environment, 183, 318-333, DOI: 10.1016/j.rse.2016.05.028 Ferraz, A.; Bretar, F.; Jaquemond, S.; Gonçalves, G.; Pereira, L.;, Tomé, M. & Soares, P., “3-D mapping of a multi-layered Mediterranean forest using ALS data”, In: Remote Sensing of Environment, 121, 210-223, DOI: 10.1016/j.rse.2012.01.020