Implement @turf/clusters-dbscan module #812

Merged
merged 33 commits into from
Jul 14, 2017
Changes from 21 commits
Commits
d0d8970
Implement `@turf/clusters-distance` module
DenisCarriere Jun 21, 2017
24cf66d
Update yarn lock
DenisCarriere Jun 21, 2017
8669d30
Update debug file
DenisCarriere Jun 21, 2017
e5fb827
simplified calculation (run almost x2 faster);
stebogit Jul 7, 2017
17ca549
Convert index.js to ES5
DenisCarriere Jul 12, 2017
ed0e48a
Publish new clusters-distance approach
DenisCarriere Jul 12, 2017
88d3953
Add minPoints to tests param
DenisCarriere Jul 12, 2017
cf53d66
Update Typescript tests
DenisCarriere Jul 12, 2017
4636f3a
Merge branch 'master' into clusters-distance
DenisCarriere Jul 12, 2017
0893af2
added units parameter; added parameters validation and throw tests
stebogit Jul 13, 2017
359815a
Suggested DBSCAN implementation for `@turf/clusters-distance` (#840)
stebogit Jul 14, 2017
a14fa69
Update Typescript Defintion (3 outputs)
DenisCarriere Jul 14, 2017
0a60b90
Merge branch 'master' into clusters-distance
DenisCarriere Jul 14, 2017
144ef79
Merge branch 'clusters-distance' of https://github.com/Turfjs/turf in…
stebogit Jul 14, 2017
ef76c37
Add geokdbush as reference to repo
DenisCarriere Jul 14, 2017
7bee88f
Single line JSDocs param
DenisCarriere Jul 14, 2017
3eb3d66
Fix tests (results.points)
DenisCarriere Jul 14, 2017
07dea46
Make both index + index.geokdbush work
DenisCarriere Jul 14, 2017
62e4e91
Place Geokdbush to DevDependencies
DenisCarriere Jul 14, 2017
63275fc
Fix noise issue
DenisCarriere Jul 14, 2017
82485d8
Prevent input mutation & add edges
DenisCarriere Jul 14, 2017
8a45889
Major changes
DenisCarriere Jul 14, 2017
c2be7be
Create a set of clusters to colorize
DenisCarriere Jul 14, 2017
01e92de
Define edges with cross
DenisCarriere Jul 14, 2017
01bc50b
Add CentroidFromProperty to tests
DenisCarriere Jul 14, 2017
d3a3166
Updates based on @stebogit comments
DenisCarriere Jul 14, 2017
28436e7
Update Readme
DenisCarriere Jul 14, 2017
d926702
Update benchmark results & drop geokdbush
DenisCarriere Jul 14, 2017
7fe4acf
Add noisePoint.properties fallback incase no props
DenisCarriere Jul 14, 2017
67e9071
Added Array of Features handling
DenisCarriere Jul 14, 2017
e7fb4a8
Update library to clusters-dbscan
DenisCarriere Jul 14, 2017
9bdd634
Rename folder to clusters-dbscan
DenisCarriere Jul 14, 2017
af57608
Update readme to clusters-dbscan
DenisCarriere Jul 14, 2017
20 changes: 20 additions & 0 deletions packages/turf-clusters-distance/LICENSE
@@ -0,0 +1,20 @@
The MIT License (MIT)

Copyright (c) 2017 TurfJS

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
51 changes: 51 additions & 0 deletions packages/turf-clusters-distance/README.md
@@ -0,0 +1,51 @@
# @turf/clusters-distance

# clustersDistance

Takes a set of [points](http://geojson.org/geojson-spec.html#point) and partitions them into clusters.

**Parameters**

- `points` **[FeatureCollection](http://geojson.org/geojson-spec.html#feature-collection-objects)<[Point](http://geojson.org/geojson-spec.html#point)>** to be clustered
- `maxDistance` **[number](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number)** Maximum Distance to generate the clusters (kilometers only)

**Examples**

```javascript
// create 100 random points inside a bounding box
var points = turf.random('point', 100, {
bbox: [0, 30, 20, 50]
});
var distance = 100;
var clustered = turf.clustersDistance(points, distance);

//addToMap
var addToMap = [clustered.points];
```

Returns **[FeatureCollection](http://geojson.org/geojson-spec.html#feature-collection-objects)<[Point](http://geojson.org/geojson-spec.html#point)>** clustered points
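
To make the idea of distance-based clustering concrete, here is a minimal, dependency-free sketch of a DBSCAN-style pass over raw `[lng, lat]` positions. It is not this module's implementation: the helper names (`haversineKm`, `neighbors`, `dbscanSketch`), the 6371 km Earth radius, and the `-1` noise label are all assumptions made for illustration, and the neighbor search is a brute-force O(n²) scan rather than a spatial index.

```javascript
// Hypothetical helpers for illustration only; none of these are part of the Turf API.
var EARTH_RADIUS_KM = 6371; // mean Earth radius, an assumption of this sketch

// Great-circle distance between two [lng, lat] positions, in kilometers.
function haversineKm(a, b) {
    var toRad = Math.PI / 180;
    var dLat = (b[1] - a[1]) * toRad;
    var dLng = (b[0] - a[0]) * toRad;
    var h = Math.pow(Math.sin(dLat / 2), 2) +
        Math.cos(a[1] * toRad) * Math.cos(b[1] * toRad) * Math.pow(Math.sin(dLng / 2), 2);
    return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Indexes of all positions within maxDistanceKm of coords[i] (including i itself).
function neighbors(coords, i, maxDistanceKm) {
    var found = [];
    for (var j = 0; j < coords.length; j++) {
        if (haversineKm(coords[i], coords[j]) <= maxDistanceKm) found.push(j);
    }
    return found;
}

// Returns one label per position: a cluster number (0, 1, ...) or -1 for noise.
function dbscanSketch(coords, maxDistanceKm, minPoints) {
    var UNVISITED = -2;
    var NOISE = -1;
    var labels = coords.map(function () { return UNVISITED; });
    var clusterId = 0;
    for (var i = 0; i < coords.length; i++) {
        if (labels[i] !== UNVISITED) continue;
        var seeds = neighbors(coords, i, maxDistanceKm);
        if (seeds.length < minPoints) { labels[i] = NOISE; continue; }
        labels[i] = clusterId;
        // Expand the cluster through every point reachable from a dense point.
        for (var s = 0; s < seeds.length; s++) {
            var q = seeds[s];
            if (labels[q] === NOISE) labels[q] = clusterId; // border point joins the cluster
            if (labels[q] !== UNVISITED) continue;
            labels[q] = clusterId;
            var qNeighbors = neighbors(coords, q, maxDistanceKm);
            if (qNeighbors.length >= minPoints) seeds = seeds.concat(qNeighbors);
        }
        clusterId++;
    }
    return labels;
}

// Two tight groups and one isolated position, 20 km radius, at least 2 points per cluster.
var sampleCoords = [[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [50, 50]];
var labels = dbscanSketch(sampleCoords, 20, 2);
// labels -> [0, 0, 0, 1, 1, -1]
```

In this sketch a point counts itself as a neighbor, so `minPoints = 1` would never produce noise; whether the module counts the same way is not stated here.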

<!-- This file is automatically generated. Please don't edit it directly:
if you find an error, edit the source file (likely index.js), and re-run
./scripts/generate-readmes in the turf project. -->

---

This module is part of the [Turfjs project](http://turfjs.org/), an open source
module collection dedicated to geographic algorithms. It is maintained in the
[Turfjs/turf](https://github.com/Turfjs/turf) repository, where you can create
PRs and issues.

### Installation

Install this module individually:

```sh
$ npm install @turf/clusters-distance
```

Or install the Turf module that includes it as a function:

```sh
$ npm install @turf/turf
```
60 changes: 60 additions & 0 deletions packages/turf-clusters-distance/bench.js
@@ -0,0 +1,60 @@
const fs = require('fs');
const path = require('path');
const load = require('load-json-file');
const Benchmark = require('benchmark');
const clusters = require('./');

// Define Fixtures
const directory = path.join(__dirname, 'test', 'in') + path.sep;
const fixtures = fs.readdirSync(directory).map(filename => {
return {
filename,
name: path.parse(filename).name,
geojson: load.sync(directory + filename)
};
});


/**
* Benchmark Results
*
* // Clusters distance (dbscan)
* fiji: 1.875ms
* many-points: 57.541ms
* noise: 0.937ms
* points-with-properties: 0.087ms
* points1: 0.495ms
* points2: 0.380ms
* fiji x 104,254 ops/sec ±2.31% (77 runs sampled)
* many-points x 20.61 ops/sec ±6.17% (39 runs sampled)
* noise x 7,929 ops/sec ±1.86% (80 runs sampled)
* points-with-properties x 97,864 ops/sec ±1.68% (81 runs sampled)
* points1 x 9,350 ops/sec ±1.71% (78 runs sampled)
* points2 x 4,396 ops/sec ±1.94% (80 runs sampled)
*
* // Clusters kmeans
* fiji: 3.236ms
* many-points: 32.563ms
* points-with-properties: 0.123ms
* points1: 0.569ms
* points2: 0.119ms
* fiji x 112,975 ops/sec ±7.64% (70 runs sampled)
* many-points x 129 ops/sec ±20.10% (62 runs sampled)
* points-with-properties x 151,784 ops/sec ±4.47% (80 runs sampled)
* points1 x 44,736 ops/sec ±5.12% (77 runs sampled)
* points2 x 26,771 ops/sec ±4.22% (83 runs sampled)
*/
const suite = new Benchmark.Suite('turf-clusters');
for (const {name, geojson} of fixtures) {
let {distance} = geojson.properties || {};
distance = distance || 100;

console.time(name);
clusters(geojson, distance);
console.timeEnd(name);
suite.add(name, () => clusters(geojson, distance));
}
suite
.on('cycle', e => console.log(String(e.target)))
.on('complete', () => {})
.run();
17 changes: 17 additions & 0 deletions packages/turf-clusters-distance/index.d.ts
@@ -0,0 +1,17 @@
/// <reference types="geojson" />

import {Units, Points} from '@turf/helpers';

interface Output {
edges: Points;
points: Points;
noise: Points;
centroids: Points;
}

/**
* http://turfjs.org/docs/#clusterdistance
*/
declare function clustersDistance(points: Points, maxDistance: number, units?: Units, minPoints?: number): Output;
declare namespace clustersDistance { }
export = clustersDistance;
236 changes: 236 additions & 0 deletions packages/turf-clusters-distance/index.geokdbush.js
@@ -0,0 +1,236 @@
var Set = require('es6-set');
var Map = require('es6-map');
var kdbush = require('kdbush');
var geokdbush = require('geokdbush');
var collectionOf = require('@turf/invariant').collectionOf;
var helpers = require('@turf/helpers');
var featureCollection = helpers.featureCollection;
var convertDistance = helpers.convertDistance;

/**
* Takes a set of {@link Point|points} and partitions them into clusters.
*
* @name clustersDistance
* @param {FeatureCollection<Point>} points to be clustered
* @param {number} maxDistance Maximum distance between neighbouring points of a cluster, expressed in `units`
* @param {string} [units=kilometers] in which `maxDistance` is expressed, can be degrees, radians, miles, or kilometers
* @param {number} [minPoints=1] Minimum number of points required to form a cluster; clusters with fewer points are excluded from the output.
* @returns {Object} an object with `points` (the clustered points, each tagged with a `cluster` property), `centroids` and `noise` FeatureCollections (the last two are always empty in this implementation)
* @example
* // create 100 random points inside a bounding box
* var points = turf.random('point', 100, {
* bbox: [0, 30, 20, 50]
* });
* var distance = 100;
* var clustered = turf.clustersDistance(points, distance);
*
* //addToMap
* var addToMap = [clustered.points];
*/
module.exports = function (points, maxDistance, units, minPoints) {
// Input validation
collectionOf(points, 'Point', 'Input must contain Points');
if (maxDistance === null || maxDistance === undefined) throw new Error('maxDistance is required');
if (!(Math.sign(maxDistance) > 0)) throw new Error('Invalid maxDistance');
if (!(minPoints === undefined || minPoints === null || Math.sign(minPoints) > 0)) throw new Error('Invalid minPoints');

// Default values
minPoints = minPoints || 1;
var maxDistanceKm = convertDistance(maxDistance, units);

// Generate IDs - 11,558,342 ops/sec
points = generateUniqueIds(points);

// Create KDBush Tree - 3,305,155 ops/sec
var tree = kdbush(points.features, getX, getY);

// Create Clusters - 13,041 ops/sec
var clusters = createClusters(tree, maxDistanceKm);

// Join Clusters - 4,435 ops/sec
var joined = joinClusters(clusters);

// Remove Clusters based on minPoints
var removed = removeClusters(joined, minPoints);

// Clusters To Features
var features = clustersToFeatures(removed, points);

return {
points: features,
centroids: featureCollection([]),
noise: featureCollection([])
};
};

function getX(p) {
return p.geometry.coordinates[0];
}

function getY(p) {
return p.geometry.coordinates[1];
}

/**
* Create Clusters - Set of indexes
*
* @param {KDBush} tree KDBush Tree
* @param {number} maxDistance Maximum Distance (in kilometers)
* @returns {Map<number, Set<number>>} Map<clusterId, cluster> where each cluster is the Set of Feature ids found within maxDistance of one feature
* @example
* createClusters(tree, maxDistance)
* //= Map {
* 0 => Set { 0, 2, 1, 5, 4, 3 },
* 1 => Set { 1, 2, 0, 5, 4, 3 },
* ...
* 25 => Set { 25 },
* 26 => Set { 26, 23, 21, 24, 22, 11, 8, 7, 10, 6, 9, 13 }
* }
*/
function createClusters(tree, maxDistance) {
var clusters = new Map();
var clusterId = 0;
tree.ids.forEach(function (id) {
// Cluster contains a Set of Feature IDs
var cluster = new Set();
var feature = tree.points[id];

// Find points around Max Distance
var around = geokdbush.around(tree, getX(feature), getY(feature), Infinity, maxDistance);
around.forEach(function (feature) {
cluster.add(feature.id);
});
clusters.set(clusterId, cluster);
clusterId++;
});
return clusters;
}

/**
* Joins clusters together
*
* @param {Map<number, Set<number>>} clusters Created Clusters
* @returns {Map<number, Set<number>>} Map<clusterId, cluster> joined clusters
* @example
* joinClusters(clusters)
* //= Map {
* 0 => Set { 0, 2, 1, 5, 4, 3 },
* 1 => Set { 6, 10, 13, 8, 12, 9, 11, 7, 22, 24, 21, 23, 26 },
* 2 => Set { 14, 20, 18, 19, 16, 15, 17 },
* 3 => Set { 25 }
* }
*/
function joinClusters(clusters) {
var totalClusters = clusters.size;
var newClusterId = 0;
var newClusters = new Map();

// Iterate over cluster and join clusters together
clusters.forEach(function (clusterOuter, clusterOuterId) {
clusters.forEach(function (clusterInner, clusterInnerId) {
if (!clusters.has(clusterOuterId) || !clusters.has(clusterInnerId)) return;
if (clusterOuterId === clusterInnerId) return;
if (setContains(clusterOuter, clusterInner)) {
newClusters.set(newClusterId, setJoin(clusterOuter, clusterInner));
clusters.delete(clusterOuterId);
clusters.delete(clusterInnerId);
newClusterId++;
}
});
});
// Add remaining clusters which did not need to be merged
clusters.forEach(function (cluster) {
newClusters.set(newClusterId, cluster);
newClusterId++;
});

// Restart Join operation if cluster size changes
// Happens when multiple small clusters are joined by narrow edges
if (newClusters.size < totalClusters) return joinClusters(newClusters);
else return newClusters;
}

/**
* Set Contains
*
* @param {Set<number>} set1 Set
* @param {Set<number>} set2 Set
* @returns {boolean} true if set1 and set2 share at least one value
*/
function setContains(set1, set2) {
var boolean = false;
set1.forEach(function (value) {
if (set2.has(value)) boolean = true;
});
return boolean;
}

/**
* Set Join
*
* @param {Set<number>} set1 Set
* @param {Set<number>} set2 Set
* @returns {Set<number>} Joins two Sets together
*/
function setJoin(set1, set2) {
var join = new Set();
set1.forEach(function (value) {
join.add(value);
});
set2.forEach(function (value) {
join.add(value);
});
return join;
}

/**
* Generates new Unique IDs for all features inside FeatureCollection
* 2,790,204 ops/sec ±1.40% (89 runs sampled)
*
* @param {FeatureCollection<any>} geojson GeoJSON FeatureCollection
* @returns {FeatureCollection<any>} mutated GeoJSON FeatureCollection
*/
function generateUniqueIds(geojson) {
for (var i = 0; i < geojson.features.length; i++) {
geojson.features[i].id = i;
}
return geojson;
}

/**
* Remove Clusters based on Minimum Points allowed
*
* @param {Map<number, Set<number>>} clusters Clusters
* @param {number} minPoints Minimum Points
* @returns {Map<number, Set<number>>} removed clusters
*/
function removeClusters(clusters, minPoints) {
var clusterId = 0;
var newClusters = new Map();
clusters.forEach(function (cluster) {
if (cluster.size >= minPoints) {
newClusters.set(clusterId, cluster);
clusterId++;
}
});
return newClusters;
}

/**
* Clusters to Features
*
* @param {Map<number, Set<number>>} clusters Clusters
* @param {FeatureCollection<Point>} points Points
* @returns {GeoJSON.FeatureCollection<GeoJSON.Point>} FeatureCollection of Points with 'cluster' added to properties
*/
function clustersToFeatures(clusters, points) {
var features = [];
clusters.forEach(function (cluster, clusterId) {
cluster.forEach(function (id) {
var feature = points.features[id];
if (feature.properties) feature.properties.cluster = clusterId;
else feature.properties = {cluster: clusterId};
features.push(feature);
});
});
return featureCollection(features);
}
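
The `joinClusters` step above merges overlapping neighbourhoods by repeatedly scanning every pair of clusters and recursing until the cluster count stops shrinking. One alternative worth considering is a union-find (disjoint-set) structure, which produces the same kind of grouping in a single pass. The sketch below is not code from this pull request: `joinWithUnionFind` is a hypothetical name, and it takes plain arrays of feature ids instead of the `Map`/`Set` polyfills used in the file.

```javascript
// Hypothetical alternative to joinClusters(); not code from this pull request.
// neighborhoods: Array of Arrays of feature ids (one array per feature's neighbourhood)
// size: total number of features, with ids assumed to run from 0 to size - 1
function joinWithUnionFind(neighborhoods, size) {
    // parent[i] points towards the representative of i's group.
    var parent = [];
    for (var i = 0; i < size; i++) parent.push(i);

    function find(x) {
        while (parent[x] !== x) {
            parent[x] = parent[parent[x]]; // path halving keeps the trees shallow
            x = parent[x];
        }
        return x;
    }

    function union(a, b) {
        var rootA = find(a);
        var rootB = find(b);
        if (rootA !== rootB) parent[rootB] = rootA;
    }

    // Every id in a neighbourhood belongs to the same cluster as its first id.
    neighborhoods.forEach(function (ids) {
        for (var k = 1; k < ids.length; k++) union(ids[0], ids[k]);
    });

    // Group ids by representative, numbering clusters in order of first appearance.
    var clusterOfRoot = {};
    var clusters = [];
    for (var id = 0; id < size; id++) {
        var root = find(id);
        if (clusterOfRoot[root] === undefined) {
            clusterOfRoot[root] = clusters.length;
            clusters.push([]);
        }
        clusters[clusterOfRoot[root]].push(id);
    }
    return clusters;
}

// Neighbourhoods that chain 0-1-2 together, leave 3 alone, and pair 4 with 5.
var joined = joinWithUnionFind([[0, 1], [1, 2], [3], [4, 5], [5, 4]], 6);
// joined -> [[0, 1, 2], [3], [4, 5]]
```

The trade-off is that the result is an array of id arrays rather than a `Map` of `Set`s, so `removeClusters` and `clustersToFeatures` would need small adjustments to consume it.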