<html>
<head>
<title>
POINT_MERGE - Count or Index Unique or Tolerably Unique Points
</title>
</head>
<body bgcolor="#EEEEEE" link="#CC0000" alink="#FF3300" vlink="#000055">
<h1 align = "center">
POINT_MERGE <br> Count or Index Unique or Tolerably Unique Points
</h1>
<hr>
<p>
<b>POINT_MERGE</b>
is a FORTRAN90 library which
deals with the problem of counting or indexing the unique or
"tolerably unique" points in a collection of N points in
M-dimensional space.
</p>
<p>
This problem is distinct from, though similar to, problems such
as finding the nearest neighbor, or counting all the points that
lie within a given distance of each point, or finding the optimal
assignment of N points into K clusters (the K-Means problem).
</p>
<p>
The "tolerably unique" problem is the "Starbucks problem", that is,
the task of choosing a list of Starbucks cafes to shut down, so that
there is no Starbucks cafe across the street from another one.
The Starbucks cafes that remain open are "tolerably unique", that
is, there is now no other open cafe within the given tolerance.
</p>
<p>
Given sets of data with some points very close to each other,
there are a number of ways of resolving the data. Here, a simpleminded
approach is taken, in which we start with one tolerably unique point,
and consider the remaining points one at a time, accepting the next
point as long as it is not closer than the tolerance to some already
accepted point.
</p>
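<p>
The acceptance rule above can be sketched in a few lines of Python. This is
an illustrative sketch only, not code from the library: the name
<b>tol_unique_index</b>, the sample coordinates, and the choice to reject a
point whose distance is exactly equal to the tolerance are all assumptions.
</p>

```python
import math


def tol_unique_index(points, tol):
    """Greedy selection (hypothetical helper): return indices of accepted points."""
    accepted = []
    for i, p in enumerate(points):
        # Accept p only if every already accepted point is farther than tol.
        # Assumption: a distance exactly equal to tol counts as "too close".
        if all(math.dist(p, points[k]) > tol for k in accepted):
            accepted.append(i)
    return accepted


# Made-up cafe locations: two close pairs and one isolated point.
cafes = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.0, 1.04), (3.0, 0.0)]
kept = tol_unique_index(cafes, tol=0.1)
print(kept)
```

<p>
Because points are examined one at a time, the set that is kept can depend
on the order in which the points are listed.
</p>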
<p>
This approach is simpler than, for instance, trying to maximize the number
of points retained while satisfying the tolerance, or replacing two nearby
points by their average.
</p>
<p>
For the unique case, in 1D, a simple and efficient procedure sorts
the data, and then compares consecutive entries.
For the unique case in multiple dimensions, the sorting procedure
can still be used.
</p>
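<p>
As a rough illustration of the sort-and-compare idea for exactly unique
points, here is a short Python sketch. It is not the library's routine; the
name <b>unique_count_sorted</b> and the sample data are assumptions, and
Python's built-in lexicographic ordering of tuples stands in for the sort.
</p>

```python
def unique_count_sorted(points):
    """Count distinct points by sorting, then comparing consecutive entries."""
    # Tuples sort lexicographically, so identical points end up adjacent.
    ordered = sorted(tuple(p) for p in points)
    count = 0
    previous = None
    for p in ordered:
        if p != previous:
            count += 1
            previous = p
    return count


# Works the same way in 1D and in several dimensions (made-up data).
n_1d = unique_count_sorted([(2.0,), (1.0,), (2.0,)])
n_2d = unique_count_sorted([(1.0, 2.0), (0.0, 5.0), (1.0, 2.0), (3.0, 3.0)])
print(n_1d, n_2d)
```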
<p>
For the "tolerably unique" case in 1D, the same sorting procedure
can be used, but in multiple dimensions, the usual kinds of lexicographic
sorting will interleave near and far points in a way that is
hard to deal with.
</p>
<p>
A reliable method for the tolerably unique case in multiple dimensions
is simply to compute the distance between every pair of points.
However, this is an O(N^2) computation, and becomes impractical
when the number of points is in the tens of thousands or more.
</p>
<p>
The "radial" approach, implemented in <b>POINT_RADIAL_TOL_UNIQUE_COUNT</b>,
picks a random base point Z, computes the radial distance R(I) of each point
P(I) to Z, and then sorts the data by R. It then counts tolerably unique
items by inspecting the R array in order. By the triangle inequality, two
points can lie within TOL of each other only if their R values differ by
at most TOL, so each point need only be compared with its near neighbors in
the sorted list. Assuming the points are in general position, the number of
such comparisons is small, and the cost is dominated by the O(N log N) sort
rather than by O(N^2) distance computations.
</p>
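<p>
A minimal Python sketch of the radial idea is given below. It is not a
transcription of <b>POINT_RADIAL_TOL_UNIQUE_COUNT</b>: the function name,
the way the base point is drawn (uniformly from the bounding box of the
data), the <b>seed</b> parameter, and the rule that a distance exactly equal
to the tolerance counts as too close are all assumptions made for
illustration.
</p>

```python
import math
import random


def radial_tol_unique_count(points, tol, seed=0):
    """Count tolerably unique points using a radial sort (illustrative sketch)."""
    n = len(points)
    if n == 0:
        return 0
    m = len(points[0])
    rng = random.Random(seed)
    # Assumption: draw the base point z uniformly from the data's bounding box.
    lo = [min(p[d] for p in points) for d in range(m)]
    hi = [max(p[d] for p in points) for d in range(m)]
    z = [rng.uniform(lo[d], hi[d]) for d in range(m)]
    # Radial distance of every point to z, and the order that sorts by it.
    r = [math.dist(p, z) for p in points]
    order = sorted(range(n), key=lambda i: r[i])
    unique = [True] * n
    for a in range(n):
        i = order[a]
        if not unique[i]:
            continue
        # Only points whose radius is within tol of r[i] can be within tol of i.
        for b in range(a + 1, n):
            j = order[b]
            if r[j] - r[i] > tol:
                break
            if unique[j] and math.dist(points[i], points[j]) <= tol:
                unique[j] = False
    return sum(unique)


# Made-up data: two close pairs and one isolated point.
pts = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.0, 1.04), (3.0, 0.0)]
n_unique = radial_tol_unique_count(pts, tol=0.1)
print(n_unique)
```

<p>
Since this sketch accepts points in radial order rather than input order,
its count can differ from that of an input-order scan when chains of nearby
points are present; on well separated clusters, as in the sample data, the
two agree.
</p>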
<p>
In MATLAB, the <b>unique</b> command can select the unique points;
there is also a user-written function called <b>consolidator</b>
that can merge points with a tolerance.
</p>
<h3 align = "center">
Licensing:
</h3>
<p>
The computer code and data files described and made available on this web page
are distributed under
<a href = "../../txt/gnu_lgpl.txt">the GNU LGPL license.</a>
</p>
<h3 align = "center">
Languages:
</h3>
<p>
<b>POINT_MERGE</b> is available in
<a href = "../../c_src/point_merge/point_merge.html">a C version</a> and
<a href = "../../cpp_src/point_merge/point_merge.html">a C++ version</a> and
<a href = "../../f77_src/point_merge/point_merge.html">a FORTRAN77 version</a> and
<a href = "../../f_src/point_merge/point_merge.html">a FORTRAN90 version</a> and
<a href = "../../m_src/point_merge/point_merge.html">a MATLAB version</a>.
</p>
<h3 align = "center">
Related Data and Programs:
</h3>
<p>
<a href = "../../cpp_src/ann/ann.html">
ANN</a>,
a C++ library which
computes Approximate Nearest Neighbors,
by David Mount, Sunil Arya;
</p>
<p>
<a href = "../../cpp_src/ann_test/ann_test.html">
ANN_TEST</a>,
a C++ program which
uses ann to approximate the nearest
neighbors of a set of points stored in a file;
</p>
<p>
<a href = "../../datasets/cities/cities.html">
CITIES</a>,
a dataset directory which
contains sets of information about cities and the distances between them;
</p>
<p>
<a href = "../../f_src/cities/cities.html">
CITIES</a>,
a FORTRAN90 library which
handles various problems associated with a set of "cities" on a map.
</p>
<p>
<a href = "../../f_src/kmeans/kmeans.html">
KMEANS</a>,
a FORTRAN90 library which
contains several different algorithms for the K-Means problem.
</p>
<p>
<a href = "../../f_src/spaeth/spaeth.html">
SPAETH</a>,
a FORTRAN90 library which
can cluster data according to various principles.
</p>
<p>
<a href = "../../f_src/spaeth2/spaeth2.html">
SPAETH2</a>,
a FORTRAN90 library which
can cluster data according to various principles.
</p>
<p>
<a href = "../../f_src/table_merge/table_merge.html">
TABLE_MERGE</a>,
a FORTRAN90 program which
reads a file of N points in M dimensions, removes duplicates or points
that are closer than some tolerance, and writes the reduced set of points
to a file.
</p>
<h3 align = "center">
Source Code:
</h3>
<p>
<ul>
<li>
<a href = "point_merge.f90">point_merge.f90</a>, the source code.
</li>
<li>
<a href = "point_merge.sh">point_merge.sh</a>,
commands to compile the source code.
</li>
</ul>
</p>
<h3 align = "center">
Examples and Tests:
</h3>
<p>
<ul>
<li>
<a href = "point_merge_prb.f90">point_merge_prb.f90</a>,
a sample calling program.
</li>
<li>
<a href = "point_merge_prb.sh">point_merge_prb.sh</a>,
commands to compile and run the sample program.
</li>
<li>
<a href = "point_merge_prb_output.txt">point_merge_prb_output.txt</a>,
the output file.
</li>
</ul>
</p>
<h3 align = "center">
List of Routines:
</h3>
<p>
<ul>
<li>
<b>I4_UNIFORM</b> returns a scaled pseudorandom I4.
</li>
<li>
<b>POINT_UNIQUE_COUNT</b> counts the number of unique points.
</li>
<li>
<b>POINT_RADIAL_UNIQUE_COUNT</b> counts the number of unique points.
</li>
<li>
<b>POINT_RADIAL_TOL_UNIQUE_COUNT</b> counts the number of tolerably unique points.
</li>
<li>
<b>POINT_RADIAL_TOL_UNIQUE_INDEX</b> indexes the tolerably unique points.
</li>
<li>
<b>POINT_TOL_UNIQUE_COUNT</b> counts the number of tolerably unique points.
</li>
<li>
<b>POINT_TOL_UNIQUE_INDEX</b> indexes the tolerably unique points.
</li>
<li>
<b>R8COL_DUPLICATES</b> generates an R8COL with some duplicate columns.
</li>
<li>
<b>R8COL_SORT_HEAP_INDEX_A</b> does an indexed heap ascending sort of an R8COL.
</li>
<li>
<b>R8MAT_TRANSPOSE_PRINT</b> prints an R8MAT, transposed.
</li>
<li>
<b>R8MAT_TRANSPOSE_PRINT_SOME</b> prints some of an R8MAT, transposed.
</li>
<li>
<b>R8MAT_UNIFORM_01</b> returns a unit pseudorandom R8MAT.
</li>
<li>
<b>R8VEC_COMPARE</b> compares two R8VECs.
</li>
<li>
<b>R8VEC_PRINT</b> prints an R8VEC.
</li>
<li>
<b>R8VEC_SORT_HEAP_INDEX_A</b> does an indexed heap ascending sort of an R8VEC.
</li>
<li>
<b>R8VEC_UNIFORM_01</b> returns a unit pseudorandom R8VEC.
</li>
<li>
<b>TIMESTAMP</b> prints the current YMDHMS date as a time stamp.
</li>
</ul>
</p>
<p>
You can go up one level to <a href = "../f_src.html">
the FORTRAN90 source codes</a>.
</p>
<hr>
<i>
Last revised on 23 July 2010.
</i>
<!-- John Burkardt -->
</body>
<!-- Initial HTML skeleton created by HTMLINDEX. -->
</html>