<html>
<head>
<title>
CVT_BASIS - Data Clustering by K-Means Techniques
</title>
</head>
<body bgcolor="#EEEEEE" link="#CC0000" alink="#FF3300" vlink="#000055">
<h1 align = "center">
CVT_BASIS <br> Data Clustering by K-Means Techniques
</h1>
<hr>
<p>
<b>CVT_BASIS</b>
is a FORTRAN90 program which
computes good cluster centers
for a set of data.
</p>
<p>
The clustering process uses the K-Means algorithm, which can be
regarded as a discrete version of the Centroidal Voronoi
Tessellation (CVT) algorithm.
</p>
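<p>
As an illustration only, a batch form of the K-Means iteration can be
sketched in Python as follows. The function names are hypothetical, and
this is not a transcription of the program's own HMEANS_RAW or KMEANS_RAW
routines: each pass assigns every point to its nearest center, then moves
every center to the centroid of its assigned points, which is the discrete
counterpart of one CVT iteration.
</p>

```python
def nearest_center(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    best_index = 0
    best_dist = None
    for j, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if best_dist is None or best_dist > dist:
            best_index, best_dist = j, dist
    return best_index


def kmeans_step(points, centers):
    """One pass: assign each point to its nearest center, then move each
    center to the centroid (mean) of the points assigned to it."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in points:
        j = nearest_center(point, centers)
        counts[j] += 1
        for i in range(dim):
            sums[j][i] += point[i]
    new_centers = []
    for j, center in enumerate(centers):
        if counts[j] == 0:
            new_centers.append(list(center))  # empty cluster: keep the old center
        else:
            new_centers.append([s / counts[j] for s in sums[j]])
    return new_centers


def kmeans(points, centers, it_max=50):
    """Repeat kmeans_step until the centers stop moving or it_max is reached."""
    centers = [list(c) for c in centers]
    for _ in range(it_max):
        new_centers = kmeans_step(points, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

<p>
For example, calling <i>kmeans</i> on the four corners of a 10 by 2
rectangle with one starting center near each short side moves the centers
to the midpoints of those two sides.
</p>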
<p>
The data is a collection of vectors, with each vector stored in
a separate file. The files are presumed to have "sequential" names,
such as "fred01.txt", "fred02.txt", and so on. Each file must be a
TABLE file, that is,
a series of N lines with M values on every line (comment lines
may be inserted as well).
</p>
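<p>
A minimal Python sketch of reading such a file is given below. It assumes
that comment lines in the input begin with a "#" character and that values
are separated by blanks; the function names <i>parse_table</i> and
<i>read_table</i> are hypothetical and are not part of the program.
</p>

```python
def parse_table(text):
    """Parse TABLE-style text into a list of N rows, each a list of M floats."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines (assumed to start with "#")
        rows.append([float(word) for word in stripped.split()])
    # Every data line must carry the same number of values, M.
    if any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("every data line must hold the same number of values")
    return rows


def read_table(path):
    """Read a TABLE file from disk and return its rows."""
    with open(path) as handle:
        return parse_table(handle.read())
```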
<p>
The program is given the name of the first file in the sequence.
It reads the data from each file in the sequence, and carries out
the K-Means clustering process to determine K cluster centers.
It writes each of these cluster centers out to a separate file.
</p>
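<p>
The walk through a numbered file sequence might be sketched in Python as
shown below. This is an assumption about the general idea, not a copy of
the program's FILE_NAME_INC routine: the decimal digits in the name are
treated as a single counter, incremented from the right with carry, and
the sequence ends at the first name for which no file exists. The names
<i>next_file_name</i> and <i>file_sequence</i> are hypothetical.
</p>

```python
import os


def next_file_name(name):
    """Increment the decimal digits in name, treating them as one counter."""
    chars = list(name)
    for i in range(len(chars) - 1, -1, -1):
        if chars[i] in "0123456789":
            if chars[i] == "9":
                chars[i] = "0"  # carry into the next digit to the left
            else:
                chars[i] = str(int(chars[i]) + 1)
                break
    return "".join(chars)


def file_sequence(first_name, exists=os.path.exists):
    """List consecutive file names, starting at first_name, that exist."""
    names = []
    name = first_name
    # Stop at the first missing file, or if the counter wraps around.
    while exists(name) and name not in names:
        names.append(name)
        name = next_file_name(name)
    return names
```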
<p>
The cluster centers will generally be "well spread out" in the space
spanned by the set of data. Such a set might be useful, for instance,
in determining a basis for a low-dimensional approximation of the
data.
</p>
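<p>
One way to use the cluster centers as a basis is to approximate a data
vector by its least-squares projection onto the span of the centers. The
Python sketch below illustrates this under the assumption that the centers
are linearly independent; it solves the small K by K normal equations
directly. The function names are hypothetical, and the program itself does
not perform this step.
</p>

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination
    with partial pivoting. a is a list of rows; b is a list."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def approximate(vector, centers):
    """Least-squares approximation of vector in the span of the centers.
    Returns (coefficients, approximation)."""
    gram = [[dot(ci, cj) for cj in centers] for ci in centers]
    rhs = [dot(ci, vector) for ci in centers]
    coeffs = solve(gram, rhs)
    approx = [sum(c * center[i] for c, center in zip(coeffs, centers))
              for i in range(len(vector))]
    return coeffs, approx
```

<p>
The difference between a vector and its approximation measures how much
of that vector the K-dimensional basis fails to capture.
</p>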
<p>
<b>INPUT</b>: at run time, the user specifies:
<ul>
<li>
<i>uv0_file</i>, the name of the first data file (the program
will assume all the files are numbered consecutively).
More than one family of solution files may be specified: after
each family, enter the name of the first file in the next family,
or "none" if there are no more families. Up to 10 separate
families of files are allowed.
</li>
<li>
<i>cluster_lo, cluster_hi</i>, the range of cluster counts (values of K) to check.
In most cases, you simply want to specify the <b>same number</b>
for both these values, namely, the requested basis size.
</li>
<li>
<i>cluster_it_max</i>, the number of different times you want to
try to cluster the data; I often use 15.
</li>
<li>
<i>energy_it_max</i>, the number of times you want to try to improve
a given clustering by swapping points from one cluster to another;
I often use 50 or 100.
</li>
<li>
<i>comment</i>, "Y" if initial comments may be included at the
beginning of the output files. These comments always start with
a "#" character in column 1.
</li>
</ul>
</p>
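<p>
How these parameters might interact can be sketched in Python as follows.
This is a simplified illustration, not the program's actual control flow.
It assumes that the energy of a clustering is the sum of squared distances
from each point to its cluster center, that each attempt starts from
randomly chosen data points, and that improvement is done by batch
reassignment rather than by swapping individual points. All function
names, and the <i>seed</i> argument, are hypothetical.
</p>

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def energy(points, centers, labels):
    """Assumed energy: sum of squared distances from points to their centers."""
    return sum(sq_dist(p, centers[j]) for p, j in zip(points, labels))


def improve(points, centers, energy_it_max):
    """Up to energy_it_max reassignment passes; stop early once labels settle."""
    labels = assign(points, centers)
    for _ in range(energy_it_max):
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


def scan_cluster_counts(points, cluster_lo, cluster_hi,
                        cluster_it_max=15, energy_it_max=50, seed=0):
    """For each cluster count k in [cluster_lo, cluster_hi], keep the lowest
    energy found over cluster_it_max random starts."""
    rng = random.Random(seed)
    best = {}
    for k in range(cluster_lo, cluster_hi + 1):
        for _ in range(cluster_it_max):
            centers = [list(p) for p in rng.sample(points, k)]
            centers, labels = improve(points, centers, energy_it_max)
            e = energy(points, centers, labels)
            if k not in best or best[k] > e:
                best[k] = e
    return best
```

<p>
Setting <i>cluster_lo</i> equal to <i>cluster_hi</i> gives a single
clustering of the requested size, while a wider range yields a table of
cluster count versus energy.
</p>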
<h3 align = "center">
Licensing:
</h3>
<p>
The computer code and data files described and made available on this web page
are distributed under
<a href = "../../txt/gnu_lgpl.txt">the GNU LGPL license.</a>
</p>
<h3 align = "center">
Related Data and Programs:
</h3>
<p>
<a href = "../../m_src/brain_sensor_pod/brain_sensor_pod.html">
BRAIN_SENSOR_POD</a>,
a MATLAB program which
applies the method of Proper Orthogonal Decomposition
to seek underlying patterns in sets of 40 sensor readings of
brain activity.
</p>
<p>
<a href = "../../datasets/burgers/burgers.html">
BURGERS</a>,
a dataset directory which
contains solutions of the 1-dimensional Burgers equation;
</p>
<p>
<a href = "../../datasets/cavity_flow/cavity_flow.html">
CAVITY_FLOW</a>,
a dataset directory which
contains solutions of a driven cavity flow in 2D;
</p>
<p>
<a href = "../../f_src/cvt_basis_flow/cvt_basis_flow.html">
CVT_BASIS_FLOW</a>,
a FORTRAN90 program which
is similar to <b>CVT_BASIS</b>, but is specialized to handle
a particular family of fluid flow solutions.
</p>
<p>
<a href = "../../datasets/inout_flow/inout_flow.html">
INOUT_FLOW</a>,
a dataset directory which
contains solutions for flow in and out of a chamber in 2D;
</p>
<p>
<a href = "../../datasets/inout_flow2/inout_flow2.html">
INOUT_FLOW2</a>,
a dataset directory which
contains solutions for flow in and out of a chamber in 2D,
using a finer grid and more timesteps;
</p>
<p>
<a href = "../../f_src/svd_basis/svd_basis.html">
SVD_BASIS</a>,
a FORTRAN90 program which
uses the singular value decomposition to extract representative
modes from a set of data vectors.
</p>
<p>
<a href = "../../datasets/tcell_flow/tcell_flow.html">
TCELL_FLOW</a>,
a dataset directory which
contains solutions for flow through a T-cell in 2D;
</p>
<h3 align = "center">
Reference:
</h3>
<p>
<ol>
<li>
Franz Aurenhammer,<br>
Voronoi diagrams -
a study of a fundamental geometric data structure,<br>
ACM Computing Surveys,<br>
Volume 23, Number 3, pages 345-405, September 1991.
</li>
<li>
John Burkardt, Max Gunzburger, Hyung-Chun Lee,<br>
Centroidal Voronoi Tessellation-Based Reduced-Order
Modelling of Complex Systems,<br>
SIAM Journal on Scientific Computing,<br>
Volume 28, Number 2, 2006, pages 459-484.
</li>
<li>
John Burkardt, Max Gunzburger, Janet Peterson, Rebecca Brannon,<br>
User Manual and Supporting Information for Library of Codes
for Centroidal Voronoi Placement and Associated Zeroth,
First, and Second Moment Determination,<br>
Sandia National Laboratories Technical Report SAND2002-0099,<br>
February 2002.
</li>
<li>
Qiang Du, Vance Faber, Max Gunzburger,<br>
Centroidal Voronoi Tessellations: Applications and Algorithms,<br>
SIAM Review, Volume 41, 1999, pages 637-676.
</li>
<li>
Lili Ju, Qiang Du, Max Gunzburger,<br>
Probabilistic methods for centroidal Voronoi tessellations
and their parallel implementations,<br>
Parallel Computing,<br>
Volume 28, 2002, pages 1477-1500.
</li>
<li>
Wendy Martinez, Angel Martinez,<br>
Computational Statistics Handbook with MATLAB,<br>
Chapman and Hall / CRC, 2002.
</li>
</ol>
</p>
<h3 align = "center">
Source Code:
</h3>
<p>
<ul>
<li>
<a href = "cvt_basis.f90">cvt_basis.f90</a>, the source code.
</li>
<li>
<a href = "cvt_basis.sh">cvt_basis.sh</a>,
commands to compile and load the source code.
</li>
</ul>
</p>
<h3 align = "center">
Examples and Tests:
</h3>
<p>
<ul>
<li>
<a href = "run_01/run_01.html">run 01</a>, example seeking 2 clusters;
</li>
<li>
<a href = "run_02/run_02.html">run 02</a>, example seeking 4 clusters;
</li>
<li>
<a href = "run_03/run_03.html">run 03</a>, example seeking 8 clusters;
</li>
<li>
<a href = "run_04/run_04.html">run 04</a>, compute clusterings
of sizes 1 through 16, determine energies, and output size
versus energy data;
</li>
</ul>
</p>
<h3 align = "center">
List of Routines:
</h3>
<p>
<ul>
<li>
<b>MAIN</b> is the main routine for the CVT_BASIS program.
</li>
<li>
<b>ANALYSIS_RAW</b> computes the energy for a range of cluster counts.
</li>
<li>
<b>CH_CAP</b> capitalizes a single character.
</li>
<li>
<b>CH_EQI</b> is a case insensitive comparison of two characters for equality.
</li>
<li>
<b>CH_IS_DIGIT</b> returns .TRUE. if a character is a decimal digit.
</li>
<li>
<b>CH_TO_DIGIT</b> returns the integer value of a base 10 digit.
</li>
<li>
<b>CLUSTER_CENSUS</b> computes and prints the population of each cluster.
</li>
<li>
<b>CLUSTER_INITIALIZE_RAW</b> initializes the cluster centers to random values.
</li>
<li>
<b>CLUSTER_LIST</b> prints out the assignments.
</li>
<li>
<b>DATA_TO_GNUPLOT</b> writes data to a file suitable for processing by GNUPLOT.
</li>
<li>
<b>DIGIT_INC</b> increments a decimal digit.
</li>
<li>
<b>DIGIT_TO_CH</b> returns the character representation of a decimal digit.
</li>
<li>
<b>ENERGY_RAW</b> computes the total energy of a given clustering.
</li>
<li>
<b>FILE_COLUMN_COUNT</b> counts the number of columns in the first line of a file.
</li>
<li>
<b>FILE_EXIST</b> reports whether a file exists.
</li>
<li>
<b>FILE_NAME_INC</b> generates the next filename in a series.
</li>
<li>
<b>FILE_ROW_COUNT</b> counts the number of row records in a file.
</li>
<li>
<b>GET_UNIT</b> returns a free FORTRAN unit number.
</li>
<li>
<b>HMEANS_RAW</b> seeks the minimal energy of a cluster of a given size.
</li>
<li>
<b>I4_INPUT</b> prints a prompt string and reads an integer from the user.
</li>
<li>
<b>I4_RANGE_INPUT</b> reads a pair of integers from the user, representing a range.
</li>
<li>
<b>I4_UNIFORM</b> returns a scaled pseudorandom I4.
</li>
<li>
<b>I4VEC_PRINT</b> prints an integer vector.
</li>
<li>
<b>KMEANS_RAW</b> tries to improve a partition of points.
</li>
<li>
<b>NEAREST_CLUSTER_RAW</b> finds the cluster nearest to a data point.
</li>
<li>
<b>R8_UNIFORM_01</b> returns a unit pseudorandom R8.
</li>
<li>
<b>R8MAT_DATA_READ</b> reads data from an R8MAT file.
</li>
<li>
<b>R8MAT_HEADER_READ</b> reads the header from an R8MAT file.
</li>
<li>
<b>R8MAT_WRITE</b> writes an R8MAT file.
</li>
<li>
<b>R8VEC_NORM2</b> returns the 2-norm of a vector.
</li>
<li>
<b>R8VEC_RANGE_INPUT</b> reads two R8VECs from the user, representing a range.
</li>
<li>
<b>R8VEC_UNIT_EUCLIDEAN</b> normalizes an N-vector in the Euclidean norm.
</li>
<li>
<b>RANDOM_INITIALIZE</b> initializes the FORTRAN 90 random number seed.
</li>
<li>
<b>S_BLANK_DELETE</b> removes blanks from a string, left justifying the remainder.
</li>
<li>
<b>S_EQI</b> is a case insensitive comparison of two strings for equality.
</li>
<li>
<b>S_INPUT</b> prints a prompt string and reads a string from the user.
</li>
<li>
<b>S_REP_CH</b> replaces all occurrences of one character by another.
</li>
<li>
<b>S_TO_R8</b> reads an R8 from a string.
</li>
<li>
<b>S_TO_R8VEC</b> reads an R8VEC from a string.
</li>
<li>
<b>S_TO_I4</b> reads an I4 from a string.
</li>
<li>
<b>S_TO_I4VEC</b> reads an I4VEC from a string.
</li>
<li>
<b>S_WORD_COUNT</b> counts the number of "words" in a string.
</li>
<li>
<b>TIMESTAMP</b> prints the current YMDHMS date as a time stamp.
</li>
</ul>
</p>
<p>
You can go up one level to <a href = "../f_src.html">
the FORTRAN90 source codes</a>.
</p>
<hr>
<i>
Last revised on 27 November 2012.
</i>
<!-- John Burkardt -->
</body>
</html>