Changes in Version 1.6.0
This release adds a few minor features and several important bug fixes.
The most important change to take note of if you are using composite
comparators is the change to the default inclusive/exclusive behavior for slice
ends.
Other than that, this should be a smooth upgrade from 1.5.x.
Features
* New script for easily building RPM packages
* Add request and parameter information to PoolListener callback
* Add ColumnFamily.xget(), a generator version of get() that automatically
pages over columns in reasonably sized chunks (see the sketch below)
* Add support for Int32Type, a 4-byte signed integer format
* Add constants for the highest and lowest possible TimeUUID values to
pycassa.util
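As a rough sketch of the xget() addition (the keyspace, column family, and
key below are made up for illustration, and buffer_size is assumed to
control the chunk size):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['192.168.1.1:9160'])
    events_cf = ColumnFamily(pool, 'Events')

    # xget() yields (column name, value) pairs one at a time, fetching
    # the row from Cassandra in buffer_size-column chunks behind the
    # scenes, so very wide rows never have to fit in memory at once.
    for name, value in events_cf.xget('row_key', buffer_size=1024):
        print name, value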
Bug Fixes
* Fix various Python 2.4 syntax errors
* Raise AllServersUnavailable if server_list is empty
* Handle custom types inside of composites
* Don’t erase comment when updating column families
* Match Cassandra's sorting of TimeUUIDType values when the timestamps
tie. This could result in some columns being erroneously left off of the
end of column slices when datetime objects or timestamps were used for
column_start or column_finish.
* Use gevent’s queue in place of the stdlib version when gevent
monkeypatching has been applied.
* Avoid sub-microsecond loss of precision with TimeUUID timestamps when
using pycassa.util.convert_time_to_uuid()
* Make default slice ends inclusive when using a CompositeType comparator.
Previously, the end of the slice was exclusive by default (as was the
start of the slice when column_reversed was True).
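To illustrate the slice-end change, here is a minimal sketch; the
'Timelines' column family and its composite comparator are hypothetical:

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['192.168.1.1:9160'])

    # Assume 'Timelines' uses a CompositeType(LongType, AsciiType) comparator
    timelines = ColumnFamily(pool, 'Timelines')

    # Composite slice ends are given as tuples. As of 1.6.0 both ends
    # are inclusive by default, so columns whose first component equals
    # 5 are now returned; under 1.5.x the finish end was exclusive.
    cols = timelines.get('row_key', column_start=(1,), column_finish=(5,))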
Changes in Version 1.5.1
This release only affects those of you using DateType data, which has been
supported since pycassa 1.2.0. If you are using DateType, it is very
important that you read this closely.
DateType data is internally stored as an 8 byte integer timestamp. Since
version 1.2.0 of pycassa, the timestamp stored has counted the number of
microseconds since the unix epoch. The actual format that Cassandra
standardizes on is milliseconds since the epoch.
If you are only using pycassa, you probably won’t have noticed any problems
with this. However, if you try to use cassandra-cli, sstable2json, Hector,
or any other client that supports DateType, DateType data written by pycassa
will appear to be far in the future. Similarly, DateType data written by
other clients will appear to be in the past when loaded by pycassa.
This release changes the default DateType behavior to comply with the
standard, millisecond-based format. If you use DateType, and you upgrade to
this release without making any modifications, you will have problems.
Unfortunately, this is a bit of a tricky situation to resolve, but the
appropriate actions to take are detailed below.
To temporarily continue using the old behavior, a new class has been
created: pycassa.types.OldPycassaDateType. This will read and write DateType
data exactly the same as pycassa 1.2.0 to 1.5.0 did.
If you want to convert your data to the new format, the other new class,
pycassa.types.IntermediateDateType, may be useful. It can read either the
new or old format correctly (unless you have used dates close to 1970 with
the new format) and will write only the new format. The best case for using
this is if you have DateType validated columns that don’t have a secondary
index on them.
To tell pycassa to use OldPycassaDateType or IntermediateDateType, use the
ColumnFamily attributes that control types: column_name_class,
key_validation_class, column_validators, and so on. Here’s an example:
    from pycassa.types import OldPycassaDateType, IntermediateDateType
    from pycassa.columnfamily import ColumnFamily
    from pycassa.pool import ConnectionPool

    pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])

    # Our tweet timeline has a comparator_type of DateType
    tweet_timeline_cf = ColumnFamily(pool, 'tweets')
    tweet_timeline_cf.column_name_class = OldPycassaDateType()

    # The 'join_date' column of our users column family is validated
    # with DateType
    users_cf = ColumnFamily(pool, 'users')
    users_cf.column_validators['join_date'] = IntermediateDateType()
If you’re using DateType for the key_validation_class, column names, column
values with a secondary index on them, or are using the DateType validated
column as a non-indexed part of an index clause with get_indexed_slices()
(e.g. "where state = 'TX' and join_date > 2012"), you need to be more careful
about the conversion process, and IntermediateDateType probably isn’t a good
choice.
In most cases, if you want to switch to the new date format, a manual
migration script to convert all existing DateType data to the new format
will be needed. In particular, if you convert keys, column names, or indexed
columns on a live data set, be very careful how you go about it. If you need
any assistance or suggestions at all with migrating your data, please feel
free to send an email to [email protected]; I would be glad to help.
Changes in Version 1.5.0
The main change to be aware of for this release is the new no-retry behavior
for counter operations. If you have been maintaining a separate connection
pool with retries disabled for usage with counters, you may discontinue that
practice after upgrading.
Features
By default, counter operations will not be retried automatically. This
makes it easier to use a single connection pool without worrying about
overcounting.
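As a sketch of what this allows (the keyspace and counter column family
below are made up), counter increments can now share the ordinary pool:

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    # A single pool serves both regular and counter column families
    pool = ConnectionPool('MyKeyspace', ['192.168.1.1:9160'])
    page_views = ColumnFamily(pool, 'PageViews')

    # Counter operations are simply not retried, so automatic retries
    # can no longer cause accidental over-counting.
    page_views.add('some_page', 'hits')           # increment by 1
    page_views.add('some_page', 'hits', value=5)  # increment by 5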
Bug Fixes
* Don’t remove entire row when an empty list is supplied for the columns
parameter of remove() or the batch remove methods.
* Add python-setuptools to debian build dependencies
* Batch remove() was not removing subcolumns when the specified supercolumn
was 0 or other “falsey” values
* Don’t request an extra row when reading fewer than buffer_size rows with
get_range() or get_indexed_slices().
* Remove pool_type from logs, which showed up as None in recent versions
* Logs were erroneously showing the same server for retries of failed
operations even when the actual server being queried had changed
Changes in Version 1.4.0
This release is primarily a bugfix release with a couple
of minor features and removed deprecated items.
Features
* Accept column_validation_classes when creating or altering
column families with SystemManager
* Ignore UNREACHABLE nodes when waiting for schema version
agreement
Bug Fixes
* Remove accidental print statement in SystemManager
* Raise TypeError when unexpected types are used for
comparator or validator types when creating or altering
a Column Family
* Fix packing of column values using column-specific validators
during batch inserts when the column name is changed by packing
* Always return timestamps from inserts
* Fix NameError when timestamps are used where a DateType is
expected
* Fix NameError in python 2.4 when unpacking DateType objects
* Handle reading composites with trailing components
missing
* Upgrade ez_setup.py to fix broken setuptools link
Removed Deprecated Items
The following items have been removed:
* pycassa.connect()
* pycassa.connect_thread_local()
* ConnectionPool.status()
* ConnectionPool.recreate()
Changes in Version 1.3.0
This release adds full compatibility with Cassandra 1.0 and removes support
for schema manipulation in Cassandra 0.7.
In this release, schema manipulation should work with Cassandra 0.8 and 1.0,
but not 0.7. The data API should continue to work with all three versions.
Bug Fixes
* Don’t ignore columns parameter in ColumnFamilyMap.insert()
* Handle empty instance fields in ColumnFamilyMap.insert()
* Use the same default for timeout in pycassa.connect() as ConnectionPool
uses
* Fix typo which caused a different exception to be thrown when an
AllServersUnavailable exception was raised
* IPython 0.11 compatibility in pycassaShell
* Correct dependency declaration in setup.py
* Add UUIDType to supported types
Features
The filter_empty parameter was added to get_range() with a default of True;
this allows empty rows to be kept if desired
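A short sketch of the new parameter (the column family is illustrative):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['192.168.1.1:9160'])
    users = ColumnFamily(pool, 'Users')

    # Default (filter_empty=True): rows without any columns are skipped
    non_empty_keys = [key for key, cols in users.get_range()]

    # filter_empty=False keeps the empty rows, yielding them with an
    # empty column mapping
    all_keys = [key for key, cols in users.get_range(filter_empty=False)]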
Deprecated
pycassa.connect()
pycassa.connect_thread_local()
Changes in Version 1.2.1
This is strictly a bug-fix release addressing a few issues created in 1.2.0.
Bug Fixes
* Correctly check for Counters in ColumnFamily when setting
default_validation_class
* Pass kwargs in ColumnFamilyMap to ColumnFamily
* Avoid a potential UnboundLocalError in ConnectionPool.execute() when get() fails
* Fix ez_setup dependency/bundling so that package installations using
easy_install or pip don’t fail without ez_setup installed
Changes in Version 1.2.0
This should be a fairly smooth upgrade from pycassa 1.1. The
primary changes that may introduce minor incompatibilities are
the changes to ColumnFamilyMap and the automatic skipping of
"ghost ranges" in .ColumnFamily.get_range().
Features
* Add ConnectionPool.fill()
* Add FloatType, DoubleType, DateType, and BooleanType support.
* Add CompositeType support for static composites. See "Composite Types"
for more details.
* Add timestamp, ttl to ColumnFamilyMap.insert() params
* Support variable-length integers with IntegerType. This allows more
space-efficient small integers as well as integers that exceed the size
of a long.
* Make ColumnFamilyMap a subclass of ColumnFamily instead of using one
as a component. This allows all of the normal adjustments normally done
to a ColumnFamily to be done to a ColumnFamilyMap instead. See "Class
Mapping with Column Family Map" for examples of using the new version.
* Expose the following ConnectionPool attributes, allowing them to be
altered after creation: max_overflow, pool_timeout, recycle,
max_retries, and logging_name. Previously, these were all supplied as
constructor arguments. Now, the preferred way to set them is to alter
the attributes after creation. (However, they may still be set in the
constructor by using keyword arguments.) An example appears after this
list.
* Automatically skip "ghost ranges" in ColumnFamily.get_range().
Rows without any columns will not be returned by the generator,
and these rows will not count towards the supplied row_count.
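Here is the small sketch of the attribute-based ConnectionPool
configuration mentioned above; the values are arbitrary:

    from pycassa.pool import ConnectionPool

    pool = ConnectionPool('MyKeyspace', ['192.168.1.1:9160'], pool_size=5)

    # These used to be constructor-only options; they are now plain
    # attributes that can be adjusted after the pool exists
    pool.max_overflow = 10
    pool.pool_timeout = 30
    pool.recycle = 10000
    pool.max_retries = 5
    pool.logging_name = 'my_app_pool'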
Bug Fixes
* Add connections to ConnectionPool more readily when prefill is False.
Before this change, if the ConnectionPool was created with prefill=False,
connections would only be added to the pool when there was concurrent
demand for connections. After this change, if prefill=False and
pool_size=N, the first N operations will each result in a new connection
being added to the pool.
* Close connection and adjust the ConnectionPool‘s connection count after a
TApplicationException. This exception generally indicates programmer error,
so it’s not extremely common.
* Handle typed keys that evaluate to False
Deprecated
* ConnectionPool.recreate()
* ConnectionPool.status()
Miscellaneous
* Better failure messages for ConnectionPool failures
* More efficient packing and unpacking
* More efficient multi-column inserts in ColumnFamily.insert() and
ColumnFamily.batch_insert()
* Prefer Python 2.7’s collections.OrderedDict over the bundled version when
available
Changes in Version 1.1.1
Features
* Add max_count and column_reversed params to get_count()
* Add max_count and column_reversed params to multiget_count()
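A brief sketch of the new parameters (the column family, keys, and limits
are made up):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['192.168.1.1:9160'])
    messages = ColumnFamily(pool, 'Messages')

    # Stop counting after 10,000 columns instead of walking the whole row
    count = messages.get_count('row_key', max_count=10000)

    # Count from the end of the row instead of the beginning
    recent = messages.get_count('row_key', max_count=100, column_reversed=True)

    # The same options apply per key across several rows
    counts = messages.multiget_count(['key1', 'key2'], max_count=10000)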
Bug Fixes
* Don’t retry operations after a TApplicationException. This exception is
reserved for programmatic errors (such as a bad API parameters), so
retries are not needed.
* If the read_consistency_level kwarg was used in a ColumnFamily
constructor, it would be ignored, resulting in a default read
consistency level of ONE. This did not affect the read consistency
level if it was specified in any other way, including per-method or by
setting the read_consistency_level attribute.
Changes in Version 1.1.0
This release adds compatibility with Cassandra 0.8, including support for
counters and key_validation_class. This release is backwards-compatible with
Cassandra 0.7, and can support running against a mixed cluster of both
Cassandra 0.7 and 0.8.
Changes related to Cassandra 0.8
* Addition of COUNTER_COLUMN_TYPE to system_manager.
* Several new column family attributes, including key_validation_class,
replicate_on_write, merge_shards_chance, row_cache_provider, and key_alias.
* The new ColumnFamily.add() and ColumnFamily.remove_counter() methods.
* Support for counters in pycassa.batch and ColumnFamily.batch_insert().
* Autopacking of keys based on key_validation_class.
Other Features
* ColumnFamily.multiget() now has a buffer_size parameter
* ColumnFamily.multiget_count() now returns rows in the order that the
keys were passed in, similar to how multiget() behaves. It also uses
the dict_class attribute for the containing class instead of always
using a dict.
* Autopacking behavior is now more transparent and configurable, allowing
the user to get functionality similar to the CLI's assume command, whereby
items are packed and unpacked as though they were a certain data type,
even if Cassandra does not use a matching comparator type or validation
class. This behavior can be controlled through the following attributes
(an example appears after this list):
- ColumnFamily.column_name_class
- ColumnFamily.super_column_name_class
- ColumnFamily.key_validation_class
- ColumnFamily.default_validation_class
- ColumnFamily.column_validators
* A ColumnFamily may reload its schema to handle changes in validation
classes with ColumnFamily.load_schema().
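The example promised above is a rough sketch of assume-style packing; the
column family and column names are hypothetical, and the pycassa.types
instances are used here the same way as in the DateType example earlier
in this file:

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily
    from pycassa.types import UTF8Type, LongType

    pool = ConnectionPool('MyKeyspace', ['192.168.1.1:9160'])
    stats = ColumnFamily(pool, 'Stats')

    # Pack and unpack as though these types were in use, even if the
    # server-side comparator and validators are plain BytesType
    stats.key_validation_class = UTF8Type()
    stats.column_name_class = UTF8Type()
    stats.default_validation_class = UTF8Type()
    stats.column_validators['view_count'] = LongType()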
Bug Fixes
There were several related issues with overflow in ConnectionPool:
* Connection failures when a ConnectionPool was in a state of overflow
would not result in adjustment of the overflow counter, eventually
leading the ConnectionPool to refuse to create new connections.
* Settings of -1 for ConnectionPool.overflow erroneously caused overflow
to be disabled.
* If overflow was enabled in conjunction with prefill being disabled,
the effective overflow limit was raised to max_overflow + pool_size.
Other
* Overflow is now disabled by default in ConnectionPool.
* ColumnFamilyMap now sets the underlying ColumnFamily‘s
autopack_names and autopack_values attributes to False upon construction.
* Documentation and tests will no longer be included in the
packaged tarballs.
Removed Deprecated Items
The following deprecated items have been removed:
* ColumnFamilyMap.get_count()
* The instance parameter from ColumnFamilyMap.get_indexed_slices()
* The Int64 Column type.
* SystemManager.get_keyspace_description()
Deprecated
Although not technically deprecated, most ColumnFamily constructor
arguments should instead be provided by setting the corresponding
attribute on the ColumnFamily after construction, as sketched below.
However, all previous constructor arguments will continue to be
supported if passed as keyword arguments.
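Here is the sketch referred to above; the column family and the particular
settings are only examples:

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['192.168.1.1:9160'])

    # Preferred style: construct first, then configure through attributes
    cf = ColumnFamily(pool, 'Users')
    cf.buffer_size = 2048
    cf.autopack_names = False
    cf.dict_class = dict

    # Still supported: the same settings passed as keyword arguments
    cf2 = ColumnFamily(pool, 'Users', buffer_size=2048)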
Changes in Version 1.0.8
* Pack IndexExpression values in get_indexed_slices() that are supplied
through the IndexClause instead of just the instance parameter (see the
sketch after this list).
* Column names and values which use Cassandra’s IntegerType are unpacked
as though they are in a BigInteger-like format. This is (backwards)
compatible with the format that pycassa uses to pack IntegerType data.
This fixes an incompatibility with the format that cassandra-cli and
other clients use to pack IntegerType data.
* Restore Python 2.5 compatibility that was broken by out-of-order
keyword arguments in ConnectionWrapper.
* Pack column_start and column_finish arguments in ColumnFamily *get*()
methods when the super_column parameter is used.
* Issue a DeprecationWarning when a method, parameter, or class that has
been deprecated is used. Most of these have been deprecated for several
releases, but no warnings were issued until now.
* Deprecations are now split into separate sections for each release in the changelog.
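For reference, a sketch of supplying values through the IndexClause; the
column family and its secondary index are hypothetical:

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily
    from pycassa.index import create_index_expression, create_index_clause

    pool = ConnectionPool('MyKeyspace', ['192.168.1.1:9160'])
    users = ColumnFamily(pool, 'Users')  # assumes an index on 'state'

    # Values inside the expressions are now packed with the column's
    # validator before being sent to Cassandra
    state_expr = create_index_expression('state', 'TX')
    clause = create_index_clause([state_expr], count=100)

    for key, columns in users.get_indexed_slices(clause):
        print key, columns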
Deprecated in Version 1.0.8
* The instance parameter of ColumnFamilyMap.get_indexed_slices()
Changes in Version 1.0.7
* Catch KeyError in pycassa.columnfamily.ColumnFamily.multiget() empty row
removal. If the same non-existent key was passed multiple times, a
KeyError was raised when trying to remove it from the OrderedDictionary
after the first removal. The KeyError is caught and ignored now.
* Handle connection failures during retries. When a connection fails, it
tries to create a new connection to replace itself. Exceptions during
this process were not properly handled; they are now handled and count
towards the retry count for the current operation.
* Close connection when a MaximumRetryException is raised. Normally a
connection is closed when an operation it is performing fails, but this
was not happening for the final failure that triggers the
MaximumRetryException.
Changes in Version 1.0.6
* Add EOFError to the list of exceptions that cause a connection swap and
retry
* Improved autopacking efficiency for AsciiType, UTF8Type, and BytesType
* Preserve sub-second timestamp precision in datetime arguments for
insertion or slice bounds where a TimeUUID is expected. Previously,
precision below a second was lost.
* In a MaximumRetryException‘s message, include details about the last
Exception that caused the MaximumRetryException to be raised
* pycassa.pool.ConnectionPool.status() now always reports a non-negative
overflow; 0 is now used when there is not currently any overflow
* Created pycassa.types.Long as a replacement for pycassa.types.Int64.
Long uses big-endian encoding, which is compatible with Cassandra’s LongType,
while Int64 used little-endian encoding.
Deprecated in Version 1.0.6
* pycassa.types.Int64 has been deprecated in favor of pycassa.types.Long
Changes in Version 1.0.5
* Assume port 9160 if only a hostname is given
* Remove super_column param from pycassa.columnfamily.ColumnFamily.get_indexed_slices()
* Enable failover on functions that previously lacked it
* Increase base backoff time to 0.01 seconds
* Add a timeout parameter to pycassa.system_manager.SystemManager
* Return timestamp on single-column inserts
Changes in Version 1.0.4
* Fixed threadlocal issues that broke multithreading
* Fix bug in pycassa.columnfamily.ColumnFamily.remove() when a super_column
argument is supplied
* Fix minor PoolLogger logging bugs
* Added pycassa.system_manager.SystemManager.describe_partitioner()
* Added pycassa.system_manager.SystemManager.describe_snitch()
* Added pycassa.system_manager.SystemManager.get_keyspace_properties()
* Moved pycassa.system_manager.SystemManager.describe_keyspace() and
pycassa.system_manager.SystemManager.describe_column_family() to
pycassaShell describe_keyspace() and describe_column_family()
Deprecated in Version 1.0.4
* Renamed pycassa.system_manager.SystemManager.get_keyspace_description()
to pycassa.system_manager.SystemManager.get_keyspace_column_families()
and deprecated the previous name
Changes in Version 1.0.3
* Fixed supercolumn slice bug in get()
* pycassaShell now runs scripts with execfile to allow for multiline statements
* Python 2.4 compatibility fixes
Changes in Version 1.0.2
* Failover handles a greater set of potential failures
* pycassaShell now loads/reloads pycassa.columnfamily.ColumnFamily instances when the underlying column family is created or updated
* Added an option to pycassaShell to run a script after startup
* Added pycassa.system_manager.SystemManager.list_keyspaces()
Changes in Version 1.0.1
* Allow pycassaShell to be run without specifying a keyspace
* Added pycassa.system_manager.SystemManager.describe_schema_versions()
Changes in Version 1.0.0
* Created the SystemManager class to allow for keyspace, column family, and
index creation, modification, and deletion. These operations are no longer
provided by a Connection class.
* Updated pycassaShell to use the SystemManager class
* Improved retry behavior, including exponential backoff and proper resetting
of the retry attempt counter
* Condensed connection pooling classes into only pycassa.pool.ConnectionPool
to provide a simpler API
* Changed pycassa.connection.connect() to return a connection pool
* Use more performant Thrift API methods for insert() and get() where possible
* Bundled OrderedDict and set it as the default dictionary class for column families
* Provide better TypeError feedback when columns are the wrong type
* Use Thrift API 19.4.0
Deprecated in Version 1.0.0
* ColumnFamilyMap.get_count() has been deprecated. Use
ColumnFamily.get_count() instead.
Changes in Version 0.5.4
* Allow for more backward and forward compatibility
* Mark a server as being down more quickly in Connection
Changes in Version 0.5.3
* Added PooledColumnFamily, which makes it easy to use connection pooling
automatically with a ColumnFamily.
Changes in Version 0.5.2
* Support for adding/updating/dropping Keyspaces and CFs in pycassa.connection.Connection
* get_range() optimization and more configurable batch size
* batch get_indexed_slices() similar to get_range()
* Reorganized pycassa logging
* More efficient packing of data types
* Fix error condition that results in infinite recursion
* Limit pooling retries to only appropriate exceptions
* Use Thrift API 19.3.0
Changes in Version 0.5.1
* Automatically detect if a column family is a standard column family or a super column family
* multiget_count() support
* Allow preservation of key order in multiget() if an ordered dictionary is used
* Convert timestamps to v1 UUIDs where appropriate
* pycassaShell documentation
* Use Thrift API 17.1.0
Changes in Version 0.5.0
* Connection Pooling support: pycassa.pool
* Started moving logging to pycassa.logger
* Use Thrift API 14.0.0
Changes in Version 0.4.3
* Autopack on CF’s default_validation_class
* Use Thrift API 13.0.0
Changes in Version 0.4.2
* Added batch mutations interface: pycassa.batch
* Made bundled thrift-gen code a subpackage of pycassa
* Don’t attempt to reencode already encoded UTF8 strings
Changes in Version 0.4.1
* Added batch_insert()
* Redefined insert() in terms of batch_insert()
* Fixed UTF8 autopacking
* Convert datetime slice args to uuids when appropriate
* Changed how thrift-gen code is bundled
* Assert that the major version of the thrift API is the same on the client
and on the server
* Use Thrift API 12.0.0
Changes in Version 0.4.0
* Added pycassaShell, a simple interactive shell
* Converted the test config from xml to yaml
* Fixed overflow error on get_count()
* Only insert columns which exist in the model object
* Make ColumnFamilyMap not ignore the ColumnFamily’s dict_class
* Specify keyspace as argument to connect()
* Add support for framed transport and default to using it
* Added autopacking for column names and values
* Added support for secondary indexes with get_indexed_slices() and
pycassa.index
* Added truncate()
* Use Thrift API 11.0.0