Change the way features are saved #158
Comments
Is this for real? How is creating a tuple faster than appending to a string in Python?? 😕 ❓ I don't doubt you, I'm just really surprised...
Run this small (pure Python) snippet of code; even here the difference can be seen.
With this modified script on my poor, awesome 2015 rMBP I get these numbers (tuples are only about 1.5x faster for me):

strings 9.162139892578125
tuples 6.199376106262207

import time

# One million dummy features to iterate over.
features = {'some_feature_name_{}'.format(i): 1 for i in range(1000000)}
# Snapshot the keys so the dict can grow while we loop.
keys = list(features.keys())

# WITH STRINGS: build a new formatted string key per (feature, index) pair
start = time.time()
for feature_name in keys:
    for template_index in [-3, -2, -1, 0, 1, 2, 3]:
        features['{}[{}]'.format(feature_name[:-3], template_index)] = 1
strings_time = time.time() - start

# WITH TUPLES: use a (name, index) tuple as the key, no string formatting
start = time.time()
for feature_name in keys:
    for template_index in (-3, -2, -1, 0, 1, 2, 3):
        features[(feature_name[:-3], template_index)] = 1
tuples_time = time.time() - start

print('strings', strings_time)
print('tuples', tuples_time)
Well, there is another element to it, I guess. In nalaf, every time a new feature is added we check whether its name ends with '[some number]' (using a regex), and if it does not we append '[0]'. With tuples we only check whether the key is of type tuple, with no string manipulation, so that also has an impact.
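To make the two per-key checks concrete, here is a minimal sketch. The function names and the regex are hypothetical and are not taken from nalaf's code; they only mirror the behaviour described above.

```python
import re
import timeit

# Hypothetical pattern: a key that already ends in "[<integer>]".
INDEXED = re.compile(r'\[-?\d+\]$')

def normalize_string_key(key):
    """String scheme: append '[0]' unless the key already ends in '[n]'."""
    return key if INDEXED.search(key) else key + '[0]'

def normalize_tuple_key(key):
    """Tuple scheme: wrap a bare name as (name, 0); pass tuples through."""
    return key if isinstance(key, tuple) else (key, 0)

print(normalize_string_key('my feature'))
print(normalize_tuple_key('my feature'))

# Rough timing of the two checks; absolute numbers depend on the machine.
n = 200000
print('string check', timeit.timeit(lambda: normalize_string_key('my feature'), number=n))
print('tuple check', timeit.timeit(lambda: normalize_tuple_key('my feature'), number=n))
```

The string version runs a regex search and may allocate a new string on every insert, while the tuple version only does a type check and, at most, builds a two-element tuple.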
Currently when someone does:
token.features['my feature'] = value
we automatically append '[0]' and the feature becomes
'my feature[0]' : value
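The behaviour described above could be sketched roughly as follows. FeatureDict is a hypothetical stand-in for whatever type token.features actually is, and the regex is an assumption based on the discussion, not nalaf's real implementation.

```python
import re

class FeatureDict(dict):
    """Hypothetical stand-in for token.features; not nalaf's actual class."""

    # Assumed rule: a key is already indexed if it ends in "[<integer>]".
    _indexed = re.compile(r'\[-?\d+\]$')

    def __setitem__(self, key, value):
        # Keys without an explicit index get the default '[0]' suffix.
        if not self._indexed.search(key):
            key = key + '[0]'
        super().__setitem__(key, value)

features = FeatureDict()
features['my feature'] = 1
features['other feature[2]'] = 1
print(features)
```

Note that in this sketch only item assignment is intercepted; dict.update() and the constructor bypass __setitem__ on a dict subclass, so a real implementation would need to cover those paths too.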
NOTE:
Huge improvement in performance, since Python is slow at creating strings. For example, for WindowFeatureGenerator we go from 16 seconds to less than 1 second after this change.
NOTE:
The changes in nalaf and nala are trivial; not sure about other dependent tools such as relna.