MSWC pre-training task is very easy #190
Comments
@Maxtimer97 Pinging Maxime, what do you think about the difficulty of the pre-train subset? @V0XNIHILI Good point, the pre-train task is perhaps too easy, given that all of the words are distinct and have many syllables. On the image datasets used for FSCIL, pre-train accuracy is usually around 80%. Still, maintaining high accuracy through the continual learning portion remains unsolved, and we also monitor model complexity via network size and operations, not just accuracy. Do you have any other ideas for how we can select a subset of words from the MSWC set?
Yes, this is something we were already discussing before. We could make it harder by selecting the shortest words, but then we move away from promoting the processing of temporal information. For me this is OK, because the target is not to convert the deep learning community to our dataset but rather the neuromorphic and tinyML communities, and considering the average model people play with right now (local plasticity, bio-plausible neurons, super sparse spiking, etc.), I think they will still have a challenge on this task.
If we want to push for a really challenging temporal task with continual learning, I would be up for brainstorming on that, but then we need to think of really long sequences and what transformers and state space models can do on such a task :)
Selecting words differently could help a lot, I guess (for example, only selecting 'short' keywords), because on 10+2 class Google Speech Commands, the model from my comment only gets 95% validation accuracy. Even 5-15M parameter transformers only get 98.x% accuracy. For a harder task away from speech, I do not have a concrete proposal right now. Maybe it would be interesting to consult NeuroBench's industry collaborators to see what they think could be a good task? For non-real-life sequential tasks, building on top of miniImageNet could be an option.
OK, my remaining concern was the question of having a less temporal task with shorter keywords, but I guess we are not testing long-range memory anyway. So if you just want to move to shorter keywords @V0XNIHILI, I guess you could go ahead, generate the new dataset, and see how the baseline models perform. I can help if something doesn't work out of the box with the current models (actually, I noticed LIF models maybe work even better than adLIF for the prototype approach, so it would be something to try for the shorter keywords). The remaining point then is just what it means for the paper @jasonlyik: can we just resubmit an adapted version with the shorter keywords and updated results?
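For context, the prototype approach mentioned here replaces the trained classifier head with per-class mean embeddings. A minimal sketch of that kind of readout, assuming a frozen feature extractor `model` that maps inputs to embeddings (all names are illustrative, not the code used in the paper or harness):

```python
import torch

def compute_prototypes(model, loader, num_classes, feat_dim, device="cpu"):
    # Average the frozen encoder's embedding over all samples of each class.
    model.eval()
    sums = torch.zeros(num_classes, feat_dim)
    counts = torch.zeros(num_classes)
    with torch.no_grad():
        for x, y in loader:
            emb = model(x.to(device)).cpu()          # (batch, feat_dim)
            sums.index_add_(0, y, emb)
            counts += torch.bincount(y, minlength=num_classes).float()
    return sums / counts.clamp(min=1).unsqueeze(1)   # (num_classes, feat_dim)

def nearest_prototype_predict(model, x, prototypes):
    # Classify by Euclidean distance to the class prototypes.
    with torch.no_grad():
        emb = model(x)                               # (batch, feat_dim)
        return torch.cdist(emb, prototypes.to(emb.device)).argmin(dim=1)
```

New few-shot classes can then be added by computing prototypes from their support samples alone, with no retraining of the encoder.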
@Maxtimer97 In terms of the paper, we are still waiting for reviews. Early on, before too many people are using the dataset, is the best time to change, if we want to change.
So, how should we proceed then? For selecting shorter keywords, I think I will be able to do this. Then: what is the timeline/deadline for such a change?
We should have until the final submission version is due; I will let you know when that may be. For now, could you look into shorter keywords within the next two or three weeks?
@V0XNIHILI The reviews have returned. None were specifically concerned with the high starting accuracy on the MSWC task, but it is still something we can improve. We are planning to return the revision within the next 6-10 weeks. Are you still looking into the shorter-keywords dataset changes?
Okay, that sounds good. Yes, I'd still like to do it, but I haven't had the chance yet. I'll try to do it before the end of this month. If I can't manage time-wise, I'll update you ASAP.
Hey @V0XNIHILI, do you have any updates on this?
Hey @jasonlyik, I have a small update. To stay true to the multilingual aspect of the original dataset, I kept five European languages (20 words each, 100 words in total) and selected the shortest (in terms of characters) words with > 1000 samples, while removing all words with identical spelling across languages. Also, I replaced Catalan with Spanish in the base set and Spanish with Turkish in the continual learning set, as Spanish and Catalan are quite similar to each other in terms of pronunciation/intonation. I guess what remains then is to retrain on this set and see what the performance impact is?

```python
import json

path = 'metadata.json'
with open(path, 'r') as f:
    data = json.load(f)

# For each language entry, drop the 'filenames' key to make the dict lighter
for key in data:
    if 'filenames' in data[key]:
        del data[key]['filenames']

def get_words(data, langs, count, min_samples, non_overlap_words=None):
    wc_selected = {}
    # Copy, so we don't mutate the caller's list
    all_words = list(non_overlap_words or [])
    for lang in langs:
        wc_selected[lang] = {}
        wc = data[lang]['wordcounts']
        # Only keep words that have more than min_samples samples
        for key in wc:
            if wc[key] > min_samples:
                wc_selected[lang][key] = wc[key]
        # Sort by word length, shortest first
        sorted_words = sorted(wc_selected[lang], key=len)
        # Drop words already selected for another language
        # (filter instead of remove-while-iterating, which skips elements)
        sorted_words = [w for w in sorted_words if w not in all_words]
        selected_words = sorted_words[:count]
        all_words.extend(selected_words)
        if len(selected_words) < count:
            raise ValueError('Not enough words for language: ' + lang)
        wc_selected[lang] = selected_words
    return wc_selected

base_langs = ['en', 'de', 'fr', 'es', 'it']
base_words = get_words(data, base_langs, 20, 1000)
all_base_words = [word for lang in base_words for word in base_words[lang]]
cont_words = get_words(data, ['fa', 'tr', 'ru', 'cy', 'it', 'eu', 'pl', 'eo', 'pt', 'nl'], 10, 500, all_base_words)
```

This yields:
```python
>>> base_words
{'en': ['has', 'the', 'big', 'car', 'sat', 'for', 'age', 'saw', 'cut', 'eye',
        'yet', 'box', 'hey', 'six', 'lie', 'got', 'boy', 'son', 'job', 'set'],
 'de': ['gab', 'mir', 'sie', 'tun', 'ihn', 'mit', 'ich', 'ihr', 'oft', 'tag',
        'und', 'gar', 'ist', 'ort', 'für', 'aus', 'ihm', 'vor', 'gut', 'wer'],
 'fr': ['rue', 'six', 'lui', 'non', 'bas', 'fin', 'les', 'oui', 'peu', 'moi',
        'dit', 'été', 'qui', 'mes', 'des', 'une', 'dès', 'bon', 'mon', 'ici'],
 'es': ['así', 'uno', 'era', 'día', 'año', 'han', 'por', 'hay', 'con', 'ese',
        'san', 'una', 'del', 'muy', 'las', 'fue', 'que', 'sus', 'vez', 'los'],
 'it': ['era', 'sua', 'che', 'due', 'per', 'tra', 'sul', 'più', 'suo', 'nel',
        'dei', 'gli', 'dal', 'loro', 'sono', 'come', 'anni', 'dopo', 'alla', 'aveva']}
>>> cont_words
{'fa': ['صدا', 'دهم', 'یاد', 'کمک', 'خون', 'قبل', 'های', 'این', 'هیچ', 'کنه'],
 'tr': ['yüz', 'ise', 'iki', 'çok', 'beş', 'var', 'bin', 'bir', 'için', 'ancak'],
 'ru': ['нам', 'эта', 'как', 'эти', 'его', 'мне', 'эту', 'все', 'уже', 'нет'],
 'cy': ['awr', 'mwy', 'dim', 'ble', 'nhw', 'gan', 'eto', 'chi', 'ond', 'fod'],
 'it': ['cui', 'nei', 'mai', 'sua', 'può', 'poi', 'due', 'sia', 'tre', 'tra'],
 'eu': ['bat', 'eta', 'edo', 'dut', 'ere', 'dio', 'lan', 'zer', 'oso', 'den'],
 'pl': ['jak', 'dla', 'gdy', 'też', 'był', 'ale', 'cię', 'czy', 'pod', 'nad'],
 'eo': ['sen', 'min', 'unu', 'kaj', 'tio', 'laŭ', 'oni', 'ĝin', 'ili', 'tro'],
 'pt': ['foi', 'mas', 'são', 'não', 'com', 'tem', 'seu', 'ela', 'ele', 'meu'],
 'nl': ['wat', 'bij', 'een', 'aan', 'dat', 'het', 'van', 'uit', 'dan', 'was']}
```
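A quick sanity check on the selection above (a sketch assuming the `base_words` and `cont_words` dicts from the script; not part of the original code) would be to report any keyword that ends up in both the base and the continual learning sets:

```python
# Hypothetical check: keywords appearing in both the base and continual sets
base_set = {w for words in base_words.values() for w in words}
cont_set = {w for words in cont_words.values() for w in words}
print(sorted(base_set & cont_set))  # ideally empty
```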
Looks good, I hope the shortest words aren't too difficult, especially with many of them being single-syllable, and since MSWC uses automated alignment the words may not be very clear. I'd guess we should shoot for something around the difficulty of the common FSCIL benchmarks, which looks to be around 75%-85% on base classes / session 0 (here's a recent FSCIL paper). I'll be finalizing the revision at the start of next month, so if you could have any modification by the end of this month, that would be great!
Hey Jason, to give you an update: today I will schedule training on the above shortest words, but I will also try words with a maximum length of 5, 7, and 9 characters. I can hopefully post the results of that tonight, CET time.
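One way to add such a length cap is a thin wrapper around the `get_words` function above (a sketch; `get_words_max_len` and `max_len` are illustrative names, not from the original scripts):

```python
def get_words_max_len(data, langs, count, min_samples, max_len, non_overlap_words=None):
    # Pre-filter each language's word counts to words of at most max_len
    # characters, then reuse the same selection logic as get_words.
    capped = {
        lang: {**data[lang],
               'wordcounts': {w: c for w, c in data[lang]['wordcounts'].items()
                              if len(w) <= max_len}}
        for lang in langs
    }
    return get_words(capped, langs, count, min_samples, non_overlap_words)
```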
@jasonlyik sorry for my late reply, hopefully the results can still be useful...

[plot: validation accuracy progression per maximum word length; legend not recovered]

Trained with the same settings as the pre-training done for the paper (same network, same learning rate, etc.) and with the same languages as before. 3 characters only is definitely the hardest, but it aligns well with the goal of 75-85% validation accuracy on base classes. Furthermore, shorter words are also more realistic as keywords (e.g. "now", "yes", "go", ...) compared to longer words (e.g. "canada", "international", "development"). Let me know what you think and how we should proceed!
Looks good Douwe, can we see how the prototypes work for the continual learning part as well? At this point I'm not sure we have enough time to push this new change into the article as a major revision, and I'd like to see more tests as well. The forward plan with the harness is to allow many benchmarks to be included, so that one could find them all in one location and run whatever tests they would like. So, I think this more challenging keyword subset would be nice to add as something like an MSWC-short dataloader, which would enrich the FSCIL benchmark as a whole.
@jasonlyik here are the graphs for the continual learning part.

[graphs: continual learning accuracy per maximum word length; not recovered]

It seems like exactly 5 characters strikes a good trade-off. What do you think? Also, I quite like the idea of MSWC-short!
Hey Douwe, sorry for not reacting earlier, but this looks very nice! Are these results on the CNN or the SNN? Then I agree, it would be nice to have it as an alternative version of the task called MSWC-short! And maybe there would be space in the supplementary materials of the paper to describe it quickly? Anyway, thank you for the work!
On the CNN, the M5! @jasonlyik is there anything I can contribute to the supplementary material of the paper with this data?
Great to see that this moves more in line with standard FSCIL tasks! I have added a couple of sentences to the Methods section, under the Keyword FSCIL task section:
@V0XNIHILI Could you package the script/data and send in a PR so that we can add it to the harness, as well as auto-download from HuggingFace?
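For the auto-download part, a minimal sketch using the `huggingface_hub` client (the repo id and filename below are placeholders, not an existing NeuroBench dataset repo):

```python
from huggingface_hub import hf_hub_download

# Hypothetical repo id and filename; the real ones would be fixed once the
# MSWC-short data is packaged and uploaded.
path = hf_hub_download(
    repo_id="NeuroBench/mswc-short",  # placeholder
    filename="metadata.json",         # placeholder
    repo_type="dataset",
)
print(path)  # local cache path of the downloaded file
```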
Super, thanks a lot @jasonlyik! Yep, will do this!!
I tried my hand at pre-training a few of the models I use on the MSWC base training set. I found that it was really easy to overfit (100% train accuracy) on this data, even with random time shifts:

[model definition and training results not recovered]

This model has 126k parameters and performs on par with the M5 model in the paper. I think that with a bit more tuning, I could get 98% validation accuracy with this model, meaning that for bigger models, 99-100% validation accuracy should be within close reach...
What are your thoughts?
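For reference, random time shifting here means randomly offsetting the fixed-length keyword clip in time; a minimal sketch of such an augmentation (assuming fixed-length waveforms; the function name and the zero-padding choice are illustrative, not the augmentation actually used above):

```python
import torch

def random_time_shift(wav: torch.Tensor, max_shift: int) -> torch.Tensor:
    # Shift a fixed-length waveform (channels, samples) by a random number of
    # samples in [-max_shift, max_shift], zero-padding the vacated region.
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    out = torch.zeros_like(wav)
    if shift > 0:
        out[..., shift:] = wav[..., :-shift]
    elif shift < 0:
        out[..., :shift] = wav[..., -shift:]
    else:
        out = wav.clone()
    return out
```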