Jupyter notebooks that use the fastai library
In this notebook (nbviewer version), we show how to use the fastai code from the first notebook of the 2020 fastai course with another dataset of images. The idea is to show how quickly the fastai code can be reused with your own data.
In this notebook (pdf, nbviewer version), a Web App is created from a trained Deep Learning model, using all the fastai v2 techniques taught by Jeremy Howard in his 2020 course (Practical Deep Learning for Coders). A post on Medium (Product based on a Deep Learning model (by fastai v2)) explains how to use the app.
Faster than training from scratch — Fine-tuning the English GPT-2 in any language with Hugging Face and fastai v2 (practical case with Portuguese)
In this notebook (nbviewer version), instead of training from scratch, we will see how to fine-tune an English pre-trained transformer-based language model to any other language in just over a day, on one GPU and with a little more than 1GB of training data. As a practical case, we fine-tune the English pre-trained GPT-2 to Portuguese by wrapping the Transformers and Tokenizers libraries of Hugging Face into fastai v2. We thus create a new language model: GPorTuguese-2, a language model for Portuguese text generation (and more NLP tasks...).
Note: as the full notebook is very detailed, use this fast notebook (nbviewer version) if you just want to run the code without explanation.
In this study, we will see that, while it is true that a BBPE tokenizer (Byte-level Byte-Pair-Encoding) trained on a huge monolingual corpus can tokenize any word of any language (there is no unknown token), it requires on average almost 70% more tokens when it is applied to a text in a language different from the one used for its training. This information is key when choosing a tokenizer to train a natural language model such as a Transformer model.
- post on Medium
- notebook Byte-level-BPE_universal_tokenizer_but.ipynb (nbviewer version)
- Wikipedia downloading functions in the file nlputils_fastai2.py
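The "almost 70% more tokens" figure above is a relative overhead. Here is a minimal sketch of how such an overhead could be computed from two token counts for the same text. The function name and the counts are hypothetical and are not taken from the study; producing the counts with real tokenizers is left out.

```python
def token_overhead(native_token_count: int, foreign_token_count: int) -> float:
    """Return the relative number of additional tokens, as a fraction.

    native_token_count:  tokens produced by a tokenizer trained on the
                         language of the text.
    foreign_token_count: tokens produced for the same text by a tokenizer
                         trained on a different language.
    """
    if native_token_count <= 0:
        raise ValueError("native_token_count must be positive")
    return (foreign_token_count - native_token_count) / native_token_count


# Hypothetical counts, chosen only to illustrate the arithmetic.
overhead = token_overhead(native_token_count=1000, foreign_token_count=1700)
print(f"{overhead:.0%} additional tokens")
```

Averaging this fraction over many texts would give a corpus-level figure comparable to the one quoted above.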
The script 05_pet_breeds_DDP.py contains the code for training a Deep Learning model in Distributed Data Parallel (DDP) mode with fastai v2. It is inspired by the notebook 05_pet_breeds.ipynb from the fastbook (fastai v2), the Distributed and parallel training fastai v2 documentation and the script train_imagenette.py.
To run it, launch the following command in a terminal, inside a fastai v2 virtual environment, on a server with at least 2 GPUs:
python -m fastai2.launch 05_pet_breeds_DDP.py
The notebook 05_pet_breeds_DataParallel.ipynb (nbviewer version) contains the code for training a Deep Learning model in Data Parallel (DP) mode with PyTorch and fastai v2. It is inspired by the notebook 05_pet_breeds.ipynb from the fastbook (fastai v2), the Distributed and parallel training fastai v2 documentation and the script train_imagenette.py.
The objective of this notebook (nbviewer version) is to explain how to create parameter groups for a model with fastai v2 in order to train each one with a different learning rate, how to pass the list of learning rates and how to check the learning rates actually used by the optimizer during training.
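To make the idea of one learning rate per parameter group concrete, here is a minimal pure-Python sketch that spreads learning rates geometrically between a minimum and a maximum. It does not call fastai. The function name, the group names and the values are hypothetical, and giving the smallest rate to the earliest layers is an assumption, not something taken from the notebook.

```python
def spread_learning_rates(lr_min: float, lr_max: float, n_groups: int) -> list:
    """Return one learning rate per parameter group, geometrically spaced
    from lr_min (first group) to lr_max (last group)."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        return [lr_max]
    # Constant multiplicative step between two consecutive groups.
    step = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * step ** i for i in range(n_groups)]


# Hypothetical parameter groups, ordered from the earliest layers to the head.
group_names = ["early_layers", "late_layers", "head"]
lrs = spread_learning_rates(1e-5, 1e-3, len(group_names))

# Pair each group with its learning rate, as an optimizer would store them.
for name, lr in zip(group_names, lrs):
    print(f"{name}: lr={lr:.1e}")
```

Printing the per-group values like this is also the simplest way to check which learning rates are really in use, whatever structure the optimizer keeps them in.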
The objective of this notebook is to explain how fastai v2 deals with batch sizes for the training and validation datasets.
The objective of this notebook is to show that the sizes of pkl files created by learn.export() of fastai v2 are different depending on the batch size used. This is odd, no?
This repository can be used as a starting point for deploying fastai models on Heroku.
The simple app described here is available at https://glasses-or-not.herokuapp.com/. Try it with pictures of yourself with and without glasses!
This is a quick tutorial for deploying your trained models on Heroku in just a few clicks. It comes with this template repository, which uses Jeremy Howard's Bear Classification model from lesson 2.
Images | Reduction of image channels to 3 in order to use the normal fastai Transfer Learning techniques
The notebook lesson1-pets_essential_with_xc_to_3c.ipynb (nbviewer) shows how to turn learner.py into a new file learner_xc_to_3c.py (learner x channels to 3 channels) that puts a ConvNet in a fastai cnn_learner() before a pre-trained model such as resnet (followed by a normalization by imagenet_stats).
Used as the first layer, this ConvNet transforms any image of the dataloader with n channels into an image with 3 channels. Its filters are learnt during training. Thanks to that, it is possible to keep using the fastai Transfer Learning functions even with images that have more than the 3 RGB channels, such as satellite images.
Warning: As the Oxford IIIT Pet dataset already has 3 channels per image, there is no need here to change the number of channels. We only used this dataset to create our code. It would be more interesting to apply this code to images with more than 3 channels, such as the 16-channel images of the Dstl Satellite Imagery Feature Detection.
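The channel reduction described above can be illustrated with a minimal pure-Python sketch of a 1x1 convolution, which mixes the n input channels of each pixel into 3 output channels. The 1x1 kernel size is an assumption (the text above only says "ConvNet"), and the function name, image and weights are hypothetical. In the notebook the filters are learnt during training, and in practice a layer such as `torch.nn.Conv2d` would be used instead of nested lists.

```python
def reduce_channels(image, weights, bias):
    """Apply a 1x1 convolution to an image.

    image:   list of n_in channels, each a list of rows of pixel values.
    weights: list of n_out rows, each holding n_in weights.
    bias:    list of n_out bias values.
    Returns a list of n_out channels with the same height and width.
    """
    n_in = len(image)
    height, width = len(image[0]), len(image[0][0])
    output = []
    for out_weights, out_bias in zip(weights, bias):
        # Each output pixel is a weighted sum of the same pixel in all input channels.
        channel = [
            [
                out_bias + sum(out_weights[c] * image[c][y][x] for c in range(n_in))
                for x in range(width)
            ]
            for y in range(height)
        ]
        output.append(channel)
    return output


# Hypothetical 4-channel image of 2x2 pixels; channel c is filled with c + 1.
image = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]

# Hypothetical fixed weights (they would be learnt during training).
weights = [
    [1.0, 0.0, 0.0, 0.0],  # output channel 0 copies input channel 0
    [0.0, 1.0, 0.0, 0.0],  # output channel 1 copies input channel 1
    [0.0, 0.0, 0.5, 0.5],  # output channel 2 averages input channels 2 and 3
]
bias = [0.0, 0.0, 0.0]

rgb_like = reduce_channels(image, weights, bias)
print(len(rgb_like), "channels of size", len(rgb_like[0]), "x", len(rgb_like[0][0]))
```

The 3-channel result has the shape a pre-trained model expects, which is what allows the usual Transfer Learning functions to be applied afterwards.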
Following our publication of the WikiExtractor.py file, which is platform-independent (i.e. it runs on all platforms, including Windows), we publish our nlputils2.py file, the platform-independent version of the nlputils.py file of the fastai NLP course. In addition, we have split the original methods into several smaller ones so that they can be used separately, and we have added one that cleans a text file.
[ EDIT 09/23/2019 ]
- The repository of the nlputils2.py file has changed to https://github.com/piegu/language-models
- Its new link is: https://github.com/piegu/language-models/blob/master/nlputils2.py
The extraction script WikiExtractor.py does not work when running fastai on Windows 10 because the current code of the file relies on the platform-dependent default encoding instead of forcing 'utf-8'.
Thanks to Albert Villanova del Moral, who opened the pull request "Force 'utf-8' encoding without relying on platform-dependent default" (not yet merged by the script author Giuseppe Attardi as of 31st of August, 2019), we know how to change the code. Thanks to both of them!
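The general idea behind this kind of fix can be sketched in a few lines of Python: pass `encoding="utf-8"` explicitly to `open()` instead of relying on the platform default (often cp1252 on Windows). This is an illustration only, not the actual patch applied to WikiExtractor.py. The file name and sample text are hypothetical.

```python
import os
import tempfile

# Sample text with non-ASCII characters, as found in Wikipedia dumps.
text = "São Paulo, Brasília, ação"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")

    # Explicit encoding on write: the bytes on disk are the same on every platform.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Explicit encoding on read: no dependence on the platform default.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

print(round_trip == text)
```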
Links:
- Original WikiExtractor (but not updated with platform-independent code)
- Updated WikiExtractor from Albert Villanova del Moral (UPDATED !!!)
- My file WikiExtractor.py saved here with the platform-independent code (i.e. working on all platforms and in particular on Windows)
The Hackathon Brasal/PCTec-UnB 2019 was a data marathon (May 9 and 10, 2019) that brought together students, professionals and the community, with the challenge of carrying out, in two days, a Business Intelligence project for a real client: Brasal Veículos. It took place at the CDT of the University of Brasília (UnB) in Brazil. In this context, my team developed the project "Vendedor IA" (VIA), a set of Artificial Intelligence (AI) models using Deep Learning whose principle is described in the 2 jupyter notebooks that were created:
- Data clean (vendas_veiculos_brasal_data_clean.ipynb): the notebook that prepares the sales data table used to train the VIA models.
- Regression (vendedor_IA_vendas_veiculos_brasal_REGRESSAO.ipynb): the notebook that trains the model that estimates the budget the customer is willing to spend on buying a vehicle.
The objective of the jupyter notebook MURA | Abnormality detection is to show how the fastai v1 techniques and code make it possible to build a top-level classifier in the health domain. [ NEW ] We managed to increase our kappa score in this notebook (part 2).
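For readers unfamiliar with the metric, here is a minimal pure-Python sketch of Cohen's kappa, assuming this is the kappa score referred to above. The function name and the labels are hypothetical and are not results from the notebook.

```python
from collections import Counter


def cohen_kappa(labels_true, labels_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(labels_true) != len(labels_pred) or not labels_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_true)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(t == p for t, p in zip(labels_true, labels_pred)) / n

    # Expected agreement if both labelings were independent.
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)

    if expected == 1.0:
        # Degenerate case (a single class everywhere): treated as perfect
        # agreement in this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: 1 = abnormal, 0 = normal.
labels_true = [1, 1, 1, 0, 0, 0, 1, 0]
labels_pred = [1, 1, 0, 0, 0, 1, 1, 0]
kappa = cohen_kappa(labels_true, labels_pred)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so the score is more demanding than plain accuracy when classes are imbalanced.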
[ EDIT 06/11/2019 ] This Web app is not online anymore. If you want to deploy it on Render, check the "Deploying on Render" fastai guide.
It is an image classifier that uses the Deep Learning model resnet (the resnet50 version), the architecture that won the ImageNet competition in 2015 (ILSVRC2015). It classifies an image into one of 1000 categories.
The objective of the jupyter notebook pretrained-imagenet-classifier-fastai-v1.ipynb is to use fastai v1 instead of PyTorch code in order to classify images into 1000 classes by using an ImageNet winner model.
The jupyter notebook data-augmentation-by-fastai-v1.ipynb presents the code to apply transformations on images with fastai v1.
The jupyter notebook lesson1-quick.ipynb is an exercise that was proposed on 17/04/2018 & 21/04/2018 to the participants of the Deep Learning study group of Brasilia (Brazil). Link to the thread : http://forums.fast.ai/t/deep-learning-brasilia-revisao-licoes-1-2-3-e-4/14993
The jupyter notebook lesson1-DogBreed.ipynb is an exercise that was proposed on 17/04/2018 & 21/04/2018 to the participants of the Deep Learning study group of Brasilia (Brazil). Link to the thread : http://forums.fast.ai/t/deep-learning-brasilia-revisao-licoes-1-2-3-e-4/14993
https://github.com/piegu/fastai-projects/blob/master/howto_make_predictions_on_test_set