Choose and work on one out of the six distinct datasets provided.
You can assume that they are correctly labelled and cleaned.
Detailed descriptions are provided in the README.md files of each challenge.
- Bike Deployment
- Fashion MNIST Classification
- Cryptocurrency Trading
- Volatility Pairs Trading
- Real Estate Housing
- Toxic Comment Classification
The submissions are assessed for their:
Briefly summarise the goal of the challenge of your choice and provide an abstract of your analysis.
Succinctness is always welcome! If you can show something with 3 lines of math or use well-accepted terminology to explain something concisely, please do so! Long texts are, in general, not preferred.
Descriptions of the datasets are intentionally not provided, so that you research them yourself and gain insight into the raw data.
Any type of meaningful visualization and basic statistical analysis of the data is highly encouraged. Do not spend too much time on this; you should be able to get it done with just a few lines of code.
Please avoid spending lots of time trying to produce the perfect looking graph. Simple ones will suffice.
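As a rough illustration of "a few lines of code", the sketch below prints summary statistics and saves one histogram per numeric column. It assumes `pandas`, `numpy` and `matplotlib` are installed. The DataFrame and its column names (`feature_a`, `feature_b`) are synthetic placeholders, not part of any challenge dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic placeholder data (an assumption for illustration only);
# in practice, load the dataset of the challenge you picked instead.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(loc=0.0, scale=1.0, size=200),
    "feature_b": rng.exponential(scale=2.0, size=200),
})

summary = df.describe()    # count, mean, std, min, quartiles and max per column
missing = df.isna().sum()  # number of missing values per column
print(summary)
print(missing)

df.hist(bins=20, figsize=(8, 3))  # one simple histogram per numeric column
plt.tight_layout()
plt.savefig("histograms.png")
```

A plain histogram or scatter plot like this is usually enough to motivate the modelling assumptions that follow.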
Clearly state your assumptions before fitting any model. This will, hopefully, lead you to better support your model theoretically and also help us assess your model selection and approach.
Provide insight into the suggested model to address the problem.
Using `sklearn`'s API `fit`, `predict`, `score` methods is expected but not the purpose of these challenges. So just coding a model is not enough!
We are interested in how you made the decisions for coming up with your model.
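One possible way to make a modelling decision visible is to report the chosen model next to a trivial baseline, using the same `fit`, `predict` and `score` calls. The sketch below assumes `scikit-learn` is installed and uses synthetic data from `make_classification`. The choice of `LogisticRegression` is purely illustrative, not a recommendation for any particular challenge.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a challenge dataset (assumption for illustration only).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline: always predicts the most frequent class seen during training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Candidate model; the write-up should state why a linear decision boundary is plausible.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = model.predict(X_test)
baseline_acc = baseline.score(X_test, y_test)  # mean accuracy on held-out data
model_acc = model.score(X_test, y_test)
print(f"baseline accuracy: {baseline_acc:.3f}, model accuracy: {model_acc:.3f}")
```

The numbers alone are not the point. What matters is the explanation of why the candidate was chosen and what the gap to the baseline suggests.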
As far as model selection and validation go, we do not expect you to perform grid cross-validation for everything, since this is computationally demanding. We have relaxed the requirements for the optimality of the final model, as long as there is a systematic approach. On the other hand, we do not trust oracles, so if you plan to hardcode model hyperparameters you had better come up with good reasoning!
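A cheap but still systematic alternative to an exhaustive grid is a small randomized search with a fixed budget and seed. The sketch below is one such option, assuming `scikit-learn` and `scipy` are installed. The estimator, the search range for `C`, and the budget of 10 candidates are illustrative assumptions, not requirements of the challenge.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for a challenge dataset (assumption for illustration only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

search = RandomizedSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    # Sample the regularisation strength on a log scale instead of a dense grid.
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=10,       # fixed budget: only 10 candidates are evaluated
    cv=3,            # 3-fold cross-validation per candidate
    random_state=0,  # makes the sampled candidates reproducible
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Reporting the search space, the budget and the seed alongside the selected value is what keeps the result from being an "oracle" hyperparameter.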
Note: Model accuracy is not the main objective of these challenges. State-of-the-art models can be found online, but if they are not properly presented and reasoned, those solutions will be considered of inferior quality compared to less accurate but well-developed and well-explained solutions.
Compress your solution and send us the `.zip` or `.tar` file via email.
Format the email subject according to:
ADST Challenge 20-21: <challenge_name>_<full_name>
where:
- `<challenge_name>`: the name of the project given above
- `<full_name>`: your full legal name, same as the one stated in your Google Form submission
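The packaging and subject-line steps above can be sketched with the Python standard library alone. The helper names `build_subject` and `compress_solution`, the placeholder participant name "Jane Doe", and the temporary directory layout are all hypothetical and serve only as illustration.

```python
import os
import shutil
import tempfile


def build_subject(challenge_name: str, full_name: str) -> str:
    """Format the email subject as 'ADST Challenge 20-21: <challenge_name>_<full_name>'."""
    return f"ADST Challenge 20-21: {challenge_name}_{full_name}"


def compress_solution(solution_dir: str, out_base: str) -> str:
    """Zip the contents of solution_dir into '<out_base>.zip' and return the archive path."""
    return shutil.make_archive(out_base, "zip", root_dir=solution_dir)


# Demo on a throwaway directory with a placeholder file (illustration only).
with tempfile.TemporaryDirectory() as tmp:
    solution_dir = os.path.join(tmp, "solution")
    os.makedirs(solution_dir)
    with open(os.path.join(solution_dir, "report.txt"), "w") as fh:
        fh.write("placeholder")
    archive_path = compress_solution(solution_dir, os.path.join(tmp, "submission"))
    archive_exists = os.path.isfile(archive_path)

subject = build_subject("Fashion MNIST Classification", "Jane Doe")
print(subject)
```

Passing `"tar"` instead of `"zip"` as the format argument of `shutil.make_archive` produces a `.tar` file.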
Note that you must be a member of the ICDSS society for your submission to be accepted.
You are allowed to use any programming language, as long as you provide us with an automated way to set up our environment for reproducing your results. You should expect us to use Ubuntu >18.04 LTS for this purpose. If you have some particular preference between the two please specify it in your submission.
You can choose to submit:
- a PDF report accompanied by documented scripts for reproducing the results. In this case we expect a consistent method-level documentation convention, so that we can follow the code alongside the report. There is no preferred structure for the report; it can also look like a presentation.
Note: Only quality matters. The challenge can be successfully addressed and reported with just a handful of Jupyter cells, slides or report pages.
Please fill in your details and upload your resume through this jotform
We will not answer personal emails; please use issues to ask questions!
Requests for deadline extensions will be ignored.