- Overview
- DCLI Practicum 2018 Mini Project Guide
- Description
- Defined Question
- Data Pipeline Process
- Final Data Insight
- Cleaned Dataset
- Project Documentation
- Presentation
- Wednesday, 29th August, 2018
- Time: 09:30 – 13:00
The DCLI Practicum 2018 Group Mini Project is a key component of this year’s practicum offering and skills building. It is designed to allow participating fellows to deepen the skills they were exposed to during the first 3 weeks of the practicum through a collaborative, hands-on project. The emphasis will be on skills acquired from working with data, such as using the School of Data Pipeline, demonstrated through the concepts, techniques and tools the fellows have been exposed to.
This year’s practicum will focus on gender-related data to allow the fellows to gain some experience working with this kind of data as well as leverage the expertise of their host organisations. Gender-related data is also a main focus of the Sustainable Development Goals (SDGs), which makes it vital for DCLI Practicum fellows to understand the opportunities and challenges that may exist when working on data-related issues.
Fellows will present their group mini project on the final day of the DCLI Practicum at the Final Team Presentation on 30th August, 2018. The project will be evaluated on the following:
- Defined Question/Hypothesis: a clear and specific definition of a gender data related question or hypothesis to be explored along the data pipeline.
- Data Pipeline: a demonstration of familiarity and understanding of the Find, Get, Verify, Clean and Analyse stages of the data pipeline including a demonstration of the right tools to use for each stage.
- Final Data Insight: a presentation of the final insight(s) that supports or refutes the question or hypothesis defined for the project.
- Cleaned Dataset: a final copy of the verified and cleaned machine-readable dataset that can be accessed publicly.
- Project Documentation: a documentation of their project work outlining the data pipeline process.
- Collaboration: a clear demonstration of team work through the use of collaboration tools and each team member’s show of a relative mastery of the fundamentals of the data pipeline.
- Presentation: a final group presentation summarising the key takeaways of the mini project.
This is the first step of the mini project, where the team agrees on a question or hypothesis they would like answered through a final insight obtained from data. This question should be focused on gender-related data and Africa. The team can choose to work on a specific sector, community, country or region within Africa. Sub-questions or hypotheses can be included at this stage to help clarify areas of interest or identify relevant datasets during the subsequent Find and Get stages of the data pipeline.
This section lists the steps taken for the Find, Get, Verify, Clean and Analyse stages. This section should detail what the team did including initial actions and their final results. It should also list any challenges faced, actions taken to overcome them and any best practices the team identified. It is crucial that the specific tools and techniques used in these stages are also included for reference and reproducibility.
This section can be written as prose or presented as a table, as shown below, for each stage:
Stage | |
---|---|
Tasks | |
Actions Taken | |
Tools/Techniques | |
Results | |
Challenges | |
Lessons | |
Best Practices | |
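As an illustration of what the Verify and Clean stages might look like in practice, the sketch below uses only Python's standard library on a few made-up rows. The column names (`country`, `sex`, `enrolment`), the values and the cleaning rules are all hypothetical placeholders, not real data or a prescribed method; a team would replace them with its own.

```python
import csv
import io

# Hypothetical raw extract; the values are invented for illustration only.
RAW = """country,sex,enrolment
Ghana,F,1200
Ghana,female,
ghana ,M,1100
Kenya,Male,n/a
"""

# Assumed mapping from the inconsistent labels in the raw data to clean ones.
SEX_LABELS = {"f": "Female", "female": "Female", "m": "Male", "male": "Male"}


def clean_rows(text):
    """Return (cleaned_rows, rejected_rows) from CSV text."""
    cleaned, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Clean: trim whitespace and normalise capitalisation and labels.
        country = row["country"].strip().title()
        sex = SEX_LABELS.get(row["sex"].strip().lower())
        value = row["enrolment"].strip()
        # Verify: keep only rows with a known sex label and a whole-number count.
        if sex is None or not value.isdigit():
            rejected.append(row)
            continue
        cleaned.append({"country": country, "sex": sex, "enrolment": int(value)})
    return cleaned, rejected


cleaned, rejected = clean_rows(RAW)
print(len(cleaned), "rows kept;", len(rejected), "rows set aside for review")
```

Keeping the rejected rows, rather than silently dropping them, gives the team something concrete to record under Challenges and Lessons for the stage.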
The output for this section is the final insight that answers the question or hypothesis from the Define stage. This should be a one-page or one-slide infographic, plot or table with all the details required to communicate the final insight. This section should leverage the visual design principles discussed during the practicum, including audience knowledge, context and interests as well as simplicity of delivery.
Elements that will be assessed include:
- An appropriate, stand-alone title
- Visual design choices (colour, positioning, font, etc.)
- Appropriate choice of data representation through plots
- References
Special attention should be paid to making the output one that can be understood on its own without requiring additional explanation from the authors or research by the reader.
Reproducibility in data work is key to driving insights generated by project authors. To encourage this, the team should upload a clean copy of the machine-readable dataset(s) used for their analysis and presentation. This can be uploaded to an online platform such as Google Drive, DropBox, an Amazon S3 bucket, GitHub or DataHub.io. Uploaded datasets must be accessible to anyone with an internet connection without requiring a password. Datasets should also include any relevant metadata, such as descriptions of variable names, units and any reference to the methodologies used for data collection.
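One possible way to package a cleaned dataset together with its metadata is sketched below in Python: the data goes into a CSV file and a small JSON data dictionary sits next to it. The file names, column names, units and values are hypothetical placeholders, and the layout is only an assumption about how a team might organise its upload, not a required format.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical cleaned rows; the values are invented for illustration only.
rows = [
    {"country": "Ghana", "sex": "Female", "enrolment": 1200},
    {"country": "Ghana", "sex": "Male", "enrolment": 1100},
]

# Data dictionary: one entry per column, with a description and a unit.
metadata = {
    "title": "Example cleaned dataset (placeholder)",
    "methodology": "Describe how the source data was collected here.",
    "columns": {
        "country": {"description": "Country name", "unit": None},
        "sex": {"description": "Sex of the group counted", "unit": None},
        "enrolment": {"description": "Number of students enrolled", "unit": "persons"},
    },
}


def write_dataset(folder, rows, metadata):
    """Write rows to CSV and metadata to JSON, after checking they agree."""
    columns = list(metadata["columns"])
    for row in rows:
        # Every row must have exactly the documented columns, in order.
        if list(row) != columns:
            raise ValueError("row columns do not match the data dictionary")
    folder = Path(folder)
    data_path = folder / "cleaned_dataset.csv"
    meta_path = folder / "cleaned_dataset.metadata.json"
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path


# A temporary folder stands in for wherever the team keeps its upload files.
out_dir = tempfile.mkdtemp()
data_path, meta_path = write_dataset(out_dir, rows, metadata)
print("wrote", data_path.name, "and", meta_path.name)
```

Checking the rows against the data dictionary before writing means an undocumented column is caught before the files are shared.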
This is a major part of the project and requires a record of the processes and steps that fellows took to complete it. If the sections above are followed, fellows should be able to detail their data process in a document. The project documentation will serve as a central repository of all the decisions and outputs that were involved in producing the final insight. Failure or unexpected insights are common in the majority of data projects. The documentation process allows us to capture expected and unexpected outputs for future reference. This reference serves as a vital resource for future repetition, reproduction or advancement of the work done.
Documentation can be done using word processing or text editing tools such as Microsoft Word, Google Docs or GitHub. We highly encourage using a tool that allows for seamless collaboration and a record of changes made.
Finally, fellows will deliver a presentation of their mini project at the final event on Thursday, 30th August, 2018. The presentation should be a summary of the main takeaways from the mini project. It should highlight key insights from the 7 stages of the data pipeline, challenges, best practices and recommendations for future work.
This presentation should not be more than 15 slides with a focus on key takeaways and areas of intrigue that will drive the audience to delve more into the detailed resources later on. Each fellow should speak about some aspect of the work done during the presentation and be ready to answer questions from the audience. The team should aim for a 10-15 min presentation with 5 mins for questions from the audience.