
Deep Learning Exploration with AI-Ready Datasets


Objective: Evaluate students’ ability to explore and implement deep learning models for their AI-ready datasets, benchmark these models against classical machine learning methods, deliver high-quality software, and analyze results critically.


1. Dataset Preparation and Exploration (10%)

  • AI-Ready Data Utilization (4%): Uses the previously prepared AI-ready dataset effectively, ensuring consistent preprocessing across models.

  • Exploratory Data Analysis (EDA) (3%): Includes visualizations and summaries to understand data distribution, temporal/spatial features, or domain-specific nuances.

  • Problem Setup (3%): Clearly defines the problem (e.g., regression/classification) and aligns the data with deep learning requirements (e.g., reshaping for CNNs, sequence creation for RNNs); a minimal reshaping sketch follows this list.
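
For the reshaping and sequence creation mentioned in the Problem Setup item, the following is a minimal sketch, assuming a single 1-D time series stored as a NumPy array; the names `series` and `window` are placeholders, and the shapes shown reflect conventions commonly expected by RNN and 1-D CNN layers rather than a required format.

```python
import numpy as np

def make_sequences(series: np.ndarray, window: int):
    """Slice a 1-D series into overlapping (input window, next value) pairs for an RNN."""
    X, y = [], []
    for start in range(len(series) - window):
        X.append(series[start:start + window])
        y.append(series[start + window])
    X = np.stack(X)                      # shape: (n_samples, window)
    y = np.array(y)                      # shape: (n_samples,)
    return X[..., np.newaxis], y         # add a feature axis -> (n_samples, window, 1)

# Hypothetical example: 1000 time steps, 30-step input windows
series = np.random.rand(1000).astype("float32")
X_rnn, y = make_sequences(series, window=30)

# For a 1-D CNN, frameworks such as PyTorch expect (n_samples, channels, length)
X_cnn = X_rnn.transpose(0, 2, 1)
print(X_rnn.shape, X_cnn.shape, y.shape)
```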


2. Model Benchmarking Against CML (10%)

  • Baseline Models (5%): Reports results from previous classical machine learning benchmarks (e.g., random forests, SVMs, or gradient boosting) with minimal additional work.

  • Performance Comparison (5%): Provides a high-level comparison of CML methods with deep learning models using relevant metrics (e.g., accuracy, RMSE, F1-score); a small metrics helper is sketched after this list.
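
One way to keep the comparison consistent, assuming a classification or regression setup with scikit-learn-style prediction arrays, is to compute the same metrics for every model through a single helper. The prediction arrays below (`rf_preds`, `nn_preds`) are hypothetical stand-ins for your own baseline and deep learning outputs.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error

def summarize(name, y_true, y_pred, task="classification"):
    """Compute the same metrics for every model so CML and DL results are comparable."""
    if task == "classification":
        return {"model": name,
                "accuracy": accuracy_score(y_true, y_pred),
                "f1": f1_score(y_true, y_pred, average="macro")}
    return {"model": name,
            "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred)))}

# Placeholder predictions; substitute the outputs of your own models.
y_true   = np.array([0, 1, 1, 0, 1])
rf_preds = np.array([0, 1, 0, 0, 1])   # e.g., random forest baseline
nn_preds = np.array([0, 1, 1, 0, 0])   # e.g., fully connected network

results = [summarize("random_forest", y_true, rf_preds),
           summarize("fcn", y_true, nn_preds)]
for row in results:
    print(row)
```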


3. Model Architecture Exploration (35%)

  • Implementation and Justification (8%): Implements at least three deep learning architectures (e.g., FCN, CNN, RNN, U-Net). Justifies architecture choice based on dataset and problem type.

  • Parameter Tuning (8%): Explores hyperparameters (e.g., learning rate, number of layers, filter sizes) and documents experiments systematically.

  • Incorporation of Physics-Informed Loss (4%): Implements a physics-informed loss where appropriate, with a clear explanation of its relevance to the geoscientific problem; an illustrative loss sketch follows this list.

  • Innovation and Complexity (8%): Includes innovative approaches like hybrid architectures, custom loss functions, or data augmentation specific to geoscience applications.

  • Exploration and Analysis (7%): Investigates losses, activation functions, and layer design, demonstrating a strong understanding of model behavior.
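
For the physics-informed loss criterion above, the exact form depends on the governing physics of your problem; the sketch below is only an illustration, assuming a PyTorch regression model and two made-up constraints (non-negativity and conservation of a column sum). Substitute the conservation law or PDE residual that actually applies to your dataset.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

def physics_informed_loss(pred, target, inputs, weight=0.1):
    """Data-misfit term plus a penalty for violating a (hypothetical) physical constraint.

    Here the 'constraints' are that predictions stay non-negative and conserve
    the column sum of the inputs -- stand-ins for whatever conservation law or
    PDE residual applies to your problem.
    """
    data_loss = mse(pred, target)
    nonneg_penalty = torch.relu(-pred).mean()                            # penalize negative predictions
    balance_penalty = (pred.sum(dim=1) - inputs.sum(dim=1)).pow(2).mean()  # penalize violating the balance
    return data_loss + weight * (nonneg_penalty + balance_penalty)

# Placeholder tensors just to show the call signature.
pred   = torch.rand(8, 4, requires_grad=True)
target = torch.rand(8, 4)
inputs = torch.rand(8, 4)
loss = physics_informed_loss(pred, target, inputs)
loss.backward()
```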


4. Performance Evaluation (20%)

  • Quantitative Evaluation (6%): Provides comprehensive metrics for all models, including accuracy, precision, recall, F1, RMSE, or domain-specific measures.

  • Generalization Testing (7%): Evaluates model performance on unseen or out-of-distribution data and discusses overfitting or underfitting tendencies.

  • Discussion on Narrow vs. General AI (4%): Reflects on the role of the implemented models as narrow AI and contrasts this with the broader concept of general AI, tying the discussion to the problem domain and dataset.

  • Visualization of Results (3%): Uses visualizations such as confusion matrices, ROC curves, loss-vs-epoch plots, or spatial/temporal error maps; see the plotting sketch after this list.
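
A minimal sketch of the kind of visualization expected, assuming a classification task and that per-epoch losses were recorded during training; `train_losses`, `val_losses`, and the label arrays are placeholders for your own results.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay

# Placeholder histories and predictions; replace with your recorded values.
train_losses = np.linspace(1.0, 0.2, 20)
val_losses   = np.linspace(1.1, 0.4, 20)
y_true = np.array([0, 1, 1, 0, 2, 2, 1, 0])
y_pred = np.array([0, 1, 0, 0, 2, 1, 1, 0])

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Loss vs. epoch: a widening train/validation gap suggests overfitting.
axes[0].plot(train_losses, label="train")
axes[0].plot(val_losses, label="validation")
axes[0].set_xlabel("epoch")
axes[0].set_ylabel("loss")
axes[0].legend()

# Confusion matrix on held-out (ideally out-of-distribution) data.
ConfusionMatrixDisplay.from_predictions(y_true, y_pred, ax=axes[1])

plt.tight_layout()
plt.show()
```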


5. Software Delivery and Code Quality (20%)

  • Standard Practice for Training Neural Networks (10%):

    • Code is modular and organized in a single notebook.

    • Includes components such as a Dataset, a DataLoader, the model defined as a class, a training function, and a training loop (see the skeleton at the end of this section).

    • Explores training parameters and visualizes learning curves.

  • Saving Results (5%):

    • Saves model weights, training logs, and performance metrics to a CSV/JSON file.

  • Code Quality and Documentation (5%):

    • Follows best practices for readability, commenting, and modularity, ensuring reproducibility.
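
The components listed under Standard Practice could be organized roughly as in the PyTorch-style skeleton below; it is a sketch rather than a required template, and the dataset, architecture, hyperparameters, and file names are placeholders. The last few lines also show one way to satisfy the Saving Results criterion by writing weights and a per-epoch training log to disk.

```python
import csv
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class ArrayDataset(Dataset):
    """Wraps pre-loaded feature/target arrays from the AI-ready dataset."""
    def __init__(self, X, y):
        self.X = torch.as_tensor(X, dtype=torch.float32)
        self.y = torch.as_tensor(y, dtype=torch.float32)
    def __len__(self):
        return len(self.X)
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

class FCN(nn.Module):
    """Small fully connected network; replace with your own architectures."""
    def __init__(self, n_in, n_hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU(),
                                 nn.Linear(n_hidden, 1))
    def forward(self, x):
        return self.net(x)

def train(model, loader, epochs=10, lr=1e-3):
    """Basic training loop that returns the per-epoch loss history."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    history = []
    for epoch in range(epochs):
        running = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb).squeeze(-1), yb)
            loss.backward()
            optimizer.step()
            running += loss.item() * len(xb)
        history.append(running / len(loader.dataset))
    return history

# Placeholder data; substitute your own AI-ready arrays.
X, y = torch.rand(256, 8), torch.rand(256)
loader = DataLoader(ArrayDataset(X, y), batch_size=32, shuffle=True)
model = FCN(n_in=8)
history = train(model, loader)

# Save weights and the training log for later comparison across models.
torch.save(model.state_dict(), "fcn_weights.pt")
with open("fcn_training_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["epoch", "train_loss"])
    writer.writerows(enumerate(history, start=1))
```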


6. Reporting and Interpretation (5%)

  • Scientific Communication (3%): Presents results clearly and concisely in a well-structured report or notebook, with appropriate figures, tables, and explanations.

  • Domain Insights (2%): Discusses implications of findings for geoscience, such as physical relevance, data limitations, or potential for real-world applications.


7. Ethical and Computational Considerations (5%)

  • Computational Efficiency (3%): Documents computational costs (e.g., training time, memory usage) and discusses their impact on model choice; a brief timing sketch follows this list.

  • Ethical Considerations (2%): Reflects on ethical implications, including biases in data, transparency of model predictions, and alignment with societal goals.
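
For documenting computational costs, one lightweight approach, assuming PyTorch and an optional CUDA device, is to wrap training in a helper that records wall-clock time and peak GPU memory; the `train` function referenced in the usage comment stands in for whatever training function you already have.

```python
import time
import torch

def timed_training(train_fn, *args, **kwargs):
    """Run a training function while recording wall-clock time and peak GPU memory."""
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    result = train_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    peak_mb = (torch.cuda.max_memory_allocated() / 1e6
               if torch.cuda.is_available() else float("nan"))
    print(f"training time: {elapsed:.1f} s, peak GPU memory: {peak_mb:.1f} MB")
    return result

# Usage (hypothetical): history = timed_training(train, model, loader, epochs=10)
```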


Total: 100%
