[D-70] Meeting Log 2

Project Sync / Status Update Summary

Model Configuration Issues

  • Vocabulary Size Problem: The current configuration sets the vocab size to 32101 instead of 32102, which causes index errors

  • Configuration File Requirements:

    • Use a YAML configuration file instead of creating the model directly
    • Load the recommended YAML file so the model is initialized properly
    • Use configuration_gen instead of configuration_pegs for the generative model
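
The off-by-one in the vocab size can be caught before training by checking that every token ID fits inside the embedding table. The following is a minimal sketch, not the project's code: `check_vocab_size` and the sample IDs are hypothetical, and it assumes token IDs are zero-based, so the largest valid ID is `vocab_size - 1`.

```python
def check_vocab_size(vocab_size, token_ids):
    """Raise ValueError if any token ID would index past the embedding table."""
    max_id = max(token_ids)
    if max_id >= vocab_size:
        raise ValueError(
            f"token id {max_id} needs vocab_size >= {max_id + 1}, got {vocab_size}"
        )
    return max_id


# Hypothetical batch whose largest ID is 32101 (the last of 32102 entries).
sample_ids = [0, 17, 32101]

check_vocab_size(32102, sample_ids)      # passes: IDs 0..32101 are valid
try:
    check_vocab_size(32101, sample_ids)  # fails: ID 32101 is out of range
except ValueError as err:
    print(err)
```

Running a check like this on the values read from the YAML file would turn a mid-training index error into an immediate, readable failure.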

Model Head Implementation

  • Current Status: Using MLM head from the existing implementation
  • Required Investigation: Need to research how transformer heads are implemented for decoder-only models
  • Decision Pending: Determine whether to modify the existing MLM head or implement a new head based on transformer library standards
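
One point that may help with this decision: an MLM head is trained only on masked positions, whereas a decoder-only (causal) head is trained to predict the next token at every position, so inputs and labels are offset by one. The sketch below illustrates only that difference in training targets, in plain Python. The helper names, the toy token IDs, and the mask ID of 0 are all hypothetical.

```python
def causal_lm_pairs(token_ids):
    """Pair each position's input token with the next token it should predict."""
    return list(zip(token_ids[:-1], token_ids[1:]))


def mlm_pairs(token_ids, masked_positions, mask_id):
    """Return the corrupted input and a {position: original token} target map."""
    corrupted = [
        mask_id if i in masked_positions else tok
        for i, tok in enumerate(token_ids)
    ]
    targets = {i: token_ids[i] for i in masked_positions}
    return corrupted, targets


tokens = [5, 8, 13, 21]  # toy sequence, not real vocabulary IDs

# Causal objective: every position except the last has a target.
print(causal_lm_pairs(tokens))  # [(5, 8), (8, 13), (13, 21)]

# Masked objective: only positions 1 and 3 contribute to the loss.
corrupted, targets = mlm_pairs(tokens, {1, 3}, mask_id=0)
print(corrupted, targets)
```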

Training Environment Setup

  • GPU Configuration: Code is running on GPU successfully

  • CUDA Version: Currently using CUDA 11.8, though the spectral states model was designed for CUDA 12.4

  • Development Environment Issues:

    • VS Code remote SSH connection problems causing frequent logouts
    • Workaround: Using terminal and Jupyter Notebook for development
    • Jupyter Notebook can be run without GPU allocation to preserve resources

Model Training Progress

  • Dataset: Using the same categorized datasets (good issues followed by code)

  • Model Type: Switched to Mamba model configuration while maintaining the same training approach

  • Monitoring Requirements:

    • Implement Weights & Biases (W&B) for tracking training metrics
    • Plot loss, accuracy, and perplexity during training
    • Generate visualizations for thesis documentation
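
The three metrics are closely related: perplexity is the exponential of the mean cross-entropy loss, and accuracy is the fraction of tokens predicted correctly. A minimal sketch of computing them for one step follows. It assumes the loss is a natural-log cross-entropy averaged over tokens, and `summarize_step` and its inputs are hypothetical names, not part of the project.

```python
import math


def summarize_step(token_nlls, predicted_ids, target_ids):
    """Compute mean loss, perplexity, and token accuracy for one step."""
    loss = sum(token_nlls) / len(token_nlls)
    correct = sum(p == t for p, t in zip(predicted_ids, target_ids))
    return {
        "loss": loss,
        "perplexity": math.exp(loss),  # exp of the mean negative log-likelihood
        "accuracy": correct / len(target_ids),
    }


# Toy values: two tokens, one predicted correctly.
metrics = summarize_step(
    token_nlls=[0.0, math.log(4.0)],
    predicted_ids=[3, 7],
    target_ids=[3, 9],
)
print(metrics)
```

A dictionary of this shape is what `wandb.log` accepts, so the same values could be sent to W&B at each logging step and later exported as plots for the thesis.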

Thesis Timeline

  • Submission Deadline: The thesis submission date was confirmed

  • Scope: Empirical study on autoregressive state space models

  • Approach: Focus on detailed study of current model performance rather than achieving state-of-the-art results

Action Items

  • Fix vocabulary size configuration (change from 32101 to 32102)

  • Implement YAML configuration file loading

  • Research transformer head implementations for decoder-only models

  • Set up Weights & Biases for training monitoring

  • Plot training metrics (loss, accuracy, perplexity)

  • Schedule next meeting for Monday (before two-week break)

This post is licensed under CC BY 4.0 by the author.