[D-70] Meeting Log 2
Project Sync / Status Update Summary
Model Configuration Issues
Vocabulary Size Problem: The current configuration sets the vocabulary size to 32101, but it should be 32102; the mismatch causes index errors
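The off-by-one can be illustrated with a small check: an embedding table with `vocab_size` rows accepts token ids 0 through `vocab_size - 1`, so a token id of 32101 needs a vocabulary size of at least 32102. The following is a minimal sketch; the helper name `find_out_of_range_ids` and the sample ids are hypothetical and not part of the project code.

```python
def find_out_of_range_ids(token_ids, vocab_size):
    """Return the token ids that would index past an embedding table of vocab_size rows."""
    # Valid indices are 0 .. vocab_size - 1 inclusive.
    return [t for t in token_ids if t < 0 or t >= vocab_size]


# Illustrative ids only; 32101 stands in for the highest id the tokenizer can emit (assumption).
sample_ids = [0, 17, 32100, 32101]

bad_with_old_size = find_out_of_range_ids(sample_ids, vocab_size=32101)
bad_with_new_size = find_out_of_range_ids(sample_ids, vocab_size=32102)
```

Running a check like this over a tokenized batch before training would surface the problem earlier than an index error inside the embedding lookup.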
Configuration File Requirements:
- Use a YAML configuration file instead of creating the model directly
- Load the recommended YAML file so the model is initialized properly
- Use configuration_gen instead of configuration_pegs for the generative model
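One way to keep the model settings in a single file is to parse the YAML into a dictionary and build a typed config object from it, failing loudly on unknown keys. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The `GenConfig` class and all field names and defaults other than the vocabulary size are hypothetical and do not reflect the actual contents of configuration_gen.

```python
from dataclasses import dataclass, fields


@dataclass
class GenConfig:
    # Hypothetical fields for illustration; the real configuration may differ.
    vocab_size: int = 32102
    hidden_size: int = 768
    num_layers: int = 12

    @classmethod
    def from_dict(cls, raw):
        """Build a config from a parsed YAML mapping, rejecting unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise KeyError(f"Unknown config keys: {sorted(unknown)}")
        return cls(**raw)


# Stand-in for the dict that parsing the recommended YAML file would produce.
parsed_yaml = {"vocab_size": 32102, "num_layers": 24}
config = GenConfig.from_dict(parsed_yaml)
```

Rejecting unknown keys means a typo in the YAML file fails at load time instead of silently falling back to a default.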
Model Head Implementation
- Current Status: Using the masked language modeling (MLM) head from the existing implementation
- Required Investigation: Research how model heads are implemented for decoder-only transformer models
- Decision Pending: Determine whether to modify the existing MLM head or implement a new head that follows the conventions of the Transformers library
Training Environment Setup
GPU Configuration: Code is running on GPU successfully
CUDA Version: Currently using CUDA 11.8, though the state space model was designed for CUDA 12.4
Development Environment Issues:
- VS Code remote SSH connection problems causing frequent logouts
- Workaround: Using terminal and Jupyter Notebook for development
- Jupyter Notebook can be run without GPU allocation to preserve resources
Model Training Progress
Dataset: Using the same categorized datasets (good issues followed by code)
Model Type: Switched to Mamba model configuration while maintaining the same training approach
Monitoring Requirements:
- Implement Weights & Biases (W&B) for tracking training metrics
- Plot loss, accuracy, and perplexity during training
- Generate visualizations for thesis documentation
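Perplexity does not need to be measured separately: it is the exponential of the mean per-token cross-entropy loss (in nats), so it can be derived from the loss that is already being tracked. A minimal sketch follows; the function name `summarize_step` and the metric keys are hypothetical, and the resulting dictionary is the kind of payload that could be passed to `wandb.log`.

```python
import math


def summarize_step(token_losses, num_correct, num_tokens):
    """Compute loss, accuracy, and perplexity for one logging step.

    token_losses: per-token cross-entropy values in nats.
    """
    mean_loss = sum(token_losses) / len(token_losses)
    return {
        "loss": mean_loss,
        "accuracy": num_correct / num_tokens,
        # Perplexity is exp of the mean negative log-likelihood.
        "perplexity": math.exp(mean_loss),
    }


# Illustrative numbers only, not results from the project.
metrics = summarize_step([math.log(4.0)] * 8, num_correct=2, num_tokens=8)
```

Logging all three values under fixed keys at every step keeps the later thesis plots consistent across runs.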
Thesis Timeline
Submission Deadline: Confirmed thesis submission date
Scope: Empirical study on autoregressive state space models
Approach: Focus on detailed study of current model performance rather than achieving state-of-the-art results
Action Items
- Fix vocabulary size configuration (change from 32101 to 32102)
- Implement YAML configuration file loading
- Research transformer head implementations for decoder-only models
- Set up Weights & Biases for training monitoring
- Plot training metrics (loss, accuracy, perplexity)
- Schedule next meeting for Monday (before two-week break)