[D-70] Meeting Log 2

Posted Dec 17, 2025

By Yoojin Kim

Project Sync / Status Update Summary

Model Configuration Issues

Vocabulary Size Problem: The current configuration sets the vocab size to 32101 instead of 32102, and this off-by-one value causes index errors
Configuration File Requirements:
- Use a YAML configuration file instead of creating the model directly in code
- Load the recommended YAML file so the model is initialized properly
- Use configuration_gen instead of configuration_pegs for the generative model
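
The off-by-one problem above can be caught before training with a small check. This is a minimal sketch, not the project's code: it assumes the index errors come from a token id of 32101 reaching an embedding table that only has rows 0 to 32100. The helper names and the sample ids are hypothetical.

```python
def required_vocab_size(token_ids):
    """Smallest table size that can index every id (ids are zero-based)."""
    return max(token_ids) + 1


def check_vocab_size(configured_size, token_ids):
    """Raise early if any token id would fall outside the embedding table."""
    needed = required_vocab_size(token_ids)
    if configured_size < needed:
        raise ValueError(
            f"vocab_size={configured_size} is too small: largest token id is "
            f"{needed - 1}, so at least {needed} rows are needed"
        )
    return configured_size


# Hypothetical batch whose largest id is 32101 (an assumption for illustration).
sample_ids = [0, 17, 2045, 32101]

check_vocab_size(32102, sample_ids)  # passes silently
try:
    check_vocab_size(32101, sample_ids)  # the misconfigured value
except ValueError as err:
    print(err)
```

Running a check like this on the value read from the YAML file, before the model is built, turns a late index error into an immediate and readable one.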

Model Head Implementation

Current Status: Using the MLM (masked language modeling) head from the existing implementation
Required Investigation: Research how output heads are implemented for decoder-only models in the transformer library
Decision Pending: Decide whether to modify the existing MLM head or implement a new head that follows the transformer library's conventions
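
As background for this decision, the sketch below contrasts how training targets are typically built for a masked-LM objective and for a decoder-only (next-token) objective. It is an illustration under assumptions, not the existing implementation: the function names and token ids are hypothetical, and -100 is used as the ignore label because that is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE = -100  # default ignore_index of PyTorch's cross-entropy loss


def mlm_pairs(token_ids, masked_positions, mask_id):
    """Masked LM: hide some tokens and predict only those positions."""
    inputs = [mask_id if i in masked_positions else t
              for i, t in enumerate(token_ids)]
    targets = [t if i in masked_positions else IGNORE
               for i, t in enumerate(token_ids)]
    return inputs, targets


def next_token_pairs(token_ids):
    """Decoder-only LM: every position predicts the token that follows it."""
    return token_ids[:-1], token_ids[1:]


tokens = [5, 6, 7, 8]                          # hypothetical token ids
print(mlm_pairs(tokens, {1}, mask_id=99))      # ([5, 99, 7, 8], [-100, 6, -100, -100])
print(next_token_pairs(tokens))                # ([5, 6, 7], [6, 7, 8])
```

In both cases the head itself can be a projection from hidden states to vocabulary logits; what differs is which positions contribute to the loss and how the labels line up with the inputs.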

Training Environment Setup

GPU Configuration: Code is running on GPU successfully
CUDA Version: Currently using CUDA 11.8, though the state space model was designed for CUDA 12.4
Development Environment Issues:
- VS Code remote SSH connection problems causing frequent logouts
- Workaround: Using terminal and Jupyter Notebook for development
- Jupyter Notebook can be run without GPU allocation to preserve resources

Model Training Progress

Dataset: Using the same categorized datasets (good issues followed by code)
Model Type: Switched to Mamba model configuration while maintaining the same training approach
Monitoring Requirements:
- Implement Weights & Biases (W&B) for tracking training metrics
- Plot loss, accuracy, and perplexity during training
- Generate visualizations for thesis documentation
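
The three metrics listed above are closely related: perplexity is the exponential of the mean per-token cross-entropy loss. The sketch below shows one way to compute them together for a logging step. The function name, the metric keys, and the example numbers are hypothetical, and it assumes the loss is a natural-log cross-entropy summed over tokens.

```python
import math


def summarize_step(total_loss, correct_tokens, total_tokens):
    """Turn summed per-token statistics into loss, accuracy, and perplexity."""
    mean_loss = total_loss / total_tokens
    return {
        "train/loss": mean_loss,
        "train/accuracy": correct_tokens / total_tokens,
        "train/perplexity": math.exp(mean_loss),  # exp of mean cross-entropy
    }


# Hypothetical numbers: 10 tokens, 5 predicted correctly,
# and a mean loss of ln(4), which corresponds to a perplexity of about 4.
metrics = summarize_step(10 * math.log(4), correct_tokens=5, total_tokens=10)
print(metrics)
```

With W&B, a dictionary like this can be passed to `wandb.log(metrics, step=step)` after a run has been started with `wandb.init(...)`; the project and run names are left open here because the notes do not specify them.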

Thesis Timeline

Submission Deadline: Confirmed thesis submission date
Scope: Empirical study on autoregressive state space models
Approach: Focus on detailed study of current model performance rather than achieving state-of-the-art results

Action Items

Fix vocabulary size configuration (change from 32101 to 32102)
Implement YAML configuration file loading
Research transformer head implementations for decoder-only models
Set up Weights & Biases for training monitoring
Plot training metrics (loss, accuracy, perplexity)
Schedule next meeting for Monday (before two-week break)

This post is licensed under CC BY 4.0 by the author.