Named for the blue morpho butterfly, Morpho menelaus. Its wings carry no pigment — the color emerges from the physical arrangement of nanoscale structures, invisible until light hits at the right angle. In the same way, the phenotypic patterns this project seeks are not present in any single clinical variable; they emerge from structural relationships in the data, visible only through the right computational lens. The word morpho (μορφή) means "form" — discovering hidden forms is the project's central aim.
A multi-phase research pipeline for psychiatric epidemiology. Covers data cleaning, descriptive statistics with publication-quality figures, deep statistical mining (EFA, clustering, change-point detection), and a multi-branch deep learning architecture for non-linear patient phenotype discovery from electronic health records.
morpho/
├── 00_raw_data/ # Raw data (tracked in internal shared mode; see GIT_VALVE.md)
├── 01_cleaned_data/ # Cleaned CSVs (tracked in internal shared mode; see GIT_VALVE.md)
├── 02_figures/ # Phase I statistical figures
├── 03_manuscript/ # Manuscript drafts
├── 04_research_framework/ # Research design documents
├── 05_process_log/ # Data cleaning audit trail
├── 06_reference/ # Reference materials
├── 07_deep_analysis/ # Phase III deep mining outputs
├── 08_ml_architecture_proposal/ # Architecture design iterations
│
├── psychnet/ # Phase IV: deep learning codebase
│ ├── data/ # Data pipeline
│ ├── models/ # Model architecture (~15.9M params)
│ ├── training/ # Training engine (AMP / early stop / W&B)
│ ├── evaluation/ # Metrics + interpretability + visualization
│ ├── scripts/train.py # Main entry: 9-stage training pipeline
│ ├── configs/default.yaml # Hyperparameter configuration
│ ├── ARCHITECTURE.md # Technical architecture (paper Figure 1)
│ └── GUIDE.md # Full operating guide
│
├── .github/workflows/ci.yml # CI: lint + smoke test + privacy check
├── LICENSE # Apache 2.0
├── CITATION.cff # Machine-readable citation
├── ETHICS.md # Ethics / IRB / de-identification
├── DATA_DICTIONARY.md # Variable definitions and coding
├── CONTRIBUTING.md # Contribution guidelines
├── CHANGELOG.md # Version history
├── Makefile # Unified task runner
└── .gitignore # Excludes all patient data
| Phase | Scope | Status |
|---|---|---|
| I. Data profiling | Cleaning, 6 publication-quality figures | ✅ Done |
| II. Organization | Standardized directory structure | ✅ Done |
| III. Deep mining | EFA, RF, clustering, change-point, catastrophic expenditure | ✅ Done |
| IV. Deep learning | Multi-branch architecture (6,876 lines) | ✅ Code complete |
| V. Training | GPU training + hyperparameter tuning | ⏳ Pending |
| VI. Manuscript | Target: top-tier journals | 📋 Planning |
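Phase III lists change-point detection without naming the method used. As a rough illustration only, a single shift in the mean can be located by choosing the split that minimizes the within-segment sum of squared errors. The function names and the toy series below are hypothetical and are not taken from the project code.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the segment mean."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_change_point(series, min_size=2):
    """Return the index k that best splits series into series[:k] and series[k:].

    Illustrative only: assumes exactly one abrupt shift in the mean.
    """
    if len(series) < 2 * min_size:
        raise ValueError("series too short for the requested min_size")
    candidates = range(min_size, len(series) - min_size + 1)
    return min(candidates, key=lambda k: sse(series[:k]) + sse(series[k:]))


# Made-up counts with a level shift starting at index 5.
toy = [10, 11, 9, 10, 10, 20, 21, 19, 20, 22]
change_index = best_single_change_point(toy)
```

Real series with several shifts, trends, or seasonality would need a more capable method than this exhaustive single-split search.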
          Patient Record (17 features)
                        │
             ┌──────────┴──────────┐
             ▼                     ▼
        DeepInsight             Feature
         Portrait              Tokenizer
          (32×32)             (17 tokens)
             │                     │
             ▼                     ▼
       ViT-Clinical         FT-Transformer    ~15.9M params
         (384-dim)             (256-dim)
             │                     │
             └──────────┬──────────┘
                        ▼
               Cross-Modal Fusion
       (Bidirectional Attention + Gating)
                        │
      ┌────────┬────────┼────────┬────────┐
      ▼        ▼        ▼        ▼        ▼
    Task1    Task2    Task3    Task4  Embedding
See psychnet/ARCHITECTURE.md for full details.
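To make the gating half of the fusion step concrete, here is a minimal standard-library sketch. It is not the psychnet implementation: the bidirectional attention is omitted, the weights are random stand-ins for learned parameters, and `FUSED_DIM = 128` and all function names are assumptions. Only the 384- and 256-dimensional branch widths come from the diagram.

```python
import math
import random


def linear(weights, vector):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def gated_fusion(image_vec, tabular_vec, w_img, w_tab, w_gate):
    """Blend two branch embeddings with an element-wise sigmoid gate."""
    img = linear(w_img, image_vec)    # 384 -> fused_dim
    tab = linear(w_tab, tabular_vec)  # 256 -> fused_dim
    # List concatenation: the gate sees both projected branches.
    gate_logits = linear(w_gate, img + tab)
    gate = [1.0 / (1.0 + math.exp(-z)) for z in gate_logits]
    return [g * a + (1.0 - g) * b for g, a, b in zip(gate, img, tab)]


rng = random.Random(42)
IMAGE_DIM, TABULAR_DIM, FUSED_DIM = 384, 256, 128  # FUSED_DIM is an assumption

image_vec = [rng.gauss(0, 1) for _ in range(IMAGE_DIM)]
tabular_vec = [rng.gauss(0, 1) for _ in range(TABULAR_DIM)]
fused = gated_fusion(
    image_vec,
    tabular_vec,
    random_matrix(FUSED_DIM, IMAGE_DIM, rng),
    random_matrix(FUSED_DIM, TABULAR_DIM, rng),
    random_matrix(FUSED_DIM, 2 * FUSED_DIM, rng),
)
```

Each output element is a convex combination of the two projected branches, so the gate decides per dimension how much the image-style view or the tabular view contributes.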
make setup # Install + verify environment
make smoke # Model forward-pass test
make train-quick # Quick training (~1 min)
make train # Full training (~1-2h on GPU)
Or directly:
cd psychnet/
python setup.py
python scripts/train.py --quick
python scripts/train.py --config configs/default.yaml
Model weights (*.pt) are stored with Git LFS, because GitHub rejects regular Git files larger than 100MB. Install Git LFS (brew install git-lfs, or see git-lfs.com), then:
make push-github
# same as: ./scripts/push_to_github.sh
This runs git lfs push, then git push --force-with-lease. The force push is needed once, after git lfs migrate has rewritten history. Upload can take several minutes on slow networks.
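Before pushing, it can help to confirm which weight files actually exceed the plain-Git limit. The helper below is a hypothetical sketch, not part of the repository's scripts. It walks a directory and reports `*.pt` files above a configurable threshold, which defaults to 100 MiB.

```python
from pathlib import Path

# GitHub rejects regular (non-LFS) files above 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(root, pattern="*.pt", limit_bytes=GITHUB_FILE_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for matching files larger than limit_bytes."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                hits.append((path, size))
    return hits
```

Calling `oversized_files("psychnet")` from the repository root would list any checkpoints that need to be tracked by Git LFS. The directory argument is only an example.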
| Aspect | Measure |
|---|---|
| Random seed | Fixed (seed: 42) |
| Config snapshot | Auto-saved per run (config.json) |
| Dependencies | Minimum versions declared in requirements.txt |
| Cleaning log | Full audit trail in 05_process_log/ |
| Data dictionary | DATA_DICTIONARY.md |
| CI | GitHub Actions: lint + smoke + privacy guard |
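The seed and config-snapshot rows can be pictured with a small standard-library sketch. It is an assumption about how such a step might look, not the project's training code: `start_run` is a hypothetical name, and a real run would also need to seed NumPy and PyTorch.

```python
import json
import random
from pathlib import Path


def start_run(config, run_dir):
    """Seed Python's RNG and write the resolved config next to the run outputs."""
    seed = config.get("seed", 42)
    random.seed(seed)
    # A real training run would also seed NumPy and PyTorch here.
    run_path = Path(run_dir)
    run_path.mkdir(parents=True, exist_ok=True)
    snapshot = run_path / "config.json"
    snapshot.write_text(json.dumps(config, indent=2, sort_keys=True))
    return snapshot
```

Writing the snapshot with sorted keys keeps diffs between runs small, which makes it easier to see which hyperparameter changed.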
Internal shared mode (current): this private repository may contain
de-identified analytical tables and run outputs so collaborators have a
full working tree. Before any public release, follow GIT_VALVE.md
to restore .gitignore exclusions and review git history.
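As one way to support the pre-release review, a script can flag tracked paths that look like patient data. The patterns and example file names below are hypothetical. The authoritative exclusion list is whatever GIT_VALVE.md and .gitignore specify, and the CI privacy check may work differently.

```python
from fnmatch import fnmatch

# Hypothetical patterns; the authoritative list belongs in GIT_VALVE.md and .gitignore.
SENSITIVE_PATTERNS = ("00_raw_data/*", "01_cleaned_data/*", "*.csv", "*.xlsx")


def sensitive_paths(tracked_paths, patterns=SENSITIVE_PATTERNS):
    """Return the tracked paths that match any sensitive pattern."""
    return [
        path
        for path in tracked_paths
        if any(fnmatch(path, pattern) for pattern in patterns)
    ]


# Made-up file names for illustration.
tracked = [
    "README.md",
    "00_raw_data/visits.csv",
    "psychnet/scripts/train.py",
    "07_deep_analysis/clusters.csv",
]
flagged = sensitive_paths(tracked)
```

In practice the input list could come from the output of `git ls-files`. A check like this covers only the current tree, so history still needs the separate review that GIT_VALVE.md calls for.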
For external readers, de-identified data remains available upon reasonable request with a signed Data Use Agreement where applicable. See ETHICS.md.
This study follows the Declaration of Helsinki and applicable data protection regulations. IRB approval, informed consent, and de-identification procedures are documented in ETHICS.md.
@software{morpho2026,
title = {Morpho},
year = {2026},
license = {Apache-2.0},
version = {4.0.0}
}
Machine-readable citation: CITATION.cff
