Jah-yee/morpho

Morpho

Named for the blue morpho butterfly, Morpho menelaus. Its wings carry no pigment — the color emerges from the physical arrangement of nanoscale structures, invisible until light hits at the right angle. In the same way, the phenotypic patterns this project seeks are not present in any single clinical variable; they emerge from structural relationships in the data, visible only through the right computational lens. The word morpho (μορφή) means "form" — discovering hidden forms is the project's central aim.


Overview

Morpho is a multi-phase research pipeline for psychiatric epidemiology. It covers data cleaning, descriptive statistics with publication-quality figures, deep statistical mining (exploratory factor analysis (EFA), clustering, change-point detection), and a multi-branch deep learning architecture for discovering non-linear patient phenotypes from electronic health records.


Repository Structure

morpho/
├── 00_raw_data/                    # Raw data (tracked in internal shared mode; see GIT_VALVE.md)
├── 01_cleaned_data/                # Cleaned CSVs (same)
├── 02_figures/                     # Phase I statistical figures
├── 03_manuscript/                  # Manuscript drafts
├── 04_research_framework/          # Research design documents
├── 05_process_log/                 # Data cleaning audit trail
├── 06_reference/                   # Reference materials
├── 07_deep_analysis/               # Phase III deep mining outputs
├── 08_ml_architecture_proposal/    # Architecture design iterations
│
├── psychnet/                       # Phase IV: deep learning codebase
│   ├── data/                       #   Data pipeline
│   ├── models/                     #   Model architecture (~15.9M params)
│   ├── training/                   #   Training engine (AMP / early stop / W&B)
│   ├── evaluation/                 #   Metrics + interpretability + visualization
│   ├── scripts/train.py            #   Main entry: 9-stage training pipeline
│   ├── configs/default.yaml        #   Hyperparameter configuration
│   ├── ARCHITECTURE.md             #   Technical architecture (paper Figure 1)
│   └── GUIDE.md                    #   Full operating guide
│
├── .github/workflows/ci.yml        # CI: lint + smoke test + privacy check
├── LICENSE                          # Apache 2.0
├── CITATION.cff                     # Machine-readable citation
├── ETHICS.md                        # Ethics / IRB / de-identification
├── DATA_DICTIONARY.md               # Variable definitions and coding
├── CONTRIBUTING.md                  # Contribution guidelines
├── CHANGELOG.md                     # Version history
├── Makefile                         # Unified task runner
└── .gitignore                       # Excludes all patient data

Phases

| Phase | Scope | Status |
| --- | --- | --- |
| I. Data profiling | Cleaning, 6 publication-quality figures | ✅ Done |
| II. Organization | Standardized directory structure | ✅ Done |
| III. Deep mining | EFA, RF, clustering, change-point, catastrophic expenditure | ✅ Done |
| IV. Deep learning | Multi-branch architecture (6,876 lines) | ✅ Code complete |
| V. Training | GPU training + hyperparameter tuning | ⏳ Pending |
| VI. Manuscript | Target: top-tier journals | 📋 Planning |

Architecture

  Patient Record (17 features)
         │
    ┌────┴────┐
    ▼         ▼
DeepInsight   Feature
 Portrait    Tokenizer
 (32×32)    (17 tokens)
    │         │
    ▼         ▼
ViT-Clinical  FT-Transformer        ~15.9M params
 (384-dim)    (256-dim)
    │         │
    └────┬────┘
         ▼
  Cross-Modal Fusion
  (Bidirectional Attention + Gating)
         │
    ┌────┼────┬────┬────┐
    ▼    ▼    ▼    ▼    ▼
  Task1 Task2 Task3 Task4 Embedding

See psychnet/ARCHITECTURE.md for full details.
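
The "DeepInsight Portrait (32×32)" step in the diagram turns a 17-feature record into an image. The sketch below shows only the final rasterization idea and is not the project's implementation: it assumes each feature already has a 2-D coordinate in the unit square (DeepInsight typically derives these from a projection such as t-SNE or kernel PCA), and the function name `rasterize` and the diagonal layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def rasterize(values: Sequence[float],
              coords: Sequence[Tuple[float, float]],
              size: int = 32) -> List[List[float]]:
    """Place each feature value at its 2-D coordinate on a size x size grid.

    Coordinates are assumed to lie in [0, 1] x [0, 1]. Features that land
    on the same pixel are averaged; empty pixels stay at 0.0.
    """
    if len(values) != len(coords):
        raise ValueError("need exactly one coordinate per feature")
    sums = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for value, (x, y) in zip(values, coords):
        col = min(int(x * size), size - 1)  # clamp x == 1.0 to the last column
        row = min(int(y * size), size - 1)  # clamp y == 1.0 to the last row
        sums[row][col] += value
        counts[row][col] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(size)]
        for r in range(size)
    ]


# Hypothetical layout: 17 features spread along the image diagonal.
coords = [(i / 17, i / 17) for i in range(17)]
record = [float(i + 1) for i in range(17)]
portrait = rasterize(record, coords)
```

With 17 features on a 32×32 grid most pixels stay empty, which is one reason the image branch is paired with a token-based branch that sees every feature directly.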


Quick Start

make setup              # Install + verify environment
make smoke              # Model forward-pass test
make train-quick        # Quick training (~1 min)
make train              # Full training (~1-2h on GPU)

Or directly:

cd psychnet/
python setup.py
python scripts/train.py --quick
python scripts/train.py --config configs/default.yaml
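
The two flags used above could be wired up roughly as follows. This is a hypothetical sketch of an entry point, not the contents of scripts/train.py: the default config path and the help texts are assumptions, and only the flag names `--quick` and `--config` come from the commands shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical train.py-style front end."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument(
        "--config",
        default="configs/default.yaml",  # assumed default, not confirmed by the repo
        help="path to a YAML hyperparameter file",
    )
    parser.add_argument(
        "--quick",
        action="store_true",
        help="run a short sanity-check training pass",
    )
    return parser


# Parse an explicit argument list so the example runs outside a shell.
args = build_parser().parse_args(["--quick"])
```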

Push to GitHub (large checkpoints)

Model weights (*.pt) are stored with Git LFS because GitHub rejects regular Git files larger than 100 MB. Install Git LFS (brew install git-lfs, or see git-lfs.com), then run:

make push-github
# same as: ./scripts/push_to_github.sh

This runs git lfs push followed by git push --force-with-lease. The forced push is needed only once, after the LFS migration. The upload can take several minutes on a slow network.
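
Before pushing, it can be useful to check which checkpoints are actually over the size limit. The helper below is a hypothetical sketch and not part of the repository: the function name `oversized_files` is invented for illustration, and the limit is assumed to be 100 MiB.

```python
from pathlib import Path
from typing import List

# Assumption: the 100 MB limit is interpreted as 100 MiB.
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(root: str,
                    limit: int = GITHUB_LIMIT_BYTES,
                    suffix: str = ".pt") -> List[Path]:
    """Return files under root with the given suffix that are larger than limit."""
    hits = []
    for path in Path(root).rglob("*" + suffix):
        if path.is_file() and path.stat().st_size > limit:
            hits.append(path)
    return sorted(hits)
```

Calling `oversized_files(".")` from the repository root lists the weight files that would need to go through LFS.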


Reproducibility

| Aspect | Measure |
| --- | --- |
| Random seed | Fixed (seed: 42) |
| Config snapshot | Auto-saved per run (config.json) |
| Dependencies | Pinned lower bounds in requirements.txt |
| Cleaning log | Full audit trail in 05_process_log/ |
| Data dictionary | DATA_DICTIONARY.md |
| CI | GitHub Actions: lint + smoke + privacy guard |
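
The first two rows can be illustrated with a minimal sketch. It assumes a plain dictionary config and a per-run output directory; the helper names `set_seed` and `snapshot_config` are hypothetical, and the project's own training engine may handle both steps differently.

```python
import json
import random
from pathlib import Path


def set_seed(seed: int = 42) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is optional for this sketch


def snapshot_config(config: dict, run_dir: str) -> Path:
    """Write the run's configuration to <run_dir>/config.json and return its path."""
    out = Path(run_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / "config.json"
    target.write_text(json.dumps(config, indent=2, sort_keys=True))
    return target
```

Seeding alone does not guarantee bit-identical GPU results, so the saved config.json is what ties a given output back to the settings that produced it.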

Data Availability

Internal shared mode (current): this private repository may contain de-identified analytical tables and run outputs so collaborators have a full working tree. Before any public release, follow GIT_VALVE.md to restore .gitignore exclusions and review git history.

For external readers, de-identified data remains available upon reasonable request with a signed Data Use Agreement where applicable. See ETHICS.md.
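
A pre-release check along these lines could flag tracked files that sit in the data directories. This is a hypothetical sketch, not the CI privacy guard itself: the protected prefixes are taken from the repository layout above, and the function name `leaked_paths` is invented for illustration.

```python
from typing import Iterable, List, Sequence

# Assumption: these two directories are the ones that must not be public.
PROTECTED_PREFIXES = ("00_raw_data/", "01_cleaned_data/")


def leaked_paths(tracked: Iterable[str],
                 protected: Sequence[str] = PROTECTED_PREFIXES) -> List[str]:
    """Return tracked paths that fall under any protected directory prefix."""
    prefixes = tuple(protected)
    return [path for path in tracked if path.startswith(prefixes)]
```

The input would typically be the output of `git ls-files`, one path per line. A non-empty result means the working tree still tracks data, though it says nothing about earlier commits, which is why the history review remains a separate step.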


Ethics

This study follows the Declaration of Helsinki and applicable data protection regulations. IRB approval, informed consent, and de-identification procedures are documented in ETHICS.md.


Citation

@software{morpho2026,
  title   = {Morpho},
  year    = {2026},
  license = {Apache-2.0},
  version = {4.0.0}
}

Machine-readable citation: CITATION.cff


Documents

| Document | Purpose |
| --- | --- |
| ETHICS.md | IRB, de-identification, data availability |
| GIT_VALVE.md | Internal shared vs public .gitignore / what to restore |
| DATA_DICTIONARY.md | Variable definitions, coding rules |
| CITATION.cff | Machine-readable citation metadata |
| CONTRIBUTING.md | How to contribute |
| CHANGELOG.md | Version history |
| SECURITY.md | Vulnerability reporting |
| psychnet/ARCHITECTURE.md | Model architecture + paper figures |
| psychnet/GUIDE.md | Full pipeline guide |

License

Apache License 2.0
