Sunbelt Computer Software

Morpho

Named for the blue morpho butterfly, Morpho menelaus. Its wings carry no blue pigment: the color emerges from the physical arrangement of nanoscale structures, invisible until light hits at the right angle. In the same way, the phenotypic patterns this project seeks are not present in any single clinical variable; they emerge from structural relationships in the data, visible only through the right computational lens. The word morpho (from the Greek μορφή) means "form", and discovering hidden forms is the project's central aim.

Overview

A multi-phase research pipeline for psychiatric epidemiology. It covers data cleaning, descriptive statistics with publication-quality figures, deep statistical mining (exploratory factor analysis [EFA], clustering, change-point detection), and a multi-branch deep learning architecture for non-linear patient phenotype discovery from electronic health records.

Repository Structure

morpho/
├── 00_raw_data/                    # Raw data (tracked in internal shared mode; see GIT_VALVE.md)
├── 01_cleaned_data/                # Cleaned CSVs (same)
├── 02_figures/                     # Phase I statistical figures
├── 03_manuscript/                  # Manuscript drafts
├── 04_research_framework/          # Research design documents
├── 05_process_log/                 # Data cleaning audit trail
├── 06_reference/                   # Reference materials
├── 07_deep_analysis/               # Phase III deep mining outputs
├── 08_ml_architecture_proposal/    # Architecture design iterations
│
├── psychnet/                       # Phase IV: deep learning codebase
│   ├── data/                       #   Data pipeline
│   ├── models/                     #   Model architecture (~15.9M params)
│   ├── training/                   #   Training engine (AMP / early stop / W&B)
│   ├── evaluation/                 #   Metrics + interpretability + visualization
│   ├── scripts/train.py            #   Main entry: 9-stage training pipeline
│   ├── configs/default.yaml        #   Hyperparameter configuration
│   ├── ARCHITECTURE.md             #   Technical architecture (paper Figure 1)
│   └── GUIDE.md                    #   Full operating guide
│
├── .github/workflows/ci.yml        # CI: lint + smoke test + privacy check
├── LICENSE                          # Apache 2.0
├── CITATION.cff                     # Machine-readable citation
├── ETHICS.md                        # Ethics / IRB / de-identification
├── DATA_DICTIONARY.md               # Variable definitions and coding
├── CONTRIBUTING.md                  # Contribution guidelines
├── CHANGELOG.md                     # Version history
├── Makefile                         # Unified task runner
└── .gitignore                       # Patient-data exclusions (relaxed in internal shared mode; see GIT_VALVE.md)

Phases

| Phase | Scope | Status |
|---|---|---|
| I. Data profiling | Cleaning, 6 publication-quality figures | ✅ Done |
| II. Organization | Standardized directory structure | ✅ Done |
| III. Deep mining | EFA, RF, clustering, change-point, catastrophic expenditure | ✅ Done |
| IV. Deep learning | Multi-branch architecture (6,876 lines) | ✅ Code complete |
| V. Training | GPU training + hyperparameter tuning | ⏳ Pending |
| VI. Manuscript | Target: top-tier journals | 📋 Planning |

Architecture

  Patient Record (17 features)
         │
    ┌────┴────┐
    ▼         ▼
DeepInsight   Feature
 Portrait    Tokenizer
 (32×32)    (17 tokens)
    │         │
    ▼         ▼
ViT-Clinical  FT-Transformer        ~15.9M params
 (384-dim)    (256-dim)
    │         │
    └────┬────┘
         ▼
  Cross-Modal Fusion
  (Bidirectional Attention + Gating)
         │
    ┌────┼────┬────┬────┐
    ▼    ▼    ▼    ▼    ▼
  Task1 Task2 Task3 Task4 Embedding

See psychnet/ARCHITECTURE.md for full details.
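
To make the left branch of the diagram concrete, here is a minimal sketch of how 17 tabular features could be laid out on a 32×32 grid in the DeepInsight style. It is not the psychnet implementation: the function name `features_to_portrait` and the diagonal demo layout are hypothetical. In the published DeepInsight method the per-feature coordinates come from a dimensionality reduction over the features; here they are simply passed in by hand.

```python
from typing import Dict, List, Sequence, Tuple

GRID = 32  # side length of the portrait, taken from the diagram above


def features_to_portrait(
    values: Sequence[float],
    coords: Sequence[Tuple[float, float]],
    grid: int = GRID,
) -> List[List[float]]:
    """Write each feature value to a pixel of a grid x grid image.

    coords holds one (x, y) pair in [0, 1] per feature (hypothetical input).
    Features that land on the same pixel are averaged; empty pixels stay 0.
    """
    if len(values) != len(coords):
        raise ValueError("one coordinate pair is required per feature")
    sums: Dict[Tuple[int, int], float] = {}
    counts: Dict[Tuple[int, int], int] = {}
    for value, (x, y) in zip(values, coords):
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError("coordinates must lie in [0, 1]")
        # Scale to pixel indices; clamp so that x == 1.0 maps to the last pixel.
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        sums[(row, col)] = sums.get((row, col), 0.0) + value
        counts[(row, col)] = counts.get((row, col), 0) + 1
    image = [[0.0] * grid for _ in range(grid)]
    for (row, col), total in sums.items():
        image[row][col] = total / counts[(row, col)]
    return image


# Hypothetical layout: 17 features spread along the diagonal of the grid.
demo_coords = [(i / 17, i / 17) for i in range(17)]
demo_values = [float(i + 1) for i in range(17)]
portrait = features_to_portrait(demo_values, demo_coords)
```

The resulting image is what a vision branch would consume, while the same 17 values go to the tokenizer branch unchanged.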

Quick Start

make setup              # Install + verify environment
make smoke              # Model forward-pass test
make train-quick        # Quick training (~1 min)
make train              # Full training (~1-2h on GPU)

Or directly:

cd psychnet/
python setup.py
python scripts/train.py --quick
python scripts/train.py --config configs/default.yaml

Push to GitHub (large checkpoints)

Model weights (*.pt) are stored with Git LFS because GitHub rejects regular Git blobs larger than 100 MB. Install Git LFS (brew install git-lfs, or see git-lfs.com), then:

make push-github
# same as: ./scripts/push_to_github.sh

This runs git lfs push, then git push --force-with-lease. The force push is needed only once, after history has been rewritten with git lfs migrate. Uploads can take several minutes on slow networks.
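
Before pushing, it can help to confirm which checkpoints exceed the size limit and therefore need LFS. The sketch below is not part of the repository's scripts; `find_oversized` is a hypothetical helper that walks a directory tree and reports matching files larger than a given limit.

```python
from pathlib import Path
from typing import List, Tuple

# GitHub's per-file limit for regular (non-LFS) Git blobs.
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def find_oversized(
    root: str,
    pattern: str = "*.pt",
    limit: int = GITHUB_LIMIT_BYTES,
) -> List[Tuple[str, int]]:
    """Return (path, size_in_bytes) for files under root that match pattern
    and are larger than limit, sorted by path."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                hits.append((str(path), size))
    return hits
```

Calling `find_oversized("psychnet")` from the repository root would list any `*.pt` files that a plain git push would reject.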

Reproducibility

| Aspect | Measure |
|---|---|
| Random seed | Fixed (`seed: 42`) |
| Config snapshot | Auto-saved per run (`config.json`) |
| Dependencies | Pinned lower bounds in `requirements.txt` |
| Cleaning log | Full audit trail in `05_process_log/` |
| Data dictionary | DATA_DICTIONARY.md |
| CI | GitHub Actions: lint + smoke + privacy guard |
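
The first two rows can be illustrated with a short sketch. This is an assumption about how seeding and config snapshots might be wired up, not the code in psychnet/training: `seed_everything` and `snapshot_config` are hypothetical names, and NumPy and PyTorch are seeded only if they happen to be installed.

```python
import json
import random
from pathlib import Path


def seed_everything(seed: int = 42) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


def snapshot_config(config: dict, run_dir: str) -> Path:
    """Write the resolved config to <run_dir>/config.json and return its path."""
    out = Path(run_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "config.json"
    # sort_keys keeps snapshots diff-friendly across runs.
    path.write_text(json.dumps(config, indent=2, sort_keys=True))
    return path
```

A fixed seed alone does not guarantee bit-identical GPU results, so the saved snapshot is what lets a run be re-created and compared later.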

Data Availability

Internal shared mode (current): this private repository may contain de-identified analytical tables and run outputs so collaborators have a full working tree. Before any public release, follow GIT_VALVE.md to restore .gitignore exclusions and review git history.

For external readers, de-identified data remains available upon reasonable request with a signed Data Use Agreement where applicable. See ETHICS.md.
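
As a rough illustration of a pre-release check, the sketch below flags tracked paths that look like patient data. It is a guess at what such a guard could look like, not the CI privacy check or the GIT_VALVE.md procedure: the protected prefixes come from the repository structure above, while the `.csv` rule and the function names are assumptions. It inspects only the current index, so git history still needs a separate review.

```python
import subprocess
from typing import Iterable, List, Sequence

# Data directories from the repository structure above.
PROTECTED_PREFIXES = ("00_raw_data/", "01_cleaned_data/")
# Assumption: tabular exports anywhere in the tree are also worth flagging.
PROTECTED_SUFFIXES = (".csv",)


def find_tracked_data(
    paths: Iterable[str],
    prefixes: Sequence[str] = PROTECTED_PREFIXES,
    suffixes: Sequence[str] = PROTECTED_SUFFIXES,
) -> List[str]:
    """Return the paths that sit in a protected directory or have a
    protected file extension, preserving input order."""
    return [
        p for p in paths
        if p.startswith(tuple(prefixes)) or p.lower().endswith(tuple(suffixes))
    ]


def tracked_files() -> List[str]:
    """List files currently tracked by git in the working directory's repo."""
    result = subprocess.run(
        ["git", "ls-files"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Running `find_tracked_data(tracked_files())` from the repository root and failing on a non-empty result would turn this into a simple gate before a public push.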

Ethics

This study follows the Declaration of Helsinki and applicable data protection regulations. IRB approval, informed consent, and de-identification procedures are documented in ETHICS.md.

Citation

@software{morpho2026,
  title   = {Morpho},
  year    = {2026},
  license = {Apache-2.0},
  version = {4.0.0}
}

Machine-readable citation: CITATION.cff

Documents

| Document | Purpose |
|---|---|
| ETHICS.md | IRB, de-identification, data availability |
| GIT_VALVE.md | Internal shared vs public `.gitignore` / what to restore |
| DATA_DICTIONARY.md | Variable definitions, coding rules |
| CITATION.cff | Machine-readable citation metadata |
| CONTRIBUTING.md | How to contribute |
| CHANGELOG.md | Version history |
| SECURITY.md | Vulnerability reporting |
| psychnet/ARCHITECTURE.md | Model architecture + paper figures |
| psychnet/GUIDE.md | Full pipeline guide |

License

Apache License 2.0
