www.lesswrong.com/posts/BJ7AqXeigNKXLqZyx/mnemonic-portraits-for-19-023-human-ge...
2 corrections found
Pfam clan database sorts human proteins among 563 structural folds ("clans") like "Beta-propeller" and "Cystine-knot".
Pfam clans are not simply structural folds. In Pfam, a clan is an evolutionary grouping of related Pfam entries, supported by sequence, structure, function, or profile-HMM similarity.
Full reasoning
Pfam’s own documentation defines a clan as a collection of Pfam entries that share a single evolutionary origin. That is different from saying clans are just “structural folds.” Structural similarity can be one kind of evidence for clan membership, but Pfam also uses sequence motifs, functional similarity, and profile-HMM similarity. In other words, a clan is closer to an evolutionary superfamily-level grouping than to a pure fold classification.
The post’s wording is therefore inaccurate: it equates Pfam clans with “structural folds,” which is not how Pfam defines them.
2 sources
- Frequently Asked Questions (FAQs) - Pfam Documentation
Pfam defines a clan as a collection of entries that have arisen from a single evolutionary origin. Evidence of their evolutionary relationship can be in the form of similarity in tertiary structures, or, when structures are not available, from common sequence motifs.
- Grouping Pfam entries into Clans | Pfam
In Pfam there is a hierarchical level of classification which integrates evolutionary related entries in to sets, termed Clans. The relationship between entries in a Clan may be defined by: sequence similarity ... similarity of known three-dimensional structures ... functional similarity and/or similarity between their profile HMMs.
This metric basically tells you how well tolerated mutations in this gene are, from 0.0 (intolerable, black) to 2.0 (tolerable, white).
LOEUF is not a general mutation-tolerance score. It specifically measures intolerance to predicted loss-of-function variation in a gene.
Full reasoning
LOEUF stands for loss-of-function observed/expected upper bound fraction. gnomAD’s own documentation defines it as a continuous metric for a gene’s intolerance to loss-of-function variation. The original gnomAD paper in Nature likewise defines LOEUF as the 90% upper bound of the confidence interval around the observed/expected ratio for predicted loss-of-function variants.
That means LOEUF does not summarize how well a gene tolerates mutations in general. It is computed from predicted loss-of-function variants only, so it says nothing directly about tolerance to missense, synonymous, regulatory, or other non-loss-of-function changes. Describing it as a generic “how well tolerated mutations in this gene are” metric is therefore too broad and misstates what LOEUF measures.
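To make the "observed/expected upper bound" idea concrete, here is a minimal Python sketch. It assumes the observed count of predicted loss-of-function variants is Poisson-distributed around expected × ratio, scans candidate ratios on a grid from 0 to 2, and returns the ratio at which the normalized cumulative density passes 95% (the upper end of a 90% interval). The function names, the grid resolution, and the example counts are illustrative assumptions, not gnomAD's actual code or data.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of seeing k events when the mean is lam."""
    if lam <= 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def oe_upper_bound(observed: int, expected: float, alpha: float = 0.05,
                   max_ratio: float = 2.0, steps: int = 2000) -> float:
    """Upper bound of a (1 - 2*alpha) interval on the observed/expected ratio.

    Assumption: observed ~ Poisson(expected * ratio), with candidate ratios
    on an evenly spaced grid from 0 to max_ratio (hypothetical defaults).
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    ratios = [i * max_ratio / steps for i in range(steps + 1)]
    densities = [poisson_pmf(observed, expected * r) for r in ratios]
    total = sum(densities)
    cumulative = 0.0
    for ratio, density in zip(ratios, densities):
        cumulative += density / total
        if cumulative > 1 - alpha:
            return ratio
    return max_ratio


# Hypothetical genes, both with 10 expected loss-of-function variants.
constrained = oe_upper_bound(observed=0, expected=10.0)
tolerant = oe_upper_bound(observed=10, expected=10.0)
print(f"constrained gene upper bound: {constrained:.3f}")
print(f"tolerant gene upper bound:    {tolerant:.3f}")
```

In this sketch the gene with no observed loss-of-function variants gets a much lower upper bound than the gene where observed matches expected. Only loss-of-function counts enter the calculation, which is why the result cannot be read as a general mutation-tolerance score.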
3 sources
- gnomAD v4.0 Gene Constraint | gnomAD browser
The loss-of-function observed/expected upper bound fraction (LOEUF) score is a continuous metric designed to demonstrate a gene's intolerance to loss-of-function variation.
- The mutational constraint spectrum quantified from variation in 141,456 humans | Nature
For downstream analyses, unless otherwise specified, we use the 90% upper bound of this confidence interval, which we term the loss-of-function observed/expected upper bound fraction (LOEUF).
- Targeting de novo loss-of-function variants in constrained disease genes improves diagnostic rates in the 100,000 Genomes Project - PMC
The loss-of-function observed over expected upper bound fraction, or LOEUF score, is a metric that places each gene on a continuous scale of loss-of-function constraint.