Last update: 20250825
QV design principles
A short guide for writing portable, reusable qualifying variant (QV) sets.
QV sets are written to be tool agnostic, using standard field names and simple logic (field, operator, value). This minimises custom or pipeline specific content and makes each set portable across tools and studies. Tool specific flags, for example for PLINK or Hail, are optional and live in a separate `implementations` section, while the main definitions describe only the biological or statistical criteria. This design keeps QV files reusable, easy to review, and consistent across projects.
Purpose
- Describe what qualifies, not how a specific tool implements the rule
- Enable sharing, review, reuse, and exact reproduction across pipelines
Scope
- Applicable to research and clinical genomics
- Covers variant filtering, classification criteria, and analysis thresholds
Core principles
- Tool agnostic: express biology or statistics, not flags
- Atomic rules: one concept per rule
- Composable: combine rules with explicit logic
- Clear metadata: every set is self describing
- Stable IDs and versions
Minimal schema
qv_set_id: string
version: string
title: string
description: string
metadata:
  created: YYYY-MM-DD
  authors: [ "Name One", "Name Two" ]
  tags: [ "GWAS", "QC" ]
filters:
  rule_key:
    description: string
    logic: keep_if | drop_if
    field: string # optional if a single field applies
    threshold_min: number | null
    threshold_max: number | null
    notes: string # optional
criteria:
  label_key:
    description: string
    logic: and | or
    conditions:
      - field: string | null
        operator: "==" | "!=" | "<" | "<=" | ">" | ">=" | "in" | "not_in" | "exists"
        value: scalar | null
        values: [ scalar, ... ] # for in or not_in
        group_by: [ field, ... ] # optional grouping
        count: ">=2" # comparator for grouped counts
        additional_criteria: # optional nested condition
          field: string
          operator: string
          value: scalar
    dependencies: [ "other_label_key" ] # optional cross references
implementations: {} # optional tool hints
notes: [] # optional free text
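To make the (field, operator, value) model concrete, here is a minimal sketch of how a consumer might evaluate one criteria label against a single variant record. It is an illustration only: the names `condition_holds` and `label_holds`, the dict-based record, and the rule that a missing field never satisfies a comparison are all assumptions, and `group_by`, `count`, and `additional_criteria` are deliberately left out.

```python
import operator as op

# Hypothetical mapping from schema operator strings to Python comparisons.
COMPARATORS = {
    "==": op.eq, "!=": op.ne,
    "<": op.lt, "<=": op.le,
    ">": op.gt, ">=": op.ge,
}

def condition_holds(record, condition):
    """Return True if one (field, operator, value) condition holds for a record."""
    field = condition["field"]
    oper = condition["operator"]
    present = field in record and record[field] is not None
    if oper == "exists":
        return present
    if not present:
        return False  # assumption: a missing field never satisfies a comparison
    if oper == "in":
        return record[field] in condition["values"]
    if oper == "not_in":
        return record[field] not in condition["values"]
    return COMPARATORS[oper](record[field], condition["value"])

def label_holds(record, criteria):
    """Combine a label's conditions using its `logic` key (and | or)."""
    results = [condition_holds(record, c) for c in criteria["conditions"]]
    return all(results) if criteria["logic"] == "and" else any(results)

# Illustrative record and label; the field names are examples, not a standard.
variant = {"IMPACT": "HIGH", "MAF": 0.002}
rare_high = {
    "logic": "and",
    "conditions": [
        {"field": "IMPACT", "operator": "==", "value": "HIGH"},
        {"field": "MAF", "operator": "<", "value": 0.01},
    ],
}
print(label_holds(variant, rare_high))  # True
```

Because the rule is plain data, the same definition could be translated to any tool's filter syntax without changing the QV file itself.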
Dev notes
Developer notes for the minimal QV YAML builder.
purpose
* Single line entry to build valid QV YAML from a tiny DSL
* No dropdowns, no multi-field forms
* Immediate YAML preview and removable line chips
scope
* Supports meta, filters, criteria, notes
* Users type one statement per line
* Lines can be added and removed
* YAML exported with stable ordering
ui
* One text input for a single statement
* Buttons: add, reset, copy, download
* Chips list of added statements grouped by type
* YAML preview panel
* Lightweight examples shown as text under the input
* No menus, no wizards
input syntax
* One line per statement
* Keywords: meta, filter, criteria, note
* Tokens are key=value pairs where relevant
* Values can be numbers, booleans, null, quoted strings, or comma lists
grammar
* meta
`meta <key> <value>`
or `meta <key>=<value>`
Examples
`meta qv_set_id qv_gwas_common_v1_20250827`
`meta version 1.0.0`
`meta title="GWAS common QC"`
`meta created 2025-08-27`
`meta authors=["Alice","Bob"]`
`meta tags=GWAS,QC,PCA`
* filter
`filter <label> field=<name> operator<op> value=<val> [logic=<keep_if|drop_if>] [desc="text"]`
Examples
`filter maf_minimum field=MAF operator>= value=0.01 desc="Minimum MAF"`
`filter hwe field=HWE_P operator>= value=1e-6 logic=keep_if`
* criteria
Aggregates by repeating the same label. One condition per line.
`criteria <label> [field=<name> operator<op> [value=<val>]] [group_by=a,b,...] [count="<cmp>"] [logic=<and|or>] [desc="text"] [add.field=<f> add.operator<op> add.value=<v>]`
Examples
`criteria ps5 field=IMPACT operator== value=HIGH desc="Comp het with HIGH"`
`criteria ps5 group_by=sample,SYMBOL count=">=2"`
`criteria ps3 field=genotype operator== value=1 add.field=Inheritance add.operator=in add.value=AD,AD/AR logic=or`
* note
`note "free text"`
operators
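* `== != < <= > >= in not_in exists`
* In the examples, symbol operators attach directly to the token, eg `operator>=`, while word operators follow `=`, eg `add.operator=in`
* For `in` and `not_in` use comma lists, eg `A,B,C`
* For `exists` omit value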
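The grammar and operator rules above can be sketched as a small tokenizer. This is a minimal sketch, not the builder's actual parser: `parse_statement` and `split_operator` are hypothetical names, values are left as raw strings, and accepting both `operator>=` and `operator=in` is an assumption drawn from the examples. It relies on `shlex.split`, which removes the quotes around quoted values, so a builder that must distinguish a quoted `"1"` from a bare `1` would need a tokenizer that records quoting.

```python
import shlex

SYMBOL_OPS = ("==", "!=", "<=", ">=", "<", ">")
WORD_OPS = ("in", "not_in", "exists")

def split_operator(token):
    """Return (key, op) for an operator token, or None if it is not one."""
    for key in ("add.operator", "operator"):
        if token.startswith(key):
            rest = token[len(key):]
            if rest in SYMBOL_OPS:                      # operator>=
                return key, rest
            if rest.startswith("=") and rest[1:] in SYMBOL_OPS + WORD_OPS:
                return key, rest[1:]                    # operator=in
    return None

def parse_statement(line):
    """Split one DSL line into (keyword, label, tokens)."""
    parts = shlex.split(line)
    keyword, rest = parts[0], parts[1:]
    if keyword == "note":
        return keyword, None, {"text": " ".join(rest)}
    if keyword == "meta":
        key, sep, value = rest[0].partition("=")
        if not sep:                                     # `meta <key> <value>` form
            value = " ".join(rest[1:])
        return keyword, None, {key: value}
    if keyword not in ("filter", "criteria"):
        raise ValueError(f"unknown keyword: {keyword}")
    label, rest = rest[0], rest[1:]
    tokens = {}
    for token in rest:
        hit = split_operator(token)
        if hit:
            tokens[hit[0]] = hit[1]
        else:
            key, _, value = token.partition("=")
            tokens[key] = value
    return keyword, label, tokens

print(parse_statement('filter hwe field=HWE_P operator>= value=1e-6 logic=keep_if'))
```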
parsing rules
* Numbers: `0.01`, `1e-6` parsed as numbers
* Booleans: `true`, `false`
* Null: `null`
* Quoted strings preserved
* Comma lists become arrays
* `authors=["A","B"]` is allowed, also `authors=A,B`
* For `group_by`, split comma list to array and trim
* For `count`, store the comparator string as entered, eg `>=2`
* `add.field`, `add.operator`, `add.value` create `additional_criteria` for that condition
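One way to read the parsing rules above is as a single value-coercion function. The sketch below assumes a hypothetical helper named `coerce` that receives the raw text of one value with any quotes still attached; checking for quotes first is what keeps `"null"` or `"1"` as strings. The regular expression for numbers is an assumption chosen to accept forms such as `0.01` and `1e-6` while leaving `1.0.0` and dates as text.

```python
import json
import re

NUMBER = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")
INTEGER = re.compile(r"[-+]?\d+")

def coerce(raw):
    """Turn one raw DSL value into a Python scalar or list."""
    text = raw.strip()
    # Quoted string: keep the content as text, even if it looks like a number.
    if len(text) >= 2 and text[0] == text[-1] == '"' and '"' not in text[1:-1]:
        return text[1:-1]
    # Bracketed list such as ["A","B"]; fall back to a comma split for [A,B].
    if text.startswith("[") and text.endswith("]"):
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            return [coerce(part) for part in text[1:-1].split(",")]
    # Comma list: split and trim each element.
    if "," in text:
        return [coerce(part) for part in text.split(",")]
    if text == "null":
        return None
    if text in ("true", "false"):
        return text == "true"
    if NUMBER.fullmatch(text):
        return int(text) if INTEGER.fullmatch(text) else float(text)
    return text  # anything else stays a string, eg the count comparator ">=2"

for sample in ["0.01", "1e-6", "true", "null", "sample, SYMBOL", '"free text"']:
    print(repr(sample), "->", repr(coerce(sample)))
```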
aggregation
* Repeated `criteria <label>` lines append to `criteria.<label>.conditions`
* `desc=` on a criteria line sets `criteria.<label>.description`; if given again on a later line, the last one wins
* `logic=` on any criteria line sets `criteria.<label>.logic`
* Repeated `filter <label>` updates the same filter block
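The aggregation rules above could be implemented roughly as follows. This is a sketch under assumptions: `add_criteria` and `add_filter` are hypothetical names, and `tokens` is assumed to be an already parsed and coerced mapping for one line.

```python
def add_criteria(criteria, label, tokens):
    """Fold one parsed `criteria <label>` line into the criteria mapping."""
    block = criteria.setdefault(label, {"conditions": []})
    if "desc" in tokens:
        block["description"] = tokens["desc"]   # a later desc= overwrites: last one wins
    if "logic" in tokens:
        block["logic"] = tokens["logic"]
    keys = ("field", "operator", "value", "group_by", "count")
    condition = {k: tokens[k] for k in keys if k in tokens}
    # add.field / add.operator / add.value become a nested additional_criteria block.
    extra = {k[len("add."):]: v for k, v in tokens.items() if k.startswith("add.")}
    if extra:
        condition["additional_criteria"] = extra
    if condition:
        block["conditions"].append(condition)
    return criteria

def add_filter(filters, label, tokens):
    """A repeated `filter <label>` line updates the same filter block."""
    filters.setdefault(label, {}).update(tokens)
    return filters

criteria = {}
add_criteria(criteria, "ps5", {"field": "IMPACT", "operator": "==",
                               "value": "HIGH", "desc": "Comp het with HIGH"})
add_criteria(criteria, "ps5", {"group_by": ["sample", "SYMBOL"], "count": ">=2"})
print(criteria["ps5"]["conditions"])
```

After the two calls, `ps5` holds one description and two conditions, matching the two-line example in the grammar section.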
yaml assembly order
* Top level: `qv_set_id`, `version`, `title`, `description`, `metadata`, `run`, `filters`, `criteria`, `ld_pruning`, `pca`, `covariates`, `association`, `implementations`, `notes`
* `metadata`: `created`, `authors`, `tags`
* Emit filters and criteria in entry order
* Inside filter: `description`, `logic`, then other keys
* Inside criteria label: `description`, `logic`, `dependencies` if supported later, then `conditions`
* Inside each condition: `field`, `operator`, `value`, `group_by`, `count`, `additional_criteria`
* Do not quote numeric or boolean scalars unnecessarily
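Stable ordering can be sketched as a reordering step applied just before serialisation. The helper name `ordered` is hypothetical, and placing unlisted keys after the preferred ones in entry order is an assumption. Python dicts preserve insertion order, so the result can be handed to any YAML emitter that is configured not to sort keys.

```python
# Key orders copied from the assembly rules above.
TOP_LEVEL = ["qv_set_id", "version", "title", "description", "metadata", "run",
             "filters", "criteria", "ld_pruning", "pca", "covariates",
             "association", "implementations", "notes"]
METADATA = ["created", "authors", "tags"]
CONDITION = ["field", "operator", "value", "group_by", "count",
             "additional_criteria"]

def ordered(mapping, preferred):
    """Return a copy with preferred keys first, then any others in entry order."""
    head = {k: mapping[k] for k in preferred if k in mapping}
    tail = {k: v for k, v in mapping.items() if k not in head}
    return {**head, **tail}

doc = {"criteria": {}, "title": "GWAS common QC", "qv_set_id": "qv_gwas_common_v1_20250827"}
print(list(ordered(doc, TOP_LEVEL)))
# ['qv_set_id', 'title', 'criteria']
```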
error handling
* If a line cannot be parsed, show a small inline error and do not add
* Minimal validation messages, eg unknown keyword, missing label, missing operator
* Preserve user line in the input for correction
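A minimal sketch of this behaviour follows. The names `check_line` and `try_add` are hypothetical, the checks are whitespace-based heuristics rather than a full parse, and requiring an operator on every `filter` line and on any `criteria` line that names a field is an assumption based on the grammar.

```python
KEYWORDS = ("meta", "filter", "criteria", "note")

def check_line(line):
    """Return a short error message, or None if the line passes the basic checks."""
    parts = line.split()
    if not parts:
        return "empty line"
    keyword = parts[0]
    if keyword not in KEYWORDS:
        return f"unknown keyword: {keyword}"
    if keyword in ("filter", "criteria"):
        # The label is the bare word after the keyword, not a key=value token.
        if len(parts) < 2 or "=" in parts[1] or parts[1].startswith("operator"):
            return "missing label"
        has_field = any(p.startswith("field=") for p in parts[2:])
        has_op = any(p.startswith("operator") for p in parts[2:])
        if (keyword == "filter" or has_field) and not has_op:
            return "missing operator"
    return None

def try_add(statements, line):
    """Return (statements, input_text, error) after an add attempt."""
    error = check_line(line)
    if error is None:
        return statements + [line], "", None   # accepted: input is cleared
    return statements, line, error             # rejected: text kept for correction

print(try_add([], "filter hwe field=HWE_P value=1e-6"))
# ([], 'filter hwe field=HWE_P value=1e-6', 'missing operator')
```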
chips behaviour
* Each added line appears as a chip showing the original text
* Click a chip to remove that exact line and rebuild YAML
* Chips grouped by type with tiny headings for readability
non goals
* No background fetching, no schema discovery
* No automatic inference of complex tool implementation flags
testing checklist
* Add basic meta fields and confirm YAML order
* Add two filters and confirm `threshold_min` and `threshold_max` mapping via operators
* Build multi condition criteria ps5 as in corpus
* Build criteria with `additional_criteria` for ps3
* Add note lines and confirm list output
* Try `exists` operator
* Try numbers, booleans, arrays, and quoted strings
* Remove chips and confirm YAML updates deterministically
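The threshold mapping item in the checklist could be exercised with a helper like the one below. The mapping itself is an assumption, since these notes do not spell it out: `>` and `>=` are taken to set `threshold_min`, `<` and `<=` to set `threshold_max`, strictness is not recorded, and `keep_if` is assumed as the default logic. `filter_to_schema` is a hypothetical name.

```python
def filter_to_schema(tokens):
    """Map parsed filter tokens onto the schema's threshold fields (assumed mapping)."""
    block = {}
    if "desc" in tokens:
        block["description"] = tokens["desc"]
    block["logic"] = tokens.get("logic", "keep_if")  # assumption: keep_if by default
    block["field"] = tokens["field"]
    oper, value = tokens["operator"], tokens["value"]
    if oper in (">", ">="):
        block["threshold_min"] = value
    elif oper in ("<", "<="):
        block["threshold_max"] = value
    else:
        raise ValueError(f"no threshold mapping for operator {oper!r}")
    return block

# Two lines for the same label update one block, giving both bounds.
maf = {}
maf.update(filter_to_schema({"field": "MAF", "operator": ">=", "value": 0.01,
                             "desc": "Minimum MAF"}))
maf.update(filter_to_schema({"field": "MAF", "operator": "<=", "value": 0.5}))
print(maf)
```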
future extensions (optional)
* `dependencies=` on criteria lines
* Export and import of the one line script
* Simple undo of last add
This design aligns the UI with a single line input model, avoids menus, and preserves a stable YAML layout for human review and version control.