A pipeline for multi-trait genome-wide association studies (GWAS) using MANTA.

The pipeline performs the following analysis steps:

  • Split genotype file
  • Preprocess phenotype and covariate data
  • Test for association between phenotypes and genetic variants
  • Collect summary statistics
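
As a rough illustration of the first step, the sketch below lists the variant ranges obtained when a genotype file is cut into consecutive chunks of at most l variants, where l is the value of the --l option (default: 10000). The function name chunk_ranges is hypothetical, and the use of consecutive, equally sized blocks is an assumption; the pipeline's actual splitting logic may differ.

```shell
# Hypothetical helper, not part of mvgwas-nf.
# Prints one "start-end" range per chunk for n variants split into
# consecutive chunks of at most l variants (l defaults to 10000).
chunk_ranges() {
  n=$1
  l=${2:-10000}
  start=1
  while [ "$start" -le "$n" ]; do
    end=$((start + l - 1))
    # The last chunk may be shorter than l
    if [ "$end" -gt "$n" ]; then
      end=$n
    fi
    echo "$start-$end"
    start=$((end + 1))
  done
}

# Example: 25000 variants with the default chunk size
chunk_ranges 25000
```

Under this assumption, a smaller chunk size yields more, shorter association-testing jobs.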

The pipeline uses Nextflow as the execution backend. See the Nextflow documentation for more information.

Requirements

  • Unix-like operating system (Linux, MacOS, etc.)
  • Java 8 or later
  • Docker (v1.10.0 or later) or Singularity (v2.5.0 or later)
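
A quick way to see which of the tools listed above are installed is to look them up on the PATH. The snippet below is a minimal sketch: the helper name have is hypothetical, and it only tests for presence, not for the minimum versions given in the list.

```shell
# Hypothetical helper: succeeds if a command is available on the PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Report each tool; either docker or singularity is enough as container engine.
for tool in java docker singularity; do
  if have "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: not found"
  fi
done
```

Versions still need to be checked by hand, for example with java -version or docker --version.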

Quickstart (~2 min)

  1. Install Nextflow:

    curl -fsSL | bash

  2. Make a test run:

    nextflow run dgarrimar/mvgwas-nf -with-docker

Notes: Move the nextflow executable to a directory in your $PATH. To run with Singularity instead of Docker, use -with-singularity in place of -with-docker.

Alternatively, you can clone this repository and run the pipeline from the local copy:

git clone
cd mvgwas-nf
nextflow run -with-docker

Pipeline usage

Launching the pipeline with the --help parameter shows the help message:

nextflow run --help
N E X T F L O W  ~  version 20.04.1
Launching `` [amazing_roentgen] - revision: 56125073b7

mvgwas-nf: A pipeline for multivariate Genome-Wide Association Studies
Performs multi-trait GWAS using MANTA (

nextflow run [options]

--pheno PHENOTYPES          phenotype file
--geno GENOTYPES            indexed genotype VCF file
--cov COVARIATES            covariate file
--l VARIANTS/CHUNK          variants tested per chunk (default: 10000)
--t TRANSFORMATION          phenotype transformation: none, sqrt, log (default: none)
--i INTERACTION             test for interaction with a covariate: none,  (default: none)
--ng INDIVIDUALS/GENOTYPE   minimum number of individuals per genotype group (default: 10)
--dir DIRECTORY             output directory (default: result)
--out OUTPUT                output file (default: mvgwas.tsv)
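
As an illustration of how these options combine, the sketch below wraps a full invocation in a small shell function. The function name run_mvgwas, the NEXTFLOW override variable and the chosen option values are assumptions made for the example; only the option names and the dgarrimar/mvgwas-nf project name come from this document.

```shell
# Hypothetical wrapper around the documented options.
# Arguments: phenotype file, indexed genotype VCF file, covariate file.
# Set NEXTFLOW=echo to print the arguments instead of launching a run.
run_mvgwas() {
  "${NEXTFLOW:-nextflow}" run dgarrimar/mvgwas-nf \
    --pheno "$1" \
    --geno "$2" \
    --cov "$3" \
    --l 5000 \
    --t sqrt \
    --dir result \
    --out mvgwas.tsv \
    -with-docker
}

# Example call (file names are placeholders):
# run_mvgwas phenotypes.tsv genotypes.vcf.gz covariates.tsv
```

With these values, the summary statistics would be written to result/mvgwas.tsv.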

Input files and format

mvgwas-nf requires the following input files:

  • Genotypes. bgzip-compressed and indexed VCF genotype file.

  • Phenotypes. Tab-separated file with phenotype measurements (quantitative) for each sample (i.e. n samples x q phenotypes). The first column should contain sample IDs. Columns should be named.

  • Covariates. Tab-separated file with covariate measurements (quantitative or categorical) for each sample (i.e. n samples x k covariates). The first column should contain sample IDs. Columns should be named.
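
Before launching a run, it can help to confirm that the phenotype and covariate files follow this layout. The sketch below assumes that the first line of each file is the header with column names. The helper names check_tsv and shared_samples are hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: fails if any row has a different number of
# tab-separated fields than the header line.
check_tsv() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    NR == 1 { ncol = NF; next }
    NF != ncol {
      printf "line %d: expected %d fields, found %d\n", NR, ncol, NF
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Hypothetical helper: counts sample IDs (first column, header skipped)
# that appear in both files.
shared_samples() {
  awk -F '\t' '
    FNR == 1 { next }
    NR == FNR { seen[$1] = 1; next }
    $1 in seen { n++ }
    END { print n + 0 }
  ' "$1" "$2"
}

# Example call (file names are placeholders):
# check_tsv phenotypes.tsv && shared_samples phenotypes.tsv covariates.tsv
```

A shared-sample count far below the expected n usually points to mismatched sample IDs between the two files.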

Example data is available for the test run.

Pipeline results

The pipeline writes a text file with the multi-trait GWAS summary statistics (default: ./result/mvgwas.tsv). It contains the following columns:

  • CHR: chromosome
  • POS: position
  • ID: variant ID
  • REF: reference allele
  • ALT: alternative allele
  • F: pseudo-F statistic
  • R2: fraction of variance explained by the variant
  • P: P-value

The output folder and file names can be modified with the --dir and --out parameters, respectively.
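
A common follow-up is to extract the variants that pass a significance threshold. The sketch below assumes the output file is tab-separated and starts with a header line containing the column names listed above. The function name significant_hits is hypothetical, and the default threshold of 5e-8 is the conventional genome-wide significance level, not a value taken from the pipeline.

```shell
# Hypothetical helper: prints the header and all rows with P below a threshold.
# Arguments: summary statistics file, optional threshold (default: 5e-8).
significant_hits() {
  awk -F '\t' -v thr="${2:-5e-8}" '
    NR == 1 {
      # Locate the P column by name instead of relying on its position
      for (i = 1; i <= NF; i++) if ($i == "P") pcol = i
      print
      next
    }
    # Skip rows whose P field is not a plain number (e.g. NA)
    pcol && $pcol ~ /^[0-9.]+([eE][-+]?[0-9]+)?$/ && ($pcol + 0) < (thr + 0)
  ' "$1"
}

# Example call (default output path):
# significant_hits result/mvgwas.tsv > result/significant.tsv
```

Looking up the column by name keeps the filter working if the column order ever differs from the list above.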

Cite mvgwas-nf

If you find mvgwas-nf useful in your research, please cite the related publication:

Garrido-Martín, D., Calvo, M., Reverter, F., Guigó, R. A fast non-parametric test of association for multiple traits. bioRxiv (2022).

Version History

master @ aaa979d (earliest) Created 15th Feb 2023 at 11:58 by Diego Garrido-Martín

add citation

