The Polygenic Score Catalog Calculator (`pgsc_calc`)

Introduction

pgsc_calc is a bioinformatics best-practice analysis pipeline for calculating polygenic [risk] scores on samples with imputed genotypes using existing scoring files from the Polygenic Score (PGS) Catalog and/or user-defined PGS/PRS.

Pipeline summary

The workflow performs the following steps:

- Downloading scoring files using the PGS Catalog API in a specified genome build (GRCh37 or GRCh38)
- Reading custom scoring files (and performing a liftover if the genotyping data are in a different build)
- Automatically combining and creating scoring files for efficient parallel computation of multiple PGS
- Matching variants in the scoring files against variants in the target dataset (in PLINK bfile/pfile or VCF format)
- Calculating PGS for all samples (linear sum of weights and dosages)
- Creating a summary report to visualize score distributions and pipeline metadata (variant matching QC)
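The scoring step ("linear sum of weights and dosages") can be written as a formula. This is a sketch in our own notation, not the pipeline's, assuming the sum runs only over the M variants that were successfully matched to the target data. Here w_i is the effect weight of variant i from the scoring file and x_ij is sample j's dosage of the effect allele at that variant (between 0 and 2 for a diploid autosomal variant, and possibly non-integer for imputed genotypes). The second line is a toy example with hypothetical values.

```latex
\mathrm{PGS}_j = \sum_{i=1}^{M} w_i \, x_{ij}

% Toy example (hypothetical values): w = (0.2, -0.1, 0.5), x_j = (2, 1, 0.4)
\mathrm{PGS}_j = 0.2 \cdot 2 + (-0.1) \cdot 1 + 0.5 \cdot 0.4 = 0.4 - 0.1 + 0.2 = 0.5
```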

And optionally:

- Genetic ancestry: calculating the similarity of target samples to populations in a reference dataset (1000 Genomes (1000G)) using principal component analysis (PCA)
- PGS normalization: using reference population data and/or PCA projections to report individual-level PGS predictions (e.g. percentiles, z-scores) that account for genetic ancestry
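As a simplified illustration of the normalized outputs mentioned above, a z-score and a percentile for sample j can be computed relative to a reference population. This sketch assumes a single reference group of N individuals whose scores have mean mu_ref and standard deviation sigma_ref, where PGS_j is sample j's raw score. It deliberately omits the PCA-based ancestry adjustment, so it is not the pipeline's own normalization method, which is described in its documentation.

```latex
Z_j = \frac{\mathrm{PGS}_j - \mu_{\mathrm{ref}}}{\sigma_{\mathrm{ref}}}

P_j = \frac{100}{N} \sum_{k=1}^{N} \mathbf{1}\left[\mathrm{PGS}^{\mathrm{ref}}_k \le \mathrm{PGS}_j\right]
```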

See documentation for a list of planned features under development.

Quick start

Install Nextflow (>=22.10.0)
Install Docker or Singularity (v3.8.3 minimum); please only use Conda as a last resort
Download the pipeline and test it on a minimal dataset with a single command:
```
nextflow run pgscatalog/pgsc_calc -profile test,<docker/singularity/conda>
```

Start running your own analysis!

```
nextflow run pgscatalog/pgsc_calc -profile <docker/singularity/conda> --input samplesheet.csv --pgs_id PGS001229
```
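The `--input` option points to a samplesheet describing the target genotype files. Below is a minimal sketch that writes one, assuming the comma-separated column layout `sampleset,path_prefix,chrom,format` described in the pipeline documentation; check the getting started guide for your version before relying on it. The sampleset name `mydata` and the path `genotypes/mydata` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed samplesheet layout (verify against the docs for your pipeline version):
#   sampleset   - a label for this set of target samples (hypothetical: mydata)
#   path_prefix - path to the genotype files without extension (hypothetical)
#   chrom       - left empty here, assuming the files contain all chromosomes
#   format      - the genotype format, e.g. pfile, bfile or vcf
cat > samplesheet.csv <<'EOF'
sampleset,path_prefix,chrom,format
mydata,genotypes/mydata,,pfile
EOF

# Show the file that will be passed to --input
cat samplesheet.csv
```

With `format` set to `pfile`, the prefix would refer to a PLINK 2 fileset such as `genotypes/mydata.pgen`, `.pvar` and `.psam`.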

See getting started for more details.

Documentation

Full documentation is available on Read the Docs.

Credits

pgscatalog/pgsc_calc is developed as part of the PGS Catalog project, a collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye, Samuel Lambert) and the European Bioinformatics Institute (Helen Parkinson, Laura Harris).

The pipeline seeks to provide a standardized workflow for PGS calculation and ancestry inference, implemented in Nextflow and derived from an existing set of tools/scripts developed by the Inouye lab (Rodrigo Canovas, Scott Ritchie, Jingqin Wu) and the PGS Catalog team (Samuel Lambert, Laurent Gil).

The adaptation of the codebase, the Nextflow implementation, and the PGS Catalog features were written by Benjamin Wingfield, Samuel Lambert and Laurent Gil, with additional input from Aoife McMahon (EBI). Development of new features, testing, and code review is ongoing, with contributions from Inouye lab members (Rodrigo Canovas, Scott Ritchie) and others. A manuscript describing the tool is in preparation. In the meantime, if you use the tool, we ask that you cite the repository and the paper describing the PGS Catalog resource:

PGS Catalog Calculator (in preparation). PGS Catalog Team. https://github.com/PGScatalog/pgsc_calc
Lambert et al. (2021) The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics. 53:420–425 doi:10.1038/s41588-021-00783-5.

This pipeline is distributed under an Apache License and uses code and infrastructure developed and maintained by the nf-core community (Ewels et al. Nature Biotechnology (2020) doi:10.1038/s41587-020-0439-x), reused here under the MIT license.

Additional references of open-source tools and data used in this pipeline are described in CITATIONS.md.

This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.

Version History

v2.0.0 (latest) Created 1st Nov 2024 at 11:50 by Samuel Lambert

Merge pull request #382 from PGScatalog/integrate-utils-1.4.1

2.0.0 release

Frozen v2.0.0 205cbfd

v2.0.0-beta.3 Created 17th Oct 2024 at 12:44 by Samuel Lambert

2.0.0-beta.3 (#349)

bump version
bump pgscatalog-utils version to 1.3.0
update scoring files used in test profile
add warnings to the test profile
test log warning -> info
add error for -profile test and --run_ancestry
bump match version in test suite
Update modules.config
Update environment.yml
Update test.yml
update test profile message
check scoring variants matches input listed variants
fix sscore.vars path
suppress zcat warnings
use --force instead of --quiet to handle uncompressed data
stop using System.exit, which is deprecated by nextflow
make CI ignore docs
ignore docs on PRs too
bump fraposa version
bump fraposa version
drop subscribe on error checking because it hides error causes

Frozen v2.0.0-beta.3 96fbb23

v2.0.0-beta.2 Created 17th Oct 2024 at 12:44 by Samuel Lambert

Merge pull request #341 from PGScatalog/dev

v2.0.0-beta.2

Frozen v2.0.0-beta.2 69c467e

v2.0.0-alpha.1 Created 11th Aug 2023 at 15:30 by Samuel Lambert

v2.0.0-alpha.1 (#153)

fix test profile
change standard test to pull workflow with -r ${GITHUB_REF}
fix standard test
use local data for pytest
fix nextflow -r ${branch name}
oops
change report tag dev -> 2.0
update changelog

Frozen v2.0.0-alpha.1 28a0971

v1.3.2 Created 10th Aug 2023 at 10:10 by Samuel Lambert

add singularity to test suite

Frozen v1.3.2 bd1ca59

v2.0.0-alpha Created 10th Aug 2023 at 10:09 by Samuel Lambert

Merge pull request #135 from PGScatalog/dev

v2 release

Frozen v2.0.0-alpha af4882c

main @ d14e43e (earliest) Created 10th Aug 2023 at 10:01 by Samuel Lambert

use mamba in test

Frozen main d14e43e
