The Polygenic Score Catalog Calculator (pgsc_calc)
Introduction
pgsc_calc is a bioinformatics best-practice analysis pipeline for calculating polygenic [risk] scores on samples with imputed genotypes using existing scoring files from the Polygenic Score (PGS) Catalog and/or user-defined PGS/PRS.
Pipeline summary
[!IMPORTANT]
- Whole genome sequencing (WGS) data are not currently supported by the calculator.
- It’s possible to create compatible gVCFs from WGS data. We plan to improve support for WGS data in the near future.
The workflow performs the following steps:
- Downloads scoring files using the PGS Catalog API in a specified genome build (GRCh37 or GRCh38).
- Reads custom scoring files (and performs a liftover if the genotyping data are in a different build).
- Automatically combines and creates scoring files for efficient parallel computation of multiple PGS.
- Matches variants in the scoring files against variants in the target dataset (in plink bfile/pfile or VCF format).
- Calculates PGS for all samples (linear sum of weights and dosages).
- Creates a summary report to visualize score distributions and pipeline metadata (variant matching QC).
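The matching and scoring steps can be illustrated with a minimal Python sketch. This is not the pipeline's implementation: the function name `calculate_pgs`, the `(chromosome, position, effect allele)` key, and the in-memory dictionaries are hypothetical stand-ins for the scoring files and genotype data. It only shows the idea of keeping the variants found in the target data and taking the linear sum of weight times dosage for each sample.

```python
from typing import Dict, List, Tuple

# Hypothetical variant key, for illustration only:
# (chromosome, position, effect allele)
Variant = Tuple[str, int, str]


def calculate_pgs(
    weights: Dict[Variant, float],
    dosages: Dict[Variant, List[float]],
    n_samples: int,
) -> Tuple[List[float], float]:
    """Return per-sample scores and the fraction of scoring variants matched."""
    if not weights:
        raise ValueError("scoring file has no variants")

    # Variant matching: keep scoring-file variants present in the target data.
    matched = {v: w for v, w in weights.items() if v in dosages}

    # Scoring: linear sum of effect weight x effect-allele dosage per sample.
    scores = [0.0] * n_samples
    for variant, weight in matched.items():
        for i, dosage in enumerate(dosages[variant]):
            scores[i] += weight * dosage

    return scores, len(matched) / len(weights)


# Toy data: three scoring variants, two of them present in three samples.
toy_weights = {("1", 1000, "A"): 0.5, ("1", 2000, "T"): -0.2, ("2", 500, "G"): 0.1}
toy_dosages = {("1", 1000, "A"): [0, 1, 2], ("1", 2000, "T"): [2, 1, 0]}
toy_scores, toy_match_rate = calculate_pgs(toy_weights, toy_dosages, n_samples=3)
```

In this toy example the third variant is absent from the target data, so two of three variants match; a match rate like this is the kind of figure the variant matching QC in the report summarizes.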
And optionally:
- Genetic Ancestry: calculates similarity of target samples to populations in a reference dataset (1000 Genomes (1000G)) using principal components analysis (PCA).
- PGS Normalization: uses reference population data and/or PCA projections to report individual-level PGS predictions (e.g. percentiles, z-scores) that account for genetic ancestry.
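To make the normalization idea concrete, here is a minimal Python sketch of reporting a score as a z-score and a percentile relative to a reference population's score distribution. It is an illustration under simplifying assumptions, not the pipeline's method: the function names are hypothetical, the reference scores are assumed to come from the population most similar to the sample, and the PCA-based adjustment mentioned above is not shown.

```python
from bisect import bisect_right
from statistics import mean, pstdev
from typing import List


def z_score(score: float, reference_scores: List[float]) -> float:
    """Standardize a score against the reference distribution's mean and SD."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("reference scores have zero variance")
    return (score - mu) / sigma


def percentile(score: float, reference_scores: List[float]) -> float:
    """Percentage of reference scores less than or equal to the given score."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Toy reference distribution (made-up values, for illustration only).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_z = z_score(3.0, reference)
sample_percentile = percentile(3.0, reference)
```

Comparing against an ancestry-matched reference matters because the mean and spread of a PGS can differ between populations, so a raw score is hard to interpret without that context.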
See documentation for a list of planned features under development.
PGS applications and libraries
pgsc_calc uses applications and libraries internally developed at the PGS Catalog, which can do helpful things like:
- Query the PGS Catalog to bulk download scoring files in a specific genome build
- Match variants from scoring files to target variants
- Adjust calculated PGS in the context of genetic ancestry
If you want to write Python code to work with PGS, check out the pygscatalog repository to learn more.
If you want a simpler way of working with PGS, ignore this section and continue below to learn more about pgsc_calc.
Quick start
- Install Nextflow (>=23.10.0)
- Install Docker or Singularity (v3.8.3 minimum) (please only use Conda as a last resort)
- Download the pipeline and test it on a minimal dataset with a single command:
    nextflow run pgscatalog/pgsc_calc -profile test,<docker/singularity/conda>
- Start running your own analysis!
    nextflow run pgscatalog/pgsc_calc -profile <docker/singularity/conda> --input samplesheet.csv --pgs_id PGS001229
See getting started for more details.
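The `--input` option above points at a samplesheet describing the target genotype files. As a rough sketch, the Python snippet below writes such a CSV. The column names (`sampleset`, `path_prefix`, `chrom`, `format`) and the example values are assumptions made for illustration, and the helper `build_samplesheet` is hypothetical; check the getting started documentation for the layout your pipeline version expects.

```python
import csv
import io
from typing import Dict, List

# Assumed column layout; confirm against the pipeline documentation.
FIELDS = ["sampleset", "path_prefix", "chrom", "format"]


def build_samplesheet(rows: List[Dict[str, str]]) -> str:
    """Serialize rows describing target genotype files as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical example: one plink pfile set that is not split by chromosome.
example_rows = [
    {"sampleset": "cohort", "path_prefix": "data/cohort", "chrom": "", "format": "pfile"},
]
samplesheet_text = build_samplesheet(example_rows)

# To save it next to your data (path is illustrative):
# with open("samplesheet.csv", "w", newline="") as handle:
#     handle.write(samplesheet_text)
```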
Documentation
Full documentation is available on Read the Docs.
Credits
pgscatalog/pgsc_calc is developed as part of the PGS Catalog project, a collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye, Samuel Lambert) and the European Bioinformatics Institute (Helen Parkinson, Laura Harris).
The pipeline seeks to provide a standardized workflow for PGS calculation and ancestry inference, implemented in Nextflow and derived from an existing set of tools/scripts developed by the Inouye lab (Rodrigo Canovas, Scott Ritchie, Jingqin Wu) and PGS Catalog teams (Samuel Lambert, Laurent Gil).
The adaptation of the codebase, Nextflow implementation, and PGS Catalog features were written by Benjamin Wingfield, Samuel Lambert, and Laurent Gil, with additional input from Aoife McMahon (EBI). Development of new features, testing, and code review is ongoing and includes Inouye lab members (Rodrigo Canovas, Scott Ritchie) and others. If you use the tool, we ask you to cite our paper describing the software and the updated PGS Catalog resource:
- Lambert, Wingfield et al. (2024) Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization. Nature Genetics. doi:10.1038/s41588-024-01937-x.
This pipeline is distributed under an Apache License and uses code and infrastructure developed and maintained by the nf-core community (Ewels et al. Nature Biotech (2020) doi:10.1038/s41587-020-0439-x), reused here under the MIT license.
Additional references of open-source tools and data used in this pipeline are described in CITATIONS.md.
This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.
Version History
v2.0.0 (latest) Created 1st Nov 2024 at 11:50 by Samuel Lambert
Merge pull request #382 from PGScatalog/integrate-utils-1.4.1
2.0.0 release
Frozen: v2.0.0 @ 205cbfd
v2.0.0-beta.3 Created 17th Oct 2024 at 12:44 by Samuel Lambert
2.0.0-beta.3 (#349)
- bump version
- bump pgscatalog-utils version to 1.3.0
- update scoring files used in test profile
- add warnings to the test profile
- test log warning -> info
- add error for -profile test and --run_ancestry
- bump match version in test suite
- Update modules.config
- Update environment.yml
- Update test.yml
- update test profile message
- check scoring variants matches input listed variants
- fix sscore.vars path
- suppress zcat warnings
- use --force instead of --quiet to handle uncompressed data
- stop using System.exit, which is deprecated by nextflow
- make CI ignore docs
- ignore docs on PRs too
- bump fraposa version
- bump fraposa version
- drop subscribe on error checking because it hides error causes
Frozen: v2.0.0-beta.3 @ 96fbb23
v2.0.0-beta.2 Created 17th Oct 2024 at 12:44 by Samuel Lambert
Merge pull request #341 from PGScatalog/dev
v2.0.0-beta.2
Frozen: v2.0.0-beta.2 @ 69c467e
v2.0.0-alpha.1 Created 11th Aug 2023 at 15:30 by Samuel Lambert
v2.0.0-alpha.1 (#153)
- fix test profile
- change standard test to pull workflow with -r ${GITHUB_REF}
- fix standard test
- use local data for pytest
- fix nextflow -r ${branch name}
- oops
- change report tag dev -> 2.0
- update changelog
Frozen: v2.0.0-alpha.1 @ 28a0971
v1.3.2 Created 10th Aug 2023 at 10:10 by Samuel Lambert
add singularity to test suite
Frozen: v1.3.2 @ bd1ca59
v2.0.0-alpha Created 10th Aug 2023 at 10:09 by Samuel Lambert
Merge pull request #135 from PGScatalog/dev
v2 release
Frozen: v2.0.0-alpha @ af4882c
main @ d14e43e (earliest) Created 10th Aug 2023 at 10:01 by Samuel Lambert
use mamba in test
Frozen: main @ d14e43e
Created: 10th Aug 2023 at 10:01
Last updated: 1st Nov 2024 at 11:50