Workflow Type: Python
Frozen
Work-in-progress
GALOP - Genome Assembly using Long reads Pipeline
This repository contains an exact copy of the standard Genoscope long reads assembly pipeline.
At the moment, this repository is not intended for download by outside users, because it relies on grid submission commands that only work at Genoscope. We intend to make the pipeline available to a broader audience over time. In the meantime, the genome assembly and polishing commands can be read in the lib/assembly.py and lib/polishing.py files.
galop.py -h
Mandatory arguments:
--step {assembly,polishing}
Defines if the program will launch assembly or polishing scripts (default: None)
Assembly step arguments:
--proj PROJECT_CODE, -p PROJECT_CODE
Project and material codes, can be given multiple times (eg. -p BCM,A,B -p BWW,AB)
(default: None)
-i INPUT_FILE Nanopore reads fastq file (default: )
--size GENOME_SIZE, -s GENOME_SIZE
Estimated size of the genome in Mb (default: None)
--cov READSET_COVERAGE, -c READSET_COVERAGE
Coverage to use for longest and filtlong subsets (default: 30)
--assemblers ASSEMBLER_LIST
Comma-separated list of assemblers to use (e.g. '--assemblers
Smartdenovo,Raven,Wtdbg2' will not launch Flye nor Necat). Choices: Flye, Hifiasm, Necat,
Nextdenovo, Raven, Shasta, Smartdenovo, Wtdbg2 (default:
Smartdenovo,Wtdbg2,Flye,Necat,Nextdenovo)
--readsets READSET_LIST
Comma-separated list of readsets to use (e.g. '--readsets Filtlong,Longest' will not
launch assemblies with all reads) (default: Full,Filtlong,Longest)
--no-readset Disables readset creation (default: False)
--all-readsets Disables the use of lsRunProj to check for readset validity and instead use all available
readsets (default: False)
--force Skips directory creation (default: False)
--nano-raw Use --nano-raw instead of --nano-hq in Flye (default: False)
--pacbio Look for PacBio runs when building readsets. (default: False)
Polishing step arguments:
--model MEDAKA_MODEL, -m MEDAKA_MODEL
Model to use for medaka polishing (default: r941_prom_sup_g507)
--pe1 PE1_PATH Path to the Illumina R1 file (.gz or .fastq) (default: None)
--pe2 PE2_PATH Path to the Illumina R2 file (.gz or .fastq) (default: None)
--assembly ASSEMBLY, -a ASSEMBLY
FULL PATH to the assembly to polish (default: )
--assembly_dir ASSEMBLY_DIR
FULL PATH to the directory output of the 'nanopore_assembly_pipeline --step assembly'
(default: )
--racon Enables the racon step (default: False)
--no_medaka Skip the medaka step (default: False)
Optional arguments:
--dir OUTPUT_DIRECTORY, -d OUTPUT_DIRECTORY
Output directory (default: None)
--help, -h Show this help message and exit
Submission arguments:
--submode {msub,local}
Either submit using ccc_msub or run in local mode (default: msub)
--nolaunch Creates submission scripts but does not launch them (default: False)
--account ACCOUNT Account to use for submission (default: bistace)
--qos {long,week,nolimit,xlarge,xxlarge}
QoS to use for submission (default: )
--assembly_queue {normal,xlarge,small,broadwell,xxlarge}
Cluster queue to use for the assembly step (default: normal)
--assembly_core ASSEMBLY_CORE_NUMBER
Number of cores to use for the assembly step (default: 36)
--polishing_queue {normal,xlarge,small,broadwell,xxlarge}
Cluster queue to use for the polishing step (default: normal)
--polishing_core POLISHING_CORE_NUMBER
Number of cores to use for the polishing step (default: 36)
--wait Wait for all jobs to finish before exiting (default: False)
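The help text above lists the flags but not a complete invocation, so the following is a minimal sketch of how the two steps might be composed. It only uses flags that appear in the help output. The helper names, the file paths, and the choice of assemblers are hypothetical, and the help text does not say which flag combinations are required, so a real run at Genoscope may need more (for example --proj). The sketch uses --submode local with --nolaunch so that, per the help text, submission scripts would be created but not launched.

```python
import shlex


def build_assembly_command(input_fastq, genome_size_mb, output_dir,
                           assemblers=("Flye", "Raven"), coverage=30):
    """Return an argv list for 'galop.py --step assembly'.

    Hypothetical helper: it only assembles flags documented in the help text.
    """
    return [
        "galop.py",
        "--step", "assembly",
        "-i", input_fastq,                      # Nanopore reads fastq file
        "--size", str(genome_size_mb),          # estimated genome size in Mb
        "--cov", str(coverage),                 # coverage for Longest/Filtlong subsets
        "--assemblers", ",".join(assemblers),   # comma-separated assembler list
        "--dir", output_dir,
        "--submode", "local",
        "--nolaunch",
    ]


def build_polishing_command(assembly_fasta, pe1, pe2, output_dir,
                            model="r941_prom_sup_g507"):
    """Return an argv list for 'galop.py --step polishing'.

    Hypothetical helper: the model default is the one shown in the help text.
    """
    return [
        "galop.py",
        "--step", "polishing",
        "--assembly", assembly_fasta,   # the help text asks for a full path
        "--pe1", pe1,                   # Illumina R1 file
        "--pe2", pe2,                   # Illumina R2 file
        "--model", model,               # medaka model
        "--dir", output_dir,
        "--submode", "local",
        "--nolaunch",
    ]


if __name__ == "__main__":
    # All paths below are placeholders, not files from the repository.
    assembly_cmd = build_assembly_command(
        "/path/to/reads.fastq", 120, "/path/to/assembly_out")
    polishing_cmd = build_polishing_command(
        "/path/to/assembly.fasta",
        "/path/to/illumina_R1.fastq.gz",
        "/path/to/illumina_R2.fastq.gz",
        "/path/to/polishing_out")
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(assembly_cmd))
    print(shlex.join(polishing_cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so a path containing spaces is not split when the list is later passed to subprocess.run or printed with shlex.join.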
Version History
main @ a1c22db (latest) Created 14th Nov 2024 at 06:55 by Benjamin Istace
Update README.md
main @ aa63fa8 (earliest) Created 12th Nov 2024 at 07:37 by Benjamin Istace
Add files via upload
Citation
Istace, B., Aury, J.-M., & Belser, C. (2024). GALOP - Genome Assembly using Long reads Pipeline. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1200.2
Created: 12th Nov 2024 at 07:37
Last updated: 14th Nov 2024 at 06:55