Multiplexing PrimalScheme AmpSeq

Proposal and description

This workflow produces well-designed primer pools for Fluidigm multiplexing on a 48.48 Access Array from the output of primalscheme and Clustal Omega. It should work for any similar multiplexed AmpSeq process once the relevant parameters, such as pool size, amplicon length, and overlap, are adjusted.

The input is a list of FASTA files containing genomic sequences, typically genes and their flanking regions, plus the relevant design parameters for the AmpSeq system. The output includes a list of predicted amplicons, the primers that generate those amplicons, and the associated positional information and reaction specifications.

R scripts handle most of the data transformation. Shell scripts launch the other scripts and handle some data gathering. A major component, primalscheme, runs inside a Singularity shell and must be executed via SLURM. I plan to use Snakemake to run all the components; alternatively, I may use a shell script. The Workflow Summary section below goes into more detail about the purpose of each script.

Some areas require substantial work before this performs as a polished pipeline. First, I have never used Snakemake and don’t know how it works or how it is executed. Second, part of my workflow requires Clustal Omega, which I have been accessing via a web portal. I don’t know whether I will be able to automate web portal access and results retrieval. Alternatively, I believe Clustal Omega can be run as a script, but I’d probably have to set it up in Singularity. Third, designing the pipeline to be flexible across a wide range of parameters could be tricky. So far I have only tested parts of this workflow on a very narrow set of parameters specific to the Fluidigm 48.48 Access Array and Illumina sequencing, and only on a carefully selected set of genomic regions. At this time I have no good ideas for handling exceptions gracefully, other than simply exiting and reporting the exceptions when possible.
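
One way to approach the "exit and report" idea is to check the design parameters before any job is submitted. The following is a minimal sketch only: the parameter names (pool_size, amplicon_min, amplicon_max, overlap) and their default values are hypothetical placeholders, not taken from the repository and not recommendations.

```python
from dataclasses import dataclass
from typing import List
import sys


@dataclass(frozen=True)
class DesignParams:
    """Hypothetical design parameters; names and defaults are placeholders."""
    pool_size: int = 48       # primer pairs per pool (placeholder value)
    amplicon_min: int = 180   # shortest acceptable amplicon in bp (placeholder)
    amplicon_max: int = 220   # longest acceptable amplicon in bp (placeholder)
    overlap: int = 20         # overlap between adjacent amplicons in bp (placeholder)


def validate(params: DesignParams) -> List[str]:
    """Return a list of readable problems; an empty list means the set is usable."""
    problems = []
    if params.pool_size < 1:
        problems.append(f"pool_size must be >= 1, got {params.pool_size}")
    if params.amplicon_min < 1:
        problems.append(f"amplicon_min must be >= 1, got {params.amplicon_min}")
    if params.amplicon_min > params.amplicon_max:
        problems.append("amplicon_min must not exceed amplicon_max")
    if params.overlap < 0:
        problems.append(f"overlap must be >= 0, got {params.overlap}")
    if params.overlap >= params.amplicon_min:
        problems.append("overlap must be smaller than amplicon_min")
    return problems


if __name__ == "__main__":
    issues = validate(DesignParams())
    if issues:
        # Report every problem at once, then exit with a non-zero status.
        sys.exit("Invalid design parameters:\n" + "\n".join(issues))
    print("parameters OK")
```

Collecting all problems before exiting means a single failed run reports every bad parameter, rather than one per attempt.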

Workflow Summary

Clustering of homologous sequences
1. Generate a list of genomic sequences as fasta files.
2. Submit to Clustal Omega.
3. Use the resulting tree to group highly related sequences. This is currently a manual process.
4. Each group, or each ungrouped individual sequence, gets its own fasta file.
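
The manual grouping in step 3 might eventually be automated. The sketch below is not the tree-based method described above: it swaps in single-linkage grouping over pairwise percent identities, assuming such a matrix is available from the alignment step. The sequence names, identity values, and the 85% threshold are all hypothetical.

```python
from typing import Dict, List, Tuple


def group_by_identity(
    names: List[str],
    identity: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Single-linkage grouping: two sequences share a group if a chain of
    pairs with identity >= threshold connects them."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pct in identity.items():
        if pct >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Hypothetical example data (percent identities between sequences).
names = ["geneA", "geneB", "geneC", "geneD"]
identity = {
    ("geneA", "geneB"): 92.0,
    ("geneB", "geneC"): 88.0,
    ("geneA", "geneC"): 70.0,
    ("geneC", "geneD"): 40.0,
}
groups = group_by_identity(names, identity, threshold=85.0)
print(groups)  # [['geneA', 'geneB', 'geneC'], ['geneD']]
```

Each returned group would then be written to its own fasta file, as in step 4. Note that single linkage can chain loosely related sequences together (geneA and geneC above share only 70% identity), so the result may differ from a grouping made by eye from the tree.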
Primer generation (more detail)
1. Submit each fasta file to primalscheme with the desired parameters using runprimalscheme.sh. This uses SLURM and a Singularity shell.
2. Extract and prepare the coverage log summary and produce visuals using:
  1. grepcoverage.sh pulls coverage information from the output files of primalscheme.
  2. formatcoverage.R improves the readability and formatting of the coverage data from primalscheme.
  3. analyzecoverage.R produces a visual analysis of the results from primalscheme. This is useful for determining whether primalscheme did a good job designing amplicons given the set of constraining parameters, and it may help in selecting the best set of parameters for amplicon design.
Design of fluidigm pools (more detail)
1. Retrieve the full list of primers using fetchprimers.sh.
2. Separate the primers into two pools based on overlap, as specified by primalscheme, using separatepools.R.
3. Submit each pool to Clustal Omega.
4. Generate a pairwise comparison of primer identity using assessmatricies.R.
5. Split the primary pools (designed by primalscheme) into secondary and tertiary pools to minimize identity while keeping primer pairs together, using splitpools.R.
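
The splitting in step 5 can be illustrated with a simple greedy heuristic. This is a sketch under stated assumptions, not the method implemented in splitpools.R: it assumes pairwise percent identities between primers are already available, and the primer names and identity values are hypothetical. Each left/right primer pair is kept together and placed in the subpool where its highest identity to any already-placed primer is lowest.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]  # (left primer name, right primer name)


def split_pool(
    pairs: List[Pair],
    identity: Dict[FrozenSet[str], float],
    n_subpools: int,
) -> List[List[Pair]]:
    """Greedily assign each primer pair to the subpool where its worst-case
    identity against already-placed primers is lowest. Ties go to the
    smaller subpool, then to the lower index."""
    subpools: List[List[Pair]] = [[] for _ in range(n_subpools)]

    def worst_identity(pair: Pair, subpool: List[Pair]) -> float:
        placed = [primer for other in subpool for primer in other]
        # Primer combinations missing from the matrix are treated as 0% identity.
        return max(
            (identity.get(frozenset((a, b)), 0.0) for a in pair for b in placed),
            default=0.0,
        )

    for pair in pairs:
        best = min(
            range(n_subpools),
            key=lambda i: (worst_identity(pair, subpools[i]), len(subpools[i]), i),
        )
        subpools[best].append(pair)  # both primers of a pair always move together
    return subpools


# Hypothetical primers and identities for one primary pool.
pairs = [
    ("amp1_LEFT", "amp1_RIGHT"),
    ("amp3_LEFT", "amp3_RIGHT"),
    ("amp5_LEFT", "amp5_RIGHT"),
]
identity = {
    frozenset(("amp1_LEFT", "amp3_LEFT")): 85.0,
    frozenset(("amp1_RIGHT", "amp5_RIGHT")): 30.0,
}
result = split_pool(pairs, identity, n_subpools=2)
print(result)
```

In this example the amp3 pair is kept away from the amp1 pair because their left primers share 85% identity. A greedy pass like this does not guarantee the best possible split and does not enforce a maximum subpool size, so both would need attention in a real implementation.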

TODO

Notes

Local paths so I don’t forget

# OSC
/fs/scratch/PAS1755/$MYDIR/Primal-to-Fluidigm

# Personal computer
/Users/$USER/Documents/GitHub/MPAS
