Tools for mining secondary metabolite biosynthesis gene clusters



2metDB

2metDB / SecmetDB is a standalone tool (a web server installed locally on the user's machine) for mining PKS and NRPS biosynthetic gene clusters in whole-genome protein FASTA files. The algorithms used are the same as those of the PKS/NRPS web server / Predictive Blast Server.

Reference (PKS/NRPS web server):

Link:


antiSMASH

see antiSMASH page.


ARTS

ARTS (Antibiotic Resistant Target Seeker) uses antiSMASH to predict biosynthetic gene clusters (BGCs), prioritizes detected BGCs and identifies drug targets.

Reference:

Link:


BAGEL

BAGEL is a mining tool and database for ribosomally synthesized and post-translationally modified peptides (RiPPs), for example lanthipeptides or bacteriocins. BAGEL can identify RiPP biosynthetic gene clusters in (meta)genomic data, and classify and analyze the putative products.

References:

Link:


BiG-SCAPE

The Biosynthetic Gene Similarity Clustering and Prospecting Engine (BiG-SCAPE) is a tool that uses distances between gene clusters (e.g. identified with antiSMASH) to build sequence similarity networks, which are then used to define gene cluster families. By mapping known gene clusters from the MIBiG dataset onto these networks, the data can be used for sequence-based dereplication of gene clusters.
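
The idea of turning pairwise distances into gene cluster families can be illustrated with a minimal sketch. It is not BiG-SCAPE's own code: the distance values, the BGC names and the cutoff of 0.3 are assumptions made for illustration, and the family calling is simplified here to taking connected components of the thresholded network.

```python
from collections import defaultdict


def cluster_families(names, distances, cutoff=0.3):
    """Link BGC pairs whose distance is <= cutoff and return the
    connected components of that network as sorted lists of names."""
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        if dist <= cutoff:
            parent[find(a)] = find(b)

    families = defaultdict(list)
    for name in names:
        families[find(name)].append(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical pairwise distances (0 = identical, 1 = unrelated).
names = ["bgc_A", "bgc_B", "bgc_C", "MIBiG_ref_1"]
distances = {
    ("bgc_A", "bgc_B"): 0.10,
    ("bgc_A", "bgc_C"): 0.85,
    ("bgc_A", "MIBiG_ref_1"): 0.25,
    ("bgc_B", "bgc_C"): 0.90,
    ("bgc_B", "MIBiG_ref_1"): 0.40,
    ("bgc_C", "MIBiG_ref_1"): 0.95,
}
families = cluster_families(names, distances)
print(families)
```

In this toy example bgc_A and bgc_B end up in the same family as the known reference cluster, which is the dereplication signal; bgc_C stays alone and would be the more interesting candidate for novelty.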

Links:


CASSIS and SMIPS

Toolkit consisting of the tools CASSIS (Cluster Assignment by Islands of Sites) and SMIPS (Secondary Metabolites by InterProScan). SMIPS uses domain annotation provided by InterProScan to predict anchor genes encoding core biosynthetic enzymes (PKS, NRPS, DMATS) in eukaryotic genomic sequences. The data obtained with SMIPS then serve as input for CASSIS, which uses an automated motif-based search for transcription factor binding sites to predict other genes associated with the "anchor" gene, i.e. gene clusters. The tool is available as a web server and for download.
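
The anchor-gene principle can be sketched as follows. This is a simplified illustration and not the CASSIS implementation: it assumes that motif detection has already been done and is available as one boolean flag per gene promoter (a hypothetical input), and it simply extends the region around the anchor gene while the shared motif keeps appearing, tolerating a small gap.

```python
def extend_cluster(has_motif, anchor, max_gap=1):
    """Return (start, end) gene indices of the region around `anchor`
    whose promoters carry the shared motif, tolerating up to `max_gap`
    consecutive genes without it."""

    def walk(step):
        edge, gap, i = anchor, 0, anchor + step
        while 0 <= i < len(has_motif) and gap <= max_gap:
            if has_motif[i]:
                edge, gap = i, 0  # motif found: extend the border
            else:
                gap += 1          # no motif: count the gap
            i += step
        return edge

    return walk(-1), walk(+1)


# Hypothetical motif flags for nine consecutive genes; gene 4 is the
# anchor gene (e.g. a PKS predicted by a SMIPS-like step).
has_motif = [False, True, False, True, True, True, False, False, True]
borders = extend_cluster(has_motif, anchor=4)
print(borders)
```

With these flags the predicted cluster spans genes 1 to 5: the single motif-free gene 2 is bridged, while the two motif-free genes 6 and 7 end the extension.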

Reference:

Links:


CLUSEAN

CLUSEAN // CLuster SEquence ANalyzer

CLUSEAN is a BioPerl-based annotation pipeline for secondary metabolite biosynthetic gene clusters. It allows automated homology searches, identification of conserved protein domains in PKS and NRPS gene clusters, classification of enzymes, and specificity predictions for NRPS A-domains. The CLUSEAN results are annotated in EMBL files and can also be exported to MS Excel.

Reference:

Link:


ClusterFinder

ClusterFinder uses a probabilistic approach to detect putative secondary metabolite gene clusters in genomic and metagenomic data. ClusterFinder is available as standalone software and is also integrated into antiSMASH and IMG-ABC.

Reference:

Download source code:


ClustScan Professional

see ClustScan Professional entry in the PKS/NRPS tools section.


eSNaPD // environmental Surveyor of Natural Product Diversity

eSNaPD is a tool to survey secondary metabolite gene cluster diversity in metagenomic DNA sequences, also taking into account the associated metadata, i.e. the geographic sampling location.

References:

Link:


EvoMining

EvoMining uses phylogenomics to identify secondary metabolite biosynthetic gene clusters (BGCs) that encode duplicates of primary metabolism enzymes which display a divergent phylogeny. This is based on the observation that such primary metabolic isoenzymes are often included in secondary metabolite BGCs.
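
A first step of such an analysis, spotting genomes that carry extra copies of a primary metabolism enzyme, can be sketched as below. This is only an illustration under stated assumptions and not EvoMining's own procedure: the copy-number table, the family and genome names, and the "mean plus one standard deviation" rule are hypothetical, and the subsequent phylogenetic analysis of the extra copies is not shown.

```python
from statistics import mean, pstdev


def expanded_families(copy_numbers):
    """copy_numbers maps enzyme family -> {genome: number of copies}.
    Return (family, genome) pairs whose copy number exceeds the
    family's mean plus one (population) standard deviation."""
    flagged = []
    for family, per_genome in copy_numbers.items():
        counts = list(per_genome.values())
        threshold = mean(counts) + pstdev(counts)
        for genome, n in per_genome.items():
            if n > threshold:
                flagged.append((family, genome))
    return flagged


# Hypothetical copy numbers of two primary metabolism enzyme families.
copy_numbers = {
    "enolase": {"genome_1": 1, "genome_2": 1, "genome_3": 1, "genome_4": 3},
    "citrate_synthase": {"genome_1": 1, "genome_2": 1, "genome_3": 1, "genome_4": 1},
}
candidates = expanded_families(copy_numbers)
print(candidates)
```

The extra enolase copies in genome_4 would then be the candidates to place in a phylogeny; copies that branch away from the conserved primary metabolism clade point to a possible BGC in their genomic neighbourhood.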

Reference:

Link:


FunGeneClusterS

FunGeneClusterS predicts fungal gene clusters based on genome and transcriptome data. An R-based web server and an offline version are available.

References:

Link:


MIDDAS-M

MIDDAS-M (a motif-independent de novo detection algorithm for SMB gene clusters) is a gene cluster mining tool that uses genome and transcriptome data to identify gene clusters in fungal genomes.

Reference:

Link:


MIPS-CG

MIPS-CG (motif-independent prediction without core genes) attempts to identify completely novel secondary metabolite biosynthetic gene clusters using only genome data. It uses neither known sequences (or motifs) of core genes nor transcriptome data.

References:

Link (Note: currently offline):


NaPDoS // Natural Products Domain Seeker

see NaPDoS entry in the PKS/NRPS tools section.


PhytoClust

PhytoClust is dedicated to detection of biosynthetic gene clusters for secondary metabolites in plant genomes.

References:

Link:


PKMiner

PKMiner is a domain classifier predicting novel biosynthetic gene clusters of type II PKSs and aromatic polyketide chemotypes based on their annotated aromatase and cyclase domains.

References:

Link:


plantiSMASH

plantiSMASH is a version of antiSMASH dedicated to plant genomes.

References:

Link:


PRISM / GNP

GNP (Genes to Natural Products) is an integrated platform to link gene cluster data (for PKS/NRPS clusters) to LC-MS/MS data. Within the genome mining modules of GNP, gene clusters can be detected and putative biosynthetic products predicted. These predictions can be used in a second step to identify corresponding peaks in LC-MS/MS data of the strains within the GNP / iSNAP Database module. The PRISM component provides a web-based genome mining tool for nonribosomal peptides and type I and II polyketides, including prediction of their chemical structures. After its initial release, PRISM was extended with support for RiPP cluster detection and analysis. PRISM 3 subsequently enabled structure prediction for a greater range of secondary metabolites.
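
The second step, relating predicted products to measured peaks, can be illustrated with a minimal sketch. It is not how GNP / iSNAP performs the matching: it only compares predicted monoisotopic masses with observed precursor m/z values, assumes singly protonated [M+H]+ ions, and uses hypothetical candidate names, masses, peaks and a 10 ppm tolerance.

```python
PROTON_MASS = 1.007276  # Da, added for the assumed [M+H]+ adduct


def match_peaks(predicted, peaks, ppm=10.0):
    """predicted maps candidate name -> monoisotopic mass (Da);
    peaks is a list of observed m/z values. Return (name, m/z) pairs
    that agree within the given ppm tolerance."""
    hits = []
    for name, mass in predicted.items():
        expected_mz = mass + PROTON_MASS
        for mz in peaks:
            error_ppm = abs(mz - expected_mz) / expected_mz * 1e6
            if error_ppm <= ppm:
                hits.append((name, mz))
    return hits


# Hypothetical predicted products and observed precursor ions.
predicted = {"candidate_1": 733.4612, "candidate_2": 1057.5500}
peaks = [734.4690, 812.3021, 1058.6000]
hits = match_peaks(predicted, peaks)
print(hits)
```

Only candidate_1 finds a peak within tolerance here; the peak near candidate_2 is roughly 40 ppm off and is rejected. A real workflow would additionally compare MS/MS fragment spectra rather than rely on precursor mass alone.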

References:

Link:

Source code:


RiPPMiner

RiPPMiner predicts chemical structures of ribosomally synthesized and post-translationally modified peptides (RiPPs).

Reference:

Link:


RODEO

RODEO (Rapid ORF Description and Evaluation Online) detects biosynthetic gene clusters (BGCs) encoding ribosomally synthesized and post-translationally modified peptides (RiPPs). RODEO is available as a standalone application and is also integrated into antiSMASH.

References:

Links:

  • Main page: http://www.ripp.rodeo/
  • Webtool: http://rodeo.scs.illinois.edu/
  • antiSMASH

SANDPUMA

SANDPUMA (Specificity of AdenylatioN Domain Prediction Using Multiple Algorithms) predicts substrate specificities of NRPS adenylation domains. SANDPUMA is integrated into antiSMASH 4.

Reference:

Link:


SBSPKS

SBSPKS (Structure-based sequence analysis of PKS and NRPS) allows various chemical analyses for experimentally characterized biosynthetic gene clusters (BGCs) encoding PKS/NRPS. Version 2 of the tool has since been released.

Reference:

Link:


SeMPI

SeMPI (Secondary Metabolite Prediction and Identification) predicts structures of secondary metabolites biosynthesized by type I modular PKS. It uses antiSMASH and StreptomeDB 2.0 as backend engines. SeMPI can also be considered a dereplication tool.

Reference:

Link:


SMURF / Secondary Metabolite Unique Regions Finder

SMURF is a web-based search platform to mine secondary metabolite biosynthetic gene clusters in fungi. SMURF employs an HMM-based search strategy to identify conserved domains in PKS, NRPS, hybrid PKS/NRPS and terpenoid gene clusters.

Reference:

Link: