Software Tools Repository

BTS

Language: R/Shiny

Repository: BTS

Reference(s):
Hunter Dlugas, Janaka S.S. Liyanage, Seongho Kim. BTS: Basic Testing System – an R/Shiny-Based Web Tool for Statistical Analysis.

Keywords: Education Statistical Data Analysis

Description: BTS is a web-based R Shiny application for teaching, learning, and performing statistical analyses common in the biological/healthcare sciences.

coAlign

Language: MATLAB

Repository: coAlign

DOI: 10.1002/cem.3236

Reference(s):
Li Z, Kim S, Zhong S, Zhong Z, Kato I, Zhang X. Coherent Point Drift Peak Alignment Algorithms Using Distance and Similarity Measures for Two-Dimensional Gas Chromatography Mass Spectrometry Data. J Chemom. 2020 Aug;34(8):e3236. doi: 10.1002/cem.3236. Epub 2020 Mar 28. PMID: 33505107; PMCID: PMC7837599.
Deng B, Kim S, Li H, Heath E, Zhang X. Global peak alignment for comprehensive two-dimensional gas chromatography mass spectrometry using point matching algorithms. J Bioinform Comput Biol. 2016 Dec;14(6):1650032. doi: 10.1142/S0219720016500323. Epub 2016 Sep 9. PMID: 27650662; PMCID: PMC5226864.

Keywords: Metabolomics Peak Alignment GCxGC-MS Point Matching Algorithm

Description: This is a MATLAB package for global peak alignment on GCxGC-MS Data using point matching algorithms. coAlign can align homogeneous and heterogeneous peaks.

cSILAC

Language: R

Repository: cSILAC

DOI: 10.1016/j.cmpb.2016.09.017

Reference(s):
Kim S, Carruthers N, Lee J, Chinni S, Stemmer P. Classification-based quantitative analysis of stable isotope labeling by amino acids in cell culture (SILAC) data. Comput Methods Programs Biomed. 2016 Dec;137:137-148. doi: 10.1016/j.cmpb.2016.09.017. Epub 2016 Sep 22. PMID: 28110720; PMCID: PMC5260509.

Keywords: Proteomics SILAC PSO Classification

Description: This is an R package for analyzing Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC) data. cSILAC performs differential analysis using PSO-based classification.

EIder (ComID)

Language: MATLAB

Repository: EIder (ComID)

DOI: 10.1016/j.chroma.2016.04.064

Reference(s):
Koo I, Kim S, Shi B, Lorkiewicz P, Song M, McClain C, Zhang X. EIder: A compound identification tool for gas chromatography mass spectrometry data. J Chromatogr A. 2016 May 27;1448:107-114. doi: 10.1016/j.chroma.2016.04.064. Epub 2016 Apr 23. PMID: 27131963.

Keywords: Metabolomics GC-MS Compound Identification

Description: EIder (EI mass spectrum identifier) provides users with eight literature-reported spectrum matching algorithms for compound identification from gas chromatography mass spectrometry (GC-MS) data.

gen2stage

Language: R

Repository: gen2stage

DOI: 10.29220/CSAM.2019.26.2.163

Reference(s):
Kim S, Wong WK. Phase II Two-Stage Single-Arm Clinical Trials for Testing Toxicity Levels. Commun Stat Appl Methods. 2019 Mar;26(2):163-173. doi: 10.29220/CSAM.2019.26.2.163. Epub 2019 Mar 31. PMID: 31106162; PMCID: PMC6522135.

Keywords: Clinical Trial Design Phase 2 Single-Arm Two-Stage

Description: This is an R package for generalized phase II single-arm two-stage designs for efficacy or toxicity.
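
To illustrate what a two-stage single-arm design evaluates, the following is a minimal Python sketch of the operating characteristics of a classic Simon-type two-stage rule. It is not gen2stage's API or its generalized design; the function name, the stopping rule (stop if stage-1 responses are at most r1, declare the treatment promising if total responses exceed r), and the example boundaries are assumptions made for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k responses among n patients."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_stage_characteristics(n1, r1, n, r, p):
    """Operating characteristics of a two-stage single-arm design.

    Hypothetical rule: stop after stage 1 if responses <= r1 (out of n1);
    otherwise enroll up to n patients and declare the treatment promising
    only if total responses > r.
    Returns (probability of early termination, probability of declaring
    the treatment promising, expected sample size) at response rate p.
    """
    n2 = n - n1
    pet = sum(binom_pmf(k, n1, p) for k in range(r1 + 1))
    # Continue to stage 2 but still end with total responses <= r.
    fail_stage2 = 0.0
    for x1 in range(r1 + 1, min(n1, r) + 1):
        tail = sum(binom_pmf(k, n2, p) for k in range(min(n2, r - x1) + 1))
        fail_stage2 += binom_pmf(x1, n1, p) * tail
    p_promising = 1.0 - pet - fail_stage2
    expected_n = n1 + (1.0 - pet) * n2
    return pet, p_promising, expected_n

# Illustrative boundaries: r1/n1 = 1/10 and r/n = 5/29,
# evaluated at a null rate of 0.10 and an alternative rate of 0.30.
pet0, alpha, en0 = two_stage_characteristics(10, 1, 29, 5, 0.10)
_, power, _ = two_stage_characteristics(10, 1, 29, 5, 0.30)
print(f"PET(p0)={pet0:.3f}  alpha={alpha:.3f}  EN(p0)={en0:.1f}  power={power:.3f}")
```

Evaluating the same function at the null and the alternative response rates gives the type I error and the power of the chosen boundaries.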

HTrpart

Language: R, C

Repository: HTrpart

DOI: 10.5281/zenodo.15237184

Reference(s):
Dyson G. An Application of the Patient Rule-Induction Method to Detect Clinically Meaningful Subgroups from Failed Phase III Clinical Trials. Int J Clin Biostat Biom. 2021;7(1):038. doi: 10.23937/2469-5831/1510038. Epub 2021 Jun 28. PMID: 34632463; PMCID: PMC8496893.

Keywords: Classification Hypothesis testing Regression trees

Description: Recursively partitions a dataset with a continuous or categorical response, using the maximum mean rank (continuous outcome) or the proportion of any observed class (categorical outcome) as the splitting mechanism so that each split can be tested statistically. Partitioning stops when there are no more statistically significant splits.

ICAOD

Language: R

Repository: ICAOD

DOI: 10.32614/rj-2022-043

Reference(s):
Masoudi E, Holling H, Wong WK, Kim S. ICAOD: An R Package for Finding Optimal designs for Nonlinear Statistical Models by Imperialist Competitive Algorithm. R J. 2022 Sep;14(3):20-45. doi: 10.32614/rj-2022-043. Epub 2022 Dec 19. PMID: 36779039; PMCID: PMC9912186.

Keywords: Optimal Design Imperialist Competitive Algorithm Nonlinear Models

Description: Finds optimal designs for nonlinear models using a metaheuristic algorithm called Imperialist Competitive Algorithm (ICA).

iFDR

Language: R

Repository: iFDR

DOI: 10.1002/cem.2665

Reference(s):
Kim S, Zhang X. Discovery of False Identification Using Similarity Difference in GC-MS based Metabolomics. J Chemom. 2015 Feb 1;29(2):80-86. doi: 10.1002/cem.2665. PMID: 25937705; PMCID: PMC4414261.

Keywords: Metabolomics GC-MS Compound Identification Similarity Difference

Description: This is an R package to control FDR in compound identification. iFDR can control the False Identification Rate in compound identification using an empirical Beta model.

iOPT

Language: R

Repository: iOPT

DOI: 10.1093/bioinformatics/bts083

Reference(s):
Kim S, Koo I, Wei X, Zhang X. A method of finding optimal weight factors for compound identification in gas chromatography-mass spectrometry. Bioinformatics. 2012 Apr 15;28(8):1158-63. doi: 10.1093/bioinformatics/bts083. Epub 2012 Feb 13. PMID: 22333245; PMCID: PMC3324511.

Keywords: Metabolomics GCxGC-MS Peak Alignment Cosine Similarity

Description: This is an R package for finding optimal weight factors for compound identification. iOPT finds the weight factors that yield more accurate identification.
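
As a rough illustration of where weight factors enter compound identification, the Python sketch below scores a query spectrum against a small library with a weighted cosine similarity. It assumes the weight factors are exponents applied to peak intensity and m/z (a commonly used form); the function name, the default values of a and b, and the toy spectra are hypothetical and are not taken from iOPT.

```python
import math

def weighted_cosine(query, reference, a=0.5, b=1.0):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Each peak is transformed to intensity**a * mz**b before comparison;
    a and b are the weight factors (placeholder defaults, not tuned values).
    """
    def weight(spectrum, mz):
        intensity = spectrum.get(mz, 0.0)
        # Absent peaks contribute zero regardless of the exponents.
        return (intensity ** a) * (mz ** b) if intensity > 0 else 0.0

    mzs = sorted(set(query) | set(reference))
    q = [weight(query, mz) for mz in mzs]
    r = [weight(reference, mz) for mz in mzs]
    dot = sum(u * v for u, v in zip(q, r))
    norm = math.sqrt(sum(u * u for u in q)) * math.sqrt(sum(v * v for v in r))
    return dot / norm if norm > 0 else 0.0

# Toy spectra (made-up values) for illustration only.
query = {41: 30.0, 57: 100.0, 71: 45.0}
library = {
    "compound_A": {41: 28.0, 57: 100.0, 71: 50.0},
    "compound_B": {43: 100.0, 58: 20.0, 85: 60.0},
}
scores = {name: weighted_cosine(query, spec) for name, spec in library.items()}
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

Searching for good weight factors then amounts to choosing a and b so that the correct library entry ranks first as often as possible on spectra with known identities.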

iPAD

Language: R

Repository: iPAD

DOI: 10.1186/1471-2105-14-123

Reference(s):
Jeong J, Zhang X, Shi X, Kim S, Shen C. An efficient post-hoc integration method improving peak alignment of metabolomics data from GCxGC/TOF-MS. BMC Bioinformatics. 2013 Apr 10;14:123. doi: 10.1186/1471-2105-14-123. PMID: 23575005; PMCID: PMC3637833.

Keywords: Metabolomics GCxGC-MS Peak Alignment Post-Hoc

Description: This is an R package for post-hoc peak alignment on GCxGC-MS Data. iPAD can iteratively align homogeneous peaks based on post-hoc analysis.

Mendelian Randomization Analysis

Language: R/Shiny

Repository: Mendelian Randomization Analysis

DOI: 10.3390/math10203743

Reference(s):
Liyanage, J.S.S.; Estepp, J.H.; Srivastava, K.; Rashkin, S.R.; Sheehan, V.A.; Hankins, J.S.; Takemoto, C.M.; Li, Y.; Cui, Y.; Mori, M.; et al. A Versatile and Efficient Novel Approach for Mendelian Randomization Analysis with Application to Assess the Causal Effect of Fetal Hemoglobin on Anemia in Sickle Cell Anemia. Mathematics 2022, 10, 3743.

Keywords: Mendelian randomization Extreme phenotype sequencing Causal inference Genome-wide association studies Next generation sequencing studies

Description: This web-based R Shiny application facilitates genetic causal inferences through Mendelian randomization analysis for a one-sample design, based on samples drawn using either a random sampling design or a nonrandom extreme phenotype sequencing design.

MRCount: Mendelian Randomization Analysis for Count Outcomes

Language: R/Shiny

Repository: MRCount

DOI: 10.1002/gepi.22602

Reference(s):
Liyanage, J. S. S.; Hankins, J. S.; Estepp, J. H.; Srivastava, D.; Rashkin, S. R.; Takemoto, C.; Li, Y.; Cui, Y.; Mori, M.; Weiss, M. J.; & Kang, G. (2025). A Novel One-Sample Mendelian Randomization Approach for Count-Type Outcomes That Is Robust to Correlated and Uncorrelated Pleiotropic Effects. Genetic epidemiology, 49(1), e22602.

Keywords: Mendelian randomization Count data Generalized linear models Genetic causal inferences Instrumental variables

Description: This web-based R Shiny application facilitates genetic causal inferences from count data through Mendelian randomization analysis for a one-sample design.

mSPA

Language: R

Repository: mSPA

DOI: 10.1093/bioinformatics/btr188

Reference(s):
Kim S, Fang A, Wang B, Jeong J, Zhang X. An optimal peak alignment for comprehensive two-dimensional gas chromatography mass spectrometry using mixture similarity measure. Bioinformatics. 2011 Jun 15;27(12):1660-6. doi: 10.1093/bioinformatics/btr188. Epub 2011 Apr 14. PMID: 21493650; PMCID: PMC3106184.

Keywords: Metabolomics GCxGC-MS Peak Alignment

Description: This is an R package for peak alignment on GCxGC-MS Data. mSPA can align homogeneous peaks based on mixture similarity measures.

msPeak

Language: R

Repository: msPeak

DOI: 10.1214/14-aoas731

Reference(s):
Kim S, Ouyang M, Jeong J, Shen C, Zhang X. A NEW METHOD OF PEAK DETECTION FOR ANALYSIS OF COMPREHENSIVE TWO-DIMENSIONAL GAS CHROMATOGRAPHY MASS SPECTROMETRY DATA. Ann Appl Stat. 2014 Jun;8(2):1209-1231. doi: 10.1214/14-aoas731. PMID: 25264474; PMCID: PMC4175529.

Keywords: Metabolomics GCxGC-MS Peak Detection

Description: This is an R package for peak detection using Bayes factor and mixture probability models. msPeak can do peak detection for GCxGC-MS.

msPeakG

Language: R

Repository: msPeakG

DOI: 10.1016/j.csda.2016.07.015

Reference(s):
Kim S, Jang H, Koo I, Lee J, Zhang X. Normal-Gamma-Bernoulli Peak Detection for Analysis of Comprehensive Two-Dimensional Gas Chromatography Mass Spectrometry Data. Comput Stat Data Anal. 2017 Jan;105:96-111. doi: 10.1016/j.csda.2016.07.015. Epub 2016 Aug 3. PMID: 27667882; PMCID: PMC5029791.

Keywords: Metabolomics GCxGC-MS Peak Detection

Description: This is an R package for peak detection using Normal-Gamma-Bernoulli models. msPeakG can do peak detection for GCxGC-MS.

MZsearch

Language: Bash; Python

Repository: MZsearch

DOI: 10.26434/chemrxiv-2024-5fm7t

Reference(s):
Hunter Dlugas, Xiang Zhang, Seongho Kim. MZsearch: A Python-Based Compound Identification Tool for GC-MS and LC-MS/MS-Based Metabolomics.
Dlugas, H., Zhang, X., & Kim, S. (2024). Liquid Chromatography - Tandem Mass Spectrometry (LC-MS/MS) and Gas Chromatography - Mass Spectrometry (GC-MS) Reference Libraries from Global Natural Products Social Molecular Networking (GNPS) and National Institute of Standards and Technology (NIST) WebBook Processed for Spectral Library Matching (V1.0) [Data set]. Zenodo.
Dlugas H, Zhang X, Kim S. Comparative analysis of continuous similarity measures for compound identification in mass spectrometry-based metabolomics. ChemRxiv. 2024

Keywords: Compound Identification Metabolomics Similarity Measures

Description: A Python-based tool for spectral library matching, MZsearch is available in two versions: a command-line interface and Python modules for integration into custom code.

NNs for binary classification

Language: Python; R

Repository: NNs_for_binary_classification

DOI: 10.3390/metabo15030174

Reference(s):
Dlugas, H.; Kim, S. A Comparative Study of Network-Based Machine Learning Approaches for Binary Classification in Metabolomics. Metabolites 2025, 15, 174.

Keywords: Artificial Neural Network Bayesian Neural Network Convolutional Neural Network Deep Learning Classification Feedforward Neural Network Kolmogorov-Arnold Network Machine Learning Metabolomics Spiking Neural Network

Description: Implementation of five network-based machine learning models (Bayesian neural networks, convolutional neural networks, feedforward neural networks, Kolmogorov-Arnold networks, and spiking neural networks) for binary classification tasks based on high-dimensional metabolomics data.

ppcor

Language: R

Repository: ppcor

DOI: 10.5351/CSAM.2015.22.6.665

Reference(s):
Kim S. ppcor: An R Package for a Fast Calculation to Semi-partial Correlation Coefficients. Commun Stat Appl Methods. 2015 Nov;22(6):665-674. doi: 10.5351/CSAM.2015.22.6.665. Epub 2015 Nov 30. PMID: 26688802; PMCID: PMC4681537.
Kim S. P-value calculation methods for semi-partial correlation coefficients. Commun Stat Appl Methods. 2022 May;29(3):397-402. doi: 10.29220/csam.2022.29.3.397. Epub 2022 May 31. PMID: 35756137; PMCID: PMC9230004.

Keywords: Partial Correlation Part Correlation

Description: Calculates partial and semi-partial (part) correlations along with their p-values.
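
For readers unfamiliar with the two quantities, the Python sketch below computes a partial and a semi-partial correlation for three variables from the standard first-order formulas. It does not reproduce the package's functions or its p-value calculations; the function names and the simulated data are assumptions made for illustration.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def partial_cor(x, y, z):
    """Correlation of x and y after removing z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))

def semipartial_cor(x, y, z):
    """Correlation of x with the part of y not explained by z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt(1 - ryz**2)

# Simulated data: x and y are related only through the shared variable z.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

raw = pearson(x, y)
partial = partial_cor(x, y, z)
semipartial = semipartial_cor(x, y, z)
print(f"raw={raw:.3f}  partial={partial:.3f}  semi-partial={semipartial:.3f}")
```

The raw correlation between x and y is large because both depend on z, while the partial and semi-partial correlations are close to zero once z is accounted for.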

PRIMsurvdiff

Language: R

Repository: PRIMsurvdiff

DOI: 10.23937/2469-5831/1510038

Reference(s):
Dyson G. An Application of the Patient Rule-Induction Method to Detect Clinically Meaningful Subgroups from Failed Phase III Clinical Trials. Int J Clin Biostat Biom. 2021;7(1):038. doi: 10.23937/2469-5831/1510038. Epub 2021 Jun 28. PMID: 34632463; PMCID: PMC8496893.

Keywords: Classification PRIM Phase III Subgroup Survival

Description: Use the Patient Rule-Induction Method (PRIM) to identify subgroups of patients based on pre-treatment clinical and demographic characteristics where the experimental treatment is more effective than the standard of care treatment and better than observed in the entire clinical trial cohort with a time-to-event outcome.

ShinyMetID

Language: R/Shiny

Repository: ShinyMetID

DOI: 10.1016/j.chemolab.2023.104861

Reference(s):
Oh Y, Kim S, Kim S, Jeong J. ShinyMetID: An R shiny package for metabolite identification by mass spectral matching. Chemometr Intell Lab Syst. 2023 Sep 15;240:104861. doi: 10.1016/j.chemolab.2023.104861. Epub 2023 Jun 1. PMID: 37771843; PMCID: PMC10538253.

Keywords: Compound Identification Metabolomics GC-MS

Description: This is an R/Shiny package for compound identification for GC-MS data.

spatial2stage

Language: R

Repository: spatial2stage

DOI: 10.1016/j.csda.2021.107420

Reference(s):
Kim S, Wong WK. Spatial Two-stage Designs for Phase II Clinical Trials. Comput Stat Data Anal. 2022 May;169:107420. doi: 10.1016/j.csda.2021.107420. Epub 2022 Jan 6. PMID: 35058669; PMCID: PMC8765730.

Keywords: Clinical Trial Design Phase 2 Single-Arm Two-Stage

Description: This is an R package for spatial two-stage designs for phase II clinical trials.

ss2stagePSO

Language: R/Shiny

Repository: ss2stagePSO

DOI: 10.1177/0962280217709817

Reference(s):
Kim S, Wong WK. Extended two-stage adaptive designs with three target responses for phase II clinical trials. Stat Methods Med Res. 2018 Dec;27(12):3628-3642. doi: 10.1177/0962280217709817. Epub 2017 May 23. PMID: 28535716; PMCID: PMC5515697.

Keywords: Clinical Trial Design Phase 2 Single-Arm Two-Stage PSO

Description: This is an R package for the sample size determination for phase II (middle development) single-arm adaptive two-stage design.

ssLogitNorm

Language: R/Shiny

Repository: ssLogitNorm

DOI: 10.1177/0962280215572407

Reference(s):
Kim S, Heath E, Heilbrun L. Sample size determination for logistic regression on a logit-normal distribution. Stat Methods Med Res. 2017 Jun;26(3):1237-1247. doi: 10.1177/0962280215572407. Epub 2015 Mar 4. PMID: 25744106; PMCID: PMC4560689.

Keywords: Clinical Trial Design Power and Sample Size Estimation Logit-Normal

Description: This is an R package for the power and sample size determination on a logit-normal distribution.

swpa2gc

Language: R

Repository: swpa2gc

DOI: 10.1186/1471-2105-12-235

Reference(s):
Kim S, Koo I, Fang A, Zhang X. Smith-Waterman peak alignment for comprehensive two-dimensional gas chromatography-mass spectrometry. BMC Bioinformatics. 2011 Jun 15;12:235. doi: 10.1186/1471-2105-12-235. PMID: 21676240; PMCID: PMC3133553.

Keywords: Metabolomics Smith-Waterman Peak Alignment

Description: This is an R package for peak alignment on GCxGC-MS Data. SWPA can align homogeneous and heterogeneous peaks based on mixture similarity measures.

time2event

Language: R

Repository: time2event

DOI: 10.5281/zenodo.15151385

Reference(s):
Kim, S. (2025). time2event: an R package for survival and competing risk analyses with time-to-event data as covariates. Zenodo.

Keywords: Survival Analysis Time-Varying Covariate

Description: Cox proportional hazard and competing risk regression analyses can be performed with time-to-event data as covariates.
