Faculty Software


CaPTk is a software platform facilitating radiomic analysis of radiographic images of cancer, currently focusing on brain, breast, and lung cancer. CaPTk integrates advanced, validated tools and machine learning algorithms to perform various aspects of medical image analysis that specifically address real clinical needs.

Comprehensive Automated Machine Learning Analysis Pipeline: A rigorous machine learning analysis pipeline for binary classification including exploratory analysis, data processing, feature processing, ML modeling (11 methods) with hyperparameter sweeps, visualizations, and statistical analysis.


DLATK is a Python-based, end-to-end human text analysis package, specifically suited for social media and social scientific applications.


EpiVIA was developed for joint profiling of the epigenome and lentiviral integration sites at population and single-cell resolutions.


Genetic Architecture Model Emulator for Testing and Evaluating Software (GAMETES) is a software package for generating complex single nucleotide polymorphism (SNP) models and datasets for simulated association studies. GAMETES can generate precise n-way pure epistatic interactions, as well as univariate, additive, and heterogeneous associations in SNP datasets.


Grabseqs is a command-line tool that aims to simplify access to next-generation sequencing data and metadata stored in public repositories. Data from multiple repositories (NCBI’s Sequence Read Archive, MG-RAST, iMicrobe) are available in a standardized format through grabseqs.


HeatITup is an algorithm for efficient and robust identification, classification, and quantification of Internal Tandem Duplication (ITD) mutations.


IM-PET identifies target promoters of distal transcriptional enhancers by integrating multiple types of genomics data.


inteGREAT is a graph-based algorithm for robust and scalable differential integration of transcriptomic and proteomic data sets.


Manubot is a tool to write and publish papers using the GitHub platform.


Pollution-Associated Risk Geospatial Analysis SITE (PARGASITE) is a web application and R package that can be used to estimate levels of pollutants in the U.S. for 1997 through 2019 at user-defined geographic locations and time ranges. Measures correspond to monthly and yearly raster files (Jan 2005 to Dec 2019) for PM2.5, Ozone, NO2, SO2, and CO covering the US and Puerto Rico that were created from United States Environmental Protection Agency (EPA) regulatory monitor data. The R package allows the user to obtain more customized output and to work with the raster layers directly.


scATAC-pro is a comprehensive workbench for single-cell chromatin accessibility sequencing data.


Scedar (Single-cell exploratory data analysis for RNA-Seq) is a reliable and easy-to-use Python package for efficient visualization, imputation of gene dropouts, detection of rare transcriptomic profiles, and clustering of large-scale single-cell RNA-seq (scRNA-seq) datasets.


scikit-ExSTraCS: A scikit-learn compatible implementation of ExSTraCS designed for supervised classification in noisy, larger-scale, and complex data with particular emphasis on the detection and characterization of heterogeneous patterns of association.


scikit-rebate: Relief-based Algorithm Training Environment (ReBATE) is a feature importance and selection package that gives the user the flexibility to run a number of ‘core’ Relief-algorithm based strategies including Relief, ReliefF, SURF, SURF*, MultiSURF, and MultiSURF*. Relief wrapper algorithms are also included to improve core algorithm performance in very large feature spaces (i.e., > 10,000 features). These algorithms can be easily incorporated into a scikit-learn machine learning pipeline.
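
As a rough illustration of the idea behind the Relief family, the sketch below scores features with a basic Relief-style update: a feature gains weight when it separates an instance from its nearest neighbor of the other class (the "miss") and loses weight when it differs from its nearest neighbor of the same class (the "hit"). This is not the scikit-rebate implementation or its API. The function name `relief_weights` and the toy data are made up for illustration. For simplicity it assumes binary labels and numeric features, and it visits every instance once rather than drawing a random sample.

```python
# Illustrative sketch of the basic Relief idea; NOT the scikit-rebate code.
def relief_weights(X, y):
    """Score features with a basic Relief update (binary labels, numeric features)."""
    n, n_features = len(X), len(X[0])

    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for f in range(n_features):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(f, a, b):
        # Scaled difference between two instances on feature f.
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        # Manhattan distance over the scaled features.
        return sum(diff(f, a, b) for f in range(n_features))

    weights = [0.0] * n_features
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbor to compare with
        near_hit = X[min(hits, key=lambda j: distance(X[i], X[j]))]
        near_miss = X[min(misses, key=lambda j: distance(X[i], X[j]))]
        for f in range(n_features):
            # Reward separation from the miss, penalize difference from the hit.
            weights[f] += (diff(f, X[i], near_miss) - diff(f, X[i], near_hit)) / n
    return weights


# Toy data (made up): feature 0 tracks the label, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.2], [0.9, 0.8], [1.0, 0.4]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
```

On this toy data the informative feature receives the larger weight. Variants such as ReliefF, SURF, and MultiSURF differ mainly in how neighbors are chosen, which this sketch does not attempt to reproduce.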


SPHARM-MAT is implemented based on a powerful 3D Fourier surface representation method called SPHARM, which creates parametric surface models using spherical harmonics. It is a MATLAB-based 3D shape modeling and analysis toolkit, designed to aid statistical shape analysis for relating morphometric changes in 3D structures of interest to different conditions.


stripenn is a command-line Python package developed for detecting architectural stripes in 3D genome conformation (e.g., Hi-C) data using image-processing techniques.


Sunbeam is a pipeline written in Snakemake that simplifies and automates many of the steps in metagenomic sequencing analysis. In addition to a modular design that lets users run only their desired analytical steps from the core pipeline, Sunbeam is extensible: users can extend the pipeline through a simple extension framework and install extensions written by the dev team or other users.


TooManyCells is a suite of graph-based tools for efficient and unbiased clustering and visualization of single-cell RNA-seq and ATAC-seq data sets.


The Tree-based Pipeline Optimization Tool (TPOT) is an open-source, Python-based automated machine learning (AutoML) method and software package that uses genetic programming to discover and optimize machine learning pipelines using algorithms in the scikit-learn library.


VisCello is a software platform for hosting, interactive analysis, and visualization of single-cell data.


Xmeta is both an R package and an online platform that facilitates comprehensive meta-analysis for users with or without programming skills. It offers two analytic paths: (1) R users can install the “xmeta” package and call its main functions directly; and (2) users who do not work in R can use the secure web-based meta-analysis pipeline to customize their own analysis. It supports a wide variety of analyses for univariate, multivariate, and network meta-analysis, for continuous, binary, and time-to-event outcomes. It also includes a rich set of model diagnostic tools and data visualizations for different types of analyses.