
msgen - R functions for interfacing with the Microsoft Genomics service on Azure

Colby T. Ford, Ph.D.



The Microsoft Genomics service on Azure runs secondary analysis of genome sequencing data using a cloud implementation of the Burrows-Wheeler Aligner (BWA) and the Genome Analysis Toolkit (GATK). The pipeline accepts multiple FASTQ and BAM files as input and produces alignment and variant outputs. The msgen package provides an interface for using the service from within R.

R Package Installation

You can install the latest stable version from GitHub using the following command:



Submit a workflow

submit_workflow(subscription_key = "04afabfc...",
                region = "eastus",
                process = "snapgatk",
                reference = "b37m1",
                description = "Submission from cford/msgen R package.",
                input_storage_account_name = "mygenomicsstorage",
                input_storage_account_key = "6GyBAbvgw5sqo2...",
                input_container_name = "myinputdata",
                blob_name_1 = "chr21_1.fq.gz",
                blob_name_2 = "chr21_2.fq.gz",
                output_container_name = "myoutputdata")
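
The call above passes the subscription key and storage account key as string literals. One way to keep them out of a script is to read them from environment variables. The following is a minimal sketch, not part of msgen: the helper `read_credential` and the variable name `MSGEN_DEMO_KEY` are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of msgen): read a secret from an environment
# variable and fail early if it is missing or empty.
read_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# Demo only: set a placeholder value so the sketch runs on its own.
# Real keys would be set outside the script, for example in .Renviron.
Sys.setenv(MSGEN_DEMO_KEY = "example-value")
key <- read_credential("MSGEN_DEMO_KEY")
```

The resulting value could then be passed as `subscription_key = key` in place of the literal string.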

List all your workflows

list_workflows(subscription_key = "04afabfc...",
               region = "eastus")

Check the status of your workflow

get_workflow_status(subscription_key = "04afabfc...",
                    region = "eastus",
                    workflow_id = "12g3c5a...")
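
Workflows can take a while to finish, so it may be convenient to check the status repeatedly until it reaches a final state. The following is a minimal sketch of such a polling loop. It is not part of msgen: the helper `wait_for_workflow`, the status names, and the stand-in fetcher are all assumptions, and the shape of the value returned by `get_workflow_status()` is not described here.

```r
# Hypothetical helper (not part of msgen): call fetch_status() until it
# returns a terminal status or the attempt budget runs out.
# The terminal status names below are assumptions, not documented values.
wait_for_workflow <- function(fetch_status,
                              terminal = c("Completed", "Failed", "Cancelled"),
                              interval_sec = 60,
                              max_tries = 120) {
  status <- NA_character_
  for (i in seq_len(max_tries)) {
    status <- fetch_status()
    if (status %in% terminal) {
      break
    }
    # Wait between attempts, but not after the last one.
    if (i < max_tries) {
      Sys.sleep(interval_sec)
    }
  }
  status
}

# Stand-in for a real status call so the sketch runs without Azure access.
# Each call returns the next state and repeats the last one once exhausted.
make_fake_fetcher <- function(states) {
  i <- 0
  function() {
    i <<- min(i + 1, length(states))
    states[i]
  }
}

fake_fetch <- make_fake_fetcher(c("Queued", "In progress", "Completed"))
final_status <- wait_for_workflow(fake_fetch, interval_sec = 0)
print(final_status)
```

With the real service, `fetch_status` would be a function that wraps `get_workflow_status()` and extracts a status string from whatever that call returns.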

Cancel a workflow

cancel_workflow(subscription_key = "04afabfc...",
                region = "eastus",
                workflow_id = "12g3c5a...")


This open-source R package is licensed under the Apache 2.0 License. See the LICENSE file for details.

Note: The Microsoft Genomics service, Azure, and the msgen Python command-line interface are all Copyright (c) Microsoft Corporation. All rights reserved.