Portable Microhaplotype Object (PMO)

Multiplexed targeted sequencing is now widely used to generate data for the most informative genomic regions of organisms, but the lack of an appropriate data standard has hindered data sharing, reuse, and downstream analysis. Here, we provide details for an extensible standard and related convenience utilities to store lossless, compact representations of phased, processed target sequences (microhaplotypes) along with an efficient relational ontology in a portable JSON file.

* specimen_info and library_sample_info only need identifiers, which can be autogenerated from detected_microhaplotypes.
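
As a rough illustration of the note above, the sketch below derives minimal identifier-only tables from a list of detected microhaplotype records. The record layout and the field names `library_sample_name` and `specimen_name` are assumptions made for this example and may not match the actual PMO schema. The one-to-one mapping from library sample to specimen is also only a placeholder choice.

```python
# Hypothetical, simplified records; real PMO field names may differ.
detected_microhaplotypes = [
    {"library_sample_name": "spec1_run1", "target": "t1", "seq": "ACGT", "reads": 120},
    {"library_sample_name": "spec1_run1", "target": "t2", "seq": "GGCA", "reads": 80},
    {"library_sample_name": "spec2_run1", "target": "t1", "seq": "ACGA", "reads": 95},
]

# Collect each library sample identifier once, preserving first-seen order.
library_sample_names = list(dict.fromkeys(
    rec["library_sample_name"] for rec in detected_microhaplotypes
))

# Assumption for this sketch: with no other metadata available, each
# library sample maps to a specimen of the same name.
library_sample_info = [
    {"library_sample_name": name, "specimen_name": name}
    for name in library_sample_names
]
specimen_info = [{"specimen_name": name} for name in library_sample_names]

print(library_sample_names)  # ['spec1_run1', 'spec2_run1']
```

Richer specimen metadata can then be attached to these identifier-only entries later.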

Quick Start

To get started, build a PMO with either the PMO App or the pmotools-python package. We recommend the PMO App if you have no programming or Python experience, and the pmotools-python package for more advanced usage.

  • PMO App documentation
  • pmotools-python documentation

If you are new to building a PMO, the template below can be a useful starting point.

Download Excel PMO Template

Motivation

Targeted amplicon sequencing is now established as a sensitive and efficient means of obtaining relevant information about a wide variety of organisms. Applications are broad and expanding, including microbiome analysis, pathogen identification, detection of antimicrobial resistance, and tracking the spread of viruses, bacteria, and eukaryotic pathogens.

Many of these applications use the full sequences provided by individual reads because they contain multiple phased variants (microhaplotypes) (Oldoni et al. 2019), information that is lost when these data are decomposed into independent variants such as SNPs. This information is particularly valuable when sequencing samples containing organisms with more than one sequence per target, such as mixed bacterial samples, commonly polyclonal pathogens (e.g., Plasmodium) (Tessema et al. 2022; LaVerriere et al. 2022; Jacob et al. 2021; Kattenberg et al. 2023; Aranda-Díaz et al. 2025; Sadler et al. 2024), and diploid or polyploid organisms. Thus, data formats designed for small variants that do not preserve full sequences, such as the popular Variant Call Format (VCF), are not well suited to storing microhaplotype data.
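
The loss of phase information can be made concrete with a small sketch. The two toy samples below are invented for illustration and show only the two variable positions of a single target. Each contains two microhaplotypes, and the samples share none, yet they become indistinguishable once decomposed into independent per-site allele counts.

```python
from collections import Counter

# Two hypothetical mixed samples, each with two microhaplotypes at one target.
# Only the two variable positions are shown (toy data, not real sequences).
sample_a = ["AC", "GT"]  # haplotypes A-C and G-T
sample_b = ["AT", "GC"]  # haplotypes A-T and G-C


def per_site_alleles(haplotypes):
    """Decompose haplotypes into independent per-position allele counts."""
    n_sites = len(haplotypes[0])
    return [Counter(h[i] for h in haplotypes) for i in range(n_sites)]


snps_a = per_site_alleles(sample_a)
snps_b = per_site_alleles(sample_b)

# The SNP-level views are identical...
same_snps = snps_a == snps_b
# ...even though the two samples have no haplotype in common.
shared_haplotypes = set(sample_a) & set(sample_b)

print(same_snps)          # True
print(shared_haplotypes)  # set()
```

Storing the full microhaplotype sequences keeps the two samples distinguishable.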

Objectives

  • Provide a structured and flexible framework to help individual researchers and groups to organize their data in a findable and accessible way.
  • Create a standard for data sharing, including for repositories, academic reports, and public health entities, to aid interoperability, transparency, and reproducibility.
  • Maximize data reuse by lowering the barriers for making data publicly available in a standardized format.
  • Provide a consistent format to allow harmonization of downstream analysis tools across data sets and to minimize the need for tedious and error-prone tasks such as data reshaping.

PMO structure and ontology

See the format overview page (format/FormatOverview.qmd) for details on the internal structure of a PMO and its naming schema.

Convenience Tools

There are two ways to build a PMO using our tools:

  • PMO App (https://pmotools.app/) - An interactive, no-code PMO builder. The first load may take a moment. The app runs entirely in your browser, and all data remain on your computer; nothing is uploaded.
  • pmotools-python - Build and work with PMO files in Python or from the command line. See the documentation (https://plasmogenepi.github.io/pmotools-python) and tutorials.

PMO in the Wider Ecosystem

PMO serves as a convergence point within the broader data ecosystem, sitting between sample collection and allele calling on one side and the various downstream analyses on the other. See the PMO within a Data Analysis Ecosystem page for more details.

Development Approach

The microbiome community has created data standards for a single locus, including BIOM and ESS-DIVE. Here, we extend these standards to an arbitrary number of loci in a framework extensible to any type of targeted sequence data. The format is lossless, allowing recovery of full sequence data, while achieving data compression of ~6x, and up to ~80x with additional compression using standard tools (e.g., gzip). Optional fields allow data generators with domain expertise to include additionally processed sequence data, such as variants with masked domains (e.g., for highly error-prone regions such as tandem repeats). Notably, the framework provides a robust relational ontology for sample, laboratory, and bioinformatic metadata in addition to sequence data, mitigating the common problem of partially or completely orphaned data. A full ontology has been built out for Plasmodium, leveraging existing fields where available, and the modular structure can be flexibly extended to other biological systems, including those containing multiple types of organisms. All data are encoded in a standard JSON file, enhancing portability and ease of interpretation. The end result is a design that is efficient, lightweight, and flexible, and that keeps metadata together with genetic data. Finally, we have created a set of convenience utilities to make it easy to create, manipulate, share, import, and export PMO files.

See the format development page (format/DevelopmentOfFormat.qmd) for a detailed breakdown of how the chosen fields relate to other standards.
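
The general idea behind a lossless, compact, relational layout can be sketched with the Python standard library alone. The example below is a toy illustration, not the PMO schema: the key names `sequences`, `detected`, and `seq_id` are hypothetical, and the size reduction it prints depends entirely on the invented data, so it should not be read as reproducing the ~6x and ~80x figures quoted above. It stores each unique sequence once, references it by index, gzips the JSON, and confirms that the original rows can be rebuilt exactly.

```python
import gzip
import json

# Toy "long" table in which every row repeats the full sequence (invented data).
seq_1 = "ACGT" * 50
seq_2 = "ACGA" * 50
long_rows = [
    {"sample": f"S{i}", "target": "t1", "seq": seq_1 if i % 2 else seq_2, "reads": 100 + i}
    for i in range(200)
]

# Relational layout: store each unique sequence once and reference it by index.
unique_seqs = sorted({row["seq"] for row in long_rows})
seq_index = {seq: idx for idx, seq in enumerate(unique_seqs)}
compact = {
    "sequences": unique_seqs,
    "detected": [
        {"sample": r["sample"], "target": r["target"],
         "seq_id": seq_index[r["seq"]], "reads": r["reads"]}
        for r in long_rows
    ],
}

long_bytes = json.dumps(long_rows).encode("utf-8")
compact_bytes = json.dumps(compact).encode("utf-8")
gz_bytes = gzip.compress(compact_bytes)

# Lossless check: rebuild the original rows from the gzipped compact form.
restored = json.loads(gzip.decompress(gz_bytes))
rebuilt = [
    {"sample": d["sample"], "target": d["target"],
     "seq": restored["sequences"][d["seq_id"]], "reads": d["reads"]}
    for d in restored["detected"]
]

print(len(long_bytes), len(compact_bytes), len(gz_bytes))
print(rebuilt == long_rows)
```

Because the file is plain JSON, any language with a JSON parser can read it without a dedicated library.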

Community Engagement

The PMO format could not have been developed without the community engagement and feedback we have received, and we want to keep incorporating feedback as we maintain the format. Please reach out to info@plasmogenepi.org with any general questions or feedback.

If you have questions about the documentation of the format, use the GitHub issues page to ask questions or post suggestions for both the documentation and the format: https://github.com/PlasmoGenEpi/PMO_Docs/issues

If you have questions about the Python implementation for interacting with the format, use the GitHub issues page here: https://github.com/PlasmoGenEpi/pmotools-python/issues

References

Aranda-Díaz, Andrés, Eric Neubauer Vickers, Kathryn Murie, et al. 2025. “Sensitive and Modular Amplicon Sequencing of Plasmodium Falciparum Diversity and Resistance for Research and Public Health.” Sci. Rep. 15 (March): 10737.
Jacob, Christopher G, Nguyen Thuy-Nhien, Mayfong Mayxay, et al. 2021. “Genetic Surveillance in the Greater Mekong Subregion and South Asia to Support Malaria Control and Elimination.” Elife 10 (August).
Kattenberg, Johanna Helena, Carlos Fernandez-Miñope, Norbert J. van Dijk, et al. 2023. “Malaria Molecular Surveillance in the Peruvian Amazon with a Novel Highly Multiplexed Plasmodium Falciparum AmpliSeq Assay.” Microbiology Spectrum 0 (0): e00960–22.
LaVerriere, Emily, Philipp Schwabl, Manuela Carrasquilla, et al. 2022. “Design and Implementation of Multiplexed Amplicon Sequencing Panels to Serve Genomic Epidemiology of Infectious Disease: A Malaria Case Study.” Mol. Ecol. Resour. 22 (6): 2285–303.
Oldoni, Fabio, Kenneth K Kidd, and Daniele Podini. 2019. “Microhaplotypes in Forensic Genetics.” Forensic Sci. Int. Genet. 38 (January): 54–69.
Sadler, Jacob M, Alfred Simkin, Valery P K Tchuenkam, et al. 2024. “Application of a New Highly Multiplexed Amplicon Sequencing Tool to Evaluate Plasmodium Falciparum Antimalarial Resistance and Relatedness in Individual and Pooled Samples from Dschang, Cameroon.” Front. Parasitol. 3: 1509261.
Tessema, Sofonias K, Nicholas J Hathaway, Noam B Teyssier, et al. 2022. “Sensitive, Highly Multiplexed Sequencing of Microhaplotypes from the Plasmodium Falciparum Heterozygome.” J. Infect. Dis. 225 (April): 1227–37.
A PlasmoGenEpi project