Portable Microhaplotype Object (PMO)

Contents

  • Required data
  • Merging info pmo
  • Adding a key to set a specimen_name for library_sample_name

Building a PMO with Minimum Required Fields

In this tutorial, we show how to use pmotools-python to generate a PMO that contains only the minimum required information. We use example data downloaded from the SRA associated with the following study:

Furstenau, T. N., Whealy, R., Timm, S., Roberts, A., Maltinsky, S., Wells, S. J., Drake, K., Ross, A., Bolduc, C., Pearson, T., & Fofanov, V. Y. (2025). High-throughput targeted amplicon screening tool for characterizing intrahost diversity in Staphylococcus aureus directly from sample. Microbial Genomics, 11(6). https://doi.org/10.1099/mgen.0.001427

Code
import pandas as pd
from pmotools.pmo_builder.panel_information_to_pmo import panel_info_table_to_pmo
from pmotools.pmo_builder.mhap_table_to_pmo import (
    mhap_table_to_pmo,
    create_minimum_library_specimen_dict_from_mhap_table)
from pmotools.pmo_builder.merge_to_pmo import merge_to_pmo
from pmotools.pmo_engine.pmo_writer import PMOWriter
from pmotools.pmo_engine.pmo_checker import PMOChecker

Required data

The minimum information needed to create a PMO is the microhaplotype data plus information on the panel used (at a minimum, the primers for each target). The files we use from this study are:

  • allele_data.tsv.gz - microhaplotype calls for each sample and target
  • Furstenau2025_primers.tsv - primers used in the experiment
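Before building anything, it can help to confirm that each file has the columns that the builder calls in this tutorial expect (the column names below are the ones passed to panel_info_table_to_pmo and mhap_table_to_pmo later on). This is a minimal stdlib sketch; the helper `missing_columns` and the toy table are hypothetical and not part of pmotools:

```python
import csv
import io

# Columns that the builder calls in this tutorial expect in each input file.
REQUIRED_COLUMNS = {
    "allele_data.tsv.gz": {"s_Sample", "p_name", "h_Consensus", "c_ReadCnt"},
    "Furstenau2025_primers.tsv": {"target", "forward", "reverse"},
}


def missing_columns(stream, required):
    """Return the required column names absent from a TSV stream's header row."""
    header = next(csv.reader(stream, delimiter="\t"), [])
    return set(required) - set(header)


# Hypothetical toy primer table (not data from the study).
toy_primers = io.StringIO("target\tforward\treverse\nt1\tACGTAC\tTTGACA\n")
print(missing_columns(toy_primers, REQUIRED_COLUMNS["Furstenau2025_primers.tsv"]))  # set()
```

For the real files, pass an open text handle instead, e.g. `gzip.open("allele_data.tsv.gz", "rt")` for the compressed allele table.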

Merging info pmo

First, we read in our data:

Code
mhap_info_df = pd.read_csv("allele_data.tsv.gz", sep='\t')
primers = pd.read_csv("Furstenau2025_primers.tsv", sep='\t')

Next, we convert the panel information into the corresponding sections of the PMO:

Code
pmo_panel_and_target_info = panel_info_table_to_pmo(primers, 
                                                    panel_name = "staph_aureus_Furstenau2025",
                                                    target_name_col = "target",
                                                    forward_primers_seq_col = "forward", 
                                                    reverse_primers_seq_col = "reverse")
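Conceptually, this step groups the primer table by target so that each target carries its primer sequences. The sketch below mimics that grouping with the standard library. The function name and output layout are hypothetical simplifications, not the PMO schema, and we have not checked how pmotools handles a target listed with several primer pairs:

```python
from collections import defaultdict


def group_primers_by_target(rows):
    """Collect (forward, reverse) primer pairs per target name.

    `rows` is an iterable of dicts with 'target', 'forward' and 'reverse'
    keys, mirroring the columns passed to panel_info_table_to_pmo above.
    The output layout is a simplified illustration, NOT the PMO schema.
    """
    targets = defaultdict(list)
    for row in rows:
        targets[row["target"]].append((row["forward"], row["reverse"]))
    return dict(targets)


rows = [  # hypothetical toy primers
    {"target": "t1", "forward": "ACGTAC", "reverse": "TTGACA"},
    {"target": "t1", "forward": "ACGTAG", "reverse": "TTGACA"},
    {"target": "t2", "forward": "GGCATT", "reverse": "CCTAGA"},
]
print(group_primers_by_target(rows))
# {'t1': [('ACGTAC', 'TTGACA'), ('ACGTAG', 'TTGACA')], 't2': [('GGCATT', 'CCTAGA')]}
```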

We also convert the microhaplotype data into the corresponding sections of the PMO:

Code
pmo_mhaps = mhap_table_to_pmo(
                       microhaplotype_table=mhap_info_df, 
                       library_sample_name_col='s_Sample',
                       target_name_col='p_name',
                       seq_col='h_Consensus',
                       reads_col='c_ReadCnt')
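A long table like allele_data.tsv.gz repeats the same sequence across many samples. A common way to store such data compactly is to keep each unique sequence once per target and have each sample's detection point to it by index. We have not checked exactly how mhap_table_to_pmo lays this out, so treat this stdlib sketch (function name and structures hypothetical) as an illustration of the idea only:

```python
def index_microhaplotypes(rows):
    """Split long-format calls into per-target unique sequences and detections.

    Each row is a dict with the 's_Sample', 'p_name', 'h_Consensus' and
    'c_ReadCnt' columns used above. Simplified illustration, not the PMO layout.
    """
    unique = {}      # target -> {sequence: seq_id}, ids assigned in first-seen order
    detections = []  # (library_sample_name, target, seq_id, reads)
    for row in rows:
        ids = unique.setdefault(row["p_name"], {})
        seq_id = ids.setdefault(row["h_Consensus"], len(ids))
        detections.append((row["s_Sample"], row["p_name"], seq_id, int(row["c_ReadCnt"])))
    return unique, detections


rows = [  # hypothetical toy calls
    {"s_Sample": "S1", "p_name": "t1", "h_Consensus": "ACGT", "c_ReadCnt": "120"},
    {"s_Sample": "S2", "p_name": "t1", "h_Consensus": "ACGT", "c_ReadCnt": "80"},
    {"s_Sample": "S2", "p_name": "t1", "h_Consensus": "ACGA", "c_ReadCnt": "15"},
]
unique, detections = index_microhaplotypes(rows)
print(unique)      # {'t1': {'ACGT': 0, 'ACGA': 1}}
print(detections)  # [('S1', 't1', 0, 120), ('S2', 't1', 0, 80), ('S2', 't1', 1, 15)]
```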

Now we merge these two sections to create a complete PMO:

Code
# merge into pmo
staph_aureus_pmo = merge_to_pmo(
    panel_target_info = pmo_panel_and_target_info,
    mhap_info = pmo_mhaps
)
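Merging only makes sense if the two sections agree: every target called in the microhaplotype data should also be defined in the panel. We have not checked exactly which cross-section checks merge_to_pmo or PMOChecker perform, so the hypothetical helper below is just a quick sanity test you could run on the input tables beforehand:

```python
def targets_missing_from_panel(mhap_rows, primer_rows):
    """Return targets called in the microhaplotype data but absent from the panel."""
    panel_targets = {row["target"] for row in primer_rows}
    called_targets = {row["p_name"] for row in mhap_rows}
    return sorted(called_targets - panel_targets)


primer_rows = [{"target": "t1"}, {"target": "t2"}]  # hypothetical toy panel
mhap_rows = [{"p_name": "t1"}, {"p_name": "t3"}]    # hypothetical toy calls
print(targets_missing_from_panel(mhap_rows, primer_rows))  # ['t3']
```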

We can check that our PMO fully complies with the PMO schema:

Code
# Validate the PMO against the schema
checker = PMOChecker()
checker.validate_pmo_json(staph_aureus_pmo)

Now that we have merged and validated our PMO, we can write it to a file.

Code
# write out
pmowriter = PMOWriter()
pmowriter.write_out_pmo(staph_aureus_pmo, "minimum_Furstenau2025_PMO.json.gz", overwrite=True)
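Assuming the writer gzip-compresses the output when the file name ends in `.json.gz` (we infer this from the extension only), the file can be inspected with nothing but the standard library. This round trip uses a toy dictionary rather than a real PMO, and the helper names are hypothetical:

```python
import gzip
import json
import os
import tempfile


def write_json_gz(obj, path):
    """Write a JSON-serialisable object as gzip-compressed JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(obj, fh)


def read_json_gz(path):
    """Read gzip-compressed JSON back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


toy = {"example_key": [1, 2, 3]}  # hypothetical content, not a real PMO
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_PMO.json.gz")
    write_json_gz(toy, path)
    assert read_json_gz(path) == toy
```

Under that assumption, `read_json_gz("minimum_Furstenau2025_PMO.json.gz")` should load the written PMO as a plain dict.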

Adding a key to set a specimen_name for library_sample_name

If we supply only the panel and microhaplotype information (as in the example above), the specimen_names and library_sample_names are autogenerated from the microhaplotype data, and each specimen_name is identical to its library_sample_name. This can be changed by supplying a key that gives the specimen_name to use for each library_sample_name. Here we use another file we generated from the SRA metadata for this dataset:

  • sra_info_table.tsv - SRA/ENA metadata for the runs in this dataset

Below we show how the current specimen_name and library_sample_name values relate to each other in the PMO we generated above. We use a pmotools function to export this information into a table:

Code
from pmotools.pmo_engine.pmo_exporter import PMOExporter
lib_to_spec_df = PMOExporter.list_library_sample_names_per_specimen_name(staph_aureus_pmo)
lib_to_spec_df.head()
   specimen_name  library_sample_name  library_sample_count
0  SRR30825770    SRR30825770          1
1  SRR30825771    SRR30825771          1
2  SRR30825772    SRR30825772          1
3  SRR30825773    SRR30825773          1
4  SRR30825774    SRR30825774          1
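The table above is essentially an inversion of the library-to-specimen mapping: each specimen_name is listed with its library_sample_names, and library_sample_count is the size of that group. A minimal stdlib sketch of that grouping follows; the helper name is hypothetical and this is not the PMOExporter implementation:

```python
from collections import defaultdict


def libraries_per_specimen(lib_to_spec):
    """Invert a library_sample_name -> specimen_name mapping.

    Returns specimen_name -> sorted list of library_sample_names; the length
    of each list corresponds to library_sample_count in the exported table.
    """
    grouped = defaultdict(list)
    for lib, spec in lib_to_spec.items():
        grouped[spec].append(lib)
    return {spec: sorted(libs) for spec, libs in grouped.items()}


# Default behaviour described above: each specimen_name equals its library_sample_name.
default = {lib: lib for lib in ["SRR30825770", "SRR30825771"]}
print({spec: len(libs) for spec, libs in libraries_per_specimen(default).items()})
# {'SRR30825770': 1, 'SRR30825771': 1}
```

With a custom key, two libraries can share a specimen: `{"L1": "A", "L2": "A"}` groups to `{"A": ["L1", "L2"]}`, giving a count of 2.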

We want to change how these names are generated, so we first read in the SRA information:

Code
sra_info = pd.read_csv("sra_info_table.tsv", sep = '\t')
sra_info.head()
run_accession experiment_title sample_accession project_name submission_accession library_min_fragment_size bam_md5 assembly_software library_prep_longitude library_selection ... sequencing_primer_lot first_public transposase_protocol study_alias library_prep_location rna_prep_3_protocol ph sequencing_longitude tissue_type isolation_source
0 SRR31969808 Illumina MiSeq sequencing: AmpSeq of Staphyloc... SAMN46224567 NaN SRA2049563 NaN NaN NaN NaN PCR ... NaN 2025-01-14 NaN PRJNA1209594 NaN NaN NaN NaN NaN nares
1 SRR31969809 Illumina MiSeq sequencing: AmpSeq of Staphyloc... SAMN46224567 NaN SRA2049563 NaN NaN NaN NaN PCR ... NaN 2025-01-14 NaN PRJNA1209594 NaN NaN NaN NaN NaN nares
2 SRR31969810 Illumina MiSeq sequencing: AmpSeq of Staphyloc... SAMN46224566 NaN SRA2049563 NaN NaN NaN NaN PCR ... NaN 2025-01-14 NaN PRJNA1209594 NaN NaN NaN NaN NaN nares
3 SRR31969817 NextSeq 500 sequencing: WGS of Staphylococcus ... SAMN46224576 NaN SRA2049563 NaN NaN NaN NaN size fractionation ... NaN 2025-01-14 NaN PRJNA1209594 NaN NaN NaN NaN NaN nares
4 SRR31969820 NextSeq 500 sequencing: WGS of Staphylococcus ... SAMN46224576 NaN SRA2049563 NaN NaN NaN NaN size fractionation ... NaN 2025-01-14 NaN PRJNA1209594 NaN NaN NaN NaN NaN nares

5 rows × 192 columns

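We convert this table into a dictionary that maps each run accession (the library_sample_name) to its sample alias (the specimen_name), which pmotools can use as a key: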

Code
# map run_accession (library_sample_name) -> sample_alias (specimen_name)
lib_to_spec_key = sra_info.set_index('run_accession')['sample_alias'].to_dict()
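One caveat with `set_index(...).to_dict()`: if a run_accession appears more than once, the resulting dictionary silently keeps the last value. This stdlib sketch builds the same kind of mapping but raises on conflicting duplicates; the function name and toy values are hypothetical:

```python
import csv
import io


def build_lib_to_spec_key(stream, lib_col="run_accession", spec_col="sample_alias"):
    """Map library_sample_name -> specimen_name, rejecting conflicting duplicates."""
    key = {}
    for row in csv.DictReader(stream, delimiter="\t"):
        lib, spec = row[lib_col], row[spec_col]
        if lib in key and key[lib] != spec:
            raise ValueError(f"{lib} maps to both {key[lib]!r} and {spec!r}")
        key[lib] = spec
    return key


toy = io.StringIO("run_accession\tsample_alias\nSRR_A\tpatient1-Wk1\nSRR_B\tpatient1-Wk2\n")
print(build_lib_to_spec_key(toy))  # {'SRR_A': 'patient1-Wk1', 'SRR_B': 'patient1-Wk2'}
```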

Now we build the library_sample_info and specimen_info sections using this key:

Code
# supply key when building library_sample_info and specimen_info 
library_sample_and_spec_renamed_infos = create_minimum_library_specimen_dict_from_mhap_table(
    pmo_mhaps["detected_microhaplotypes"], 
    panel_name = "staph_aureus_Furstenau2025", 
    library_sample_specimen_key = lib_to_spec_key)
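Before supplying the key, it is worth checking that it covers every library_sample_name in the microhaplotype data, since we have not verified how create_minimum_library_specimen_dict_from_mhap_table treats names missing from the key. A hypothetical stdlib helper:

```python
def unmapped_libraries(library_sample_names, lib_to_spec_key):
    """Return library_sample_names that have no specimen_name in the key."""
    return sorted(set(library_sample_names) - set(lib_to_spec_key))


toy_key = {"SRR_A": "patient1-Wk1", "SRR_B": "patient1-Wk2"}  # hypothetical key
print(unmapped_libraries(["SRR_A", "SRR_B", "SRR_C"], toy_key))  # ['SRR_C']
```

In this tutorial the names could come from `mhap_info_df["s_Sample"].unique()` and the key from `lib_to_spec_key`.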

We then use this information, along with the sections we already generated above (pmo_panel_and_target_info and pmo_mhaps), to merge into a new PMO:

Code
# merge into a new PMO that uses the renamed specimens
staph_aureus_pmo_renamed = merge_to_pmo(
    specimen_info = library_sample_and_spec_renamed_infos["specimen_info"],
    library_sample_info = library_sample_and_spec_renamed_infos["library_sample_info"],
    panel_target_info = pmo_panel_and_target_info,
    mhap_info = pmo_mhaps
)

Again, we can validate that this PMO complies with the schema:

Code
checker.validate_pmo_json(staph_aureus_pmo_renamed)

Below we can see that the specimen names now come from the SRA sample aliases and differ from the library sample names:

Code
lib_to_spec_renamed_df = PMOExporter.list_library_sample_names_per_specimen_name(staph_aureus_pmo_renamed)
lib_to_spec_renamed_df.head()
   specimen_name      library_sample_name  library_sample_count
0  85b498-Wk16-Nasal  SRR30825770          1
1  85b498-Wk28-Nasal  SRR30825771          1
2  85b498-Wk12-Nasal  SRR30825772          1
3  85b498-Wk20-Nasal  SRR30825773          1
4  85b498-Wk14-Nasal  SRR30825774          1
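Renaming should change only the specimen names: the set of library_sample_names should be the same in both PMOs. A small stdlib sketch of that check, using the pairs from the two head() outputs in this tutorial. With the full exported DataFrames you could build the pairs with `zip(df["specimen_name"], df["library_sample_name"])`. The helper names are hypothetical:

```python
# (specimen_name, library_sample_name) pairs copied from the two head() outputs
before = [(f"SRR3082577{i}", f"SRR3082577{i}") for i in range(5)]
after = [
    ("85b498-Wk16-Nasal", "SRR30825770"),
    ("85b498-Wk28-Nasal", "SRR30825771"),
    ("85b498-Wk12-Nasal", "SRR30825772"),
    ("85b498-Wk20-Nasal", "SRR30825773"),
    ("85b498-Wk14-Nasal", "SRR30825774"),
]


def same_libraries(before_pairs, after_pairs):
    """True if both PMOs list exactly the same library_sample_names."""
    return {lib for _, lib in before_pairs} == {lib for _, lib in after_pairs}


def renamed_libraries(pairs):
    """Library names whose specimen_name no longer equals the library name."""
    return sorted(lib for spec, lib in pairs if spec != lib)


print(same_libraries(before, after))  # True
print(renamed_libraries(before))      # []
print(len(renamed_libraries(after)))  # 5
```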

Finally, we write this PMO to a new file:

Code
pmowriter.write_out_pmo(staph_aureus_pmo_renamed, "minimum_Furstenau2025_new_names_PMO.json.gz", overwrite=True)

For more information on adding extra information to your minimal PMO, see Update meta in a minimal pmo.

For more information on generating a PMO with extra metadata, see PMO Generation.

 

A PlasmoGenEpi project