Portable Microhaplotype Object (PMO)


Updating a PMO with new metadata


In this tutorial we will read a PMO we previously created and update it with extra metadata. This example uses data downloaded from the SRA associated with the following study:

Furstenau, T. N., Whealy, R., Timm, S., Roberts, A., Maltinsky, S., Wells, S. J., Drake, K., Ross, A., Bolduc, C., Pearson, T., & Fofanov, V. Y. (2025). High-throughput targeted amplicon screening tool for characterizing intrahost diversity in Staphylococcus aureus directly from sample. Microbial Genomics, 11(6). https://doi.org/10.1099/mgen.0.001427

Code
import pandas as pd
from pmotools.pmo_engine.pmo_writer import PMOWriter
from pmotools.pmo_engine.pmo_reader import PMOReader
from pmotools.pmo_builder.metatable_to_pmo import specimen_info_table_to_pmo
from pmotools.pmo_builder.pmo_updater import PMOUpdater

Read in data

First we read in a PMO that we previously created on the Create minimal pmo page.

Code
staph_aureus_pmo = PMOReader.read_in_pmo("minimum_Furstenau2025_new_names_PMO.json.gz")

Adding in specimen meta information

Right now each specimen_info entry holds only the specimen_name, but we can add more metadata. For example, the first specimen has only the specimen_name field filled in:

Code
staph_aureus_pmo["specimen_info"][0]
{'specimen_name': '85b498-Wk16-Nasal'}

First we load the metadata from SRA. The SRA/ENA metadata has several columns; for this example we use only the geographic location (country) and the date of collection (collection_date).

Code
sra_info = pd.read_csv("sra_info_table.tsv", sep = '\t')
sra_meta_of_interest = sra_info[['sample_alias', 'country', 'collection_date']].drop_duplicates()
sra_meta_of_interest.head()
  sample_alias       country  collection_date
0     2b2068n1  USA: Arizona             2019
2     2a4023n1  USA: Arizona             2019
3     2b4022n1  USA: Arizona             2019
5     2b3034n1  USA: Arizona             2019
9     2b2048n1  USA: Arizona             2019
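
Because this table is later merged into the PMO by specimen name, it can help to confirm that each sample_alias appears only once after drop_duplicates. The following is a minimal sketch on a made-up table (the rows and the `toy` and `conflicts` names are illustrative, not taken from the SRA data): any alias that survives deduplication more than once carries conflicting metadata.

```python
import pandas as pd

# Toy stand-in for the SRA table; all values are made up for illustration.
toy = pd.DataFrame({
    "sample_alias": ["s1", "s1", "s2", "s2"],
    "country": ["USA: Arizona", "USA: Arizona", "USA: Arizona", "USA: Utah"],
    "collection_date": [2019, 2019, 2019, 2019],
})

# Same selection and deduplication pattern as in the tutorial.
deduped = toy[["sample_alias", "country", "collection_date"]].drop_duplicates()

# Aliases that still occur more than once have rows that disagree.
conflicts = deduped[deduped["sample_alias"].duplicated(keep=False)]
print(conflicts)
```

An empty `conflicts` table means every specimen has exactly one metadata row.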

We want separate fields for country and state, so we create two new columns by splitting the existing country column on “:”. Splitting on “:” alone keeps the space that follows the colon, so the state values carry a leading space.

Code
sra_meta_of_interest[['country_only', 'state']] = sra_meta_of_interest['country'].str.split(':', expand=True)
sra_meta_of_interest.head()
  sample_alias       country  collection_date country_only    state
0     2b2068n1  USA: Arizona             2019          USA  Arizona
2     2a4023n1  USA: Arizona             2019          USA  Arizona
3     2b4022n1  USA: Arizona             2019          USA  Arizona
5     2b3034n1  USA: Arizona             2019          USA  Arizona
9     2b2048n1  USA: Arizona             2019          USA  Arizona
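
Values such as "USA: Arizona" have a space after the colon, so splitting on ":" leaves that space at the start of the second part. One possible way to avoid this, shown here on made-up rows, is to strip whitespace from each part after splitting:

```python
import pandas as pd

# Made-up rows in the same "country: state" form as the SRA column.
toy = pd.DataFrame({"country": ["USA: Arizona", "USA: Utah"]})

# Split once on the colon, then strip surrounding whitespace from each part.
parts = toy["country"].str.split(":", expand=True)
toy["country_only"] = parts[0].str.strip()
toy["state"] = parts[1].str.strip()

print(toy["state"].tolist())
```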

Now build the specimen metadata section of the PMO using specimen_info_table_to_pmo.

Code
pmo_spec_info = specimen_info_table_to_pmo(
    sra_meta_of_interest,
    specimen_name_col='sample_alias',
    collection_date_col='collection_date',
    collection_country_col='country_only',
    geo_admin1_col='state',
)
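
Conceptually, this step turns each table row into one specimen record whose keys are PMO field names. The sketch below illustrates that idea in plain Python. It is not how pmotools implements specimen_info_table_to_pmo; the `rows_to_records` helper and the sample row are hypothetical, and the target field names are the ones visible in the specimen_info output of this tutorial.

```python
def rows_to_records(rows, column_to_field):
    """Build one record per row, renaming the selected columns to field names."""
    records = []
    for row in rows:
        records.append({field: row[col] for col, field in column_to_field.items()})
    return records


# Hypothetical input row, shaped like one line of the metadata table.
rows = [
    {"sample_alias": "s1", "collection_date": "2019",
     "country_only": "USA", "state": "Arizona"},
]

# Table column -> specimen field, mirroring the *_col arguments above.
column_to_field = {
    "sample_alias": "specimen_name",
    "collection_date": "collection_date",
    "country_only": "collection_country",
    "state": "geo_admin1",
}

records = rows_to_records(rows, column_to_field)
print(records[0])
```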

Now merge this information into our PMO data using PMOUpdater.merge_dicts_by_key.

Code
# PMOUpdater was imported at the top of the tutorial.
staph_aureus_pmo["specimen_info"] = PMOUpdater.merge_dicts_by_key(
    staph_aureus_pmo["specimen_info"],
    pmo_spec_info,
    key_field="specimen_name",
)

Now the specimen_info entries carry the new metadata (we show only the first specimen here):

Code
staph_aureus_pmo["specimen_info"][0]
{'specimen_name': '85b498-Wk16-Nasal',
 'collection_date': '2022-02-07',
 'collection_country': 'USA',
 'geo_admin1': ' Phoenix'}
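
Looking at a single specimen does not show whether every specimen received metadata. A small check such as the following sketch can list specimens that still lack a field after the merge; the `specimens_missing_fields` helper and the records are hypothetical.

```python
def specimens_missing_fields(specimen_info, required_fields):
    """Map specimen_name -> required fields that are absent or empty."""
    missing = {}
    for rec in specimen_info:
        absent = [f for f in required_fields if rec.get(f) in (None, "")]
        if absent:
            missing[rec["specimen_name"]] = absent
    return missing


# Made-up records: s2 was not matched by the merge.
specimen_info = [
    {"specimen_name": "s1", "collection_date": "2019", "collection_country": "USA"},
    {"specimen_name": "s2"},
]
fields = ["collection_date", "collection_country"]

report = specimens_missing_fields(specimen_info, fields)
print(report)
```

With the tutorial's data, the same function could be called on staph_aureus_pmo["specimen_info"], assuming it behaves like a list of dictionaries as the indexing above suggests.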

Now let’s write our PMO to a file.

Code
pmowriter = PMOWriter()
pmowriter.write_out_pmo(staph_aureus_pmo, "minimum_Furstenau2025_PMO_with_spec_meta.json.gz", overwrite=True)
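
A quick way to gain confidence in a written file is to read it back and compare it with the object in memory. The sketch below does this with the standard library only, assuming the .json.gz file is ordinary gzip-compressed JSON (the extension suggests this, but the tutorial does not state it). The toy dictionary and file name are made up; for a real PMO, PMOReader.read_in_pmo would be the natural way to reload the file.

```python
import gzip
import json
import os
import tempfile

# Toy PMO-like dictionary, made up for illustration.
pmo_like = {"specimen_info": [{"specimen_name": "s1", "collection_country": "USA"}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_PMO.json.gz")

    # Write gzip-compressed JSON in text mode.
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(pmo_like, handle)

    # Read it back and parse it again.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        reloaded = json.load(handle)

# True when the round trip preserved the content.
print(reloaded == pmo_like)
```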

We can validate the file using the pmotools command line interface.

Code
!pmotools-python validate_pmo --pmo minimum_Furstenau2025_PMO_with_spec_meta.json.gz --jsonschema_version 1.1.0
 

A PlasmoGenEpi project