Updating a PMO with new metadata

In this tutorial we will read a PMO we previously created and update it with extra metadata. This example uses data downloaded from the SRA associated with the following study:

Furstenau, T. N., Whealy, R., Timm, S., Roberts, A., Maltinsky, S., Wells, S. J., Drake, K., Ross, A., Bolduc, C., Pearson, T., & Fofanov, V. Y. (2025). High-throughput targeted amplicon screening tool for characterizing intrahost diversity in Staphylococcus aureus directly from sample. Microbial Genomics, 11(6). https://doi.org/10.1099/mgen.0.001427

Code

import pandas as pd
from pmotools.pmo_engine.pmo_writer import PMOWriter
from pmotools.pmo_engine.pmo_reader import PMOReader
from pmotools.pmo_builder.metatable_to_pmo import specimen_info_table_to_pmo
from pmotools.pmo_builder.pmo_updater import PMOUpdater

Read in data

First, we read in the PMO that we created on the Create minimal pmo page.

Code

staph_aureus_pmo = PMOReader.read_in_pmo("minimum_Furstenau2025_new_names_PMO.json.gz")

Adding in specimen meta information

Right now each entry in specimen_info contains only the specimen_name, but we can add more metadata. For example, the first specimen has only the specimen_name field filled in:

Code

staph_aureus_pmo["specimen_info"][0]

{'specimen_name': '85b498-Wk16-Nasal'}

First we load the metadata from SRA. The SRA/ENA metadata has several columns; as an example, we use the geographic location (country) and the date of collection (collection_date).

Code

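# Load the SRA metadata table (tab-separated)
sra_info = pd.read_csv("sra_info_table.tsv", sep="\t")
# Keep the columns of interest; .copy() lets us add columns later without pandas copy warnings
sra_meta_of_interest = sra_info[['sample_alias', 'country', 'collection_date']].drop_duplicates().copy()
sra_meta_of_interest.head()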

	sample_alias	country	collection_date
0	2b2068n1	USA: Arizona	2019
2	2a4023n1	USA: Arizona	2019
3	2b4022n1	USA: Arizona	2019
5	2b3034n1	USA: Arizona	2019
9	2b2048n1	USA: Arizona	2019
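Because the metadata is later merged by specimen name, it can help to confirm that each sample_alias appears only once after de-duplication. The following is a minimal sketch on a small stand-in table (the values are made up for illustration); the same check could be run on sra_meta_of_interest.

```python
import pandas as pd

# Stand-in table with made-up values; the real one comes from sra_info_table.tsv
demo = pd.DataFrame({
    "sample_alias": ["2b2068n1", "2b2068n1", "2a4023n1"],
    "country": ["USA: Arizona", "USA: Arizona", "USA: Arizona"],
    "collection_date": ["2019", "2020", "2019"],
})

# drop_duplicates only removes rows that are identical in every column
deduped = demo.drop_duplicates()

# Rows that still share a sample_alias carry conflicting metadata
conflicts = deduped[deduped["sample_alias"].duplicated(keep=False)]
print(conflicts)
```

An empty conflicts table means every sample has exactly one row of metadata.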

We want separate fields for country and state, so we create two new columns by splitting the existing country column on ":".

Code

sra_meta_of_interest[['country_only', 'state']] = sra_meta_of_interest['country'].str.split(':', expand=True)
sra_meta_of_interest.head()

	sample_alias	country	collection_date	country_only	state
0	2b2068n1	USA: Arizona	2019	USA	Arizona
2	2a4023n1	USA: Arizona	2019	USA	Arizona
3	2b4022n1	USA: Arizona	2019	USA	Arizona
5	2b3034n1	USA: Arizona	2019	USA	Arizona
9	2b2048n1	USA: Arizona	2019	USA	Arizona
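Splitting "USA: Arizona" on ":" alone leaves a leading space in the second part (" Arizona"), which the table display hides. A minimal sketch of one way to avoid this, using a stand-in table, is to split once and strip whitespace from both parts:

```python
import pandas as pd

# Stand-in rows mirroring the layout of the table above
demo = pd.DataFrame({
    "sample_alias": ["2b2068n1", "2a4023n1"],
    "country": ["USA: Arizona", "USA: Arizona"],
})

# Split at the first ":" only, then strip surrounding whitespace from each part
parts = demo["country"].str.split(":", n=1, expand=True)
demo["country_only"] = parts[0].str.strip()
demo["state"] = parts[1].str.strip()

print(demo["state"].tolist())
```

Using n=1 keeps any later colons inside the state value instead of producing extra columns.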

Now we build the specimen metadata section of the PMO using specimen_info_table_to_pmo, telling it which table column holds each field.

Code

pmo_spec_info = specimen_info_table_to_pmo(
    sra_meta_of_interest,
    specimen_name_col='sample_alias',
    collection_date_col='collection_date',
    collection_country_col='country_only',
    geo_admin1_col='state',
)
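The keyword arguments above map table columns to PMO field names. The sketch below illustrates that mapping with plain pandas on a stand-in table. It is not how specimen_info_table_to_pmo is implemented, and the list-of-records shape is an assumption based on how specimen_info is indexed in this tutorial.

```python
import pandas as pd

# Stand-in table with made-up values
demo = pd.DataFrame({
    "sample_alias": ["2b2068n1"],
    "collection_date": ["2019"],
    "country_only": ["USA"],
    "state": ["Arizona"],
})

# Table column -> PMO field, mirroring the keyword arguments used above
column_to_field = {
    "sample_alias": "specimen_name",
    "collection_date": "collection_date",
    "country_only": "collection_country",
    "state": "geo_admin1",
}

# Select the mapped columns, rename them, and emit one dict per specimen
records = (
    demo[list(column_to_field)]
    .rename(columns=column_to_field)
    .to_dict(orient="records")
)
print(records)
```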

Now we merge this information into our PMO using PMOUpdater.merge_dicts_by_key, matching entries on specimen_name.

Code

staph_aureus_pmo["specimen_info"] = PMOUpdater.merge_dicts_by_key(
    staph_aureus_pmo["specimen_info"],
    pmo_spec_info,
    key_field="specimen_name",
)
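To make the idea of merging by key concrete, here is a rough sketch in plain Python. The function merge_by_key_sketch is hypothetical and is not the pmotools implementation. PMOUpdater.merge_dicts_by_key may treat unmatched keys or conflicting values differently.

```python
def merge_by_key_sketch(base, extra, key_field):
    """Illustrative only: add fields from `extra` to matching entries in `base`."""
    # Index the extra records by their key for quick lookup
    extra_by_key = {row[key_field]: row for row in extra}
    merged = []
    for row in base:
        combined = dict(row)  # copy so the input is left unchanged
        combined.update(extra_by_key.get(row[key_field], {}))
        merged.append(combined)
    return merged


# Stand-in data with made-up specimen names
base = [{"specimen_name": "spec1"}, {"specimen_name": "spec2"}]
extra = [{"specimen_name": "spec1", "collection_country": "USA"}]

merged = merge_by_key_sketch(base, extra, key_field="specimen_name")
print(merged)
```

In this sketch, entries without a match in the second list are kept unchanged.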

Now specimen_info contains the metadata (we show only the first specimen here). Note the leading space in the geo_admin1 value, which is left over from splitting the country column on ":".

Code

staph_aureus_pmo["specimen_info"][0]

{'specimen_name': '85b498-Wk16-Nasal',
 'collection_date': '2022-02-07',
 'collection_country': 'USA',
 'geo_admin1': ' Phoenix'}
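Looking at one specimen does not show whether every specimen received metadata. A minimal sketch of a check across all entries follows, assuming specimen_info is a list of dicts as the indexing above suggests. The stand-in data is made up; in practice the loop would run over staph_aureus_pmo["specimen_info"].

```python
# Stand-in for staph_aureus_pmo["specimen_info"] with made-up specimens
specimen_info_demo = [
    {
        "specimen_name": "spec1",
        "collection_date": "2019",
        "collection_country": "USA",
        "geo_admin1": "Arizona",
    },
    {"specimen_name": "spec2"},  # no metadata matched this specimen
]

expected_fields = ["collection_date", "collection_country", "geo_admin1"]

# Map each incomplete specimen to the list of fields it lacks
missing = {}
for spec in specimen_info_demo:
    absent = [field for field in expected_fields if field not in spec]
    if absent:
        missing[spec["specimen_name"]] = absent

print(missing)
```

An empty result means every specimen has all of the expected fields.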

Now let’s write our updated PMO to a file.

Code

pmowriter = PMOWriter()
pmowriter.write_out_pmo(staph_aureus_pmo, "minimum_Furstenau2025_PMO_with_spec_meta.json.gz", overwrite=True)
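A quick sanity check after writing is to load the file back and compare. The sketch below assumes, based only on the .json.gz extension, that the output is gzip-compressed JSON, and it uses a small stand-in dict rather than a real PMO. Within pmotools, PMOReader.read_in_pmo (used earlier) remains the way to read a PMO.

```python
import gzip
import json
import os
import tempfile

# Hypothetical stand-in for a PMO: only a specimen_info section is included
demo_pmo = {
    "specimen_info": [{"specimen_name": "spec1", "collection_country": "USA"}]
}

# Write the stand-in as gzip-compressed JSON into a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "demo_PMO.json.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
    json.dump(demo_pmo, handle)

# Read it back and compare with what was written
with gzip.open(demo_path, "rt", encoding="utf-8") as handle:
    reloaded = json.load(handle)

print(reloaded == demo_pmo)
```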

Finally, we can validate the file with the pmotools command-line interface.

Code

!pmotools-python validate_pmo --pmo minimum_Furstenau2025_PMO_with_spec_meta.json.gz --jsonschema_version 1.1.0