In this tutorial we will show an example of using pmotools-python to generate a PMO that only contains the minimal information. We will use example data downloaded from the SRA associated with the following study:
Furstenau, T. N., Whealy, R., Timm, S., Roberts, A., Maltinsky, S., Wells, S. J., Drake, K., Ross, A., Bolduc, C., Pearson, T., & Fofanov, V. Y. (2025). High-throughput targeted amplicon screening tool for characterizing intrahost diversity in Staphylococcus aureus directly from sample. Microbial Genomics, 11(6). https://doi.org/10.1099/mgen.0.001427
The minimum amount of information needed to create a PMO is the microhaplotype data and information on the panel used (at a minimum, the target’s primers). The files we will use from this study are:
allele_data.tsv.gz - the microhaplotype calling results
Furstenau2025_primers.tsv - primers used in the experiment
Adding a key to set a specimen_name for library_sample_name
If we supply only the panel and microhaplotype information (as in the example above), the specimen_names and library_sample_names are auto-generated from the microhaplotype data, and the specimen_names will be identical to the library_sample_names. This can be changed by supplying a table that gives the specimen_name to use for each library_sample_name. Here we use another file that we generated from the SRA metadata for this dataset:
sra_info_table.tsv - the SRA/ENA metadata
Below we show how the current specimen_name and library_sample_name relate to each other in the PMO we generated above. Note that we use a pmotools function to export this information into a table for inspection.
Code
from pmotools.pmo_engine.pmo_exporter import PMOExporter

lib_to_spec_df = PMOExporter.list_library_sample_names_per_specimen_name(staph_aureus_pmo)
lib_to_spec_df.head()
     specimen_name  library_sample_name  library_sample_count
0    SRR30825770    SRR30825770          1
1    SRR30825771    SRR30825771          1
2    SRR30825772    SRR30825772          1
3    SRR30825773    SRR30825773          1
4    SRR30825774    SRR30825774          1
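To make the columns of this table concrete, here is a rough pandas-only sketch. It assumes that library_sample_count is the number of library samples that share a specimen_name; that reading comes from the column name and is not confirmed by pmotools. The helper count_library_samples_per_specimen and the specimen names specimen_A and specimen_B are hypothetical and exist only for this illustration.

```python
import pandas as pd


def count_library_samples_per_specimen(lib_to_spec):
    """Tabulate a library_sample_name -> specimen_name mapping (illustrative helper)."""
    df = pd.DataFrame(
        {
            "specimen_name": list(lib_to_spec.values()),
            "library_sample_name": list(lib_to_spec.keys()),
        }
    )
    # Assumed meaning: how many library samples share each specimen_name
    df["library_sample_count"] = (
        df.groupby("specimen_name")["library_sample_name"].transform("count")
    )
    return df.sort_values(["specimen_name", "library_sample_name"]).reset_index(drop=True)


# Default case, as in the table: every library sample is its own specimen
default_key = {run: run for run in ["SRR30825770", "SRR30825771", "SRR30825772"]}
default_counts = count_library_samples_per_specimen(default_key)

# Hypothetical key in which two runs come from the same specimen
shared_key = {
    "SRR30825770": "specimen_A",
    "SRR30825771": "specimen_A",
    "SRR30825772": "specimen_B",
}
shared_counts = count_library_samples_per_specimen(shared_key)

print(default_counts)
print(shared_counts)
```

With the default identity mapping every count is 1. Under this assumed reading, a count above 1 would appear only once several library samples are assigned to the same specimen.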
We want to change how these are generated, so we first read in the SRA information.
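A minimal sketch of the read step, assuming sra_info_table.tsv is a tab-separated file containing at least the run_accession and sample_alias columns used in the dictionary step of this tutorial. So that the snippet runs on its own, it reads a small in-memory stand-in for the file. The sample_alias values shown are invented and are not taken from the study.

```python
import io

import pandas as pd

# Hypothetical stand-in for sra_info_table.tsv; the sample_alias values are invented
example_tsv = io.StringIO(
    "run_accession\tsample_alias\n"
    "SRR30825770\texample_specimen_1\n"
    "SRR30825771\texample_specimen_1\n"
    "SRR30825772\texample_specimen_2\n"
)

# For the real file, pass its path instead:
#   sra_info = pd.read_csv("sra_info_table.tsv", sep="\t", dtype=str)
sra_info = pd.read_csv(example_tsv, sep="\t", dtype=str)

# Fail early if the columns needed for the key are absent
required = {"run_accession", "sample_alias"}
missing = required - set(sra_info.columns)
if missing:
    raise ValueError(f"missing columns: {sorted(missing)}")

print(sra_info.head())
```

Reading every column as a string (dtype=str) is a design choice for this sketch: it keeps accession-like identifiers from being converted to numbers.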
We then convert this into a dictionary, keyed by library_sample_name, that we can use in pmotools.
Code
# create a dictionary key
lib_to_spec_key = sra_info.set_index('run_accession')['sample_alias'].to_dict()
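Before passing the key on, it can be worth checking it. The sketch below assumes the same run_accession and sample_alias columns. The table contents and the library_sample_names list are hypothetical. How pmotools treats library samples that are missing from the key is not stated here, so the coverage check is a precaution rather than a documented requirement.

```python
import pandas as pd

# Hypothetical SRA table; the second row repeats a run accession on purpose
sra_info = pd.DataFrame(
    {
        "run_accession": ["SRR30825770", "SRR30825770", "SRR30825771"],
        "sample_alias": ["example_specimen_1", "example_specimen_2", "example_specimen_3"],
    }
)

# Series.to_dict() keeps only the last value for a repeated index label,
# so a duplicated run accession would silently lose a specimen name
naive_key = sra_info.set_index("run_accession")["sample_alias"].to_dict()

# Report repeated run accessions before trusting the key
is_repeated = sra_info["run_accession"].duplicated(keep=False)
duplicated_runs = sorted(set(sra_info.loc[is_repeated, "run_accession"]))

# Library sample names found in the microhaplotype table (hypothetical list)
library_sample_names = ["SRR30825770", "SRR30825771", "SRR30825772"]
unmapped = sorted(set(library_sample_names) - set(naive_key))

print("duplicated run accessions:", duplicated_runs)
print("library samples without a specimen name:", unmapped)
```

If either list is non-empty, the SRA table likely needs cleaning before the key is built.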
Now we build our library_sample_info and specimen_info tables with this information.
Code
# supply key when building library_sample_info and specimen_info
library_sample_and_spec_renamed_infos = create_minimum_library_specimen_dict_from_mhap_table(
    pmo_mhaps["detected_microhaplotypes"],
    panel_name="staph_aureus_Furstenau2025",
    library_sample_specimen_key=lib_to_spec_key,
)
Finally, we can merge this information with the sections we already generated above (pmo_panel_and_target_info and pmo_mhaps) into a new PMO.
Code
# now build with renamed
staph_aureus_pmo_renamed = merge_to_pmo(
    specimen_info=library_sample_and_spec_renamed_infos["specimen_info"],
    library_sample_info=library_sample_and_spec_renamed_infos["library_sample_info"],
    panel_target_info=pmo_panel_and_target_info,
    mhap_info=pmo_mhaps,
)
Again, we can validate that the new PMO complies with the schema.
Code
# validate the renamed PMO rather than the original staph_aureus_pmo
checker.validate_pmo_json(staph_aureus_pmo_renamed)
Below we can see that the library_sample_names and specimen_names now differ.