pmotools-runner.py
Most of the basic panel info functions can be found under extract_panel_info_from_pmo
pmotools v1.0.0 - A suite of tools for interacting with Portable Microhaplotype Object (pmo) file format

Available functions organized by groups are
convertors_to_json
    text_meta_to_json_meta - Convert text file meta to JSON Meta
    excel_meta_to_json_meta - Convert excel file meta to JSON Meta
    microhaplotype_table_to_json_file - Convert microhaplotype table to JSON Meta
    terra_amp_output_to_json - Convert terra output table to JSON seq table
extractors_from_pmo
    extract_pmo_with_selected_meta - Extract from PMO samples and associated haplotypes with selected meta
    extract_pmo_with_select_specimen_ids - Extract from PMO specific samples from the specimens table
    extract_pmo_with_select_experiment_sample_ids - Extract from PMO specific experiment sample ids from the experiment_info table
    extract_pmo_with_select_targets - Extract from PMO specific targets
    extract_pmo_with_read_filter - Extract from PMO with a read filter
    extract_allele_table - Extract allele tables which can be as used as input to such tools as dcifer or moire
working_with_multiple_pmos
    combine_pmos - Combine multiple pmos of the same panel into a single pmo
extract_basic_info_from_pmo
    list_experiment_sample_ids_per_specimen_id - Each specimen_id can have multiple experiment_sample_ids, list out all in a PMO
    list_specimen_meta_fields - List out the specimen meta fields in the specimen_info section
    list_tar_amp_bioinformatics_info_ids - List out all the tar_amp_bioinformatics_info_ids in a PMO file
    count_specimen_meta - Count the values of specific specimen meta fields in the specimen_info section
    count_targets_per_sample - Count the number of targets per sample
    count_samples_per_target - Count the number of samples per target
extract_panel_info_from_pmo
    extract_insert_of_panels - Extract the insert of panels from a PMO
    extract_refseq_of_inserts_of_panels - Extract the ref_seq of panels from a PMO
Getting files for examples
This extracts the insert locations of the targets in the panel info of a PMO and writes them out as a BED file.
usage: pmotools-runner.py extract_insert_of_panels [-h] --file FILE
                                                   [--output OUTPUT]
                                                   [--overwrite]

options:
  -h, --help       show this help message and exit
  --file FILE      PMO file
  --output OUTPUT  output file
  --overwrite      If output file exists, overwrite it
The Python code for the extract_insert_of_panels script is below.
pmotools-python/scripts/extract_info_from_pmo/extract_insert_of_panels.py
#!/usr/bin/env python3
import os, argparse, json
import sys
from collections import defaultdict

import pandas as pd
from pmotools.pmo_utils.PMOExtractor import PMOExtractor
from pmotools.pmo_utils.PMOReader import PMOReader
from pmotools.utils.small_utils import Utils


def parse_args_extract_insert_of_panels():
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', type=str, required=True, help='PMO file')
    parser.add_argument('--output', type=str, default="STDOUT", required=False, help='output file')
    parser.add_argument('--overwrite', action = 'store_true', help='If output file exists, overwrite it')
    return parser.parse_args()


def extract_insert_of_panels():
    args = parse_args_extract_insert_of_panels()

    # check files
    Utils.inputOutputFileCheck(args.file, args.output, args.overwrite)

    # read in PMO
    pmo = PMOReader.read_in_pmo(args.file)

    # get panel insert locations
    panel_bed_locs = PMOExtractor.extract_panels_insert_bed_loc(pmo)

    # write
    output_target = sys.stdout if args.output == "STDOUT" else open(args.output, "w")
    with output_target as f:
        f.write("\t".join(["#chrom", "start", "end", "name", "length", "strand", "extra_info"]) + "\n")
        for panel_id, bed_locs in panel_bed_locs.items():
            for loc in bed_locs:
                f.write("\t".join([loc.chrom, str(loc.start), str(loc.end), loc.name, str(loc.score), loc.strand, loc.extra_info]) + "\n")


if __name__ == "__main__":
    extract_insert_of_panels()
#chrom start end name length strand extra_info
Pf3D7_01_v3 145449 145622 t1 173 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_02_v3 109807 109982 t10 175 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 3214351 3214478 t100 127 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_02_v3 278165 278336 t11 171 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_02_v3 470492 470676 t12 184 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_02_v3 805822 805942 t13 120 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_03_v3 85440 85646 t14 206 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_03_v3 141963 142181 t15 218 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_03_v3 221363 221495 t16 132 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_03_v3 618396 618581 t17 185 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_03_v3 654002 654175 t18 173 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_03_v3 850816 850989 t19 173 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_01_v3 179903 180115 t2 212 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 109912 110087 t20 175 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 133491 133701 t21 210 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 141778 141945 t22 167 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 415653 415826 t23 173 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 544718 544861 t24 143 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 748230 748436 t25 206 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 748533 748696 t26 163 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 802525 802713 t27 188 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 1037634 1037844 t28 210 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 1100656 1100831 t29 175 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_01_v3 181557 181673 t3 116 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 1102389 1102578 t30 189 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 1113450 1113604 t31 154 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_04_v3 1128489 1128673 t32 184 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_05_v3 329378 329550 t33 172 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_05_v3 958059 958221 t34 162 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_05_v3 958389 958506 t35 117 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_05_v3 1042162 1042281 t36 119 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_05_v3 1309609 1309744 t37 135 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_06_v3 145343 145501 t38 158 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_06_v3 532195 532378 t39 183 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_01_v3 495971 496143 t4 172 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_07_v3 165235 165422 t40 187 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_07_v3 166035 166167 t41 132 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_07_v3 298886 299005 t42 119 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_07_v3 729975 730088 t43 113 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_07_v3 1149415 1149585 t44 170 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_07_v3 1358694 1358911 t45 217 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_08_v3 102326 102500 t46 174 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_08_v3 336468 336647 t47 179 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_08_v3 339168 339357 t48 189 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_08_v3 549993 550218 t49 225 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_01_v3 512199 512388 t5 189 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_08_v3 933023 933143 t50 120 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_08_v3 1269320 1269456 t51 136 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_08_v3 1344686 1344819 t52 133 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_08_v3 1362891 1363087 t53 196 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_09_v3 516928 517092 t54 164 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_09_v3 596133 596334 t55 201 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_09_v3 685601 685792 t56 191 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_09_v3 1178894 1179078 t57 184 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_09_v3 1406405 1406541 t58 136 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_09_v3 1437114 1437303 t59 189 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_01_v3 531682 531900 t6 218 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_10_v3 377095 377209 t60 114 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_10_v3 992371 992544 t61 173 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_10_v3 1386700 1386869 t62 169 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_10_v3 1399544 1399711 t63 167 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_10_v3 1436479 1436682 t64 203 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_11_v3 119486 119693 t65 207 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_11_v3 1009856 1010038 t66 182 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_11_v3 1018953 1019085 t67 132 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_11_v3 1376185 1376372 t68 187 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_11_v3 1552430 1552640 t69 210 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_01_v3 532690 532844 t7 154 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_11_v3 1750865 1751055 t70 190 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_11_v3 1816211 1816425 t71 214 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_12_v3 63166 63280 t72 114 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_12_v3 659891 660010 t73 119 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_12_v3 684088 684261 t74 173 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_12_v3 943258 943428 t75 170 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_12_v3 1237431 1237603 t76 172 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_12_v3 2050130 2050314 t77 184 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 103659 103879 t78 220 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 156566 156722 t79 156 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_01_v3 534215 534368 t8 153 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 1150303 1150493 t80 190 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 1419543 1419670 t81 127 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 1725365 1725570 t82 205 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 1876352 1876534 t83 182 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 2114975 2115142 t84 167 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 2124634 2124847 t85 213 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 2479086 2479246 t86 160 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 2481106 2481288 t87 182 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_13_v3 2669135 2669307 t88 172 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 39953 40137 t89 184 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_01_v3 534941 535110 t9 169 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 120215 120351 t90 136 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 150105 150294 t91 189 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 279663 279786 t92 123 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 407379 407571 t93 192 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 564208 564377 t94 169 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 1038369 1038486 t95 117 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 1956129 1956286 t96 157 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 1992289 1992426 t97 137 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 2524962 2525089 t98 127 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
Pf3D7_14_v3 3124642 3124842 t99 200 + [genome_name_version=3D7_2020-09-01;panel_id=heomev1;]
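The extra_info column packs key/value pairs into a bracketed, semicolon-separated string. The following is a minimal sketch of unpacking it, assuming the output is tab-separated as written by the script above. The helper names parse_extra_info and parse_bed_lines are hypothetical and not part of pmotools; the sample rows are copied from the output above.

```python
# Sample rows copied from the output above; the real file is assumed tab-separated.
SAMPLE = (
    "#chrom\tstart\tend\tname\tlength\tstrand\textra_info\n"
    "Pf3D7_01_v3\t145449\t145622\tt1\t173\t+\t[genome_name_version=3D7_2020-09-01;panel_id=heomev1;]\n"
    "Pf3D7_02_v3\t109807\t109982\tt10\t175\t+\t[genome_name_version=3D7_2020-09-01;panel_id=heomev1;]\n"
)


def parse_extra_info(field):
    """Turn '[k1=v1;k2=v2;]' into {'k1': 'v1', 'k2': 'v2'} (hypothetical helper)."""
    pairs = field.strip().strip("[]").split(";")
    return dict(pair.split("=", 1) for pair in pairs if pair)


def parse_bed_lines(text):
    """Yield one dict per data row, skipping the '#chrom' header line."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, start, end, name, length, strand, extra = line.split("\t")
        yield {
            "chrom": chrom,
            "start": int(start),
            "end": int(end),
            "name": name,
            "length": int(length),
            "strand": strand,
            "extra_info": parse_extra_info(extra),
        }


rows = list(parse_bed_lines(SAMPLE))
for row in rows:
    # In the rows shown above, the length column equals end - start.
    assert row["length"] == row["end"] - row["start"]
    print(row["name"], row["extra_info"]["panel_id"])
```

To run this on a real file, pass the contents of the file written with --output to parse_bed_lines instead of SAMPLE.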
This extracts the reference sequence of the insert location of each target in the panel info of a PMO and writes it out as a table. The reference sequence is an optional field, so if no reference sequence was loaded into the PMO, the ref_seq column will be blank.
usage: pmotools-runner.py extract_refseq_of_inserts_of_panels
       [-h] --file FILE [--output OUTPUT] [--overwrite]

extract ref_seq of inserts of panels, but if no ref_seq is save in the PMO
will just be blank

options:
  -h, --help       show this help message and exit
  --file FILE      PMO file
  --output OUTPUT  output file
  --overwrite      If output file exists, overwrite it
The Python code for the extract_refseq_of_inserts_of_panels script is below.
pmotools-python/scripts/extract_info_from_pmo/extract_refseq_of_inserts_of_panels.py
#!/usr/bin/env python3
import os, argparse, json
import sys
from collections import defaultdict

import pandas as pd
from pmotools.pmo_utils.PMOExtractor import PMOExtractor
from pmotools.pmo_utils.PMOReader import PMOReader
from pmotools.utils.small_utils import Utils


def parse_args_extract_refseq_of_inserts_of_panels():
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', type=str, required=True, help='PMO file')
    parser.add_argument('--output', type=str, default="STDOUT", required=False, help='output file')
    parser.add_argument('--overwrite', action = 'store_true', help='If output file exists, overwrite it')
    parser.description = "extract ref_seq of inserts of panels, but if no ref_seq is save in the PMO will just be blank"
    return parser.parse_args()


def extract_refseq_of_inserts_of_panels():
    args = parse_args_extract_refseq_of_inserts_of_panels()

    # check files
    Utils.inputOutputFileCheck(args.file, args.output, args.overwrite)

    # read in PMO
    pmo = PMOReader.read_in_pmo(args.file)

    # get panel insert locations
    panel_bed_locs = PMOExtractor.extract_panels_insert_bed_loc(pmo)

    # write
    output_target = sys.stdout if args.output == "STDOUT" else open(args.output, "w")
    with output_target as f:
        f.write("\t".join(["panel_id", "target_id", "ref_seq"]) + "\n")
        for panel_id, bed_locs in panel_bed_locs.items():
            for loc in bed_locs:
                f.write("\t".join([str(panel_id), loc.name, loc.ref_seq]) + "\n")


if __name__ == "__main__":
    extract_refseq_of_inserts_of_panels()
panel_id target_id ref_seq
heomev1 t1
heomev1 t10
heomev1 t100
heomev1 t11
heomev1 t12
heomev1 t13
heomev1 t14
heomev1 t15
heomev1 t16
heomev1 t17
heomev1 t18
heomev1 t19
heomev1 t2
heomev1 t20
heomev1 t21
heomev1 t22
heomev1 t23
heomev1 t24
heomev1 t25
heomev1 t26
heomev1 t27
heomev1 t28
heomev1 t29
heomev1 t3
heomev1 t30
heomev1 t31
heomev1 t32
heomev1 t33
heomev1 t34
heomev1 t35
heomev1 t36
heomev1 t37
heomev1 t38
heomev1 t39
heomev1 t4
heomev1 t40
heomev1 t41
heomev1 t42
heomev1 t43
heomev1 t44
heomev1 t45
heomev1 t46
heomev1 t47
heomev1 t48
heomev1 t49
heomev1 t5
heomev1 t50
heomev1 t51
heomev1 t52
heomev1 t53
heomev1 t54
heomev1 t55
heomev1 t56
heomev1 t57
heomev1 t58
heomev1 t59
heomev1 t6
heomev1 t60
heomev1 t61
heomev1 t62
heomev1 t63
heomev1 t64
heomev1 t65
heomev1 t66
heomev1 t67
heomev1 t68
heomev1 t69
heomev1 t7
heomev1 t70
heomev1 t71
heomev1 t72
heomev1 t73
heomev1 t74
heomev1 t75
heomev1 t76
heomev1 t77
heomev1 t78
heomev1 t79
heomev1 t8
heomev1 t80
heomev1 t81
heomev1 t82
heomev1 t83
heomev1 t84
heomev1 t85
heomev1 t86
heomev1 t87
heomev1 t88
heomev1 t89
heomev1 t9
heomev1 t90
heomev1 t91
heomev1 t92
heomev1 t93
heomev1 t94
heomev1 t95
heomev1 t96
heomev1 t97
heomev1 t98
heomev1 t99
---
title: Getting panel info out of PMO using pmotools-python
---
```{r setup, echo=F}
source("../common.R")
```
Most of the basic panel info functions can be found under `extract_panel_info_from_pmo`
```{bash, eval = F}
pmotools-runner.py
```
```{bash, echo = F}
pmotools-runner.py | perl -pe 's/\e\[[0-9;]*m(?:\e\[K)?//g'
```
Getting files for examples
```{bash, eval = F}
cd example
wget https://plasmogenepi.github.io/PMO_Docs/format/moz2018_PMO.json.gz
wget https://plasmogenepi.github.io/PMO_Docs/format/PathWeaverHeome1_PMO.json.gz
```
# Extract insert locations of panels from PMO
This extracts the insert locations of the targets in the panel info of a PMO and writes them out as a BED file.
```{bash}
pmotools-runner.py extract_insert_of_panels -h
```
The Python code for the `extract_insert_of_panels` script is below.
```{python}
#| echo: true
#| eval: false
#| code-fold: true
#| code-line-numbers: true
#| filename: pmotools-python/scripts/extract_info_from_pmo/extract_insert_of_panels.py
#| file: ../pmotools-python/scripts/extract_info_from_pmo/extract_insert_of_panels.py
```
```{bash}
cd example
pmotools-runner.py extract_insert_of_panels --file ../../format/moz2018_PMO.json.gz
```
```{bash}
cd example
pmotools-runner.py extract_insert_of_panels --file ../../format/moz2018_PMO.json.gz --output moz2018_PMO_panel_insert_locs.bed --overwrite
```
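
In the example output for `moz2018_PMO.json.gz`, rows appear in target-name order (`t1`, `t10`, `t100`, ...) rather than in coordinate order. Some downstream BED tools expect coordinate-sorted input, so here is a minimal sketch of sorting the written file by chromosome and start position. It uses only the standard library and assumes the file is tab-separated with a `#chrom` header line; the function name `sort_bed_by_position` and the `.sorted.bed` output name are hypothetical and not part of pmotools.

```python
import os


def sort_bed_by_position(in_path, out_path):
    """Rewrite a tab-separated BED-like file sorted by (chrom, start).

    Header lines starting with '#' are kept at the top, unchanged.
    Returns the number of interval rows written.
    """
    with open(in_path) as handle:
        lines = [line.rstrip("\n") for line in handle if line.strip()]
    header = [line for line in lines if line.startswith("#")]
    rows = [line.split("\t") for line in lines if not line.startswith("#")]
    # Chromosome names sort as text; start positions sort numerically.
    rows.sort(key=lambda fields: (fields[0], int(fields[1])))
    with open(out_path, "w") as out:
        for line in header:
            out.write(line + "\n")
        for fields in rows:
            out.write("\t".join(fields) + "\n")
    return len(rows)


if __name__ == "__main__":
    # Path assumes the command above was run from inside the example directory.
    bed = "example/moz2018_PMO_panel_insert_locs.bed"
    if os.path.exists(bed):
        count = sort_bed_by_position(bed, "example/moz2018_PMO_panel_insert_locs.sorted.bed")
        print(f"sorted {count} intervals")
```

Chromosome names are compared as plain strings here, which works for zero-padded names such as `Pf3D7_01_v3`; other naming schemes may need a different sort key.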
# Extract ref sequences of insert locations of panels from PMO
This extracts the reference sequence of the insert location of each target in the panel info of a PMO and writes it out as a table. The reference sequence is an optional field, so if no reference sequence was loaded into the PMO, the `ref_seq` column will be blank.
```{bash}
pmotools-runner.py extract_refseq_of_inserts_of_panels -h
```
The Python code for the `extract_refseq_of_inserts_of_panels` script is below.
```{python}
#| echo: true
#| eval: false
#| code-fold: true
#| code-line-numbers: true
#| filename: pmotools-python/scripts/extract_info_from_pmo/extract_refseq_of_inserts_of_panels.py
#| file: ../pmotools-python/scripts/extract_info_from_pmo/extract_refseq_of_inserts_of_panels.py
```
```{bash}
cd example
pmotools-runner.py extract_refseq_of_inserts_of_panels --file ../../format/moz2018_PMO.json.gz
```
```{bash}
cd example
pmotools-runner.py extract_refseq_of_inserts_of_panels --file ../../format/moz2018_PMO.json.gz --output moz2018_PMO_panel_ref_seqs.tsv --overwrite
```
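
Because `ref_seq` is optional, it can be useful to check which targets actually have a sequence before using the table downstream. Here is a minimal sketch, assuming the table is tab-separated with the `panel_id`, `target_id`, `ref_seq` header that the script writes. The function name `targets_missing_ref_seq` is hypothetical and not part of pmotools, and the `demo_panel` row with its sequence is made up purely for illustration.

```python
import csv
import io


def targets_missing_ref_seq(handle):
    """Return (panel_id, target_id) pairs whose ref_seq column is blank."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["panel_id"], row["target_id"])
        for row in reader
        # A missing trailing column is read as None, so treat it as blank too.
        if not (row.get("ref_seq") or "").strip()
    ]


# Inline sample in the layout the script writes.
# The demo_panel row and its sequence are invented for illustration only.
sample = io.StringIO(
    "panel_id\ttarget_id\tref_seq\n"
    "heomev1\tt1\t\n"
    "heomev1\tt10\t\n"
    "demo_panel\tt2\tACGT\n"
)
missing = targets_missing_ref_seq(sample)
print(missing)
```

For a real file, open the table written with `--output` (for example `moz2018_PMO_panel_ref_seqs.tsv` in the `example` directory) with `open(path, newline="")` and pass the handle to the function.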