pmotools.pmo_builder.metatable_to_pmo module

pmotools.pmo_builder.metatable_to_pmo.add_parasite_density_info(parasite_density_col, parasite_density_method_col, meta_json, df, specimen_name_col, entry_name)

pmotools.pmo_builder.metatable_to_pmo.add_plate_info(plate_col_col, plate_name_col, plate_row_col, plate_position_col, meta_json, df, specimen_name_col, entry_name='plate_info')

pmotools.pmo_builder.metatable_to_pmo.check_columns_exist(df, columns)

pmotools.pmo_builder.metatable_to_pmo.check_unique_columns(columns)

pmotools.pmo_builder.metatable_to_pmo.library_sample_info_table_to_pmo(contents: DataFrame, library_sample_name_col: str = 'library_sample_name', specimen_name_col: str = 'specimen_name', panel_name_col: str = 'panel_name', sequencing_info_name_col: str = None, alternate_identifiers_col: str = None, experiment_accession_col: str = None, fastqs_loc_col: str = None, library_prep_plate_name_col: str = None, library_prep_plate_col_col: str = None, library_prep_plate_row_col: str = None, library_prep_plate_position_col: str = None, parasite_density_col: str = None, parasite_density_method_col: str = None, run_accession_col: str = None, additional_library_sample_info_cols: list | None = None, list_values_library_values: list | None = ['alternate_identifiers'], list_values_library_values_delimiter: str = ',')

Convert a DataFrame containing library information into JSON.

Parameters:

contents (pd.DataFrame) – input DataFrame containing library data
library_sample_name_col (str) – column name for library sample names. Default: library_sample_name
specimen_name_col (str) – column name for specimen names. Default: specimen_name
panel_name_col (str) – column name for panel names. Default: panel_name
sequencing_info_name_col (str, optional) – column name for sequencing information names
alternate_identifiers_col (str, optional) – column name for alternate identifiers
experiment_accession_col (str, optional) – column name for experiment accession information
fastqs_loc_col (str, optional) – column name for location of fastqs
library_prep_plate_name_col (str, optional) – column name for the name of the sequencing plate
library_prep_plate_col_col (str, optional) – column name for the column of the sample on the sequencing plate
library_prep_plate_row_col (str, optional) – column name for the row of the sample on the sequencing plate
library_prep_plate_position_col (str, optional) – column name for position on the sequencing plate (e.g. A01). Can’t be set if library_prep_plate_col_col and library_prep_plate_row_col are specified.
parasite_density_col (str or list of str, optional) – column name(s) for the parasite density, in parasites per microliter
parasite_density_method_col (str or list of str, optional) – column name(s) for the method used to obtain the density. If set, parasite_density_col must also be specified.
run_accession_col (str, optional) – column name for run accession information
additional_library_sample_info_cols (list of str, optional) – additional column names to include
list_values_library_values (list of str, optional) – columns whose values may be lists, with items separated by list_values_library_values_delimiter
list_values_library_values_delimiter (str) – delimiter separating list items in the list_values_library_values columns. Default: ','

Returns:

JSON-style dictionary whose keys are library_sample_id and whose values are the corresponding row data

Return type:

dict
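
The effect of list_values_library_values and its delimiter can be pictured with a small standard-library sketch. This is not the pmotools implementation: split_list_fields is a hypothetical helper, a plain dict stands in for one DataFrame row, and stripping whitespace around each item is an assumption.

```python
def split_list_fields(row, list_fields, delimiter=","):
    """Return a copy of `row` with the named fields split into lists.

    Hypothetical helper for illustration only; not part of pmotools.
    """
    out = dict(row)  # copy so the input row is left untouched
    for field in list_fields:
        value = out.get(field)
        if isinstance(value, str):
            # Assumption: surrounding whitespace is not meaningful.
            out[field] = [item.strip() for item in value.split(delimiter)]
    return out


# One table row, represented as a plain dict.
row = {
    "library_sample_name": "lib1",
    "specimen_name": "spec1",
    "panel_name": "panelA",
    "alternate_identifiers": "altA, altB",
}

converted = split_list_fields(row, ["alternate_identifiers"])
print(converted["alternate_identifiers"])
```

Fields that are not named in list_fields, such as panel_name here, pass through unchanged.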

pmotools.pmo_builder.metatable_to_pmo.pandas_table_to_json(contents: DataFrame, return_indexed_dict: bool = False)

Convert a pandas DataFrame into a JSON dictionary. If there is an index column, create a dictionary whose keys are the index values.

Parameters:

contents – the dataframe to be converted
return_indexed_dict – whether to return an indexed dictionary

Returns:

a dictionary of the input table data
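
The two output shapes suggested by return_indexed_dict can be sketched without pandas. This is an illustration under assumptions, not the function's actual code: table_to_json is a hypothetical stand-in, rows are plain dicts, and the index_col argument plays the role of the DataFrame index. In pandas itself, DataFrame.to_dict(orient="records") and DataFrame.to_dict(orient="index") produce comparable shapes.

```python
def table_to_json(rows, index_col=None, return_indexed_dict=False):
    """Convert a list of row dicts into a list of records or a keyed dict.

    Hypothetical stand-in for illustration; `index_col` is an assumed
    argument that names the field to use as the dictionary key.
    """
    if return_indexed_dict:
        if index_col is None:
            raise ValueError("index_col is required for an indexed dictionary")
        # Key each row by its index value and drop that field from the row.
        return {
            row[index_col]: {k: v for k, v in row.items() if k != index_col}
            for row in rows
        }
    # Otherwise return one independent dict per row.
    return [dict(row) for row in rows]


rows = [
    {"specimen_name": "spec1", "host_age": 5},
    {"specimen_name": "spec2", "host_age": 31},
]

records = table_to_json(rows)
indexed = table_to_json(rows, index_col="specimen_name", return_indexed_dict=True)
print(records)
print(indexed)
```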

pmotools.pmo_builder.metatable_to_pmo.specimen_info_table_to_pmo(contents: DataFrame, specimen_name_col: str = 'specimen_name', specimen_taxon_id_col: str = None, host_taxon_id_col: str = None, collection_date_col: str = None, collection_country_col: str = None, project_name_col: str = None, alternate_identifiers_col: str = None, blood_meal_col: str = None, drug_usage_col: str = None, env_broad_scale_col: str = None, env_local_scale_col: str = None, env_medium_col: str = None, geo_admin1_col: str = None, geo_admin2_col: str = None, geo_admin3_col: str = None, gravid_col: str = None, gravidity_col: str = None, has_travel_out_six_month_col: str = None, host_age_col: str = None, host_sex_col: str = None, host_subject_id: str = None, lat_lon_col: str = None, parasite_density_col: str = None, parasite_density_method_col: str = None, specimen_accession_col: str = None, storage_plate_col_col: str = None, storage_plate_name_col: str = None, storage_plate_row_col: str = None, storage_plate_position_col: str = None, specimen_collect_device_col: str = None, specimen_comments_col: str = None, specimen_store_loc_col: str = None, specimen_type_col: str = None, treatment_status_col: str = None, additional_specimen_cols: list | None = None, list_values_specimen_values: list | None = ['alternate_identifiers', 'drug_usage', 'specimen_comments', 'treatment_status', 'specimen_taxon_id'], list_values_specimen_values_delimiter: str = ',')

Convert a DataFrame containing specimen information into JSON.

Parameters:

contents (pd.DataFrame) – the input DataFrame containing specimen data
specimen_name_col (str) – the column name for specimen names. Default: specimen_name
specimen_taxon_id_col (str, optional) – NCBI taxonomy number of the organism
host_taxon_id_col (str, optional) – NCBI taxonomy number of the host
collection_date_col (str, optional) – date of the sample collection
collection_country_col (str, optional) – name of country collected in (admin level 0)
project_name_col (str, optional) – name of the project
alternate_identifiers_col (str, optional) – list of optional alternative names for the samples
blood_meal_col (str, optional) – whether the host specimen has had a recent blood meal
drug_usage_col (str, optional) – any drug used by the subject and the frequency of usage; can include multiple drugs used
env_broad_scale_col (str, optional) – the broad environment from which the specimen was collected
env_local_scale_col (str, optional) – the local environment from which the specimen was collected
env_medium_col (str, optional) – the environment medium from which the specimen was collected
geo_admin1_col (str, optional) – geographical admin level 1
geo_admin2_col (str, optional) – geographical admin level 2
geo_admin3_col (str, optional) – geographical admin level 3
gravid_col (str, optional) – whether the host specimen is pregnant
gravidity_col (str, optional) – the number of previous pregnancies
has_travel_out_six_month_col (str, optional) – whether the host specimen has travelled out from the local region in the last six months
host_age_col (str, optional) – the age in years of the person
host_sex_col (str, optional) – if the specimen is from a person, the sex of that person
host_subject_id (str, optional) – ID for the individual a specimen was collected from
lat_lon_col (str, optional) – latitude and longitude of the collection site
parasite_density_col (str or list of str, optional) – the parasite density in parasites per microliter
parasite_density_method_col (str or list of str, optional) – the method used to obtain the density. If set, parasite_density_col must also be specified.
specimen_accession_col (str, optional) – the accession number of the specimen
storage_plate_col_col (str, optional) – column the specimen was in on the plate. If set, storage_plate_row_col must also be specified.
storage_plate_name_col (str, optional) – name of the plate the specimen was in
storage_plate_row_col (str, optional) – row the specimen was in on the plate. If set, storage_plate_col_col must also be specified.
storage_plate_position_col (str, optional) – position of the specimen on the plate (e.g. A01). Can’t be set if storage_plate_col_col and storage_plate_row_col are specified.
specimen_collect_device_col (str, optional) – the way the specimen was collected
specimen_comments_col (str, optional) – additional comments about the specimen
specimen_store_loc_col (str, optional) – specimen storage site
specimen_type_col (str, optional) – type of specimen, e.g. negative_control, positive_control, field_sample
treatment_status_col (str, optional) – if the person has been treated with drugs, what the treatment outcome was
additional_specimen_cols (list of str, optional) – additional column names to include
list_values_specimen_values (list of str, optional) – columns whose values may be lists, with items separated by list_values_specimen_values_delimiter
list_values_specimen_values_delimiter (str) – delimiter separating list items in the list_values_specimen_values columns. Default: ','

Returns:

JSON-style dictionary whose keys are specimen_name and whose values are the corresponding row data

Return type:

dict
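
The dependencies among the optional arguments described above (density method requires density; plate column and row go together; plate position excludes the column and row pair) can be checked before calling the converter. The following is a minimal sketch of such a pre-check, not pmotools code: check_optional_column_rules is a hypothetical helper, and how the library itself reports a violation is not stated in this reference.

```python
def check_optional_column_rules(
    parasite_density_col=None,
    parasite_density_method_col=None,
    storage_plate_col_col=None,
    storage_plate_row_col=None,
    storage_plate_position_col=None,
):
    """Return a list of rule violations; empty if the arguments are consistent.

    Hypothetical pre-check for illustration only; not part of pmotools.
    """
    problems = []
    # A density method is only meaningful alongside a density column.
    if parasite_density_method_col is not None and parasite_density_col is None:
        problems.append("parasite_density_method_col requires parasite_density_col")
    # Plate column and plate row must be given together.
    if (storage_plate_col_col is None) != (storage_plate_row_col is None):
        problems.append(
            "storage_plate_col_col and storage_plate_row_col must be set together"
        )
    # A combined position (e.g. A01) excludes the separate column/row pair.
    if (
        storage_plate_position_col is not None
        and storage_plate_col_col is not None
        and storage_plate_row_col is not None
    ):
        problems.append(
            "storage_plate_position_col cannot be combined with "
            "storage_plate_col_col and storage_plate_row_col"
        )
    return problems


ok = check_optional_column_rules(
    parasite_density_col="density",
    parasite_density_method_col="density_method",
    storage_plate_position_col="well",
)
bad = check_optional_column_rules(
    parasite_density_method_col="density_method",
    storage_plate_col_col="plate_col",
)
print(ok)
print(bad)
```

Returning a list of messages rather than raising on the first problem lets a caller report every inconsistency at once; that is a design choice of this sketch only.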