pmotools.pmo_builder.metatable_to_pmo module

pmotools.pmo_builder.metatable_to_pmo.add_parasite_density_info(parasite_density_col, parasite_density_method_col, meta_json, df, specimen_name_col, entry_name)
pmotools.pmo_builder.metatable_to_pmo.add_plate_info(plate_col_col, plate_name_col, plate_row_col, plate_position_col, meta_json, df, specimen_name_col, entry_name='plate_info')
pmotools.pmo_builder.metatable_to_pmo.check_columns_exist(df, columns)
pmotools.pmo_builder.metatable_to_pmo.check_unique_columns(columns)
pmotools.pmo_builder.metatable_to_pmo.library_sample_info_table_to_pmo(contents: DataFrame, library_sample_name_col: str = 'library_sample_name', specimen_name_col: str = 'specimen_name', panel_name_col: str = 'panel_name', sequencing_info_name_col: str = None, alternate_identifiers_col: str = None, experiment_accession_col: str = None, fastqs_loc_col: str = None, library_prep_plate_name_col: str = None, library_prep_plate_col_col: str = None, library_prep_plate_row_col: str = None, library_prep_plate_position_col: str = None, parasite_density_col: str = None, parasite_density_method_col: str = None, run_accession_col: str = None, additional_library_sample_info_cols: list | None = None, list_values_library_values: list | None = ['alternate_identifiers'], list_values_library_values_delimiter: str = ',')

Convert a DataFrame containing library information into JSON.

Parameters:
  • contents (pd.DataFrame) – input DataFrame containing library data

  • library_sample_name_col (str) – column name for library sample names. Default: library_sample_name

  • specimen_name_col (str) – column name for specimen names. Default: specimen_name

  • panel_name_col (str) – column name for panel names. Default: panel_name

  • sequencing_info_name_col (str, optional) – column name for sequencing information names

  • alternate_identifiers_col (str, optional) – column name for alternate identifiers

  • experiment_accession_col (str, optional) – column name for experiment accession information

  • fastqs_loc_col (str, optional) – column name for location of fastqs

  • library_prep_plate_name_col (str, optional) – column name for the name of the plate used for sequencing

  • library_prep_plate_col_col (str, optional) – column name for the column of the sample on the sequencing plate

  • library_prep_plate_row_col (str, optional) – column name for the row of the sample on the sequencing plate

  • library_prep_plate_position_col (str, optional) – column name for position on the sequencing plate (e.g. A01). Can’t be set if library_prep_plate_col_col and library_prep_plate_row_col are specified.

  • parasite_density_col (str or list of str, optional) – the parasite density in parasites per microliter

  • parasite_density_method_col (str or list of str, optional) – the method used to obtain the density. If set, parasite_density_col must also be specified.

  • run_accession_col (str, optional) – column name for run accession information

  • additional_library_sample_info_cols (list of str, optional) – additional column names to include

  • list_values_library_values (list of str, optional) – columns that contain values that could be a list, delimited by list_values_library_values_delimiter

  • list_values_library_values_delimiter (str) – delimiter separating the items in the list_values_library_values columns. Default: ','

Returns:

JSON-style dictionary in which the keys are library_sample_id values and the values are the corresponding row data

Return type:

dict
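
The parameter list above states two constraints on the optional arguments: library_prep_plate_position_col can't be set when library_prep_plate_col_col and library_prep_plate_row_col are specified, and parasite_density_method_col requires parasite_density_col. A minimal plain-Python sketch of checking these rules before calling the function might look like the following. The helper name check_optional_args and its messages are hypothetical; this is not the library's own validation, and how pmotools reports such conflicts is not documented here.

```python
def check_optional_args(plate_position_col=None, plate_col_col=None,
                        plate_row_col=None, parasite_density_col=None,
                        parasite_density_method_col=None):
    """Hypothetical helper: return a list of problems with the optional column arguments."""
    problems = []
    # Documented rule: a plate position column (e.g. holding "A01") can't be
    # given when separate plate column and plate row columns are specified.
    if (plate_position_col is not None
            and plate_col_col is not None
            and plate_row_col is not None):
        problems.append("plate position can't be combined with plate row and column")
    # Documented rule: a density method column needs a density column.
    if parasite_density_method_col is not None and parasite_density_col is None:
        problems.append("parasite_density_method_col requires parasite_density_col")
    return problems


# A consistent set of arguments (column names are made up for illustration).
ok = check_optional_args(plate_position_col="well",
                         parasite_density_col="density",
                         parasite_density_method_col="density_method")

# Breaks both documented rules.
bad = check_optional_args(plate_position_col="well",
                          plate_col_col="plate_col",
                          plate_row_col="plate_row",
                          parasite_density_method_col="density_method")
print(ok)
print(bad)
```

Returning a list of problems rather than raising on the first one is a design choice of this sketch; it lets a caller report every conflict at once.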

pmotools.pmo_builder.metatable_to_pmo.pandas_table_to_json(contents: DataFrame, return_indexed_dict: bool = False)

Convert a pandas DataFrame into a JSON dictionary. If there is an index column, create a dictionary whose keys are the index values.

Parameters:
  • contents – the dataframe to be converted

  • return_indexed_dict – whether to return an indexed dictionary

Returns:

a dictionary of the input table data
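
The description above does not spell out the two output shapes. One plausible reading, sketched here in plain Python with rows as dictionaries rather than a DataFrame, is a list of row records versus a dictionary keyed by an index value. The function name table_to_json_sketch and its index_key parameter are hypothetical, and the actual pmotools output may differ.

```python
import json


def table_to_json_sketch(rows, index_key=None):
    """Hypothetical stand-in: rows is a list of dicts, one per table row."""
    if index_key is None:
        # No index: keep the rows as a list of records.
        return [dict(row) for row in rows]
    # Index given: key each row by its index value and leave that
    # value out of the row body.
    return {
        row[index_key]: {k: v for k, v in row.items() if k != index_key}
        for row in rows
    }


rows = [
    {"specimen_name": "S1", "collection_country": "Kenya"},
    {"specimen_name": "S2", "collection_country": "Mali"},
]
records = table_to_json_sketch(rows)
indexed = table_to_json_sketch(rows, index_key="specimen_name")
print(json.dumps(indexed, indent=2))
```

In pandas itself, these two shapes correspond roughly to DataFrame.to_dict(orient="records") and DataFrame.to_dict(orient="index"); whether pandas_table_to_json uses those calls is an assumption.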

pmotools.pmo_builder.metatable_to_pmo.specimen_info_table_to_pmo(contents: DataFrame, specimen_name_col: str = 'specimen_name', specimen_taxon_id_col: str = None, host_taxon_id_col: str = None, collection_date_col: str = None, collection_country_col: str = None, project_name_col: str = None, alternate_identifiers_col: str = None, blood_meal_col: str = None, drug_usage_col: str = None, env_broad_scale_col: str = None, env_local_scale_col: str = None, env_medium_col: str = None, geo_admin1_col: str = None, geo_admin2_col: str = None, geo_admin3_col: str = None, gravid_col: str = None, gravidity_col: str = None, has_travel_out_six_month_col: str = None, host_age_col: str = None, host_sex_col: str = None, host_subject_id: str = None, lat_lon_col: str = None, parasite_density_col: str = None, parasite_density_method_col: str = None, specimen_accession_col: str = None, storage_plate_col_col: str = None, storage_plate_name_col: str = None, storage_plate_row_col: str = None, storage_plate_position_col: str = None, specimen_collect_device_col: str = None, specimen_comments_col: str = None, specimen_store_loc_col: str = None, specimen_type_col: str = None, treatment_status_col: str = None, additional_specimen_cols: list | None = None, list_values_specimen_values: list | None = ['alternate_identifiers', 'drug_usage', 'specimen_comments', 'treatment_status', 'specimen_taxon_id'], list_values_specimen_values_delimiter: str = ',')

Convert a DataFrame containing specimen information into JSON.

Parameters:
  • contents (pd.DataFrame) – the input DataFrame containing specimen data

  • specimen_name_col (str) – the column name for specimen names. Default: specimen_name

  • specimen_taxon_id_col (str, optional) – NCBI taxonomy number of the organism

  • host_taxon_id_col (str, optional) – NCBI taxonomy number of the host

  • collection_date_col (str, optional) – date of the sample collection

  • collection_country_col (str, optional) – name of country collected in (admin level 0)

  • project_name_col (str, optional) – name of the project

  • alternate_identifiers_col (str, optional) – list of optional alternative names for the samples

  • blood_meal_col (str, optional) – whether the host specimen has had a recent blood meal

  • drug_usage_col (str, optional) – any drug used by the subject and the frequency of usage; can include multiple drugs used

  • env_broad_scale_col (str, optional) – the broad environment from which the specimen was collected

  • env_local_scale_col (str, optional) – the local environment from which the specimen was collected

  • env_medium_col (str, optional) – the environment medium from which the specimen was collected

  • geo_admin1_col (str, optional) – geographical admin level 1

  • geo_admin2_col (str, optional) – geographical admin level 2

  • geo_admin3_col (str, optional) – geographical admin level 3

  • gravid_col (str, optional) – whether the host specimen is pregnant

  • gravidity_col (str, optional) – the number of previous pregnancies

  • has_travel_out_six_month_col (str, optional) – whether the host specimen has travelled outside the local region in the last six months

  • host_age_col (str, optional) – the age in years of the person

  • host_sex_col (str, optional) – if the specimen is from a person, the sex of that person

  • host_subject_id (str, optional) – ID for the individual a specimen was collected from

  • lat_lon_col (str, optional) – latitude and longitude of the collection site

  • parasite_density_col (str or list of str, optional) – the parasite density in parasites per microliter

  • parasite_density_method_col (str or list of str, optional) – the method used to obtain the density. If set, parasite_density_col must also be specified.

  • specimen_accession_col (str, optional) – the accession number of the specimen

  • storage_plate_col_col (str, optional) – column the specimen was in on the plate. If set, storage_plate_row_col must also be specified.

  • storage_plate_name_col (str, optional) – name of the plate the specimen was in

  • storage_plate_row_col (str, optional) – row the specimen was in on the plate. If set, storage_plate_col_col must also be specified.

  • storage_plate_position_col (str, optional) – position of the specimen on the plate (e.g. A01). Can’t be set if storage_plate_col_col and storage_plate_row_col are specified.

  • specimen_collect_device_col (str, optional) – the way the specimen was collected

  • specimen_comments_col (str, optional) – additional comments about the specimen

  • specimen_store_loc_col (str, optional) – specimen storage site

  • specimen_type_col (str, optional) – type of specimen, e.g. negative_control, positive_control, field_sample

  • treatment_status_col (str, optional) – if the person has been treated with drugs, what the treatment outcome was

  • additional_specimen_cols (list of str, optional) – additional column names to include

  • list_values_specimen_values (list of str, optional) – columns that contain values that could be a list, delimited by list_values_specimen_values_delimiter

  • list_values_specimen_values_delimiter (str) – delimiter separating the items in the list_values_specimen_values columns. Default: ','

Returns:

JSON-style dictionary in which the keys are specimen_name values and the values are the corresponding row data

Return type:

dict
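
To make the list_values_specimen_values and list_values_specimen_values_delimiter parameters concrete, the following plain-Python sketch splits delimited fields into lists and keys each row by its specimen name, as the Returns description suggests. It works on a list of dictionaries rather than a DataFrame. The helper name rows_to_keyed_dict is hypothetical, and details such as whitespace stripping and keeping the name inside each entry are assumptions, not documented pmotools behavior.

```python
def rows_to_keyed_dict(rows, name_col="specimen_name",
                       list_value_cols=("alternate_identifiers",),
                       delimiter=","):
    """Hypothetical helper: key rows by name_col and split delimited fields into lists."""
    result = {}
    for row in rows:
        entry = {}
        for col, value in row.items():
            if col in list_value_cols and isinstance(value, str):
                # Assumption: surrounding whitespace is stripped and
                # empty items are dropped.
                entry[col] = [item.strip() for item in value.split(delimiter)
                              if item.strip()]
            else:
                entry[col] = value
        result[row[name_col]] = entry
    return result


# Made-up rows for illustration only.
rows = [
    {"specimen_name": "S1", "alternate_identifiers": "a1, a2", "host_age": 7},
    {"specimen_name": "S2", "alternate_identifiers": "b1", "host_age": 31},
]
keyed = rows_to_keyed_dict(rows)
print(keyed["S1"]["alternate_identifiers"])
```

Columns not named in list_value_cols pass through unchanged, so a value that happens to contain the delimiter is only split when its column is explicitly listed.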