cmapBQ

cmapBQ.config module

class cmapBQ.config.Configuration(credentials: str, tables: cmapBQ.config.TableDirectory)

Data class for cmapBQ configuration; represents the contents of config.txt.

credentials: str
tables: cmapBQ.config.TableDirectory
class cmapBQ.config.TableDirectory(compoundinfo: str, genetic_pertinfo: str, geneinfo: str, cellinfo: str, instinfo: str, siginfo: str, level3: str, level4: str, level5: str)
cellinfo: str
compoundinfo: str
geneinfo: str
genetic_pertinfo: str
instinfo: str
level3: str
level4: str
level5: str
siginfo: str
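The two data classes above can be mirrored with standard-library dataclasses to show how a parsed config.txt maps onto them. This is a minimal sketch, not cmapBQ code: it assumes the YAML keys in config.txt match the field names (`credentials`, plus one `tables` entry per TableDirectory field), and `config_from_dict` is a hypothetical helper.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class TableDirectory:
    # Mirrors the fields documented for cmapBQ.config.TableDirectory
    compoundinfo: str
    genetic_pertinfo: str
    geneinfo: str
    cellinfo: str
    instinfo: str
    siginfo: str
    level3: str
    level4: str
    level5: str


@dataclass
class Configuration:
    # Mirrors cmapBQ.config.Configuration: a credentials path plus table addresses
    credentials: str
    tables: TableDirectory


def config_from_dict(raw: Dict[str, Any]) -> Configuration:
    """Build a Configuration from a parsed config mapping (hypothetical helper).

    Assumes the config keys match the dataclass field names; extra table
    entries are ignored, missing ones raise KeyError.
    """
    table_keys = {f.name for f in fields(TableDirectory)}
    missing = table_keys - set(raw["tables"])
    if missing:
        raise KeyError(f"config is missing table entries: {sorted(missing)}")
    tables = TableDirectory(**{k: raw["tables"][k] for k in table_keys})
    return Configuration(credentials=raw["credentials"], tables=tables)
```

In practice the mapping would come from a YAML parser reading config.txt; the sketch starts from the already-parsed dictionary so it stays within the standard library.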
cmapBQ.config.get_bq_client(config=None)

Return authenticated BigQuery client object.

Parameters

config – optional path to a config file; if omitted, the default config is used

Returns

BigQuery Client

cmapBQ.config.get_default_config()

Read ~/.cmapBQ/config.txt and return its contents as a configuration object.

Returns

cmapBQ.config.Configuration object

cmapBQ.config.set_default_config(input_config_path)

Set the configuration in ~/.cmapBQ from the config file at input_config_path. Overwrites ~/.cmapBQ/config.txt.

Parameters

input_config_path – path to a valid YAML-formatted config file

Returns

Location of the config file in ~/.cmapBQ

cmapBQ.config.setup_credentials(path_to_credentials)

Setup script that points config.txt at a GOOGLE_APPLICATION_CREDENTIALS JSON key. If ~/.cmapBQ/config.txt does not exist, it is created with the default table addresses.

Parameters

path_to_credentials – path to the GOOGLE_APPLICATION_CREDENTIALS JSON key file

Returns

None (the config file is written as a side effect)
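The documented behavior of setup_credentials (replace the credentials entry, and create config.txt with default tables only when it is missing) can be sketched as below. Everything here is an assumption for illustration: `setup_credentials_sketch` is hypothetical, it takes the config directory as a parameter instead of using ~/.cmapBQ so it can run anywhere, the `DATASET.*` table addresses are placeholders rather than cmapBQ's real defaults, and it writes JSON (which is also valid YAML) to stay within the standard library.

```python
import json
import os
from typing import Dict

# Placeholder addresses only; cmapBQ's real default tables are not reproduced here.
HYPOTHETICAL_DEFAULT_TABLES: Dict[str, str] = {
    name: f"DATASET.{name}"
    for name in ("compoundinfo", "genetic_pertinfo", "geneinfo", "cellinfo",
                 "instinfo", "siginfo", "level3", "level4", "level5")
}


def setup_credentials_sketch(path_to_credentials: str, config_dir: str) -> None:
    """Point config.txt in config_dir at a JSON key (hypothetical sketch)."""
    os.makedirs(config_dir, exist_ok=True)
    config_path = os.path.join(config_dir, "config.txt")
    if os.path.exists(config_path):
        # Existing config: keep its table addresses, only replace credentials.
        # Assumes the file was written by this sketch (JSON content).
        with open(config_path) as fh:
            config = json.load(fh)
    else:
        # No config yet: start from the placeholder default tables.
        config = {"tables": dict(HYPOTHETICAL_DEFAULT_TABLES)}
    config["credentials"] = os.path.abspath(path_to_credentials)
    with open(config_path, "w") as fh:
        json.dump(config, fh, indent=2)
```

Calling it a second time with a new key leaves any customized table addresses untouched, which matches the "writes default tables only if config.txt does not exist" behavior described above.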

cmapBQ.query module

cmapBQ.query.cmap_cell(client, cell_iname=None, cell_alias=None, ccle_name=None, primary_disease=None, cell_lineage=None, cell_type=None, table=None, verbose=False)

Query the cellinfo table.

Parameters
  • client – BigQuery Client

  • cell_iname – List of cell_inames

  • cell_alias – List of cell aliases

  • ccle_name – List of ccle_names

  • primary_disease – List of primary_diseases

  • cell_lineage – List of cell_lineages

  • cell_type – List of cell_types

  • table – table to query. This by default points to the cellinfo table and normally should not be changed.

  • verbose – Print query and table address.

Returns

Pandas DataFrame

cmapBQ.query.cmap_compounds(client, pert_id=None, cmap_name=None, moa=None, target=None, compound_aliases=None, limit=None, verbose=False)

Query the compoundinfo table for various fields by providing lists of compounds, MoAs, targets, etc. Multiple conditions are combined with the ‘AND’ operator.

Parameters
  • client – BigQuery Client

  • pert_id – List of pert_ids

  • cmap_name – List of cmap_names

  • target – List of targets

  • moa – List of MoAs

  • compound_aliases – List of compound aliases

  • limit – Maximum number of rows to return

  • verbose – Print query and table address.

Returns

Pandas DataFrame of rows matching the query
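With several filters set, the ‘AND’ semantics amount to one `column IN (...)` condition per non-empty list. The sketch below builds such a BigQuery WHERE condition. `build_where` and `quote_bq_string` are hypothetical helpers, and the column names are assumed to match the parameter names, which may not hold for the real compoundinfo table.

```python
from typing import Mapping, Optional, Sequence


def quote_bq_string(value: str) -> str:
    """Quote a value as a BigQuery string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_where(filters: Mapping[str, Optional[Sequence[str]]]) -> str:
    """Combine list filters into one WHERE condition joined with AND.

    Filters left as None (or empty) are skipped, mirroring optional keyword
    arguments such as moa=None. Hypothetical helper, not cmapBQ code.
    """
    clauses = []
    for column, values in filters.items():
        if not values:
            continue  # unset filter: no constraint on this column
        literals = ", ".join(quote_bq_string(str(v)) for v in values)
        clauses.append(f"{column} IN ({literals})")
    return " AND ".join(clauses)
```

For example, `build_where({"moa": ["HDAC inhibitor"], "target": ["HDAC1", "HDAC2"], "pert_id": None})` yields `moa IN ('HDAC inhibitor') AND target IN ('HDAC1', 'HDAC2')`. Code that talks to BigQuery directly would more safely pass the lists as query parameters (for example `bigquery.ArrayQueryParameter` with `IN UNNEST(@values)`) instead of inlining literals.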

cmapBQ.query.cmap_genes(client, gene_id=None, gene_symbol=None, ensembl_id=None, gene_title=None, gene_type=None, feature_space='landmark', src=None, table=None, verbose=False)

Query the geneinfo table, which contains gene information including IDs, symbols, types, Ensembl IDs, etc.

Parameters
  • client – BigQuery Client

  • gene_id – list of gene_ids

  • gene_symbol – list of gene_symbols

  • ensembl_id – list of ensembl_ids

  • gene_title – list of gene_titles

  • gene_type – list of gene_types

  • feature_space

    Common feature spaces to extract.

    Choices: [‘landmark’, ‘bing’, ‘aig’]

    landmark: 978 landmark genes

    bing: Best-inferred set of 10,174 genes

    aig: All inferred genes, 12,328 genes in total

    Default is landmark.

  • src – list of gene sources

  • table – table to query. This by default points to the geneinfo table and normally should not be changed.

  • verbose – Print query and table address.

Returns

Pandas DataFrame

cmapBQ.query.cmap_genetic_perts(client, pert_id=None, cmap_name=None, gene_id=None, gene_title=None, ensemble_id=None, table=None, verbose=False)

Query the genetic_pertinfo table.

Parameters
  • client – BigQuery Client

  • pert_id – List of pert_ids

  • cmap_name – List of cmap_names

  • gene_id – List of gene_ids (INTEGER type)

  • gene_title – List of gene_titles

  • ensemble_id – List of Ensembl IDs

  • table – table to query. This by default points to the genetic_pertinfo table and normally should not be changed.

  • verbose – Print query and table address.

Returns

Pandas DataFrame

cmapBQ.query.cmap_matrix(client, data_level='level5', feature_space='landmark', rid=None, cid=None, verbose=False, chunk_size=1000, table=None, limit=4000)

Query numerical gene-level data for signatures (level 5) or samples (levels 3 and 4).

Parameters
  • client – BigQuery Client

  • data_level – Data level requested. IDs from the siginfo table correspond to ‘level5’; IDs from the instinfo table are available in ‘level3’ and ‘level4’. Choices are [‘level5’, ‘level4’, ‘level3’]. Default is ‘level5’.

  • rid – Row IDs (gene IDs)

  • cid – Column IDs (sig_ids for level 5, sample_ids for levels 3 and 4)

  • feature_space

    Common feature spaces to extract. If ‘rid’ is provided, it overrides this selection.

    Choices: [‘landmark’, ‘bing’, ‘aig’]

    landmark: 978 landmark genes

    bing: Best-inferred set of 10,174 genes

    aig: All inferred genes, 12,328 genes in total

    Default is landmark.

  • chunk_size – Size of each query stage. Queries are run in stages to avoid the query character limit. Default is 1,000.

  • limit – Soft limit on the number of signatures allowed. Default is 4,000.

  • table – Table address to query. Overrides ‘data_level’ parameter. Generally should not be used.

  • verbose – Print query and table address.

Returns

GCToo object
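The chunk_size parameter splits a long ID list into several smaller queries so no single query text exceeds the length limit. The sketch below shows that staging in isolation. It assumes a hypothetical `cid` column and one `IN` filter per stage; `chunk_ids` and `staged_queries` are illustrative helpers, and the real cmapBQ query text is not documented here.

```python
from typing import Iterator, List, Sequence


def chunk_ids(ids: Sequence[str], chunk_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of at most chunk_size IDs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(ids), chunk_size):
        yield list(ids[start:start + chunk_size])


def staged_queries(table: str, cids: Sequence[str], chunk_size: int = 1000) -> List[str]:
    """Build one query per chunk so no single query text grows too long.

    The `cid` column name is an assumption made for illustration.
    """
    queries = []
    for chunk in chunk_ids(cids, chunk_size):
        # Escape backslashes and single quotes before inlining each ID.
        literals = ", ".join(
            "'" + c.replace("\\", "\\\\").replace("'", "\\'") + "'" for c in chunk
        )
        queries.append(f"SELECT * FROM `{table}` WHERE cid IN ({literals})")
    return queries
```

With the default chunk_size of 1,000, a request for 2,500 IDs becomes three stages of 1,000, 1,000 and 500 IDs, whose results would then be concatenated.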

cmapBQ.query.cmap_profiles(client, sample_id=None, pert_id=None, pert_type=None, cmap_name=None, cell_iname=None, det_plate=None, build_name=None, return_fields='priority', limit=None, table=None, verbose=False)

Query per-sample metadata (the instinfo table), which corresponds to level 3 and level 4 data. Multiple conditions are combined with the ‘AND’ operator.

Parameters
  • client – BigQuery Client

  • sample_id – list of sample_ids

  • pert_id – list of pert_ids

  • pert_type – list of pert_types. Avoid using this parameter alone, as the result could be very large.

  • cmap_name – list of cmap_names

  • cell_iname – list of cell names

  • det_plate – list of det_plates

  • build_name – list of builds

  • return_fields – Fields to return. Choices: [‘priority’, ‘all’]. Default is ‘priority’.

  • limit – Maximum number of rows to return

  • table – table to query. This by default points to the instinfo table and normally should not be changed.

  • verbose – Print query and table address.

Returns

Pandas DataFrame

cmapBQ.query.cmap_sig(client, sig_id=None, pert_id=None, pert_type=None, cmap_name=None, cell_iname=None, det_plates=None, build_name=None, return_fields='priority', limit=None, table=None, verbose=False)

Query the level 5 metadata (siginfo) table. Multiple conditions are combined with the ‘AND’ operator.

Parameters
  • client – BigQuery Client

  • sig_id – list of sig_ids

  • pert_id – list of pert_ids

  • pert_type – list of pert_types. Avoid using this parameter alone, as the result could be very large.

  • cmap_name – list of cmap_names (formerly pert_iname)

  • cell_iname – list of cell names

  • det_plates – list of det_plates. Each det_plates value is the concatenation of the instinfo det_plate values, joined with the ‘|’ delimiter.

  • build_name – list of builds

  • return_fields – Fields to return. Choices: [‘priority’, ‘all’]. Default is ‘priority’.

  • limit – Maximum number of rows to return

  • table – table to query. This by default points to the level 5 siginfo table and normally should not be changed.

  • verbose – Print query and table address.

Returns

Pandas DataFrame
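Because a level 5 det_plates value packs several instinfo det_plate values together, splitting on ‘|’ recovers plate names that could then be passed to cmap_profiles as det_plate. A small sketch; the helper names and the plate names used in the example are hypothetical.

```python
from typing import Dict, Iterable, List


def split_det_plates(det_plates: str, delimiter: str = "|") -> List[str]:
    """Split one siginfo det_plates value into instinfo det_plate values."""
    return [plate for plate in det_plates.split(delimiter) if plate]


def unique_det_plates(values: Iterable[str], delimiter: str = "|") -> List[str]:
    """Collect distinct det_plate values across many signatures, in first-seen order."""
    seen: Dict[str, None] = {}
    for value in values:
        for plate in split_det_plates(value, delimiter):
            seen.setdefault(plate, None)  # dict keys keep insertion order
    return list(seen)
```

For instance, `split_det_plates("PLATE_A|PLATE_B")` returns `["PLATE_A", "PLATE_B"]`, and `unique_det_plates(["P1|P2", "P2|P3"])` returns `["P1", "P2", "P3"]`.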

cmapBQ.query.get_bq_client()

Return authenticated BigQuery client object.

Parameters

config – optional path to a config file; if omitted, the default config is used

Returns

BigQuery Client

cmapBQ.query.get_table_info(client, table_id)

Query the schema of a table address that the client has permission to access.

Parameters
  • client – BigQuery Client

  • table_id – table address as {dataset}.{table_id}

Returns

Pandas DataFrame of column names. Note: not all columns are queryable, but all are returned from a given metadata table.

cmapBQ.query.list_cmap_compounds(client)

List available compounds

Parameters

client – BigQuery Client

Returns

Single-column DataFrame of compounds

cmapBQ.query.list_cmap_moas(client)

List available MoAs

Parameters

client – BigQuery Client

Returns

Single-column DataFrame of MoAs

cmapBQ.query.list_cmap_targets(client)

List available targets

Parameters

client – BigQuery Client

Returns

Pandas DataFrame

cmapBQ.query.list_tables()

Print the table addresses, which come from the defaults in the config.

Returns

None

cmapBQ.query.run_query(client, query)

Run a BigQuery query job.

Parameters
  • client – BigQuery client object

  • query – Query to run as a string

Returns

QueryJob object

cmapBQ.utils module

cmapBQ.utils.csv_to_gctx(filepaths, outpath, use_gctx=True)

Convert a list of CSV files to GCTX. CSVs must have ‘rid’, ‘cid’ and ‘value’ columns; no other columns or metadata are preserved.

Parameters
  • filepaths – List of paths to CSVs

  • outpath – output directory for the file

  • use_gctx – Write the GCTX (HDF5) format. Default is True.

Returns

cmapBQ.utils.long_to_gctx(df)

Convert a long-form table to a GCToo object. The DataFrame must have ‘rid’, ‘cid’ and ‘value’ columns; no other columns or metadata are preserved.

Parameters

df – Long form pandas DataFrame

Returns

GCToo object
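Both long_to_gctx and csv_to_gctx pivot long-form (rid, cid, value) records into a rid-by-cid matrix. The standard-library sketch below shows that pivot. `read_long_csv` and `long_to_wide` are hypothetical helpers, and the result is plain lists rather than a GCToo object.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Tuple

REQUIRED = ("rid", "cid", "value")


def read_long_csv(text: str) -> List[Tuple[str, str, float]]:
    """Parse long-form CSV text, keeping only the rid, cid and value columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    # Any other columns are dropped, as in the documented functions.
    return [(row["rid"], row["cid"], float(row["value"])) for row in reader]


def long_to_wide(
    rows: Iterable[Tuple[str, str, float]],
) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Pivot (rid, cid, value) triples into a row-by-column matrix.

    Returns (rids, cids, matrix); cells with no value are None.
    """
    cells: Dict[Tuple[str, str], float] = {}
    rids: Dict[str, None] = {}
    cids: Dict[str, None] = {}
    for rid, cid, value in rows:
        rids.setdefault(rid, None)  # dict keys keep first-seen order
        cids.setdefault(cid, None)
        cells[(rid, cid)] = value
    matrix = [[cells.get((r, c)) for c in cids] for r in rids]
    return list(rids), list(cids), matrix
```

With pandas, the same reshape is roughly `df.pivot(index="rid", columns="cid", values="value")`; for several CSV files, the records from each file would be concatenated before pivoting.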