Workflow Type: Jupyter
Work-in-progress

BridgeDb tutorial: Gene HGNC name to Ensembl identifier

This tutorial explains how to use the BridgeDb identifier mapping service to translate HGNC names to Ensembl identifiers. This step is part of the OpenRiskNet use case to link Adverse Outcome Pathways to WikiPathways.

First, we import the requests library, which we use to call the BridgeDb REST webservice:

import requests

Let's assume we're interested in the gene with the HGNC symbol MECP2 (FIXME: look up a gene in AOPWiki). The API call that returns the mappings is given below as callUrl. Here, the H is the system code indicating that the query (MECP2) is an HGNC symbol:

callUrl = 'http://bridgedb.prod.openrisknet.org/Human/xrefs/H/MECP2'
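The URL above follows the pattern base/organism/xrefs/systemCode/identifier. The sketch below builds such URLs for other genes. The helper name xrefs_url is hypothetical (not part of BridgeDb), and percent-encoding the identifier assumes the server decodes encoded path segments, as standard HTTP servers do.

```python
from urllib.parse import quote

# Host used in this tutorial; other BridgeDb deployments use a different base URL.
BRIDGEDB_BASE = "http://bridgedb.prod.openrisknet.org"

def xrefs_url(organism, system_code, identifier, base=BRIDGEDB_BASE):
    """Hypothetical helper: build {base}/{organism}/xrefs/{system_code}/{identifier}."""
    # Encode every reserved character (including '/') so the identifier stays one path segment.
    return f"{base}/{organism}/xrefs/{system_code}/{quote(identifier, safe='')}"

print(xrefs_url("Human", "H", "MECP2"))
```

For MECP2 this reproduces the callUrl shown above, so the same code can be reused by changing only the symbol.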

By default, the call returns identifiers for all databases, not just Ensembl:

response = requests.get(callUrl)
response.text
'GO:0001964\tGeneOntology\nuc065cav.1\tUCSC Genome Browser\n312750\tOMIM\nGO:0042551\tGeneOntology\nuc065car.1\tUCSC Genome Browser\nA0A087X1U4\tUniprot-TrEMBL\n4204\tWikiGenes\nGO:0043524\tGeneOntology\nILMN_1702715\tIllumina\n34355_at\tAffy\nGO:0007268\tGeneOntology\nMECP2\tHGNC\nuc065caz.1\tUCSC Genome Browser\nA_33_P3339036\tAgilent\nGO:0006576\tGeneOntology\nuc065cbg.1\tUCSC Genome Browser\nGO:0006342\tGeneOntology\n300496\tOMIM\nGO:0035176\tGeneOntology\nuc065cbc.1\tUCSC Genome Browser\nGO:0033555\tGeneOntology\nGO:0045892\tGeneOntology\nA_23_P114361\tAgilent\nGO:0045893\tGeneOntology\nENSG00000169057\tEnsembl\nGO:0090063\tGeneOntology\nGO:0005515\tGeneOntology\nGO:0002087\tGeneOntology\nGO:0005634\tGeneOntology\nGO:0007416\tGeneOntology\nGO:0008104\tGeneOntology\nGO:0042826\tGeneOntology\nGO:0007420\tGeneOntology\nGO:0035067\tGeneOntology\n300005\tOMIM\nNP_001104262\tRefSeq\nA0A087WVW7\tUniprot-TrEMBL\nNP_004983\tRefSeq\nGO:0046470\tGeneOntology\nGO:0010385\tGeneOntology\n11722682_at\tAffy\nGO:0051965\tGeneOntology\nNM_001316337\tRefSeq\nuc065caw.1\tUCSC Genome Browser\nA0A0D9SFX7\tUniprot-TrEMBL\nA0A140VKC4\tUniprot-TrEMBL\nGO:0003723\tGeneOntology\nGO:0019233\tGeneOntology\nGO:0001666\tGeneOntology\nGO:0003729\tGeneOntology\nGO:0021591\tGeneOntology\nuc065cas.1\tUCSC Genome Browser\nGO:0019230\tGeneOntology\nGO:0003682\tGeneOntology\nGO:0001662\tGeneOntology\nuc065cbh.1\tUCSC Genome Browser\nX99687_at\tAffy\nGO:0008344\tGeneOntology\nGO:0009791\tGeneOntology\nuc065cbd.1\tUCSC Genome Browser\nGO:0019904\tGeneOntology\nGO:0030182\tGeneOntology\nGO:0035197\tGeneOntology\n8175998\tAffy\nGO:0016358\tGeneOntology\nNM_004992\tRefSeq\nGO:0003714\tGeneOntology\nGO:0005739\tGeneOntology\nGO:0005615\tGeneOntology\nGO:0005737\tGeneOntology\nuc004fjv.3\tUCSC Genome 
Browser\n202617_s_at\tAffy\nGO:0050905\tGeneOntology\nGO:0008327\tGeneOntology\nD3YJ43\tUniprot-TrEMBL\nGO:0003677\tGeneOntology\nGO:0006541\tGeneOntology\nGO:0040029\tGeneOntology\nA_33_P3317211\tAgilent\nNP_001303266\tRefSeq\n11722683_a_at\tAffy\nGO:0008211\tGeneOntology\nGO:0051151\tGeneOntology\nNM_001110792\tRefSeq\nX89430_at\tAffy\nGO:2000820\tGeneOntology\nuc065cat.1\tUCSC Genome Browser\nGO:0003700\tGeneOntology\nGO:0047485\tGeneOntology\n4204\tEntrez Gene\nGO:0009405\tGeneOntology\nA0A0D9SEX1\tUniprot-TrEMBL\nGO:0098794\tGeneOntology\n3C2I\tPDB\nHs.200716\tUniGene\nGO:0000792\tGeneOntology\nuc065cax.1\tUCSC Genome Browser\n300055\tOMIM\n5BT2\tPDB\nGO:0006020\tGeneOntology\nGO:0031175\tGeneOntology\nuc065cbe.1\tUCSC Genome Browser\nGO:0008284\tGeneOntology\nuc065cba.1\tUCSC Genome Browser\nGO:0060291\tGeneOntology\n202618_s_at\tAffy\nGO:0016573\tGeneOntology\n17115453\tAffy\nA0A1B0GTV0\tUniprot-TrEMBL\nuc065cbi.1\tUCSC Genome Browser\nGO:0048167\tGeneOntology\nGO:0007616\tGeneOntology\nGO:0016571\tGeneOntology\nuc004fjw.3\tUCSC Genome Browser\nGO:0007613\tGeneOntology\nGO:0007612\tGeneOntology\nGO:0021549\tGeneOntology\n11722684_a_at\tAffy\nGO:0001078\tGeneOntology\nX94628_rna1_s_at\tAffy\nGO:0007585\tGeneOntology\nGO:0010468\tGeneOntology\nGO:0031061\tGeneOntology\nA_24_P237486\tAgilent\nGO:0050884\tGeneOntology\nGO:0000930\tGeneOntology\nGO:0005829\tGeneOntology\nuc065cau.1\tUCSC Genome Browser\nH7BY72\tUniprot-TrEMBL\n202616_s_at\tAffy\nGO:0006355\tGeneOntology\nuc065cay.1\tUCSC Genome Browser\nGO:0010971\tGeneOntology\n300673\tOMIM\nGO:0008542\tGeneOntology\nGO:0060079\tGeneOntology\nuc065cbf.1\tUCSC Genome Browser\nGO:0006122\tGeneOntology\nuc065cbb.1\tUCSC Genome 
Browser\nGO:0007052\tGeneOntology\nC9JH89\tUniprot-TrEMBL\nB5MCB4\tUniprot-TrEMBL\nGO:0032048\tGeneOntology\nGO:0050432\tGeneOntology\nGO:0001976\tGeneOntology\nI6LM39\tUniprot-TrEMBL\nGO:0005813\tGeneOntology\nILMN_1682091\tIllumina\nP51608\tUniprot-TrEMBL\n1QK9\tPDB\nGO:0006349\tGeneOntology\nGO:1900114\tGeneOntology\nGO:0000122\tGeneOntology\nGO:0006351\tGeneOntology\nGO:0008134\tGeneOntology\nILMN_1824898\tIllumina\n300260\tOMIM\n0006510725\tIllumina\n'

As you can see, the results are returned as tab-separated values (TSV) with two columns: the identifier and the matching database.
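Because the reply is plain TSV, Python's standard csv module can read it directly. This is a minimal sketch; the function name parse_xrefs is hypothetical, and the sample text is a short excerpt of the MECP2 reply above.

```python
import csv
import io

def parse_xrefs(tsv_text):
    """Hypothetical helper: return (identifier, database) tuples from a BridgeDb TSV reply."""
    # QUOTE_NONE treats every character literally; the reply does not use CSV quoting.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip blank rows so stray empty lines do not cause an IndexError.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

sample = "GO:0001964\tGeneOntology\nENSG00000169057\tEnsembl\n4204\tEntrez Gene\n"
print(parse_xrefs(sample))
```

Database names containing spaces, such as "Entrez Gene" or "UCSC Genome Browser", stay intact because only the tab separates the columns.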

We want to convert this reply into a Python dictionary that keeps only the Ensembl identifiers. We use the identifier as the key, because one database may have multiple identifiers:

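lines = response.text.split("\n")
mappings = {}
for line in lines:
    # Each non-empty line has the form "identifier<TAB>database"
    if '\t' in line:
        fields = line.split('\t')  # avoid shadowing the built-in name tuple
        identifier = fields[0]
        database = fields[1]
        # Keep only the Ensembl identifiers
        if database == "Ensembl":
            mappings[identifier] = database

print(mappings)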
{'ENSG00000169057': 'Ensembl'}
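Keying by identifier works because a single database can contribute several identifiers; the MECP2 reply, for example, lists six OMIM entries. If you want every database rather than only Ensembl, a possible alternative is to group the identifiers per database. The sketch below is illustrative; group_by_database is a hypothetical name and the sample is an excerpt of the reply above.

```python
from collections import defaultdict

def group_by_database(tsv_text):
    """Hypothetical helper: map each database name to the list of its identifiers."""
    grouped = defaultdict(list)
    for line in tsv_text.splitlines():
        if "\t" not in line:
            continue  # skip empty lines
        identifier, database = line.split("\t", 1)
        grouped[database].append(identifier)
    return dict(grouped)

sample = "312750\tOMIM\nENSG00000169057\tEnsembl\n300496\tOMIM\n"
grouped = group_by_database(sample)
print(grouped["OMIM"])     # ['312750', '300496']
print(grouped["Ensembl"])  # ['ENSG00000169057']
```

With this layout, grouped.get("Ensembl", []) gives the same Ensembl identifiers as the loop above.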

Alternatively, we can ask the BridgeDb webservice to return only Ensembl identifiers (system code En). For this, we add the ?dataSource=En call parameter:

callUrl = 'http://bridgedb-swagger.prod.openrisknet.org/Human/xrefs/H/MECP2?dataSource=En'
response = requests.get(callUrl)
response.text
'ENSG00000169057\tEnsembl\n'
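Instead of writing the query string by hand, it can be built with the standard library, and the filtered reply reduced to a plain list of identifiers. This is a sketch; with_data_source and identifiers are hypothetical helper names, and with_data_source assumes the URL has no existing query string.

```python
from urllib.parse import urlencode

def with_data_source(url, system_code):
    """Hypothetical helper: append ?dataSource=<system_code> to a URL without a query string."""
    return f"{url}?{urlencode({'dataSource': system_code})}"

def identifiers(tsv_text):
    """Hypothetical helper: return only the identifier column of a BridgeDb TSV reply."""
    return [line.split("\t", 1)[0] for line in tsv_text.splitlines() if "\t" in line]

url = with_data_source("http://bridgedb-swagger.prod.openrisknet.org/Human/xrefs/H/MECP2", "En")
print(url)
print(identifiers('ENSG00000169057\tEnsembl\n'))  # ['ENSG00000169057']
```

With requests, passing params={"dataSource": "En"} to requests.get produces the same query string, which avoids manual string concatenation.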

Version History

master @ 0f98fd8 (latest) Created 6th Apr 2022 at 14:12 by Marvin Martens

Update genes.ipynb



master @ 5f34ac1 Created 6th Apr 2022 at 14:09 by Marvin Martens

added citation info



master @ 5f34ac1 (earliest) Created 6th Apr 2022 at 14:02 by Marvin Martens

added citation info


Creators and Submitter
Creator
Additional credit

Egon Willighagen

Submitter
Citation
Martens, M. (2022). BridgeDb tutorial: Gene HGNC name to Ensembl identifier. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.326.3
Activity

Views: 3147   Downloads: 563

Created: 6th Apr 2022 at 14:02

Last updated: 6th Apr 2022 at 14:09

Attributions

None
