BridgeDb tutorial: Gene HGNC name to Ensembl identifier
This tutorial explains how to use the BridgeDb identifier mapping service to translate HGNC names to Ensembl identifiers. This step is part of the OpenRiskNet use case to link Adverse Outcome Pathways to WikiPathways.
First we need to load the Python library to allow calls to the BridgeDb REST webservice:
import requests
Let's assume we're interested in the gene with HGNC MECP2 (FIXME: look up a gene in AOPWiki), the API call to make mappings is given below as callUrl
. Here, the H
indicates that the query (MECP2
) is an HGNC symbol:
callUrl = 'http://bridgedb.prod.openrisknet.org/Human/xrefs/H/MECP2'
The default call returns all identifiers, not just for Ensembl:
response = requests.get(callUrl)
response.text
'GO:0001964\tGeneOntology\nuc065cav.1\tUCSC Genome Browser\n312750\tOMIM\nGO:0042551\tGeneOntology\nuc065car.1\tUCSC Genome Browser\nA0A087X1U4\tUniprot-TrEMBL\n4204\tWikiGenes\nGO:0043524\tGeneOntology\nILMN_1702715\tIllumina\n34355_at\tAffy\nGO:0007268\tGeneOntology\nMECP2\tHGNC\nuc065caz.1\tUCSC Genome Browser\nA_33_P3339036\tAgilent\nGO:0006576\tGeneOntology\nuc065cbg.1\tUCSC Genome Browser\nGO:0006342\tGeneOntology\n300496\tOMIM\nGO:0035176\tGeneOntology\nuc065cbc.1\tUCSC Genome Browser\nGO:0033555\tGeneOntology\nGO:0045892\tGeneOntology\nA_23_P114361\tAgilent\nGO:0045893\tGeneOntology\nENSG00000169057\tEnsembl\nGO:0090063\tGeneOntology\nGO:0005515\tGeneOntology\nGO:0002087\tGeneOntology\nGO:0005634\tGeneOntology\nGO:0007416\tGeneOntology\nGO:0008104\tGeneOntology\nGO:0042826\tGeneOntology\nGO:0007420\tGeneOntology\nGO:0035067\tGeneOntology\n300005\tOMIM\nNP_001104262\tRefSeq\nA0A087WVW7\tUniprot-TrEMBL\nNP_004983\tRefSeq\nGO:0046470\tGeneOntology\nGO:0010385\tGeneOntology\n11722682_at\tAffy\nGO:0051965\tGeneOntology\nNM_001316337\tRefSeq\nuc065caw.1\tUCSC Genome Browser\nA0A0D9SFX7\tUniprot-TrEMBL\nA0A140VKC4\tUniprot-TrEMBL\nGO:0003723\tGeneOntology\nGO:0019233\tGeneOntology\nGO:0001666\tGeneOntology\nGO:0003729\tGeneOntology\nGO:0021591\tGeneOntology\nuc065cas.1\tUCSC Genome Browser\nGO:0019230\tGeneOntology\nGO:0003682\tGeneOntology\nGO:0001662\tGeneOntology\nuc065cbh.1\tUCSC Genome Browser\nX99687_at\tAffy\nGO:0008344\tGeneOntology\nGO:0009791\tGeneOntology\nuc065cbd.1\tUCSC Genome Browser\nGO:0019904\tGeneOntology\nGO:0030182\tGeneOntology\nGO:0035197\tGeneOntology\n8175998\tAffy\nGO:0016358\tGeneOntology\nNM_004992\tRefSeq\nGO:0003714\tGeneOntology\nGO:0005739\tGeneOntology\nGO:0005615\tGeneOntology\nGO:0005737\tGeneOntology\nuc004fjv.3\tUCSC Genome Browser\n202617_s_at\tAffy\nGO:0050905\tGeneOntology\nGO:0008327\tGeneOntology\nD3YJ43\tUniprot-TrEMBL\nGO:0003677\tGeneOntology\nGO:0006541\tGeneOntology\nGO:0040029\tGeneOntology\nA_33_P3317211\tAgilent\nNP_001303266\tRefSeq\n11722683_a_at\tAffy\nGO:0008211\tGeneOntology\nGO:0051151\tGeneOntology\nNM_001110792\tRefSeq\nX89430_at\tAffy\nGO:2000820\tGeneOntology\nuc065cat.1\tUCSC Genome Browser\nGO:0003700\tGeneOntology\nGO:0047485\tGeneOntology\n4204\tEntrez Gene\nGO:0009405\tGeneOntology\nA0A0D9SEX1\tUniprot-TrEMBL\nGO:0098794\tGeneOntology\n3C2I\tPDB\nHs.200716\tUniGene\nGO:0000792\tGeneOntology\nuc065cax.1\tUCSC Genome Browser\n300055\tOMIM\n5BT2\tPDB\nGO:0006020\tGeneOntology\nGO:0031175\tGeneOntology\nuc065cbe.1\tUCSC Genome Browser\nGO:0008284\tGeneOntology\nuc065cba.1\tUCSC Genome Browser\nGO:0060291\tGeneOntology\n202618_s_at\tAffy\nGO:0016573\tGeneOntology\n17115453\tAffy\nA0A1B0GTV0\tUniprot-TrEMBL\nuc065cbi.1\tUCSC Genome Browser\nGO:0048167\tGeneOntology\nGO:0007616\tGeneOntology\nGO:0016571\tGeneOntology\nuc004fjw.3\tUCSC Genome Browser\nGO:0007613\tGeneOntology\nGO:0007612\tGeneOntology\nGO:0021549\tGeneOntology\n11722684_a_at\tAffy\nGO:0001078\tGeneOntology\nX94628_rna1_s_at\tAffy\nGO:0007585\tGeneOntology\nGO:0010468\tGeneOntology\nGO:0031061\tGeneOntology\nA_24_P237486\tAgilent\nGO:0050884\tGeneOntology\nGO:0000930\tGeneOntology\nGO:0005829\tGeneOntology\nuc065cau.1\tUCSC Genome Browser\nH7BY72\tUniprot-TrEMBL\n202616_s_at\tAffy\nGO:0006355\tGeneOntology\nuc065cay.1\tUCSC Genome Browser\nGO:0010971\tGeneOntology\n300673\tOMIM\nGO:0008542\tGeneOntology\nGO:0060079\tGeneOntology\nuc065cbf.1\tUCSC Genome Browser\nGO:0006122\tGeneOntology\nuc065cbb.1\tUCSC Genome Browser\nGO:0007052\tGeneOntology\nC9JH89\tUniprot-TrEMBL\nB5MCB4\tUniprot-TrEMBL\nGO:0032048\tGeneOntology\nGO:0050432\tGeneOntology\nGO:0001976\tGeneOntology\nI6LM39\tUniprot-TrEMBL\nGO:0005813\tGeneOntology\nILMN_1682091\tIllumina\nP51608\tUniprot-TrEMBL\n1QK9\tPDB\nGO:0006349\tGeneOntology\nGO:1900114\tGeneOntology\nGO:0000122\tGeneOntology\nGO:0006351\tGeneOntology\nGO:0008134\tGeneOntology\nILMN_1824898\tIllumina\n300260\tOMIM\n0006510725\tIllumina\n'
You can also see the results are returned as a TSV file, consisting of two columns, the identifier and the matching database.
We will want to convert this reply into a Python dictionary (with the identifier as key, as one database may have multiple identifiers):
lines = response.text.split("\n")
mappings = {}
for line in lines:
if ('\t' in line):
tuple = line.split('\t')
identifier = tuple[0]
database = tuple[1]
if (database == "Ensembl"):
mappings[identifier] = database
print(mappings)
{'ENSG00000169057': 'Ensembl'}
Alternatively, we can restrivct the return values from the BridgeDb webservice to just return Ensembl identifiers (system code En
). For this, we add the ?dataSource=En
call parameter:
callUrl = 'http://bridgedb-swagger.prod.openrisknet.org/Human/xrefs/H/MECP2?dataSource=En'
response = requests.get(callUrl)
response.text
'ENSG00000169057\tEnsembl\n'
Version History
master @ 0f98fd8 (latest) Created 6th Apr 2022 at 14:12 by Marvin Martens
Update genes.ipynb
Frozen
master
0f98fd8
master @ 5f34ac1 Created 6th Apr 2022 at 14:09 by Marvin Martens
added citation info
Frozen
master
5f34ac1
master @ 5f34ac1 (earliest) Created 6th Apr 2022 at 14:02 by Marvin Martens
added citation info
Frozen
master
5f34ac1
Creator
Additional credit
Egon Willighagen
Submitter
Views: 3345 Downloads: 635
Created: 6th Apr 2022 at 14:02
Last updated: 6th Apr 2022 at 14:09
None