The BioThings APIs provide query and retrieve annoation of genes, variants, chemicals and taxons.
This package has S4 methods for accessing each API, as well as an R6
class that is extensible so that as other APIs are made, a configuration
can be passed and used to query and retrieve annotation from said APIs.
The S4 methods are also extensible by providing a configuration object
like the included biothings_clients configuration, and then
using the btGet, btQuery,
btMetadata and btFields methods.
Each API (gene, taxon, variant and chemical - when this was written)
has a set of methods for getting annotations. Additionally, there are
the generic btGet method.
btGetThese methods can be used with a defined BioThingsClient
class object or with just the name of a client from the
biothings_clients object.
With a defined BioThings object (see next example for
output):
> gene_client <- BioThingsClient("gene")
> # Suppress messages
> slot(gene_client, "verbose") <- FALSE
> btGet(gene_client, "1017")
> btGet(gene_client, c("1017","1018","ENSG00000148795"))
Without (output trimmed):
> btGet(gene_client, "1017") # btGet("gene", "1017") is equivalent
[[1]]
[[1]]$`_id`
[1] "1017"
[[1]]$`_score`
[1] 22.32558
...(trimmed for space)
> btGet(gene_client, c("1017","1018","ENSG00000148795"))
[[1]]
[[1]]$`_id`
[1] "1017"
[[1]]$`_score`
[1] 22.32558
...(trimmed for space)
[[2]]
[[2]]$`_id`
[1] "1018"
...(trimmed for space)
Users can also specify fields and the type of return
object. However, due to the size of some API responses, the default is a
list (denoted by "records") and we recommend only
requesting "data.frame" when also specifing fields.
Requesting "data.frame" will generally return a data frame,
though it might be heavily nested and hard to use. So again, specify
fields for the best results when requesting a data frame.
> btGet(gene_client, c("1017","1018","ENSG00000148795"),
fields = c("symbol","name","taxid","entrezgene"),
return.as = "data.frame")
_id _score entrezgene name symbol
1 1017 22.32558 1017 cyclin dependent kinase 2 CDK2
2 1018 22.32596 1018 cyclin dependent kinase 3 CDK3
3 1586 23.35202 1586 cytochrome P450 family 17 subfamily A member 1 CYP17A1
taxid query
1 9606 1017
2 9606 1018
3 9606 ENSG00000148795
The APIs also have query endpoints, so a user can provide query terms
and receive annotations in that form. The btQuery method is
generic and rely on input of a key for the
biothings_clients or a BioThings object
configuration for specification of which API to use. Just like the
generic annotation methods, these methods can be used with an updated or
customized configuration.
btQuerybtQuery:
> btQuery(gene_client, "sp2")
[[1]]
[[1]]$max_score
[1] 457.1937
[[1]]$took
[1] 5
[[1]]$total
[1] 10
[[1]]$hits
[[1]]$hits[[1]]
[[1]]$hits[[1]]$`_id`
[1] "6668"
...(output trimmed)
btQuery for multiple ids:
> genes <- c('1053_at', '117_at', '121_at', '1255_g_at', '1294_at')
> btQuery(gene_client, genes, scopes=c("symbol", "reporter", "accession"),
fields = c("entrezgene", "uniprot"), species = "human",
returnall = TRUE)
Finished
$response
$response[[1]]
$response[[1]]$`_id`
[1] "5982"
$response[[1]]$`_score`
[1] 22.32529
$response[[1]]$entrezgene
[1] 5982
$response[[1]]$uniprot
$response[[1]]$uniprot$`Swiss-Prot`
[1] "P35250"
...(output trimmed)
fetch_all parameterSome queries return thousands of results. Use fetch_all
to get all of them.
> faquery <- btQuery(gene_client, '_exists_:pdb', fetch_all = TRUE,
fields = 'pdb')
Getting additional records. Took: 536
Getting additional records. Took: 484
Getting additional records. Took: 301
Getting additional records. Took: 300
Getting additional records. Took: 409
Getting additional records. Took: 538
Getting additional records. Took: 225
Getting additional records. Took: 147
Getting additional records. Took: 46
> str(faquery)
List of 8176
$ :List of 3
..$ _id : chr "23211"
..$ _score: num 1.55
..$ pdb : chr "2CQE"
$ :List of 3
..$ _id : chr "5888"
..$ _score: num 1.55
..$ pdb : chr [1:6] "1B22" "1N0W" "5H1B" "5H1C" ...
...(output trimmed)
Users can also retrieve information about metadata for an API with
btMetadata:
> btMetadata("gene") # Equivalent to btMetadata(gene_client)
[[1]]
[[1]]$source
NULL
[[1]]$stats
[[1]]$stats$entrez_retired
[1] 201591
[[1]]$stats$ensembl_prosite
[1] 725969
[[1]]$stats$ensembl_pfam
[1] 1174908
[[1]]$stats$entrez_gene
[1] 19752422
...(output trimmed)
As well as metadata fields with btFields:
> btFields("gene") # Beware, produces a lot of output