Search UniProt via its REST API. By default, the supplied query is searched
against UniProtKB and a data.frame of matching proteins is returned.
It is a wrapper for this UniProt API endpoint.
Usage
uniprot_search(
  query,
  database = "uniprotkb",
  format = "tsv",
  path = NULL,
  fields = NULL,
  isoform = NULL,
  method = "paged",
  page_size = 500,
  compressed = NULL,
  verbosity = NULL,
  dry_run = FALSE
)
Arguments
- query
  string, the search query. See this page for help constructing search queries.
- database
  string, database to look up. Default is "uniprotkb". See the Databases section below for all available databases.
- format
  string, data format to fetch. Default is "tsv". Can be one of "tsv" or "fasta".
- path
  string (optional), file path to save the results, e.g. "path/to/results.tsv".
- fields
  character (optional), fields (i.e. columns) of data to get. The fields available depend on the database used; see return_fields for all available fields.
- isoform
  logical (optional), should protein isoforms be included in the results? Not necessarily relevant for all formats and databases.
- method
  string, download method to use. Either "paged" (default) or "stream". Paged is more robust to connection issues and uses less memory. Stream may be faster, but uses more memory and is more sensitive to connection issues.
- page_size
  integer (optional), how many entries per page to request? Only relevant if method = "paged". It's best to leave this at 500.
- compressed
  logical (optional), should gzipped data be requested? Only relevant if method = "stream" and path is specified.
- verbosity
  integer (optional), how much information to print?
    0: no output
    NULL (default): minimal output
    1: show request headers
    2: show request headers and bodies
    3: show request headers, bodies, and curl status messages
- dry_run
  logical, perform the request with httr2::req_dry_run()? Requires the httpuv package to be installed. Default is FALSE.
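The query argument is a single string of "(field:value)" clauses joined by boolean operators, as in the query used in the Examples section. When the clauses are assembled programmatically, a small helper can keep the parentheses and operators consistent. The sketch below assumes a hypothetical helper, build_query(), which is not part of the package; only the clause format is taken from this page.

```r
# Hypothetical helper (not part of the package): join named field/value
# pairs into one query string made of "(field:value)" clauses.
build_query <- function(..., operator = "AND") {
  terms <- c(...)
  # Every term must be a named character value, e.g. reviewed = "true"
  stopifnot(
    is.character(terms),
    !is.null(names(terms)),
    all(nzchar(names(terms)))
  )
  clauses <- sprintf("(%s:%s)", names(terms), terms)
  paste(clauses, collapse = paste0(" ", operator, " "))
}

query <- build_query(
  proteome = "UP000005640",
  keyword  = "KW-0325",
  reviewed = "true"
)
query

if (FALSE) {
  # Requires network access and the package that provides uniprot_search()
  res <- uniprot_search(query = query, fields = c("accession", "gene_primary"))
}
```

The helper only formats strings; it does not check that the field names are valid for the chosen database.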
Value
By default, returns an object whose type depends on format:
- tsv: data.frame
- fasta: Biostrings::AAStringSet (or a named character vector if Biostrings is not installed)
If path is specified, saves the results to the file path indicated and returns NULL invisibly.
If dry_run = TRUE, returns a list containing information about the request, including the request method, path, and headers.
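Because the return type varies with format, path, and dry_run, calling code may want to branch on what it received. The following is a minimal sketch of such a check; describe_result() is a hypothetical helper written for illustration, not part of the package, and the objects passed to it are offline stand-ins rather than real search results.

```r
# Hypothetical helper (not part of the package): report which of the
# documented return types a call to uniprot_search() produced.
describe_result <- function(res) {
  if (is.null(res)) {
    "NULL: results were written to `path`"
  } else if (is.data.frame(res)) {
    # Checked before is.list() because a data.frame is also a list
    sprintf("data.frame with %d rows", nrow(res))
  } else if (is.list(res)) {
    "list: request details from dry_run = TRUE"
  } else {
    # AAStringSet, or a named character vector without Biostrings
    sprintf("%d sequences", length(res))
  }
}

# Offline illustration with stand-in objects (not real UniProt data)
describe_result(data.frame(accession = c("acc1", "acc2")))
describe_result(c(seq1 = "MKV", seq2 = "MST"))
```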
Databases
The following databases are available to query:
- uniprotkb: UniProt Knowledge Base
- uniref: UniProt Reference Clusters
- uniparc: UniProt Archive
- proteomes: Reference proteomes
- taxonomy: Taxonomy
- keywords: Keywords
- citations: Literature references
- diseases: Disease queries
- database: Cross references
- locations: Subcellular location
- unirule: UniRule
- arba: ARBA (Association-Rule-Based Annotator)
See also
Other API wrapper functions: uniprot_map(), uniprot_single()
Examples
if (FALSE) {
  # Search for all human glycoproteins from SwissProt
  res <- uniprot_search(
    query = "(proteome:UP000005640) AND (keyword:KW-0325) AND (reviewed:true)",
    database = "uniprotkb",
    format = "tsv",
    fields = c("accession", "gene_primary", "feature_count")
  )
  # Look at the resulting data frame
  head(res)
}