Search UniProt via its REST API. By default, it searches the supplied
query against UniProtKB and returns a data.frame of matching proteins.
It is a wrapper for this UniProt API endpoint.
Usage
uniprot_search(
query,
database = "uniprotkb",
format = "tsv",
path = NULL,
fields = NULL,
isoform = NULL,
method = "paged",
page_size = 500,
compressed = NULL,
verbosity = NULL,
dry_run = FALSE
)
Arguments
- query
string, the search query. See this page for help constructing search queries.
- database
string, database to look up. Default is "uniprotkb". See the Databases section below for all available databases.
- format
string, data format to fetch. Default is "tsv". Can be one of "tsv" or "fasta".
- path
string (optional), file path to save the results, e.g. "path/to/results.tsv".
- fields
character (optional), fields (i.e. columns) of data to get. The fields available depend on the database used; see return_fields for all available fields.
- isoform
logical (optional), should protein isoforms be included in the results? Not necessarily relevant for all formats and databases.
- method
string, download method to use. Either "paged" (default) or "stream". Paged is more robust to connection issues and takes less memory. Stream may be faster, but uses more memory and is more sensitive to connection issues.
- page_size
integer (optional), how many entries per page to request? Only relevant if method = "paged". It's best to leave this at 500.
- compressed
logical (optional), should gzipped data be requested? Only relevant if method = "stream" and path is specified.
- verbosity
integer (optional), how much information to print?
0: no output
NULL (default): minimal output
1: show request headers
2: show request headers and bodies
3: show request headers, bodies, and curl status messages
- dry_run
logical, perform request with httr2::req_dry_run()? Requires the httpuv package to be installed. Default is FALSE.
Value
By default, returns an object whose type depends on format:
tsv: data.frame
fasta: Biostrings::AAStringSet (or named character if Biostrings is not installed)
If path is specified, saves the results to the file path indicated,
and returns NULL invisibly. If dry_run = TRUE, returns a
list containing information about the request, including the request
method, path, and headers.
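Because format = "fasta" can return either a Biostrings::AAStringSet or a named character vector, downstream code may want to treat both the same way. The following is a minimal sketch, not part of this package: as_named_sequences() is a hypothetical helper, and the toy sequence is made up for illustration.

```r
# Hypothetical helper (not part of the package): coerce a FASTA result to a
# named character vector, whichever of the two documented types was returned.
as_named_sequences <- function(x) {
  if (is.character(x)) {
    # Already the fallback type returned when Biostrings is not installed
    return(x)
  }
  if (inherits(x, "AAStringSet")) {
    # as.character() on an AAStringSet keeps the sequence names; this branch
    # assumes Biostrings is available, since the object was created by it
    return(as.character(x))
  }
  stop("Expected a named character vector or an AAStringSet")
}

# Usage with a plain named character vector (made-up toy sequence)
seqs <- as_named_sequences(c(example_id = "MKTAYIAK"))
nchar(seqs)
```

Normalising to a character vector keeps later steps independent of whether Biostrings happens to be installed.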
Databases
The following databases are available to query:
uniprotkb: UniProt Knowledge Base
uniref: UniProt Reference Clusters
uniparc: UniProt Archive
proteomes: Reference proteomes
taxonomy: Taxonomy
keywords: Keywords
citations: Literature references
diseases: Disease queries
database: Cross references
locations: Subcellular location
unirule: UniRule
arba: ARBA (Association-Rule-Based Annotator)
See also
Other API wrapper functions: uniprot_map(), uniprot_single()
Examples
if (FALSE) {
  # Search for all human glycoproteins from SwissProt
  res <- uniprot_search(
    query = "(proteome:UP000005640) AND (keyword:KW-0325) AND (reviewed:true)",
    database = "uniprotkb",
    format = "tsv",
    fields = c("accession", "gene_primary", "feature_count")
  )
  # Look at the resulting data frame
  head(res)
}
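Queries like the one above are a series of parenthesised clauses joined with AND, so they can be assembled programmatically. The sketch below assumes a hypothetical helper, build_query(), which is not part of this package. It then passes the result to uniprot_search() using only the arguments documented above; the output path is a placeholder.

```r
# Hypothetical helper (not part of the package): wrap each clause in
# parentheses and join the clauses with AND.
build_query <- function(...) {
  clauses <- c(...)
  paste0("(", clauses, ")", collapse = " AND ")
}

query <- build_query("proteome:UP000005640", "keyword:KW-0325", "reviewed:true")
query
# "(proteome:UP000005640) AND (keyword:KW-0325) AND (reviewed:true)"

if (FALSE) {
  # Save the matching sequences to a file instead of returning them;
  # the path below is a placeholder
  uniprot_search(
    query = query,
    format = "fasta",
    path = "path/to/results.fasta"
  )
}
```

As described under Value, the call returns NULL invisibly when path is specified, so the result is read back from the file rather than from the return value.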