Search UniProt via its REST API. By default, the supplied query is searched against UniProtKB and a data.frame of matching proteins is returned. It is a wrapper for this UniProt API endpoint.

Usage

uniprot_search(
  query,
  database = "uniprotkb",
  format = "tsv",
  path = NULL,
  fields = NULL,
  isoform = NULL,
  method = "paged",
  page_size = 500,
  compressed = NULL,
  verbosity = NULL,
  dry_run = FALSE
)

Arguments

query

string, the search query. See this page for help constructing search queries.
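
As a rough sketch of how a query string can be assembled, the base R snippet below joins individual field:value clauses with AND, reproducing the query used in the Examples section. The variable name clauses is purely illustrative; the field names (proteome, keyword, reviewed) are taken from that example, not from a full list of supported fields.

```r
# Individual search clauses (taken from the example at the end of this page)
clauses <- c("proteome:UP000005640", "keyword:KW-0325", "reviewed:true")

# Wrap each clause in parentheses and join them with AND
query <- paste0("(", clauses, ")", collapse = " AND ")

print(query)
```

The resulting string can then be passed as the query argument.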

database

string, database to look up. Default is "uniprotkb". See the Databases section below for all available databases.

format

string, data format to fetch. Default is "tsv". Can be one of "tsv" or "fasta".

path

string (optional), file path to save the results, e.g. "path/to/results.tsv".

fields

character (optional), fields (i.e. columns) of data to get. The available fields depend on the database used; see return_fields for all available fields.

isoform

logical (optional), should protein isoforms be included in the results? Not necessarily relevant for all formats and databases.

method

string, download method to use. Either "paged" (default) or "stream". Paged is more robust to connection issues and takes less memory. Stream may be faster, but uses more memory and is more sensitive to connection issues.

page_size

integer (optional), how many entries per page to request? Only relevant if method = "paged". It's best to leave this at 500.
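
The two download methods might be selected as sketched below. This assumes network access and that the package providing uniprot_search() is attached, so the calls are wrapped in if (FALSE) like the example at the end of this page. The object names res_paged and res_stream are illustrative only.

```r
# Query reused from the Examples section
query <- "(proteome:UP000005640) AND (keyword:KW-0325) AND (reviewed:true)"

if (FALSE) {
  # Paged download (the default), requesting 500 entries per page
  res_paged <- uniprot_search(query = query, method = "paged", page_size = 500)

  # Streamed download; page_size is not relevant here
  res_stream <- uniprot_search(query = query, method = "stream")
}
```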

compressed

logical (optional), should gzipped data be requested? Only relevant if method = "stream" and path is specified.

verbosity

integer (optional), how much information to print?

  • 0: no output

  • NULL (default): minimal output

  • 1: show request headers

  • 2: show request headers and bodies

  • 3: show request headers, bodies, and curl status messages

dry_run

logical, perform request with httr2::req_dry_run()? Requires the httpuv package to be installed. Default is FALSE.
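
A dry run can be useful for checking how a request is built before sending it. The sketch below assumes the httpuv package is installed and that the package providing uniprot_search() is attached; req_info is an illustrative name.

```r
# Query reused from the Examples section
query <- "(proteome:UP000005640) AND (keyword:KW-0325) AND (reviewed:true)"

if (FALSE) {
  # Build the request without fetching any results
  req_info <- uniprot_search(query = query, dry_run = TRUE)

  # Inspect the returned list describing the request
  str(req_info)
}
```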

Value

By default, returns an object whose type depends on format:

If path is specified, saves the results to the file path indicated, and returns NULL invisibly. If dry_run = TRUE, returns a list containing information about the request, including the request method, path, and headers.
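
As a minimal sketch of the file-saving behaviour described above, the snippet below writes results to a temporary file and reads them back. It assumes network access, that the package providing uniprot_search() is attached, and that the saved file is plain tab-separated text with a header row when format = "tsv"; the file location out_file is hypothetical.

```r
# Hypothetical output location in the session's temporary directory
out_file <- file.path(tempdir(), "results.tsv")

if (FALSE) {
  res <- uniprot_search(
    query = "(proteome:UP000005640) AND (keyword:KW-0325) AND (reviewed:true)",
    format = "tsv",
    path = out_file
  )

  # With path specified, the documented return value is NULL (invisibly)
  is.null(res)

  # Read the saved table back in (assumes tab-separated text with a header)
  saved <- read.delim(out_file)
  head(saved)
}
```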

Databases

The following databases are available to query:

See also

Other API wrapper functions: uniprot_map(), uniprot_single()

Examples

if (FALSE) {
  # Search for all human glycoproteins from SwissProt
  res <- uniprot_search(
    query = "(proteome:UP000005640) AND (keyword:KW-0325) AND (reviewed:true)",
    database = "uniprotkb",
    format = "tsv",
    fields = c("accession", "gene_primary", "feature_count")
  )

  # Look at the resulting dataframe
  head(res)
}