
This function wraps the UniProt ID Mapping service, which maps the identifiers used in one database to the corresponding identifiers in another. By default it maps UniProtKB accessions and IDs to UniProtKB entries (from = "UniProtKB_AC-ID", to = "UniProtKB") and returns a data.frame with metadata about the mapped proteins. You can also map IDs from or to other databases, e.g. from = "Ensembl", to = "UniProtKB".

Things to note

The service limits the number of IDs per request, and very large mapping requests are likely to fail. If you run into problems, split your query into smaller chunks.

  • 100,000 = maximum number of input ids allowed

  • 500,000 = maximum number of entries that will be output
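One way to split a large query is to deduplicate the identifiers and cut them into fixed-size batches before mapping. The sketch below is an illustration only: chunk_ids() and the batch size of 10 are hypothetical choices, not part of the package, and the commented-out final step assumes every call returns a data.frame with the same columns.

```r
# Hypothetical helper (not part of the package): split a character vector
# of identifiers into batches of at most `chunk_size` unique ids.
chunk_ids <- function(ids, chunk_size = 10000L) {
  stopifnot(is.character(ids), chunk_size >= 1)
  ids <- unique(ids)  # the `ids` argument should not contain duplicates
  if (length(ids) == 0L) return(list())
  # Batch index for each id: 1, 1, ..., 2, 2, ...
  batch <- ceiling(seq_along(ids) / chunk_size)
  unname(split(ids, batch))
}

# Illustrative input: 25 placeholder ids, one of them repeated
ids <- c(sprintf("ID%02d", 1:25), "ID01")
chunks <- chunk_ids(ids, chunk_size = 10L)
lengths(chunks)

# Each batch could then be mapped separately and the results combined,
# assuming every call returns a data.frame with identical columns:
# results <- do.call(rbind, lapply(chunks, uniprot_map))
```

Smaller batches mean more requests but a lower chance that any single request fails, so a failed batch can be retried without repeating the whole query.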

Usage

uniprot_map(
  ids,
  from = "UniProtKB_AC-ID",
  to = "UniProtKB",
  format = "tsv",
  path = NULL,
  fields = NULL,
  isoform = NULL,
  method = "paged",
  page_size = 500,
  compressed = NULL,
  verbosity = NULL,
  dry_run = FALSE
)

Arguments

ids

character, vector of identifiers to map from. Should not contain duplicates. Maximum length = 100,000 ids.

from

string, database to map from. Default is "UniProtKB_AC-ID". See from_to_dbs for the possible databases whose identifiers you can map from.

to

string, database to map to. Default is "UniProtKB". See from_to_rules for the possible databases you can map to, depending on the from database.

format

string, data format to fetch. Default is "tsv". Can be one of "tsv" or "fasta".

path

string (optional), file path to save the results, e.g. "path/to/results.tsv".

fields

character (optional), fields (i.e. columns) of data to get. Only used if to is a UniProtKB, UniRef, or UniParc database. See return_fields for all available fields.

isoform

logical (optional), should protein isoforms be included in the results? Not necessarily relevant for all formats and databases.

method

string, download method to use. Either "paged" (default) or "stream". Paged is more robust to connection issues and takes less memory. Stream may be faster, but uses more memory and is more sensitive to connection issues.

page_size

integer (optional), how many entries per page to request? Only relevant if method = "paged". It's best to leave this at 500.

compressed

logical (optional), should gzipped data be requested? Only relevant if method = "stream" and path is specified.

verbosity

integer (optional), how much information to print?

  • 0: no output

  • NULL (default): minimal output

  • 1: show request headers

  • 2: show request headers and bodies

  • 3: show request headers, bodies, and curl status messages

dry_run

logical, perform request with httr2::req_dry_run()? Requires the httpuv package to be installed. Default is FALSE.
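Several of these arguments only matter in combination, so it can help to check a call before sending it. The sketch below is a hypothetical pre-flight check, check_map_args(), which is not part of the package. It encodes only the constraints stated in the argument descriptions above and returns a character vector of notes rather than stopping.

```r
# Hypothetical pre-flight check (not part of the package). It only encodes
# constraints stated in the argument descriptions on this page.
check_map_args <- function(ids, format = "tsv", path = NULL,
                           method = "paged", compressed = NULL) {
  notes <- character(0)
  if (anyDuplicated(ids) > 0) {
    notes <- c(notes, "ids contains duplicates")
  }
  if (length(ids) > 100000) {
    notes <- c(notes, "ids has more than 100,000 entries")
  }
  if (!format %in% c("tsv", "fasta")) {
    notes <- c(notes, "format should be \"tsv\" or \"fasta\"")
  }
  if (!method %in% c("paged", "stream")) {
    notes <- c(notes, "method should be \"paged\" or \"stream\"")
  }
  # compressed is only relevant when streaming to a file
  if (isTRUE(compressed) && (!identical(method, "stream") || is.null(path))) {
    notes <- c(notes, "compressed needs method = \"stream\" and a path")
  }
  notes
}

# Two notes: a duplicated id, and compressed used without stream + path
check_map_args(c("P99999", "P99999"), compressed = TRUE)

# No notes for a plain default-style call
check_map_args("P99999")
```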

Value

By default, returns an object whose type depends on format; with format = "tsv" this is a data.frame.

If path is specified, saves the results to the file path indicated, and returns NULL invisibly. If dry_run = TRUE, returns a list containing information about the request, including the request method, path, and headers.
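Because the function returns NULL when path is given, the saved file has to be read back in a separate step. The sketch below assumes the saved TSV has a header row. To stay self-contained it writes a small stand-in file locally, and the column names and values in that file are placeholders, not real API output.

```r
# Stand-in for a file that uniprot_map(..., path = results_path) would have
# written. Column names and values are placeholders, not real API output.
results_path <- tempfile(fileext = ".tsv")
writeLines(c("col_a\tcol_b", "P99999\tplaceholder"), results_path)

# Read the saved table back in, keeping strings as character vectors
results <- utils::read.delim(results_path, stringsAsFactors = FALSE)
str(results)
```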

See also

Other API wrapper functions: uniprot_search(), uniprot_single()

Examples

if (FALSE) {
  # Default, get info about UniProt IDs
  uniprot_map(
    "P99999",
    format = "tsv",
    fields = c("accession", "gene_primary", "feature_count")
  )

  # Other common use, mapping other IDs to UniProt
  # (or vice-versa)
  uniprot_map(
    c("ENSG00000088247", "ENSG00000162613"),
    from = "Ensembl",
    to = "UniProtKB"
  )

}