Download FASTQ Files.

Usage

DownloadFastq(
  gsm.df,
  out.folder = NULL,
  download.method = c("download.file", "ascp"),
  quiet = FALSE,
  timeout = 3600,
  ascp.path = NULL,
  max.rate = "300m",
  parallel = TRUE,
  use.cores = NULL,
  format.10x = TRUE,
  remove.raw = FALSE
)

Arguments

gsm.df

Data frame containing GSM and Run numbers, obtained from ExtractRun.

out.folder

Output folder. Default: NULL (current working directory).

download.method

Method used to download FASTQ files, either "download.file" or "ascp". Default: "download.file".

quiet

Logical value, whether to suppress download progress output. Used when download.method is "download.file". Default: FALSE (progress is shown).

timeout

Maximum request time, in seconds. Used when download.method is "download.file". Default: 3600.

ascp.path

Path to the ascp executable (/path/bin/ascp). Please ensure that the asperaweb_id_dsa.openssh key file is present at the expected relative path (/path/bin/ascp/../etc/asperaweb_id_dsa.openssh). Default: NULL (detect automatically).

max.rate

Max transfer rate. Used when download.method is "ascp". Default: 300m.

parallel

Logical value, whether to download in parallel. Default: TRUE.

use.cores

The number of cores used. Default: NULL (the minimum value of nrow(gsm.df) and parallel::detectCores()).

format.10x

Logical value, whether to format split fastqs to 10x standard format. Default: TRUE.

remove.raw

Logical value, whether to remove the original (unformatted) split FASTQ files. Used when format.10x is TRUE. Default: FALSE.
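
The default for use.cores is described above as the minimum of nrow(gsm.df) and parallel::detectCores(). A minimal sketch of that rule is given below. The helper name resolve_cores is hypothetical and not part of the package, and the fallback to a single core when detection fails is an added assumption.

```r
# Hypothetical helper (not part of the package): mirrors the documented
# default for use.cores.
resolve_cores <- function(gsm.df, use.cores = NULL) {
  # An explicit value always wins.
  if (!is.null(use.cores)) {
    return(as.integer(use.cores))
  }
  detected <- parallel::detectCores()
  # detectCores() can return NA on some platforms; fall back to 1 (assumption).
  if (is.na(detected)) {
    detected <- 1L
  }
  as.integer(min(nrow(gsm.df), detected))
}

# Two runs, as in the small test in the Examples section.
runs <- data.frame(run = c("SRR9004325", "SRR9004326"))
resolve_cores(runs)                # at most 2, limited by the machine
resolve_cores(runs, use.cores = 1) # explicit value is used as given
```

With only a few runs, the row count is usually the limiting factor, so requesting more cores than rows would bring no benefit under this rule.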

Value

Data frame containing the rows of gsm.df that failed to download, or NULL.
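
Because the return value holds the failed rows of gsm.df, it can be fed back into another download attempt. The sketch below shows one way to do that. The wrapper retry_failed and its arguments download_fun and max.tries are hypothetical names, and treating NULL as "nothing failed" is an assumption based on the description above.

```r
# Hypothetical wrapper (not part of the package). `download_fun` stands in for
# a call such as function(df) DownloadFastq(gsm.df = df, out.folder = "...").
retry_failed <- function(gsm.df, download_fun, max.tries = 3) {
  remaining <- gsm.df
  for (i in seq_len(max.tries)) {
    remaining <- download_fun(remaining)
    # NULL is assumed to signal that no rows failed.
    if (is.null(remaining) || nrow(remaining) == 0) {
      return(NULL)
    }
    message("Attempt ", i, ": ", nrow(remaining), " run(s) still failing")
  }
  # Rows that still failed after the last attempt.
  remaining
}

# Possible usage (requires network access, so it is left commented out):
# failed <- retry_failed(
#   GSE130636.runs,
#   function(df) DownloadFastq(gsm.df = df, out.folder = "/path/to/output")
# )
```

Passing the download call in as a function keeps the retry logic independent of the chosen download.method.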

Examples

GSE130636.runs <- ExtractRun(acce = "GSE130636", platform = "GPL20301")
#> Extract all GSM with acce: GSE130636 and platform: GPL20301
#> Found 1 file(s)
#> GSE130636_series_matrix.txt.gz
#> Rows: 0 Columns: 7
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (7): ID_REF, GSM3745992, GSM3745993, GSM3745994, GSM3745995, GSM3745996,...
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> File stored at: 
#> /var/folders/_4/k4qmvf7s2gx_6789px8n_sxh0000gn/T//Rtmp1TqNsD/GPL20301.soft
#> 6 GSMs to process
# a small test
GSE130636.runs <- GSE130636.runs[GSE130636.runs$run %in% c("SRR9004325", "SRR9004326"), ]
# use download.file
download.file.res <- DownloadFastq(
  gsm.df = GSE130636.runs, out.folder = "/path/to/output",
  download.method = "download.file", parallel = TRUE, use.cores = 2
)
# use ascp
ascp.res <- DownloadFastq(
  gsm.df = GSE130636.runs, out.folder = "/path/to/output",
  download.method = "ascp", ascp.path = "~/.aspera/connect/bin/ascp", parallel = TRUE, use.cores = 2
)
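
Before using download.method = "ascp", it may help to confirm that the executable and its key file are where the ascp.path argument expects them. The following is a rough pre-flight sketch, not the package's own detection logic. The helper check_ascp is hypothetical, the use of Sys.which() for automatic detection is an assumption, and the key file is assumed to sit in an etc directory next to the bin directory that holds ascp.

```r
# Hypothetical pre-flight check (not part of the package).
check_ascp <- function(ascp.path = NULL) {
  # Assumption: fall back to searching the PATH when no path is given.
  if (is.null(ascp.path)) {
    ascp.path <- unname(Sys.which("ascp"))
  }
  ascp.path <- path.expand(ascp.path)
  if (!nzchar(ascp.path) || !file.exists(ascp.path)) {
    stop("ascp executable not found")
  }
  # Assumed layout: <install>/bin/ascp and <install>/etc/asperaweb_id_dsa.openssh
  key <- file.path(
    dirname(dirname(ascp.path)), "etc", "asperaweb_id_dsa.openssh"
  )
  if (!file.exists(key)) {
    stop("Key file not found at: ", key)
  }
  invisible(list(ascp = ascp.path, key = key))
}

# Possible usage (stops with an error if either file is missing):
# check_ascp("~/.aspera/connect/bin/ascp")
```

Failing early in this way gives a clearer message than a failed transfer partway through a batch of runs.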