
Introduction

A common situation is that we need to process multiple datasets with the same software version (e.g. CellRanger) to obtain count matrices, so that the datasets can be integrated and compared more reliably. Here, we use GEfetch2R to download raw data (sra/fastq/bam). For bam files, GEfetch2R also provides a function to convert them to fastq files.

GEfetch2R supports downloading raw data (sra/fastq/bam) from SRA and ENA with GEO accessions. In general, downloading raw data from ENA is much faster than from SRA, because GEfetch2R supports ascp and parallel downloading for ENA.


Check API

Check the availability of the APIs used:

CheckAPI(database = c("GEO", "SRA/ENA"))
# 1 GSMs to process
# The API to access the SRA accession is OK!
# The API to access ENA download links is OK!
# start checking APIs to access GEO!
# trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE302nnn/GSE302912/matrix/GSE302912_series_matrix.txt.gz'
# Content type 'application/x-gzip' length 2642 bytes
# ==================================================
# downloaded 2642 bytes
#
# trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE302nnn/GSE302912/suppl//GSE302912_counts.csv.gz?tool=geoquery'
# Content type 'application/x-gzip' length 332706 bytes (324 KB)
# ==================================================
# downloaded 324 KB
#
# The API to access supplementary files is OK!

One-step wrapper

Given a GEO accession or GSM accessions, the one-step function DownloadFastq2R extracts all runs, automatically identifies the RNA-seq type (10x Genomics scRNA-seq, bulk RNA-seq, Smart-seq2 scRNA-seq/mini-bulk RNA-seq) of each run, downloads fastq files from ENA, performs read mapping (merging multiple runs of the same sample), and loads the results into R.

DownloadFastq2R identifies the RNA-seq type automatically as follows:

  • filter on the library_strategy field ("rna-seq", "scrna-seq", "snrna-seq") to remove non-RNA-seq runs.
  • if the library_source field is "transcriptomic single cell", or the library_strategy field is "scrna-seq" or "snrna-seq", or any of the keywords ("scRNA", "snRNA", "single.cell", "single.nuclei", "single.nucleus", "singlecell", "singlenuclei", "singlenucleus", "10X.Genomics", "10XGenomics", "Smart.seq2", "Smartseq2") appears in the title, source_name, characteristics, or description fields, the run is classified as scRNA-seq.
    • if any of the keywords ("10X.Genomics", "Cell.Ranger", "10XGenomics", "CellRanger") appears in any field, the run is classified as 10x Genomics scRNA-seq.
    • if any of the keywords ("Smart.seq2", "Smartseq2") appears in any field, the run is classified as Smart-seq2 scRNA-seq/mini-bulk RNA-seq.
    • otherwise, the run is skipped in the subsequent processing steps.
  • otherwise, the run is classified as bulk RNA-seq.
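The decision rules above can be sketched in base R as follows. This is a minimal, hypothetical illustration (ClassifyRun, HasKeyword, and the metadata below are ours, not part of GEfetch2R's API). It assumes the keywords are matched as case-insensitive regular expressions, so "." matches any character, and that the 10x keywords are checked before the Smart-seq2 keywords.

```r
# Hypothetical sketch of the classification rules; NOT GEfetch2R's internal code.
# Keywords are treated as case-insensitive regular expressions, so "." matches
# any single character (e.g. "single.cell" matches "single cell" and "single-cell").
sc.keywords <- c("scRNA", "snRNA", "single.cell", "single.nuclei", "single.nucleus",
                 "singlecell", "singlenuclei", "singlenucleus", "10X.Genomics",
                 "10XGenomics", "Smart.seq2", "Smartseq2")
tenx.keywords <- c("10X.Genomics", "Cell.Ranger", "10XGenomics", "CellRanger")
smartseq.keywords <- c("Smart.seq2", "Smartseq2")

# TRUE if any keyword occurs in any of the given text fields
HasKeyword <- function(texts, keywords) {
  any(grepl(paste(keywords, collapse = "|"), texts, ignore.case = TRUE))
}

# run: a named list holding the metadata fields of a single run
ClassifyRun <- function(run) {
  strategy <- tolower(run$library_strategy)
  if (!strategy %in% c("rna-seq", "scrna-seq", "snrna-seq")) {
    return(NA_character_) # not RNA-seq: filtered out
  }
  desc.fields <- unlist(run[c("title", "source_name", "characteristics", "description")])
  is.sc <- tolower(run$library_source) == "transcriptomic single cell" ||
    strategy %in% c("scrna-seq", "snrna-seq") ||
    HasKeyword(desc.fields, sc.keywords)
  if (!is.sc) return("bulk")
  all.fields <- unlist(run)
  # assumption: 10x keywords take precedence over Smart-seq2 keywords
  if (HasKeyword(all.fields, tenx.keywords)) return("10x")
  if (HasKeyword(all.fields, smartseq.keywords)) return("Smart-seq2")
  "skip" # single-cell, but neither 10x Genomics nor Smart-seq2
}

# made-up metadata for illustration
run.10x <- list(library_strategy = "RNA-Seq", library_source = "TRANSCRIPTOMIC SINGLE CELL",
                title = "PBMC", source_name = "blood",
                characteristics = "platform: 10x Genomics Chromium", description = "")
run.bulk <- list(library_strategy = "RNA-Seq", library_source = "TRANSCRIPTOMIC",
                 title = "liver rep1", source_name = "liver",
                 characteristics = "tissue: liver", description = "")
ClassifyRun(run.10x)  # "10x"
ClassifyRun(run.bulk) # "bulk"
```

If the real metadata does not match any rule as expected, force.type is the documented way to override the result.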

If automatic identification fails, users can specify the RNA-seq type via force.type, choosing one of "10x", "Smart-seq2", or "bulk".

With a given GEO accession:

# library
library(tidyverse)
library(GEfetch2R)

GSE305141.list <- DownloadFastq2R(
  acce = "GSE305141", skip.gsm = c("GSM9162729", "GSM9162725"), star.ref = "~/Mus_musculus/star",
  cellranger.ref = "~/refdata-cellranger-mm10-3.0.0",
  star.path = "~/anaconda3/bin/STAR",
  cellranger.path = "~/cellranger-3.0.2/cellranger",
  out.folder = "~/gefetch2r/doc/fastq2r"
)
GSE305141.list
# $bulk.rna
# class: DESeqDataSet
# dim: 55573 11
# metadata(1): version
# assays(1): counts
# rownames(55573): ENSMUSG00000000001 ENSMUSG00000000003 ...
#   ENSMUSG00000118487 ENSMUSG00000118488
# rowData names(0):
# colnames(11): GSM9162716 GSM9162717 ... GSM9162726 GSM9162727
# colData names(1): condition
#
# $scrna.10x
# $scrna.10x$`/home/songyabing/Gaolab/07_dev/gefetch2r/doc/fastq2r/mapping/10x/GSM9162728/outs/filtered_feature_bc_matrix`
# An object of class Seurat
# 31053 features across 6774 samples within 1 assay
# Active assay: RNA (31053 features, 0 variable features)
#
# $scrna.10x$`/home/songyabing/Gaolab/07_dev/gefetch2r/doc/fastq2r/mapping/10x/GSM9162730/outs/filtered_feature_bc_matrix`
# An object of class Seurat
# 31053 features across 9344 samples within 1 assay
# Active assay: RNA (31053 features, 0 variable features)

# # The output structure
# tree -L 3 ~/gefetch2r/doc/fastq2r
# ~/gefetch2r/doc/fastq2r
# ├── fastq
# │   ├── 10x
# │   │   ├── GSM9162728
# │   │   └── GSM9162730
# │   └── bulkRNAseq
# │       ├── GSM9162716
# │       ├── GSM9162717
# │       ├── GSM9162718
# │       ├── GSM9162719
# │       ├── GSM9162720
# │       ├── GSM9162721
# │       ├── GSM9162722
# │       ├── GSM9162723
# │       ├── GSM9162724
# │       ├── GSM9162726
# │       └── GSM9162727
# └── mapping
#     ├── 10x
#     │   ├── GSM9162728
#     │   └── GSM9162730
#     └── bulkRNAseq
#         ├── GSM9162716
#         ├── GSM9162717
#         ├── GSM9162718
#         ├── GSM9162719
#         ├── GSM9162720
#         ├── GSM9162721
#         ├── GSM9162722
#         ├── GSM9162723
#         ├── GSM9162724
#         ├── GSM9162726
#         └── GSM9162727
#
# 32 directories, 0 files

With given GSM accessions:

GSE127942.list <- DownloadFastq2R(
  gsm = c("GSM3656922", "GSM3656923"), star.ref = "~/Mus_musculus/star",
  star.path = "~/anaconda3/bin/STAR",
  out.folder = "~/gefetch2r/doc/fastq2r2"
)
GSE127942.list
# $bulk.rna
# class: DESeqDataSet
# dim: 55573 2
# metadata(1): version
# assays(1): counts
# rownames(55573): ENSMUSG00000000001 ENSMUSG00000000003 ...
#   ENSMUSG00000118487 ENSMUSG00000118488
# rowData names(0):
# colnames(2): GSM3656922 GSM3656923
# colData names(1): condition

# # The output structure
# tree ~/gefetch2r/doc/fastq2r2
# ~/gefetch2r/doc/fastq2r2
# ├── fastq
# │   └── bulkRNAseq
# │       ├── GSM3656922
# │       │   ├── SRR8690244
# │       │   │   ├── SRR8690244_1.fastq.gz
# │       │   │   └── SRR8690244_2.fastq.gz
# │       │   └── SRR8690245
# │       │       ├── SRR8690245_1.fastq.gz
# │       │       └── SRR8690245_2.fastq.gz
# │       └── GSM3656923
# │           ├── SRR8690246
# │           │   ├── SRR8690246_1.fastq.gz
# │           │   └── SRR8690246_2.fastq.gz
# │           └── SRR8690247
# │               ├── SRR8690247_1.fastq.gz
# │               └── SRR8690247_2.fastq.gz
# └── mapping
#     └── bulkRNAseq
#         ├── GSM3656922
#         │   ├── Aligned.sortedByCoord.out.bam
#         │   ├── GSM3656922.txt
#         │   ├── Log.final.out
#         │   ├── Log.out
#         │   ├── Log.progress.out
#         │   ├── SJ.out.tab
#         │   └── _STARtmp
#         │       └── BAMsort
#         │           ├── 0
#         │           ├── 1
#         │           ├── 2
#         │           └── 3
#         └── GSM3656923
#             ├── Aligned.sortedByCoord.out.bam
#             ├── GSM3656923.txt
#             ├── Log.final.out
#             ├── Log.out
#             ├── Log.progress.out
#             ├── SJ.out.tab
#             └── _STARtmp
#                 └── BAMsort
#                     ├── 0
#                     ├── 1
#                     ├── 2
#                     └── 3
#
# 24 directories, 20 files

Download sra

Extract all samples (runs)

For raw data stored in SRA/ENA, GEfetch2R can extract sample information and run numbers from GEO accessions; alternatively, users can provide a dataframe containing the run numbers of the samples of interest.

Extract all samples under GSE130636 on platform GPL20301 (use platform = NULL to include all platforms):

GSE130636.runs <- ExtractRun(acce = "GSE130636", platform = "GPL20301")
# a small test
GSE130636.runs <- GSE130636.runs[GSE130636.runs$run %in% c("SRR9004346", "SRR9004351"), ]
GSE130636.runs[, c("run", "experiment", "gsm_name", "title", "geo_accession", "source_name_ch1", "organism_ch1", "characteristics_ch1", "characteristics_ch1.1")]
#                   run experiment   gsm_name         title geo_accession source_name_ch1
# SRR9004346 SRR9004346 SRX5783052 GSM3745993 Fovea donor 2    GSM3745993          Retina
# SRR9004351 SRR9004351 SRX5783052 GSM3745993 Fovea donor 2    GSM3745993          Retina
#            organism_ch1 characteristics_ch1 characteristics_ch1.1
# SRR9004346 Homo sapiens     location: Fovea        donor: Donor 2
# SRR9004351 Homo sapiens     location: Fovea        donor: Donor 2
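If you already know the runs you need, a run table can also be assembled by hand instead of calling ExtractRun. A minimal sketch, assuming that the run and gsm_name columns shown in the ExtractRun output above are the columns the download functions read (check the function documentation before relying on this):

```r
# Hypothetical hand-built run table; the column names are copied from the
# ExtractRun output above and are ASSUMED to be what the download functions use.
my.runs <- data.frame(
  run = c("SRR9004346", "SRR9004351"),
  gsm_name = c("GSM3745993", "GSM3745993"),
  stringsAsFactors = FALSE
)
# Runs grouped by sample: both runs belong to GSM3745993
split(my.runs$run, my.runs$gsm_name)
```

Under that assumption, my.runs would be passed as gsm.df in the same way as GSE130636.runs below.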

Download sra

Given a dataframe containing GSM and run numbers, DownloadSRA uses prefetch to download sra files from SRA, or ascp/download.file/wget to download sra files from ENA. The returned value is a dataframe containing the failed runs; if it is not NULL, users can re-run DownloadSRA with gsm.df set to the returned value.

Download from SRA:

# download
GSE130636.down <- DownloadSRA(
  gsm.df = GSE130636.runs,
  prefetch.path = "~/data/software/sratoolkit.2.11.2-centos_linux64/bin/prefetch",
  out.folder = "~/gefetch2r/doc/download_sra/prefetch"
)
# GSE130636.down is NULL or a dataframe containing the failed runs

# tree ~/gefetch2r/doc/download_sra/prefetch
# ~/gefetch2r/doc/download_sra/prefetch
# └── GSM3745993
#     ├── SRR9004346
#     │   └── SRR9004346.sra
#     └── SRR9004351
#         └── SRR9004351.sra
#
# 3 directories, 2 files

Download from ENA (parallel):

# download from ENA using download.file
GSE130636.down <- DownloadSRA(
  gsm.df = GSE130636.runs, download.method = "download.file",
  timeout = 3600, out.folder = "~/gefetch2r/doc/download_sra/download_file",
  rename = TRUE, parallel = TRUE, use.cores = 2
)

# download from ENA using ascp
GSE130636.down <- DownloadSRA(
  gsm.df = GSE130636.runs, download.method = "ascp",
  ascp.path = "~/.aspera/connect/bin/ascp", max.rate = "300m",
  rename = TRUE, out.folder = "~/gefetch2r/doc/download_sra/ascp",
  parallel = TRUE, use.cores = 2
)

# download from ENA using wget
GSE130636.down <- DownloadSRA(
  gsm.df = GSE130636.runs, download.method = "wget",
  wget.path = "/usr/bin/wget", timeout = 3600,
  out.folder = "~/gefetch2r/doc/download_sra/wget", rename = TRUE,
  parallel = TRUE, use.cores = 2
)

# GSE130636.down is NULL or a dataframe containing the failed runs
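Because these functions return NULL on success and the failed items otherwise, the re-run advice can be wrapped in a small loop. A hedged sketch: RetryFailed and FakeStep are hypothetical helpers, and FakeStep only simulates one failed run so the loop can be tried without network access. Using NROW() lets the same loop handle the vectors of failed files returned by SplitSRA and Bam2Fastq.

```r
# Hypothetical helper: re-run a step on its own failures until nothing fails
# or max.tries is reached. `step` takes the items to process and returns NULL
# on success, or the failed items (a dataframe of runs, or a vector of paths).
RetryFailed <- function(step, items, max.tries = 3) {
  failed <- items
  for (i in seq_len(max.tries)) {
    failed <- step(failed)
    if (is.null(failed) || NROW(failed) == 0) {
      return(NULL) # everything succeeded
    }
    message("Attempt ", i, ": ", NROW(failed), " item(s) failed")
  }
  failed # items that still failed after max.tries attempts
}

# Simulated step for illustration: the first call fails on one run, later calls succeed
calls <- 0
FakeStep <- function(df) {
  calls <<- calls + 1
  if (calls == 1) df[1, , drop = FALSE] else NULL
}
res <- RetryFailed(FakeStep, data.frame(run = c("SRR9004346", "SRR9004351")))
res # NULL after two calls
```

With GEfetch2R, step would wrap a real call, for example `function(df) DownloadSRA(gsm.df = df, download.method = "wget", wget.path = "/usr/bin/wget", out.folder = "~/gefetch2r/doc/download_sra/wget")`. Capping the number of attempts avoids looping forever on runs that are permanently unavailable.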

Download fastq

Split sra to generate fastq

After obtaining the sra files, users can convert them to fastq files with SplitSRA, which supports parallel-fastq-dump (parallel, fastest, gzipped output), fasterq-dump (parallel, fast, but uncompressed output), and fastq-dump (slowest, gzipped output).

For fastq files generated with 10x Genomics, SplitSRA identifies the read1, read2, and index files and renames read1 and read2 to the format required by 10x software (sample1_S1_L001_R1_001.fastq.gz and sample1_S1_L001_R2_001.fastq.gz). Specifically, the file with read length 26 or 28 is treated as read1, files with read length 8 or 10 are treated as index files, and the remaining file is treated as read2. These read-length rules come from the 10x Genomics documents Sequencing Requirements for Single Cell 3’ and Sequencing Requirements for Single Cell V(D)J.
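The read-length rule can be illustrated with a small base R sketch. Assign10xRoles and TenxName are hypothetical helpers and the read lengths are made up; SplitSRA's actual implementation may differ (for example, in how it measures read length).

```r
# Hypothetical sketch of the read-length rule; not SplitSRA's actual code.
# read.lengths: named numeric vector, one read length per split fastq file
Assign10xRoles <- function(read.lengths) {
  roles <- ifelse(read.lengths %in% c(26, 28), "R1",
    ifelse(read.lengths %in% c(8, 10), "index", "R2")
  )
  # exactly one read1 and one read2 are expected
  stopifnot(sum(roles == "R1") == 1, sum(roles == "R2") == 1)
  setNames(roles, names(read.lengths))
}

# 10x-style file name for a sample and a read role
TenxName <- function(sample, role) {
  sprintf("%s_S1_L001_%s_001.fastq.gz", sample, unname(role))
}

# made-up read lengths for three files split from one sra file
lens <- c(sample1_1.fastq.gz = 8, sample1_2.fastq.gz = 26, sample1_3.fastq.gz = 91)
roles <- Assign10xRoles(lens)
TenxName("sample1", roles[roles != "index"])
# "sample1_S1_L001_R1_001.fastq.gz" "sample1_S1_L001_R2_001.fastq.gz"
```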

The returned value is a vector of sra files that failed to split. If it is not NULL, users can re-run SplitSRA with sra.path set to the returned value.

# parallel-fastq-dump requires sratools.path
GSE130636.split <- SplitSRA(
  sra.folder = "~/gefetch2r/doc/download_sra/prefetch",
  fastq.type = "10x",
  split.cmd.path = "~/anaconda3/bin/parallel-fastq-dump",
  sratools.path = "~/data/software/sratoolkit.2.11.2-centos_linux64/bin", split.cmd.paras = "--gzip",
  split.cmd.threads = 4, format.10x = TRUE, remove.raw = TRUE # remove the unformatted fastq files, used when format.10x = TRUE
)

# tree ~/gefetch2r/doc/download_sra/prefetch
# ~/gefetch2r/doc/download_sra/prefetch
# └── GSM3745993
#     ├── SRR9004346
#     │   ├── SRR9004346_S1_L001_R1_001.fastq.gz
#     │   ├── SRR9004346_S1_L001_R2_001.fastq.gz
#     │   └── SRR9004346.sra
#     └── SRR9004351
#         ├── SRR9004351_S1_L001_R1_001.fastq.gz
#         ├── SRR9004351_S1_L001_R2_001.fastq.gz
#         └── SRR9004351.sra
#
# 3 directories, 6 files

Download fastq directly from ENA

Alternatively, DownloadFastq downloads fastq files directly from ENA (in parallel, and faster than the method above). The returned value is a dataframe containing the failed runs; if it is not NULL, users can re-run DownloadFastq with gsm.df set to the returned value.

# use download.file
GSE130636.down.fastq <- DownloadFastq(
  gsm.df = GSE130636.runs, out.folder = "~/gefetch2r/doc/download_fastq/download_file",
  download.method = "download.file", timeout = 360000,
  parallel = TRUE, use.cores = 2, format.10x = TRUE, remove.raw = TRUE # remove the unformatted fastq files, used when format.10x = TRUE
)
# tree ~/gefetch2r/doc/download_fastq/download_file
# ~/gefetch2r/doc/download_fastq/download_file
# └── GSM3745993
#     ├── SRR9004346
#     │   ├── SRR9004346_S1_L001_R1_001.fastq.gz
#     │   └── SRR9004346_S1_L001_R2_001.fastq.gz
#     └── SRR9004351
#         ├── SRR9004351_S1_L001_R1_001.fastq.gz
#         └── SRR9004351_S1_L001_R2_001.fastq.gz
#
# 3 directories, 4 files

# use ascp
GSE130636.down.fastq <- DownloadFastq(
  gsm.df = GSE130636.runs, out.folder = "~/gefetch2r/doc/download_fastq/ascp",
  download.method = "ascp", ascp.path = "~/.aspera/connect/bin/ascp", max.rate = "300m",
  parallel = TRUE, use.cores = 2, format.10x = TRUE, remove.raw = TRUE
)

# use wget
GSE130636.down.fastq <- DownloadFastq(
  gsm.df = GSE130636.runs, out.folder = "~/gefetch2r/doc/download_fastq/wget",
  download.method = "wget", wget.path = "/usr/bin/wget", timeout = 360000,
  parallel = TRUE, use.cores = 2, format.10x = TRUE, remove.raw = TRUE
)
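Before running CellRanger, it can be worth confirming that every run folder received both formatted reads. A minimal sketch, assuming the GSMXXXX/SRRXXXX folder layout and SRRXXXX_S1_L001_R[12]_001.fastq.gz names shown in the trees above; CheckTenxPairs is a hypothetical helper, not part of GEfetch2R:

```r
# Hypothetical helper: report, for each GSMXXXX/SRRXXXX run folder under
# out.folder, whether the 10x-formatted R1 and R2 files are present.
CheckTenxPairs <- function(out.folder) {
  gsm.dirs <- list.dirs(out.folder, recursive = FALSE)
  run.dirs <- as.character(unlist(lapply(gsm.dirs, list.dirs, recursive = FALSE)))
  if (length(run.dirs) == 0) {
    return(data.frame(run.dir = character(0), R1 = logical(0), R2 = logical(0)))
  }
  # expected file name: <run>_S1_L001_R1_001.fastq.gz (and R2)
  expected <- function(read) {
    file.path(run.dirs, paste0(basename(run.dirs), "_S1_L001_", read, "_001.fastq.gz"))
  }
  data.frame(
    run.dir = run.dirs,
    R1 = file.exists(expected("R1")),
    R2 = file.exists(expected("R2")),
    stringsAsFactors = FALSE
  )
}

# e.g. CheckTenxPairs("~/gefetch2r/doc/download_fastq/wget")
# rows with R1 or R2 equal to FALSE point to runs that need to be downloaded again
```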

Download bam

Extract all samples (runs)

GEfetch2R can extract sample information and run numbers from GEO accessions; alternatively, users can provide a dataframe containing the run numbers of the samples of interest.

GSE138266.runs <- ExtractRun(acce = "GSE138266", platform = "GPL18573")
GSE138266.runs[1:3, c("run", "experiment", "gsm_name", "title", "geo_accession", "source_name_ch1", "organism_ch1", "characteristics_ch1")]
#                     run experiment   gsm_name       title geo_accession     source_name_ch1
# SRR10211551 SRR10211551 SRX6931239 GSM4104122 MS19270_CSF    GSM4104122 Cerebrospinal fluid
# SRR10211552 SRR10211552 SRX6931240 GSM4104123 MS58637_CSF    GSM4104123 Cerebrospinal fluid
# SRR10211553 SRR10211553 SRX6931241 GSM4104124 MS71658_CSF    GSM4104124 Cerebrospinal fluid
#             organism_ch1                   characteristics_ch1
# SRR10211551 Homo sapiens disease condition: Multiple Sclerosis
# SRR10211552 Homo sapiens disease condition: Multiple Sclerosis
# SRR10211553 Homo sapiens disease condition: Multiple Sclerosis

Download bam from SRA

Given a dataframe containing GSM and run numbers, DownloadBam downloads bam files using prefetch. It supports both 10x-generated bam files and normal bam files.

  • 10x-generated bam: bam files generated by 10x software (e.g. CellRanger) contain custom tags that are not kept when prefetch runs with default parameters, so GEfetch2R adds --type TenX to make sure the downloaded bam files retain these tags.
  • normal bam: DownloadBam first downloads the sra files and then converts them to bam files with sam-dump. In our test (52G sra and 72G bam files), prefetch + sam-dump was much faster than using sam-dump alone:
# # use prefetch to download sra file
# prefetch -X 60G SRR1976036
# # real    117m26.334s
# # user    16m42.062s
# # sys 3m28.295s

# # use sam-dump to convert sra to bam
# time (sam-dump SRR1976036.sra | samtools view -bS - -o SRR1976036.bam)
# # real    536m2.721s
# # user    749m41.421s
# # sys 20m49.069s


# # use sam-dump to download bam directly
# time (sam-dump SRR1976036 | samtools view -bS - -o SRR1976036.bam)
# # more than 36hrs only get ~3G bam files, too slow
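A rough back-of-the-envelope check of the timings above, using only the real (wall-clock) times and the quoted sizes (72G bam for the two-step route, ~3G after more than 36 hours for direct sam-dump):

```r
# Convert "XmY.Ys" wall-clock times to hours
ToHours <- function(min, sec) (min + sec / 60) / 60
prefetch.h <- ToHours(117, 26.334) # prefetch: real 117m26.334s
samdump.h <- ToHours(536, 2.721)   # sam-dump sra -> bam: real 536m2.721s
total.h <- prefetch.h + samdump.h  # two-step route, about 10.9 h

rate.two.step <- 72 / total.h # about 6.6G of bam per hour
rate.direct <- 3 / 36         # direct sam-dump: at most ~0.08G per hour
round(c(total.h = total.h, two.step = rate.two.step,
        direct = rate.direct, speedup = rate.two.step / rate.direct), 2)
```

By this estimate the two-step route produced bam roughly 80 times faster on this run, although the direct-download figure is only an upper bound on its rate.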

The returned value is a dataframe containing the failed runs (for normal bam, runs whose sra files failed to download or failed to convert to bam; for 10x-generated bam, runs whose bam files failed to download). If it is not NULL, users can re-run DownloadBam with gsm.df set to the returned value. The following example downloads a 10x-generated bam file:

# a small test
GSE138266.runs <- GSE138266.runs[GSE138266.runs$run %in% c("SRR10211566"), ]
# download with prefetch (slow; downloading from ENA with wget is usually faster)
GSE138266.down <- DownloadBam(
  gsm.df = GSE138266.runs,
  prefetch.path = "~/data/software/sratoolkit.2.11.2-centos_linux64/bin/prefetch",
  out.folder = "~/gefetch2r/doc/download_bam"
)
# GSE138266.down is NULL or a dataframe containing the failed runs
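To check whether a downloaded 10x bam actually kept the custom tags, one option is to inspect a few records, e.g. the output of `samtools view <bam> | head`. The sketch below assumes that CellRanger's corrected cell barcode (CB) and UMI (UB) tags are the ones of interest; HasTenxTags is a hypothetical helper and the SAM record is made up.

```r
# Hypothetical check: do SAM text records carry the CellRanger tags CB
# (corrected cell barcode) and UB (corrected UMI)?
HasTenxTags <- function(sam.lines, tags = c("CB", "UB")) {
  vapply(strsplit(sam.lines, "\t", fixed = TRUE), function(fields) {
    # optional fields start at column 12 and look like TAG:TYPE:VALUE
    opt <- if (length(fields) > 11) fields[12:length(fields)] else character(0)
    all(tags %in% substr(opt, 1, 2))
  }, logical(1))
}

# a made-up record: 11 mandatory SAM fields followed by optional tags
rec <- paste("read1", "0", "chr1", "100", "255", "5M", "*", "0", "0", "ACGTA", "IIIII",
             "CB:Z:AAACCTGAGAAACCAT-1", "UB:Z:ACGTACGTAC", sep = "\t")
HasTenxTags(c(rec, sub("\tCB:Z:[^\t]+", "", rec))) # TRUE FALSE
```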

Download bam from ENA

The returned value is a dataframe containing the failed runs. If it is not NULL, users can re-run DownloadBam with gsm.df set to the returned value. The following example downloads a 10x-generated bam file from ENA:

# download.file
GSE138266.down <- DownloadBam(
  gsm.df = GSE138266.runs, download.method = "download.file",
  timeout = 3600, rename = TRUE, out.folder = "~/gefetch2r/doc/download_bam/download_file",
  parallel = TRUE, use.cores = 2
)

# ascp
GSE138266.down <- DownloadBam(
  gsm.df = GSE138266.runs, download.method = "ascp",
  ascp.path = "~/.aspera/connect/bin/ascp", max.rate = "300m",
  rename = TRUE, out.folder = "~/gefetch2r/doc/download_bam/ascp",
  parallel = TRUE, use.cores = 2
)

# wget
GSE138266.down <- DownloadBam(
  gsm.df = GSE138266.runs, download.method = "wget",
  wget.path = "/usr/bin/wget", timeout = 36000000,
  rename = TRUE, out.folder = "~/gefetch2r/doc/download_bam/wget",
  parallel = TRUE, use.cores = 2
)
# # The output structure
# tree ~/gefetch2r/doc/download_bam/wget
# ~/gefetch2r/doc/download_bam/wget
# └── GSM4104137
#     └── SRR10211566
#         ├── bamfiles_MS60249_PBMC_possorted_genome_bam_1.bam
#         └── bamfiles_MS60249_PBMC_possorted_genome_bam_1.bam.bai
#
# 2 directories, 2 files

Convert bam to fastq

With downloaded bam files, Bam2Fastq converts them to fastq files. For bam files generated by 10x software, Bam2Fastq uses the bamtofastq tool developed by 10x Genomics; otherwise, it uses samtools.

The returned value is a vector of bam files that failed to convert to fastq files. If it is not NULL, users can re-run Bam2Fastq with bam.path set to the returned value.

GSE138266.convert <- Bam2Fastq(
  bam.folder = "~/gefetch2r/doc/download_bam/wget",
  bam.type = "10x",
  bamtofastq.path = "~/data/software/bamtofastq_linux",
  bamtofastq.paras = "--nthreads 4"
)
# # The output structure
# tree ~/gefetch2r/doc/download_bam/wget
# ~/gefetch2r/doc/download_bam/wget
# └── GSM4104137
#     └── SRR10211566
#         ├── bam2fastq                                                  # generated by Bam2Fastq, can be removed
#         │   └── MS60249_PBMC_2_0_MissingLibrary_1_H72VGBGX2
#         ├── bamfiles_MS60249_PBMC_possorted_genome_bam_1.bam
#         ├── bamfiles_MS60249_PBMC_possorted_genome_bam_1.bam.bai
#         ├── SRR10211566_S1_L001_R1_001.fastq.gz                        # generated by Bam2Fastq
#         └── SRR10211566_S1_L001_R2_001.fastq.gz                        # generated by Bam2Fastq
#
# 4 directories, 4 files

# bulk RNA-seq
GSE127942.convert <- Bam2Fastq(
  bam.folder = "~/gefetch2r/doc/fastq2r2/mapping/bulkRNAseq",
  bam.type = "other",
  bamtofastq.path = "~/anaconda3/bin/samtools",
  bamtofastq.paras = ""
)
# # The output structure
# tree ~/gefetch2r/doc/fastq2r2/mapping/bulkRNAseq
# ~/gefetch2r/doc/fastq2r2/mapping/bulkRNAseq
# ├── GSM3656922
# │   ├── Aligned.sortedByCoord.out.bam
# │   ├── Aligned.sortedByCoord.out.sortname.bam               # generated by Bam2Fastq
# │   ├── bam2fastq                                            # generated by Bam2Fastq
# │   │   ├── Aligned.sortedByCoord.out.sortname_1.fastq.gz    # generated by Bam2Fastq
# │   │   └── Aligned.sortedByCoord.out.sortname_2.fastq.gz    # generated by Bam2Fastq
# │   └── GSM3656922.txt
# │   ......
# └── GSM3656923
#     ├── Aligned.sortedByCoord.out.bam
#     ├── Aligned.sortedByCoord.out.sortname.bam               # generated by Bam2Fastq
#     ├── bam2fastq                                            # generated by Bam2Fastq
#     │   ├── Aligned.sortedByCoord.out.sortname_1.fastq.gz    # generated by Bam2Fastq
#     │   └── Aligned.sortedByCoord.out.sortname_2.fastq.gz    # generated by Bam2Fastq
#     └── GSM3656923.txt
#     ......
#
# 16 directories, 18 files
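The two conversion routes can be sketched as command lines built in R. This is an assumption about what Bam2Fastq runs, inferred from the output trees above (a name-sorted *.sortname.bam followed by _1/_2 fastq files for non-10x bam); the actual commands and options may differ. Bam2FastqCmds is hypothetical, and the commands are only constructed, not executed:

```r
# Hypothetical sketch: build (but do not run) the shell commands for each route.
Bam2FastqCmds <- function(bam, bam.type, tool.path, out.dir, paras = "") {
  if (bam.type == "10x") {
    # 10x bamtofastq: bamtofastq [options] <bam> <output folder>
    return(paste(c(tool.path, paras[nzchar(paras)], bam, out.dir), collapse = " "))
  }
  # other bam: sort by read name so that mates are adjacent, then write R1/R2 fastq
  prefix <- sub("\\.bam$", "", bam)
  sorted <- paste0(prefix, ".sortname.bam")
  fq <- file.path(out.dir, paste0(basename(prefix), ".sortname_", 1:2, ".fastq.gz"))
  c(
    paste(tool.path, "sort -n -o", sorted, bam),
    paste(tool.path, "fastq -1", fq[1], "-2", fq[2], sorted)
  )
}

Bam2FastqCmds("Aligned.sortedByCoord.out.bam", "other", "samtools", "bam2fastq")
Bam2FastqCmds("possorted_genome_bam.bam", "10x", "bamtofastq_linux", "bam2fastq", "--nthreads 4")
```

Name sorting matters for paired-end data because samtools fastq expects the two mates of a pair to be adjacent in its input.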

Load fastq to R

With downloaded or converted fastq files, Fastq2R aligns them to the reference genome with CellRanger (10x-generated fastq files) or STAR (Smart-seq2 or bulk RNA-seq data), and loads the output into Seurat (10x-generated fastq files) or DESeq2 (Smart-seq2 or bulk RNA-seq data).

Here, we use the downloaded fastq files as an example. There are two runs (SRR9004346 and SRR9004351) for sample GSM3745993. When running CellRanger, SRR9004346 and SRR9004351 are processed together as a single merged sample by specifying --sample=SRR9004346,SRR9004351:

# run CellRanger (10x Genomics)
# sample.dir corresponds to sra.folder/GSMXXXX (SplitSRA) or out.folder/GSMXXXX (DownloadFastq)
GSE130636.gsms <- file.path("~/gefetch2r/doc/download_fastq/wget", "GSM3745993")
GSE130636.seu <- Fastq2R(
  sample.dir = GSE130636.gsms,
  ref = "~/refdata-cellranger-GRCh38-3.0.0",
  method = "CellRanger",
  out.folder = "~/gefetch2r/doc/download_fastq/cellranger",
  st.path = "~/cellranger-3.0.2/cellranger",
  st.paras = "--chemistry=auto --jobmode=local"
)
GSE130636.seu
# An object of class Seurat
# 33538 features across 956 samples within 1 assay
# Active assay: RNA (33538 features, 0 variable features)

# # The output structure
# tree -L 2 ~/gefetch2r/doc/download_fastq/cellranger
# ~/gefetch2r/doc/download_fastq/cellranger
# └── GSM3745993
#     ├── _cmdline
#     ├── _filelist
#     ├── _finalstate
#     ├── GSM3745993.mri.tgz
#     ├── _invocation
#     ├── _jobmode
#     ├── _log
#     ├── _mrosource
#     ├── outs
#     ├── _perf
#     ├── SC_RNA_COUNTER_CS
#     ├── _sitecheck
#     ├── _tags
#     ├── _timestamp
#     ├── _uuid
#     ├── _vdrkill
#     └── _versions
#
# 3 directories, 15 files
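The --sample value used above can also be derived from the formatted fastq file names. A minimal sketch with a hypothetical helper SampleArg, assuming the SRRXXXX_S1_L001_R[12]_001.fastq.gz naming shown earlier:

```r
# Hypothetical helper: derive the CellRanger --sample value from 10x-formatted
# fastq file names (e.g. SRR9004346_S1_L001_R1_001.fastq.gz -> SRR9004346).
SampleArg <- function(fastq.files) {
  prefixes <- sub("_S[0-9]+_L[0-9]{3}_[RI][0-9]_[0-9]{3}\\.fastq\\.gz$", "",
                  basename(fastq.files))
  paste0("--sample=", paste(sort(unique(prefixes)), collapse = ","))
}

files <- c("GSM3745993/SRR9004346/SRR9004346_S1_L001_R1_001.fastq.gz",
           "GSM3745993/SRR9004346/SRR9004346_S1_L001_R2_001.fastq.gz",
           "GSM3745993/SRR9004351/SRR9004351_S1_L001_R1_001.fastq.gz",
           "GSM3745993/SRR9004351/SRR9004351_S1_L001_R2_001.fastq.gz")
SampleArg(files) # "--sample=SRR9004346,SRR9004351"
```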


# run STAR (Smart-seq2 or bulk RNA-seq)
GSE127942.gsms <- file.path("~/gefetch2r/doc/fastq2r2/fastq/bulkRNAseq", c("GSM3656922", "GSM3656923"))
GSE127942.obj <- Fastq2R(
  sample.dir = GSE127942.gsms,
  ref = "~/Mus_musculus/star",
  method = "STAR",
  out.folder = "~/gefetch2r/doc/download_fastq/STAR",
  st.path = "~/anaconda3/bin/STAR",
  st.paras = "--outBAMsortingThreadN 4 --twopassMode None"
)
GSE127942.obj
# class: DESeqDataSet
# dim: 55573 2
# metadata(1): version
# assays(1): counts
# rownames(55573): ENSMUSG00000000001 ENSMUSG00000000003 ...
#   ENSMUSG00000118487 ENSMUSG00000118488
# rowData names(0):
# colnames(2): GSM3656922 GSM3656923
# colData names(1): condition