Ingest + Export¶
--workflow ingest_export
Downloads full Seurat objects from LabKey for each sample and immediately exports the raw RNA count matrix as a 10x-like directory. No GPU, no HPC required — runs on a local Mac or any Linux host.
Stage-by-stage dataflow¶
flowchart TD
SS["**samplesheet.csv**
sample_id · output_file_id · species"]
INGEST["**INGEST**
(Rdiscvr)
Downloads full Seurat object from LabKey"]
EXPORT_COUNTS["**EXPORT_COUNTS**
(CellMembrane)
Extracts raw counts → 10x-like matrix dir"]
OUT_RDS["outputs/ingest/{id}/{id}.rds
Full Seurat object"]
OUT_COUNTS["outputs/counts/{id}/{id}_counts/
matrix.mtx · features.tsv
barcodes.tsv · obs_meta.csv"]
SS --> INGEST
INGEST -->|"tuple(meta, .rds)"| OUT_RDS
INGEST -->|"tuple(meta, .rds)"| EXPORT_COUNTS
EXPORT_COUNTS -->|"tuple(meta, counts_dir/)"| OUT_COUNTS
Inputs¶
Samplesheet¶
Path: --input (default data/samplesheet.csv)
See Data Formats → Samplesheet.
Required parameters¶
| Parameter | Description |
|---|---|
--labkey_base_url |
LabKey server base URL |
--labkey_folder |
LabKey folder path |
Optional parameters¶
| Parameter | Default | Description |
|---|---|---|
--export_assay |
RNA |
Seurat assay to export as count matrix |
--outdir |
outputs/ |
Output directory |
Outputs¶
INGEST → outputs/ingest/{sample_id}/¶
| File | Description |
|---|---|
{sample_id}.rds |
Full Seurat object (counts + all metadata) downloaded from LabKey. Contains at minimum the cells passing QC and their RNA assay. |
EXPORT_COUNTS → outputs/counts/{sample_id}/{sample_id}_counts/¶
A 10x-like matrix directory compatible with e.g. Seurat::Read10X(), scanpy.read_10x_mtx(), or BPCells::open_matrix_dir():
| File | Description |
|---|---|
matrix.mtx |
Sparse raw count matrix in Market Exchange (MatrixMarket) format. Rows = genes, columns = cells. |
features.tsv |
Gene names, one per row, matching row order in matrix.mtx. |
barcodes.tsv |
Cell barcodes, one per row, matching column order in matrix.mtx. |
obs_meta.csv |
Cell-level metadata from seurat_object[[]] with additional columns sample_id, species, output_file_id, and barcode. |
Synthetic example export¶
The docs and CI use a seeded fixture bundle in tests/fixtures/synthetic_trial_data/ so the exported count layout is visible without any live Prime-seq download.

This is the same file shape produced by EXPORT_COUNTS: matrix.mtx, features.tsv, barcodes.tsv, and obs_meta.csv in a per-sample directory.
For the generated code-level reference, see API Reference → Workflows.
Running locally¶
nextflow run main.nf \
--workflow ingest_export \
--labkey_base_url https://labkey.example.org \
--labkey_folder /My/Project/Folder
On macOS (or Linux without SLURM) the local executor is auto-selected; no -profile flag is required.
To limit output location:
nextflow run main.nf \
--workflow ingest_export \
--outdir ./outputs/dev \
--labkey_base_url https://labkey.example.org \
--labkey_folder /My/Project/Folder
Running on HPC¶
sbatch slurm_nextflow.sh \
--workflow ingest_export \
--labkey_base_url https://labkey.example.org \
--labkey_folder /My/Project/Folder
Resource profile¶
| Step | CPUs | Memory | Wall time |
|---|---|---|---|
| INGEST | 4 | 32 GB | 4 h |
| EXPORT_COUNTS | 4 | 32 GB | 4 h |