Uploading Datasets
This guide walks you through uploading a BIDS dataset to NEMAR.
Prerequisites
Before uploading:
- Dataset is in valid BIDS format
- Logged in with `nemar auth login`
- git-annex installed
- GitHub CLI (`gh`) installed and authenticated
- Sandbox training completed (`nemar sandbox`)
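As a quick way to confirm the tools in the list above are on your PATH, here is a minimal sketch. The helper name `missing_tools` is made up for illustration and is not part of the NEMAR CLI; it only checks that each executable can be found, not that you are logged in.

```shell
# Print the name of each listed tool that is not found on PATH.
# (missing_tools is a hypothetical helper, not a NEMAR command.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in the prerequisites above.
missing="$(missing_tools nemar git-annex gh)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```

To check GitHub CLI authentication separately, `gh auth status` reports whether `gh` is logged in.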
Step 1: Validate Your Dataset
Always validate before uploading:
    nemar dataset validate ./my-dataset
Common Validation Issues
| Issue | Solution |
|---|---|
| Missing dataset_description.json | Create the required BIDS metadata file |
| Invalid JSON | Check for syntax errors in JSON files |
| Missing required fields | Add Name and BIDSVersion to dataset_description.json |
| Invalid modality data | Ensure data files match BIDS naming conventions |
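The first and third rows of the table can be caught with a rough local pre-check before running the validator. The sketch below assumes a plain text match is good enough for a first pass; it does not parse JSON and is no substitute for `nemar dataset validate`. The function name `check_description` is hypothetical.

```shell
# Rough pre-check: does dataset_description.json exist and mention
# the two required keys? This is a text match, not a JSON parse.
# (check_description is a hypothetical helper, not a NEMAR command.)
check_description() {
  desc="$1/dataset_description.json"
  if [ ! -f "$desc" ]; then
    echo "missing dataset_description.json"
    return 1
  fi
  for key in Name BIDSVersion; do
    if ! grep -q "\"$key\"[[:space:]]*:" "$desc"; then
      echo "missing field: $key"
      return 1
    fi
  done
  echo "ok"
}
```

Calling `check_description ./my-dataset` prints `ok`, or the first problem it finds.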
Step 2: Upload
    nemar dataset upload ./my-dataset
Options
| Option | Description |
|---|---|
| `--name`, `-n` | Dataset name (defaults to BIDS Name field, then directory name) |
| `--description` | Brief description |
| `--skip-validation` | Skip BIDS validation (not recommended) |
| `--skip-orcid` | Skip co-author ORCID collection |
| `--dry-run` | Show what would be uploaded without doing it |
| `--restart` | Clear upload progress and re-upload all files |
| `-j`, `--jobs` | Number of parallel upload jobs (default: 4) |
| `-y`, `--yes` | Skip confirmation and proceed |
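One way to combine these options is to preview with `--dry-run` and upload only if the preview succeeds. This is a sketch under two assumptions that the table does not confirm: that `--dry-run` exits with a non-zero status on failure, and that `--jobs` takes its value as the next argument. The wrapper name `preview_then_upload` is hypothetical.

```shell
# Preview the upload first; run the real upload only if the preview succeeds.
# (preview_then_upload is a hypothetical wrapper, not a NEMAR command.)
preview_then_upload() {
  dataset="$1"
  jobs="${2:-4}"   # 4 is the documented default for --jobs
  # Assumption: --dry-run returns non-zero when something is wrong.
  nemar dataset upload "$dataset" --dry-run || return 1
  # Assumption: --jobs accepts its value as a separate argument.
  nemar dataset upload "$dataset" --jobs "$jobs"
}
```

For example, `preview_then_upload ./my-dataset 8` would preview the upload and then run it with eight parallel jobs.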
Step 3: What Happens
The upload process:
- Prerequisite Check - Verifies that required tools (git-annex, gh, aws) are installed, and gives platform-specific install guidance if any are missing
- Auth and Prerequisites - Verifies NEMAR login and GitHub authentication (HTTPS preferred, SSH as fallback)
- BIDS Validation - Runs the official BIDS validator (unless skipped)
- File Manifest - Collects files and co-author ORCIDs
- License Enforcement - Detects license from `dataset_description.json` or LICENSE file; prompts to select one if missing. Validates the license allows research redistribution (see License Requirements below)
- Data Provenance - For derived datasets, collects source dataset DOIs and checks license compatibility
- Confirmation - Shows upload plan for review
- Dataset Registration - Creates dataset record and private GitHub repo
- GitHub Invitation - Accepts collaborator invitation to the repo
- git-annex Init - Initializes git-annex and configures S3 remote
- Data Upload - Uploads large files to S3 (uses AWS CLI fast-path when available)
- Metadata and Push - Writes metadata, commits, and pushes to GitHub
- CI Deployment - Deploys GitHub Actions workflows for validation
License Requirements
Every dataset uploaded to NEMAR must have a license that allows research redistribution. The CLI will:
- Auto-detect your license from `dataset_description.json` (the `License` field) or a LICENSE file
- Prompt you to choose if no license is found, offering recommended open data licenses
- Validate that the chosen license allows research use
- Create a LICENSE file if one does not exist
Recommended licenses (roughly most permissive to most restrictive):
| License | Description |
|---|---|
| CC0-1.0 | Public domain dedication (no restrictions) |
| PDDL-1.0 | Public Domain Dedication and License |
| CC-BY-4.0 | Attribution only |
| CC-BY-SA-4.0 | Attribution + ShareAlike |
| CC-BY-NC-4.0 | Attribution + NonCommercial |
| CC-BY-NC-SA-4.0 | Attribution + NonCommercial + ShareAlike |
| ODC-By-1.0 | Open Data Commons Attribution |
| ODbL-1.0 | Open Database License |
You can also enter any valid SPDX license identifier manually if it allows research use.
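To check ahead of time whether an identifier is one of the recommended licenses, a small lookup is enough. The sketch below only mirrors the table above; it is not the CLI's validation logic, and an identifier outside the list may still be accepted if it allows research use. The function name `is_recommended_license` is hypothetical, and the match is case-sensitive.

```shell
# Return 0 if the SPDX identifier is in the recommended list above.
# (is_recommended_license is a hypothetical helper; it does not
# reproduce the CLI's own license validation.)
is_recommended_license() {
  case "$1" in
    CC0-1.0|PDDL-1.0|CC-BY-4.0|CC-BY-SA-4.0) return 0 ;;
    CC-BY-NC-4.0|CC-BY-NC-SA-4.0|ODC-By-1.0|ODbL-1.0) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_recommended_license CC-BY-4.0 && echo "recommended"` prints `recommended`.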
GitHub Authentication
The CLI uses HTTPS-first authentication for GitHub operations:
- CI/CD: Uses `GH_TOKEN` environment variable if set
- Local (preferred): Uses `gh auth token` from GitHub CLI, so run `gh auth login` first
- Fallback: SSH key authentication (`nemar auth setup-ssh`)
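The HTTPS lookup order described above can be sketched in a few lines of shell. This illustrates the order only and is not the CLI's actual implementation; the function name `resolve_github_token` is hypothetical, and the SSH fallback is left out because it does not involve a token.

```shell
# Print a GitHub token using the order described above:
# 1. GH_TOKEN environment variable, 2. `gh auth token`.
# Returns 1 if neither yields a token (the point where SSH would be tried).
# (resolve_github_token is a hypothetical helper, not a NEMAR command.)
resolve_github_token() {
  if [ -n "${GH_TOKEN:-}" ]; then
    printf '%s\n' "$GH_TOKEN"
  elif command -v gh >/dev/null 2>&1 \
      && token="$(gh auth token 2>/dev/null)" \
      && [ -n "$token" ]; then
    printf '%s\n' "$token"
  else
    return 1
  fi
}
```

Running `resolve_github_token >/dev/null && echo "HTTPS auth available"` shows whether either HTTPS source is usable on the current machine.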
Step 4: Making Updates
After initial upload, push changes using the CLI:
    cd nm000104  # Your dataset directory
    # Make changes, then save and push
    nemar dataset save -m "Add subjects 101-110"
    nemar dataset push
Or create a formal update PR:
    nemar dataset update ./nm000104
Troubleshooting
Upload Fails with Authentication Error
    # Check login status
    nemar auth status --refresh
    # Re-login if needed
    nemar auth login
git-annex Errors
    # Ensure git-annex is configured
    git annex version
    # Re-initialize if needed
    git annex init
Upload Interrupted or Timed Out
The upload tracks progress automatically. Re-run the same command to resume:
    # Resume from where it left off
    nemar dataset upload ./my-dataset
    # Or start fresh if resume fails
    nemar dataset upload ./my-dataset --restart
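Because re-running the same command resumes the upload, an unreliable connection can be handled with a simple retry loop. The sketch below assumes the upload command exits with a non-zero status when it is interrupted, which this guide does not state. The wrapper name `upload_with_retry` is hypothetical.

```shell
# Re-run the upload until it succeeds or the attempt limit is reached.
# Each re-run resumes from recorded progress, as described above.
# (upload_with_retry is a hypothetical wrapper, not a NEMAR command.)
upload_with_retry() {
  dataset="$1"
  attempts="${2:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Assumption: a failed or interrupted upload returns non-zero.
    nemar dataset upload "$dataset" && return 0
    echo "Upload attempt $i failed" >&2
    i=$((i + 1))
  done
  return 1
}
```

For example, `upload_with_retry ./my-dataset 5` tries up to five times. If every attempt fails, running the upload once with `--restart` is the documented way to start fresh.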