Uploading Datasets

This guide walks you through uploading a BIDS dataset to NEMAR.

Before uploading:

  • Dataset is in valid BIDS format
  • Logged in with nemar auth login
  • git-annex installed
  • GitHub CLI (gh) installed and authenticated
  • Sandbox training completed (nemar sandbox)
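Part of this checklist can be scripted. The following is a minimal sketch and not part of the NEMAR CLI: `missing_tools` is a hypothetical helper that reports which of the named executables are not on `PATH`. It assumes the executables are called `nemar`, `git-annex`, and `gh`, and it does not check login state or sandbox completion.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; not part of the NEMAR CLI.
# Prints the name of each requested executable that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools named in the checklist.
# (Executable names are assumed to match the tool names.)
missing="$(missing_tools nemar git-annex gh)"
if [ -n "$missing" ]; then
  printf 'Missing prerequisites:\n%s\n' "$missing" >&2
fi
```

An empty result only means the executables were found; authentication still has to be confirmed with the tools themselves.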

Always validate before uploading:

nemar dataset validate ./my-dataset

Common validation issues:

| Issue | Solution |
| --- | --- |
| Missing dataset_description.json | Create the required BIDS metadata file |
| Invalid JSON | Check for syntax errors in JSON files |
| Missing required fields | Add Name and BIDSVersion to dataset_description.json |
| Invalid modality data | Ensure data files match BIDS naming conventions |

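The first and third issues can be caught with a quick local check before running the validator. This is a rough sketch under stated assumptions: `check_description` is a hypothetical helper, it only looks for the `Name` and `BIDSVersion` keys with `grep`, and it does not verify that the file is valid JSON. It is not a substitute for `nemar dataset validate`.

```shell
# Hypothetical quick check; not a substitute for `nemar dataset validate`.
# Succeeds if dataset_description.json exists in the given directory and
# mentions both required keys; otherwise prints the first problem found.
check_description() {
  local file="$1/dataset_description.json" key
  if [ ! -f "$file" ]; then
    echo "missing dataset_description.json"
    return 1
  fi
  for key in Name BIDSVersion; do
    # Coarse text match for "key": anywhere in the file.
    if ! grep -q "\"$key\"[[:space:]]*:" "$file"; then
      echo "missing required field: $key"
      return 1
    fi
  done
  return 0
}
```

For example, `check_description ./my-dataset` prints nothing and returns success when both keys are present.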
Upload the dataset:

nemar dataset upload ./my-dataset

| Option | Description |
| --- | --- |
| --name, -n | Dataset name (defaults to the BIDS Name field, then the directory name) |
| --description | Brief description |
| --skip-validation | Skip BIDS validation (not recommended) |
| --skip-orcid | Skip co-author ORCID collection |
| --dry-run | Show what would be uploaded without uploading anything |
| --restart | Clear upload progress and re-upload all files |
| -j, --jobs | Number of parallel upload jobs (default: 4) |
| -y, --yes | Skip confirmation and proceed |

The upload process:

  1. Prerequisite Check - Verifies that the required tools (git-annex, gh, aws) are installed and gives platform-specific install guidance for any that are missing
  2. Auth and Prerequisites - Verifies NEMAR login and GitHub authentication (HTTPS preferred, SSH as fallback)
  3. BIDS Validation - Runs the official BIDS validator (unless skipped)
  4. File Manifest - Collects files and co-author ORCIDs
  5. License Enforcement - Detects the license from dataset_description.json or a LICENSE file and prompts you to select one if none is found. Validates that the license allows research redistribution (see License Requirements below)
  6. Data Provenance - For derived datasets, collects source dataset DOIs and checks license compatibility
  7. Confirmation - Shows upload plan for review
  8. Dataset Registration - Creates dataset record and private GitHub repo
  9. GitHub Invitation - Accepts collaborator invitation to the repo
  10. git-annex Init - Initializes git-annex and configures S3 remote
  11. Data Upload - Uploads large files to S3 (uses AWS CLI fast-path when available)
  12. Metadata and Push - Writes metadata, commits, and pushes to GitHub
  13. CI Deployment - Deploys GitHub Actions workflows for validation

License Requirements

Every dataset uploaded to NEMAR must have a license that allows research redistribution. The CLI will:

  1. Auto-detect your license from dataset_description.json (the License field) or a LICENSE file
  2. Prompt you to choose if no license is found, offering recommended open data licenses
  3. Validate that the chosen license allows research use
  4. Create a LICENSE file if one does not exist

Recommended licenses (most permissive to most restrictive):

| License | Description |
| --- | --- |
| CC0-1.0 | Public domain dedication (no restrictions) |
| PDDL-1.0 | Public Domain Dedication and License |
| CC-BY-4.0 | Attribution only |
| CC-BY-SA-4.0 | Attribution + ShareAlike |
| CC-BY-NC-4.0 | Attribution + NonCommercial |
| CC-BY-NC-SA-4.0 | Attribution + NonCommercial + ShareAlike |
| ODC-By-1.0 | Open Data Commons Attribution |
| ODbL-1.0 | Open Database License |

You can also enter any valid SPDX license identifier manually if it allows research use.
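To see ahead of time whether an identifier is on the recommended list, a small lookup is enough. This is an illustrative sketch: `is_recommended_license` is a hypothetical helper, the match is case-sensitive, and the CLI's own acceptance rule is not shown in this guide, so a miss here does not mean the CLI would reject the license.

```shell
# Hypothetical helper: succeeds if the identifier is one of the
# recommended licenses from this guide. The CLI may accept other SPDX
# identifiers too, so a failure here is not a rejection.
is_recommended_license() {
  case "$1" in
    CC0-1.0|PDDL-1.0|CC-BY-4.0|CC-BY-SA-4.0|CC-BY-NC-4.0|CC-BY-NC-SA-4.0|ODC-By-1.0|ODbL-1.0)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

For example, `is_recommended_license CC-BY-4.0 && echo "recommended"` prints `recommended`.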

The CLI uses HTTPS-first authentication for GitHub operations:

  1. CI/CD: Uses GH_TOKEN environment variable if set
  2. Local (preferred): Uses gh auth token from GitHub CLI, so run gh auth login first
  3. Fallback: SSH key authentication (nemar auth setup-ssh)
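The lookup order above can be sketched as a shell function. This only illustrates the documented order and is not the CLI's actual implementation: `github_auth_source` and the labels it prints (`env`, `gh-cli`, `ssh`) are hypothetical.

```shell
# Sketch of the documented lookup order; the real CLI logic may differ.
# Prints which credential source would be used: env, gh-cli, or ssh.
github_auth_source() {
  if [ -n "${GH_TOKEN:-}" ]; then
    echo "env"        # 1. CI/CD: GH_TOKEN is set
  elif command -v gh >/dev/null 2>&1 && gh auth token >/dev/null 2>&1; then
    echo "gh-cli"     # 2. Local: a token is available from the GitHub CLI
  else
    echo "ssh"        # 3. Fallback: SSH key authentication
  fi
}
```

Running `github_auth_source` before an upload shows which path to expect. If it prints `ssh` on a machine where HTTPS was intended, `gh auth login` has probably not been run.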

After the initial upload, push changes using the CLI:

cd nm000104 # Your dataset directory
# Make changes, then save and push
nemar dataset save -m "Add subjects 101-110"
nemar dataset push

Or create a formal update PR:

nemar dataset update ./nm000104

If you run into authentication errors:

# Check login status
nemar auth status --refresh
# Re-login if needed
nemar auth login

If git-annex operations fail:

# Ensure git-annex is configured
git annex version
# Re-initialize if needed
git annex init

The upload tracks progress automatically. Re-run the same command to resume:

# Resume from where it left off
nemar dataset upload ./my-dataset
# Or start fresh if resume fails
nemar dataset upload ./my-dataset --restart
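Because re-running the command resumes the upload, an unreliable connection can be handled with a simple retry loop. The sketch below assumes that `nemar dataset upload` exits with a non-zero status when it fails, which this guide does not state. `retry`, `MAX_TRIES`, and `RETRY_DELAY` are hypothetical names, not CLI features.

```shell
# Hypothetical retry wrapper: runs a command up to MAX_TRIES times
# (default 3), pausing RETRY_DELAY seconds (default 30) between attempts.
retry() {
  local max="${MAX_TRIES:-3}" delay="${RETRY_DELAY:-30}" attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (assumes the upload exits non-zero when it fails):
#   retry nemar dataset upload ./my-dataset
```

If every attempt fails, fall back to `--restart` manually rather than adding it to the loop, since it discards the recorded progress.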