Getting Started With Submissions

Overview

To make your data accessible, searchable, and assessable, you should submit as much metadata as possible to the 4DN system, along with the raw files generated in your experiments.

These pages are designed to

  • show you how to find out what kind of metadata we collect for your particular type of experiment
  • introduce the mechanisms by which you can submit your metadata and data to the 4DN data portal.

For an overview of the metadata structure and relationships between different items please see the slides available on the metadata introductory page.

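There are three primary ways to submit data to the 4DN data portal: the online web submission forms, Excel metadata workbooks, and the REST API. Each is described in its own section of this page.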

Notes for prospective submitters

If you would like to submit data to the portal:

  • You will need to create a user account.
  • Contact us at support@4dnucleome.org to discuss your submission and be granted submitter privileges for your account. As necessary we can set up a Zoom call to discuss the details of the submission process and the most convenient approach for your existing system.
  • Please skim through the metadata structure.
  • Check out the other pages in the Help menu for detailed information on the submission process.
  • Of particular note is the required metadata for the biological samples used in experiments, which is specified on this page.
  • IMPORTANT: If you are planning to submit experiments that include genomic data from human patient samples, please let us know as soon as possible. This data likely requires controlled access and dbGaP registration. If you are not sure whether the data you are generating should be considered controlled access, please contact the relevant offices at your institute, your NIH program officers, or Ian Fingerman, who coordinates controlled data issues for 4DN. Personal health information (PHI) should never be submitted with your experimental metadata. Generally, any genomic data generated from human tissue or cell lines must be explicitly consented for broad sharing of genomic information and be considered controlled access data. For more information, consult the NIH Genomic Data Sharing Policy.

Web Submission

The online web submission forms are best used

  • To submit one or a few experiments.
  • To edit one or a few fields of an already submitted but not yet released item.
  • As a hands-on way to gain familiarity with the 4DN data model.

Documentation on how to get started with this interface is here.

Data Submission via Spreadsheet

The Excel metadata workbooks

  • Are useful for submitting metadata and data for several experiments or biosamples
  • Can be used to make bulk edits of submitted but not yet released metadata
  • Contain multiple sheets where each sheet corresponds to an object type and each column a field of metadata
  • Can be generated using the Submit4DN software
  • Are used as input to the Submit4DN software which validates submissions and pushes the content of the forms to our database.

Documentation of the data submission process using these forms can be found here.

REST API

For both metadata and data submission and retrieval, you can also access our database directly via the REST-API.

  • Data objects exchanged with the server conform to the standard JavaScript Object Notation (JSON) format.
  • Our implementation is analogous to the one developed by the ENCODE DCC.

If you would like to directly interact with the REST API for data submission see the documentation here.
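As a rough illustration of what direct access involves, the sketch below builds (but does not send) a GET request for a JSON item. It assumes that the access key and secret are passed as HTTP Basic credentials, which this page does not state; the helper name build_get_request is hypothetical, and the server URL, item path and key values are the sample values used elsewhere on this page.

```python
import base64
import urllib.request


def build_get_request(server, path, key, secret):
    """Build a GET request for one item, asking for a JSON response.

    Assumption: the key/secret pair is sent as HTTP Basic credentials.
    """
    url = server.rstrip("/") + "/" + path.lstrip("/")
    token = base64.b64encode(f"{key}:{secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Accept": "application/json",  # objects are exchanged as JSON
    }
    return urllib.request.Request(url, headers=headers, method="GET")


# Sample values only; replace them with your own server, item and keys.
req = build_get_request(
    "https://data.4dnucleome.org/", "/labs/peter-park-lab/",
    "ABCDEFG", "abcdefabcd1ab",
)
print(req.get_method(), req.full_url)

# Sending the request needs network access and valid keys:
# import json
# with urllib.request.urlopen(req) as response:
#     item = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any data leaves your machine.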

Notes on Experiments and Replicate Sets

Biological replicates

  • The 4DN Consortium strongly encourages that experiments be performed using at least two different preparations of the same source biomaterial - i.e. bioreplicates.
  • When submitting metadata you should submit two Experiments that use the same Biosource, but have different Biosamples.
  • In many cases the only difference between Biosamples may be the dates at which the cell culture or tissue was harvested.
  • The experimental techniques and parameters will be shared by all experiments of the same bioreplicate set.

Technical replicates

  • Multiple libraries generated from the same sample and run through the same assay are considered technical replicates.

  • Note that multiple sequencing runs performed at different times using the same library can be linked to a single replicate experiment.

Submitting replicate information

  • The replicate information is stored and represented as a set of experiments that includes labels indicating the replicate type and replicate number of each experiment in the set.

  • The mechanism that you use to submit your metadata dictates the type of item with which you associate replicate information.

    • In Excel workbooks, bioreplicate and technical replicate numbers are entered in the Experiment sheet.

    • Using the API, you directly associate the replicate information (i.e. replicate number and the experiment identifier) with the ExperimentSetReplicate objects.

    • Using the web submission interface, the replicate numbers and linked experiments are added from the ExperimentSetReplicate page.

  • In the database the information will always end up directly associated with ExperimentSetReplicate objects.

  • Specific details on formatting replicate information are given in the Spreadsheet Submission page.

  • When submitting using the REST API, you should format your JSON according to the specifications in the schema, as described in the REST API page.
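To make the shape of the replicate information more concrete, here is a minimal sketch that assembles an ExperimentSetReplicate-style JSON body from a list of experiments. The field names replicate_exps, replicate_exp, bio_rep_no and tec_rep_no are assumptions and should be checked against the experiment_set_replicate.json schema; the helper name and the aliases are hypothetical.

```python
import json


def build_replicate_set(alias, experiments):
    """Assemble a replicate-set body.

    experiments: iterable of (experiment_identifier, bio_rep_no, tec_rep_no).
    Field names below are assumed, not taken from the schema.
    """
    seen = set()
    replicate_exps = []
    for exp_id, bio_rep_no, tec_rep_no in experiments:
        if bio_rep_no < 1 or tec_rep_no < 1:
            raise ValueError("replicate numbers must be positive")
        if (bio_rep_no, tec_rep_no) in seen:
            raise ValueError(
                f"duplicate replicate numbers: {bio_rep_no}, {tec_rep_no}")
        seen.add((bio_rep_no, tec_rep_no))
        replicate_exps.append({
            "replicate_exp": exp_id,   # experiment identifier
            "bio_rep_no": bio_rep_no,  # biological replicate number
            "tec_rep_no": tec_rep_no,  # technical replicate number
        })
    return {"aliases": [alias], "replicate_exps": replicate_exps}


# Two bioreplicates; the first has two technical replicates.
body = build_replicate_set(
    "peter-park-lab:my-replicate-set",
    [
        ("peter-park-lab:exp-b1-t1", 1, 1),
        ("peter-park-lab:exp-b1-t2", 1, 2),
        ("peter-park-lab:exp-b2-t1", 2, 1),
    ],
)
print(json.dumps(body, indent=2))
```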

Referencing existing objects

Using aliases

Aliases are a convenient way for you to refer to other items that you are submitting or have submitted in the past.

  • An alias is a lab-specific identifier that you can assign to any item.
  • An alias takes the form of lab:id_string, e.g. peter-park-lab:my-alias.
  • An alias must be unique across all items.
  • Generally, it is good practice to assign an alias to any item that you submit.
  • If you use the Online Submission Interface to create new items, designating an alias is the first required step.
  • Once you submit an alias for an Item, that alias can be used as an identifier for that Item in the current submission as well as in any subsequent submission.
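The alias rules above can be checked locally before submission. The following is a minimal sketch; the exact set of characters the portal accepts is not specified here, so the pattern (no whitespace, exactly one separating colon) is an assumption, and both helper names are hypothetical.

```python
import re

# Assumed shape: "<lab>:<id_string>", no whitespace, one separating colon.
ALIAS_PATTERN = re.compile(r"[^\s:]+:[^\s:]+")


def make_alias(lab, id_string):
    """Join a lab name and an identifier into a lab:id_string alias."""
    alias = f"{lab}:{id_string}"
    if not ALIAS_PATTERN.fullmatch(alias):
        raise ValueError(f"not a well-formed alias: {alias!r}")
    return alias


def find_duplicate_aliases(aliases):
    """Return aliases that occur more than once, in first-seen order."""
    seen, duplicates = set(), []
    for alias in aliases:
        if alias in seen and alias not in duplicates:
            duplicates.append(alias)
        seen.add(alias)
    return duplicates


print(make_alias("peter-park-lab", "my-alias"))
```

A local duplicate check only covers the aliases in your own submission; uniqueness against items already in the database is still enforced by the server.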

Other ways to reference existing items

You don't need to use an alias if you are referencing an item that already exists in the database.

Any of the following can be used to reference an existing item in an Excel sheet or when using the REST-API.

  • accession - Objects of some types (e.g. Files, Experiments, Biosamples, Biosources, Individuals...) are accessioned, e.g. 4DNEX4723419.
  • uuid - Every item in our database is assigned a “uuid” upon its creation, e.g. “44d3cdd1-a842-408e-9a60-7afadca11575”.
  • type/ID - In a few cases, object-specific identifying terms are also available, e.g. award number for awards or lab name for labs (see table below).
    Object     Field   type/ID                   ID
    Lab        name    /labs/peter-park-lab/     peter-park-lab
    Award      number  /awards/ODO1234567-01/    ODO1234567-01
    User       email   /users/test@test.com/     test@test.com
    Vendor     name    /vendors/fermentas/       fermentas
    Enzyme     name    /enzymes/HindIII/         HindIII
    Construct  name    /constructs/GFP-H1B/      GFP-H1B
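When preparing a submission, it can help to check which kind of reference a given string is. The sketch below sorts references into the identifier styles described above. The accession pattern is inferred only from the two example accessions on this page and is an assumption, as is the helper name classify_reference.

```python
import re
import uuid

# Assumption: "4DN" + two letters + seven alphanumerics, inferred from
# the examples 4DNEX4723419 and 4DNSRV3SKQ8M only.
ACCESSION_PATTERN = re.compile(r"4DN[A-Z]{2}[A-Z0-9]{7}")


def classify_reference(ref):
    """Guess which identifier style a reference string uses."""
    if ACCESSION_PATTERN.fullmatch(ref):
        return "accession"
    try:
        # Accept only the canonical hyphenated form.
        if str(uuid.UUID(ref)) == ref.lower():
            return "uuid"
    except ValueError:
        pass
    # type/ID paths look like /<collection>/<id>/
    if ref.startswith("/") and ref.endswith("/") and ref.count("/") == 3:
        return "type/id"
    if ":" in ref:
        return "alias"
    return "unknown"


for example in (
    "4DNEX4723419",
    "44d3cdd1-a842-408e-9a60-7afadca11575",
    "/labs/peter-park-lab/",
    "peter-park-lab:my-alias",
):
    print(example, classify_reference(example))
```

A bare ID such as fermentas is reported as unknown, because it cannot be told apart from arbitrary text without knowing the object type.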


  • Many of the objects that you may need for your submissions may already exist on the 4DN web site.
  • We encourage submitters to use existing database items as much as possible.
  • Common reusable items include:
    • Vendors
    • Enzymes
    • Biosources
    • Protocols
  • For example, if there is an existing biosource (e.g. accession 4DNSRV3SKQ8M for H1-hESC (Tier 1) ) for the new biosample you are creating, you should reference the existing one instead of creating a new one.

Getting Connection Keys for the 4DN-DCIC servers

If you have been designated as a submitter for the project and plan to use either our spreadsheet-based submission system or the REST-API, you need an access key and a secret key to establish a connection to the 4DN database and to fetch, upload (post), or change (patch) data. Please follow these steps to get your keys.

  1. Log in to the 4DN website with your username (email) and password. If you have not yet created an account, see this page for instructions.
  2. Once logged in, go to your “Profile” page by clicking Account on the upper right side of the page.
  3. In your profile page, click the green “Add Access Key” button, and copy the “access key ID” and “secret access key” values from the pop-up page. Note that once the pop-up page disappears you will not be able to see the secret access key value. However, if you forget or lose your secret key you can always delete and add new access keys from your profile page at any time.
https://s3.amazonaws.com/4dn-dcic-public/static-pages/AddAccessKey.png
  4. Create a file to store this information.
    • By default, the submission software looks for a file named "keypairs.json" in your home directory.
    • However, you can also specify your own filename and file location as parameters to the software (see below).
    • The key information is stored in JSON format and is used to establish a secure connection.
    • The JSON must be formatted as shown below - replace key and secret with your new “Access Key ID” and “Secret Access Key”.
    • You can use the same key and secret to use the 4DN REST-API.

Sample content for keypairs.json

{
  "default": {
    "key": "ABCDEFG",
    "secret": "abcdefabcd1ab",
    "server": "https://data.4dnucleome.org/"
  }
}

Tip: If you don’t want to use that filename or keep the file in your home directory you can use:

  • the --keyfile parameter as an argument to any of the scripts to provide the path to your keypairs file.

  • the --key parameter to indicate a stored key name.

    import_data --keyfile Path/name_of_file.json --key NotDefault
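If you use the same key file from your own scripts, the lookup described above can be sketched as follows. This is not the Submit4DN implementation, only an illustration of the same convention (default file in the home directory, optional file path and key name); the function name load_key is hypothetical.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = {"key", "secret", "server"}


def load_key(keyfile=None, key_name="default"):
    """Read one named entry from a keypairs file.

    Falls back to ~/keypairs.json when no keyfile is given.
    """
    path = Path(keyfile) if keyfile else Path.home() / "keypairs.json"
    with open(path) as handle:
        keypairs = json.load(handle)
    if key_name not in keypairs:
        raise KeyError(f"no key named {key_name!r} in {path}")
    entry = keypairs[key_name]
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        raise ValueError(f"{key_name!r} is missing fields: {sorted(missing)}")
    return entry


# Demo with a throwaway file holding the sample values from this page.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "name_of_file.json"
    demo_file.write_text(json.dumps({
        "NotDefault": {
            "key": "ABCDEFG",
            "secret": "abcdefabcd1ab",
            "server": "https://data.4dnucleome.org/",
        }
    }))
    print(load_key(demo_file, "NotDefault")["server"])
```

Because the file contains your secret key, consider restricting its permissions so that only your own user account can read it.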
    

Schema information

Schema Filename                Worksheet Name          Collection Name(s)
analysis_step.json             AnalysisStep            analysis-steps, analysis_step
award.json                     Award                   award(s)
biosample.json                 Biosample               biosample(s)
biosample_cell_culture.json    BiosampleCellCulture    biosample-cell-cultures, biosample_cell_culture
biosource.json                 Biosource               biosource(s)
construct.json                 Construct               construct(s)
document.json                  Document                document(s)
enzyme.json                    Enzyme                  enzyme(s)
experiment_atacseq.json        ExperimentAtacseq       experiments-atacseq, experiment_atacseq
experiment_capture_c.json      ExperimentCaptureC      experiments-capture-c, experiment_capture_c
experiment_chiapet.json        ExperimentChiapet       experiments-chiapet, experiment_chiapet
experiment_hi_c.json           ExperimentHiC           experiments-hi-c, experiment_hi_c
experiment_mic.json            ExperimentMic           experiments-mic, experiment_mic
experiment_repliseq.json       ExperimentRepliseq      experiments-repliseq, experiment_repliseq
experiment_seq.json            ExperimentSeq           experiments-seq, experiment_seq
experiment_set.json            ExperimentSet           experiment-sets, experiment_set
experiment_set_replicate.json  ExperimentSetReplicate  experiment-set-replicates, experiment_set_replicate
file_calibration.json          FileCalibration         files-calibration, file_calibration
file_fasta.json                FileFasta               files-fasta, file_fasta
file_fastq.json                FileFastq               files-fastq, file_fastq
file_processed.json            FileProcessed           files-processed, file_processed
file_reference.json            FileReference           files-reference, file_reference
file_set.json                  FileSet                 file-sets, file_set
file_set_calibration.json      FileSetCalibration      file-set-calibrations, file_set_calibration
genomic_region.json            GenomicRegion           genomic-regions, genomic_region
image.json                     Image                   image(s)
imaging_path.json              ImagingPath             imaging-paths, imaging_path
individual_human.json          IndividualHuman         individuals-human, individual_human
individual_mouse.json          IndividualMouse         individuals-mouse, individual_mouse
lab.json                       Lab                     lab(s)
modification.json              Modification            modification(s)
ontology.json                  Ontology                ontology(s)
ontology_term.json             OntologyTerm            ontology-terms, ontology_term
organism.json                  Organism                organism(s)
protocol.json                  Protocol                protocol(s)
publication.json               Publication             publication(s)
publication_tracking.json      PublicationTracking     publication-trackings, publication_tracking
quality_metric_bamqc.json      QualityMetricBamqc      quality-metrics-bamqc, quality_metric_bamqc
quality_metric_fastqc.json     QualityMetricFastqc     quality-metrics-fastqc, quality_metric_fastqc
quality_metric_flag.json       QualityMetricFlag       quality-metric-flags, quality_metric_flag
quality_metric_pairsqc.json    QualityMetricPairsqc    quality-metrics-pairsqc, quality_metric_pairsqc
software.json                  Software                software(s)
sop_map.json                   SopMap                  sop-maps, sop_map
summary_statistic.json         SummaryStatistic        summary-statistics, summary_statistic
summary_statistic_hi_c.json    SummaryStatisticHiC     summary-statistics-hi-c, summary_statistic_hi_c
target.json                    Target                  target(s)
treatment_chemical.json        TreatmentChemical       treatments-chemical, treatment_chemical
treatment_rnai.json            TreatmentRnai           treatments-rnai, treatment_rnai
user.json                      User                    user(s)
vendor.json                    Vendor                  vendor(s)
workflow.json                  Workflow                workflow(s)
workflow_mapping.json          WorkflowMapping         workflow-mappings, workflow_mapping
workflow_run.json              WorkflowRun             workflow-runs, workflow_run
workflow_run_sbg.json          WorkflowRunSbg          workflow-runs-sbg, workflow_run_sbg
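
In every row of the table, the worksheet name is the schema filename with the .json extension removed and each underscore-separated part capitalized. The sketch below captures that observed pattern; it is derived from the table only, not from the submission software, and the function name worksheet_name is hypothetical. Collection names are pluralized less regularly, so they are not derived here.

```python
def worksheet_name(schema_filename):
    """Convert a schema filename such as experiment_hi_c.json to ExperimentHiC."""
    suffix = ".json"
    if not schema_filename.endswith(suffix):
        raise ValueError(f"expected a .json filename: {schema_filename!r}")
    stem = schema_filename[: -len(suffix)]
    # Capitalize each underscore-separated part and join them.
    return "".join(part.capitalize() for part in stem.split("_"))


for name in ("award.json", "experiment_hi_c.json", "workflow_run_sbg.json"):
    print(name, worksheet_name(name))
```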