Availability of Primary Data Drops Off Precipitously Over Time

Less than 40% of Literature Data Accessible After Just Two Years
Line graph shows exponential drop in percentage of research data available 2 years after paper publication
Because much of the relevant information is either uncaptured or unstructured, personal communication with the original authors via email or phone calls is often necessary to make sense of any raw source files (if they can even be located).
Vines, Timothy H., et al. "The availability of research data declines rapidly with article age." Current Biology 24.1 (2014): 94-97. https://doi.org/10.1016/j.cub.2013.11.014
  • Tremendous human and societal effort goes into generating data that tragically fades rapidly over time. Data is often trapped in paper notebooks, hard-to-access files, and rigid data systems.
  • It is difficult to build on previous work, creating continuity challenges
  • Secondary data is often unavailable
    • Pertinent information from an experiment can often only be obtained with the assistance of the original bench scientist
    • Data provenance tracking is extremely difficult
  • Building on past data becomes more difficult as it ages
    • Combining new and old data sets is difficult when primary experimental data is not available. Even when primary data is available, method data is often missing or incomplete
    • Machine learning is stymied by missing information or data that is not reproducible or not well-structured
    • Lost opportunity for meta-analyses and other higher level conclusions

Generated Data Compounds in Value Over Time

A laboratory, company, or institution working in this highly structured and connected system can produce a dataset with unprecedented detail, sophistication, and complexity.
Imagine what you could do with just a year of all of your experimental data indexed and searchable on the web!
Scientific Data Generated on ECL
Line graph shows yearly linear growth of scientific data objects generated on the ECL
Data from ECL is:
  • Highly structured
  • Indexed
  • Searchable
  • Instantly available online
  • Linked to the experimental techniques from which it was generated
  • Push-button reproducible
  • All data captured digitally and automatically
    • Eliminates need for paper lab notebooks and printouts
    • No more effort wasted transferring data to an electronic lab notebook (ELN)
  • Everything accessible securely on the cloud to all users
  • Methods valid and reproducible for years after initial execution
  • Data is automatically structured, indexed, and made quickly searchable for instantaneous retrieval
  • Standardized data ontology enables data mining and machine learning
  • All data is traceable and linked to its source techniques and lab notebook context
  • Data gathered on enterprise accounts is compliant with FDA data retention and access policies