Availability of Primary Data Drops Off Precipitously Over Time

Less than 40% of Literature Data Accessible After Just Two Years
Line graph shows exponential drop in percentage of research data available 2 years after paper publication
Because much of the relevant information is either uncaptured or unstructured, personal communication with the original authors via email or phone calls is often necessary to make sense of any raw source files (if they can even be located).
Vines, Timothy H., et al. "The availability of research data declines rapidly with article age." Current Biology 24.1 (2014): 94-97. https://doi.org/10.1016/j.cub.2013.11.014
  • Tremendous human and societal effort goes into generating data that tragically fades rapidly over time. Data is often trapped in paper notebooks, hard-to-access files, and rigid data systems.
  • It is difficult to build on previous work, creating continuity challenges
  • Secondary data is often unavailable
    • Pertinent information from an experiment can often only be obtained with the assistance of the original bench scientist
    • Data provenance tracking is extremely difficult
  • Building on past data becomes more difficult as it ages
    • Combining new and old data sets is difficult when primary experimental data is not available. Even when it is available, method data is often missing or incomplete
    • Machine learning is stymied by missing information or data that is not reproducible or not well-structured
    • Lost opportunity for meta-analyses and other higher-level conclusions
  • Traditional labs require co-location with scientists, technicians, and support staff, resulting in a significant number of employees who depend on access to the lab
  • Some of the top concerns about remote working¹ in research were:
    • Sharing thoughts with colleagues
    • Collecting data
    • Getting support from colleagues
    • Having the necessary infrastructure
  • Access to scientific talent is limited by the need to co-locate personnel with the lab itself, which shrinks the talent pool, particularly for organizations outside of industry hubs
  • Business risk arising from natural disasters and public health crises is difficult to mitigate, and any situation blocking access to a lab leads to significant loss of work in progress and a halt to new experimentation. A 2020 McKinsey survey reported that “Business-as-usual operations are also clearly being affected, as companies report that R&D labs are operating at below 50 percent of normal capacity,” and that “Across all R&D related groups, companies estimate productivity has fallen by between 25 and 75 percent due to remote working.”²
¹Aczel, B., Kovacs, M., Lippe, T. V. D. & Szaszi, B. Researchers working from home: Benefits and challenges. PLOS ONE 16 (2021)
²Agrawal, G., Parry, B., Suresh, B. & Westra, A. COVID-19 implications for life sciences R&D: Recovery and the next normal. McKinsey & Company (2020). https://www.mckinsey.com/industries/pharmaceuticals-and-medical-products/our-insights/covid-19-implications-for-life-sciences-r-and-d-recovery-and-the-next-normal

Data Generated Compounds in Value Over Time

A laboratory, company, or institution working in this highly structured and connected system can produce a dataset with unprecedented detail, sophistication, and complexity.
Imagine what you could do with just a year of all of your experimental data indexed and searchable on the web!
Scientific Data Generated on ECL
Line graph shows yearly linear growth of scientific data objects generated on the ECL
Data from ECL is:
  • Highly structured
  • Indexed
  • Searchable
  • Instantly available online
  • Linked to the experimental techniques from which it was generated
  • Push-button reproducible
  • All data captured digitally and automatically
    • Eliminates need for paper lab notebooks and printouts
    • No more effort wasted in data transfer to ELN
  • Everything accessible on the cloud to all users with valid credentials
  • Methods valid and reproducible for years after initial execution
  • Data is automatically structured, indexed, and made quickly searchable for instantaneous retrieval
  • Standardized data ontology amenable to data mining and machine learning
  • All data is traceable and linked to its source techniques and lab notebook context
  • Data gathered on enterprise accounts is compliant with FDA data retention and access policies
  • A Pew Research Center survey indicates that 54% of surveyed workers want to work from home after the coronavirus outbreak ends, compared to 20% who were working remotely beforehand¹
  • However, only 36% of life, physical, and social science occupations can be performed remotely²
  • A McKinsey survey found that 52% of respondents would like a more flexible working model after the pandemic³
  • The Cloud Lab can be accessed from any computer connected to the internet, anywhere in the world, in any time zone to perform all of the experiments scientists can conduct in a traditional lab
  • Without being physically tied to the lab, key scientific talent is accessible independent of geography, broadening the talent pool
  • The Cloud Lab can help large businesses mitigate risk arising from natural disasters or public health crises by allowing scientists to conduct experimentation work when they are unable to access the traditional lab environment
¹Parker, K., Horowitz, J. M. & Minkin, R. How Coronavirus Has Changed the Way Americans Work. Pew Research Center's Social & Demographic Trends Project (2021). https://www.pewresearch.org/social-trends/2020/12/09/how-the-coronavirus-outbreak-has-and-hasnt-changed-the-way-americans-work/
²Dingel, J. I. & Neiman, B. How many jobs can be done at home? Journal of Public Economics 189, 104235 (2020)
³Alexander, A., Smet, A. D., Langstaff, M. & Ravid, D. What employees are saying about the future of remote work. McKinsey & Company (2021). https://www.mckinsey.com/business-functions/organization/our-insights/what-employees-are-saying-about-the-future-of-remote-work