In 2006, the UK Biobank (UKB), a long-term national biobank that is available to researchers internationally and funded by the Wellcome Trust and the UK Medical Research Council (MRC), began an ambitious project: recruiting 500,000 people between the ages of 40 and 69 to create a database that would be readily available to researchers working to improve the prevention, diagnosis, and treatment of a wide range of diseases.
During the recruitment process the UKB collected a wide range of data, from blood pressure measurements to blood and urine samples. The result of such a large-scale collection is 500,000 samples with over 800,000 data files and more than 10 million variants.
Along with the initial collection of data, UKB’s principal investigator Rory Collins tells Clinical Research News the coordinating team has kept in close contact with consented participants.
“We’ve been sending out web questionnaires for them to complete, initially to get more information about their exposures, such as their diet, lifetime occupational history, which it wasn’t possible to get during their initial assessment due to time pressures,” Collins says. “But increasingly, we’re using this approach to follow aspects of their health beyond what we would get from their medical records – information that may be under-diagnosed and under-recorded in traditional health records, such as cognitive function, mood and depression, and quality of life.”
These data add a broader range of information about the participants’ health, something Collins says will increase the biobank’s value for researchers.
The UKB’s aim was to provide rich, raw data for researchers, who would then analyze it and enrich it with derived data that they generate and feed back into the resource for others to use.
“Any researcher can come along and say, ‘We’d like to have access to data for this specific health-related research.’ We review that request, and the researcher then gains access to the data for their stated research purpose,” says Collins.
As the amount of data has grown, however, the UKB realized they needed a different approach to how researchers access their data.
“[We] want to build a data analysis platform where these samples will reside,” says Collins. “Whereas up ‘til now we have been sending the samples to researchers for analysis, this will become increasingly more difficult due to the sheer scale of the biobank… [With our platform], researchers can go to the data rather than the data going to them.”
This will be a major step in democratizing access to the data, Collins says, “especially for researchers who don’t have a big IT capacity to store and analyze the data that we currently send to researchers.”
In the meantime, research groups (including those in companies) with substantial IT firepower are at an advantage in using the biobank.
“We are also now seeing researchers come along and say, ‘It would be great if we could do assays of the samples to turn the samples into data, which can then be used by other researchers’,” Collins says.
Enter Regeneron, which wanted to undertake the exome sequencing and analysis of all 500,000 samples.
A “Win-Win” Scenario
Regeneron approached UKB and set up an agreement to sequence samples while creating a consortium of several major pharma companies, who all agreed to collaborate and help finance the sequencing.
Sequencing began with 100,000 samples, which proved daunting for Regeneron and, more specifically, its consortium partners, as many members were unable to process the 265-terabyte (TB) dataset that resulted from the sequencing.
Regeneron reached out to DNAnexus for help in providing a comprehensive delivery experience that combines scalable cloud tooling with a visual data integration solution, Richard Daly, CEO of the California-based enterprise omics technology company, tells Clinical Research News. Fortunately, DNAnexus already had the necessary tools in its Apollo Platform, which enables researchers globally to perform at-scale clinico-genomic data science exploration, analysis, and discovery.
“[Apollo] was designed to alleviate the concerns that [the consortium had],” George Asimenos, DNAnexus’ CTO, tells Clinical Research News. “Not only the data but the metadata as well is presented in an easy to use experience enabling visual exploration and data analytics.”
DNAnexus built a modified version of Apollo for Regeneron’s project, adding a cohort browser designed to democratize data access by giving diverse teams the ability to browse through 3,000 phenotypic fields and 15,000,000 genomic variants across the 100,000 samples.
Over 10,000 registered researchers are using the UKB on more than 1,000 research projects. Collins hopes the data analysis platform, which is expected to be in place during 2020, will make the resource even more widely available to researchers from around the world, increasing the rate of discovery.
DNAnexus has continued to build out both its technology and its partnerships, picking up a European Innovations Award as well as a $20 million contract from the US Food and Drug Administration (FDA) to power the agency’s cloud-based collaborative omics environment, precisionFDA.
Asimenos says DNAnexus is still working closely with Regeneron as well, which has set the goal of completing whole exome sequencing of all 500,000 UKB samples by the end of this year. Such an “aggressive” timeline is a testament to the collaboration Regeneron has been able to cultivate, according to Asimenos, as well as an indication of how far the technology has come.
“The thought of anyone saying, ‘I’m going to sequence half a million exomes’ in that short amount of time is incredible,” Asimenos says. “It’s an incredible feat in the sense that [Regeneron has] managed to set the infrastructure all the way from robotics that automatically handle preparing the samples, down to the automation that DNAnexus provides for uploading to the cloud and generating the results.”
“It’s become an industry-wide project,” Daly says. “We’re privileged to play a role in that.”