The 1000 Genomes Project is a project sponsored and run primarily by the National Institutes of Health and a number of other organizations. Its aim is to identify genetic variants that occur with a frequency of more than 1%. Amazon has now posted the project's genomic information, covering some 1,700 people, in its public cloud so that anyone can access it.
The researchers working on the project believe that finding genetic variants with a frequency of 1% across this data will help them immensely in studying different diseases more thoroughly. The genetic information of these 1,700 people, which comprises 200 terabytes of data, is the largest collection of information on human genetics ever assembled. Amazon Web Services (AWS), which posted the data online, hopes that it will help researchers in the field of genetics.
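To make the 1% threshold concrete, here is a minimal sketch of the arithmetic, not the project's actual pipeline. It assumes every person carries two copies of each chromosome, so 1,700 people contribute 3,400 copies and a variant must appear on more than 34 of them to exceed 1%. The function names and the variant counts are hypothetical, chosen only for illustration.

```python
def allele_frequency(variant_copies: int, num_people: int) -> float:
    """Fraction of chromosome copies that carry the variant.

    Assumes two chromosome copies per person (an assumption of this sketch).
    """
    if num_people <= 0:
        raise ValueError("num_people must be positive")
    total_copies = 2 * num_people
    if not 0 <= variant_copies <= total_copies:
        raise ValueError("variant_copies must be between 0 and 2 * num_people")
    return variant_copies / total_copies


def exceeds_threshold(variant_copies: int, num_people: int,
                      threshold: float = 0.01) -> bool:
    """True when the variant's frequency is strictly more than the threshold."""
    return allele_frequency(variant_copies, num_people) > threshold


NUM_PEOPLE = 1700  # sample size quoted in the article

# Hypothetical counts of chromosome copies carrying each variant.
example_counts = {
    "variant_a": 20,
    "variant_b": 34,   # exactly 1%, so not "more than 1%"
    "variant_c": 35,
    "variant_d": 400,
}

common = [name for name, copies in example_counts.items()
          if exceeds_threshold(copies, NUM_PEOPLE)]
print(common)
```

Under these assumptions only the variants seen on 35 or more chromosome copies pass the cutoff. A real analysis would read the counts from the project's data files rather than from a hand-written dictionary.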
According to the official AWS press release, users can, for example, run Hadoop on Amazon Elastic Compute Cloud (EC2) or use Amazon Elastic MapReduce to analyze the data stored in Amazon Simple Storage Service (S3).
This genetic information was gathered by studying samples from anonymous donors around the globe, so the sample set is sufficiently diverse. According to AWS, the samples were gathered from Utah residents, people with Chinese heritage, people with Mexican heritage and people with African heritage. In the words of Richard Durbin, co-director of the 1000 Genomes Project, ‘Putting the data in the AWS cloud provides a tremendous opportunity for researchers around the world who want to study large-scale human genetic variation but lack the computer capability to do so.’
Image courtesy of ynse.