The Wheat Genome Sequence Odyssey
Posted 9th August 2017 by Fabio Caligaris
As the world population is expected to reach 9.8 billion by 2050, it is crucial to have innovative genomics tools to address global food security in a sustainable way. At the 5th Plant Genomics and Gene Editing Congress: Europe, Kellye Eversole, Executive Director of the International Wheat Genome Sequencing Consortium (IWGSC), announced that a high-quality reference sequence of the wheat genome is now available for the research community. The reference sequence is an essential tool to accelerate crop improvement programmes and wheat genomics research.
This achievement is the result of 12 years of collaborative research, which began in December 2004 when Kansas Wheat and Kansas State University, under the leadership of Forrest Chumley, hired Kellye Eversole to establish an international consortium that would lay a foundation for a paradigm shift in wheat breeding. The IWGSC odyssey began in the spring of 2005 as four individuals – Kellye Eversole, Rudi Appels, Bikram Gill, and Catherine Feuillet – launched the International Consortium with a vision of completing a useful sequence of bread wheat for the breeding community.
One of the first steps was to determine what should be sequenced: progenitors of bread wheat, or one of the diploid, tetraploid, or hexaploid wheats. Industry and growers made the choice simple. They unanimously supported sequencing what is growing on 95% of wheat fields, the hexaploid bread wheat genome, Triticum aestivum, and preferably the variety for which the most genetic stocks exist and which could be translated quickly into breeding programmes.
But the bread wheat genome is huge (~16 Gigabases – i.e. 5 times the size of the human genome), complex (three ancestral genomes – one of which is unknown), and contains a high percentage of repetitive elements that complicate the assembly of genome sequences. At the time, only rice had a high-quality reference genome sequence available, and the rice genome is equivalent in size to one of the smaller bread wheat chromosomes.
In the landscape of rapidly changing sequencing technologies, it was critical to select an approach that would be “technology neutral”, i.e. one that would allow the Consortium to build resources that could be used regardless of the sequencing technology available at any given time. Back then, the only technology-neutral foundation was a BAC-based physical map. As most breeders rely on the information contained within individual chromosomes, and as working chromosome by chromosome reduced the complexity of the genome to manageable pieces, chromosome-based physical maps were selected as the foundation that would underpin any sequencing technology.
Thus, the goal of the IWGSC was to produce a high-quality, physical map-based, ordered, and annotated genome sequence comparable in quality to the rice genome sequence. Having a complete, ordered sequence was considered key, as it would allow breeders to drastically reduce the time between gene discovery and a commercially available variety.
41 Rice Genomes
Luckily, a laboratory in the Czech Republic led by Jaroslav Dolezel had developed a technology to flow-sort chromosome arms, breaking down the huge bread wheat genome into 41 smaller, much more manageable pieces (40 chromosome arms and chromosome 3B). The challenge seemed more approachable and comparable to sequencing 41 rice genomes! The first BAC library (3B) was ready, and the lab started working on the production of 40 chromosome arm-specific BAC libraries.
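For readers who want to check where the figure of 41 comes from, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the 16 Gb genome size is the approximate figure quoted above, the 21 chromosomes follow from bread wheat being hexaploid (7 chromosomes in each of the A, B and D subgenomes), and the "S"/"L" suffixes are simply the usual labels for short and long arms, used here as an assumption for naming the pieces.

```python
# Back-of-the-envelope check of the "41 pieces" figure.
# Assumptions: ~16 Gb genome (approximate figure from the article),
# 7 chromosomes in each of the three subgenomes A, B and D,
# and chromosome 3B sorted whole rather than as two arms.

GENOME_SIZE_GB = 16.0
SUBGENOMES = ["A", "B", "D"]
SORTED_WHOLE = {"3B"}

# 1A, 1B, 1D, 2A, ... 7D: 21 chromosomes in total
chromosomes = [f"{number}{genome}" for number in range(1, 8) for genome in SUBGENOMES]

pieces = []
for chromosome in chromosomes:
    if chromosome in SORTED_WHOLE:
        pieces.append(chromosome)  # kept as a single piece
    else:
        # split into short (S) and long (L) arm
        pieces.extend([f"{chromosome}S", f"{chromosome}L"])

average_piece_mb = GENOME_SIZE_GB * 1000 / len(pieces)

print(len(chromosomes), "chromosomes,", len(pieces), "pieces")
print(f"average piece size: about {average_piece_mb:.0f} Mb")
```

Under these assumptions the average piece comes out at roughly 390 Mb, which is in the same range as commonly cited sizes for the rice genome and is why the comparison to "41 rice genomes" holds up.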
The concept of building physical maps for each chromosome/chromosome arm of bread wheat was a daunting challenge and was viewed with much scepticism by most in the scientific community. However, scepticism turned to interest in 2008 with the completion of the high-quality physical map of the largest wheat chromosome – 3B, equivalent in size to the entire soybean genome. With this success, projects were launched in many countries to develop physical maps of chromosomes and, with financial support from Bayer CropScience, all physical maps were completed by 2015. Since they were working with chromosome-based physical maps, it was not necessary to wait until all the maps were finished before sequencing could begin. Work began in 2009 to sequence chromosome 3B and others followed as the maps were completed. This method also facilitated map-based cloning projects on these individual chromosomes.
As sequencing technologies became more efficient and affordable, the IWGSC established a side project in 2010 to generate draft survey sequences of individual chromosomes. This would provide at least some information on each chromosome and would allow breeders to start isolating or refining regions of interest. The work on chromosome-specific BAC-based physical maps and pseudo-molecule sequencing continued in parallel.
The real breakthrough came with DeNovoMAGIC™, software developed by the firm NRGene to assemble Illumina whole-genome sequence data. With this, the IWGSC could produce a whole-genome assembly of the 16 Gb genome in 7 months and validate its quality against other sequence-based and chromosome-based resources developed by the IWGSC over the previous years. The Consortium released the whole-genome assembly with Hi-C and POPSeq to the scientific community in June 2016. Although this was an impressive assembly, it did not completely achieve the high-quality standard that was the target for the IWGSC.
Since June 2016, the IWGSC has integrated all chromosome-based resources, namely physical maps, genetic maps, Whole Genome Profiling (WGP™) sequence tags, optical maps, and markers, and in January 2017 released IWGSC RefSeq v1.0, the first version of the high-quality reference sequence of bread wheat.
Having at their disposal all the chromosome-based resources and the WGP™ tags generated over the previous years proved invaluable, as the quality of the assembly tripled with the addition of these resources. Most of these were used to refine the order of the sequence and to decrease the number of pieces per chromosome (to an average of 75 scaffolds per chromosome). The automated annotation process, using two different annotation pipelines to develop a high-confidence set of genes, was completed and released to the community in June 2017. Final analysis is underway and the goal is to submit the manuscript by late summer 2017.
What started with 4 people in 2005 has now grown to 1800 members working in more than 530 institutes or companies in 62 countries. The IWGSC reached its goal of generating a high-quality reference sequence of bread wheat, a significant milestone for agriculture and the scientific community. Work will now focus on manual and functional annotation of the reference sequence as well as sequence improvement. All the IWGSC data are available under the Toronto agreement at the IWGSC data repository hosted by URGI-INRA (France).
So, what are the lessons learned from this odyssey?
- Every crop of importance for food, feed and fibre should have at least one high-quality, manually and functionally annotated reference sequence, preferably more.
- BAC libraries are essential for generating high-quality references and are critical for map-based cloning.
- Maintaining flexibility is crucial so one can adopt new technologies as they are developed without losing sight of the need for quality.
- The key guiding principle is to never lose sight of the original vision even when the rest of the scientific community may not be supportive.