Title: Gossypium purpurascens genome provides insight into the origin and domestication of upland cotton
Abstract:
The wild race of allotetraploid upland cotton (Gossypium hirsutum L.), native to Central America, was domesticated in the southern United States, then spread worldwide and became widely cultivated after the mid-18th century. However, as early as 3,000 years ago, the Li ancestors on Hainan Island had already begun spinning cotton fibers for weaving. A unique Hainan Island native cotton (HIC) was likely used throughout the long textile history of Hainan Island, yet the origin of HIC and its evolutionary relationship to American cottons remain unclear. Here, a high-quality genome was assembled de novo from one HIC plant (named HPF17) collected in Sanya (historically known as Yazhou), Hainan Province. Comparative genomic and phylogenetic analyses using Gossypium genomes and resequencing data revealed that HIC belongs to G. purpurascens, and that G. purpurascens is best classified as one of the most ancestral races of G. hirsutum, second only to G. hirsutum race yucatanense. Based on its high saltwater tolerance and the close correspondence between Pacific currents and the geographic range of wild tetraploid cottons on the Pacific islands, purpurascens was inferred to have dispersed to Hainan by floating on ocean currents. Divergence time estimation also indicated that purpurascens diverged from American upland cottons ~200,000 years ago. Taken together with historical records, these results suggest that G. hirsutum race purpurascens may have been partly domesticated and successfully planted on a small scale on Hainan Island much earlier than the pre-Columbian period, and was likely used for weaving “Yazhou cloth”. Thus, modern upland cotton may stem from diverse origins and different domestication events, and China may be one of the earliest countries to domesticate and cultivate tetraploid cotton.
This study also identified 69 QTLs associated with 11 yield and fiber quality traits, and 2,489 domestication regions between wild races and cultivated varieties (lines) of upland cotton. These are the main loci for the domestication and improvement of upland cotton. Whole-genome comparison of 12 cotton genomes detected 47,774,023 short variations and 805,397 structural variations (SVs) covering 2.93 Gbp of genome sequence. Among all SV types, the coverage rate of domestication regions within inversions reached 55.5%, which was much higher (by >31.0%) than that of the other SV types. Haplotyping and association analysis revealed that eight large-scale inversions (4.9 to 32.4 Mbp in length) underwent artificial selection during the early stages of upland cotton domestication and improvement, and are significantly associated with “domestication syndrome”-related agronomic traits such as lint percentage. These results indicate that SVs, especially inversions, play an important role in the domestication and improvement of upland cotton.
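
As an illustration of the coverage-rate comparison across SV types, the following minimal Python sketch computes the fraction of an SV class's total length that overlaps domestication regions. The abstract does not give the exact definition used, so this is an assumption: coverage rate is taken here as overlapping base pairs divided by the merged length of the SV class, on a single chromosome with half-open coordinates. The function names (`merge`, `overlap_bp`, `coverage_rate`) and the toy coordinates are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single chromosome (assumed convention).
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Total base pairs shared by two interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def coverage_rate(svs: List[Interval], regions: List[Interval]) -> float:
    """Assumed definition: overlap with regions / merged length of the SVs."""
    sv_len = sum(end - start for start, end in merge(svs))
    return overlap_bp(svs, regions) / sv_len if sv_len else 0.0


# Toy coordinates, purely illustrative.
inversions = [(0, 100), (50, 200)]        # merges to (0, 200), 200 bp
domestication_regions = [(150, 250)]      # overlaps 150..200, 50 bp
rate = coverage_rate(inversions, domestication_regions)
```

With these toy intervals, 50 of the 200 inversion base pairs fall in a domestication region. A genome-wide version would repeat the calculation per chromosome and per SV type before comparing rates.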