Accurate identification of Larimichthys crocea genetic resources based on "NingXin III" chip and machine learning method
ZHAO Ji, FENG Miaosheng, KE Qiaozhen, WANG Jiaying, JIANG Tingsen, WU Xiongfei, PENG Shiming, BAI Yulin, SHEN Weiliang, ZHOU Tao, PU Fei, XU Peng
Abstract
Larimichthys crocea (large yellow croaker) is an important commercial fish in China, with an annual production of more than 250,000 tons in recent years. Its genetic resources are extremely rich, consisting of wild populations distributed in natural sea areas and breeding lines obtained through decades of selective breeding. To protect, manage and utilize these resources efficiently, there is an urgent need for an accurate genetic identification method that can distinguish different L. crocea germplasm. However, the lack of high-throughput genotyping tools and of representative samples from geographical populations has made accurate identification difficult. Using the previously developed 55K liquid SNP array ("Ningxin III") for L. crocea, the present study carried out genetic identification of 21 L. crocea populations, including wild populations from coastal China, cultured populations from Fujian and Zhejiang, and multiple breeding lines. Population genetic analysis revealed that L. crocea can be divided into Nanhai, Mindong and Daiqu populations, among which the Nanhai population shows the most pronounced genetic differentiation. Classification based on machine learning methods assigned unknown L. crocea individuals to their geographical group with an accuracy of more than 99%. The breeding lines to which unknown individuals belong were also identified with very high accuracy. For example, GS3F3, a new strain with strong resistance to Cryptocaryon irritans obtained after three generations of genetic selection, was identified with 99% accuracy by the neural network method. These results show that the "Ningxin III" chip combined with machine learning methods enables quick and accurate genetic identification of L. crocea.
The present study provides an effective tool for accurately identifying and managing the genetic resources of L. crocea and for protecting the intellectual property of breeding materials and lines, and it offers a reference for the genetic identification of other aquatic organisms. In the future, it will be necessary to establish a complete database covering all L. crocea germplasm resources, together with genetic identification standards, and to develop a supporting visual computer program to carry out identification.