Abstract

The materials genome is a disruptive frontier technology that has emerged in the international materials field in recent years and a driving force for the development of new materials. It fundamentally changes the traditional mode of materials research, aiming to accelerate the research and development of new materials and reduce costs, so as to support the development of electronic information, energy and environmental protection, aerospace, and other industries. In this paper, we introduce the strategic significance, national layout, and methods of materials genome technology, with emphasis on the design idea and development status of the materials database method. We then summarize the development trends of the materials genome and put forward suggestions for its future research, aiming to provide a reference for the development direction of materials genome technology in various countries, especially developing countries.

1. Introduction

New materials, such as new energy materials, information materials, and intelligent materials, are the core of the disruptive technology revolution and a strategic high ground of fierce competition among countries around the world. Traditional material research and development (R&D) draws on existing theory and experience, adjusting material ratios and then using characterization, testing, and inspection to find materials that meet the needs. However, this approach is time-consuming and laborious: it often takes ten to twenty years to go from design, development, experiment, optimization, and characterization to integration, application, and first market entry, which makes it difficult to meet industry’s rapidly growing demand for new materials. With the development of the economy and of science and technology, designing materials on demand and accurately controlling their properties have become the development trends of advanced materials [1–5].

In 2002, Professor Zikui Liu of Pennsylvania State University put forward the concept of the “materials genome,” by strong analogy with the human genome. Just as the sequences of DNA and RNA in human genes determine the main functional properties of the human body, the microstructure of a material, the properties and arrangement of its atoms, its crystal structure, and its defects determine its intrinsic properties. The materials genome is a product of the deep integration of material R&D with modern information technology, such as high-performance computing, material gene chips, big data, and “Internet+,” aiming to accelerate the whole process of discovery, R&D, production, and application of new materials, reduce R&D costs, and shorten the R&D cycle [6–11].

The materials genome can change the concept and mode of material R&D by integrating high-throughput computing, high-throughput experiment, and materials database technologies. It starts from the application requirement, works backward to the materials whose structure and function meet that requirement, and reveals the relationships among material composition, the arrangement of different elements, and material function, so as to realize the purposeful design of new materials in support of advanced manufacturing and high technology [12–15].

2. Materials Genome Initiative

2.1. Development Status in Major Countries
2.1.1. United States

In 2011, the U.S. government launched the “Materials Genome Initiative” (MGI), a major measure taken by the U.S. to maintain its leading position in advanced materials and high-end manufacturing. It aims to accelerate the progress of new materials from discovery, innovation, and manufacturing to commercialization, transform the R&D model from “experience” to “prediction,” and attempt to halve the R&D cycle of new materials. The specific measures of MGI include developing high-throughput material simulation tools and methods to accelerate material screening and design; developing and popularizing high-throughput experimental technology and equipment to quickly and accurately obtain the large volume of key data needed for material calculation and to screen and verify candidate materials; and developing and improving materials database and informatics tools to effectively manage and utilize the data chain of materials from discovery to application (see Figure 1) [16].

The U.S. Department of Energy (DOE), Department of Defense (DOD), National Science Foundation (NSF), National Institute of Standards and Technology (NIST), and National Aeronautics and Space Administration (NASA), together with universities, enterprises, and scientific research institutes, have been building resources and infrastructure in accordance with the objectives of MGI. They have carried out a series of plans and deployments to accelerate material R&D and shorten the path to market, and have achieved a series of results in key demonstration applications of typical materials [17].

2.1.2. Europe

In 2011, the European Union (EU) launched the “Accelerated Metallurgy (AccMet)” project, which focuses on alloy design and simulation and aims to shorten the R&D cycle of alloy formulations from the five or six years required by traditional methods to within one year [18]. In 2012, the European Science Foundation (ESF) launched the “2012–2022 Metallurgy Europe” project, which lists high-throughput synthesis and combinatorial screening technology among its key topics in order to accelerate the discovery and application of high-performance alloys and other new-generation materials. The “Research Network Plan,” under the “AB-Initio Simulations of Materials, Psi-k2” project set up by ESF, is dedicated to the development of ab initio calculation methods for condensed matter at the atomic level [19]. In 2013, the EU approved and implemented the “Horizon 2020” project, its largest framework plan for scientific research and innovation, with a total budget of 77 billion euros. It aims to integrate the scientific resources of EU countries, improve research efficiency, and promote scientific and technological innovation. Within it, the “NoMaD” project, led by the Max Planck Institute (MPI) of Germany, aims to establish a materials encyclopedia and develop tools for big data analysis.

The UK has also conducted research on high-throughput material computing simulation and basic database of material computing under the funding of “e-Science” project. Expert systems for nuclear fusion lead-lithium eutectic materials database have been established in Spain, Italy, and France [20].

2.1.3. Asia

Japan was one of the earliest countries in Asia to collect, process, and apply material data, and it has established thousands of materials databases and knowledge bases in fields such as glass, ceramics, and alloy steel. For instance, the National Research Institute for Metals in Japan established many databases on the mechanical properties of metal and composite materials. Korea established an online data bank on metal, chemical, and ceramic materials [21]. The Indira Gandhi Centre for Atomic Research (IGCAR) in India integrated materials science data from Indian research institutes and universities and constructed an online materials database to provide data services on the mechanical properties, corrosion properties, nondestructive evaluation, and thermal and optical properties of materials [22].

Since MGI was put forward by the United States, many universities in China have set up dedicated research centers for the materials genome. In 2016, the Ministry of Science and Technology of the People’s Republic of China launched the “National key research and development program-key technology and supporting platform of material genome engineering” project, which focuses on the R&D of frontier and key technologies and equipment, emphasizes the improvement of innovation ability, and comprehensively promotes the R&D of key technologies, platform construction, and R&D demonstration of typical materials, so as to achieve the goal of “shortening R&D cycle in half and reducing R&D cost in half” for new materials.

2.2. Successful Cases

General Electric (GE) successfully developed the superalloy “GTD262” by adopting the materials genome approach and relying on its internal database of similar alloys. The R&D and entry into service of the GTD262 alloy took only four years from conceptual design to industrial production, and the R&D expenditure was approximately one-fifth of the development cost of similar alloys.

When identifying and testing alternative perovskite solar materials, the Energy Frontier Research Center (EFRC) of DOE improved its research efficiency by 20% using the Materials Project database. The Critical Materials Institute of DOE accelerated the discovery and development of rare earth substitutes by relying on the materials genome method.

Using high-throughput density functional theory calculations, the Ceder group at the Massachusetts Institute of Technology (MIT) screened three promising lithium battery materials out of 20,000 compounds, whose performance is significantly improved compared with commercial materials [23, 24].

The MGI project team in Ningbo, China, successfully developed the first 48-target high-throughput ion beam sputtering vacuum coating equipment for composite materials in China. Its core technology, the combinatorial material chip, is known as a “new material search engine”; it improves the efficiency of synthesizing and screening new materials by a factor of 1,000 to 100,000 and shortens the R&D cycle to two weeks.

The Institute of Physics of the Chinese Academy of Sciences developed a unique high-throughput experimental method based on the concept of materials genome engineering. The method achieved a breakthrough in the composition design and exploration of high-performance amorphous alloys, enabled rapid screening of amorphous alloys, and led to a new system of high-temperature, high-strength amorphous alloy materials [25].

3. Methods of Materials Genome

The key to MGI is the collaboration and combination of experiment, calculation, and database in the R&D process of new materials. Working together, these three methods tighten the link between theory and experiment in the R&D process and deliver results faster (see Figure 2) [20].

3.1. High-Throughput Experiment

High-throughput experiment is the basic method for directly selecting new materials from a large number of samples and obtaining experimental data on them. In a high-throughput experiment, combinatorial preparation enables the parallel synthesis of a series of samples, and high-throughput characterization of both structure and performance can greatly accelerate the discovery of new materials. High-throughput experiment uses experimental data to refine computational models and to link calculations at different scales. It provides a large amount of basic material data and experimental verification for simulation, and establishes the internal connections among composition, microstructure, and process that govern material performance. In addition, it can enrich the materials database, provide material for analysis in material informatics, and screen target materials quickly and efficiently according to specific application requirements. Currently, a complete experimental technical system has been formed for high-throughput preparation, covering material forms such as thin film, bulk, and powder [26–30], and for high-throughput characterization, covering thermodynamic, electrical, optical, mechanical, electromagnetic, electrochemical, and phase properties [11, 31].

High-throughput experimental preparation can be divided into two steps: “combination” and “phase formation.” The former realizes a controllable distribution of sample composition, and the latter a controllable distribution of the sample’s phase structure. The composite material chip based on thin film morphology is a mature high-throughput preparation technology, which can be divided into the co-deposition method and the physical mask method. Preparing bulk materials allows the properties of the related material systems to be characterized accurately. In recent years, a series of new methods for the high-throughput preparation of bulk materials have been developed, including laser additive manufacturing, isostatic pressing, etc.; among them, the relatively mature methods are the bulk diffusion method and the rapid alloy forming method. High-throughput preparation techniques for powder materials include spray printing synthesis, multichannel microreactors, etc. [32–37].
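As a minimal sketch of the “combination” step, the following Python snippet enumerates the nominal compositions of a ternary composition spread on a regular grid. The function name ternary_grid and the parameter steps are hypothetical and are not taken from any of the cited works; a real co-deposited chip has a continuous composition gradient, which such a grid only approximates.

```python
from itertools import product


def ternary_grid(steps):
    """Return all (a, b, c) atomic fractions with a + b + c = 1 on a regular grid.

    `steps` is the number of intervals per axis (hypothetical parameter):
    steps=10 corresponds to a 10 at.% spacing.
    """
    points = []
    for i, j in product(range(steps + 1), repeat=2):
        k = steps - i - j  # the third fraction is fixed by the first two
        if k >= 0:
            points.append((i / steps, j / steps, k / steps))
    return points


grid = ternary_grid(10)
print(len(grid))  # 66 nominal compositions, i.e. (steps + 1)(steps + 2) / 2
```

Under this idealization, the number of samples grows quadratically with the grid resolution, which illustrates why parallel synthesis on a single substrate is attractive.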

Among high-throughput characterization technologies, optical detection, including X-ray diffraction/scattering, X-ray fluorescence spectrum analysis, X-ray energy spectrometry, and ultraviolet/infrared spectrophotometry, is a relatively direct and effective way to study material composition and structure. The electrical properties of materials include superconductivity, conductivity, dielectric constant, ferroelectric constant, magnetoresistance effect, electron mobility, diffusion length, corrosion, contact resistance, interface parameters, energy level alignment, etc. The evanescent microwave probe microscope is an effective high-throughput tool for studying electrical properties. The magnetic properties of materials include magnetic susceptibility, spin resonance, etc. Tools for the high-throughput characterization of magnetic properties include the magnetic microscope, scanning Hall effect probe, scanning magneto-optical Kerr effect imaging system, scanning superconducting quantum interference device (SQUID) microscope, etc. High-throughput electrochemical characterization is of great significance to the study of batteries, capacitive materials, and components such as electrodes and electrolytes. Electrochemical characterization instruments must offer high resolution and automation. At present, the VersaSCAN microregion electrochemical scanning system developed by AMETEK of the United States is widely used in the high-throughput combinatorial electrochemical characterization of lithium battery anodes and cathodes, thin film electrolytes, semiconductors, and other important materials. A variety of materials with different compositions and structures can be prepared on the same substrate by micro-electro-mechanical system (MEMS) technology, and their mechanical properties can then be tested at high throughput. Characterizing catalytic properties at high throughput requires simulating the conditions of the catalytic reaction process; the reaction process and the catalytic materials can be integrated through microfluidic structures to characterize some catalytic properties. With their high brightness and high temporal and spatial resolution, large-scale scientific facilities are particularly suitable for the fast, accurate, and efficient characterization of the large number of samples generated in high-throughput experiments. Among them, synchrotron radiation sources and spallation neutron sources are the most representative [30, 38–47].

3.2. High-Throughput Computation

High-throughput computation combines supercomputing platforms with multiscale, high-throughput concurrent material calculation methods and software to achieve large-scale material simulation, rapid calculation, accurate prediction of material properties, and new materials design. It improves the screening efficiency and design level of new materials and provides a theoretical basis for their R&D [48, 49]. High-throughput computation consists of many tasks, and each single task is often characterized by flow computing. The computing load of each task is relatively small, while the number of concurrent tasks and the data scale are huge, and real-time processing is required. High-throughput computational methods and tools mainly include first-principles calculation, computational thermodynamics, dynamic process algorithms, and tools for predicting microstructure and mechanical properties. They span the atomic, simplified, and engineering model levels and integrate multiscale association algorithms from the atomic scale to the macroscopic scale [50–53].

Andersson et al. designed a Ni-Fe alloy catalyst using high-throughput density functional calculations [54]. Curtarolo et al. designed the high-throughput calculation framework AFLOW based on first principles and obtained 150,000 thermodynamic data of alloys and more than 10,000 electronic structure data of inorganic compounds [55]. Setyawan et al. applied this method to the study of inorganic scintillators, calculated the electronic structures of 7439 compounds, and mined the results in an attempt to find new radiation-detecting materials [56]. Yang et al. used this method to find 28 topological insulator materials [57]. The Ceder research group at MIT began to design and develop lithium battery materials using high-throughput calculation in 2010. They designed new compounds by substituting elements in compounds containing the polyanion XO4 (X = P, S, As, Si) and screened new materials by calculating parameters such as energy density, voltage, and volume change after lithium removal [58]. Researchers have automated the transfer of data between crystal structure databases and the first-principles calculation program VASP, so as to obtain the thermodynamic data and electronic structure information of materials through high-throughput calculation [59]. In addition, MatCloud, developed by the Computer Network Information Center of the Chinese Academy of Sciences, is a basic platform and software framework for high-throughput integrated material computing that can be directly connected to computing clusters. It provides graphical modeling tools, supports the design of complex computing workflows, and implements many functions for large-scale first-principles calculation tasks, such as online job submission and monitoring, result analysis, automatic data extraction, standardized processing, and automatic data storage [60, 61].
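The screening parameters named above for battery materials (voltage, energy density, and volume change after lithium removal) can be estimated from computed total energies and cell volumes using standard textbook relations. The sketch below is only an illustration under stated assumptions: it is not the workflow of the cited work, all function names and thresholds are hypothetical, and the energies and volumes are placeholders rather than DFT results. Only the molar mass of candidate "A" is a real value (that of LiFePO4), included so that the capacity arithmetic can be checked.

```python
FARADAY = 96485.332  # Faraday constant in C/mol


def average_voltage(e_lithiated, e_delithiated, e_li_metal, n_li):
    """Average voltage in V; energies in eV per formula unit, n_li = Li removed."""
    return -(e_lithiated - e_delithiated - n_li * e_li_metal) / n_li


def gravimetric_capacity(n_li, molar_mass):
    """Theoretical capacity in mAh/g for a molar mass in g/mol."""
    return n_li * FARADAY / (3.6 * molar_mass)


def volume_change(v_lithiated, v_delithiated):
    """Relative cell volume change on lithium removal."""
    return abs(v_delithiated - v_lithiated) / v_lithiated


def passes_screen(c, min_voltage, min_energy_density, max_volume_change):
    """Apply hypothetical thresholds to one candidate dictionary."""
    voltage = average_voltage(c["e_lith"], c["e_delith"], c["e_li"], c["n_li"])
    capacity = gravimetric_capacity(c["n_li"], c["molar_mass"])
    energy_density = voltage * capacity  # V * mAh/g = Wh/kg
    dv = volume_change(c["v_lith"], c["v_delith"])
    return (voltage >= min_voltage
            and energy_density >= min_energy_density
            and dv <= max_volume_change)


# Placeholder candidates: energies (eV) and volumes (cubic angstrom) are made up.
candidates = {
    "A": {"e_lith": -47.0, "e_delith": -41.6, "e_li": -1.9, "n_li": 1,
          "molar_mass": 157.755, "v_lith": 291.0, "v_delith": 272.0},
    "B": {"e_lith": -40.0, "e_delith": -36.0, "e_li": -1.9, "n_li": 1,
          "molar_mass": 200.0, "v_lith": 300.0, "v_delith": 285.0},
}
selected = [name for name, c in candidates.items()
            if passes_screen(c, 3.0, 500.0, 0.10)]
print(selected)  # ['A']
```

In a real high-throughput study the inputs would come from first-principles calculations, and the thresholds would be chosen relative to commercial reference materials.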

As Figure 3 shows, the data flow of screening materials by high-throughput calculation is as follows: data are selected from an external structure database to generate input files that the calculation software can read; the software then computes the corresponding performance data of the materials; and the results are saved to the database for further analysis. The new knowledge obtained can expand the original database and help to screen data more accurately. Therefore, the keys to improving the efficiency of high-throughput calculation are to establish and refine a programmed high-throughput computing workflow and to link the various calculation software packages, single-function calculation programs, or instructions with the computing hardware so that the whole process runs automatically. However, dedicated software needs to be designed for different materials to realize a specific automated workflow [62].
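The data flow described above can be sketched as a small automated loop. This is a hedged illustration only: the function names, the record layout, and the JSON input format are assumptions, and run_calculation is a stand-in that returns a dummy descriptor rather than calling any real simulation package.

```python
import json


def select_structures(structure_db, required_elements):
    """Step 1: pick candidate structures from the external structure database."""
    return [s for s in structure_db if required_elements <= set(s["elements"])]


def make_input(structure):
    """Step 2: serialize a structure into an input file body (JSON here)."""
    return json.dumps({"id": structure["id"], "elements": structure["elements"]})


def run_calculation(input_text):
    """Step 3: stand-in for a real solver; returns a dummy descriptor, not physics."""
    job = json.loads(input_text)
    return {"n_elements": len(job["elements"])}


def screen(structure_db, results_db, required_elements, keep):
    """Select, compute, store, then analyze the accumulated results."""
    for structure in select_structures(structure_db, required_elements):
        if structure["id"] not in results_db:  # reuse results already stored
            results_db[structure["id"]] = run_calculation(make_input(structure))
    return sorted(sid for sid, result in results_db.items() if keep(result))


# Hypothetical structure database entries.
structure_db = [
    {"id": "s1", "elements": ["Li", "Fe", "P", "O"]},
    {"id": "s2", "elements": ["Li", "Co", "O"]},
    {"id": "s3", "elements": ["Na", "Cl"]},
]
results_db = {}
hits = screen(structure_db, results_db, {"Li", "O"},
              lambda result: result["n_elements"] >= 4)
print(hits)  # ['s1']
```

Because results are written back to results_db, a later screening pass with a different criterion can reuse them without recomputation, which mirrors the feedback of new knowledge into the original database.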

3.3. Materials Database
3.3.1. Design Idea

A materials database usually includes data on material properties, composition, processing, experimental conditions, application, evaluation, etc., and is an important basis for material research and application. Once built at scale, it can extract a large amount of useful information from the original data by means of data mining, deep learning, data reproduction, and other techniques; by summarizing this information, provide basic data for calculation and simulation and a design basis for high-throughput experiments; and comprehensively collect, store, and share computational and experimental data in real time [63–65].

The computational, experimental, and empirical data of materials constitute multisource, heterogeneous big data. According to the characteristics of material data, the database can be divided into a basic information database, a calculation database, a phase diagram database, and other subdatabases. Meanwhile, material data are logically divided into four parts: basic performance, processing test, calculation, and application. The relatively independent subdatabases are then integrated into the materials database according to this new logical structure (see Figure 4).
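One possible way to express this logical structure in code is sketched below. The class MaterialRecord, the function integrate, and in particular the routing table that assigns each subdatabase to one of the four logical parts are assumptions made for illustration; the source does not specify how subdatabases map onto the parts, and the property values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class MaterialRecord:
    """One material, with data grouped into the four logical parts."""
    material_id: str
    basic_performance: dict = field(default_factory=dict)
    processing_test: dict = field(default_factory=dict)
    calculation: dict = field(default_factory=dict)
    application: dict = field(default_factory=dict)


def integrate(subdatabases, routing):
    """Merge rows from independent subdatabases into one record per material."""
    records = {}
    for name, rows in subdatabases.items():
        part = routing[name]  # logical part that this subdatabase feeds
        for material_id, data in rows.items():
            record = records.setdefault(material_id, MaterialRecord(material_id))
            getattr(record, part).update(data)
    return records


# Hypothetical subdatabases keyed by material identifier.
subdatabases = {
    "basic_information": {"m1": {"density_g_cm3": 7.87}},
    "calculation": {"m1": {"formation_energy_eV": -0.25}},
    "phase_diagram": {"m1": {"system": "Fe-C"}, "m2": {"system": "Al-Cu"}},
}
# Assumed routing from subdatabase name to logical part.
routing = {
    "basic_information": "basic_performance",
    "calculation": "calculation",
    "phase_diagram": "basic_performance",
}
records = integrate(subdatabases, routing)
print(records["m1"].basic_performance)
```

Keeping the routing in a separate table means a new subdatabase can be attached without changing the record definition.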

3.3.2. Development Status

Currently, many countries in the world have established a number of relatively mature and practical materials databases, as shown in Table 1.

The United States is the most advanced country in the world in the development and application of materials databases. MatWeb is a well-known comprehensive commercial database in the United States that provides material performance and manufacturing information. It contains data on more than 130,000 materials, such as metals, plastics, ceramics, and composites, and provides powerful search tools that make material data easy to retrieve. The Materials Project database established by MIT provides structural information and properties of more than 130,000 inorganic compounds (e.g., lithium-ion battery materials); it uses a huge database of density functional theory calculations to predict the properties of real materials from simulated models. The Molecular Space database established by Harvard University is also based on density functional theory and combines manual work with machine learning to exploit the potential of the database. The Materials Commons database established by the University of Michigan and the Materials Data Facility database established by NIST have collected 12.5 TB of data. The Web SCD (Structural Ceramics Database) and Web HTS (High-Temperature Superconducting) databases established by NIST provide online data retrieval and data evaluation functions [66, 67].

The Total Materia database in Switzerland is the most comprehensive database on the properties of metal materials in the world, containing more than 450,000 detailed performance data of metal and nonmetal materials in 26 language versions. Total Materia is also the largest advanced material performance database in the world, containing the stress-strain, fatigue, fracture mechanics, and creep data of more than 150,000 materials required for industrial design. The Pauling File database in Switzerland contains more than 46,000 phase diagram data, 320,000 crystal structure data, and 125,000 physical properties data, making it the largest inorganic compound database in the world [68].

The Material Universe and Process Universe databases established by Granta Design in the United Kingdom collect data on more than 4000 materials and 200 processes. The European Fusion Material Performance database developed by the Culham Centre for Fusion Energy and other institutions in the United Kingdom aims to collect, preserve, and expand the data of future network and reactor. The Cambridge Crystallographic Data Centre (CCDC), established by Cambridge University, maintains a crystal structure database that serves as a worldwide repository of small-molecule organic and metal-organic crystal structures. It contains one million structures from X-ray and neutron diffraction analysis and has been used by thousands of organizations in more than 70 countries [21].

The Inorganic Crystal Structure Database (ICSD) in Germany aims to collect and provide crystal structure information of all inorganic compounds without C-H bond, which contains 100,000 compound catalogues so far, making it an authoritative inorganic crystal structure database in the world. The MSI Eureka database in Germany collects data on phase diagrams, phase reactions, and thermodynamics of inorganic materials since 1894, which is a numerical and instrumental database. It is also the largest rigorously evaluated phase diagram resource database in the world, covering all published inorganic material systems [69].

The high-temperature nuclear reactor material database (MatDB) and literature management database (DoMa) in the Netherlands provide thermodynamic and thermophysical performance data of nuclear reactor materials in low- or high-temperature environments.

The MatNavi database system established by National Institute for Materials Science (NIMS) in Japan possesses nine basic material performance databases, five structural material databases, three engineering application databases, and five data application systems, covering data information of polymer, inorganic nonmetal, metal, ceramic, alloy, superconducting, composite, and diffusion materials. MatNavi database possesses nearly 140,000 registered users from more than 26,000 organizations in 160 countries [70].

The Materials Genome Engineering Databases (MGED), established in China in 2018, is an integrated platform of databases and application software based on the concept of materials genome engineering. Its functional modules include a high-throughput computing engine, an interatomic potentials database, a materials database, a materials data mining system, high-throughput experimental data processing software, and software for the assisted extraction of information from papers, and it holds more than 77,000 pieces of material data. Materials Data Sharing (MSDSN) is the core platform for material data sharing and a relatively systematic online database in China, covering the fields of material foundation, ferrous metal materials, organic polymer materials, information materials, energy materials, biomedical materials, etc., and integrating more than 610,000 pieces of material data.

Building and developing high-throughput calculation, high-throughput experiment, and materials database technologies, and making full use of the understanding, accumulated in traditional materials science from massive existing experimental data, of the interrelationships among material composition, process, microstructure, and mechanical properties, are of great significance for fully understanding and comprehensively promoting the transformation and breakthrough of materials genome technology in the R&D of new materials.

Currently, materials databases are developing towards systematization, networking, intellectualization, standardization, modernization, and commercialization (see Figure 5). With the development of information technology, the combination of materials databases and artificial intelligence technology constitutes an expert system for material performance prediction or material design, which plays an important role in material R&D, product design, and decision-making consultation [73–75]. Standardizing materials databases will be the best way to break through the limitation of diversity and improve the efficiency of material research. Commercializing a materials database means supporting its maintenance, operation, and development through data information services delivered over network platforms and other media. For instance, MatWeb and Total Materia are typical commercial databases, which obtain operating profits through membership, online advertising, paid consulting, and other commercial forms [8, 76].

(1) New material genes should be developed comprehensively across the industrial, biomedical, military, energy, information, and life fields, so as to meet the needs of scientific research, production, and practical application.

(2) Currently, one of the obstacles to the development of the materials genome is the lack of data storage standards, which results from the diversity of materials research. Different countries or regions often adopt different data standards, so it is difficult to exchange data directly between systems, and information sharing is limited to some extent. All countries should actively carry out the standardization of materials databases, strengthen international exchanges and cooperation, and jointly formulate international standards for database structure and data storage.

(3) Material data are often dispersed in the hands of enterprises or individuals, leaving different databases isolated and unshared. Therefore, it is necessary to improve the sharing mechanism of databases, improve the incentive mechanism for those who share materials data, make more efficient use of results, and avoid repeated investment and R&D.

(4) Acquiring material data is relatively complex, so the data carry a strong intellectual property attribute. Data sharing is nonetheless the general trend. On the one hand, countries should actively establish open platforms to realize source code sharing and protect the intellectual property rights of data through unique identification to prevent the abuse of shared data. On the other hand, countries should actively formulate relevant rules, regulations, and legal provisions to reconcile the privacy and openness of material data.

(5) The modernization and commercialization of materials databases are great driving forces for their R&D and industrialization. Therefore, on the basis of unified data standards and technical specifications, it is necessary to pay attention to the degree of commercialization and to the updating and improvement of products, expand the commercial scale of databases, and give database maintenance and operation an effective commercial mechanism, so as to realize the social and economic benefits of material data.

(6) Closely cooperating professional groups should be established around the research, development, preparation, application, and evaluation of MGI. The combination of the materials genome with artificial intelligence, machine learning, and expert systems should be emphasized to help understand and discover the correlations between material parameters and properties, reduce the dependence of reliable predictive models on prior data, and improve the development efficiency and benefit of materials databases.
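The data exchange obstacle raised in suggestion (2) can be made concrete with a small sketch that maps records from two differently structured sources onto one common layout. Both source layouts, all field names, and the common schema are hypothetical; only the unit conversion factor (1 ksi = 6.894757 MPa) is a fixed fact.

```python
KSI_TO_MPA = 6.894757  # 1 ksi expressed in MPa


def from_format_a(row):
    """Hypothetical source A: strength already stored in MPa."""
    return {"material": row["name"],
            "yield_strength_mpa": float(row["yield_strength_MPa"])}


def from_format_b(row):
    """Hypothetical source B: different field names, strength stored in ksi."""
    return {"material": row["material"],
            "yield_strength_mpa": float(row["YS_ksi"]) * KSI_TO_MPA}


CONVERTERS = {"a": from_format_a, "b": from_format_b}


def to_common(source, row):
    """Convert one record from a named source into the common schema."""
    return CONVERTERS[source](row)


merged = [
    to_common("a", {"name": "alloy-1", "yield_strength_MPa": "350"}),
    to_common("b", {"material": "alloy-2", "YS_ksi": 50.0}),
]
print(merged)
```

Each additional source requires its own converter, which is the maintenance cost that a jointly agreed international storage standard would remove.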

5. Conclusions

Today, the materials genome, artificial intelligence, blockchain, etc. are among the innovative technologies with the greatest breakthrough potential. The materials genome can help accelerate the R&D and deployment of new materials and better predict how parameters in the manufacturing process affect the performance of the final materials and products, thereby enabling control of product performance.

With the development of electronic information technology, data have become the core resource and the necessary foundation for accelerating the development of materials science. With the continuous improvement of computational simulation capability, high-throughput material calculation will become an important source of huge amounts of data. High-throughput material experiments will continuously and effectively provide data for calculation and databases. Materials databases will play a more important role in new material design, material selection, process formulation, material performance prediction, product design, and safety assessment. Therefore, all countries in the world, especially developing countries, should seize the development opportunity of the materials genome and accelerate the R&D and construction of information infrastructure, including material software, databases, models, tools, and platforms, so as to provide information support for rapid and low-cost R&D of new materials along the whole chain from theoretical design, preparation and characterization, organization, and process optimization to performance evaluation [77, 78].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors sincerely thank the National Science and Technology Library Project (2018XM06) for financial support.