Registration for the NOMAD Summer School closed at the beginning of May. We received about twice as many applications, from all the relevant groups in academia and industry, as we could accept for the event. Even though registration is closed, we are still receiving requests to participate from all over the world.
Data-driven research meets materials science and engineering
Big data, machine learning, and artificial intelligence are revolutionizing numerous fields, and materials science is no exception. At this timely moment, the NOMAD Summer School will introduce novice and advanced researchers (in academia and industry) to data-driven computational methods as well as practical, readily usable tools for novel materials discovery developed within the Novel Materials Discovery (NOMAD) Centre of Excellence.
Making Big Data of materials comprehensible
An enormous amount of materials data, produced with millions of CPU hours spent every day in HPC centers worldwide, is already stored in data repositories. These data represent an invaluable resource. But how can knowledge be extracted from them?
The NOMAD Center of Excellence (https://NOMAD-CoE.eu) develops tools that will help sort all of the available materials data to identify trends and anomalies and eventually obtain insight into physical processes in materials, such as thermal conductivity, heterogeneous catalysis, and more. Converting inputs and outputs produced by many different computer codes into a common format ensures that they can be compared to each other. This makes the data ready for the next step, i.e., data analytics (e.g., data mining, machine learning, compressed sensing, and artificial intelligence), which is urgently needed in academia and industry and is the focus of this Summer School: making Big Data of materials comprehensible to the outside world.
In particular, this school will introduce both novice and advanced researchers in academia and industry to methods and practical tools to:
- upload, share, and download materials science data using the NOMAD Repository and Archive;
- visualize physical processes and complex relationships between materials properties with Advanced Graphics;
- search and retrieve the vast amount of computed materials properties using the NOMAD Encyclopedia;
- identify correlations and structure in big data of materials, towards the final goal of predicting novel materials with tailored properties, with the NOMAD Analytics-Toolkit.
The school will feature eight sessions on the different topics listed below. Each session will comprise talks (45 min) by the invited speakers introducing the topics and hands-on sessions guided by tutors from the NOMAD team. The topical sessions are detailed in the program.
(i) Data Repositories, Archives and Metadata
Repositories are the prerequisite for hosting, organizing, and sharing materials data. We will provide an overview of existing data collections with an emphasis on the NOMAD Repository, which is part of the NOMAD Center of Excellence (https://NOMAD-CoE.eu). We will address the question of how the existing input and output files, produced by many different computer codes, are transformed into a common format (using NOMAD Metadata (https://metainfo.nomad-coe.eu/nomadmetainfo_public/archive.html)) as realized in the NOMAD Archive. In this way, different calculations can be compared to each other. Participants will learn how to use the Repository and the Archive. We will also collect feedback, particularly from industrial users, about their special needs when using these tools and data.
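As a rough illustration of what converting code-specific output into a common format involves, the Python sketch below normalizes records from two made-up simulation codes into one schema with a single energy unit. The code names, field names, and schema keys are hypothetical and are not the actual NOMAD Metadata definitions; only the unit conversion factor is a standard constant.

```python
# Hypothetical sketch: code names, field names, and the target schema are
# illustrative assumptions, not the actual NOMAD Metadata.

HARTREE_TO_EV = 27.211386245988  # CODATA 2018 value
RYDBERG_TO_EV = HARTREE_TO_EV / 2


def from_code_a(raw):
    """Made-up code A reports the total energy in Hartree."""
    return {
        "program_name": "code_a",
        "chemical_formula": raw["formula"],
        "energy_total_ev": raw["etot_hartree"] * HARTREE_TO_EV,
    }


def from_code_b(raw):
    """Made-up code B reports the total energy in Rydberg."""
    return {
        "program_name": "code_b",
        "chemical_formula": raw["system"],
        "energy_total_ev": raw["total_energy_ry"] * RYDBERG_TO_EV,
    }


# One converter per code; every converter emits the same keys and units.
PARSERS = {"code_a": from_code_a, "code_b": from_code_b}


def normalize(code, raw):
    """Convert a code-specific record into the common schema."""
    return PARSERS[code](raw)


a = normalize("code_a", {"formula": "Si2", "etot_hartree": -1.0})
b = normalize("code_b", {"system": "Si2", "total_energy_ry": -2.0})

# Both records now share keys and units, so they can be compared directly.
energy_difference_ev = a["energy_total_ev"] - b["energy_total_ev"]
```

Once every record carries the same keys and units, comparing calculations from different codes reduces to comparing plain values, which is the point of the common format.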
Speakers will include J. Vreeken and M. Scheffler.
(ii) Advanced graphics
This area will focus on the analysis and visualization of time-dependent periodic, molecular structures in combination with scalar or vector fields in three-dimensional space, as produced by electronic structure simulation codes. In addition, the visualization of abstract data in a multidimensional space of parameters, as encountered, for example, in the data-analytics context, will be covered. NOMAD provides tailored software tools, both in a traditional remote visualization context and in different virtual reality (VR) environments. Users will learn about powerful open-source visualization tools (VisIt (http://www.visitusers.org/index.php?title=Molecular_data_features) and ParaView (http://www.paraview.org/)) which are deployed and customized by NOMAD but can also be operated as standalone software. VisIt, in particular, combines tailored molecular visualization features (including support for handling periodic structures) with general-purpose capabilities for the analysis of scalar and vector fields.
Furthermore, we will demonstrate the benefits of immersive visualization using virtual reality (VR) devices. Specifically, users will be able to explore various three-dimensional (e.g. Fermi surfaces and crystal structures), four-dimensional (e.g. time-dependent simulations), and six-dimensional data sets (e.g. electron-hole interactions) by using various virtual reality viewers (e.g. HTC Vive, Samsung Gear VR, Google Cardboard). The NOMAD visualization team has a multi-year track record in applying these methods and tools in different scientific contexts and providing training courses to a scientific audience.
Speakers will include U. Woessner and M. Rampp.
(iii) NOMAD Encyclopedia
How can the vast amount of materials data be accessed in a user-friendly way? The NOMAD Encyclopedia is an infrastructure that is being developed within the NOMAD CoE for this purpose. It displays all possible information on the computed materials, thus facilitating the extraction of knowledge from the data. It serves two main purposes: the comprehensive characterization of single materials, and the search for materials exhibiting certain features or combinations of various properties.
For example, users will be able to directly compare calculations performed with different approximations and get information about the underlying methodology and associated error bars. The graphical user interfaces also allow for tracing results back to the respective calculations. The Encyclopedia displays data from the NOMAD Archive and also incorporates graphics tools of the visualization team.
Speakers will include C. Draxl and G. Huhs.
(iv) Preparation and analysis of high-throughput simulations
Despite the many millions of calculations already available in databases and repositories all over the world, a huge range of materials on the one hand and of properties on the other remains largely unexplored. As a matter of fact, high-throughput calculations will always be important, for instance to meet industrial needs or to create data for assessing error bars related to methodology, approximations, or numerical noise. Various tools have been suggested to facilitate such tasks, like the Atomic Simulation Environment (ASE) (https://wiki.fysik.dtu.dk/ase/), AiiDA (http://www.aiida.net/), etc. We will address this issue by providing a tutorial that covers both the preparation of such high-throughput studies and the analysis of the related results.
Speakers will include G. Hautier and C. Carbogno.
(v) Report on the 2018 NOMAD-Kaggle competition "Predicting Transparent Conductors"
NOMAD has organized a crowd-sourced data-analytics competition with Kaggle, one of the most recognized online platforms hosting big-data competitions. The task was to develop and apply data-analytics models for the prediction of two key properties of transparent conducting oxides (TCOs): the formation energy, which is an indication of the stability of a new material, and the band gap energy, which determines the transparency of the material over the visible spectrum. TCOs form an important class of already commercialized wide-band-gap materials that have been employed in a variety of (opto-)electronic devices such as solar cells, light-emitting diodes, field-effect transistors, sensors, and lasers. However, only a few compounds in this class are known and manufactured, leaving plenty of room for new discoveries.
In a dedicated session, the NOMAD Summer School hosts the three winning teams, out of about 900 participants, that created the highest-ranked models.
Speakers will include Ch. Sutton, T. Yamamoto (1st ranked), Y. Lysogorskiy (2nd), and L. Blumenthal (3rd).
(vi-viii) Big-data analytics
The major part of the school will be dedicated to data-analytics tools, i.e. various ways of extracting knowledge from materials data. They embrace machine-learning, statistical-learning, and data-mining methods. The topics listed below will be covered in terms of lectures as well as hands-on tutorials:
- Structure prediction by compressed sensing: Experience suggests that many properties of materials are determined by a few key variables. Physical intuition, which can help find them, is in many cases difficult to develop because of the complexity of the task. Compressed sensing is a recent technique in the field of signal processing. It allows one to extract, in an unbiased form, the smallest possible set from a huge pool of variables for the statistical learning of materials properties, for a given accuracy of the property prediction.
- Neural networks: Recently, neural networks have revolutionized the field of artificial intelligence, outperforming existing machine learning algorithms in a variety of tasks such as speech recognition and natural language generation. We will explain how neural networks can be applied to relevant materials science problems like crystal-structure recognition or atomization energy prediction.
- Subgroup discovery: This is a data mining technique for identifying subgroups of materials according to some property of interest. With its help, interpretable descriptors or variables describing the subgroups can be uncovered.
- Cluster expansion: The cluster expansion technique allows one to build models for a quick calculation of materials properties. It uses as input the structural and compositional information as well as the calculated properties of materials. These are readily available in the Archive. As an example, this technique will be used to model the formation energies of alloys, in order to uncover phase transitions and the stability of materials.
- First Principles Molecular Dynamics with Machine-Learned Forces: We present a molecular dynamics (MD) scheme which combines first-principles and machine-learning techniques in a single information-efficient approach. Forces on atoms are learned from first-principles forces available in databases, using Bayesian techniques. Thus, the workload of costly MD simulations is minimized, making it possible to tackle problems currently beyond reach.
- Cross-validation: A topic that will be given high relevance in all the presented applications (lectures and tutorials) is cross-validation, i.e., a set of strategies to quantify the ability of a learned model to make accurate predictions on data that are not included in the training set. Cross-validation adapts to the specific method, but an effort will be made to present this topic in a unified way.
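To make the cross-validation idea above concrete, here is a minimal Python sketch of k-fold cross-validation, assuming a simple one-variable least-squares model and synthetic data. The function names, the choice of model, and the data are illustrative assumptions and are not taken from the NOMAD tutorials.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Average root-mean-square error on the held-out folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into k disjoint folds.
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        # Fit on the training part only; the held-out fold is never seen.
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        sq_err = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        scores.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(scores) / k


# Synthetic data (assumption): a linear trend plus Gaussian noise.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
cv_rmse = k_fold_rmse(xs, ys)
```

Because each fold is predicted by a model that was fitted without it, `cv_rmse` estimates the error on unseen data rather than the error on the training set.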
Speakers will include A. Chandrasekaran, L. Ghiringhelli, R. Ouyang, A. Ziletti, and M. Boley.
Besides focusing the summer school on the tools developed within the NOMAD Center of Excellence, we will also include external researchers at the forefront of data-driven materials science. Selected topics are:
- Exploratory Data Analysis and Causal Inference (J. Vreeken)
- Data-driven Rational Materials Design (A. Chandrasekaran)
- Virtual and Augmented Reality for Scientific Visualization (U. Woessner)
- Dimensionality reduction for Big-Data analytics (M. Ceriotti)
| Mid-March 2018 | Registration opens |
| May 2, 2018 | Registration closes |
| May 9, 2018 | Acceptance announcements |
| September 24, 2018 | Start of the summer school |
| September 27, 2018 | End of the summer school |
The NOMAD Summer School will be hosted at the CECAM headquarters at EPFL (Ecole Polytechnique Fédérale de Lausanne) in Lausanne, Switzerland. Please see the CECAM homepage for directions. Please keep in mind that participants from outside Switzerland or the Schengen area may need to apply for a visa.