#### Abstract

The present paper reviews the investigation of ambient water structure and focusses in particular on the determination of the radial distribution functions of water from total experimental radiation scattering experiments. A novel method for removing the inelastic scattering from neutron data is introduced, and the effect of Compton scattering on X-ray data is discussed. In addition the extent to which quantum effects can be discerned between heavy and light water is analysed against these more recent data. It is concluded that, with the help of modern data analysis and computer simulation tools to interrogate the scattering data, a considerable degree of consistency can be obtained between recent and past scattering experiments on water. That consistency also gives a realistic estimate of the likely uncertainties in the extracted radial distribution functions, as well as offering a benchmark against which future experiments can be judged.

#### 1. Introduction

The structure of water is a recurring theme in the scientific literature on water and its many manifestations in solutions, mixtures and at surfaces. Being a liquid, water does not have “structure” in the sense that a crystal or a building has a structure. Nonetheless the forces between water molecules, which prevent molecular overlap and drive hydrogen bonding, give rise to characteristic correlations in space and time between water molecules. These correlations are called the structure of water, the nature of which correlations is a direct result of the forces between molecules. There is no direct way of measuring the forces between water molecules in the liquid, and these forces are different in the condensed state of the liquid compared to between two water molecules which approach each other in vacuo. Hence in principle if we can measure the correlations in the liquid experimentally, we can learn about the nature of the forces between molecules in water, and hence develop some intrinsic understanding about what makes water the rather important and special liquid that it is. Radiation scattering, whether by photons, neutrons or electrons, gives direct insight into these atom scale correlations, providing the radiation has the appropriate wavelength. This, combined with the a seemingly insatiable demand for knowledge about water at all levels of complexity, and from a wide variety of endeavours, including fundamental science, practical physical chemistry, geological situations, biological applications, and industry, has led to a plethora of data on water structure which extends back at least 80 years. Although simple in principle, the practice of measuring and interpreting scattering data is complicated by the fact that the measurements, even at modern high flux radiation sources, are still far from perfect and require several “corrections” some of which are not known precisely. Hence corresponding to these many data come a wide variety of opinions and interpretations about water structure which are often inconsistent, even in conflict, with one another. The present review covers the investigation into ambient water structure, and focusses in particular on the determination of the radial distribution functions of water from experimental radiation total scattering experiments. It is concluded that, with the help of modern developments in data analysis and interpretation, a considerable degree of consistency can be obtained between recent and past experiments. That consistency gives a realistic estimate of the likely uncertainties in the extracted radial distribution functions, as well as establishing a benchmark against which future experiments can be tested.

The purpose of this paper is to present a state-of-the-art account of the determination of water structure using radiation, mostly X-ray and neutron, scattering methods to measure the radial distribution functions of water, , , and . These three functions collectively describe the way atoms are arranged in the liquid and, therefore, form the bedrock for studying the statistical mechanical basis for water. This might seem a rather limited goal for a research paper which runs into many pages and with more than 100 references. Yet, the task of producing a comprehensive set of experimental radial distribution functions for water that the majority of researchers will sign up to has proved surprisingly elusive over many years. Articles on the structure of water usually attract hundreds of citations, yet the number of experimenters willing to delve into this topic to measure these functions is surprisingly few in practice. The scattering data themselves are rather dull and uninteresting, consisting of one or two broad peaks followed by a series of rather lifeless oscillations that disappear into a statistical haze. Unwanted effects like inelasticity and Compton scattering make large contributions that need to be carefully subtracted from the data if the experimenter is to have any chance at all of obtaining useful information from the data. The experiment is highly sensitive to source characteristics, detector characteristics, and detector stability. Why would anyone want to tackle this problem when there are any number of easier and more interesting problems to solve?

The truth of course is that hidden inside this difficult-to-decipher data is something rather special and important: the hydrogen bond. The function for water is arguably our only real chance of “seeing” a hydrogen bond in the flesh. The intramolecular version of this function gives the OH bond within the water molecule, but the intermolecular version shows clearly the way hydrogen atoms on one water molecule cluster around the oxygen atom of another water molecule. Although hydrogen bonding can exist between a variety of molecules, only in water (or perhaps hydrogen fluoride—but hydrogen fluoride is a really nasty chemical to investigate experimentally) do you see this clustering so vividly. Emanating from the hydrogen bond between water molecules comes a quite unique molecular arrangement, compared to most other known liquid materials. The , , and functions carry the information about that structure, which on one hand wants to be ordered and tetrahedral like ice but at the same time wants to be disordered like “normal” liquids such as mercury or liquid gold.

What makes water special in this case is not only that hydrogen bonding occurs in abundance in this material but that of all the materials that form hydrogen bonds, water, ammonia and hydrogen fluoride, are the only ones for which we have any chance of “seeing” the hydrogen bond directly in an experiment. This is because all of the three materials have only two atomic components, which means there are just three site-site radial distribution functions to be measured, which in turn means that the problem can be tackled with isotope substitution using neutron scattering, providing that the isotopic variants are isomorphic with each other. The neutron scattering of hydrogen is quite different to that from deuterium, so by measuring the light compound, the heavy compound (all H’s replaced by D’s) and a mixture of these two, one can obtain, at least in principle, the three site-site distribution functions being sought. As soon as the number of components increases above two, this direct separation becomes impossible even in principle because for the vast majority of cases the necessary isotopic contrast is not available. Of these three cases ammonia has a rather weak hydrogen bond which is difficult to identify and hydrogen fluoride is chemically so reactive that most researchers would not be tempted to try and measure it. But it has been done [1].

Water, therefore, offers a unique opportunity to study the hydrogen bond. Yet, the information does not come easily and even now the subject of “what is a hydrogen bond” involves significant uncertainty. Is it covalent? Is it charge-like? Is it really a bond at all? The present account will not even attempt to answer these major questions. Instead, it has a very simple and in principle straightforward task, namely, to identify as of now what are the best estimates of these three OO, OH, and HH radial distribution functions. The purpose is simple enough, but will the evidence presented be sufficiently compelling to convince enough people that the question of what is the structure of water is in fact no longer a question? It is the view of this author that yes, to all intents and purposes we know the structure of water, actually quite well in fact. Of course it will always be possible to improve on the data and on the radial distribution functions that are extracted from them. But the currently available radial distributions for water are already sufficiently accurate to provide an incisive test for computer simulations of water. Recent and independently acquired X-ray and neutron scattering data seem to support this attitude. Whether the reader will also accept this view remains to be seen.

This paper therefore, has limited scope. It will not address a myriad issues that are associated with water. It will mostly not address, except where relevant, the huge literature on the computer simulation of water—ironically, even though one of the critical tests for whether a simulation is any good or not is how well it reproduces the measured radial distribution functions, this is a far more active area than the experimental investigation of water structure. It will say nothing about aqueous mixtures and ionic solutions or the interactions of water in more complex systems. It will not comment on the interpretation of the radial distribution functions, whether they show water to be a heterogeneous mixture with chain-like properties mixed in with tetrahedral structures [2, 3], or whether it is really a rather uniform single component fluid [4]. The focus will simply be: can we measure these elusive site-site radial distribution functions and if we can, how good are the measurements?

In order to achieve even this goal, however, it is first necessary to describe the underlying theory behind the scattering experiment, then examine the salient features of the neutron and X-ray scattering methods, identifying in particular those aspects of each technique when applied to water structure can lead to uncertainties in the final data.

#### 2. Overview of Total Scattering Theory

##### 2.1. The Radial Distribution Function

Unlike in a crystal, the atoms and molecules in a liquid are in a state of rapid and diffusive motion. There will, as a result, be a local density of atoms and molecules which fluctuates rapidly with time:
where is the position of atom at time . Hence, even if we could find some probe that would capture these density fluctuations that would not by itself be particularly useful, since it would be difficult to identify any clear features or trends in these seemingly randomly changing density fluctuations. Instead, to capture useful information, it is necessary to perform an *autocorrelation* on the density fluctuations, to see how they vary, depending on the (relative) positions of the atoms, , and the relative difference in time, , between samplings of the density [5]:
where the last equality arises from substituting (1) into the middle term.

The primary concern of the present paper is the case where the time difference between samplings is zero, namely, , that is we are aiming to capture a snapshot of the density at every position within the fluid and see how that correlates with the density at every other position *at the same time*. The extent to which this can be achieved in practice places a limit on the accuracy of the correlation functions we can measure in a fluid, particularly when, as here, the atoms and electrons suffer considerable recoil as a result of interactions with the probing radiation, namely neutrons and X-rays, respectively. We will assume here that and drop the explicit reference to time, referring back to the effect of finite measuring time when discussing inelasticity effects in later sections.

It will be apparent that the terms with in (2) can be separated from those where . Hence, we write where is the average atomic number density (typically expressed in units of atoms per ). It can be seen, therefore, that divides into two parts, a “self” part involving correlations of an atom with itself and a “distinct” part involving correlations between distinct atoms. Equation (3) acts as the formal definition for the radial distribution function, . In effect, we are sitting on an atom and counting all the atoms that we find at a given displacement, , from that atom, converting that number to a local density. This local density is then averaged over all the atoms in the system and compared with the density of atoms in the system as a whole. Therefore is a convenient way of keeping track of how the local number density varies with displacement from an average atom and with respect to the average number density.

The properties of the Dirac -function are such that which means that is a density. This term arises from the fact that every atom must correlate with itself at (. It does not tell us anything about how the atoms are distributed in the material, but it does have an important bearing on the radiation scattering properties of the material.

One generalisation of these equations is needed for multicomponent materials such as water. As defined, is the pair correlation function for all the atoms of the system. If there is more than one atom type (oxygen, hydrogen, etc.) present as here then it is useful to split the pair correlation function into several terms, one for each pair of atom types. This was first suggested by Faber and Ziman [6], and we shall use definitions analogous to theirs throughout. Hence, would be the pair correlation function between atoms of type and . If there are distinct atom types in the system, then the number of distinct pair correlation functions is . For water there are just three of these functions, namely, OO, OH and HH.

In terms of these site-site correlation functions, the full autocorrelation function of the system would be defined as where and are the atomic fraction and number density of atoms of type respectively. The Kronecker is needed in (5) to avoid double counting pairs of atoms of the same type. The atomic fractions are needed to take account of the different percentages of the different types of atom present.

Because will go to unity at large , it is sometimes expressed as the sum of two terms:
where is called the site-site *total* pair correlation function between atoms of type and .

If the can be obtained by some means, then it is also possible to calculate *coordination numbers*. The number of -type atoms around an -type atom at the origin would be given by
where and are two radius values between which the coordination number is to be calculated.

##### 2.2. The Differential Scattering Cross Section and Structure Factor

The scattered radiation amplitude from an array of point atoms at positions is given by , where is the scattering length or form factor for atom . For neutron scattering, is simply a number which, however, depends on the spin and isotope state of the nucleus. For X-rays or electrons, or when scattering from magnetic materials with neutrons, the scattering length is called “the atomic form factor,” which is Q dependent, and which is usually given the symbol . (Strictly speaking the atomic form factor is the scattering length of the atom divided by the scattering length of a single electron, and so is a dimensionless number. X-ray differential cross sections are therefore normally quoted in the units of electrons per atom per steradian, since the value of , i.e., the atomic number of the atom.) Hence, the scattered intensity per unit atom is: Here, represents the change in wavevector between incident () and scattered () radiation beams. Thus, , and the modulus , where is the scattering angle and is the radiation wavelength. Note that, as in the definition of the autocorrelation function (3), the sum in (8) can be divided into two parts, namely terms for which , the so-called “self” terms, and terms for which , the distinct or interference terms. Hence the scattering experiment is in effect counting the atoms as a function of displacement from an atom at the origin. The main distinction between (8) and (3) is that now each term is weighted by the product of the scattering lengths of the two atoms at each end of the vector .

Using (5), the discrete sum in (8) can be replaced by integrals, with each term being weighted by the corresponding product of scattering lengths and collecting together terms which involve the same pair of atom types: with the partial structure factors defined by which becomes for an isotropic system.

The average in (11) is over the orientations of with respect to , and the second equality is allowed on the understanding that is isotropic with respect to these orientations. In short, the diffraction pattern is a three-dimensional Fourier transform of the pair correlation function, weighted by the scattering lengths or form factors for each pair of atoms. If, as in molecular liquids, the pair correlation function is a function of the relative *orientation* of neighbouring molecules as well as their separation, then this orientational information is obscured in the scattering process [7]. This is not, however, to say that there is no orientational information in the diffraction data from molecular liquids, as will be shown later.

The Fourier transform of a constant in -space is a -function in ,
Hence, using (6), the partial structure factors become
where
or
Including these definitions in (9) leads to the final expression:
It will be seen that the in (3) or (5) has become a constant independent of in and multiplied only by the mean square of the scattering length or atomic form factor for each atom—hence this term is sometimes also called the *single atom scattering*. The distinct correlations (correlations between different atoms), as represented by the total correlation functions, , then give rise to oscillations about the single atom scattering. The function in (16) is traditionally not shown, since it cannot be observed.

With neutron scattering there is a subtlety to the expression for the differential cross section (16) in that the neutron scattering length is dependent on the spin and isotope state of the atomic nuclei. This means that the expression for has to be averaged over the spin and isotope states of the atomic nuclei. Assuming that these states are uncorrelated with the positions of the atoms, then the general expression for the structure factor becomes
where the angle brackets represent the spin and isotope averages. Since this averaging is done inside the product of scattering lengths for the self terms, but outside the product for the distinct terms, in general the weighting on the self terms will be different from that on the corresponding distinct terms. The difference is sometimes referred to the “incoherent” scattering, although this terminology is not generally useful in the present context: it is more helpful in understanding the scattering process to distinguish clearly between “self” and “distinct” scattering. More importantly, there are some instances where the nuclear spins *do* correlate with nuclear position—molecular hydrogen at low temperature is a case in point, in which case the simple expression (17) becomes more complex [8]. Equally with molecules, there are instances where the isotopic state does depend on position within the molecule, so the *single molecule* (or *intramolecular*) scattering needs to be distinguished from the *intermolecular* scattering.

For X-rays, there is no spin or isotope dependence of the atomic form factors, so the X-ray structure factor is written simply as
which is correct providing that it is valid to treat the electron density for each atom as independent of that of its neighbours, the so-called *independent atom approximation*.

Sometimes it is useful to perform a direct Fourier transform on the measured interference differential cross sections (17) or (18), to give a total neutron or X-ray pair distribution function for a particular material. For neutrons this function is obtained by first subtracting the single atom scattering from the differential scattering cross section:
and then Fourier transforming this residual *interference* function:
which is simply a sum of the individual site-site pair distribution functions weighted by the product of atomic fractions and neutron scattering lengths.

For X-rays, direct Fourier transform of (18) is less straightforward. Because the atomic form factors are dependent, direct transform of (18) would lead to each site-site distribution function being convoluted with the convolution of the electron densities for the respective atom pair. To reduce this effect (although it is impossible to remove completely because the dependence of the atomic form factors is unique to each atom pair), it is conventional to deconvolve this “broadening function” [9], by dividing the X-ray interference scattering by the square of the mean atomic scattering factor prior to performing this Fourier transform: Then in space, the site-site terms might be assumed to be weighted by the form factor product at , but this is of course only an approximation: However the choice of normalisation in (21) is arbitrary, and in some situations we have argued that it is better to use in place of [10]. Indeed for water Narten and Levy [11] propose and use yet another normalisation based on the molecular form factor which has become widely used and will be discussed further in Section 4.1 in the following.

#### 3. Neutron Total Scattering from Water

Almost since its inception, neutron scattering has been used to look at the structure of water. However, the first definitive measurements came in the early 1970s when Powles et al. made their measurements on heavy and light water and on the so-called “null” mixture of heavy and light water [12, 13]. The latter consists of approximately 64 mole% O and 36 mole% O and is chosen because at this mixture ratio, the negative neutron scattering length of H (−3.74 fm) is almost exactly cancelled by the positive scattering length of D (6.67 fm), so that for this mixture only the O–O structure factor appears in the interference scattering. The quality of the data that was obtained at that time is remarkable given that the experiment has since been repeated many times, without any qualitative changes to the outcome. As it happened this work nearly coincided with publication of a joint study of heavy and light water using a combination of electron, X-ray, and neutron scattering [14–16].

Subsequently, Narten and coworkers repeated the neutron experiment [17], although the resulting site-site pair distribution functions [18] produced some very sharp peaks in the OH and HH pair distribution functions that were difficult to reconcile with existing computer simulations of water. At the same time, Soper and Silver [19] using pulsed neutron total scattering produced an independent estimate of the HH correlation function for water using a direct method for removing the inelasticity effects (see the next section) which had previously been developed for liquid hydrogen chloride [20]. This latter function appeared much more similar to computer simulation estimates than the Narten work, although there were still large uncertainties associated with the data. Subsequently, Soper and Phillips repeated the reactor-based neutron experiment using a range of mixtures of heavy and light water and employing an effective mass model to remove most of the inelastic scattering from the data [21]. This time there was good overlap with the previous pulsed source HH data, and this work led to the first complete set of experimental site-site radial distribution functions for water that have now been cited many times.

However, the story does not stop there by any means. The advent of high flux reactor and pulsed neutron sources with dedicated spectrometers designed specifically for scattering from disordered materials has meant that the experiment has been repeated many times [22], often now simply as a check that the equipment is working correctly. This led to a revised sets of s for water being published in 1997 [23] and 2000 [24].

Between 1986 and 2000, however, a big change took place in the way neutron total scattering data from disordered materials were interpreted in terms of the radial distribution functions. This was the invention in 1988 of Reverse Monte Carlo (RMC) modelling by McGreevy and Pusztai [25]. With this new method, the inversion of scattering data using a numerical Fourier transform was replaced by running a computer simulation of the material in question, then using the scattering data as a constraint on that simulation so that the calculated structure factors from the simulation model would reproduce the experimental measurements. This development produced some RMC simulations of water based on existing data [26, 27], and the idea was extended in 1996 to deriving an empirical force field which was consistent with the scattering data. This was the advent of Empirical Potential Structure Refinement (EPSR) [28–30]. As a result, the 2000 s were based entirely on this technique.

The method also introduced the possibility to determine the extent to which particular datasets were important in calculating the site-site radial distribution functions of water, and Pusztai has published several critical accounts on the structure of water which throw doubt on how accurately these radial distribution functions can be measured [31–33]. Later with collaborators, he uses a combination of computer simulation with specified intermolecular potentials and RMC to determine which potentials give the best agreement with the total scattering data from heavy water [34]. In a similar vein, Leetmaa et al. showed how by changing the bonding constraints in RMC a variety of structures could be fit to existing scattering data on water [35]. Structures which had only two hydrogen bonds could be fit to the data as well as structures with nearly four hydrogen bonds, and it was claimed that this called into question the conventional view that water is a tetrahedrally bonded liquid. Indeed, this work followed on from our own attempt to generate an asymmetric water structure consistent with scattering data [36], which in turn followed from a largely ignored demonstration that by using these computer simulation fitting procedures one could always obtain perfectly acceptable fits to scattering data that were physically meaningless [37]. Most recently, Zeidler and coworkers have used the challenging technique of oxygen isotope differences to explore water structure [38–40]. In this instance, the scattering contrast is achieved by changing oxygen isotope instead of hydrogen isotope as mentioned in all the previous studies, but the available isotopes give a contrast which at least one order of magnitude smaller than for hydrogen isotopes substitution. Although principally an experimental study, the data interpretation was assisted by a quantum mechanical molecular dynamics (MD) simulation of water using the TTM3-F potential [41]. This new work referred to our previous EPSR simulation of water which attempted to reconstruct the (small) differences in structure between heavy and light water [42]. From this author’s perspective, given the uncertainties about water structure, it was remarkable that two completely independent investigations using different neutron sources (reactor versus pulsed), different sample containments (cylindrical versus flat plate), different experimental teams, and different isotope substitution methods (oxygen versus hydrogen) could produce results which overlapped so closely for *both* light and heavy water [43]. It really does pose the question of whether there is anything more to be learned from performing further scattering experiments on pure water—perhaps we have gone as far as we can with the currently available data analysis techniques and neutron sources? Possibly, there is a case that we can always improve on the accuracy of the existing data, but as will be seen in the following sections how to achieve this is currently unclear.

The questions that remain, therefore, are what are the correct site-site pair distribution functions for water and what are the likely uncertainties on these functions? To obtain an answer to these questions, we need, understand a little about how the neutron scattering experiment on water actually works.

##### 3.1. Putting Neutron Data on an Absolute Scale

The raw data from a scattering experiment on water come in the form of number of neutron counts in a detector placed at a particular scattering angle, , with respect to the incident neutron beam. In a reactor experiment, the neutron energy or wavelength is fixed by a monochromator placed before the sample, while in a pulsed neutron experiment, the neutron energy is determined by the time of flight (TOF) from the source via the sample to the detector. These data typically have to be corrected for background, multiple scattering (neutrons which have been scattered more than once in the sample before reaching the detector), and self-attenuation in the sample, all of which procedures are by now well understood. One correction which for water is not well understood is the correction for inelasticity and will form the subject of the next section. Finally, the data have to be put on an absolute scale of differential scattering cross section (usually barns/atom/sr, where 1 barn = m^{2}). This is achieved by comparison with the scattering from a piece of pure vanadium which has the same geometry as the sample. Of all materials, vanadium has the unique property for neutrons that so that the distinct scattering from vanadium is very weak compared to the single atom scattering. At the same time, the inelasticity effects in vanadium are also weak and readily calculable by approximate means [44]. Hence, if the dimensions of the vanadium are known, then the scattered counts from the sample can be put on an absolute scale (to an estimated accuracy of ~1% in ideal cases) by dividing the sample counts by vanadium counts and applying a known factor which is determined from the vanadium geometry, density, and scattering cross section [45]. Since we also know the expected level of single scattering from the sample, this simple device of putting the data on an absolute scale is a useful check that the sample composition and dimensions are correct. For example, even a small bubble in the liquid or unexpected dissolved solute can throw the absolute scattering level off from that expected and signal that something is wrong with the sample.

Unfortunately, even though the scattering data may be on an absolute scale and the scattering level is as expected, for water there are relatively large inelasticity effects that even today are difficult to estimate.

##### 3.2. Inelasticity Effects

The subject of inelasticity effects in neutron scattering has a long history, and it will be impossible here to a give a complete account of the various attempts to solve this problem. We saw in Section 2.1 that strictly the pair correlation function of a material is a function of displacement in both position and time. The time dependence in this function translates to a frequency or energy response in the scattering experiment, so that scattered neutrons may have greater or less energy than they had when entering the sample. If is the incident neutron wavevector and is the scattered neutron wavevector, then conservation of momentum and energy in the nuclear collision requires that the momentum transfer, , and energy transfer, , are given by where is the scattering angle of the detector with respect to the incident beam.

Hence in the scattering experiment, the scattering cross section is strictly a function of both and , and for a detector at a particular scattering angle, equations (23) mean that will vary with energy transfer. The properties of the dynamic scattering law or structure factor, , were studied extensively by Van Hove [46], who showed that the so-called “static” structure factor, namely, the instantaneous structure which was the subject of Section 2.2, would be recovered if we could integrate over all at constant Q. Because the real detector, by kinematics, will integrate over at variable , the measured total scattering cross section will be different to the ideal “instantaneous” scattering cross section discussed previously. The crucial question is of course how different?

Placzek [44] was one of the first people to analyse this problem, and his approach to solving it for heavier atoms, which is based on a Taylor expansion in terms of the energy moments of , is the one that is still widely used today. Unfortunately, the Placzek method in its original form completely fails for light atoms such as hydrogen, and numerous attempts to correct for this failing have occurred [47–59]. An alternative to using the Placzek-type expansion is to integrate a model for over the actual trajectory seen by the detector and use this integral to estimate the correction for inelastic scattering [60, 61]. However in even the most recent of these studies [62] which used a model based on the quantum harmonic oscillator, only qualitative agreement could be obtained for pulsed source data from water, which, although encouraging, would have been inadequate to perform a reliable correction for inelasticity.

Just like the pair correlation function, it derives from the dynamic structure factor has a self and distinct term: from which the double differential scattering cross section is given by

Placzek proved two important identities concerning the first energy moments of these self and distinct terms: where the integrals are performed over all energy transfers at constant . In other words, the inelastic scattering associated with the distinct structure would be centred on zero energy transfer, while that associated with the single scattering would centre on an energy transfer related to the value of and the mass of the scattering atom, . Although this does not mean that the inelasticity associated with the distinct scattering is identically zero, it will always be much smaller than that associated with the single atom scattering: the distinct scattering will, therefore, lie on top of a -dependent background which arises from inelastic single atom scattering. Therefore, even if we do not know the shape of this single atom scattering, we can have some confidence, the interference signal on top of it that we are seeking has not been overly distorted by inelasticity effects, provided we have not attempted to measure the data at too low neutron energy and too high scattering angle. The recent study [62] as well as the previous studies of Powles [47–51] has demonstrated these facts. Hence, our primary objective in performing the inelasticity correction is to determine the nature of the inelasticity correction associated with the single atom scattering.

##### 3.3. Neutron Inelasticity: A Possible Solution?

Assuming that models for are currently inadequate to perform a quantitative correction for inelasticity, a method has recently evolved at ISIS for directly removing a large fraction of the inelasticity effect in scattering data from pulsed neutron sources without reference to a scattering model of the system. Since this method has not appeared in press previously, it is described here in some detail, with a demonstration of how it works in the following section. The notation follows much of that given in [62] to which reference should be made for more detail.

To understand how the scheme works, it is first necessary to distinguish between two closely related quantities. At a pulsed neutron source, scattering data are collected at particular scattering angles for a particular time of flight channel. The time of flight condition dictates that
where is the ratio of scattered to incident flight paths and is the wavevector at zero energy transfer, that is, when . Therefore, is determined by the total time of flight for a particular channel, and we define the elastic momentum transfer as
which is the same as the value of referred to in Section 2.2 in previous. As a result of (27), each time-of-flight channel will sample events where the neutron has gained or lost energy, and the measured total scattering cross section for each time channel will be an integral over these energy transfers, weighted by various factors associated with the incident neutron spectrum (which is a function of ), the detector efficiency (which is a function ), and the double differential scattering cross section (25); see [62]. When the data from different detectors are combined to form a single differential scattering cross section for the sample, the traditional way of doing this is to combine channels with the same value. This ensures that the distinct scattering is not smeared out by mixing different values. We shall represent this averaging over detectors and time channels (sometimes called *merging*) for a given by the notation . The resulting differential scattering cross section is then expressed as a function of alone. Alternatively, as seen in the following, it is also possible to combine the data from different channels and detectors at fixed , and this will be represented by .

Based on (24), each of the scattered intensities integrated over energy transfers, , can be expressed as the sum of single atom and distinct scattering terms, but it should be noted that due to our assumption that inelasticity effects on the distinct terms are small, the latter term depends *only* on the value of . On the other hand, since after correction for inelasticity there should be no dependence in the self term, the self term is strictly a function only of and as independent variables:

Figure 1 shows a plot of the estimated single atom scattering as a function of and for a hydrogen atom in a quantum harmonic oscillator function, as discussed in [62]. Although there is some variation in shape with scattering angle at larger angles, the general similarity of these curves, particularly for , suggests that we can divide the self scattering into two parts, one dependent *only* on and the other dependent on both and :
where the angle independent and dependent parts are formally defined as
We note that from (32), so that given Figure 1, we might expect that provided that we only use relatively low scattering angles.

Introducing the definition (29), (32) can alternatively be written as from which it will be immediately apparent that or

We note that in (35) is directly calculable from the measured data, while will be a heavily smeared version of the interference scattering because it will average over a range of at fixed . Hence, we can expect this term to make only a small (but non negligible) contribution to .

In a similar vein, using (29) with (30), if instead of averaging over scattering angles at constant , we average over scattering angles at constant , we obtain

As in (35), the term in (36) can be calculated directly from the data, while can be calculated if an initial estimate of is available from (35). This leaves a residue, to be estimated and subtracted to give an estimate of the interference function. Whilst it is not possible to assert that as was done for , it is likely, given the similarities in the averaging process, that this function will make a small contribution to (36) so that approximate methods for removing this term can be employed. Perhaps the most common method of removing unwanted low frequency backgrounds from total scattering data is to Fourier transform the data to space and then apply a filter to remove all structure below a specified minimum radius [62, 63]. Provided that minimum radius is below the position of any genuine structural features, there should be no corruption of the interference scattering cross section as a result. Figure 2 illustrates the basic ideas being expressed here.

**(a)**

**(b)**

The results (35) and (36), therefore, offer us a simple iterative route to removing unwanted inelasticity effects without having to resort to models of the dynamic structure factor. Starting from a set of data, we first perform the fixed smearing (35), assuming that . This gives an initial estimate of . Using this estimate, we now perform the fixed smearing (36) to give us an initial estimate for , after removing unphysical features which arise from using a method such as the Fourier filter method described previously. This estimate of is then fed back into (35), and the process repeated until no more changes to are observed. Obviously, a process like this can never remove *all* the effects of inelasticity, but it certainly avoids the potential subjectivity that can arise from alternative methods based on fitting polynomials to scattering data. As a result, data measured on different diffractometers can now be readily compared. Unfortunately the method will not work for reactor sources since those measurements are made at only one or a few fixed wavelengths. On the other hand it appears that corrections based on models of the dynamic scattering law are more reliable for these reactor cases [62].

##### 3.4. Application to Water—Comparison of Data Sets

In the present work a total of six distinct neutron data sets on water with hydrogen isotope substitution will be analysed. These were measured over an 11-year period between 2001 and 2012, two of them on the SANDALS diffractometer [64] (initials SLS) and four on the NIMROD diffractometer [65] (initials NIM), both at ISIS. Table 1 gives a list of the experiments and samples that are included. For the SANDALS data the detectors are combined into 18 groups spanning the angular range 3.8 to 35.4°. For NIMROD the detectors span the angular range 0.6 to 37.5°, although the lowest scattering angles will not be visible in the present data, which concentrate on the wider range above 0.1 . The samples with 0.64 mole fraction O, experiments SLS-07, NIM-10-3, and NIM-12-3, are called “null” because at this composition the average scattering length of a hydrogen atom is close to zero, so these data should only contain information on the O–O correlation, although this statement requires the assumption that light and heavy water and their mixtures all have the same structure, which is only approximately true in practice.

For four of the experiments, namely, SLS-01, SLS-07, NIM-10-3, NIM-12-3, the samples were contained in flat plate containers made of a “null” alloy of Ti and Zr with wall thickness 1 mm front and back. This alloy has a zero coherent neutron scattering length and so contributes no Bragg peaks to the scattering pattern. For the other two experiments, NIM-11-3 and NIM-11-5, the samples were contained in Hellma cells made from optically pure amorphous SiO with wall thickness 2 mm front and back. With the exception of samples 3, 4, and 5 in experiment SLS-01, the sample thickness was 1 mm in every case. For samples 3 and 4 of SLS-01, the sample thickness was 2 mm, while for sample 5 it was 4 mm. The total scattering was of course corrected for scattering from and attenuation in these containers. For experiments SLS-01 and SLS-07, the neutron wavelength range used was 0.05–4.9 Å ( ). For the NIMROD experiments, the wavelength range was 0.05–11 Å ( ).

Figure 3 gives an example of the total differential scattering cross sections, after averaging over all measured scattering angles, observed for the four samples of experiment NIM-10-3. These are analogous to what has been shown previously—see for example [19]—but differ in detail due to different instrument geometry, incident spectrum, and detector characteristics. It will be noted there are marked changes in scattering in going from O to O: the base scattering level rises dramatically with increasing H content, and the interference signal changes markedly in shape. For those samples which contain H, the underlying shape that was predicted by the model calculation of Figure 1 is reproduced qualitatively by the data.

These data, and the data from all the listed experiments, were subject to the iterative procedure described in Section 3.3 to extract the interference differential cross section, . The only required input parameter for this purpose was the minimum radius used to remove unphysical backgrounds from the estimate for . This minimum radius was set to 0.76 Å, which is reasonable given that the first peak in the radial distribution functions corresponds to the OH intramolecular bond length at 0.98 Å. Figure 4 shows the outcomes for all the samples in all the experiments corresponding to the four shown in Figure 4.

No attempt is made to distinguish between experiments here: the aim is simply to show the degree of variability that occurs between repeated measurements on the same material. In the present instance, the samples are measured on different instruments with different detector geometries and spectrum characteristics, in different containers, and sometimes the samples were prepared and mounted by different people. Perhaps not surprisingly the degree of disparity between different measurements is much smaller for the O sample than for those samples which contain H. This is almost certainly because a combination of large single atom scattering and large inelasticity effects when H is present mean that the data are excessively sensitive to small changes in the incident beam spectrum, detector performance, detector background, and detector geometry differences. Nonetheless, some very clear dependent trends emerge which are common to all experiments.

Clearly, the largest discrepancies occur below , and although these are large compared to the amplitude of the interference signal, particularly for the H containing samples, compared to the starting data, as shown in Figure 3, the discrepancies are actually quite small, of order %, at the lowest values. Hence, it seems fair to claim that the majority of the single atom scattering with its inherent large inelasticity features has been removed by this procedure. Obviously, one could envisage a time in the future when better models of the single atom dynamic structure factor are available so that the single atom scattering can be subtracted directly, but so far that time has not happened.

Preserving our requirement to avoid subjectivity as much as possible, the data from all six experiments were combined to form the arithmetic mean for each isotope contrast, and the RMS standard deviations of individual data sets around this mean were used to estimate the true experimental (as opposed to purely statistical error) error. The results of this analysis are shown in Figure 5. These data represent the main statement in this paper with regard with the interference functions for liquid water under ambient conditions using the pulsed neutron scattering technique. There are unfortunately too few recent data from reactor neutron sources to be able to perform a similar analysis on reactor data, although qualitatively at least the data appear consistent with what has been obtained previously [17, 21, 40].

Each of the data sets in Figure 4was Fourier transformed to space, and then an arithmetic mean of the results was formed in the same manner as for Figure 5. In order to reduce Fourier termination ripples in these functions, a broadening function of width 0.1Å was used on the space results as described in [63]. The results of this analysis are shown in Figure 6, where it can be seen that in fact the relatively large uncertainties at low in the interference functions have only minor impact on the short range structure.

As expected, the O sample produces a negative peak at the OH bond distance, and it will be noted that this peak is almost coincident with the positive peak for O at the same distance. Fitting Gaussians to these peaks for O and O suggests that the OH bond in O is only 0.003 Å longer than the corresponding distance in O, which is a much smaller difference than the 0.03 Å difference that was recorded in [42] and is much more in accord with what was found independently using oxygen isotope substitution [38]. Apparently, the discrepancy between these two values arises from the different methods used to remove the inelastic single atom scattering, but it does suggest that there is realistically a greater uncertainty in the stated bond lengths than is traditionally stated.

It will also be noted that the null water sample produces not only a very weak peak at the OH bond length, possibly a consequence of slight variations in the precise light water content of this sample, but also a weak peak at the hydrogen bond distance, ~1.8 Å. If the structure of light and heavy water were identical as is sometimes assumed [66], then there should be no peak at this distance. The point is that the weighting on the OO term is very weak in the neutron experiment, so that the “null” water data are sensitive to isotopic differences between OH and OD. On the other hand, it could reasonably be argued that, given the experimental uncertainties in this region of space, whether this is a genuine feature or simply an artifact of the data analysis is currently unclear, although the fact that it remains despite that averaging six different experiments together suggests that it is not pure artifice. A feature of this kind in null water would be produced for example if the OD hydrogen bond distribution was slightly sharper than the corresponding OH distribution.

#### 4. X-Ray Total Scattering from Water

The structure of water was of course tackled with X-rays long before it was measured with neutrons. Whilst there is almost certainly X-ray data from before 1930, it seems that the decade which followed produced the first activity of real significance, with Bernal and Fowler’s seminal interpretation of the early X-ray data [67] in terms of a tetrahedrally organised structure, but with the clear requirement for there to be some “quartz-like water modified by some local packing” [67]. Subsequently, Morgan and Warren extended these ideas with new X-ray data [68] and concluded that “… it is not possible to interpret the distribution curves uniquely in terms of a definite number of neighbours at definite distances. The results can be interpreted in terms of a structure in which the tendency of a water molecule to bond itself tetrahedrally to 4 neighbouring molecules is only partially satisfied.” After that, there appears to have been something of a dearth of data until the late 1950s and 1960s when Narten and coworkers [11, 69] and Bol [70] produced new data and analyses [71]. In the Narten and others view, water is regarded as a tetrahedral ice-like lattice with sufficiently large cavities to allow “interstitial” molecules. Bol on the other hand prefers to regard it as a locally coordinated liquid with some neighbours tetrahedrally bonded and others coordinated by purely van der Vaals interactions. Although both models have similarities, there is possibly a clear difference in emphasis, one exploring disordered crystalline models and the other a more randomly placed arrangement which is affected by the hydrogen bonding.

At about this time, the first computer simulations of water appeared [72, 73] which tended to emphasize the noncrystalline nature of water structure, and although there was some good qualitative agreement, there was a significant problem in that the first peak in the simulated OO generally came out at too large amplitude and too short distance compared to Narten’s data. This problem continued for most of the simple water potentials, such as the SPC [74, 75] and TIPnP potentials [76–80]. Only when density functional theory was brought into the problem this picture did appear to change [81], but, despite this initial success, later concerns about the equilibration of the *ab initio* simulation and the state of water being simulated emerged: it appeared to diffuse much more slowly than experimental water [82].

Perhaps as a result of this uncertainty, the past 12 years or so have seen an unprecedented interest in attempting to measure the OO radial distribution function using total X-ray scattering using modern high flux and high energy X-ray sources to achieve the maximum possible, and hence attempt to address the question of how sharp the first peak in really is. Whilst it is probably not possible to cover all the published work, some of the more notable results include [83–89]. To these I will add my own data measured on a laboratory X-ray diffractometer using Ag radiation [90] and analysed using a recently developed analysis program [63].

However, before saying any more about these data, it turns out that there are several ways in which X-ray data can be normalised, and it is not always totally clear which particular normalisation has been used in each particular case. Obviously, a common normalisation has to be used if we are to compare data from different sources.

##### 4.1. Normalisation of X-Ray Data

With neutron scattering the scattering data are usually put on an absolute scale by comparison with the scattering from a standard piece of vanadium as discussed in Section 3.1. For X-rays, there is no equivalent material to vanadium, so the data need to be put on an absolute scale by comparison with the single atom scattering. Methods for doing this have been available for a long time, typically based on the principles set out by Krogh-Moe and Norman [91, 92]. This normalisation procedure can be affected to a significant extent by the amount of Compton scattering present, Section 4.2, but provided that this is known reasonably well, it usually does not present any major difficulties in putting the data on an absolute scale.

A more contentious issue with X-rays is how to correct for the dependence of the atomic form factors: without this correction, we would see only the electron-electron correlation between atoms, and it would be difficult to compare with atomistic computer simulations. The most common normalisation is to divide the scattered intensity by the square of the average scattering length, as per (21). The author will call this normalisation I.

Alternatively, as stated previously, the raw scattered intensity can be normalised to the single atom scattering to remove the dependence: which will be called normalisation II. Although this normalisation has rarely been used up to now, it is entirely equivalent to normalisation I and has the added advantage that since , then for all , so that this definition behaves the same as the structure factor for a simple fluid, that is, . The same constraint will not necessarily be obeyed by using normalisation I. At the same time normalisation II means that is independent of whether the underlying intensities, , are defined per atom or per molecule.

The normalisation that Narten introduced [11] is based on the idea that a molecule has a characteristic molecular scattering amplitude which describes its structure, and within the independent atom approximation:
where is the displacement of atom relative to the scattering centre of the molecule. From this definition, the X-ray scattering from a single water molecule has the form
where the average is over the orientations of the molecule and and are the respective interatomic distances within the molecule. On the other hand, if the orientations of neighbouring water molecules are uncorrelated, then it can be shown that the intermolecular part of the scattering, that part which is due to interference between atoms on *distinct* molecules, contributes to the centre-centre structure factor multiplied by the square of the orientationally average molecular scattering amplitude (see also [93]):
where is the structure factor between scattering centres. Also, within the independent atom approximation,
(Note that strictly there should also be Debye-Waller factors in both (39) and (41), but leaving them out here for simplicity does not affect the point being made.)

Narten and Levy [11] showed, using a calculation based on electron orbitals, that which means, if (39) and (41) are correct, that the scattering centre for X-rays must in fact be close to the oxygen nucleus (). On this basis, they chose to ignore the fact that neighbouring water molecules might be orientationally correlated and defined a centre-centre structure factor, assumed to be equivalent to the OO structure factor,
This will be called normalisation III here. There is a slight variant on normalisation III that sometimes appears in the literature, namely,
where is the number of atoms in the molecule. To distinguish this from III, we will call this normalisation IV. The difference from III is that in IV we normalise to the square of the total atomic form factor of the molecule and make *no* assumptions about orientational correlations, whereas in III we normalise to square of the molecular scattering amplitude and have to assume that the orientations of neighbouring molecules are uncorrelated for (42) to be valid. No such approximation is made with normalisation IV, for which will contain contributions from the OH and HH structure factors.

In principle, normalisation III is similar to normalisation I, except that individual atomic form factors in I are replaced by molecular form factors in III. The practice of ignoring the orientational correlations which is required for this normalisation to be working is rarely tested, and it is questionable whether this normalisation has any usefulness beyond simple molecules like O, , , and so forth.

In fact, using the results from a computer simulation of water, there is a simple test that we can perform to see whether in normalisation III the approximation (42) is valid for water. Working within the independent atom approximation then the expression for for water using normalisation IV is
If on the other hand (42) is valid, then with the same normalisation
These two functions are shown in Figure 7 which compares from (46) with as defined by (44), using computer simulated values of the partial structure factors. It can be seen that the two results are close but not identical. Hence, if we are seeking reliable radial distribution functions it is fair to conclude that the Narten normalisation III will *not* give an accurate result. In other words, the contributions from the OH (in particular) and the HH partial structure factors cannot be ignored when Fourier transforming X-ray water data.

Another feature of the normalisation of X-ray data is the form factors themselves, which, within the independent atom approximation, are assumed to be spherically symmetric about the centre of each atom. Clearly, when chemical bonding takes place this approximation should break down, and indeed close comparison of the single molecule scattering within the independent atom approximation with quantum mechanical calculations of the water molecule’s electron distribution [94] suggests indeed that this approximation is inadequate. The difficulty here, and following on from the previous discussion, if we are to introduce a full quantum mechanical molecular form factor into the data analysis, we will need to know the orientational correlations between water molecules beforehand. Since the whole point of the experiment is to learn about water structure, this becomes impossible. To circumvent this difficulty, Sorenson et al. proposed a modification to the independent atom form factors which not only preserves the feature that they are spherically symmetric about each atom, but also captures the chemical binding effects [95]. In essence, a fractional charge is shifted off each hydrogen atom on to the oxygen atom. However, this applies only at low (longer distances) where the effects of chemical binding are more apparent, while at larger (shorter distances), the core electron distribution remains intact. Several other authors have achieved a similar goal by analogous procedures [83, 86, 96].

The main thing to emphasize about X-ray atomic form factors, however, is to use a consistent definition of the form factors throughout the data analysis. Hence, if a particular set of atomic form factors are used to put the data on an absolute scale and normalise the data, then the same form factors should be used when comparing models of the structure with the data. In that way, any approximations built into the data analysis are equally built into the model with which the data are being compared.

##### 4.2. X-Ray Inelasticity: Compton Scattering

As was the case for neutrons, the expression for the scattered intensity of X-rays, (18) in Section 2.2, leaves out an important contribution, namely, the inelastic scattering of X-rays. With X-rays, this inelastic scattering, originally analysed by Compton [97] then revisited by Dirac and Breit [98, 99] to take account of relativistic effects, arises from electron recoil (as opposed to nuclear recoil with neutrons) but unlike the neutron case grows monotonically with increasing , being zero at . This means high quality X-ray data at high are difficult to obtain, because the interference signal rapidly declines at large as the square of the atomic form factors, while the inelastic or Compton scattering background increases. Hence, since the intensity of Compton scattering depends on the efficiency performance of the detector as a function of photon energy, good normalisation of X-ray data to high requires a very stable X-ray source and detector with known characteristics. Some experiments attempt to remove the Compton scattering by energy analysis in the scattered X-ray beam [86, 87], which works well, but the degree of monochromation in the scattered beam required is substantial so that the count rate for elastically scattered X-rays is much poorer than when energy analysis is not used. Hence, experiments with and without energy analysis to date have generally given equivalent results at high .

Figure 8 illustrates the relative strengths of the single atom and Compton scattering for the X-ray experiment on ambient water. However, for any given dataset the Compton scattering is only known to a certain approximation, which means that X-ray scattering data, just as we saw for the neutron scattering case, will always have the potential for an unidentified background to appear due to inadequate subtraction of this scattering. This is almost certainly one reason why repeated measurements of the X-ray scattering pattern for water do not yield identical results.

##### 4.3. Comparison of Data Sets

The nine X-ray data sets for ambient water that will be compared here are from [11, 83–90] and will be labelled Narten1971, Badyal2000, Hura2003, Hart2005, Fu2009, Huang2011, Soper2011, Petkov2012, and Skinner2012, respectively. The most accurate X-ray scattering data from water to date [80] cannot be included in this list as they were not measured under ambient conditions. The data from Narten1971, Hura2003, Fu2009, Huang2011, and Petkov2012 appear to be normalised according to scheme III (the Narten normalisation).

Data from Badyal2000, Hart2005, and Skinner2012 are assumed to be normalised as scheme IV. The data from Soper2011 are normalised according to scheme II. In order to put the various datasets on a common graph the molecular form factors (39) and (41) were calculated using the modified atom form factors of Sorenson et al. [95] and applied to the data of Badyal2000, Hart2005, Soper2011, and Skinner2012, so that all nine datasets are normalised according to scheme III. The results are displayed in Figure 9.

With the exception of a few outliers it is remarkable how close these nine datasets are to each other. As we saw with the neutron scattering data, the variation between different data sets measured at different times by different people at different X-ray sources is a true mark of the experimental uncertainties in measuring these functions.

To avoid any suggestion of bias towards one experiment or another, the data of Figure 9 were interpolated onto a common scale, and the arithmetic (unweighted) mean of all datasets at each Q value where they have values was formed. This is shown in Figure 10(a) together with the RMS deviation of the different data sets about this mean. Figure 10(b) shows the same data renormalised according to scheme II: these will be used in the later analysis of the data in terms of radial distribution functions, Section 6.

**(a) Mean F(Q) normalised using scheme III**

**(b) Mean F(Q) normalised using scheme II**

#### 5. Empirical Potential Structure Refinement (EPSR)

##### 5.1. Introducing EPSR

Before the computer simulation methods of RMC and EPSR were invented, the experimenter was faced with only one choice for the inversion of scattering data such as that shown in Figures 5 and 10, namely, to perform a numerical Fourier inversion of the data. The plethora of methods that have been developed to perform this function would be beyond the present purpose to describe, but in essence the various methods aim to minimise the amount of spurious structure arising from artifacts in the data, such as statistics and systematic error, and maximise the amount of useful information. There is, however, a serious issue here: how do you know what is artifact and what is genuine? Right from the beginning, early workers such as Morgan and Warren [68] and Bol [70] were aware of the potential for Fourier transforms to introduce spurious structure into their radial distribution functions, and *they* often had to perform the transform by hand! Various attempts to bring some degree of objectivity into the process were achieved by the introduction of Maximum Entropy, Minimum Noise, and Monte Carlo methods for calculating [100–103] which allowed aspects like the resolution in the data to be included in the inversion process. Yet right up to the present time, some workers claim that small oscillations in their published radial distribution functions are genuine structural features [86, 88], while others will argue that they are not [89]. It seems to me the only objective way to prove that a given structural feature is not an artifact of the data is to remove it from the space function and see if it tangibly affects the fit to the space data. If it does not detectably affect the fit, it is not genuine.

The idea behind methods like RMC and EPSR is to increase that objectivity to the maximum possible extent. Fundamental quantities like density and restrictions on atomic overlap are already quite severe constraints on the way atoms and molecules can pack in a liquid or glass. Equally, if a system like water is made up of known structures, in this case water molecules, why not build that structure into the model at the outset? Surely, if any of these starting assumptions are not backed up by the data, then we should be able to use the data to improve on our starting assumptions. If on the other hand our data are not sensitive to any assumptions we make, then we have no indication as to whether those assumptions are correct or not.

In the case of water, and even despite the plethora of scattering data available such as demonstrated in this paper, it is still possible to get widely different results for the radial distribution functions depending on which starting assumptions we make, as Pusztai has demonstrated [104]. In this situation, there is only one way to proceed, namely to restrict the starting assumptions to those that we believe are at least physically plausible. In that case, the use of hard cut-offs on near neighbour coordinations and hard bond constraints to define molecules are disallowed, since neither is an accurate representation of how atoms interact at short distances and how molecules are bonded in practice, even in the situation where we may not know those interactions precisely.

Therefore, in this work I shall use EPSR, which does attempt to use realistic constraints on the forces between atoms, both inside and between molecules. The aim is to interpret the X-ray and neutron total scattering data from water in terms of their radial distribution functions. We shall discover that contrary to all the uncertainty about water structure, the combination of the constraints imposed at the outset by the simulation and the constraints on structure imposed by the data lead to a consistent set of radial distribution functions. The comparison between different experiments then allows us to put realistic error bars on these distribution functions for the first time.

##### 5.2. Molecules in EPSR

The molecules in an EPSR simulation are held together by a simple harmonic force law, . Each intramolecular distance is characterised by an expected mean distance, , and Debye-Waller factor, : where is the actual separation of atoms in molecule and , with the reduced mass of the pair of atoms . C is a constant which is adjusted at the outset to give the correct width of intramolecular correlation lines. The use of a generic Debye-Waller factor, , avoids the need to specify this value for a large number of pairs of atoms. When moving atoms within molecules the calculated energy ignores intermolecular interactions so that the molecular structure is preserved whatever happens to the intermolecular forces. However, this introduces a degree of disorder into the intermolecular interactions so that intramolecular moves are only made once for every (typically) 100 molecular translations and rotations.

For the water molecules used in the present simulations, the average OH bond distance was 0.976Å with RMS deviation 0.066 Å, and the average HH distance was 1.55 Å with RMS deviation 0.103 Å. These values were chosen to give a good fit to the wide neutron data and within the uncertainties are consistent with previous estimates.

##### 5.3. The Reference (or Seed) Potential

The intermolecular potential energy function in an EPSR simulation is represented as a sum of two terms, the reference potential (ref), which remains unchanged throughout the simulation, and the empirical potential (EP), which is derived from the scattering data and which can change, depending on the degree of fit to the data for the pair of atoms of type . In principle, the reference potential can take any form, although the present program uses a combination of Lennard-Jones, Coulomb potential, and purely repulsive exponential potentials, the latter being used to control atomic overlap when needed: where , and are the well-depth, core diameter, and electronic charges for the site pair (), respectively, and and are parameters which control the minimum separation and hardness of interaction of the same atom pair. The amplitude is calculated automatically to ensure that no atom pairs of type occur at distances shorter than . For the present simulations, the value of was set to 0.3 to give a relatively soft short range repulsive potential, and the only restriction on distances that was set, in addition to the Lennard-Jones terms, was for the HH intermolecular interaction which was given a minimum distance of 1.6 Å.

For the present work, the TIP4P/2005 potential [79] was used to define the parameters of the reference potential. This involves an extra nonscattering atom to be present which acts as the centre of negative charge on the water molecule displaced away from the oxygen nucleus. After experimenting with several possible reference potentials this one seemed to have the most success at capturing the OO distance obtained in the X-ray data. In addition this potential could be incorporated directly into the existing EPSR scheme without changing the program in any way.

##### 5.4. Introducing the Data

In the current version of EPSR, the empirical potential (EP), which is to be derived from the scattering data, is represented by a series of Poisson functions: with where is a width function ideally set to capture the genuine structure, while masking systematic effects, in the scattering data. The expression (51) has an exact Fourier transform to space [105], so in practice the coefficients are estimated in the measuring space of the experiment, then used to generate the empirical potential in space. This avoids the need for a direct numerical Fourier transform of the scattering data. In fact, the Poisson functions have the useful feature that they vary rapidly at low where repulsive interactions are expected to be strong, but slowly with at high , where artifacts from Fourier transform are likely to be significant. In this way, this representation of the EP using these functions is likely to significantly suppress spurious structure from Fourier termination effects that may be present, but of course no procedure can guarantee that such artifacts do not get carried through into the simulation to some extent. Figure 11 shows some Poisson functions in space for various orders, and their Fourier inversions to space, and it will be noted how the higher order functions progressively search for smaller and smaller values. As a result, if a material has significant small scattering, it will be important to generate the EP out to large distances for it to take full account of the data.

(a) |

(b) |

Further information about how the scattering data are used to generate the coefficients in (50) can be found in previous work [30].

#### 6. The Radial Distribution Functions for Water at 300 K

For all the EPSR simulations reported here, there were 1000 water molecules in a cubic simulation box of dimension 31.0516 Å at a temperature of 300 K. The other parameters of the simulation have already been stated in the previous section. A separate EPSR simulation was performed for each of the six neutron sets of data shown in Table 1 using in addition the merged X-ray scattering data shown in Figure 10. Given the good overlap between the different X-ray data sets, it did not seem pertinent to perform the corollary of this, namely, to compare the merged neutron data sets with the individual X-ray data sets. An additional EPSR simulation was performed using the merged neutron data, as shown in Figure 5, plus the merged X-ray data, as shown in Figure 10. Because there was only one X-ray scattering data set to several neutron data sets for each experiment, the X-ray data were given a relative weight of 5 compared to the neutron data’s weight of 1 in calculating the inversion of the scattering weights matrix. This was to ensure that the fit to the X-ray data was given the same emphasis as that to the neutron data. (Obviously, a variety of data weighting schemes could be contemplated here, such as that based on the relative (statistical) uncertainties in the data. However, as has been stated several times in this paper, statistical uncertainties are not the only source of error in these experiments and in practice are mostly much smaller than other systematic effects. Fortunately, in the present instance, weighting one data set over others does not have a major impact on the resulting radial distribution functions that are extracted from the EPSR simulations.)The radial distribution functions from the individual simulations were then combined to give an average set of radial distribution functions for the six neutron datasets, to be compared with those obtained when using the merged scattering data for both X-ray and neutron experiments. The RMS deviation of the individual s from the average gives an estimate of the likely uncertainty in the average distribution functions. Finally, we turn to the question of isotope effects on water structure and whether the structural differences between light and heavy water are really discernible in scattering experiments of these kinds.

##### 6.1. Why EPSR?

Before proceeding to present the EPSR fits to the various datasets, the question needs to be addressed: why use EPSR on water? To illustrate the point, using the inversion of the neutron and X-ray scattering weights matrix, it is possible to estimate the site-site partial structure factors and hence the site-site radial distribution functions directly from the scattering data [30]. Figure 12 shows that these estimated site-site radial distribution functions as derived from the merged X-ray and neutron scattering data (Figures 5 and 10), alongside the EPSR fit to the same data. Also shown is the effect of running the EPSR simulation with the empirical potential set to zero, that is, just with the reference potential on its own. It can be seen that there is a marked difference between the two simulations.

It must be emphasized that the use of the TIP4P/2005 parameters in the current reference potential will NOT necessarily give the same s as those published for this potential [79]. This is because the disordered molecule geometry used in EPSR is different from the fixed molecule geometry that is used in the definition of this potential. In addition, the OH bond lengths used to define the molecular geometry in TIP4P/2005 are incorrect for the water molecule in the liquid state.

A similar figure could have been drawn for other effective potentials such as SPC or ST2 [106] with similar conclusions. The point being made is simply that many, if not most, effective potentials for water use the incorrect molecular geometry and do not capture the observed structure of water correctly. It is true that they may capture the qualitative features of water structure to a greater or lesser extent, but the agreement is not quantitative. At the same time, the fact that EPSR *can* obtain a reasonable fit to the data suggests that such an effective potential which reproduces the experimental structure of water does exist. Moreover, as can be seen in Figure 12, there are regions at low where the data are clearly implying unphysical behaviour, for example, negative and positive at unphysically short distances. As already discussed, this can arise from problems in the data acquisition, inadequate data corrections, or from the assumption as here that light and heavy water have the same structure. Nonetheless, because EPSR includes a reference potential which at least qualitatively captures the structure of water, this unphysical behaviour is excluded from the phase space sampled by the EPSR simulation.

The same figure raises another important issue about how scattering data from water in the present instance, but from other liquids and disordered systems more widely, are interpreted. Rietveld refinement is a process used by crystallographers to build a model of a crystal structure which is consistent with a set of diffraction data [107]. This process, however, does not only include the scattering data: it requires knowledge of crystal space groups and symmetries, bonding constraints, atomic overlaps, peak shape functions, and so on, as well of course as the data themselves. The model does not depend *only* on the data: the structure that is produced is the best fit to the scattering data subject to all the other constraints being satisfied. No crystallographer would be able to publish a set of diffraction data without also presenting a structure refined against those data. Protein crystallography for example would be a dead science if it were not for the fact that at the end a model of the protein structure is produced.

EPSR is an attempt to do exactly the same as what is done for crystals, namely refine a structure of water against a set of scattering data. The process has to be different because the constraints are different. There are no crystal symmetries to be satisfied, but we often have a good idea of the structure of the component molecules. The shape of the experimental resolution function, which can be critical to a successful crystal structure refinement, is relatively unimportant for liquid structure refinement due to the intrinsic broadness of the peaks. Whilst we do not know the interaction potential between water molecules precisely, there are a large number of estimates of this potential, so we should have and do have a good starting point for the structure refinement process. We have to be acutely aware of the likelihood that data artifacts can overly influence our conclusions, and the use of Poisson functions (50) to represent the empirical potential is one way to lessen the influence of these artifacts. The results shown in Figure 12 indicate clearly that the EPSR refined structure relies heavily on the contribution from the scattering data, as well as the constraints imposed by the molecular geometry and reference interaction potential. These results are, therefore, correctly regarded as an experimental structure determination. Simply running a computer simulation without structure refinement and comparing this with scattering data as some have done, for example [38], may tell us something about the simulation and the interaction potential used, but it tells us little about what the measured data imply for the structure water.

##### 6.2. A Current Set of s for Water

As an example of the EPSR fits to a particular dataset, the fits from experiment NIM-10-3 are shown in Figure 13. It can be seen that generally the data are well reproduced by the simulation, although exact agreement is impossible to achieve. The main discrepancies occur at low where, as has already been seen, the pulsed neutron data used here are least accurate. The corresponding fits in space are shown in Figure 14. Again it seems that reasonable agreement in terms of peak heights and positions is obtained. In particular the discrepancies seen at low in space do not appear to overly distort the space fits.

**(a)**

**(b)**

Figure 15 shows the EPSR fits to the merged neutron and X-ray data, Figures 5 and 10. As might be expected, the discrepancies at low are not as large in this case as for the individual neutron data, Figure 13.

**(a)**

**(b)**

Given these structure refinements against individual and merged neutron data sets and against the merged X-ray data sets, Figure 16(a) shows the site-site radial distribution functions obtained in the individual EPSR simulations together with the arithmetic mean of each distribution function. It can be seen that from experiment to experiment, there is some variation in the extracted distributions functions, particularly in the positions of the first peaks: most experiments are close to the mean, but there are some outliers, as might be expected in any experiment where the systematic effects cannot be completely controlled. Generally, however the means of these distribution functions are close to the EPSR simulation in which the merged neutron and X-ray data are used (see Figure 16(b)), and are in good agreement with what has been obtained in previous EPSR refinements of water structure [10, 24, 42], which often used different scattering data and reference potentials. This argues for the general robustness of the EPSR technique for extracting structural data from water scattering data. For example, if one compares the coordination number for each of these site-site distribution functions integrated over the first peak, the results are notably consistent across the different simulations; see Table 2. The mean coordination numbers and their RMS deviations are OO: ; OH: ; and HH: .

**(a)**

**(b)**

One notable trend has been the slight reduction in the height and the increase in positions of the first OO peak. In 2000, and based purely on neutron scattering data, this peak had a height of 2.75 at a position of Å, whereas with the present data it has a height of 2.49 at a position of Å. This change arises as a result of the introduction of X-ray scattering data into the structure refinement [10] and is in good agreement with recent new X-ray data [80, 89] where the contribution of the OH distribution function to the X-ray total scattering is correctly accounted for. In fact, this trend in the OO peak is also supported by the neutron data. If you look closely at the region of the OO first peak in the Fourier transform of the fits for the X-ray data and O and O neutron data (Figure 17 (arrowed)), there is no sign of a need for extra intensity in this region, which suggests that the revised height for this peak is correct given all the available data.

Using the refinements against the six individual neutron data sets, an estimate of the likely uncertainties in these radial distribution functions can be obtained by calculating the mean square deviation of individual refined distribution functions against the averaged distribution functions. These deviations are shown as the error bars in Figure 18. They show that there is in fact, as anticipated above, some uncertainty in the positions of the first peaks, particularly for the OH and HH distributions, but less variation in the peak heights between different data sets.

**(a)**

**(b)**

##### 6.3. Are the Structural Differences between Heavy and Light Water Measurable?

In 2008, we published the first attempt to extract the radial distribution functions separately for heavy and light water, using a combination of EPSR simulation, X-ray scattering data for light and heavy water, and neutron scattering data for light and heavy water [42]. A particular result to emerge from that study was that the OH bond length might be as much as 3% longer in O than in O, although, based on the evidence presented in the present work, it seems that assignment may have been a consequence of inadequate correction for inelasticity effects in the earlier data. Beyond the OH bond length issue, however, there appeared to be small but measurable differences in the intermolecular structure between the two liquids with light water appearing to be a slightly more disordered version of heavy water. It is important, therefore, to establish whether the data produced by the new analysis presented in this work support the previous conclusions on differences in water structure between the two liquids.

In the meantime, Zeidler and coworkers [38, 40] have performed oxygen isotope difference experiments on heavy and light water separately. Comparing their data to path integral simulations with the TTM3-F potential, they concluded that differences between heavy and light water are smaller than might be expected on a simple analysis due to “competing quantum effects” which cause the proton in the OH bond to sample more of the anharmonicity in this bond than for the deuteron in the OD bond. It has to be said, however, that this class of potentials is known to tend to underestimate the measured X-ray scattering differences between light and heavy water [108]; see [109] for example, although the simulated differences in structure between heavy and light water shown in that work are in fact quite similar to what was presented in 2008 based on scattering data.

The following account shows the effect of using the new merged neutron data on heavy and light water and the previously published X-ray data [85], in separate EPSR simulations of the two liquids. In addition, since the oxygen isotope data are now available, these are included in further EPSR simulations for comparison purposes. For these separate simulations of light and heavy water number, a densities were set to their respective values at 298 K, namely, 0.1000 atoms/Å and 0.0996 atoms/Å [110].

Figure 19 shows the EPSR results when fitting just to the total neutron and total X-ray scattering data, while Figure 20 highlights the differences between fit and data. It is found that the simulation that uses only the HO data to refine the EP (Figure 19(a) and Figure 20(a)), fits the HO data better than the DO data. Similarly, in the simulation that uses only the DO data to refine the EP (Figure 19(b) and Figure 20(b)), the fits to the DO data are better than to the HO data. This does not necessarily mean that structural differences between the two simulations are real, since it could simply be that without the DO data in (a) and the HO data in (b) the empirical potential is less well constrained, and so, structural differences are allowed to develop in the simulations. Therefore, the structural differences that appear by this method will represent the upper limit to true structural differences between the two liquids.

(a) Fit to data |

(b) Fit to data |

(a) Fit to data |

(b) Fit to data |

The corresponding radial distribution functions are shown in Figure 21. There are both similarities and differences to what was shown previously [42]. Generally, the differences between the and functions for the two liquids are similar to what was reported previously, with light water appearing as a slightly more disordered version of heavy water, particularly as seen in the function. The most notable change is the OH intermolecular peak which previously was quite asymmetric compared to OD but now is much more similar to OD in shape. Also, it is now at a slightly *larger* distance compared to OD. Regarding these differences between the present and earlier work, it should be pointed out that the EPSR simulations presented in [42] used, on the basis of the scattering data presented in that work, an intramolecular OH bond ~3% *longer* in HO compared to the same bond in DO, whereas in the present simulations, based on Figure 6, the two bond lengths were set the same. Changing the bond length in this manner could have affected the outcome in the previous work.

It is informative to establish whether the new oxygen isotope difference neutron data [38] can help clarify these trends. To this end, a separate pair of EPSR simulations were run, one for HO and one for DO, in which the corresponding oxygen isotope first-order difference data were included in the structure refinement. The neutron weights were calculated using the revised neutron scattering lengths quoted in that paper, and because the amplitude of these data is very much smaller than the corresponding total interference functions, the weights for the oxygen isotope difference data were multiplied by a factor of 20 to give them extra emphasis when calculating the weights inversion matrix.

The fits to the data are shown in Figure 22, and the associated radial distribution functions are given in Figure 23. It becomes clear that these EPSR simulations of water can fit the oxygen isotope difference data, but that there are some lingering discrepancies in the DO oxygen isotope data that, even with the extra emphasis given to these terms in the scattering weights matrix, are not correctly reproduced by the simulation. There is currently no obvious explanation for these discrepancies, although it is notable that the fit to the total neutron DO data gets slightly worse when the oxygen isotope data are included—compare Figure 20(b) with Figure 22(b). In addition, the simulation with the TTM3-F potential in [38, 40] showed a similar discrepancy near Å in the DO oxygen isotope difference data.

(a) Fit to data |

(b) Fit to data |

The effect of including these extra data in the EPSR simulation does not appear to change the overall outcome significantly—compare Figure 21 with Figure 23: there remain residual differences in structure between the two simulated liquids. To what extent the apparent inability of the current EPSR simulations to fit the oxygen isotope difference data for DO is affecting the outcome is unclear. To try to understand why this misfit might be happening, the EPSR fit and data for the neutron total scattering data, X-ray total scattering and the neutron oxygen isotope data were Fourier transformed over the corresponding range for each data set. The results are shown in detail in Figure 24. It can be seen that in the region of the hydrogen bond peak near 1.85Å (arrowed), the negative intensity in the HO oxygen isotope data is *weaker* than the fit, although this does not appear to be significant within the statistics; see Figure 22(a). For the DO oxygen isotope difference data on the other hand, the same (positive) peak in the Fourier transform of the data is too large compared to the simulation, which has difficulty generating the intensity demanded by the oxygen isotope data in that region. In contrast, in the region of the OO first peak at Å, the EPSR simulation is overestimating the DO data but apparently has no problem with the HO data in the same region. The simulated intensity for DO could only be lowered in this region by reducing still further the height of the first OO peak, but that would then cause a misfit to the total scattering X-ray data, which are well fit by the simulation. (The OH radial distribution function is the only other contributing function to this data set and has low intensity in this region, as shown in Figure 23.) Another possibility is that the X-ray data used in these simulations is overestimating the height of the OO first peak: we have already seen in Section 6.2 that the latest analyses suggest a lower OO first peak than previously shown. Whether this height reduction would be enough to explain this unresolved discrepancy between these oxygen isotope difference data for heavy water and other neutron and X-ray total scattering data on the same material remains to be seen. These relatively minor differences should, however, not distract us from the main message from these data, namely, that independent experiments on independent sources with independent experimental teams are finally leading to a consistent view of the structure of water. The use of a computer simulation procedure such as EPSR has proved crucial here in identifying the similarities and residual discrepancies between different measurements.

(a) Fit to data |

(b) Fit to data |

#### 7. Conclusion

In the foregoing account, a thorough appraisal has been performed of the neutron, and X-ray total scattering techniques used to investigate the structure of water in terms of the site-site radial distribution functions. With both techniques it is shown there are significant unwanted contributions to the scattering that can introduce substantial systematic error and as a result prevent a direct interpretation of the data in terms of distribution functions.

With neutron scattering, the most serious issue by far is the inelastic scattering from light hydrogen, which combined with the large single atom scattering (compared to the useful interference scattering) produces a substantial wavevector dependent, background that has to be removed before any analysis of the structure can proceed. In this paper, I demonstrated for the first time, as shown in Section 3.3, how a (largely) model independent method can be constructed which removes the bulk of the inelastic scattering from pulsed neutron source data, while preserving the important interference function. Being much less subjective than previous methods—the only parameter the user needs to supply is a likely minimum separation of atoms for the system being studied—means that a range of data from different instruments can be compared in a meaningful way and can be used to put realistic error bars on the interference differential cross section for the first time. It is conceivable that in the near or medium future a more accurate method of calculating the inelastic scattering will become available, perhaps based on models of the dynamic scattering law, but this possibility is in the future rather than being a practical solution that can be applied to data being measured today.

With X-rays, several, perhaps less serious, issues arise. The data have to be corrected for Compton scattering which dominates the total scattering at high . It is difficult to give a precise estimate of this scattering, depending as it does on the beam characteristics and detector properties. In addition, particularly for water, it is not clear that using independent atom form factors is ultimately the correct way to proceed given that these factors are almost certainly modified by the chemical binding within the water molecule and potentially by the hydrogen bonding to the neighbouring molecules. Finally, the traditional assumption that orientational correlations between neighbouring molecules can be ignored when reducing X-ray scattering data to distribution function is shown to be only approximately correct. However, using a computer simulation method such as EPSR to form a model structure of the system avoids the need to make this approximation in the first place. Overall, however with a number of water X-ray scattering experiments to choose from over recent years, it seems the degree of overlap between independent measurements is good (see Figure 7), and it is difficult to see how this situation can be improved very much in the foreseeable future.

To put all these data together into a single structural model, it is necessary to resort to computer simulation. This is partly because the data by themselves are often not good enough to give a unique set of radial distribution functions, and partly because the X-ray data in particular contain contributions from the OH (mostly) and HH (very weakly) distribution functions which need to be accounted for if approximations concerning the X-ray data are to be avoided. Here, it seems that empirical potential structure refinement (EPSR) is a useful method, as shown in Section 5, since it allows an initial structural model of the system to be built based on an existing interaction potential, then perturb this potential in a manner that gives the best agreement with the scattering data, while preserving the constraints imposed by the potential, as shown in Figure 12. This method is apparently relatively resistant to residual experimental artifacts that may be present in the scattering data, although even then there are notable variations from one experiment to another, as shown in Figure 16(a). In principle, such a method can also allow the introduction of other data, such as nuclear magnetic resonance data (NMR) [111] or extended X-ray absorption fine structure data (EXAFS) [112] into the structure refinement process, although this is not particularly helpful in the case of pure water. Reverse Monte Carlo could and has been used for the same purpose, but to date, the results do not seem to give overly consistent outcomes starting from different data sets [31, 32].

Combining a merge of the different neutron data sets and a merge of the existing X-ray scattering data with EPSR simulation gives a set of radial distribution functions which are common to a number of datasets. Deviations of individual data sets from the average distributions lead for the first time to a genuine estimate of the uncertainty in these distributions. In fact, given the range of data that has been input into the process, the size of these uncertainties in the extracted radial distribution functions is surprisingly small (see Figure 18), and indicates that in practice peak positions are less certain than peak intensities.

Regarding the quantum effects on water structure that were previously analysed with EPSR [42], it seems that the newer data require some revisions to the previously published results. In particular, using the new method for removing the inelasticity effects, the OH bond length in HO does not appear so markedly different from the OD bond length in DO as was found previously. Perhaps corresponding to this change, the differences in the OH and OD hydrogen bond distributions between the two liquids that were reported previously need to be modified, as shown in Figure 21. Recently, new neutron scattering data using the oxygen isotope difference method, measured on a reactor neutron source (fixed neutron energy) with a different sample geometry and different experimental team, have become available. Remarkably, the previously published EPSR simulations as well as the present EPSR simulations overlap these new data extremely well [43], with some residual small discrepancies to be resolved. This is a very positive outcome because it means we may be at last getting to the bottom of a problem that has plagued scientists for nearly a century: what IS the structure of water? The combination of techniques described in this paper now seems to be giving us a consistent answer.

Numerical values for the main results of the current work are listed in the Appendices.

#### Appendices

#### A. Table of Merged Neutron Scattering Data for Ambient Water

See Table 3.

#### B. Table of Merged X-ray Scattering Data for Ambient Water

See Table 4.

#### C. Radial Distribution Functions for Ambient Water

See Table 5.

#### Acknowledgments

The author would like to thank Chris Benmore, Philip Salmon, Valeri Petkov and Greg Hura for kindly supplying their X-ray or neutron scattering data in numerical form, and Daniel Bowron and Tristan Youngs for help with the neutron scattering experiments and for comments on the final paper.