Abstract

The ability to collect unprecedented amounts of astronomical data has enabled the study of scientific questions that were impractical to study in the pre-information era. This study uses large datasets collected by four different robotic telescopes to profile the large-scale distribution of the spin directions of spiral galaxies. These datasets cover the Northern and Southern hemispheres, in addition to data acquired from space by the Hubble Space Telescope. The data were annotated automatically by a fully symmetric algorithm, as well as manually through a long labor-intensive process, leading to a dataset of nearly $10^6$ galaxies. The data show possible patterns of asymmetric distribution of the spin directions, and the patterns agree between the different telescopes. The profiles also agree when using automatic or manual annotation of the galaxies, showing very similar large-scale patterns. Combining all data from all telescopes allows the most comprehensive analysis of its kind to date in terms of both the number of galaxies and the footprint size. The results show a statistically significant profile that is consistent across all telescopes. The instruments used in this study are DECam, HST, SDSS, and Pan-STARRS. The paper also discusses possible sources of bias and analyzes the design of previous work that showed different results. Further research will be required to understand and validate these preliminary observations.

1. Introduction

While cosmological-scale isotropy is an elemental working assumption in cosmology, multiple observations using different probes have shown evidence of large-scale anisotropy. In addition to the anisotropy in the cosmic microwave background (CMB) [1-7], large-scale anisotropy has been reported by analyzing the distribution of short gamma ray bursts [8], Type Ia supernovae [9, 10], LX-T scaling [11], [12], dark energy [13-16], high-energy cosmic rays [17], quasars [18, 19], and the frequency of galaxy morphology types [20]. Another specific observation that violates the cosmological isotropy assumption is the existence of the CMB Cold Spot [21-23]. A correlation between higher and the CMB dipole has also been reported [12, 24].

The observations of large-scale anisotropy, and especially the anisotropy observed in the CMB, have led to models that shift from the standard cosmology. Explanations include primordial anisotropic vacuum pressure [25], double inflation [26], moving dark energy [27], contraction prior to inflation [28], multiple vacua [29], and spinor-driven inflation [30]. Some explanations are related to the geometry of the universe, such as an ellipsoidal universe [2, 31-34] and a rotating universe [35-40], where the large-scale anisotropy is expected to exhibit itself through a cosmological-scale axis.

The existence of a cosmological-scale axis has also been linked to theories such as holographic big bang [41, 42], and black hole cosmology [43-45], which is also related to flat space cosmology [46, 47]. These theories explain cosmic inflation without the need for dark energy. On the other hand, other cosmological models suggest that the possibility that dark energy itself is anisotropic cannot be ruled out [13, 14].

This study is focused on the probe of spin directions of spiral galaxies. A spiral galaxy is a unique extragalactic object in the sense that its visual appearance is sensitive to the perspective of the observer. The spin directions of galaxies have been shown to be aligned within filaments on the cosmic web [48], but an alignment in the spin directions of galaxies was also observed when the galaxies are too far from each other to have gravitational interactions [49, 50]. A statistically significant correlation was also found between the spin direction of galaxies and cosmic initial conditions, proposing galaxy spin directions as a probe to study the early universe [51]. As these links are defined as “mysterious” [50], the distribution of spin directions of spiral galaxies in the universe is still unknown.

In the past four decades, several studies provided evidence of nonrandom distribution in the spin directions of spiral galaxies. These research efforts started as early as the 1980s [52] with smaller datasets of several hundred spiral galaxies, and found nonrandom distribution with certainty of 92% [52]. With the deployment of robotic telescopes that generate large astronomical databases, other studies using larger datasets of galaxies also showed evidence of nonrandom distribution [49, 50, 53-65]. On the other hand, other previous work argued that galaxy spin directions are distributed randomly [66-69]. These studies are described and analyzed in Section 5 of this paper.

The disagreements between the results of different studies motivate further analysis of the large-scale distribution of galaxy spin directions. Previous studies, whether they argued that the distribution was random or not, analyzed galaxies that were all collected by the same instrument, which limited the size of these datasets. More importantly, analyzing data from a single instrument limits the size of the dataset footprint. To determine the nature of the distribution, it is therefore required to analyze large datasets of galaxies that cover a relatively large footprint of both the Northern and Southern hemispheres. The large number of galaxies can also enable sufficient statistical significance to determine whether the distribution of spin directions is random.

Here, data from several different instruments used in previous studies [61, 62, 64, 65] are combined into a single large dataset, providing a dataset of nearly $10^6$ galaxies and a far larger footprint compared to any other dataset used for that purpose in the past. This “meta-analysis” provides a more accurate profile compared to analyses based on datasets of smaller footprints. The profile observed with the combined dataset is also compared to the profiles observed with the datasets collected by single instruments. In addition to the analysis of a possible dipole axis done in [65], the paper also analyzes quadrupole alignment.

2. Data

The data were collected from four different telescopes. These include the Dark Energy Camera (DECam), Sloan Digital Sky Survey (SDSS), the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS), and the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) sky survey imaged by Hubble Space Telescope (HST). The number of galaxies from each source is 807,898 from DECam, 33,028 from Pan-STARRS, 8,690 from HST, and 117,638 from SDSS.

SDSS, Pan-STARRS, and the DESI Legacy Survey imaged by DECam are currently the largest and most productive digital sky surveys, with the largest footprints compared to other Earth-based digital sky surveys. The data collected by these telescopes are publicly available, making these sky surveys suitable for this study. The sky surveys were also selected such that their combination covers both the Southern and Northern hemispheres. That provides a far larger footprint compared to any previous study of this kind.

In addition to the three Earth-based telescopes, another sky survey that was used was the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS), imaged by HST. With the exception of the new James Webb Space Telescope (JWST), HST is the most productive space telescope working in the optical wavelength, providing substantial image data throughout its over two decades of service. As a space-based instrument, HST cannot match the vast bandwidth provided by the Earth-based sky surveys, and therefore, the number of galaxies imaged by HST is substantially smaller compared to the other telescopes. The main advantage of HST is that it is not subjected to atmospheric effect, and therefore, no unknown atmospheric effect can impact the analysis.

Because the footprints of the different sky surveys overlap, it is expected that some galaxies would be imaged by more than one telescope and therefore can appear more than once after combining the four datasets into one. To avoid the same galaxy appearing in the dataset more than once, all objects in the combined dataset that had another object within less than were removed. The exceptions are the galaxies imaged by HST, where the fields are dramatically smaller compared to the other sky surveys. HST galaxies are not bright enough to be imaged by the other sky surveys in a manner that allows their spin direction to be identified and are therefore not expected to be present in any of the other datasets. Combining all datasets provided a dataset of 958,841 different galaxies such that each galaxy appeared in the dataset exactly once. The specific datasets are described below.
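The removal of objects that lie too close to one another can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the brute-force pairwise search, and the choice to keep the first object of each close pair are assumptions made for illustration, and the separation threshold is passed as a parameter rather than fixed.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) points in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # The haversine form stays numerically stable for very small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def drop_close_objects(objects, min_sep_deg):
    """Keep an object only if no previously kept object is closer than min_sep_deg.

    objects: list of (ra_deg, dec_deg) tuples.
    Assumption of this sketch: the first member of a close pair is kept.
    The search is O(n^2) for clarity; a large catalog would need a spatial index.
    """
    kept = []
    for ra, dec in objects:
        if all(angular_separation_deg(ra, dec, kra, kdec) >= min_sep_deg
               for kra, kdec in kept):
            kept.append((ra, dec))
    return kept

# Illustrative input: the first two entries are about 0.001 degrees apart.
catalog = [(10.0, 5.0), (10.001, 5.0), (200.0, -30.0)]
unique = drop_close_objects(catalog, min_sep_deg=0.01)
```

The threshold value in the usage lines is illustrative only.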

2.1. DECam Data

The dark energy camera (DECam) of the Blanco 4 meter telescope is a powerful imaging instrument [70, 71] capable of covering deg , mostly from the Southern hemisphere. The DECam data were retrieved through the DESI Legacy Survey [72], which provides access to data acquired by multiple different instruments, including DECam.

The list of objects was retrieved from Data Release 8 of the DESI Legacy Survey and included all objects imaged by DECam that were identified as galaxies and had a magnitude of less than 19.5 in either the g, r, or z band. That provided a list of 22,987,246 objects identified as relatively bright galaxies. The images of these objects were downloaded by using the cutout service of the DESI Legacy Survey server. Each image is a 256×256 JPEG image. The Petrosian radius was used to scale the image such that the object fits in the frame. To ensure full consistency of all images, all images were downloaded by the exact same computer. The process of downloading such a high number of galaxies lasted nearly nine months as described in [64].

Due to the high number of galaxies, the annotation of the galaxies by their spin directions required an automatic process. Such a process must be mathematically symmetric and therefore needs to be based on clearly defined rules. While machine learning, and specifically convolutional neural networks, have become very prevalent for solving problems in automatic analysis of galaxy images, they are based on complex data-driven automatically generated rules that are very difficult to conceptualize. Because these rules are complex and nonintuitive, it is very difficult to verify that they are fully symmetric. For instance, neural networks are based on initial random weights that change during the training process, and differences between the images in the classes of the training set or even the order by which the images are used in the training process can lead to differences in the neural network. That makes it virtually impossible to verify that the neural network is completely symmetric. Machine learning algorithms also tend to make forced choices. That is, even if the galaxy does not have a clear spin direction, the machine learning system will be forced to make a prediction. Slight asymmetries in the model that are very difficult to identify can therefore lead to small but consistent bias. More details about the possible consequences of using machine learning for this task are provided in Section 4.

To have a fully symmetric annotation, the Ganalyzer algorithm was used [73]. Ganalyzer is a model-driven algorithm that works according to mathematically defined rules, and it does not rely on data-driven rules or training data. Ganalyzer first transforms each galaxy image into its radial intensity plot. The radial intensity plot of an image is a 35×360 image, such that the pixel $(\theta, r)$ in the radial intensity plot is the median value of the 5×5 pixels around coordinates $(O_x + \sin(\theta) \cdot r, O_y - \cos(\theta) \cdot r)$ in the original galaxy image, where $r$ is the radial distance measured in percentage of the galaxy radius, $\theta$ is the polar angle measured in degrees, and $(O_x, O_y)$ are the pixel coordinates of the center of the galaxy.
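The transformation into a radial intensity plot can be sketched in a few lines of Python. This is a minimal illustration rather than the Ganalyzer implementation: it assumes that the sampled point for polar angle theta and radial distance r is (O_x + sin(theta)·r, O_y - cos(theta)·r), that the 35 rows are evenly spaced radial steps out to the galaxy radius, and that windows near the image border are simply truncated. The function name and its parameters are hypothetical.

```python
import math
from statistics import median

def radial_intensity_plot(image, ox, oy, radius, n_r=35, n_theta=360):
    """Return an n_r x n_theta grid of median intensities around the galaxy centre.

    image: 2D list of pixel values indexed as image[row][col].
    (ox, oy): pixel coordinates of the galaxy centre; radius: galaxy radius in pixels.
    Assumption of this sketch: rows are evenly spaced radial steps up to `radius`.
    """
    height, width = len(image), len(image[0])
    plot = [[0.0] * n_theta for _ in range(n_r)]
    for ri in range(n_r):
        r = radius * (ri + 1) / n_r
        for theta in range(n_theta):
            t = math.radians(theta)
            cx = int(round(ox + math.sin(t) * r))
            cy = int(round(oy - math.cos(t) * r))
            # 5x5 window around the sampled point, truncated at the image border.
            window = [image[y][x]
                      for y in range(cy - 2, cy + 3)
                      for x in range(cx - 2, cx + 3)
                      if 0 <= y < height and 0 <= x < width]
            plot[ri][theta] = median(window) if window else 0.0
    return plot

# Illustrative input: a uniform 64x64 image gives a uniform plot.
flat_image = [[7] * 64 for _ in range(64)]
flat_plot = radial_intensity_plot(flat_image, ox=32, oy=32, radius=20)
```

In a real galaxy image, arms would appear in such a plot as ridges of locally bright pixels whose polar angle drifts with the radial distance.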

Because arm pixels are expected to be brighter than nonarm pixels at the same radial distance from the galaxy center, peaks in the radial intensity plot are expected to correspond to pixels on the arms of the galaxy at different radial distances from the center. Therefore, peak detection [74] is applied to the lines in the radial intensity plot.

Figure 1 shows examples of two galaxies, their radial intensity plots, and the peaks identified in the radial intensity plots. As the figure shows, each arm is reflected by a vertical line of peaks. One of the galaxies has two arms and therefore two vertical lines of peaks. The other galaxy has three arms and therefore three lines of peaks. The direction towards which the peaks are aligned reflects the spin direction of the galaxy. More information about Ganalyzer can be found in [57-60, 62, 64, 65, 73, 75].

The Cartesian coordinates of each peak are $(\theta, r)$, where $\theta$ is the polar angle of the peak relative to the galaxy center $(O_x, O_y)$, and $r$ is the radial distance from the galaxy center. The spin direction is determined simply by the sign of the slope $\beta$ of the linear regression line fitted to these points. If $\beta$ is positive, the galaxy can be determined as spinning clockwise, while if $\beta$ is negative, the galaxy is a counterclockwise galaxy. For example, Figure 2 shows the linear regression line of the peaks of the leftmost line of the bottom galaxy shown in Figure 1. The slope of that line is positive, and therefore, the galaxy can be determined to be spinning clockwise.
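The slope-sign rule can be illustrated with a short Python sketch. It assumes an ordinary least-squares fit in which the polar angle is the x coordinate and the radial distance is the y coordinate of each peak. The function names are hypothetical, and the peak lists in the usage lines are synthetic rather than measured.

```python
def regression_slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all peaks share the same x value; slope is undefined")
    return sxy / sxx

def spin_from_peaks(peaks):
    """Classify one line of peaks, given as (theta_deg, r) pairs.

    Assumption of this sketch: x = polar angle, y = radial distance.
    Returns 'cw' for a positive slope, 'ccw' for a negative slope, else None.
    """
    slope = regression_slope(peaks)
    if slope > 0:
        return "cw"
    if slope < 0:
        return "ccw"
    return None

# Synthetic peaks: the polar angle drifts by 2 degrees per radial step.
drifting_up = [(100 + 2 * i, 5 + i) for i in range(10)]
drifting_down = [(100 - 2 * i, 5 + i) for i in range(10)]
directions = (spin_from_peaks(drifting_up), spin_from_peaks(drifting_down))
```

Because only the sign of the slope is used, the result does not depend on which of the two coordinates is treated as the dependent variable, as long as both vary along the line of peaks.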

Not all galaxies are spiral galaxies, and not all spiral galaxies have an identifiable spin direction. Therefore, the majority of the galaxies that were downloaded cannot be used for the analysis because their spin direction cannot be identified. For that reason, only galaxies that have at least 30 identified peaks in the radial intensity plot aligned in the same direction are used. Galaxies that do not meet that threshold are rejected regardless of the sign of the linear regression of their peaks. That leaves a collection of 836,451 galaxies in the dataset that were assigned an identifiable spin direction. Some of these galaxies are close satellite galaxies or other large extended objects inside a larger galaxy. Previous work suggested that the presence of duplicate objects can inflate the statistical significance [68], and experiments by duplicating the objects artificially showed that an extremely high number of such objects can affect the statistical signal [63]. A detailed discussion about the presence of duplicate objects is provided in Section 5. To remove such objects, objects that have another object within less than away were removed. That left 807,898 galaxies in the dataset.

To test the consistency of the annotations, 200 random galaxies annotated as clockwise and 200 random galaxies annotated as counterclockwise were inspected manually, as was done in [62]. The visual inspection showed that none of the galaxies annotated by the algorithm as spinning clockwise was visually spinning counterclockwise, and none of the galaxies annotated as counterclockwise was by manual inspection spinning clockwise. Obviously, this small-scale test does not guarantee that no galaxies are misclassified, as the number of galaxies is too large to inspect manually. However, the test suggests that the number of misclassified galaxies is expected to be small compared to the size of the data. More importantly, because the algorithm is symmetric, misclassified galaxies are expected to be distributed evenly between the different spin directions and therefore cannot lead to asymmetry, as explained theoretically and empirically in Section 4.

To ensure the consistency of the galaxy annotation process, all images were analyzed on the exact same computer. That ensured that different system settings do not impact the analysis. Although there is no known computer system fault that can lead to differences in the annotation, full consistency was ensured by using just one computer system with a single processor. The annotation of the galaxies required 107 days of operation using a single Intel Xeon processor at 2.8 GHz.

Tables 1 and 2 show the distribution of the galaxies by their right ascension and declination ranges, respectively. The DECam galaxies do not have redshift, and therefore, the distribution of the redshift was determined by using a subset of 17,027 galaxies that had redshift in the 2dF data release [76]. Table 3 shows the redshift distribution of the DECam galaxies.

2.2. SDSS Data

SDSS is an established digital sky survey that covers over $1.4 \times 10^4$ deg$^2$, mostly in the Northern hemisphere. To study SDSS data, two datasets from SDSS that were used in previous studies were combined into one larger dataset. The two datasets were a dataset of galaxies with redshift [60, 62] and another dataset of SDSS galaxies that do not have spectra. The preparation of these datasets is described in [62, 63]. Both datasets were prepared by annotating the galaxies automatically as described in Section 2.1.

Since the two datasets are prepared from the same sky survey, their footprints naturally overlap, and some galaxies are included in both datasets. To remove galaxies that appear in the combined dataset more than once, all objects in the combined dataset that had another object in the dataset within less than 0.01° were removed. That provided a combined dataset of 117,638 distinct galaxies. Table 4 shows the RA distribution of the galaxies. More information about the distribution of the data and the way it was collected can be found in [62, 63].

2.3. Pan-STARRS Data

The third digital sky survey used in this study is a dataset of galaxies from Pan-STARRS DR1 [62]. The initial set included 2,394,452 Pan-STARRS objects identified as extended sources by all color bands [77]. These galaxies were classified automatically by Ganalyzer [73] as described in Section 2.1 and with more details in [57-60, 62, 73]. That process provided 33,028 galaxies imaged by Pan-STARRS and annotated by their spin direction. The distribution of the galaxies by their RA is shown in Table 5. More information about the collection of the dataset and the distribution of the data can be found in [62].

2.4. Hubble Space Telescope Data

Although there is no atmospheric effect that can flip the spin pattern of a galaxy as observed from Earth, space-based observation can eliminate the possible impact of some unknown atmospheric effects that might make a galaxy spinning clockwise look as if it spins counterclockwise. For that purpose, a dataset of space-based observations was prepared from the Hubble Space Telescope (HST) Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey [78, 79]. The collection and preparation of that dataset is described in [61].

The dataset was taken from several HST fields: the Cosmic Evolution Survey (COSMOS), the Great Observatories Origins Deep Survey North (GOODS-N), the Great Observatories Origins Deep Survey South (GOODS-S), the Ultra Deep Survey (UDS), and the Extended Groth Strip (EGS), providing an initial set of 114,529 galaxies [61]. The image of each galaxy was extracted by using mSubimage [80] and converted into a 122×122 TIF (Tagged Image File) image.

The initial number of galaxies imaged by HST is far smaller compared to other digital sky surveys such as the DECam survey. While the automatic analysis is fully symmetric, it also leads to the sacrifice of many galaxies whose spin direction cannot be identified with high certainty. Because the number of HST galaxies is smaller, the galaxies were annotated through a long labor-intensive process. During that process, a random half of the images were mirrored for the first cycle of annotation, and then all images were mirrored for a second cycle of annotation as described in [61] to offset the possible effect of perceptional bias. That provided a clean and complete dataset that is also not subjected to atmospheric effects [61]. The total number of annotated galaxies in the dataset was 8,690, and the distribution of the galaxies in the different fields is shown in Table 6. Obviously, the HST galaxies are much more distant compared to the galaxies imaged by the other telescopes and have a mean redshift of 0.58 [61].

The only parts of the sky that are covered by all four surveys are the HST fields. From these five fields, only the COSMOS field has a sufficient number of galaxies that allows certain statistical analysis. That field is also within the footprint of SDSS, Pan-STARRS, and DECam. Table 7 shows the number of spiral galaxies spinning clockwise and the number of spiral galaxies spinning counterclockwise in the different datasets described in Section 2.

The size of the COSMOS field is merely 2 square degrees. HST can naturally go deeper than any Earth-based sky survey, and therefore, the number of galaxies in the COSMOS field is far larger than the number of galaxies in the same field in all the other sky surveys. To have a sufficient number of galaxies that can allow statistical analysis, galaxies of the other digital sky surveys were counted at the field centered at the COSMOS field.

3. Results

The asymmetry $A$ in each sky region is measured simply by $A = \frac{cw - ccw}{cw + ccw}$, where $cw$ is the number of galaxies spinning clockwise, and $ccw$ is the number of galaxies spinning counterclockwise. The error is determined by the normal distribution standard error of $\frac{1}{\sqrt{N}}$, where $N$ is the total number of annotated galaxies in the sky region. Figure 3 shows the asymmetry between the number of clockwise galaxies and the number of counterclockwise galaxies in the hemisphere centered at each RA, as well as the same measurement made in the opposite hemisphere. The figure shows that the asymmetry in one hemisphere is nearly exactly inverse to the asymmetry in the opposite hemisphere, and therefore, the mean of the asymmetry observed in the opposite RA hemispheres is very close to zero. The figure also shows that the asymmetry is inverse in opposite hemispheres and peaks around the hemisphere of .
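As a small worked example of the asymmetry measurement, the Python sketch below assumes that the asymmetry is the difference between the clockwise and counterclockwise counts divided by their sum, and that the error is one over the square root of the number of galaxies. The function name and the counts in the usage line are hypothetical and are not taken from the tables of this paper.

```python
import math

def asymmetry(cw, ccw):
    """Return (A, error) for a sky region with cw clockwise and ccw counterclockwise galaxies.

    Assumed forms: A = (cw - ccw) / (cw + ccw) and error = 1 / sqrt(cw + ccw).
    """
    n = cw + ccw
    if n == 0:
        raise ValueError("the sky region contains no annotated galaxies")
    return (cw - ccw) / n, 1 / math.sqrt(n)

# Hypothetical counts: 5,200 clockwise and 4,800 counterclockwise galaxies.
a, err = asymmetry(5200, 4800)
```

With these counts the asymmetry is 0.04 and the error is 0.01, so a region of ten thousand galaxies needs an excess of a few hundred galaxies in one direction before the asymmetry exceeds its error by a wide margin.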

Table 8 shows a simple analysis by separating the galaxies by the RA range into the hemisphere and the opposite hemisphere . The $P$ value is the binomial probability of having such a difference or stronger by chance when the probability for a galaxy to spin clockwise or counterclockwise is assumed to be 0.5. Although the analysis is simple and does not account for differences in the declination, it still shows a higher number of galaxies spinning clockwise in one hemisphere and a higher number of galaxies spinning counterclockwise in the opposite hemisphere.
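The binomial probability of observing a given excess or a stronger one can be computed as in the Python sketch below. It assumes a one-tailed test, which is an interpretation of "such difference or stronger" rather than a detail stated in the text, and it sums the probabilities in log space so that counts of several hundred thousand galaxies do not overflow. The function name is hypothetical.

```python
from math import exp, lgamma, log

def binomial_tail(n, k, p=0.5):
    """One-tailed probability P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    def log_pmf(i):
        # log of C(n, i) * p^i * (1 - p)^(n - i), using lgamma for the factorials
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    logs = [log_pmf(i) for i in range(k, n + 1)]
    peak = max(logs)
    # Factor out the largest term before exponentiating to avoid underflow.
    return exp(peak) * sum(exp(value - peak) for value in logs)

# Small illustrative case: at least 8 clockwise galaxies out of 10.
p_small = binomial_tail(10, 8)
```

For the small case the exact value is 56/1024, which gives a convenient check of the log-space summation.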

3.1. Analysis of a Dipole Axis in the Distribution of Galaxy Spin Directions

The analysis shown above shows certain evidence that the sky can be separated into two hemispheres such that one has a higher number of clockwise galaxies and the opposite hemisphere has a higher number of counterclockwise galaxies. That analysis, however, is simplified by ignoring the declination of the galaxies and the nonuniform distribution of the galaxy population imaged by the different sky surveys.

Following Longo [53], to test whether the distribution of the spin directions of the galaxies exhibits a dipole axis, $\chi^2$ statistics [82] was used to fit the spin directions of the galaxies in the datasets to the cosine of their angular distance from all possible integer $(\alpha, \delta)$ combinations. That was done by first assigning each galaxy its spin direction $d$, which was 1 if the spin direction of the galaxy is clockwise, and -1 if the spin direction of the galaxy is counterclockwise. For each $(\alpha, \delta)$ combination, the angular distances $\phi$ between all galaxies in the dataset and $(\alpha, \delta)$ were computed.

Then, the cosines of the angular distances were fitted into $d \cdot |\cos(\phi)|$, where $d$ is the spin direction of the galaxy. The $\chi^2$ computed from each integer $(\alpha, \delta)$ combination was determined by

$$\chi^2_{(\alpha,\delta)} = \sum_i \frac{(d_i \cdot |\cos(\phi_i)| - \cos(\phi_i))^2}{\cos(\phi_i)} \qquad (1)$$

where $d_i$ is the spin direction of galaxy $i$ (1 for clockwise and -1 for counterclockwise), and $\phi_i$ is the angular distance between galaxy $i$ and $(\alpha, \delta)$.

To measure the statistical significance of the possible axis at $(\alpha, \delta)$, the $\chi^2$ was also computed 1000 times such that in each run the galaxies were assigned with random spin directions. Using the $\chi^2$ from 1000 runs, the mean $\overline{\chi^2_{random}}$ and standard deviation $\sigma_{random}$ of the $\chi^2$ when the spin directions are random were computed. Then, the statistical signal can be determined by

$$\sigma_{(\alpha,\delta)} = \frac{|\chi^2_{(\alpha,\delta)} - \overline{\chi^2_{random}}|}{\sigma_{random}} \qquad (2)$$

The $\sigma$ difference between the $\chi^2$ computed with the real spin directions and the mean $\chi^2$ computed with the random spin directions was used to determine the $\sigma$ of the fitness to occur by chance in each $(\alpha, \delta)$ combination. A detailed description of the analysis can be found in [54, 60-63].
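The fitting procedure for a single candidate axis can be sketched in Python as follows. The sketch is an illustration under stated assumptions rather than the code used in the study: it assumes that the statistic for each galaxy is (d·|cos φ| - cos φ)² / cos φ, that the significance is the absolute difference between the observed statistic and the mean of the randomized runs in units of their standard deviation, and that galaxies exactly 90° from the axis are skipped to avoid division by zero. All function names, the seed, and the synthetic galaxies are hypothetical.

```python
import math
import random
from statistics import mean, stdev

def cos_angular_distance(ra, dec, axis_ra, axis_dec):
    """Cosine of the angular distance between a galaxy and a candidate axis (degrees in)."""
    ra, dec, axis_ra, axis_dec = map(math.radians, (ra, dec, axis_ra, axis_dec))
    return (math.sin(dec) * math.sin(axis_dec)
            + math.cos(dec) * math.cos(axis_dec) * math.cos(ra - axis_ra))

def chi2_dipole(galaxies, axis_ra, axis_dec):
    """Assumed statistic: sum of (d*|cos(phi)| - cos(phi))^2 / cos(phi) over galaxies.

    galaxies: list of (ra_deg, dec_deg, d) with d = 1 (clockwise) or -1 (counterclockwise).
    """
    total = 0.0
    for ra, dec, d in galaxies:
        c = cos_angular_distance(ra, dec, axis_ra, axis_dec)
        if abs(c) < 1e-9:
            continue  # skip galaxies 90 degrees from the axis (division by zero)
        total += (d * abs(c) - c) ** 2 / c
    return total

def axis_significance(galaxies, axis_ra, axis_dec, runs=1000, seed=0):
    """Significance of one candidate axis against runs with random spin directions."""
    rng = random.Random(seed)
    observed = chi2_dipole(galaxies, axis_ra, axis_dec)
    randomized = []
    for _ in range(runs):
        shuffled = [(ra, dec, rng.choice((1, -1))) for ra, dec, _ in galaxies]
        randomized.append(chi2_dipole(shuffled, axis_ra, axis_dec))
    return abs(observed - mean(randomized)) / stdev(randomized)

# Synthetic example: 200 clockwise galaxies in a patch around the axis (30, 0).
patch_rng = random.Random(1)
patch = [(patch_rng.uniform(0, 60), patch_rng.uniform(-20, 20), 1) for _ in range(200)]
signal = axis_significance(patch, 30, 0, runs=200)
```

Scanning all integer right ascension and declination combinations would repeat the last step for each candidate axis and keep the axis with the highest significance.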

Figure 4 shows the probabilities of a dipole axis in different $(\alpha, \delta)$ coordinates, as defined by Equation 2. The figure shows a Mollweide projection of the $\sigma$ computed in all integer $(\alpha, \delta)$ combinations by applying Equation 2 to all possible $(\alpha, \delta)$. The most likely axis is identified at , with statistical significance of 3.7$\sigma$. The 1$\sigma$ error of that axis is for the RA, and for the declination. Interestingly, the peak of the axis is nearly identical to the location of the CMB Cold Spot, at around .

While the proximity of the most likely axis to the CMB Cold Spot can definitely be considered a coincidence, the distribution of a large number of galaxies shows a statistically significant nonrandom distribution that forms a large-scale dipole axis. The observed presence of such a Hubble-scale axis in the light of existing observations and current cosmological theories is discussed in Section 6.

Figure 5 shows the $\sigma$ for all integer $(\alpha, \delta)$ combinations when the galaxies are assigned with random spin directions. That is the same analysis shown in Figure 4, but the initial set of galaxies is assigned with random spin directions. As expected, the analysis showed no significant dipole axis when the galaxies are assigned with random spin directions. The strongest dipole axis had statistical significance of 0.81$\sigma$. That can be considered a control experiment, showing that the signal is present when the galaxies are assigned with their real spin directions but becomes statistically insignificant when the spin directions are random.

The data used to identify the most likely dipole axis were combined from several different sky surveys. By separating the data from each telescope, it is possible to test whether the axis is consistent across different instruments [65]. Table 9 shows the results of applying the analysis to the data from each sky survey separately. As the table shows, the RAs of the most likely dipole axes are aligned across all datasets and well within the 1$\sigma$ error from each other. The differences in the declination are somewhat larger, but still within the 1$\sigma$ error. Because an Earth-based telescope can be in either the Northern or the Southern hemisphere, the declination range in the dataset of each telescope is not as broad as the RA range, and therefore, the error in the declination is expected to be larger than the error in the RA. Figure 6 shows the probabilities of a dipole axis in the different $(\alpha, \delta)$ coordinates in the four sky surveys.

3.2. Analysis of a Quadrupole Axis in the Distribution of Galaxy Spin Directions

Similarly to the analysis shown in Section 3.1, the data were fitted into quadrupole alignment. That was done in the same manner as the dipole axis analysis, but by fitting to $\cos(2\phi)$. Therefore, the analysis was the same as the analysis described in Section 3.1, but with Equation 1 replaced by

$$\chi^2_{(\alpha,\delta)} = \sum_i \frac{(d_i \cdot |\cos(2\phi_i)| - \cos(2\phi_i))^2}{\cos(2\phi_i)} \qquad (3)$$

where $\phi_i$ is the angular distance between galaxy $i$ and $(\alpha, \delta)$, and $d_i$ is the spin direction of galaxy $i$.
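The quadrupole statistic differs from the dipole statistic only in the angular term, as the following Python sketch illustrates. It assumes the form (d·|cos 2φ| - cos 2φ)² / cos 2φ for each galaxy and obtains cos 2φ from cos φ through the double-angle identity. The function name and the single-galaxy inputs are hypothetical.

```python
import math

def chi2_quadrupole(galaxies, axis_ra, axis_dec):
    """Assumed statistic: sum of (d*|cos(2*phi)| - cos(2*phi))^2 / cos(2*phi).

    galaxies: list of (ra_deg, dec_deg, d) with d = 1 (clockwise) or -1 (counterclockwise).
    """
    a_ra, a_dec = math.radians(axis_ra), math.radians(axis_dec)
    total = 0.0
    for ra, dec, d in galaxies:
        ra, dec = math.radians(ra), math.radians(dec)
        cos_phi = (math.sin(dec) * math.sin(a_dec)
                   + math.cos(dec) * math.cos(a_dec) * math.cos(ra - a_ra))
        cos_2phi = 2 * cos_phi ** 2 - 1  # double-angle identity
        if abs(cos_2phi) < 1e-9:
            continue  # skip galaxies where the denominator vanishes
        total += (d * abs(cos_2phi) - cos_2phi) ** 2 / cos_2phi
    return total

# A clockwise galaxy on the axis fits perfectly; a counterclockwise one does not.
on_axis_cw = chi2_quadrupole([(0, 0, 1)], 0, 0)
on_axis_ccw = chi2_quadrupole([(0, 0, -1)], 0, 0)
```

A galaxy 90° away from the axis has cos 2φ = -1, so in this form a counterclockwise galaxy at that position contributes zero, which is what allows the statistic to show two pairs of opposite peaks on the sky.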

Figure 7 shows the probability of a quadrupole axis computed at different $(\alpha, \delta)$ coordinates. The analysis showed that the most probable quadrupole axes are at , with 2.9$\sigma$, and at , with statistical significance of 3.2$\sigma$. Figure 8 shows the same analysis such that galaxies are assigned with random spin directions, with a most probable axis of 0.76$\sigma$. Figure 9 shows the same analysis with Pan-STARRS, SDSS, HST, and DECam data separately. HST has a very small footprint and therefore could not be used effectively to analyze a quadrupole to show more than one peak.

3.3. Analysis of Galaxy Spin Directions around the Location of the CMB Cold Spot

The analysis of a dipole axis done in Section 3.1 shows that the dipole axis peaks at close proximity to the location of the CMB Cold Spot, centered at around . While the alignment between the peak and the CMB Cold Spot can definitely be coincidental, the nature of the CMB Cold Spot is still poorly understood. Since both CMB and the spin directions of galaxies correlate with the initial conditions of the early universe [51], a link between the CMB Cold Spot and galaxy spin should not be ruled out.

To test the distribution of the galaxies around that part of the sky, the number of galaxies that spin clockwise was compared to the number of galaxies spinning counterclockwise in all telescopes. Since the CMB Cold Spot is relatively small, using just galaxies that appear in the field of the CMB Cold Spot will not provide a sufficient number of galaxies to make the comparison. Also, SDSS and Pan-STARRS do not have a very large galaxy population around that part of the sky. To use a larger field, the sky region centered at the CMB Cold Spot was used. Table 10 shows the number of clockwise and counterclockwise galaxies in each sky survey. Naturally, the HST dataset cannot be used for the analysis since there are no galaxies in that field that were imaged by that sky survey.

As the table shows, all sky surveys show a higher number of clockwise galaxies in that part of the sky. The asymmetry observed with SDSS and Pan-STARRS is not statistically significant but also does not conflict with the asymmetry observed in the DECam data. SDSS shows a difference that is marginally significant, with . However, that can also be due to the fact that SDSS and Pan-STARRS have far fewer galaxies in that part of the sky. When combining SDSS and Pan-STARRS, the probability of having that asymmetry or stronger by chance is 0.046. While these results do not allow making a definite conclusion about a link between the CMB Cold Spot and galaxy spin directions, they provide a certain indication that can be explored by future empirical or theoretical studies.

4. Possible Errors

One explanation for the observation would be an error in the analysis. This section discusses several possible errors and shows that an error is unlikely.

4.1. Error in the Galaxy Annotation Algorithm

An error in the annotation algorithm can obviously lead to asymmetry. However, multiple indications show that the asymmetry cannot be the result of an error in the classification algorithm. The algorithm is a model-driven symmetric algorithm with clear rules. It is not based on the complex data-driven rules used by machine learning systems, whose symmetry is virtually impossible to verify [83]. An experiment was performed by mirroring the galaxy images by using the flip command in the ImageMagick image analysis toolbox. As expected, mirroring the galaxies led to inverse asymmetry compared to the analysis with the original images.

Another piece of evidence that the asymmetry is not driven by an error in the annotation algorithm is that the asymmetry changes between different parts of the sky and inverts between opposite hemispheres. Since each galaxy is analyzed independently, a bias in the annotation algorithm is expected to be consistent throughout the sky, and it is not expected to flip in opposite hemispheres. The downloading of the images and the automatic analysis of the images were all done by the same computer, to avoid unknown differences between computers that can lead to bias or unknown differences in the way galaxy images are analyzed.

Due to the theoretical and empirical evidence that the algorithm is symmetric, an error in the galaxy annotation is expected to impact clockwise and counterclockwise galaxies in a similar manner. If the galaxy annotation algorithm had a certain error in the annotation of the galaxies, the asymmetry $A$ can be defined by

$$A = \frac{(cw + E_{cw}) - (ccw + E_{ccw})}{cw + E_{cw} + ccw + E_{ccw}} \qquad (4)$$

where $E_{cw}$ is the number of galaxies spinning clockwise incorrectly annotated as counterclockwise and $E_{ccw}$ is the number of galaxies spinning counterclockwise incorrectly annotated as spinning clockwise. Because the algorithm is symmetric, the number of counterclockwise galaxies incorrectly annotated as clockwise is expected to be roughly the same as the number of clockwise galaxies misclassified as counterclockwise, and therefore, $E_{cw} \simeq E_{ccw}$ [63]. Therefore, the asymmetry $A$ can be defined by

$$A = \frac{cw - ccw}{cw + E_{cw} + ccw + E_{ccw}} \qquad (5)$$

Since $E_{cw}$ and $E_{ccw}$ cannot be negative, a higher rate of incorrectly annotated galaxies is expected to make $A$ lower. Therefore, incorrect annotation of galaxies is not expected to lead to asymmetry and can only make the asymmetry lower rather than higher.

An experiment [63] in which some of the galaxies were intentionally annotated incorrectly showed that the results do not change significantly even when as many as 25% of the galaxies are assigned incorrect spin directions, as long as the error is added to both clockwise and counterclockwise galaxies. However, if the error is added in an asymmetric manner, even a small asymmetric error of 2% leads to a very strong asymmetry and a dipole axis that peaks exactly at the celestial pole [63]. Figure 10 shows the results of the analysis of SDSS galaxies after adding an artificial error of 2%, meaning that a random 2% of the galaxies are assigned a clockwise spin direction regardless of their real spin direction. The signal immediately becomes very strong and peaks exactly at the celestial pole.
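The difference between symmetric and asymmetric annotation errors can be illustrated with a small simulation. The sketch below is a toy model rather than the experiment of [63]: it assumes synthetic galaxies with random spin directions, an asymmetry defined as (cw − ccw)/(cw + ccw), and hypothetical helper names.

```python
import random

def asymmetry(spins):
    """(cw - ccw) / (cw + ccw), with clockwise coded as 1 and counterclockwise as 0."""
    cw = sum(spins)
    return (2 * cw - len(spins)) / len(spins)

def add_symmetric_error(spins, rate, rng):
    """Flip a random fraction of the annotations, regardless of direction."""
    return [1 - s if rng.random() < rate else s for s in spins]

def add_asymmetric_error(spins, rate, rng):
    """Annotate a random fraction of the galaxies as clockwise, whatever their real spin."""
    return [1 if rng.random() < rate else s for s in spins]

rng = random.Random(0)  # fixed seed, for repeatability only
truth = [rng.randint(0, 1) for _ in range(200_000)]  # random spins, no real asymmetry

print("no error        :", asymmetry(truth))
print("25% symmetric   :", asymmetry(add_symmetric_error(truth, 0.25, rng)))
print("2% forced to cw :", asymmetry(add_asymmetric_error(truth, 0.02, rng)))
```

With about 2 × 10^5 synthetic galaxies, the sampling noise in A is of the order of 1/√N ≈ 0.002. The symmetric error should therefore leave A near zero, while forcing 2% of the galaxies to clockwise shifts A by roughly 0.02, far above the noise.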

It should be mentioned that in one of the datasets used here, the dataset acquired by HST, the annotation was done manually without using any automatic classification. The results with these manually annotated galaxies are in agreement with the automatic annotation of galaxies imaged by SDSS, Pan-STARRS, and DECam.

4.2. Bias in the Sky Survey Hardware or Photometric Pipeline

Autonomous digital sky surveys are some of the more complex research instruments and involve sophisticated hardware and software to enable the collection, storage, analysis, and accessibility of the data. It is difficult to think of an error in the hardware or software that can lead to asymmetry between the number of clockwise and counterclockwise galaxies, but due to the complexity of these systems, it is also difficult to prove that such an error does not exist. That possible error is addressed here by using four completely independent systems. DECam, SDSS, Pan-STARRS, and HST are completely independent of each other and have different hardware and different photometric pipelines. Since such a bias is unlikely even in one instrument, it is very difficult to assume that all four instruments have such a bias and that the profile of the bias is consistent across all of them.

4.3. Cosmic Variance

The distribution of galaxies in the universe is not completely uniform. These subtle fluctuations in the density of the galaxy population can lead to “cosmic variance” [84, 85], which can impact measurements at a cosmological scale [86–88].

The probe of asymmetry between galaxies spinning in opposite directions is a relative measurement rather than an absolute measurement. That is, the asymmetry is determined by the difference between two measurements made in the same field and therefore should not be affected by cosmic variance. Any cosmic variance or other effects that impact the number of clockwise galaxies observed from Earth are expected to have a similar effect on the number of counterclockwise galaxies.

4.4. Multiple Photometric Objects at the Same Galaxy

In some cases, digital sky surveys can identify several photometric objects as independent galaxies, even when they are part of one larger galaxy. In the datasets used here, all photometric objects that are part of the same galaxy were removed by removing all objects that had another object within 0.01°. An exception is the HST galaxies, which are closer to each other due to the size of the field but were inspected manually.
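The removal step described above can be sketched as follows. This is a minimal illustration and not the code used to prepare the datasets: it assumes that the 0.01 threshold is an angular separation in degrees, that each object is an (RA, declination) pair in degrees, and the function names are hypothetical. The pairwise loop is quadratic and is meant only for small samples.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def remove_close_objects(objects, min_sep_deg=0.01):
    """Drop every object that has another object closer than min_sep_deg."""
    kept = []
    for i, (ra_i, dec_i) in enumerate(objects):
        crowded = any(
            j != i
            and angular_separation_deg(ra_i, dec_i, ra_j, dec_j) < min_sep_deg
            for j, (ra_j, dec_j) in enumerate(objects)
        )
        if not crowded:
            kept.append((ra_i, dec_i))
    return kept

# Hypothetical coordinates: the first two objects are about 0.005 degrees apart.
sample = [(150.0, 2.0), (150.004, 2.003), (151.0, 2.0)]
print(remove_close_objects(sample))
```

Note that both members of a close pair are dropped, which matches the description of removing all objects that had another object within the threshold.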

However, even if such objects existed in the dataset, they are expected to be evenly distributed between galaxies that spin clockwise and galaxies that spin counterclockwise and therefore should not introduce an asymmetry. Experiments using datasets of galaxies assigned random spin directions showed that adding artificial objects at exactly the same positions as the original galaxies does not lead to a signal of asymmetry [63].

The experiments were made by using SDSS galaxies and assigning the galaxies random spin directions. Then, more objects were gradually added, each with the same location and the same spin direction as a galaxy in the original dataset [63]. Adding such artificial galaxies did not lead to a statistically significant signal.

4.5. Atmospheric Effect

There is no known atmospheric effect that can make a galaxy that spins clockwise appear as if it spins counterclockwise. Also, because the asymmetry is always measured with galaxies imaged in the same field, any kind of atmospheric effect that affects galaxies that spin clockwise will also affect galaxies that spin counterclockwise. Therefore, it is unlikely that a certain atmospheric effect would impact the number of clockwise galaxies in a certain field but would have a different impact on galaxies spinning counterclockwise. In any case, one of the datasets used here is made of galaxies imaged by the space-based Hubble Space Telescope, which are therefore not subject to any kind of atmospheric effect.

4.6. Backward Spiral Galaxies

In rare cases, the shape of the arms of a spiral galaxy is not an indication of the spin direction of the galaxy. An example is NGC 4622 [89]. A prevalent and systematically uneven distribution of backward spiral galaxies might indeed lead to asymmetry between the number of galaxies spinning clockwise and the number of galaxies spinning counterclockwise. For instance, if a relatively high percentage of galaxies that actually spin clockwise are backward spiral galaxies, it would lead to an excessive number of galaxies that seem to be spinning counterclockwise.

However, backward spiral galaxies are relatively rare. Also, these galaxies are expected to be distributed equally between galaxies that spin clockwise and galaxies that spin counterclockwise, and there is no indication of asymmetry between backward spiral galaxies. Therefore, according to the known evidence, there is no reason to assume that the observations shown here are driven by backward spiral galaxies. The same also applies to multispin galaxies [90], which are also rare and should be equally distributed between both spin directions.

5. Previous Work Showing Different Conclusions

While several previous studies mentioned in Section 1 provided results suggesting that the large-scale distribution of galaxy spin directions is not necessarily random, other studies used similar approaches to reach opposite conclusions. It should be remembered that random distribution of galaxy spin directions is the null hypothesis, and results that agree with it can therefore be affected by the common bias known in science as “confirmation bias” [91]. This section analyzes these studies to identify reasons for the differences.

An early attempt that showed random distribution was made by Iye and Sugai [67]. In the absence of high-throughput digital sky surveys at the time, the analysis was based on a relatively small dataset of 6.5 K galaxies. When assuming an asymmetry of 1% as shown here, 27,000 galaxies are needed to provide a one-tailed P value of 0.05. Even when assuming 2% asymmetry, 7,000 galaxies are needed to provide a one-tailed binomial distribution probability of P ≃ 0.048. Therefore, a dataset of 6.5 K galaxies is too small to provide a statistically significant observation of the asymmetry.
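The sample sizes quoted above can be checked with a one-tailed binomial test. The sketch below is a minimal illustration under stated assumptions: the asymmetry is taken as (cw − ccw)/(cw + ccw), the null hypothesis is that each galaxy has a 0.5 probability of spinning clockwise, and the function names are hypothetical.

```python
import math

def binomial_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * log_p + (n - i) * log_q
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    return math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)

def p_value_for_asymmetry(n, asym):
    """One-tailed P value when n galaxies show the given fractional asymmetry."""
    cw = round(n * (1 + asym) / 2)  # number of clockwise galaxies implied by asym
    return binomial_upper_tail(n, cw)

for n, asym in [(27_000, 0.01), (7_000, 0.02), (6_500, 0.01)]:
    print(n, asym, p_value_for_asymmetry(n, asym))
```

Under these assumptions, the first two cases should come out close to the probabilities quoted in the text, while the 6.5 K case with 1% asymmetry stays well above 0.05.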

Another study that used manual annotation of galaxies was based on crowdsourcing done by nonprofessional volunteers through Galaxy Zoo [69]. The approach had the advantage of using a large number of volunteers to increase the bandwidth of the annotation. Its main downside was that the annotations were subject to human bias [69]. That led to inaccuracy of the annotations, but more importantly, the bias of the annotations was systematic. Because the attempt to use crowdsourcing for that task was the first of its kind, the presence and dominance of the perceptual bias was not known when the experiment was designed, and therefore, the galaxy images were not mirrored randomly to offset the bias.

After applying a process of data correction by mirroring the images of a small subset of the galaxies, the results using the mirrored and original galaxy images showed an asymmetry of 1%–2%. That can be seen in Table 2 in [69], which summarizes the results of the small subset of galaxies that were corrected for the human bias by mirroring the galaxies. The table shows that when mirroring the galaxies, the number of galaxies annotated as counterclockwise was reduced by 1.5% (from 6.032% counterclockwise galaxies to 5.942% mirrored clockwise galaxies), while the number of galaxies annotated as clockwise increased by 2% (from 5.525% clockwise galaxies to 5.646% mirrored counterclockwise galaxies). That asymmetry is similar in direction and magnitude to the asymmetry shown in [62]. The observation reported in [62] is the most suitable comparison since it also analyzed SDSS galaxies with spectra, and therefore, the footprint and distribution of the galaxies are similar to those of [69].

Due to the corrections of the human annotators, the number of galaxies used in [69] for the analysis became much smaller than the initial number of galaxies, and the asymmetry was determined to be statistically insignificant. However, the results also do not disagree with the results shown with SDSS galaxies here and in [62]. The magnitude and direction of the asymmetry observed with Galaxy Zoo data are aligned with the results observed with SDSS data used here, although there is no statistical significance to either accept or reject that agreement. It has also been proposed that nonrandom distribution of the spin directions of the galaxies annotated by Galaxy Zoo cannot be ruled out [92].

A study that used automatic annotation of the spin directions of spiral galaxies was by Hayes et al. [66]. The abstract suggests that “when viewed across the entire GZ1 sample (and by implication, the Sloan catalog), the winding direction of arms in spiral galaxies as viewed from Earth is consistent with the flip of a fair coin.” That conclusion certainly conflicts with the results shown here. To understand the reason for that conflict, one might need to pay close attention to the details of the experimental design. The absence of asymmetry can be explained by one sentence in Section 4.1, which describes the implementation of the annotation algorithm used to determine the spin direction of the galaxies: “we choose our attributes to include some photometric attributes that were disjoint with those that Shamir [56] found to be correlated with chirality, in addition to several SPARCFIRE outputs with all chirality information removed.”

That is, to create a machine learning algorithm that can determine the spin direction of galaxies, Hayes et al. [66] removed attributes that correlate with the spin direction asymmetry that were reported in [56]. Naturally, when removing specifically the attributes that correlate with the asymmetry in spin direction, the machine learning algorithm produced a dataset that is fully symmetric and aligned with random distribution of the spin directions.

The attributes that correlate with galaxy spin direction asymmetry identified in [56] do not have an obvious direct link to galaxy spin direction. Hayes et al. [66] do not provide a scientific motivation for removing these attributes, and it seems that the decision to remove them was observational rather than scientific, with the goal of removing “bias.” Ignoring specifically the attributes that correlate with galaxy spin direction asymmetry naturally removed the asymmetry and led to a system that provided a dataset that showed no asymmetry. That experiment, however, could be biased by the selection of the attributes.

When using all attributes, the asymmetry between the number of clockwise and counterclockwise galaxies was observed with statistical significance of 2.52σ, as specified in Table 2 in [66]. That distribution is not necessarily random and in fact agrees with the results shown here more than it agrees with the null hypothesis. These results are also in agreement with previous analysis of SDSS galaxies as reported in [62].

As explained in [66], the random distribution was observed only after removing the specific attributes that are known to correlate with the asymmetry between clockwise and counterclockwise galaxies. Ignoring these attributes naturally led to a random distribution, but since certain specific attributes were intentionally ignored, that distribution may or may not reflect the distribution of the galaxy spin directions in the real sky. In any case, an experimental design according to which all attributes that are known to reflect asymmetry between clockwise and counterclockwise galaxies are ignored naturally leads to an algorithm that produces a randomly distributed dataset. When not removing these attributes, the observed signal was 2.52σ, which is not necessarily random.

Another study that showed opposite results used the dataset of [59] and suggested that the asymmetry is the result of “duplicate objects” in the dataset [68]. When removing the “duplicate objects” to create a “clean” dataset, the signal drops to 0.29σ. As the abstract claims, “the actual dipole asymmetry observed for the “cleaned” catalog is quite modest, $\sigma_D$ = 0.29.”

However, the dataset used in [59] was used for photometric analysis. No claim for the presence or absence of any kind of dipole axis was made in [59], and no such claim about that dataset was made in any other paper. Only when that dataset is used for analyzing the distribution of the galaxy population do photometric objects that are part of the same galaxy become “duplicate objects.”

However, the more interesting question is why a “clean” dataset showed random distribution of the galaxy spin directions. The answer can be found in a sentence in Section 3 of the paper: “The second sample we studied is a volume-limited sample retaining 111,867 spirals with measured redshift [93] in the range 0.01 ≤ z ≤ 0.1.”

As also explained in [64], the signal of 0.29σ reported in the abstract of the paper was observed with the “second sample,” where the redshift of the galaxies is limited to z < 0.1. As shown in [60, 62], when limiting the redshift to lower redshift ranges of z < 0.15, the distribution of galaxy spin directions is random. That is also shown in Tables 3, 5, 6, and 7 in [62]. These tables show random distribution in lower redshift ranges. Therefore, the random distribution reported in [68] is completely expected and in full agreement with previous work [60, 62]. The paper [68] does not provide a scientific motivation for limiting the redshift to 0.1.

More importantly, the “measured redshift” used to determine the dipole axis is in fact the photometric redshift from the catalog of [93]. In the analysis shown in this paper and in all previous work, the position of each galaxy is determined by its RA and declination, which are considered accurate measurements. In [68], however, the analysis is three-dimensional, and the position of each galaxy is determined by its RA, declination, and distance. The distance of each galaxy is computed by $d_i = c z_i / H_0$, where c is the speed of light, $H_0$ is the Hubble constant, and $z_i$ is the redshift of galaxy i. Because the vast majority of the galaxies in that dataset do not have spectra, the distance was determined by using the photometric redshift. The photometric redshift is highly inaccurate, ambiguous (in the sense that one galaxy can have multiple different photometric redshifts), and systematically biased. The error of the photometric redshift used for the analysis is 18.5% [93], which is far greater than the 1–2% signal of asymmetry reported here and in previous work. The very substantial error of the photometric redshift is therefore expected to weaken the signal. Because the photometric redshift is determined by complex pattern recognition rules, the systematic bias of the photometric redshift might also impact the results in a manner that is difficult to predict. For these reasons, the photometric redshift is not a sound probe for analyzing subtle anisotropies in the large-scale structure. A 3D analysis in which the distance is determined by using the photometric redshift is therefore expected to lead to random distribution. Indeed, all results shown in [68] are completely different from the results shown here or in previous work. For instance, the “unclean” dataset showed a dipole axis with statistical strength of 4.00σ when the redshift range of the dataset was limited by Iye et al. [68], while the only previous attempt to limit to lower redshifts showed no statistically significant dipole axis in that redshift range [62].

When using the photometric redshift for determining the position of the galaxies, the observed signal when not limiting the redshift range is 1.29σ [68]. Since the photometric redshift is highly inaccurate and in fact its inaccuracy is far greater than the expected signal, it is likely that the photometric redshift leads to a substantially weaker statistical signal. Indeed, an analysis by the National Astronomical Observatory of Japan showed that when using basic statistics where the photometric redshift is not used, the distribution of the galaxy spin directions in that dataset is not random [94]. The analysis is as follows.

Table 11 shows the number of clockwise and counterclockwise SDSS galaxies in the hemisphere centered at RA = 160° and in the opposite hemisphere in the exact same dataset used in [68]. A statistically significant signal is observed in the hemisphere centered at RA = 160°. The asymmetry in the opposite and less populated hemisphere is not statistically significant. However, because it has more counterclockwise galaxies than clockwise galaxies, it is also not in conflict with the distribution in the hemisphere centered at RA = 160° for forming a dipole axis. Because there are two hemispheres, the two-tailed probability needs to be corrected to 0.01. That simple analysis provides certain evidence that the distribution in the specific dataset of SDSS galaxies used in [68] might not be random.

A Monte Carlo simulation was applied such that each galaxy was assigned a random spin direction, and a search was applied to test whether any two hemispheres have an asymmetry as described in Table 11 or stronger. Out of 10,000 runs, 155 runs provided a distribution that could be divided into two hemispheres with asymmetry equal to or stronger than that of the two hemispheres shown in Table 11. That also suggests a distribution that might not necessarily be random.
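The structure of such a Monte Carlo test can be sketched as follows. This is a simplified illustration and not the code used for the analysis: it assumes hemispheres defined by RA only, a hypothetical search over hemisphere centers in 30° steps, and placeholder asymmetry thresholds and synthetic coordinates that are not the values of Table 11.

```python
import random

def in_hemisphere(ra, center):
    """True if `ra` lies within 90 degrees of `center` (RA only, in degrees)."""
    return abs((ra - center + 180.0) % 360.0 - 180.0) <= 90.0

def asymmetry(spins):
    """(cw - ccw) / (cw + ccw) for spins coded +1 (cw) and -1 (ccw)."""
    return sum(spins) / len(spins) if spins else 0.0

def fraction_as_strong(ras, a_near, a_far, runs=1000, step=30, seed=0):
    """Fraction of random-spin runs in which some hemisphere has asymmetry
    >= a_near while the opposite hemisphere has asymmetry <= -a_far."""
    rng = random.Random(seed)
    masks = [[in_hemisphere(ra, c) for ra in ras] for c in range(0, 360, step)]
    hits = 0
    for _ in range(runs):
        spins = [rng.choice((1, -1)) for _ in ras]
        for mask in masks:
            near = [s for s, m in zip(spins, mask) if m]
            far = [s for s, m in zip(spins, mask) if not m]
            if asymmetry(near) >= a_near and asymmetry(far) <= -a_far:
                hits += 1
                break  # one qualifying pair of hemispheres is enough for this run
    return hits / runs

coord_rng = random.Random(1)
ras = [coord_rng.uniform(0.0, 360.0) for _ in range(1000)]  # synthetic RAs
print(fraction_as_strong(ras, a_near=0.03, a_far=0.01, runs=200))
```

Because the search scans many candidate hemispheres in every run, the resulting fraction already accounts for the freedom in choosing where to split the sky.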

Figure 11 shows the statistical significance of the dipole axis from each possible integer (α, δ) pair when using the exact same dataset used by [68], but without using the photometric redshift to determine the position of each galaxy. The dataset contains 72,888 galaxies and is available at https://people.cs.ksu.edu/~lshamir/data/assym_72k/. The most likely location of the dipole axis is identified at , and the statistical signal of the axis is 2.16σ. That statistical signal is not necessarily random and does not conflict with the observations shown in Section 3. Since the dataset of [59] contains bright galaxies (i magnitude < 18, Petrosian radius > 5.5’), it is expected that these galaxies are also of lower redshift, and therefore, the asymmetry is expected to be weaker as shown in [62]. In any case, the statistical strength of the asymmetry is and cannot be considered necessarily random.

Figure 12 shows the likelihood of the dipole axis when the galaxies are assigned with random spin direction, showing much lower probability of .

These results can also be compared to the results when using the dataset that [66] used to determine and remove attributes that correlate with the galaxy spin direction asymmetry. The dataset contains 13,440 galaxies that were annotated manually and is available at https://people.cs.ksu.edu/~lshamir/data/assym. Figure 13 shows the probabilities of a dipole axis to peak at the different (α, δ) combinations. The figure shows a profile similar to the profile shown in Figure 11 and a nonrandom distribution.

6. Discussion

Autonomous digital sky surveys powered by robotic telescopes have allowed the collection of unprecedented amounts of astronomical data, making it possible to address research questions that were not addressable in the pre-information era. The question addressed here is the large-scale distribution of the spin directions of spiral galaxies as observed from Earth. Multiple previous experiments have shown that the distribution of spin directions of spiral galaxies as observed from Earth might not be random and might form patterns at scales far larger than any known cluster or supercluster [49, 50, 53–65]. Analysis of galaxies with spectra, separated into different redshift ranges, showed that the asymmetry is weak at low redshifts but increases gradually as the redshift gets higher [62, 65].

This study shows the most comprehensive and largest analysis of its kind to date. The analysis uses several different telescope systems with different photometric pipelines. The analysis covers the Northern hemisphere and the Southern hemisphere and uses both space-based and ground-based instruments. Each dataset is analyzed independently, without using any assumptions from other datasets. While the telescopes cover different parts of the sky, are based on completely different hardware, and use different photometric pipelines, and the data were annotated using different methods, all telescopes show very similar patterns of asymmetry. The agreement in the results from the different telescope systems and the different parts of the sky provides an indication of consistency, showing that the observations do not necessarily depend on a specific dataset. As discussed in Section 1, other datasets collected in the past four decades also showed asymmetry, although the small size of these datasets did not allow profiling the distribution with statistical significance. Section 5 analyzed studies that suggested random distribution and showed that these results do not necessarily conflict with nonrandom distribution.

Some first attempts to study the distribution of the spin directions of galaxies were based on manual annotation of the galaxies [53, 69], showing evidence of nonrandom distribution. By using a fully symmetric algorithm, much larger databases can be analyzed without the possible effect of human perception [54–56, 60–62]. The application of the automatic annotation to data acquired by several different telescopes showed similar profiles of distribution. The distribution can be fitted to dipole or quadrupole alignment with probability far higher than mere chance.

Studies with smaller datasets of galaxies showed nonrandom spin directions of galaxies in filaments of the cosmic web [48, 95, 96]. Other studies showed alignment in the spin directions even when the galaxies are too far from each other to interact gravitationally [49, 50], unless assuming modified Newtonian dynamics (MOND) gravity models that explain a longer gravitational span [97–99]. It should be mentioned that the physics of galaxy rotation is still not fully understood, and it is still not clear why and how galaxies spin. While the common theory that can explain the anomaly in the galaxy rotation curve [100] is the existence of dark matter, there is still no certain proof that dark matter indeed exists [101]. More recent observations showed that cosmic filaments also spin, and the origin of their spin can be explained by angular momenta originating from the initial conditions of the universe [102].

Large-scale alignment in spin directions was also observed with quasars [103]. The position angles of radio galaxies also showed large-scale consistency of angular momentum [104]. These observations agree with observations made with datasets such as the Faint Images of the Radio Sky at Twenty-centimetres (FIRST) and the TIFR GMRT Sky Survey (TGSS), showing large-scale alignment of radio galaxies [105, 106]. Large-scale clustering suggesting evidence for axis alignment was also observed in Fermi blazars [107].

In addition to the empirical observations, simulations of dark matter also showed links between spin directions and the large-scale structure [108–110]. The magnitude of the correlation has been associated with the color and stellar mass of the galaxies [111], and that association was linked to halo formation [112], leading to the contention that the spin direction in the halo progenitors is related to the large-scale structure of the early universe [113].

The large-scale analysis of spin directions done here shows evidence of dipole and quadrupole large-scale alignment. The results with DECam data agree with previous results [61–63]. The observation of a large-scale axis has been proposed in the past by analyzing the cosmic microwave background (CMB), with consistent data from the Cosmic Background Explorer (COBE), Wilkinson Microwave Anisotropy Probe (WMAP), and Planck [6, 114–119]. Observations also showed that the axis formed by the CMB temperature is aligned with other cosmic asymmetry axes such as dark energy and dark flow [118]. Other notable statistical anomalies in the CMB are the quadrupole-octopole alignment [120–124], the asymmetry between hemispheres [4, 117, 125], point-parity asymmetry [126, 127], and the CMB Cold Spot. If these anomalies are not statistical fluctuations [128], they can be viewed as observations that disagree with ΛCDM [129].

The most likely dipole axis identified using the spin directions of galaxies shown here peaks in very close proximity to the CMB Cold Spot. While that can be coincidental, it can also indicate a certain link between the CMB distribution and the distribution of galaxy spin directions. The nature of the CMB Cold Spot is still a mystery. It is statistically significant [21], consistent across different instruments, and cannot be explained by foreground contamination [130], but there is still no clear explanation for its existence. One possible explanation is a supervoid in that part of the universe [131], but observations have shown no evidence of unusual distribution of galaxy population around the location of the CMB Cold Spot [23, 132]. Here, the CMB Cold Spot is aligned with an axis formed by the distribution of the spin directions of spiral galaxies. It should be mentioned that a link between cosmic vacuum and galaxy rotation has also been proposed [133].

The concept of a cosmological-scale axis has been proposed through theories related to the geometry of the universe such as the ellipsoidal universe [2, 31–34]. An ellipsoidal universe is not expected to be isotropic, and the anisotropy is expected to exhibit itself in the form of a cosmological-scale quadrupole [25]. Another cosmological model that relies on the existence of a cosmological-scale axis is the rotating universe [35–40, 134–136]. The existence of a cosmological-scale axis has also been linked to theories such as the holographic big bang [41, 42].

Black hole cosmology [43–45, 137, 138] can also explain the existence of a cosmological-scale axis. Since stars spin, black holes also spin based on the spin of the stars from which they were created [139]. If the universe is hosted in a black hole, the universe should have a preferred direction inherited from its host black hole [138, 140, 141], which would exhibit itself in the form of an axis [142]. Such a black hole universe might not be aligned with the cosmological principle [143] but can explain other observations such as dark energy and the agreement between the Hubble radius and the cosmological Schwarzschild radius.

A possible universal pattern of galaxy spin directions can be related to the proposed existence of a Universal force field [144]. The observation that galaxies in opposite lines of sight show opposite spin directions also agrees with cosmology driven by longitudinal gravitational waves [145], according to which each galaxy at a certain distance from Earth is expected to have an antipode galaxy under the same physical conditions, but accelerating oppositely [145]. This model also agrees with the greater asymmetry observed in the earlier universe [62].

The analysis of a possible nonrandom distribution of the spin directions of spiral galaxies addresses a research question that was not practical to study in the pre-information era. As evidence for such nonrandom distribution is accumulating, additional research will be needed to fully understand its nature and match it with other probes in addition to the CMB.

Data Availability

The datasets from SDSS and HST used in this study are available freely from the URLs specified in the manuscript. Other datasets generated during this study are available on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research was supported in part by NSF grants AST-1903823 and IIS-1546079.