In this paper, we present a complete framework, both technical and conceptual, aimed at developing and analysing Networked Music Systems. After a short description of our technical framework, soundworks, a JavaScript library designed for collective music interaction using the web browsers of mobile phones, we introduce a new conceptual framework, which we name interaction topologies, that aims at providing a generic tool for describing interaction in such systems. Our proposition differs from the theoretical approaches generally proposed in the literature by decoupling the description of interaction topologies from the low-level technical implementation of the network. We then report on a set of scenarios and prototypes that illustrate and assess our framework and that were successfully deployed in several public installations and performances. In particular, we show that our concept of interaction topologies succeeds at describing and analysing global aspects of interaction from multiple points of view (e.g., social, human-computer) by composing simple abstract figures. We finally open a discussion on the agencies and perception of users engaged in such systems, which could later complete our framework for conceiving and analysing Networked Music Systems.

1. Introduction

The use of computer networks in music performances has a long history. The League of Automatic Composers, in 1978, can be cited among the first examples, or at least among the first documented in the literature [1]. Since then, many tools and protocols for transmitting musical information between devices have been developed, such as the ubiquitous MIDI [2] and OSC [3] protocols. In particular, the Internet infrastructure and its related protocols have enabled researchers and artists to create various ad hoc networked music systems. More recently, the development of web standards, with their numerous Application Programming Interfaces (APIs) [4] and their vast ecosystem of libraries, has significantly accelerated the development of complex networked music systems. As we will describe in this article, web technologies can be easily and favourably combined with mobile and miniature computer systems.

A number of authors have formalized musical computer networks [5–9] from a theoretical point of view and proposed various classifications. For example, Renwick in [9] has proposed a very broad definition of “Network music” as a “musical practice in which conceptual, technological, ideological, and/or philosophical concepts of the network are included in the design, composition, production, and/or performance process. The network may influence the work’s aesthetic, composition, production, or reception. The network may or may not be limited to electronic computerised networks”.

Several theoretical works on Networked Music Systems have specifically focused on the case where users (performers, spectators) are located in different spaces. For example, Lazzaro describes “Network Musical Performance (NMP) occur[ring] when a group of musicians, located at different physical locations, interact over a network to perform as they would if located in the same room” [10].

In this paper, we refer to the opposite cases where a group of people—small or large, expert or not—play, listen, and interact together in a shared space and time, systems that we qualify as situated. Therefore, we refer to our cases as Situated Networked Music Systems. Our approach is similar to Weinberg’s concept of Interconnected Musical Networks (IMN) [6, 11] that considers social interactions as key elements: “My definition for IMNs - live performance systems that allow players to influence, share, and shape each other’s music in real-time - suggests that the network should be interdependent and dynamic, and facilitate social interactions.” Furthermore, we propose to also consider any actor, human or not, performing or not, as a node in the network. Globally, according to the taxonomies proposed by Barbosa and Weinberg, every application we will describe here can be categorized as Local Musical Networks, Collective Creation Systems [5], or Real-Time Local Networks [6].

We implemented these concepts using mobile technologies (typically smartphones and tablets), relying on web standards, such as the relatively recent Web Audio API (https://www.w3.org/TR/webaudio/ accessed 29 November 2018), and on Wi-Fi network capabilities.

Our main contributions are thus threefold. First, we describe a conceptual and technical framework, based on web technologies, aimed at developing Situated Networked Music Systems. In particular, we introduce a conceptual framework dedicated to the description of such networked systems, which we call interaction topologies. This contrasts with the technological point of view often reported in the literature, which is based solely on the low-level information network. Our approach consolidates current theoretical frameworks by decoupling the topological description from its low-level technical implementation. Second, we report on a set of Situated Networked Music Systems, implemented with our technical framework, that illustrate the use of the interaction topologies we propose. Finally, we open a discussion on perception and agencies that could offer a complementary, user-centered perspective on the proposed conceptual framework.

2. Conceptual and Technical Framework

In this section, we describe a framework dedicated to Networked Music Systems. The framework is composed of two complementary components:
(1) A technical part based on web standards, the soundworks framework, for rapid prototyping of collective and interactive scenarios. Similar systems have been described in [12–14].
(2) A theoretical part based on the concept of interaction topologies, aimed at describing and analysing such systems.

2.1. The Soundworks Framework

The scenarios of musical interaction explored in this research require that participants can spontaneously join an experience and interact within a distributed environment composed of numerous devices, such as smartphones. In order to enable short cycles in an iterative design process, the applications have to be rapidly prototyped. Moreover, they must be easily deployed to arbitrary audiences.

These constraints led us to create a prototyping environment based on web standards. Indeed, these technologies have the following qualities in our context:
(i) Applications can be developed rapidly and immediately deployed on local or public networks.
(ii) Participants can access applications with the web browser already installed on their smartphones, connected through Wi-Fi or 3G.
(iii) Web standards provide a number of APIs for interactive multimedia (e.g., audio synthesis, 2D and 3D rendering, motion sensors, geolocation) and real-time networking [4].

Furthermore, web technologies make it easy to integrate additional devices as clients of our system [15, 16], enabling a wide range of interaction and audiovisual rendering possibilities. From this perspective, the scenarios we discuss here can be described as networks of interactive audiovisual elements that are dynamically constituted or completed by the mobile devices of the participating audience (cf. Figure 1).

To support experimentation with a wide range of scenarios, we developed a JavaScript framework, soundworks (https://github.com/collective-soundworks/soundworks License BSD-3-Clause, accessed 29 November 2018), that provides a set of services and abstractions for the most common requirements and functionalities of such applications. The framework is entirely based on web standards on the client side and uses Node.js (https://nodejs.org/en/ accessed 29 November 2018) on the server side. Since its very first version [17], the framework has been iteratively redesigned and has become the basis of numerous applications.

A soundworks application typically consists of a set of synchronized web clients that connect to a server through a wireless network to exchange messages and data streams (see Figure 1). Depending on the context, an application may be deployed locally through a dedicated Wi-Fi network or over the Internet using existing Wi-Fi or 3G/4G infrastructures. The former allows for rapid iterations during the development and testing of an application, as well as better control of bandwidth and latencies. The latter, however, is more suitable for large-scale events, especially outdoors.

The underlying philosophy of the framework is to provide a single place to write application-specific code (i.e., the Experience), while being able to easily access predefined pieces of functionality (e.g., clock synchronization, preloading of sound files) by simply requiring a dedicated service. Among the numerous services and abstractions provided by the framework, the most important ones are:
(i) clock synchronization between clients and server [18], similar to the Network Time Protocol (NTP) [19]
(ii) real-time messaging and data streaming based on WebSockets
(iii) shared parameters state between clients and server
(iv) synchronized scheduling of (audio) events [20]
(v) loading of sound files and related annotations
(vi) simple abstractions for HTML views and 2D rendering.
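To make the NTP-like clock synchronization mentioned above more concrete, the following minimal sketch shows how a client could estimate its offset to the server clock from a ping-pong exchange of timestamps. It is an illustration under stated assumptions rather than the soundworks implementation: the function names and the rule of keeping the exchange with the shortest round trip are hypothetical choices made for this example.

```javascript
// Sketch of an NTP-style offset estimate (hypothetical names, not the soundworks API).
// t0: client send time and t3: client receive time, both read on the client clock.
// t1: server receive time and t2: server send time, both read on the server clock.
function estimateOffset({ t0, t1, t2, t3 }) {
  // Time spent on the network, excluding the server processing time.
  const roundTrip = (t3 - t0) - (t2 - t1);
  // Offset of the server clock relative to the client clock,
  // assuming the outbound and return trips take the same time.
  const offset = ((t1 - t0) + (t2 - t3)) / 2;
  return { roundTrip, offset };
}

// Assumption for this sketch: among several exchanges, trust the one with
// the shortest round trip, since it is the least affected by network jitter.
function bestEstimate(exchanges) {
  if (exchanges.length === 0) {
    throw new Error('At least one exchange is required');
  }
  return exchanges
    .map(estimateOffset)
    .reduce((best, e) => (e.roundTrip < best.roundTrip ? e : best));
}

// Convert a time read on the local clock into the estimated server time.
function toServerTime(localTime, offset) {
  return localTime + offset;
}
```

Once every client holds such an offset, an event scheduled at a given server time can be converted to each local clock, which is what makes synchronized scheduling across devices possible.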

Another important feature of the framework is its ability to automatically manage the initialization of these interdependent processes, some of which need to communicate with the server to initialize (see Figure 2).

Figure 2 illustrates the initialization process of a typical application composed of a user-defined experience that uses three services dedicated to device initialization (the platform service), synchronization (the sync service), and management of audio assets (the audio-buffer-manager service). The platform service, which does not have any server-side counterpart, is principally aimed at verifying that the client (e.g., the smartphone’s browser) supports all the APIs required by the application, and at resuming audio rendering when a user gesture (e.g., a touch) is captured. In parallel, the audio-buffer-manager, which is responsible for loading sound files and annotations, starts requesting sound files from the server. The sync service, in contrast, relies on the audio clock to work; it must therefore wait for the ready event of the platform service before starting the synchronization process with the server. When all services have fired their ready event, the application-specific code can start safely.
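The ordering constraints described above can be sketched as a small dependency resolution problem. The example below is only an illustration and does not reproduce the internal mechanism of soundworks: the dependency table and the `planInitialization` function are hypothetical, and the sketch simply groups services into successive waves, where all services of a wave can start in parallel once the previous waves are ready.

```javascript
// Hypothetical dependency table mirroring the example in the text.
// Each key is a service; its array lists the services whose ready
// event it has to wait for before starting.
const dependencies = {
  'platform': [],
  'audio-buffer-manager': [],
  'sync': ['platform'],
  'experience': ['platform', 'audio-buffer-manager', 'sync'],
};

// Group services into waves: every service in a wave only depends on
// services that became ready in an earlier wave.
function planInitialization(deps) {
  const ready = new Set();
  const waves = [];
  let pending = Object.keys(deps);

  while (pending.length > 0) {
    const wave = pending.filter((name) =>
      deps[name].every((dependency) => ready.has(dependency)));

    if (wave.length === 0) {
      throw new Error('Circular or unknown dependency among: ' + pending.join(', '));
    }

    wave.forEach((name) => ready.add(name));
    pending = pending.filter((name) => !ready.has(name));
    waves.push(wave);
  }

  return waves;
}
```

With the table above, the platform and audio-buffer-manager services fall in the first wave, the sync service in the second, and the experience in the last, which matches the sequence described for Figure 2.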

Alongside the framework, an application template is also available (https://github.com/collective-soundworks/soundworks-template License BSD-3-Clause, accessed 29 November 2018). This template contains all the boilerplate code and generic configuration required by the framework. As such, it provides a simple and structured way of accessing the APIs exposed by the framework and thus allows for starting the development of a new application in a few minutes.

2.2. Interaction Topologies

An important problem with existing approaches to the analysis of topologies in Networked Music Systems comes from the idea that the “social organization of the network, an abstract, high-level notion, is addressed by designing and implementing the lower-level aspects of the network’s topology and architecture” [6]. Applied to our technical framework, which is entirely based on a centralized server, this statement would conversely imply that it is impossible to design scenarios and interactions outside of a star topology (the flower topology in Weinberg’s terminology). To overcome this technological orientation concerning topologies, we introduce here the concept of interaction topologies. This approach proposes a set of basic figures that can be used to describe several levels of interaction without focusing solely on technical aspects. As such, it describes networks of relations between entities (e.g., humans, technical artifacts) without any a priori hierarchy among their agencies. Furthermore, the deliberate simplicity and genericity of the proposed graphs seek to promote their reuse for descriptions in multiple dimensions (e.g., time, space, and information flow) and, thus, emphasize the decoupling of the description of interactions from their underlying technical implementation. Indeed, while some of the abstractions provided by soundworks can support and ease the implementation of these figures in multiple ways, there is no one-to-one correspondence between the provided APIs and the figures presented here.

While numerous formal graphical notations dedicated to precisely modeling systems (or some of their components) have been proposed in the HCI and CS communities (e.g., Petri nets, statecharts [21], or, more recently, the interface relational graph system [22]), our aim is to propose a complementary, high-level perspective that stresses the similarities between the described systems rather than their specificities.

Figure 3 shows the set of six figures that we propose: the disconnected graph (a), the unidirectional circular graph (b), the bidirectional circular graph (c), the centrifugal star graph (d), the centripetal star graph (e), and the forest (f). These graphs represent the possible interactions between entities, both humans and technical artifacts. Importantly, they do not represent the low-level transmission of information through the network, as depicted in Figure 1.
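As an illustration, the six figures can be encoded as lists of directed edges between numbered nodes. The following sketch is one possible encoding, not a notation defined by the framework: the names are hypothetical, node 0 is arbitrarily taken as the center of the stars, the forest is given as an array of parent indices with edges oriented from parent to child, and `superimpose` stands for the combination of several figures into a single description.

```javascript
// Hypothetical encoding of the six figures as lists of directed edges [from, to].
const range = (n) => Array.from({ length: n }, (_, i) => i);

const figures = {
  // (a) no interaction between nodes
  disconnected: (n) => [],
  // (b) each node acts on its successor (meaningful for n >= 3)
  unidirectionalCircle: (n) => range(n).map((i) => [i, (i + 1) % n]),
  // (c) each node acts on both of its neighbours (meaningful for n >= 3)
  bidirectionalCircle: (n) =>
    range(n).flatMap((i) => [[i, (i + 1) % n], [i, (i + n - 1) % n]]),
  // (d) the center (node 0) acts on every other node
  centrifugalStar: (n) => range(n).slice(1).map((i) => [0, i]),
  // (e) every other node acts on the center (node 0)
  centripetalStar: (n) => range(n).slice(1).map((i) => [i, 0]),
  // (f) disjoint trees, given as parents[i] = parent of node i, or null for a root
  forest: (parents) =>
    parents.flatMap((parent, i) => (parent === null ? [] : [[parent, i]])),
};

// Combine several figures into one description, dropping duplicated edges.
function superimpose(...edgeLists) {
  const seen = new Set();
  return edgeLists.flat().filter(([from, to]) => {
    const key = from + '>' + to;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Because each figure is reduced to a plain edge list, describing one system from several points of view amounts to superimposing the corresponding lists, independently of how messages actually travel through the server.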

We hypothesize that this minimal set is sufficient for describing, analysing, and classifying Networked Music Systems from several perspectives. In the next section, we describe a series of examples illustrating different interaction topologies.

3. Scenarios and Prototypes

In this section, we describe a set of experiments and scenarios that have been explored and refined during our research. The scenarios presented here were also chosen to illustrate each of the interaction topology figures presented in Section 2.2. As we will see, an interesting aspect of the proposed figures, and perhaps a byproduct of their simplicity, is the possibility of describing a single system from multiple points of view by combining several figures.

3.1. Birds, Rain Sticks, and Monks

This application offers a set of simple gesture-controlled instruments. Its main objective is to give participants a didactic and ecological approach to creating multisource sonic environments, through simple instruments built around obvious metaphors (e.g., rain stick) and gesture interactions based on motion sensors (e.g., shake, orientation). Once the application is loaded in the web browser of each mobile device, every participant can act entirely independently, which corresponds to the disconnected graph topology (see Figure 4).

3.2. Drops

The Drops experience has been strongly inspired by the iOS application Bloom developed by Brian Eno and Peter Chilvers (http://www.generativemusic.com/bloom.html accessed 29 November 2018). Similarly to the Bloom application, the Drops application allows players to touch the screen of their mobile device to generate drops. Each generated drop is characterized by two complementary aspects of rendering: the trigger of a percussive and resonant sound through the device loudspeakers and a colored circle that grows at the touch position and fades away. According to the touch position, users can control simultaneously the pitch and the duration of the sound.

Unlike Bloom, Drops has been designed for an unlimited number of colocated participants playing together. The participants’ mobile devices are synchronized, and each drop played by a participant is echoed on the devices of two other participants before coming back to the original device. The delay and attenuation introduced in each echo produce an ever-evolving and vanishing texture distributed among the participants. Additionally, each participant is associated with a specific color that makes it possible to identify their own contributions, as well as those of other participants, on their own screen.
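The echo mechanism can be sketched as follows. This is one possible reading of the description above, not the code of the application: the function name, the ordering of participants in a ring, and the default delay and attenuation values are all assumptions made for the example. Times are expressed on the clock shared by the synchronized devices.

```javascript
// Hypothetical sketch of the echo chain of a drop.
// `ring` is an ordered list of participant ids; `drop` describes the
// original event: { player, time, gain, pitch }.
function scheduleEchoes(drop, ring, { delay = 0.5, attenuation = 0.5 } = {}) {
  const start = ring.indexOf(drop.player);
  if (start === -1) {
    throw new Error('Unknown player: ' + drop.player);
  }

  // Hop 0 is the original drop, hops 1 and 2 are the echoes on two other
  // devices, and hop 3 brings the drop back to the original device.
  const targets = [0, 1, 2, 0].map((step) => ring[(start + step) % ring.length]);

  return targets.map((device, hop) => ({
    device,
    time: drop.time + hop * delay,                 // each echo is delayed further
    gain: drop.gain * Math.pow(attenuation, hop),  // and played more softly
    pitch: drop.pitch,
  }));
}
```

Since each hop multiplies the gain by a factor smaller than one, every drop eventually fades out, which is consistent with the vanishing texture described above.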

In its situated version, where all the people are in the same location, Drops can be first described as a bidirectional circular graph topology.

We also created an online version of the application, in which each participant is geolocated and relationships between participants are created by minimizing the distance between them (i.e., an application of the travelling salesman problem). For example, participants who are close to one another are grouped and can play together. Members of such a subgroup still remain connected to people in other groups located elsewhere. As shown in Figure 5, this can be described by superimposing the forest figure on the bidirectional circular graph.

3.3. Collective Loops

Collective Loops is an installation that allows up to eight users to collaboratively interact within a shared audio-visual environment using handheld devices [15]. Conceptually, the whole installation can be seen as an 8-step loop sequencer in which each participant embodies a single step of the sequence.

The installation is composed of two different interleaved and synchronized layers. At the local level, a participant’s mobile device acts both as a controller and as an audio source. As a controller, the device exposes two different modalities of interaction: the user can enable and disable particular notes of their synthesizer by touching the screen, and can control the cutoff frequency of a lowpass filter by rotating the device around its pitch axis.

In addition to the participants’ devices, the installation features a shared visual projection on the floor that reproduces the current state of the controls of all devices. Besides providing a way to place the participants in space, this shared representation also enhances collaborative aspects by giving participants information on one another’s actions. Finally, the projection offers a simple way to follow the progress of the current step through the sequence.

As a form of step sequencer, and from the point of view of the control and audio rendering, Collective Loops can be described as a unidirectional circular graph where a token advances according to a predefined time step. However, by considering the system from the point of view of the shared visual rendering projected on the floor, to which every participant contributes equally, we can also describe the system as a centripetal star (see Figure 6). Furthermore, this shared rendering also creates a new way for participants to interact with one another (e.g., by creating visual figures such as circles or stairs). From this point of view, the system could also be seen as a centrifugal star topology. Hence, several layers of intertwined audio, visual, and social interactions can be described by the simple combination of three basic figures.
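The token of the unidirectional circular graph can be derived directly from the synchronized clock, without any message passing between devices. The sketch below illustrates this idea under assumptions of our own: the function names are hypothetical, the step duration is a free parameter, and the shared time is assumed to be non-negative and expressed in seconds.

```javascript
const NUM_STEPS = 8; // the installation is described as an 8-step loop sequencer

// Index of the step (i.e., of the participant) currently holding the token.
function currentStep(syncTime, stepDuration) {
  return Math.floor(syncTime / stepDuration) % NUM_STEPS;
}

// Next time, at or after `syncTime`, at which the token reaches `step`.
// A device embodying that step can use it to schedule its enabled notes.
function nextOnset(step, syncTime, stepDuration) {
  const loopDuration = NUM_STEPS * stepDuration;
  const loopStart = Math.floor(syncTime / loopDuration) * loopDuration;
  const onset = loopStart + step * stepDuration;
  return onset >= syncTime ? onset : onset + loopDuration;
}
```

In such a design, every device computes the same token position from the same clock, so the circular figure describes the interaction even though all clients remain technically connected to a central server.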

3.4. GrainField

GrainField requires the presence of an improvising performer (instrumentalist or singer) placed in the center of the audience seated around her/him.

In this experience, the performer is continually recorded by the system, which creates every second an audio file containing the two previous seconds of recording. This process runs continuously from the beginning to the end of the improvisation. Each time a new audio file is created, it is sent to a random selection of the participants’ mobile devices (typically representing 10% of the audience). On the participants’ devices, the received sound files are replayed by a granular synthesizer. Participants can scrub through the samples by waving their device to control the playback position. The screen of the device is only used to give additional feedback to users in two different ways: by displaying the current playback position of the synthesizer and by changing the background color each time a new sample is received.
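The distribution step described above can be sketched with two small functions. They are illustrative only: the names are hypothetical, the 10% ratio and the two-second window are taken from the text, and the choice of a partial Fisher-Yates shuffle for the random selection is an assumption of this example.

```javascript
// Time window covered by the file created at `second` (in seconds since
// the beginning of the recording): the two previous seconds.
function segmentAt(second, length = 2) {
  return { start: Math.max(0, second - length), end: second };
}

// Pick a random subset of the connected clients, without repetition.
// `random` can be replaced by a deterministic function for testing.
function pickRecipients(clients, ratio = 0.1, random = Math.random) {
  const pool = clients.slice(); // do not modify the caller's array
  const count = Math.min(pool.length, Math.max(1, Math.round(pool.length * ratio)));

  // Partial Fisher-Yates shuffle: only the first `count` positions are drawn.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }

  return pool.slice(0, count);
}
```

Because a new overlapping segment is produced every second and sent to a different random subset, the audience as a whole holds a changing patchwork of recent material.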

Another client of the system, which is not seen by participants, makes it possible to globally control synthesis parameters (e.g., grain duration, resampling) on every participant’s device, in order to adapt to and/or reinforce some characteristics of the performance. The resulting global audio rendering can be described as a distributed granular echo of the sound material proposed by the performer, creating an ever-evolving texture.

In terms of topology, as the musical material created by the performer is distributed over the participants’ smartphones, the system can be described as a centrifugal star (see Figure 7). However, the performer is also influenced by the feedback received from this delayed and granularized material; therefore, from this other point of view, the topological description of the system could also be complemented by a centripetal star.

3.5. 88 Fingers

88 Fingers is a collaborative performance in which up to 88 participants perform on an automated piano (i.e., a Yamaha Disklavier) using their mobile devices. The performance plays with the codes of the classical concert by keeping the scenography of a piano recital: the piano is on stage while participants sit in the room.

At the beginning of the performance, each participant can choose one single key of the piano among the remaining ones (once a key has been chosen by a participant, it is no longer available to others). When the performance starts, participants can play their key for the duration of the performance by simply touching the screen of their mobile phone. The graphical interface controls only two parameters: touching the screen presses the key of the piano, and the vertical position of the touch sets the velocity.
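The two rules described above, exclusive attribution of keys and velocity control through the vertical touch position, can be sketched as follows. This is a hypothetical illustration rather than the code of the performance: the class and function names are ours, keys are identified by MIDI note numbers (21 to 108 for an 88-key piano), and the choice of mapping the top of the screen to the loudest velocity is an assumption.

```javascript
const FIRST_KEY = 21; // MIDI note number of the lowest key (A0)
const LAST_KEY = 108; // MIDI note number of the highest key (C8)

// Keeps track of which participant owns which key.
class KeyRegistry {
  constructor() {
    this.owners = new Map(); // key -> participant id
  }

  // Returns true if the key was attributed, false otherwise.
  claim(key, participantId) {
    const valid = Number.isInteger(key) && key >= FIRST_KEY && key <= LAST_KEY;
    if (!valid) return false;
    if (this.owners.has(key)) return false; // key already taken
    if ([...this.owners.values()].includes(participantId)) return false; // one key each
    this.owners.set(key, participantId);
    return true;
  }

  // Keys that can still be chosen.
  available() {
    const keys = [];
    for (let key = FIRST_KEY; key <= LAST_KEY; key++) {
      if (!this.owners.has(key)) keys.push(key);
    }
    return keys;
  }
}

// Map a normalized vertical position (0 = top, 1 = bottom of the screen)
// to a MIDI velocity between 1 and 127. Assumption: higher on screen is louder.
function velocityFromTouch(normalizedY) {
  const y = Math.min(1, Math.max(0, normalizedY));
  return 1 + Math.round((1 - y) * 126);
}
```

Beyond this attribution rule, no further constraint is needed in the sketch, in line with the minimal set of rules of the experience.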

The experience is built around ideas of “freedom and responsibility,” by not adding any additional rules to the system (computational or verbal). From an interaction point of view, it corresponds to the centripetal star graph, where each participant acts towards a single element, the piano (see Figure 8). Hence, it can be seen as the reverse of GrainField in terms of interaction topology.

3.6. ProXoMix

ProXoMix is an installation where participants, equipped with mobile devices connected to earphones, interactively remix a piece composed of complementary loops by moving physically in the space. In this installation, each participant embodies a predefined track that can be chosen through a dedicated interface. Once inside a track, the participant can modulate its content with two complementary modalities: samples composing the track can be activated and deactivated by touching the screen and the cutoff frequency of a lowpass filter can be changed by tilting the device.

The principal interaction, however, consists in moving in the space to get closer to other participants. Indeed, when two or more participants get close enough to one another, they start to hear the tracks of their peers through their earphones. The installation engages participants in collaboratively mixing the proposed tracks, thus creating social-musical assemblies and spontaneous choreographies.

In terms of topologies, the formation of small groups can first be seen as a forest topology. Nevertheless, the evolving forest that describes ProXoMix at the highest level can be refined further. Indeed, each subgroup of the forest is characterized by the fact that all participants share the same audio rendering, thus creating the forest of centripetal stars illustrated in Figure 9.
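The formation of subgroups sharing the same audio rendering can be sketched as a search for connected components among participants. The example below rests on several assumptions that are not specified in the text: participant positions are taken as known two-dimensional coordinates, proximity is a simple distance threshold, and groups are built transitively (two participants belong to the same group if a chain of close neighbours links them). All names are hypothetical.

```javascript
// Group participants that are linked, directly or through intermediate
// neighbours, by a distance lower than or equal to `threshold`.
// Each participant is described as { id, x, y, track }.
function groupByProximity(participants, threshold) {
  const visited = new Set();
  const groups = [];

  for (const seed of participants) {
    if (visited.has(seed.id)) continue;

    // Breadth-first traversal starting from a participant not yet grouped.
    const group = [];
    const queue = [seed];
    visited.add(seed.id);

    while (queue.length > 0) {
      const current = queue.shift();
      group.push(current);

      for (const other of participants) {
        if (visited.has(other.id)) continue;
        const distance = Math.hypot(current.x - other.x, current.y - other.y);
        if (distance <= threshold) {
          visited.add(other.id);
          queue.push(other);
        }
      }
    }

    groups.push(group);
  }

  return groups;
}

// Tracks heard by every member of a group: the shared audio rendering.
function mixFor(group) {
  return group.map((participant) => participant.track);
}
```

Each group returned by this sketch corresponds to one tree of the forest, and the shared mix plays the role of the center towards which all members of the group contribute.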

4. Discussion and Conclusion

First, the examples we described were all shown several times, in different settings from public installations to concerts and performances. These public events demonstrated that each described system worked as planned from a technical point of view, and that the setup could directly scale for performances with up to 150 participants.

The interaction topologies we proposed offer one point of view that we found useful for describing a global view of the interaction between the different elements of the systems, both human and technical. Such an approach is complementary to other approaches proposed in the literature [6].

Nevertheless, it is also interesting to assess a point of view based on the user experience. For this, we propose to take into account several properties: the user’s degrees of freedom for action with the device, the musical constraints on the user’s actions, and the user’s perceived interaction and agencies. Table 1 shows a possible analysis for each of the proposed scenarios. The ratings proposed here are based on our own experience of the systems as designers and on discussions with participants. As such, our point here is not to propose a formal user evaluation for each system, which would be out of the scope of this paper (interested readers can refer to [23] for such work concerning Collective Loops), but rather to propose a series of starting points for discussion and analysis.

These criteria provide complementary properties to the ones exhibited by the interaction topologies. The table illustrates that these criteria can also be used to distinguish between the different applications.

For example, in Birds and Drops, the systems have been designed to scale from small to large participating audiences without technical modifications. We observed a shift in the way participants engage in the experience. Indeed, the perceived contribution to the shared rendering and the perceived interaction with other participants decrease as the number of participants engaged in the experience increases.

In 88 Fingers and ProXoMix, the perceived agencies might vary depending on musical materials. For example, very low or very high pitches are much easier to perceive in the collective improvisation in 88 Fingers. Similarly, some tracks in ProXoMix, such as drums or melodic tracks, are easier to perceive and have more musical impact compared to more discreet sound elements.

Interestingly, GrainField offers an example where the perceived contributions to the shared rendering remain clear while the possibilities offered by the system are really limited.

In summary, we have proposed a complete framework, both technical (open-source soundworks library) and conceptual (interaction topologies), aimed at developing and analysing Situated Networked Music Systems. We then presented a set of scenarios and prototypes developed using soundworks that illustrated each of the proposed topologies. Furthermore, we showed that the framework dedicated to interaction topologies can be used to describe our applications from multiple perspectives, confirming its qualities compared to approaches centered on technical aspects of the network topologies. We believe that the simplicity of the proposed approach and figures, which allows for combining several figures to describe a single application, could provide a powerful tool to describe and analyse a wider range of Networked Music Systems.

A number of aspects of the framework can still be improved. On the technical side, while our platform has proved to be efficient for prototyping a wide range of applications, significant work is currently under way to improve its accessibility to nonexpert programmers such as researchers and artists. On the theoretical side, and particularly concerning interaction topologies, these concepts should be assessed on a wider range of scenarios and applications. Complementary work could also be pursued to combine our concept of interaction topologies with a user perspective that takes into account the perception of interaction and agencies.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

The work described in this article has been conducted in the context of the CoSiMa research project funded by the French National Research Agency (ANR) [ANR-13-CORD-0010]. We would like to thank our colleagues Jean-Philippe Lambert, Sébastien Robaszkiewicz, David Poirier-Quinot, and Victor Audouze, as well as our partners Orbe, EnsadLab, ESBA TALM–Le Mans, ID Scènes, and NoDesign, for their valuable contributions to this work.