Selected Papers from the International Conference on Information, Communication, and Engineering 2013
A Karaoke System with Real-Time Media Merging and Sharing Functions for a Cloud-Computing-Integrated Mobile Device
Mobile devices such as personal digital assistants (PDAs), smartphones, and tablets have increased in popularity and are widely used for work-related, social, and entertainment purposes. Entertainment services in particular have attracted substantial attention, and relevant industries have invested considerable effort in finding ways to deliver convenient, high-quality entertainment services on mobile devices. Because cloud-computing technology is mature and offers strong computing capacity, integrating it into the entertainment functions of mobile devices can reduce the data load on the device and maintain its performance. This study combines cloud computing with a mobile device to design a karaoke system that provides real-time media merging and sharing functions. The system enables users to download music videos (MVs) to their mobile device and to sing and record their singing by using the device. The recorded song is uploaded to the cloud server, where it is merged with the media in real time. Subsequently, by employing media streaming technology, users can store their personal MVs on their mobile device or computer and instantaneously share these videos with others on the Internet. Through this process, people can instantly watch shared videos, enjoy the leisure and entertainment functions of mobile devices, and satisfy their desire to sing.
1. Introduction
The vigorous development of information and communication technologies and the increased use of the Internet have integrated network technologies into people’s daily lives. The proportion of people using mobile application services has increased dramatically following the popularization of mobile devices. Despite the global economic recession of recent years, major technology and telecommunication companies have maintained positive stances toward the smartphone and mobile business service markets, believing that these markets present substantial potential for development.
Recently, cloud-computing technology has developed rapidly, attracting considerable attention from numerous companies, enterprises, and users. This technology is highly effective in that it can handle computations for massively complex systems on the Internet, thereby enabling remote service providers to process a vast amount of information within a short period of time. Cloud computing exhibits excellent computing performance similar to that of a supercomputer.
Increasing numbers of people have focused on the mobile web for mobile devices. According to the Institute for Information Industry, a survey on the mobile Internet penetration rate among Taiwanese citizens showed that since 2010 the rate has increased rapidly, at a pace of more than 10% per year. In the first quarter of 2013, the penetration rate reached almost 50%, as shown in Figure 1.
Advancements in Internet technology and mobile devices have rendered video streaming one of the most popular application services. This technology offers services such as entertainment video sharing and live Internet television. Users can browse the channel directory to obtain information regarding the channel content and choose the content or the program they wish to view. Currently, numerous websites provide free or paid online video services (e.g., YouTube, Vimeo, and I’mTV), where users can watch videos wherever and whenever they desire.
Singing-related talent shows have received considerable attention worldwide. People can follow the entire selection process from the initial auditions through to the finals, each stage of which is broadcast on television. Currently, popular large-scale singing talent shows include American Idol, The Voice, The Voice of China, and Taiwan’s One Million Star and Super Idol. The distribution of these shows has increased the viewership of the associated television channels and has become a common topic of discussion among various communities. The increased popularity of television talent shows may also have raised people’s interest in singing, prompting them to organize social events with friends at karaoke establishments (hereafter referred to as KTV). According to the 2011 Survey on the Music Industry in Taiwan, announced in 2012 by the Bureau of Audiovisual and Music Industry Development, MOC, the revenue of the Taiwanese karaoke industry was estimated to be NT$ 843 million. For the majority of consumers, singing at KTVs is not only a form of leisure and entertainment but also an activity in which people can easily interact and socialize with others. Singing enables people to release their emotions and relieve stress. Some people practice singing techniques in the hope of becoming celebrities, whereas many others simply choose singing as an everyday form of entertainment. Even without going to KTVs, people can casually hum or sing songs whenever they desire.
In an era of rapid technological development, integrating various forms of entertainment with fast-growing and commonly used mobile devices has attracted considerable interest. This study combines cloud computing and a mobile device to design a karaoke system that is integrated with real-time media merging and sharing functions, thereby entertaining users with a singing-related application service. This service was established based on the cloud-computing framework. The application program enables users to sing into their mobile devices while it simultaneously records their voice and uploads it to the cloud server. In this process, the noises that interfere with the recording are eliminated. The system also allows users to merge their song recordings with a music video (MV) and share their personal MVs with others. Overall, users can create and store their personal MVs, which can be synchronized and shared with others on the Internet. This study provides the following contributions.
(1) Users can attain entertainment goals without being limited by time and location, provided that they have their mobile phone and Internet access. Through their cloud-computing-integrated mobile device, they can sing and share their creations with other users.
(2) Audio and video media can be merged and immediately shared.
(3) Users can merge their song recordings with an MV or with a self-developed video to create personal MVs.
(4) Consequently, through their mobile devices, people can sing heartily without having to visit KTVs, experience the enjoyment of friends and relatives’ gatherings without having to attend, and feel the pleasure of being a singer.
2. Literature Review
2.1. Cloud Computing
Cloud computing has received the attention of numerous companies and users in recent years. By using this technology, users can store their data or application programs in the cloud, from which they can download or share the stored data with others on the Internet. Essentially, a cloud-computing service is accessible and available online regardless of the time and location [1–4].
Cloud computing excels at computation in that it allows remote service providers to process a vast amount of information within a short period of time. Therefore, it offers computing performance similar to that of a supercomputer. Furthermore, through Internet connections, this technology facilitates collaboration and services between the service provider and its clients. Currently, numerous companies are actively adopting cloud-computing services for internal and external use to reduce costs and enhance their competitiveness.
2.2. Internet Video Sharing
Internet video sharing functions have increased in popularity and are incorporated into numerous applications; YouTube and Vimeo are two well-known examples. Previously, live video was only provided by Internet service providers; presently, this service can be personalized, enabling users to instantly broadcast videos they wish to share with others on the Internet whenever and wherever they desire. It also provides audiences with real-time services. Because of this transformation, ordinary citizens, celebrities, politicians, and business leaders are able to share self-produced videos and interesting clips in their personal and video blogs.
2.3. Video Streaming Technology
The advancement of broadband Internet technology has prompted users to frequently use multimedia streaming services on the Internet [5–9]. The development of high-speed Internet has also rendered the provision of real-time multimedia services on the Internet feasible. Users no longer need to spend prolonged periods of time downloading an entire file or storing large files on hard drives. Over the Internet, the server continuously transmits the file, and the user can watch the video while the rest of the file is still being received. Figure 2 presents the framework of the media streaming system, which can deliver stored media files or live broadcasts from the server. Popular media sharing websites adopt this type of streaming technology for online transmissions.
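As a rough illustration of why streaming shortens the wait before playback, the following sketch compares the time needed to download a whole file with the time needed to fill a small playback buffer. The function names, the 5-second buffer, and the bitrate and bandwidth figures are assumptions chosen for the example, not values from this study.

```python
def download_wait_s(file_size_megabits: float, bandwidth_mbps: float) -> float:
    """Seconds before playback can start when the whole file must arrive first."""
    return file_size_megabits / bandwidth_mbps


def streaming_wait_s(buffer_s: float, bitrate_mbps: float, bandwidth_mbps: float) -> float:
    """Seconds needed to fill an initial buffer holding buffer_s seconds of media."""
    return buffer_s * bitrate_mbps / bandwidth_mbps


# Hypothetical 4-minute MV encoded at 2 Mbit/s (480 Mbit in total),
# delivered over an 8 Mbit/s connection.
full_download = download_wait_s(file_size_megabits=480, bandwidth_mbps=8)
buffered_start = streaming_wait_s(buffer_s=5, bitrate_mbps=2, bandwidth_mbps=8)
print(f"download first: {full_download:.2f} s, streaming: {buffered_start:.2f} s")
```

Under these assumed figures the viewer waits about a minute for a full download but only slightly more than a second for streamed playback to begin.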
Real-time transport protocol (RTP) is a communication protocol commonly used to deliver audio and video streams over IP networks. Real time streaming protocol (RTSP) is a communication protocol designed to remotely control multimedia playback. It is used to set up and control audio and video streaming sessions and is frequently used in conjunction with RTP. This study streamed data to instantly merge and share videos by using the RTSP approach, which allows packets to be transmitted without interruption and allows users to smoothly render the audio and video content.
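To make the control role of RTSP concrete, the sketch below builds the text of a few RTSP/1.0 requests in the order a player typically sends them (DESCRIBE, SETUP, PLAY, TEARDOWN). The request-line and header layout follows RFC 2326; the helper name `build_rtsp_request`, the URL, the port numbers, and the session identifier are hypothetical and only serve the example. No network connection is made.

```python
def build_rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request: request line, CSeq header, extra headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical media URL and session values, for illustration only.
url = "rtsp://example.com/media/mv123"
requests = [
    build_rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"}),
    build_rtsp_request("SETUP", url + "/track1", 2,
                       {"Transport": "RTP/AVP;unicast;client_port=5000-5001"}),
    build_rtsp_request("PLAY", url, 3, {"Session": "12345", "Range": "npt=0-"}),
    build_rtsp_request("TEARDOWN", url, 4, {"Session": "12345"}),
]
for request in requests:
    print(request)
```

The media itself would travel separately over RTP on the ports negotiated in the SETUP exchange; RTSP only carries the control commands.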
2.4. Noise Elimination
Digital signal processing is a critical aspect of information technology. Nowadays, people generally listen to music on CDs or in MP3 format, both of which store audio as digital signals. Professional singers typically record and produce music in a fully equipped recording studio, which is designed to isolate external noise and capture the most faithful sound. However, most people have no access to these professional studios, and background noise is often recorded when singing into ordinary microphones or at KTVs. Therefore, background environmental noise must be attenuated when mobile devices are used to record singing.
Spectral subtraction is an effective speech enhancement technique that operates on the spectrum of the speech signal. In 1979, Boll proposed a spectral subtraction method to reduce noise signals. The spectral subtraction algorithm is simple, fast, and effective, requires few calculation steps, and can enhance the signal-to-noise ratio (SNR) of a sound signal. Assuming a noise-corrupted input signal $y(n)$, clean speech signal $s(n)$, and noise signal $d(n)$, the signal influenced by noise can be expressed as follows:
$$y(n) = s(n) + d(n).$$
Therefore, the original clean speech signal can be considered the corrupted input signal minus the noise:
$$s(n) = y(n) - d(n).$$
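Because the noise waveform itself is not known to the receiver, implementations usually carry out this subtraction on short-time magnitude spectra, using an estimate of the noise magnitude and reusing the phase of the noisy input. The following is a minimal sketch under those assumptions, not the implementation used in this study; it processes a single frame with a naive DFT so that it needs only the standard library, and the function names and test signal are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2)), adequate for short frames."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(noisy_frame, noise_magnitude):
    """Subtract an estimated noise magnitude from each bin of one frame."""
    cleaned = []
    for value, noise_mag in zip(dft(noisy_frame), noise_magnitude):
        magnitude = max(abs(value) - noise_mag, 0.0)  # negative results are set to zero
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))  # keep the noisy phase
    return idft(cleaned)


# Hypothetical 16-sample frame: a sine tone plus a small constant offset acting as noise.
frame = [math.sin(2 * math.pi * 2 * t / 16) + 0.05 for t in range(16)]
noise_estimate = [0.8] + [0.0] * 15  # the offset only affects bin 0 (16 * 0.05 = 0.8)
enhanced = spectral_subtract(frame, noise_estimate)
print([round(v, 3) for v in enhanced[:4]])
```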
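To reduce the noise remaining in a speech signal, Boll modified the basic spectral subtraction method. With $Y(\omega)$ denoting the spectrum of the noise-corrupted input, $\hat{D}(\omega)$ the estimated noise spectrum, and $\hat{S}(\omega)$ the enhanced speech spectrum, negative results of the subtraction are set to zero:
$$|\hat{S}(\omega)|^{2} = \begin{cases} |Y(\omega)|^{2} - |\hat{D}(\omega)|^{2}, & |Y(\omega)|^{2} > |\hat{D}(\omega)|^{2}, \\ 0, & \text{otherwise.} \end{cases}$$
The SNR is calculated below and is expressed in decibels (dB):
$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{\omega} |Y(\omega)|^{2}}{\sum_{\omega} |\hat{D}(\omega)|^{2}}.$$
In reality, the noise in a noisy environment is irregular across frequencies. To examine the effects of noise on speech signals, Berouti proposed a method that segments speech signals into multiple frequency bands. For band $i$, the equation can thus be rewritten as follows:
$$|\hat{S}_i(\omega)|^{2} = |Y_i(\omega)|^{2} - \alpha_i\,|\hat{D}_i(\omega)|^{2},$$
where $\alpha_i$ is calculated as follows from the SNR of band $i$, $\mathrm{SNR}_i$:
$$\alpha_i = \alpha_0 - \frac{3}{20}\,\mathrm{SNR}_i, \qquad -5\ \mathrm{dB} \le \mathrm{SNR}_i \le 20\ \mathrm{dB}.$$
The equation below represents nonlinear spectral subtraction, which subtracts less from the instantaneous power spectrum where the SNR is high and more where the SNR is low, where $\hat{S}(\omega)$ denotes the enhanced speech signal, $Y(\omega)$ denotes the noisy speech signal, and $\Phi(\omega)$ represents the subtraction estimate, which depends on the noise signal:
$$|\hat{S}(\omega)|^{2} = |Y(\omega)|^{2} - \Phi(\omega).$$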
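To eliminate musical noise, Berouti also proposed spectral subtraction with oversubtraction, asserting that the amount of noise spectrum subtracted should be inversely related to the SNR: frames with louder speech should have less noise subtracted than frames with softer speech. For a noisy spectrum $Y(\omega)$ and a noise spectrum estimate $\hat{D}(\omega)$, the enhanced spectrum $\hat{S}(\omega)$ is
$$|\hat{S}(\omega)|^{2} = \begin{cases} |Y(\omega)|^{2} - \alpha\,|\hat{D}(\omega)|^{2}, & |Y(\omega)|^{2} - \alpha\,|\hat{D}(\omega)|^{2} > \beta\,|\hat{D}(\omega)|^{2}, \\ \beta\,|\hat{D}(\omega)|^{2}, & \text{otherwise,} \end{cases}$$
where $\beta$ is a constant and $\alpha$ can be calculated as follows:
$$\alpha = \alpha_0 - \frac{3}{20}\,\mathrm{SNR},$$
where $-5\ \mathrm{dB} \le \mathrm{SNR} \le 20\ \mathrm{dB}$ and $\alpha_0$ is the value of $\alpha$ at an SNR of 0 dB; therefore, when the speech signal is weak (i.e., low SNR), $\alpha(\mathrm{SNR})$ increases. The amplitude of the noise spectrum is oversubtracted, and the musical noise is eliminated by using $\beta\,|\hat{D}(\omega)|^{2}$ in place of the subtracted result whenever that result falls below this floor.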
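The oversubtraction rule can be sketched per frequency bin as follows. This is an illustrative reading of Berouti's method rather than the code used in this study: the linear rule for the oversubtraction factor follows the description above, while the default values `alpha0=4.0` and `beta=0.01`, the clamping of the SNR to the range from -5 dB to 20 dB, and the function names are assumptions made for the example.

```python
def oversubtraction_factor(snr_db, alpha0=4.0):
    """Oversubtraction factor that grows as the SNR falls (assumed alpha0 = 4)."""
    snr = min(max(snr_db, -5.0), 20.0)  # assumption: hold the factor constant outside the range
    return alpha0 - 3.0 * snr / 20.0


def oversubtract_bin(noisy_power, noise_power, snr_db, beta=0.01):
    """Subtract alpha times the noise power, never going below the floor beta * noise power."""
    alpha = oversubtraction_factor(snr_db)
    floor = beta * noise_power
    return max(noisy_power - alpha * noise_power, floor)


# Hypothetical power values for two bins of one frame.
print(oversubtract_bin(noisy_power=10.0, noise_power=1.0, snr_db=20.0))  # strong speech
print(oversubtract_bin(noisy_power=1.0, noise_power=1.0, snr_db=0.0))    # weak speech
```

In the second call the subtraction would go negative, so the spectral floor is used instead, which is the mechanism that suppresses isolated residual peaks heard as musical noise.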
3. Karaoke System with Real-Time Media Merging and Sharing Function
3.1. System Framework
During leisure time, people often organize social events with friends at KTVs. The increased popularity of television talent shows may have increased people’s desire to become celebrities, and many people choose singing as an everyday form of entertainment because it is conveniently available. This study combined cloud computing and a mobile device to design an application system that allows instant singing and sharing; thus, users can sing wherever they are and instantly share their recorded singing with others by using a cloud-computing-integrated mobile device that is connected to the Internet. Furthermore, the recordings can be merged with MVs and self-developed videos, thereby permitting users to feel the pleasure of being a singer. Figure 3 exhibits the system framework.
The procedures that occur after users have downloaded and installed the system application program on their personal device are outlined as follows.
(1) According to the interface display, users can enter the song selection mode and either select the “Select Songs by Classification” option to choose the song they wish to sing from the subcategories (e.g., Chinese, Taiwanese, and English songs, male and female singers, rock and roll, sentimental songs, and hip hop music) or use the “Search Songs” option by inputting keywords relevant to the song.
(2) The system shows the results of the “Select Songs by Classification” or “Search Songs” function, displaying the list of possible songs for users to select and verify.
(3) Users select the music file they intend to sing.
(4) The system displays the option for selecting the key; users can choose “male key” or “female key” based on their ability or preference.
(5) Once the user has selected the desired song and key, the application program searches for the corresponding file in the MV database in the cloud server.
(6) Subsequently, the MV file relevant to the selected song and key is streamed to the user’s mobile device.
(7) The mobile device begins receiving the MV file of the selected song, while users are able to sing the song through the mobile device.
(8) While the user sings, the system synchronously uploads the voice signal captured by the mobile device to the application program in the cloud server, where the voice is processed and subsequently merged with the MV file in the database.
(9) If users do not wish to merge their recordings with the MV files in the database, they can select their personal videos (images or motion pictures) stored in the mobile device to merge with their voice signals.
(10) The merged media file can be shared instantly over the Internet.
(11) Other users can watch the merged video from their mobile devices or personal computers. The transmission used in this step also involves media streaming technology.
(12) Users can also store their completed works or upload and share them directly on social networking websites such as YouTube and Facebook.
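Steps (1) through (5) above amount to filtering a song catalog by category or keyword and then resolving the chosen song and key to an MV file. The sketch below shows one way this lookup could be modeled; the `Song` record, the catalog entries, and the file-naming scheme are hypothetical and are not taken from the system described in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    title: str
    artist: str
    category: str
    keys: tuple  # keys available for this song, e.g. ("male", "female")


def select_by_classification(catalog, category):
    """'Select Songs by Classification': return songs in one subcategory."""
    return [song for song in catalog if song.category == category]


def search_songs(catalog, keyword):
    """'Search Songs': case-insensitive keyword match on title or artist."""
    needle = keyword.lower()
    return [song for song in catalog
            if needle in song.title.lower() or needle in song.artist.lower()]


def mv_file_name(song, key):
    """Resolve a song and key to a (hypothetical) file name in the MV database."""
    if key not in song.keys:
        raise ValueError(f"{song.title!r} is not available in the {key} key")
    return f"{song.title}_{key}.mp4".replace(" ", "_").lower()


# Hypothetical catalog used only for this example.
catalog = [
    Song("Moon River", "A. Singer", "English", ("male", "female")),
    Song("Night Train", "B. Band", "Rock", ("male",)),
]
chosen = search_songs(catalog, "moon")[0]
print(mv_file_name(chosen, "female"))
```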
3.2. Instant Video Merging
For cloud servers to process real-time media merging, a real-time media merging server must be installed in the cloud server. When users sing through their mobile device, the voice signal is first uploaded into the cloud server for background noise elimination to reduce noise interferences, thereby attaining a crisp and clear sound. Upon receiving the processed signals, the real-time media merging server in the cloud server obtains the MV file of the selected song from the MV database to merge the video and audio signals in the streamed media file with the uploaded and processed audio stream. If users do not wish to merge their recordings with the videos in the MV database, they can select their personal videos (images or motion pictures) stored in the mobile device and upload them to the real-time media merging server in the cloud server while they sing. Figure 4 presents the real-time merging framework.
The merging process is summarized as follows.
(1) When users sing through their mobile device, the voice signal is first uploaded to the cloud server for background noise elimination, which is explained in Section 3.2.1.
(2) The system retrieves the MV media file of the song being sung from the MV database in the cloud server.
(3) When the real-time media merging server installed in the cloud server receives the processed (i.e., noise-eliminated) signal, the merging server begins to extract the MV media file from the database and buffer the signal of the media file of the user’s song.
(4) While the MV of the song in progress is being streamed, the system merges the buffered video and audio signals with the uploaded and processed audio stream.
(5) If users do not wish to merge their recordings with the videos in the MV database, they can select their personal videos (images or motion pictures) stored in the mobile device and upload them to the merging server in the cloud server, where merging is simultaneously conducted.
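The audio side of step (4) can be pictured as mixing the noise-reduced vocal samples into the buffered accompaniment track. The following is a simplified sketch, assuming 16-bit PCM samples held in Python lists and a known sample offset between the two streams; the function name, the gain parameter, and the sample values are hypothetical, and video handling is omitted.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(backing, vocal, vocal_gain=1.0, offset=0):
    """Add vocal samples onto the backing track starting at `offset`, clipping to 16 bits."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    mixed = list(backing)
    for i, sample in enumerate(vocal):
        j = offset + i
        if j >= len(mixed):
            break  # vocal runs past the end of the backing track
        value = mixed[j] + int(round(vocal_gain * sample))
        mixed[j] = max(INT16_MIN, min(INT16_MAX, value))  # avoid integer wrap-around
    return mixed


# Hypothetical buffered accompaniment and uploaded vocal samples.
backing = [1000, 2000, 3000, 4000]
vocal = [500, -500]
print(mix_pcm16(backing, vocal, vocal_gain=1.0, offset=1))
```

Clipping keeps the sum within the 16-bit range; a production mixer would more likely scale the tracks to leave headroom instead of relying on clipping.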
3.2.1. Noise Elimination and Signal Buffering
Background noise is often recorded when singing into mobile devices. This is in contrast to recordings by professional singers, who typically record and produce music in fully equipped recording studios that are designed to isolate external noise and capture the most faithful sound. Background environmental noise must therefore be attenuated when mobile devices are used to record singing. This study designed the karaoke system based on the methods outlined in Section 2.4, including the spectral subtraction method proposed by Boll and the spectral subtraction with oversubtraction method proposed by Berouti. These methods were used to process sound signals and eliminate noise. Figure 5 presents the workflow for noise elimination.
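Spectral subtraction needs an estimate of the noise level and of the SNR of each frame. One common front end, assumed here rather than taken from this study, splits the recording into frames, treats the first few frames (before the user starts singing) as noise only, and computes each frame's SNR in decibels. The function names, frame sizes, and sample values below are hypothetical.

```python
import math


def split_frames(samples, frame_len, hop):
    """Split a sample list into frames of frame_len samples, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def estimate_noise_power(frames, noise_frames):
    """Average power of the leading frames, assumed to contain noise only."""
    lead = frames[:noise_frames]
    return sum(frame_power(f) for f in lead) / len(lead)


def snr_db(frame, noise_power):
    """Frame power relative to the noise estimate, in decibels."""
    eps = 1e-12  # guards against log of zero for silent frames
    return 10.0 * math.log10(max(frame_power(frame), eps) / max(noise_power, eps))


# Hypothetical recording: quiet background first, then a much louder voiced part.
samples = [0.1, -0.1] * 4 + [1.0, -1.0] * 4
frames = split_frames(samples, frame_len=4, hop=4)
noise_power = estimate_noise_power(frames, noise_frames=2)
print([round(snr_db(f, noise_power), 1) for f in frames])
```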
3.3. Real-Time Media Sharing
Among the numerous services provided on mobile devices, entertainment services are extensively used and extremely popular. These services not only entertain users and serve as a means for users to release stress but also allow them to interact with others over the mobile Internet. The system designed in this study allows users to sing through their mobile device, records their singing, and instantly merges the recording with a media file. Users can then share the content of their merged file with others through online media streaming technology. Consequently, other users can watch the shared file through their mobile device or computer, thereby experiencing fun similar to that of singing at KTVs. Figure 6 presents the real-time media sharing framework.
When a song is merged with a media file, the system stores the merged file in the “Media Storage” section of the cloud server. Other users can then connect to the system cloud server and download this file to their mobile device or computer by using the media streaming download service provided in the “Media Storage” section. RTSP is a multimedia streaming protocol used to control audio and video streams. Additionally, it permits multiple streaming demand control, which not only reduces the network traffic at the cloud server end but also supports multiple rendering. Therefore, with regard to real-time media sharing, this study adopted the RTSP approach to perform online streaming, thereby allowing packets to be transmitted without interruption. Consequently, users can easily connect to the server and render the selected media content.
This process is outlined as follows.
(1) The system performs media merging as outlined in Section 3.2.
(2) The merged file is stored in the “Media Storage” section and transmitted for real-time sharing.
(3) Other users must connect to the “Media Storage” section in the system cloud server before they can watch the content of the merged media file.
(4) Once connected, real-time media sharing is implemented by reading the media content stored in the “Media Storage” section.
(5) Media streaming is used as the technology for transmitting online media.
(6) By using the Internet and adopting a media streaming approach, multiple users can simultaneously and instantly watch the media content.
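The steps above can be pictured with a small in-memory model in which a merged file is stored once and each connected viewer reads it as an independent sequence of chunks. This is only a sketch of the idea: the `MediaStorage` class, its method names, and the chunk size are hypothetical, and a real deployment would deliver the chunks over a streaming protocol rather than from a Python dictionary.

```python
class MediaStorage:
    """Toy stand-in for the "Media Storage" section of the cloud server."""

    def __init__(self):
        self._files = {}

    def store(self, media_id, data):
        """Keep an immutable copy of a merged media file."""
        self._files[media_id] = bytes(data)

    def stream(self, media_id, chunk_size=4):
        """Yield the file in order, one chunk at a time, for a single viewer."""
        data = self._files[media_id]
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]


storage = MediaStorage()
storage.store("mv1", b"merged-media-bytes")

# Two viewers connect at different times; each has its own position in the file.
viewer_a = storage.stream("mv1")
viewer_b = storage.stream("mv1")
print(next(viewer_a), next(viewer_a), next(viewer_b))
```

Because each call to `stream` creates a separate generator, one viewer's progress does not affect another's, which mirrors the requirement that multiple users can watch the same content simultaneously.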
4. System Analysis and Comparison
This study combined cloud computing and a mobile device to design a media sharing system; thus, users can sing wherever they are and instantly share their recorded songs with others by using their cloud-computing-integrated mobile device that is connected to the Internet. Furthermore, the recordings can be merged with MVs and self-developed videos, thereby allowing users to feel the pleasure of being a singer. This section outlines the analysis on the performance and features of the developed system and a comparison of the advantages of this system with traditional KTV.
4.1. Application in Mobile Devices
Because of the rapid development of mobile devices and their high rate of penetration, this study primarily employed a mobile device to operate the system. The main reasons and goals for using a mobile device are described as follows.
(1) Mobile devices are light and portable. Through years of development, such devices have acquired numerous functions and have become a crucial part of people’s lives.
(2) Entertainment-based programs available on mobile devices are widely used and extremely popular and have received substantial attention. The developed system not only provides users with an entertainment service and serves as a means for users to release stress but also enables users to interact with others on the Internet.
(3) Because of the convenience and portability of mobile devices, users of this system can simply install the application program to sing whenever and wherever they want. They can also share their media file (containing recordings of their singing) with others, thereby achieving the goal of using entertainment services on their mobile device.
4.2. Applying Cloud-Computing Technology
This study developed the karaoke system based on cloud-computing technology. The main reasons and goals for using this technology are described as follows.
(1) Cloud computing excels at computation and can divide large computing programs into numerous small subprograms that are processed on the Internet. Therefore, numerous application programs operate based on this technology.
(2) Cloud computing can compute and analyze large programs on behalf of the mobile device, which reduces the system load on users’ mobile devices and improves their responsiveness.
(3) The cloud server established at the cloud end provides several functions: storing MVs and media files (which contain users’ recorded singing merged with an MV), eliminating noise, merging real-time media, searching songs, and generating lists of recommended songs.
(4) By using cloud-computing technology, users can share media files with others on the Internet. Thus, regardless of the location, users can remotely experience fun and pleasure.
4.3. System Performance
In this study, a karaoke system integrated with real-time merging and sharing functions was designed. This system was developed by integrating cloud computing into a mobile device and operates by using real-time media merging and sharing technology. Through this approach, the use of entertainment services on mobile devices can be increased, which in turn makes singing more convenient. Thus, users can sing and share their singing with others. Compared with the traditional karaoke system, the system developed in this study is more suitable for users of the current era, in which digital information technology is extremely popular. Moreover, the traditional KTV market is approaching saturation. Therefore, the potential of this study’s system for future development is considerably high. In contrast to the online music platform KKBOX, this study’s system not only serves as a karaoke option but also possesses real-time media merging and sharing functions. The system also offers the same advantageous functions as KKBOX. Thus, if combined with KKBOX, the value of the system developed in this study can be further enhanced. Table 1 presents a comparison of the advantages of the traditional KTV and this study’s system.
5. Conclusion
This study investigated a method by which people’s leisure and entertainment activities and cloud-computing technology can be integrated into mobile devices, which are rapidly advancing and frequently used, thereby creating a new type of service. Subsequently, this study designed a karaoke system integrated with real-time merging and sharing functions, allowing users to enjoy a singing-related application service regardless of the time and location. This service was established based on the cloud-computing framework. The application program enables users to sing through their mobile device and upload the recorded singing to the cloud server, where background noises within the recording are eliminated. Additionally, this program provides a service whereby users’ singing can be merged with MV media, thus permitting users to possess personal MVs, which can then be shared on the Internet for others to watch. This study provides the following contributions.
(1) Users can attain entertainment goals without being limited by time and location, provided that they have their mobile phone and Internet access. Through their cloud-computing-integrated mobile devices, they can sing and share their creations with other users.
(2) Audio and video media can be merged and immediately shared.
(3) Users can merge their song recordings with an MV or with a self-developed video to create personal MVs.
(4) Consequently, through their mobile devices, people can sing heartily without having to visit KTVs, experience the enjoyment of friends and relatives’ gatherings without having to attend, and feel the pleasure of being a singer.
The current market for traditional KTV is approaching saturation. The system designed and developed in this study can help expand this market and is more suitable for users of the current era, in which digital information technology is extremely popular. Furthermore, the system facilitates the transformation of the current lifestyle into a mobile-based entertainment lifestyle. Therefore, in addition to visiting KTV establishments, people who wish to sing are also provided with a more convenient and immediate alternative. Overall, the availability of the system developed in this study can ultimately reduce the infrastructure and maintenance costs of karaoke services, which would in turn decrease manpower demands and costs.
Acknowledgments
This work is partially supported by the National Science Council of Taiwan under the Grants NSC 101-2221-E-218-052, NSC 102-2221-E-218-017, and NSC 100-2632-E-218-001-MY3.
References
H. Chih-Kai, Dynamic adjustment mechanism of the virtual machine computing resource in the cloud computing [M.S. thesis], 2010.
D. Xiao-dan, H. Qing, L. Yong-hong, and Y. Hong, “The system construction and the implementation of QOS control mechanism in intelligent streaming media,” International Conference on Solid State Devices and Materials Science, vol. 25, pp. 808–813, 2012.
G. Sebestyen, A. Hangan, K. Sebestyen, and R. Vachter, “Self-tuning multimedia streaming system on cloud infrastructure,” in International Conference on Computational Science, pp. 1342–1351, 2013.
J. Chen, R.-M. Wang, L. Li, Z.-H. Zhang, and X.-S. Dong, “A distributed dynamic super peer selection method based on evolutionary game for heterogeneous P2P streaming systems,” Mathematical Problems in Engineering, vol. 2013, Article ID 830786, 9 pages, 2013.
L. L. Peterson and B. S. Davie, Computer Networks: A Systems Approach, Morgan Kaufmann, 4th edition, 2007.
RFC 2326, Real Time Streaming Protocol (RTSP), IETF, 1998.
S. F. Boll, “A spectral subtraction algorithm for suppression of acoustic noise in speech,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 200–203, 1979.