Sunday, November 17, 2013

Business Value Proposition of Speech Recognition



Software engineers have been working on voice recognition for over 40 years. For most of that time it belonged to science fiction, but has its time finally arrived?

The human ear is designed to hear and analyze. We can distinguish a single voice out of many, even in a noisy environment, and reproducing that ability is one of the main design challenges for speech recognition. Computers have always struggled to understand human voices and to work out how the syntactic structure of a sentence gives it meaning. Since the advent of the first speech recognition software in the 1970s, the industry has come a long way, to the point where we now have digital assistants with a personality, such as Siri.

Speech recognition is actively deployed in commonplace products such as cars, TVs, and phones. In most of today's new car models, users speak to the console system to play music or search for a place on a map. Manufacturers using this technology include, but are not limited to, Ford, GM, Mercedes, and BMW.

Voice recognition has been integrated into operating systems including iOS, Android, Windows, and Mac OS, and is also built into Google's search engine, where users can speak the search query. Nuance and Google are the dominant players in the voice recognition field. The technology these big companies developed for their own products is now being implemented in mainstream areas.

We outline a few examples that, tied to their business propositions, show how intuitive the technology has become and how it can propel the success of a company.

Contact centers have been using voice recognition systems for over a decade now, resulting in greater customer satisfaction and fewer dropped calls. Speech technology helps customers reach the right person for their query at the right time instead of waiting a long time on the phone.

Smart TVs are also exploring new domains by taking the remote away from the user. The user simply speaks commands in natural language, and the system recognizes the request and performs the necessary operation: changing channels, lowering the volume, or playing games.

The defense branches of many countries now have user interfaces that recognize full sentences and operate accordingly in combat situations. The industry as a whole has come a long way, yet the software available today still leaves room for improvement. The remaining challenges include accent recognition, filtering useful information from noise, distortion, and the speed with which results are delivered.




Wednesday, November 13, 2013

Web Speech API - A Chrome experiment

Here’s an interesting experiment for our followers.

Peanut Gallery is a Google Chrome experiment that shows off voice recognition by letting you title silent film clips with your voice.
It uses your computer's microphone and the Web Speech API to turn speech into text.

How does it work?

You need Chrome on your computer to visit the following website address: PeanutGalleryFilms.com.
Once on the website, select a black and white movie. As it plays, any words you vocalize will appear as if the characters were saying them.
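
For readers curious about the speech-to-text step, here is a minimal TypeScript sketch of how a page might use the Web Speech API to turn spoken words into caption text. It is not Peanut Gallery's actual code. The helper names captionFrom and startCaptioning are hypothetical, and the small interfaces are declared locally as an assumption, since not every TypeScript setup ships types for this API. Chrome exposes the recognizer under the prefixed name webkitSpeechRecognition.

```typescript
// Minimal local shapes for the parts of the Web Speech API used below.
// Declared here as an assumption so the sketch does not depend on DOM typings.
interface Alternative { transcript: string; confidence: number; }
interface RecognitionResult { isFinal: boolean; length: number; [index: number]: Alternative; }
interface ResultList { length: number; [index: number]: RecognitionResult; }
interface RecognitionEvent { resultIndex: number; results: ResultList; }
interface Recognizer {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((event: RecognitionEvent) => void) | null;
  start(): void;
}

// Hypothetical helper: join the top alternative of every new, final result
// into a single caption string.
function captionFrom(event: RecognitionEvent): string {
  const parts: string[] = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      parts.push(result[0].transcript.trim());
    }
  }
  return parts.join(" ");
}

// Hypothetical helper: start listening on the microphone and pass each caption
// to a callback. Returns null when the browser has no speech recognizer.
function startCaptioning(onCaption: (text: string) => void): Recognizer | null {
  const g = globalThis as any;
  const Ctor = g.SpeechRecognition ?? g.webkitSpeechRecognition;
  if (!Ctor) {
    return null;
  }
  const recognizer: Recognizer = new Ctor();
  recognizer.continuous = true;      // keep listening while the clip plays
  recognizer.interimResults = false; // only report finished phrases
  recognizer.lang = "en-US";         // assumed language for this sketch
  recognizer.onresult = (event) => {
    const text = captionFrom(event);
    if (text) {
      onCaption(text);
    }
  };
  recognizer.start();
  return recognizer;
}
```

A page could call startCaptioning with a callback that writes the text onto a title card, and fall back to a "please use Chrome" message when it returns null.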

Discover a fun tool, share it with your friends and become familiar with talking out loud to your computer.

Watch a demonstration of the Peanut Gallery:
http://www.youtube.com/watch?v=hd8HzLCIstE

Tuesday, November 5, 2013

Siri — Your Wish is its Command


The Big Bang Theory - Meeting Siri

Siri has revolutionized the way users interact with iDevices in their daily lives. The name, Norwegian for "beautiful woman who leads you to victory," carries an iconic message. If victory means having a personal assistant, then Siri is up for the task. Its commands include, but are not limited to, the following catch phrases:

  • Siri, wake me up. - Sets your alarm.
  • Siri, remind me to do something every day. - Writes in your reminders.
  • Siri, do I need a sweater today? - Checks the forecast.
  • Siri, find me some coffee. - Searches business establishments near you.
  • Siri, send a text to… - Will text on dictation.
  • Siri, how do I get to downtown . . . ? - Searches map for directions.
  • Siri, play me some Avicii. - Interfaces with your iPod.
  • Siri, remind me when I leave here to pick up some bread. - Alerts based on location.
  • Siri, solve x + y . . . - Your dog didn't eat your homework.
  • Siri, send a Tweet . . . - Integrates with social media.

These are only some of the many options and the level of interactivity Siri offers. Below we share a recorded personal experience with Siri that shows its ease of use relative to the rest of the speech recognition industry. Siri was originally an independent app and company. Apple purchased it in spring 2010 and introduced its integrated features with the release of the iPhone 4S. Siri's speech recognition engine is powered by technology from Nuance, the topic of one of our earlier blog posts and a leader in voice recognition algorithms. The power Siri wields lies in its implementation of artificial intelligence coupled with database interfacing.


Forbes magazine has suggested that the seamless integration with database queries will revolutionize both the advertising market and the direct competition between Apple, Google, and Microsoft. With Apple's acquisition of Siri, Inc., all development for the Android and BlackBerry platforms ceased, making Siri an exclusive Apple commodity. Successful ports to the older iPhone 3GS and iPhone 4 generations have been made on jailbroken devices. However, these ports typically require more tech-savvy users and come with limited support options.




To contrast with Siri's advanced voice recognition capabilities, we include a video demonstrating a BlackBerry's voice recognition prompts and the difficulty it has recognizing a simple call command. Siri's advanced artificial intelligence, coupled with advanced voice recognition technology, makes the interface approachable and intuitive. This ease of use is a game changer for Apple and the smartphone industry as a whole.






There is now greater competition than ever within the smartphone industry with the advent of voice recognition tech. Apple has systematically selected the databases Siri queries and the backend services that search for the information a user requests. Recently Apple opted for Bing (Microsoft), a direct competitor to Google, as the default search engine, and for OpenTable as a restaurant reservation link. We note that Siri has brought with it head-to-head competition with Google and its advertising services. Siri is now purposefully bypassing Google as a search option and using other applications and resources to fulfill users' information needs.


How far Siri can go is yet to be seen. There is still a share of users who own Siri-enabled iDevices (Siri is also available for iPod Touch and iPad) yet fail to exploit the features to their full potential. Nevertheless, the ubiquitous nature and spunky personality programmed into Siri will keep it penetrating further into the smartphone market. Many languages are now supported, including English, French, German, Spanish, Japanese, Italian, Korean, Mandarin, and Cantonese. Apple claims that Siri will learn a user's specific accent and way of speaking the more the user interacts with their iDevice. Personification and customizability are high-priority targets in a fast-evolving game where the rest of the big companies seem to be playing catch-up to Siri.

Lastly, see how Raj from The Big Bang Theory dates Siri.




Friday, November 1, 2013

Breaking Language Borders - The Future and Beyond


Speech recognition technology has accomplished many breakthroughs in the past five years. Implementations of the technology seem to expand exponentially, revolutionizing the way we go about many of our daily activities. One of the most iconic uses, which many would say started the buzz, is speech recognition in translation and inter-language communication.


“One day in the not too distant future, you might ask about your dinner options in a Parisian restaurant, give detailed directions to a taxi driver in Moscow, or discuss a business deal with potential partners in Tokyo—fluently, in your own voice, without knowing a word of French, Russian, or Japanese. Your tablet or smart phone will do the heavy lifting of understanding what you’re saying in English, translating it into your listeners’ tongue, and speaking it in your voice with the pronunciation, tones, and inflections of a native speaker.”

Speech translation combines three distinct technologies that, when implemented in a single application, create the ‘magic of speech translation’. The three subcomponents are speech recognition, language translation, and speech synthesis.
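
To make the three-stage structure concrete, here is a minimal TypeScript sketch of how the stages might be chained. Everything in it is an assumption for illustration: SpeechStages and translateSpeech are hypothetical names, the stage signatures are invented, and no real recognition, translation, or synthesis engine is called. A production system would also run these stages asynchronously.

```typescript
// Hypothetical stage signatures. "Audio" is left generic because the sketch
// does not assume any particular audio format.
interface SpeechStages<Audio> {
  // Speech recognition: spoken audio in the source language -> text.
  recognize(audio: Audio, lang: string): string;
  // Language translation: text in one language -> text in another.
  translate(text: string, from: string, to: string): string;
  // Speech synthesis: text in the target language -> spoken audio.
  synthesize(text: string, lang: string): Audio;
}

// Chain the three stages in order: recognize, then translate, then synthesize.
function translateSpeech<Audio>(
  stages: SpeechStages<Audio>,
  audio: Audio,
  from: string,
  to: string
): Audio {
  const heard = stages.recognize(audio, from);
  const translated = stages.translate(heard, from, to);
  return stages.synthesize(translated, to);
}
```

Keeping each stage behind its own function means any one of them can be swapped for a better engine without touching the other two.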

“As compelling as the promise of early translation software was, the results were almost always underwhelming”