Monday, October 28, 2013

Get me out of here





Journeys and maps.

These two have gone hand in hand for centuries, if not millennia. Pre-journey planning was essential for choosing the best route and saving time on the way.

All this was revolutionized in recent decades, when the U.S. Department of Defense developed GPS and paved the way for satellite tracking and automatic route calculation.¹


Generation Y is therefore the first generation able to travel without spending a significant amount of time on planning before setting off.
- From now on, it's more convenient to travel -

Since 1975, the pioneering Dragon speech recognition software has continuously improved at understanding the human voice and its meaning.²

The integration of voice recognition and GPS services took modern journey planning to the next level.
- From now on, it’s safer to travel -

Voice Command Systems in cars are the realization of many childhood fantasies. They allow not only trip planning but also the vehicle's other multimedia and infotainment functions to be controlled with spoken commands. Ford Sync, Lexus Voice Command, Chrysler UConnect and GM IntelliLink are just a few examples.

A major risk to be avoided is the distraction the driver faces when using his or her hands to operate the car’s dashboard. Because drivers can simply tell the system where they want to go, who they want to call and what they want to listen to, the need for manual control decreases drastically.

The ongoing development of speech recognition software, such as that from Nuance, will enable drivers to make natural-language requests in their vehicles. Whereas older systems supported 50 to 60 voice commands, modern systems support up to 10,000. These include functions such as booking a table at your favorite restaurant.³
The continuous development of these integrated voice control functions extends the car’s functionality beyond that of a mere vehicle. Slowly but surely, the fiction of the popular 80’s TV series "Knight Rider", in which spoken interaction between driver and car led to real teamwork, is about to become reality.

However, the increased scope of speech recognition in cars is raising expectations and requirements. What started as voice-controlled navigation and route planning is currently evolving into onboard personal assistance.
For this, support for 10,000 voice commands does not seem to be sufficient, and the artificial intelligence of Voice Command Systems needs to be adapted to various languages, pronunciations and cultures.³


It can certainly be said that Voice Command Systems in cars have already revolutionized the interaction between driver and vehicle. Still, there are endless possibilities that will give a vehicle not only strong functionality but also a character.
- From now on, it will be more than just driving a car -


___________________________________________________________________________

¹ National Research Council (U.S.). Committee on the Future of the Global Positioning System
² http://eandt.theiet.org/magazine/2013/07/inner-voice.cfm
³ http://reviews.cnet.com/8301-13746_7-57321094-48/siri-like-voice-recognition-coming-to-cars/

Thursday, October 24, 2013

The Application of Speech Recognition in Education


Although speech recognition (also known as "Automatic Speech Recognition" or "ASR") was born more than 60 years ago, only in recent years has the technology boomed. Its applications can now be seen in almost every industry, and its application in education is one worth mentioning.

Both Apple and Microsoft have integrated speech recognition technology into their operating systems (as shown below), so users can control the computer and its applications by voice[1]: they can speak certain phrases, or “spoken commands,” to make the computer take actions such as opening documents or switching applications[2]. This built-in technology can therefore be used by teachers, students or staff in an educational institution.


[Images: Mac OS X and Windows]

 
Apart from these two OS providers, companies such as Nuance (www.nuance.com) in the USA and iflytek (www.iflytek.com) in China also offer speech recognition solutions for education. Their target customers fall into two groups: the educational institution itself (including teachers and staff) and students.
 
For educational institutions, one important yet tedious task is conducting oral tests. There are a large number of examinees, and traditionally the examiners either follow a one-to-one approach or record test-takers’ words for later analysis; both are time-consuming and costly and may involve subjective judgment. Evaluation systems based on speech recognition technology can now automatically analyze test-takers’ words against a combined linguistic and statistical standard and output test reports. Iflytek’s Multilingual Intelligent Speech Evaluation System is one case in point: the system automatically grades examinees, pinpoints the errors and flaws in the user's speech and gives suggestions for improvement[3]. In this way, educational institutions can improve efficiency and objectivity and lower the cost of examinations.
 
Another challenge for educational institutions is improving the efficiency and fluency of communication while maintaining or even reducing operating costs. Communication here has two dimensions: communication among staff and communication with people outside the institution. Much of it takes place by telephone, since a call is more direct than email or fax. This raises a question: since each staff member or department has its own telephone number, how does a caller reach the right one? A telephone directory, online or on paper, is one solution, but would it not be better to have telephone attendants who specialize in connecting two parties? New posts mean new costs, however; a qualified attendant can be expensive and cannot be in the office 24 hours a day. In response, some institutions have adopted an “auto attendant” that answers calls automatically, but this sacrifices caller satisfaction because it forces callers to listen to menus and make touch-tone selections.[4] Nuance Employee Productivity Suite (EPS) is a much better choice here: it is a voice-driven auto attendant and directory services solution that offers callers fast and efficient access to other people, places, and information resources from any telephone device at any time using simple, natural voice commands.[5]
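To make the idea of a voice-driven directory more concrete, here is a minimal sketch in Python. It is not how Nuance EPS works internally; it simply assumes that a speech recognizer has already turned the caller's words into text, and then looks up the closest entry in a directory. The directory entries, the extension numbers and the function name route_call are all invented for illustration.

```python
import difflib

# Hypothetical directory: the names and extensions are invented for illustration.
DIRECTORY = {
    "admissions office": "1001",
    "library": "1002",
    "professor smith": "1003",
    "it help desk": "1004",
}


def route_call(transcript, directory=DIRECTORY, cutoff=0.6):
    """Return (entry, extension) for the closest directory entry, or None.

    `transcript` is assumed to be the text a speech recognizer produced
    from what the caller said; recognition itself is not shown here.
    """
    spoken = transcript.lower().strip()
    # Fuzzy matching tolerates small recognition errors such as "libary".
    matches = difflib.get_close_matches(spoken, list(directory), n=1, cutoff=cutoff)
    if not matches:
        return None  # no confident match: hand over to a human attendant
    entry = matches[0]
    return entry, directory[entry]
```

Returning None when nothing matches well enough leaves room for a fallback to a human attendant, so the caller is not trapped in the automated system.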
 
For students, more and more homework is done on computers, and much of it involves entering and editing text. However, students may not know that they can “type” text in a much more efficient and creative way: by using speech recognition. Software like Nuance Dragon 12 meets this need by allowing students to edit and format documents by voice.[6]
 



From the discussion above, the overall benefits of this technology are evident: improved efficiency, objectivity and satisfaction, and reduced cost and time. However, there are still some issues to consider. Investment may be the first problem people encounter when trying to apply these solutions in practice. What’s more, accuracy is still one of customers' biggest concerns: even though accuracy is claimed to be more than 99%, customers will still encounter errors due to factors such as background noise, accent and the quality of the input device (microphone).
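An accuracy figure only means something once we know how it is measured. A common measure in speech recognition is the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference text, divided by the number of words in the reference. The sketch below computes it in Python; treating "accuracy" as 1 minus WER is an assumption here, since the vendors' own definition is not given in the sources.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

For example, if a student dictates "open the history document" and the software hears "open the mystery document", one word in four is wrong, which gives a WER of 0.25. Under the assumption above, 99% accuracy would still mean roughly one wrong word in every hundred.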
 
To sum up, though speech recognition technology is growing in the educational field, operating system and software manufacturers still have a long way to go: they need to meet diverse customers’ needs by developing software and applications that are more affordable and more accurate in “listening”, “understanding” and “reacting”.





Works Cited:

[1][2] "Getting Started: Apple Technology for Diverse Learners." Apple Inc., n.d. Web. 24 Oct. 2013. <http://www.apple.com/education/docs/L360989C-US_L360989C_DiverseLearners_ff_acc.pdf>.

[3] "Multilingual Intelligent Speech Evaluation System_Education Product_Anhui USTC IFLYTEK Co., Ltd." Anhui USTC IFLYTEK Co., Ltd. N.p., n.d. Web. 24 Oct. 2013. <http://www.iflytek.com/en/content/details_10_1680.html>.

[4][5] "Experience the Difference Speech Makes…Speech-Enabled Employee Directories." Nuance Communications, Inc., 2007. Web. 24 Oct. 2013. <http://www.nuance.com/ucmprod/groups/corporate/@web/documents/collateral/nd_004545.pdf>.

[6] "Dragon for Education." Dragon Education Solutions. N.p., n.d. Web. 24 Oct. 2013. <http://www.nuance.com/for-business/by-industry/education/dragon-education-solutions/index.htm>.



Monday, October 21, 2013

Flying Solo

Direct voice input on aircraft





Speech recognition is becoming an essential tool in everyday life. Voice innovation is redefining the ways we interact with machines. Gone are the days when you needed codes to access an information system. Today we say the command “Open” to open a door and, like “Abracadabra”, it’s open.

Direct Voice Input (DVI) was developed to automate in-flight communication systems so that pilots can focus on more important information. DVI was implemented in the early part of the 21st century.

Direct voice input is a speech recognition system employed in military aircraft such as the Eurofighter Typhoon. It is a style of human–machine interaction in which the user issues instructions to the aircraft by voice. It addresses a real problem: the growth in aircraft capabilities and functions has dramatically increased pilot workload.

The goal of this interaction is to let the user control the machine efficiently, while feedback from the machine aids the operator in making operational decisions. Examples of this broad concept of user interfaces include the interactive aspects of computer operating systems, hand tools, heavy machines and aircraft.
DVI has been in service on the Eurofighter Typhoon since 2005. Beyond enhancing the aircraft’s current capabilities, it has further potential for growth.

The technical process consists of a real-time comparison between the incoming audio signal (the pilot's voice) and stored speech models. DVI is a great example of a speech recognition application. Nonetheless, there are challenges, including variation in the pilot’s speech style or accent and background noise in the cockpit, such as engine noise.
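The comparison step can be illustrated with a small Python sketch. It uses dynamic time warping (DTW), a classic template-matching technique that tolerates the same command being spoken faster or slower. This is a stand-in chosen for illustration, not a description of what the Eurofighter system actually does; the command names and the numbers in TEMPLATES are invented, and each sequence stands for one feature value per audio frame.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i values of a with the first j of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical stored speech models: one short feature sequence per command.
TEMPLATES = {
    "show map": [1.0, 3.0, 4.0, 3.0, 1.0],
    "select radio": [4.0, 3.0, 1.0, 1.0, 3.0],
}


def recognize(features, templates=TEMPLATES):
    """Return the name of the stored model closest to the incoming features."""
    return min(templates, key=lambda name: dtw_distance(features, templates[name]))
```

Calling recognize([1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0]) picks "show map", because the input is just a slower version of that template. Real systems compare far richer features against statistical models and must cope with the cockpit noise mentioned above, which this sketch ignores.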


Direct Voice Input does not compromise flight safety, but actually enhances it. The pilot's work environment is becoming more efficient and manageable thanks to the optimization of DVI. DVI has opened a new frontier in automated flight, delivering better performance when the number of tasks increases and maneuvering the aircraft requires both of the pilot’s hands.

Thursday, October 17, 2013

Nuance Communications


Nuance Recognizer for Contact Centers



Voice-activated technology is improving because of powerful processors, advancements in natural language processing, and improved algorithms for recognizing voice. Nuance Communications is an innovative US-based company that provides customer support services; its motto is “Ease of use through Speech Recognition.”
Nuance's software is among the highest grade, with the best recognition accuracy, encouraging natural, human-like conversations. Its Natural Language Processing (NLP) relies on Hidden Markov Models (HMMs), statistical models that sift through the data hidden in speech analysis graphs. The HMM results are then processed by further mathematical models that yield a specific command.
Nuance specializes in the following products:
  1. Automatic Speech Recognition (ASR) for contact center automation – ASR can analyze 79 different languages and encourages natural human interaction. Benefits of using ASR include but are not limited to: cost savings, funneling calls that promote business, filtering unwanted calls and selective processing of information, thereby strengthening the overall business proposition.
  2. Dragon Speech Recognition for Personal Use – Dragon Speech Recognition enables daily processes and tasks to be automated. These include speech-to-text dictation in Word documents and accessibility functions that help disabled users interact better with computers.
  3. Clintegrity – Hospitals have extensive Customer Relationship Management (CRM) systems that organize and update data for many tasks. Many of these tasks require manual entry, which is time-consuming. Clintegrity provides mechanisms by which the software analyses speech in order to input entries or access records for doctors to examine. This in turn helps the hospital focus on more important things: serving the patient in a timely and orderly manner.
Nuance has set benchmarks for the speech recognition industry, yet new methods are still prone to analysis errors. The HMM approach works on probability and statistics, which provide approximations and at times cannot decipher the way humans interact. There is still room for voice recognition software to be adapted to the industry as a whole and customized to specific company requirements.
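To show what "working on probability and statistics" looks like in practice, here is a toy Hidden Markov Model in Python together with the Viterbi algorithm, which finds the most likely sequence of hidden states behind a series of observations. It is only a sketch and says nothing about Nuance's actual models: the two states, the two observation symbols and every probability below are invented for illustration.

```python
# Toy model (all values invented): each audio frame is either "quiet" or
# "loud", and the hidden state says whether the caller is silent or speaking.
STATES = ("silence", "speech")
START_P = {"silence": 0.8, "speech": 0.2}
TRANS_P = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
EMIT_P = {
    "silence": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.3, "loud": 0.7},
}


def viterbi(observations, states=STATES, start_p=START_P,
            trans_p=TRANS_P, emit_p=EMIT_P):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] = (probability, path) of the best path so far that ends in state s
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor state that makes reaching s most likely.
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])
```

For the frames quiet, loud, loud, quiet the most likely path is silence, speech, speech, speech. The final quiet frame is still labeled as speech because the model considers a short pause inside speech more probable than an abrupt stop. That is the kind of statistical approximation described above: usually sensible, but not guaranteed to match what the speaker actually did.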


Sunday, October 13, 2013

CALL CENTERS - "HOW MAY I HELP YOU?"



In 1952, Bell Labs investigators presented a basic system that recognized numbers spoken over a telephone: “Speech Recognition” was born. Nowadays, after significant technological progress, these systems are able to deal with a wide range of accents and languages.[1]
A “Speech recognition” system recognizes words in natural speech and converts them into a machine-readable format. Call centers use “Speech recognition” software to handle incoming customer calls. It is essential, however, to distinguish between "speech recognition" and "voice recognition": "voice recognition" identifies a particular person's voice.[2]
What mainly attracted companies was its cost efficiency: deploying these machines reduces expenditure on staffing and maintenance. Outsourcing to a BPO company already lowers the business expenses related to maintaining employees, but if the company uses an automated answering system, operational costs can be reduced even further. Such a system helps corporations not only reduce costs but also automate the handling of a high percentage of incoming customer calls. Some businesses combine speech recognition with Interactive Voice Response (IVR) to improve service quality. Because automated systems are available even when call center agents are not, effectiveness and productivity can be improved, particularly for sales and collections companies. For instance, the company One Telecom uses speech recognition technology to improve customer service in a mixed live-agent and self-service setting. The system strengthened the company's database of frequently asked questions (FAQs) and now routes callers to 1 of 35 self-service units depending on the needs the speech recognition software identifies during the call.[2]
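Routing a caller to a self-service unit can be pictured with a short Python sketch. It assumes the speech recognizer has already produced a text transcript of the caller's request and then picks the unit whose keywords match best. This is a deliberately simple stand-in, not One Telecom's system: the three units, their keywords and the function name route_caller are invented for illustration.

```python
import re

# Hypothetical self-service units and trigger keywords (invented for illustration).
ROUTES = {
    "billing": {"bill", "invoice", "payment", "charge"},
    "technical_support": {"internet", "connection", "outage", "router"},
    "account": {"password", "address", "login"},
}
FALLBACK = "live_agent"


def route_caller(transcript, routes=ROUTES, fallback=FALLBACK):
    """Pick the unit whose keywords overlap most with the recognized words."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    scores = {unit: len(words & keywords) for unit, keywords in routes.items()}
    best = max(scores, key=scores.get)
    # With no keyword hit at all, hand the call to a human instead of guessing.
    return best if scores[best] > 0 else fallback
```

A caller who says "My internet connection is down" lands in technical support, while a request that matches no keyword goes to a live agent, which reflects the mixed live-agent and self-service setting described above.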
However, “Speech recognition” also has drawbacks. Background noise and the systems' limited ability to recognize accents and colloquial speech reduce its effectiveness, although advances are still being made, for example with the Dragon software. Even though speech recognition requires more effort, the potential earnings are enormous if it is implemented effectively, according to Steve Rutledge, vice president of product marketing at Genesys Telecommunications Laboratories.[3] Still, the main reason automated answering systems fall short is their inability to handle precise and complicated inquiries or problems. Many problems can only be handled by live agents, because business customers or prospective clients need sophisticated and specific clarification. Under such conditions, speech recognition software by itself is not sufficient.[4]

To conclude, businesses may lower operating costs drastically by reducing staff and maintenance. Nevertheless, delivering ineffective customer service by failing to respond to specific customer needs could damage the corporation’s image significantly. Speech recognition is therefore a valuable tool, but it still needs to make significant progress to be as effective as firms need it to be.


[1] Borzo, J. (2007). Now You're Talking. Available: Money CNN, http://money.cnn.com/magazines/business2/business2_archive/2007/02/01/8398978/. Last accessed 10th Oct 2013.
[2] Search CRM. (2009). Leveraging speech recognition technology in call centers. Available: Search CRM, http://searchcrm.techtarget.com/report/Leveraging-speech-recognition-technology-in-call-centers. Last accessed 9th Oct 2013.
[3] Bailor, C. (2005). Avoiding the Speech Rec. Wreck. Available: destinationCRM.com, http://www.destinationcrm.com/Articles/Editorial/Magazine-Features/Avoiding-the-Speech-Rec.-Wreck-42727.aspx. Last accessed 9th Oct 2013.
[4] Johnson, A. (2012). Pros And Cons Of Automated Answering Service By BPO Manila. Available: Fusion Blog, http://www.fusionbposervices.com/blog/answering-service-by-bpo-manila.html. Last accessed 10th Oct 2013.

Monday, October 7, 2013

The Voice in the Machine



By Ahmed Ashraf

My friend's the best,
With him around I feel so blessed.
He will do as I ask,
Helping me complete my task.
He helps with my chores and work,
He's as precise as a legal clerk.
He can write down all I say,
He even helps plan my day.

Sometimes he makes me scream,
Making my ears burst with steam.
He makes these silly mistakes,
Which later on cause terrible headaches.
Many times we’ve said enough,
But without him it was simply too tough.

The world is fast evolving,
And having my friend helps me with problem solving,
Sometimes we're lost in translation,
But once that is overcome it's a great sensation.

My friend has a distinct voice,
He can appear on any gadget of choice,
He's half human half machine,
Have you figured who I mean?