Voice Recognition

MEMOBASE ARCHITECTURE
SCIENCE
NATURE
DECODING
MemoBase is scalable: We use the same system for projects large and small. Even the physical installation of the system, which may span several servers, can serve any number of projects and any number of customers. Each project has its own data dictionary, its own database and its own rules. These are the heart of every MemoBase project.
CONTACT
MemoBase
Ringstedvej 65, 4600 Køge
Telephone: +45 56 64 04 10
Denmark
Mogens Esbech
[email protected]
+45 40 33 76 71
Gert Troensegaard-Petersen
[email protected]
+45 21 48 93 15
Norway
Atle Thorstensen
[email protected]
+47 91 51 77 99
THE IDEA
Humans have always been surrounded by sounds, including the ones we generate ourselves. The earliest human innovation, after all, was language and grammar, providing a competitive advantage over every other species on Earth. Even now, transforming sounds into information is a powerful tool not only for surviving in the wild, but for maximizing productivity now that we have escaped it. These days, the technical aspects of sounds (frequencies, strength and transmission time) are well understood. The human brain uses a considerable amount of computing power just to interpret a single word. That's because every word we hear comprises a vast amount of data that must be deconstructed into a series of frequencies and time, then reconstructed once again as a sound, making up our final perception.
Every human being's speech is unique, just as every human being has a unique fingerprint, retinal pattern, facial expression, gait and other biometric properties. We produce much more data with our spoken words, however, than with our other biometric characteristics. This, combined with the cheap and fast audio processing widely used in PCs, tablets and smartphones, explains why speech recognition is the best candidate for providing unique commands for computers and, eventually, for electro-mechanical motors and robots.
Your voice is a tool that is always with you, and always hands-free. Speech is unique to each individual, and each spoken word can be matched with a specific action or intention. This means that decoding audio, if done correctly, could replace passwords and commands in applications. If an object is equipped with sensors, relays or electro-motors, you could 'talk' with your car, your house, your boat or your desk. That means getting things done without being physically present: all you need is a smartphone and a connection to the Internet.
We're all familiar with the potential of this technology. But what is the path to reality? Smartphones already have the capability (microphones and speakers, lots of computing power) to analyze and recognize speech patterns. The only remaining requirement is 100% recognition of the individual and 100% exclusion of all other individuals. This is harder than it sounds.
Speech mingles with other sounds in the environment; not every speech command is given in a recording studio or the departure lounge of an airport. Humans catch colds, they get laryngitis and, sometimes, they just get a lump in their throat.
Thus, the software must be flexible enough to detect variations in one individual, but not so flexible that it confuses one individual for another. It must also keep track of all the speech it is hearing, and assign each individual different rights and responses. One command (switch off the light!) may for one individual mean a tiny kitchen light, while for another it may mean every light in an entire town. It's not just that the computer must be able to hear. It must be able to understand.
MemoBase has developed speech recognition software that solves this problem. The software is written in the programming language Java and can be integrated into a PC, tablet or smartphone either as a background application or as a front-end device or cloud application. The software is, as of January 1, 2015, available for the Chrome browser in Android OS for smartphones and tablets and on Windows OS for PC. Throughout 2015, it will be expanded to more browsers and platforms.
ACCESS CONTROL
Throughout the Western world, at every level of society, from individuals to business executives to government leaders, access control has emerged as a major interest in just the last year. It wasn't just the allegations against the NSA for spying on its own citizens. In Canada, gaps in access control in public and private buildings revealed vulnerabilities to terrorist attacks. The Sony hacks spread terabytes of information across the internet. In Iraq, problems with access control allowed ISIL to take control of American tanks.
The Charlie Hebdo attacks were partly facilitated by a PIN code lock at the front door. The unsolved disappearance of the Malaysian aircraft in the Indian Ocean means authorities do not know who was behind the aircraft's controls as it changed its route radically mid-flight. We do not yet know which vulnerability 2015 will reveal, but we can be sure that the attention paid to this issue will continue to increase.
In all of the above cases, access control based on individual speech recognition may have been decisive in preventing disaster. But more important than one single change is increasing the number of access controls at all stages of business processes. This is similar to what banks have done to reduce the number of robberies, an initiative that has been overwhelmingly successful.
Most high-security entities in the public and private sector use a number of applications to prevent break-ins, each with their own login function. Each of these applications has a 'time out' feature that requires a new login if the application is idle for a set period of time. In hospitals, for example, time studies have shown that doctors spend up to 1 hour a day logging in to the systems they use to carry out their work. Individual speech recognition, by constantly verifying the input of a single individual, will simultaneously reduce waste and increase security.
CONTROLS
The use of speech to control access and perform commands is a major breakthrough in automation, in both the private and the public sector. In the early years, the focus was on autonomous machinery and equipment, powered by bespoke platforms that weren't linked across suppliers, products or customer needs. In recent years, ISO standards, especially in communications between units and across geography, have begun to be applied. EnOcean technology, an energy-harvesting wireless standard (ISO/IEC 14543-3-10) that allowed for more advanced communications, has become an international standard for communication between sensors, relays and actuators.
Now, the Internet, mobile networks, Wi-Fi, cloud technology and EnOcean technology make it possible to bind all types of devices together and establish an overall control system. This establishes a universal logic that can be embedded in all autonomous machinery and equipment for all products, consumers and needs.
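As an illustration of such a shared logic, the following is a minimal sketch of how one spoken command could resolve to different devices depending on who speaks it. It assumes the speaker has already been identified by voice recognition; the class, method and device names are hypothetical and are not taken from the MemoBase software.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: per-speaker rights for spoken commands.
final class CommandRegistry {
    // speaker id -> (normalized command phrase -> devices that speaker may control)
    private final Map<String, Map<String, List<String>>> rights = new HashMap<>();

    // Grant a speaker the right to apply a command to a set of devices.
    void grant(String speaker, String command, List<String> devices) {
        rights.computeIfAbsent(speaker, s -> new HashMap<>())
              .put(normalize(command), List.copyOf(devices));
    }

    // Return the devices to act on, or empty if the speaker has no such right.
    Optional<List<String>> resolve(String speaker, String command) {
        Map<String, List<String>> perSpeaker = rights.get(speaker);
        if (perSpeaker == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(perSpeaker.get(normalize(command)));
    }

    // Lower-case the phrase and drop everything except letters and spaces.
    private static String normalize(String command) {
        return command.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z ]", "");
    }
}

final class CommandRegistryDemo {
    public static void main(String[] args) {
        CommandRegistry registry = new CommandRegistry();
        registry.grant("resident", "switch off the light", List.of("kitchen-lamp"));
        registry.grant("town-operator", "switch off the light",
                List.of("street-lights-north", "street-lights-south"));

        // Same phrase, different scope per speaker; an unknown speaker gets nothing.
        System.out.println(registry.resolve("resident", "Switch off the light!"));
        System.out.println(registry.resolve("town-operator", "Switch off the light!"));
        System.out.println(registry.resolve("visitor", "Switch off the light!"));
    }
}
```

Keeping the rights table separate from the recognition step means the same lookup can serve any device type, which is one possible reading of the "universal logic" described above.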
The thermostat is a classic example: it controls the internal temperature not based on outside conditions, user needs or usage forecasts, but simply based on the settings of an individual. Ideally, when outside conditions change, so should the thermostat. But in practice, when the temperature outside drops, the user must manually update the thermostat's settings. Seen in this light, machinery and equipment are resources that can not only be switched on and off, but can be fully or partially activated. The way they are calibrated and adjusted must be based on logic and data. When it gets colder outside, for example, the user typically turns up the thermostat.
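The idea of adjusting equipment from data instead of by hand can be sketched as a simple weather-compensated heating rule. This is only an illustration under stated assumptions: the linear formula, the forecast weight and the 30 °C full-power range are invented for the example and do not describe MemoBase's actual control logic.

```java
// Hypothetical sketch: partial activation of heating from current and forecast weather.
final class WeatherCompensatedHeating {
    // Assumed share of the forecast in the blended outdoor temperature.
    static final double FORECAST_WEIGHT = 0.3;
    // Assumed indoor/outdoor difference (in degrees C) at which heating runs at full power.
    static final double FULL_POWER_DELTA_C = 30.0;

    // Returns the heating level between 0.0 (off) and 1.0 (fully activated).
    static double heatingLevel(double targetIndoorC, double outdoorNowC, double outdoorForecastC) {
        // Blend the current reading with the forecast so the system reacts before it gets cold.
        double effectiveOutdoorC =
                (1.0 - FORECAST_WEIGHT) * outdoorNowC + FORECAST_WEIGHT * outdoorForecastC;
        double demand = (targetIndoorC - effectiveOutdoorC) / FULL_POWER_DELTA_C;
        // Clamp to the valid range: never below off, never above full power.
        return Math.max(0.0, Math.min(1.0, demand));
    }

    public static void main(String[] args) {
        System.out.printf("mild day:  %.2f%n", heatingLevel(21.0, 15.0, 15.0));
        System.out.printf("cold snap: %.2f%n", heatingLevel(21.0, 5.0, -5.0));
    }
}
```

The user's preference enters only as the target temperature; everything else follows from measured and forecast data, so no manual adjustment is needed when the weather changes.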
The more current, historical and forecast data each piece of equipment incorporates, the better its reactions will be, and the more useful it will be for its users. Both the logic and the data must incorporate user preferences and infrastructure, and incorporate local resources and geography, while allowing for further adjustments via smartphone, tablet or PC.
PROOF OF CONCEPT
To demonstrate all of the concepts described here, MemoBase has established a comprehensive live demonstration: a building where all of the aforementioned technologies are in practice. The demonstration is built around a single integrated control system that receives data from its own weather station, as well as forecast data from DMI (the Danish Meteorological Institute). Outside and inside sensors survey temperature, humidity and CO2, while inside sensors monitor light, motion, heating, ventilation, doors and windows. All of this data drives adjustments to conditions inside the building. The building also has functional, automatic plans for fire, burglary and vandalism, as well as support functions for residents who fall, have an epileptic seizure or require emergency services.
In case of an accident, still images from before, during and after the alarm are produced with time stamps, so responders can diagnose the severity of the incident. For longer-term residents, the demonstration building includes video and audio, so that carers, healthcare professionals, family and friends can be called directly from the building in case the sensors forecast a greater risk of injury. All the data from within the building is linked to a logic that includes automatic plans for communication and follow-up by professionals in case of a burglary or accident. And all the time, family, friends or carers can survey data, in real time, to adjust the system according to their own, and residents', needs. Below are examples of screens whose appearance is controlled by each individual user's profile, including assigned roles and permissions:
[Figure: VOICE RECOGNITION architecture]
Frontend: SmartPhones, Tablets, SmartWatch, PC (browser, HTML5 frame, front-end settings, voice recognition)
Middleware: Voice Recognition servlet in cloud (web server, HTML5, servlets: Voice Recognition, Merge)
Backbone: Database, Business rules, Access control, Cloud technology (MemoBase database, PGM data model, data dictionary, APIs)
Settings: Database information, Login information, Access control, Servlet definitions
Voice recognition: Learning, Re-learning, Commands, Login command
Data Entry
Concepts: Energy, Welfare
Application areas: Heat, Ventilation, Air Condition, Automation, Access control, Alarm & guards, Communities, Families, Friends, Action teams, Heavy duty applications, Maritime, Construction, Army, Robots, Drones
Decoding of a single word in real time
www.memobase.com