August 1996

To Learn About the Voice Modem Extensions for Windows 95, Press 1 Now!

Charles Mirho

Charles Mirho works in Silicon Valley. His new book, "Windows 95 Communication Programming," is available now from Microsoft Press. He can be reached at cmirho@mcimail.com.

Traditional fax/data modems are capable of sending (what else?) faxes and data over your phone line. Voice modems add the ability to send digitized voice data in real time. For example, you can play an audio file that says, "Hello, this is Ernestine, please leave a message," then record what the person on the other end of the line says. Combine this with the Windows Telephony API (TAPI), which makes it easy to manipulate phone calls, and you have the potential to write some really cool apps.

Well, almost. The problem is that while good fax and data standards have been around for a while, the voice modem situation has for the most part been a bit like the way things were when sound cards first came out. Each voice modem has its own proprietary commands and comes with its own special drivers and API. If you wanted to write a telephone answering app, you could, but you'd have to use the API that came with your voice modem and there'd be little hope your program would work with other brands.

Well, not any more. Microsoft's new Voice Modem Extensions for Windows 95 provide a standard architecture for integrating voice modems into Windows 95. Now you can write sophisticated voice applications like telephone answering systems and voice mail, just by calling existing TAPI and Win32 functions. No more proprietary APIs; your app works with any voice modem that supports the new standard. On the flip side, the new extensions provide a way for voice modem vendors to plug their modems into Windows 95 with minimum fuss.

This article has two main parts. First, I'll describe the new standards and internal modem architecture.
The internals are important and interesting, but you only really need to understand the details if you're writing a voice modem driver. If you're writing a telephone app that uses voice, you can just tip your hat in thanks that the standards exist, then move promptly on to part two, where I'll show you how to write a simple telephone answering system using the new extensions.

AT+V

If you've ever used communication software like WinCIM (Windows CompuServe Information Manager) or a Web browser, you've probably seen some kind of modem setup dialog box similar to the one in Figure 1, which shows various AT (ATtention) commands that WinCIM sends to the modem to tell it to do things. For example, to call Microsoft headquarters, your comm program would send
to your modem. ATA answers the phone and ATH hangs up. You get the idea. The back pages of your modem manual are filled with all sorts of AT commands for doing stuff. The Hayes AT standard helped promote widespread acceptance of data modems because programs could just send the appropriate AT-mumble-this and AT-mumble-that, and any modem that speaks the AT standard will know what to do.

A similar standard, AT+V, appears to have emerged for voice modems as well. The AT+V command set consists of Hayes AT-prefixed commands and +V-prefixed voice commands. AT+V is documented as ANSI/TIA/EIA standard IS-101, entitled "Facsimile Digital Interfaces - Voice Control Interim Standard for Asynchronous DCE." A follow-up to this specification is PN-3131 by TIA Technical Subcommittee TR-29.2.

Figure 1 Modem Setup

If you're in the market for a voice modem, you should be warned that not all voice modems are 100 percent compliant with the AT+V standard. The standard has only just begun to gain acceptance, so some modems still use competing command sets. In particular, modems that use chips produced by Rockwell Corporation may use a different standard called AT#V.

Unimodem V

Microsoft recently produced an upgrade to the Unimodem driver, which is a standard part of TAPI and Windows 95. This upgrade, Unimodem V, supports features of voice modems that implement the AT+V, AT#V, and other voice command sets. Unimodem V should do much to promote the AT+V standard.

In practice, of course, no one ever sends raw commands like ATDT to the modem. Instead, you call TAPI functions. A full discussion of TAPI and Unimodem is beyond the scope of this article, but a brief review is in order. TAPI is a collection of Win32 APIs that manage phone calls. TAPI function calls typically end up sending AT (or AT+V) commands to your modem. The TAPI APIs are contained in tapi.dll, which in turn calls the Telephony Service Provider (TSP) API.
Windows 95 modem functions are contained in a TSP, implemented in a DLL called unimdm.tsp. Unimodem translates TSPI calls into AT or AT+V commands, and sends the commands to unimodem.vxd, a virtual device driver that actually talks to the modem. It's all very confusing, but Figure 2 should help make things clear.

Figure 2 Simplified Unimodem Architecture

TAPI doesn't get into the data business; all it does is manage phone calls. When it's time to send data over the phone line, you must invoke some other API such as the Win32 comm API for sending ASCII or binary data (see "Create Communications Programs for Windows 95 with the Win32 Comm API," MSJ, December 1994), or the multimedia Wave API for sending voice data.

One of the functions in TAPI is lineGetID, which gets the device ID for the wave device associated with a telephone call. You can use this ID just like any other wave ID to play or record sounds to or from the phone line, using standard multimedia functions like waveOutOpen and waveOutWrite. Instead of coming out of your PC's speakers, the sound goes out over the phone line. I'll come back to lineGetID later; right now, I want to take you on a little tour of the grungy stuff that goes on behind the scenes with Unimodem. It's actually a bit more complicated than Figure 2 suggests.

Modem Hardware Headaches

Since audio data arrives from the Wave API instead of TAPI, some provision must be made to coordinate and synchronize it with commands arriving from TAPI. This is where things get a little tricky, because hardware differences come into play. From the point of view of a telephone application running on Windows 95, a modem is a modem is a modem. But that's only because Unimodem shields programmers from hardware differences. (The Uni in Unimodem stands for universal.)

There are two basic types of modems: internal and external. External modems typically attach to the PC's COM port and communicate with the computer by way of a serial cable.
Internal modems plug into the PC's internal expansion slots and emulate a serial port adapter (UART). Some internal modems use the serial port for voice, while others have a separate audio hardware port.

To get around the first problem (internal versus external), Unimodem calls back into the Windows 95 virtual communication driver (VCOMM) to talk to the modem through a serial port driver, as shown in Figure 3. Unimodem.vxd sends both command strings and data to the modem via VCOMM and the port driver. The commands come from TAPI via unimdm.tsp; the data comes from either the Win32 comm API or the multimedia wave API.

Figure 3 Unimodem Routes all Commands Through VCOMM

The second major difference among voice modems (serial voice versus hardware audio port) is a little more tricky. Unimodem V uses a clever synchronization mechanism to support both types. For modems that use the serial port for voice, Microsoft provides a standard serial wave driver servdrv.drv (see Figure 4). The serial wave driver supports IMA (Interactive Multimedia Association) ADPCM (Adaptive Differential Pulse Code Modulation) and Rockwell ADPCM audio formats at sample rates of 4.8kHz, 7.2kHz, and 8.0kHz. It also supports 8kHz single-channel 16-bit PCM data. Most important, the serial wave driver works with any voice modem that supports the AT+V or AT#V standards. This means that if you're a serial modem vendor, all you have to do is implement AT+V and use the Microsoft wave driver to plug into Windows 95. Internally, the serial wave driver doesn't talk to Unimodem directly, but goes through a special DLL called vmodctl.dll.

Figure 4 Unimodem Works with Wave API Through a Serial Modem

For modems that use a separate hardware audio port, Microsoft provides a "wave wrapper" DLL for synchronizing the audio data. The multimedia system calls wavewrap.drv, which in turn calls Unimodem to send AT+V commands that support audio data transfer (for example, to place the modem in voice transfer mode).
After the commands are completed, the wrapper calls back into mmsystem.dll to play the audio. mmsystem.dll calls a vendor-supplied modem wave device driver to transfer the audio data (see Figure 5). The modem wave device interfaces to the audio hardware interface only. Modem vendors take note: if you have an existing modem wave driver, you must remove any direct access to the serial comm port and let the wave wrapper send AT commands to the modem.

Figure 5 How Unimodem Coordinates with Wave API Through a Hardware Audio Port

In addition to the major modem types just described, there are often minor variations in command syntax among different modems. For example, some voice modems use AT+V; others use AT#V. To get around these differences, Unimodem views the modem as a box that executes logical functions like answering a call and hanging up. The exact sequence of characters required to perform each function is stored in the Windows 95 system registry. The modem vendor provides an INF file with information about the commands; the installation program loads this file into the registry when the user installs the modem.

COMDIAL

So much for internals. Now it's time to write a real app. COMDIAL, my sample app, is a Windows 95-based telephone "answering machine" that answers the phone, plays an outgoing message, and gives the caller a chance to record a message (see Figure 6). It also lets you listen to your messages from a remote phone if you enter your password.

To use the new voice modem features provided by Unimodem V, you first have to initialize TAPI, open a logical line device, and negotiate an API version number. (These details are described in the article "Reach Out and Touch Someone's PC: The Windows Telephony API," MSJ, December 1993, which is available on the MSDN CD. I won't repeat them here.)
In the function telephonyOpen, I store important information about the logical line device and the communication session in a global MYTAPI structure defined in COMDIAL.H. When you call lineInitialize to initialize TAPI, one of the arguments you pass is the address of a callback function. TAPI calls this function when things happen, like when the phone rings or someone at the other end presses a keypad button. The callback function is where all the action happens. It's the TAPI equivalent of a window procedure and tends to have one of the old mother-of-all switch statements. The callback function has the following signature:
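The general shape of such a callback can be sketched in portable C. This is a minimal sketch, not COMDIAL's code: the DWORD typedef, the empty CALLBACK macro, and the SKETCH_* message codes are stand-ins invented here so the fragment compiles without the Windows headers. A real program would include tapi.h and switch on its LINE_* constants. The parameter names follow the description in this article.

```c
#include <stdint.h>

/* Stand-ins so the sketch compiles without the Windows headers. */
typedef uint32_t DWORD;
#define CALLBACK

/* Hypothetical message codes; the real LINE_* values come from tapi.h. */
enum {
    SKETCH_LINE_LINEDEVSTATE  = 1,
    SKETCH_LINE_CALLSTATE     = 2,
    SKETCH_LINE_MONITORDIGITS = 3
};

/* What the sketch records for each kind of event. */
static DWORD g_lineStateEvents;
static DWORD g_callEvents;
static char  g_lastDigit;

/* One function receives every event; the switch dispatches on dwMessage. */
void CALLBACK lineCallbackSketch(DWORD dwDevice, DWORD dwMessage,
                                 DWORD dwInstance, DWORD dwParam1,
                                 DWORD dwParam2, DWORD dwParam3)
{
    (void)dwDevice; (void)dwInstance; (void)dwParam2; (void)dwParam3;

    switch (dwMessage) {
    case SKETCH_LINE_LINEDEVSTATE:   /* line-related event, e.g. ringing */
        g_lineStateEvents++;
        break;
    case SKETCH_LINE_CALLSTATE:      /* call-related state change */
        g_callEvents++;
        break;
    case SKETCH_LINE_MONITORDIGITS:  /* digit arrives in the low byte */
        g_lastDigit = (char)(dwParam1 & 0xFF);
        break;
    default:                         /* messages the app does not handle */
        break;
    }
}
```

Keeping the per-message work short and pushing real logic into helper functions stops the switch from growing without bound.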
dwDevice is either the handle of the logical line or the handle of a call in progress, depending on whether the message is line or call related. dwMessage identifies the specific event. dwInstance specifies application-defined instance data to accompany the message. The last three parameters contain details about the event. As I mentioned earlier, lineGetID provides the link between TAPI and the Wave API. lineGetID returns the device ID of the wave device that corresponds to the phone line. Playing or recording audio with this device ID results in audio being sent or received through the phone line. lineGetID retrieves the ID of the wave device associated with a particular phone line, address, or call.
hLine is a handle to an open line device, as returned by lineOpen; dwAddressID specifies an address on the given open line device; and hCall is a handle to a
This structure is used because lineGetID retrieves IDs of many devices associated with phone lines, not just the wave device. For example, you might want to get the ID of the COM port associated with the modem, which is a string like "COMM/DATAMODEM". Whichever the case, the ID is returned at the end of the VARSTRING struct; its size and offset are specified by dwStringSize and dwStringOffset. The offset is relative to the beginning of the structure. Since the ID of a wave device is a DWORD, the size is four bytes. COMDIAL gets the device ID by copying it from the VARSTRING struct.
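The copy step can be sketched as follows. This is a minimal sketch under stated assumptions: VARSTRING_SKETCH is a stand-in that mirrors the field order of TAPI's VARSTRING so the code compiles without tapi.h, and extractWaveId is a hypothetical helper name, not a function from COMDIAL. The bounds checks are my own addition.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t DWORD;   /* stand-in for the Windows DWORD type */

/* Stand-in mirroring the field order of TAPI's VARSTRING. */
typedef struct {
    DWORD dwTotalSize;
    DWORD dwNeededSize;
    DWORD dwUsedSize;
    DWORD dwStringFormat;
    DWORD dwStringSize;
    DWORD dwStringOffset;
} VARSTRING_SKETCH;

/* Copies the DWORD device ID out of the variable-length tail.
   Returns 1 on success, 0 if the size or offset does not fit. */
int extractWaveId(const VARSTRING_SKETCH *vs, DWORD *pId)
{
    if (vs->dwStringSize != sizeof(DWORD))
        return 0;                         /* a wave ID is exactly four bytes */
    if (vs->dwStringOffset < sizeof(*vs))
        return 0;                         /* data must follow the fixed part */
    if (vs->dwTotalSize < sizeof(DWORD) ||
        vs->dwStringOffset > vs->dwTotalSize - sizeof(DWORD))
        return 0;                         /* data must lie inside the block */

    /* The offset is relative to the beginning of the structure. */
    memcpy(pId, (const unsigned char *)vs + vs->dwStringOffset, sizeof(DWORD));
    return 1;
}
```

Using memcpy rather than a pointer cast avoids any assumption about the alignment of the data at dwStringOffset.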
COMDIAL allocates space for VARSTRING, plus four bytes for the wave ID DWORD at the end.
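A minimal sketch of that allocation, again using a stand-in structure in place of the real VARSTRING from tapi.h. The helper name allocVarStringForWaveId is hypothetical, and using calloc is my choice here rather than something the article specifies.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t DWORD;   /* stand-in for the Windows DWORD type */

/* Stand-in mirroring the field order of TAPI's VARSTRING. */
typedef struct {
    DWORD dwTotalSize;
    DWORD dwNeededSize;
    DWORD dwUsedSize;
    DWORD dwStringFormat;
    DWORD dwStringSize;
    DWORD dwStringOffset;
} VARSTRING_SKETCH;

/* Allocates the fixed part plus four bytes for the wave ID and records
   the allocation size in dwTotalSize. The caller releases it with free(). */
VARSTRING_SKETCH *allocVarStringForWaveId(void)
{
    size_t total = sizeof(VARSTRING_SKETCH) + sizeof(DWORD);
    VARSTRING_SKETCH *vs = calloc(1, total);   /* zero-filled block */

    if (vs != NULL)
        vs->dwTotalSize = (DWORD)total;        /* tell the provider the size */
    return vs;
}
```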
Don't forget to set dwTotalSize to the amount of memory allocated so Unimodem knows how much memory it has to work with. The last argument to lineGetID, lpszDeviceClass, tells Unimodem which device ID you want. If you want to play audio over the phone line, you should specify "wave/out"; if you want to record audio, use "wave/in". COMDIAL uses both, so it calls lineGetID twice and stores the in/out IDs separately. Most modems use the same ID, but there's nothing in the spec that says they must, so you should store separate IDs for wave in and out. The function mylineGetWaveID in mytapi_.c shows how to call lineGetID to retrieve a Wave device ID (see Figure 6). There, I call lineGetID twice: first to determine how much memory to allocate, then again to actually retrieve the ID.

Tuning in to Touch Tones

A key feature of voice mail systems is the user's ability to navigate by pressing buttons on the telephone. Each button generates a unique dual tone multi-frequency (DTMF) pair of tones, corresponding to the digits 0 to 9, A to D, * or #. (A to D were included in the original DTMF specification, but do not appear on standard telephones today.) COMDIAL detects and responds to these digits. If I call my answering system and press 1, then enter my password, COMDIAL plays back any messages in my mailbox. A more sophisticated app might manage several different mailboxes for different users of the system.

Of course, before you can listen for digits, you have to answer the phone! But I'm going to skip that for the moment and come back to it later. The initial release of Unimodem did not support digit detection, but Unimodem V supports both detection and generation through two functions: lineMonitorDigits and lineGenerateDigits. As with lineGetID, these functions were always part of TAPI, but they didn't work until now. lineMonitorDigits has two arguments: hCall, the handle to the call, and dwDigitModes.
Use LINEDIGITMODE_DTMF for DTMF digit detection or LINEDIGITMODE_PULSE for pulse digit detection (an older technology still used in some remote areas and foreign countries). Some modems also provide DTMF edge detection; they can detect the down-edge of a DTMF tone, indicating the tone has ended. This is useful if you want to detect digit tone and duration. The flag LINEDIGITMODE_DTMFEND enables down-edge detection if the modem supports it. To disable digit detection entirely, call with dwDigitModes set to 0.

Once you've turned on digit detection, Unimodem sends a LINE_MONITORDIGITS message to your line callback function each time a digit is detected. For a DTMF digit, the dwParam2 parameter of the message is LINEDIGITMODE_DTMF; for a pulse-mode digit, dwParam2 is LINEDIGITMODE_PULSE. The digit itself is passed in dwParam1. The low byte contains the digit, which will be ASCII 0 to 9, A to D, * or #.

COMDIAL traps these digit events to implement a simple state machine as in Figure 7. Assuming for the moment that COMDIAL has answered a call, played the greeting, and prompted the caller to press either 1 or 2, COMDIAL is in the idle state. Pressing 1 lets me enter a password and hear my messages; pressing 2 lets the caller record a message. If the caller presses 2, COMDIAL goes to rec state, records the caller's message, and goes back to idle.

Figure 7 State Machine for COMDIAL Voice Mailbox

If the caller presses 1, COMDIAL goes to the password 0 state. At that point the caller must enter a password. The example recognizes a single hardwired password: 6727. (Not very customizable, but this is just a demo program.) If the caller enters 6727, COMDIAL goes to playback state, plays the messages in the voicemail box, then returns to idle state.

The implementation appears in the line handler function in the switch case for LINE_MONITORDIGITS. In several places I call playSound to play a message, passing the device ID obtained from lineGetID.
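Stepping back to the state machine in Figure 7, its transitions can be sketched in a few lines of portable C. This is a minimal sketch, not COMDIAL's handler: the type and function names are hypothetical, the password state is collapsed into one state with a counter, and two behaviors are my own assumptions because the article does not spell them out (a wrong password returns to idle, and digits are ignored while recording or playing back).

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t DWORD;   /* stand-in for the Windows DWORD type */

typedef enum { ST_IDLE, ST_REC, ST_PASSWORD, ST_PLAYBACK } MailState;

typedef struct {
    MailState state;
    char      entered[4];   /* password digits collected so far */
    int       count;
} Mailbox;

static const char kPassword[4] = { '6', '7', '2', '7' };

/* Feeds one detected digit (dwParam1 of the digit message) to the machine
   and returns the resulting state. */
MailState onDigit(Mailbox *mb, DWORD dwParam1)
{
    char digit = (char)(dwParam1 & 0xFF);   /* the digit is in the low byte */

    switch (mb->state) {
    case ST_IDLE:
        if (digit == '1') {                 /* owner wants to hear messages */
            mb->state = ST_PASSWORD;
            mb->count = 0;
        } else if (digit == '2') {          /* caller wants to leave one */
            mb->state = ST_REC;
        }
        break;
    case ST_PASSWORD:
        mb->entered[mb->count++] = digit;
        if (mb->count == 4) {               /* all four digits collected */
            mb->state = (memcmp(mb->entered, kPassword, 4) == 0)
                            ? ST_PLAYBACK
                            : ST_IDLE;      /* assumption: failure -> idle */
            mb->count = 0;
        }
        break;
    default:                                /* assumption: ignore digits */
        break;
    }
    return mb->state;
}

/* Called when recording or playback finishes: back to idle. */
void onActivityDone(Mailbox *mb)
{
    mb->state = ST_IDLE;
}
```

Keeping the transitions in a pure function like this makes them easy to exercise without a modem attached.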
For example, to play the "Please enter your password" message, I call
playSound is a helper function that contains standard code for opening and playing wave files. You can find it in SOUND.C, part of COMDIAL. The function takes a wave device ID, the name of the sound file to play, and a handle to the application window for displaying message boxes in case of errors. I use message boxes to display errors because it's easy, but a commercial program should probably spool the errors to a log file or use some other mechanism that doesn't require human intervention, since in general a voice mail app will run unattended. After the sound file plays, Windows sends a MM_WOM_ COMDIAL doesn't use it, but TAPI has a lineGenerateDigits function to generate (as opposed to monitoring) digits.
You could use lineGenerateDigits to write a program that talks to other answering systems, so your computer could call my computer and leave a message. The digits are generated in-band, so you will normally call lineGenerateDigits after the call is connected, though this is not a requirement. The dwDigitMode parameter is the same as for lineMonitorDigits, except LINEDIGITMODE_DTMFEND is not supported. You specify the digits in the string lpszDigits. Valid DTMF digits are 0 to 9, A to D, *, and #. A comma adds an extra delay between digits it separates. The delay varies depending on the modem configuration; you can check MinDialParams and MaxDialParams in the LINEDEVCAPS structure filled by lineGetDevCaps for the delay associated with a comma. dwDuration specifies the duration of the digits generated. Once all the digits are generated (or when digit generation is aborted by calling lineGenerateDigits with a NULL buffer), Unimodem sends a LINE_GENERATE message to your app.

Answering Calls

So far I've shown you how to get the wave device, play sounds and monitor digits, but I skipped over one little detail: answering the phone. A voice mail system isn't

During its initialization, COMDIAL calls lineSetNumRings to set the number of rings it will wait before answering an incoming call. This function is designed to help telephony apps cooperate in implementing "toll-saver" features. When no messages are waiting, COMDIAL picks up after five rings. When messages are waiting, it picks up after three rings. This way, when I call to listen to my messages and the phone rings four times, I know there are no messages waiting. I can hang up before COMDIAL answers and avoid those hefty long distance charges.

The function lineGetNumRings returns the minimum number of rings set by all apps. COMDIAL uses this to respect the toll-saver settings of other apps. Remember: your app may not be the only telephony app running!
For example, there could also be a fax program running at the same time. (More on this later.) So when COMDIAL opens the line, it sets the number of rings to wait for either RINGCNT (five) or the value set by another app, whichever is less. This setting corresponds to the state in which no messages are waiting. After messages are recorded, it sets the number of rings to wait for either RINGCNT1-2 or the value set by another app, whichever is less. lineSetNumRings doesn't actually do anything except store a TAPI system global variable that apps can share. You still have to answer the phone yourself. Each time the phone rings, Unimodem calls my callback function with a LINE_LINEDEVSTATE message.
dwParam1 is LINEDEVSTATE_RINGING and dwParam3 is the ring count. COMDIAL compares the ring count to the number of rings to wait. When they're equal, I call lineAnswer to answer the phone, then reset the number of rings to zero. Note that lineAnswer is an asynchronous function that returns immediately. If TAPI/Unimodem are able to answer the call successfully, Unimodem notifies my line callback by sending a LINE_CALLSTATE message with dwParam1 set to LINECALLSTATE_CONNECTED. At this point, there is an end-to-end voice connection with the caller. Time to play my greeting, go into IDLE state, and start listening for digits.
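The toll-saver arithmetic described above is small enough to sketch directly. This is a minimal sketch with hypothetical helper names. The value a real app would obtain from lineGetNumRings is passed in here as a plain parameter, and the three-ring count is written as RINGCNT minus two to match the five-ring and three-ring behavior the article describes.

```c
#include <stdint.h>

typedef uint32_t DWORD;   /* stand-in for the Windows DWORD type */

#define RINGCNT 5u        /* rings to wait when no messages are stored */

/* Picks the ring count to register: three rings when messages are waiting,
   five otherwise, but never more than another app has already asked for.
   otherAppsRings stands in for the value reported by lineGetNumRings. */
DWORD ringsToWait(int messagesWaiting, DWORD otherAppsRings)
{
    DWORD mine = messagesWaiting ? RINGCNT - 2u : RINGCNT;
    return (otherAppsRings < mine) ? otherAppsRings : mine;
}

/* Called for each ring event with the ring count from dwParam3.
   Returns nonzero when it is time to answer the call. */
int shouldAnswer(DWORD ringCount, DWORD ringsToWaitFor)
{
    return ringCount == ringsToWaitFor;
}
```

A caller who hears a fourth ring can therefore conclude that no messages are stored and hang up before the fifth.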
New Calls and Handoffs

When COMDIAL is the only telephony app running, it receives a LINE_CALLSTATE message on or before the first ring, with dwParam1 set to LINECALLSTATE_ You can be certain of one thing: a call handle will always accompany a LINE_CALLSTATE message, whether dwParam1 is LINECALLSTATE_OFFERING or LINECALLSTATE_CONNECTED, or any other LINECALLSTATE_ You can only answer and control calls for which you have owner privileges, so you should check dwParam3 for the LINECALLPRIVILEGE_OWNER flag before saving the call handle. (If you call lineAnswer without owner privileges, nothing happens and you get an error.) If the call handle is for a new call, COMDIAL saves it and also calls lineGetID (through a wrapper function, mylineGetWaveID) for both the wave/in and the wave/out devices.

When COMDIAL gets a LINE_CALLSTATE message, it also updates its menus by calling mylineGetCallStatus, which in turn calls the TAPI function lineGetCallStatus to retrieve LINECALLSTATUS information:
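Both the ownership check and the menu check come down to testing one bit in a flags word. The fragment below is a minimal sketch of those two tests. The SKETCH_* flag values are stand-ins chosen for illustration (a real program would use LINECALLPRIVILEGE_OWNER and LINECALLFEATURE_ANSWER from tapi.h), and CallSlot and the function names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t DWORD;   /* stand-in for the Windows DWORD type */

/* Stand-in flag values; take the real ones from tapi.h. */
#define SKETCH_PRIVILEGE_OWNER 0x00000004u
#define SKETCH_FEATURE_ANSWER  0x00000004u

/* Where the app remembers the call it is allowed to control. */
typedef struct {
    DWORD hCall;
    int   haveCall;
} CallSlot;

/* Saves the call handle only if dwParam3 grants owner privilege.
   Returns 1 if the handle was saved, 0 otherwise. */
int saveCallIfOwner(CallSlot *slot, DWORD hCall, DWORD dwParam3)
{
    if ((dwParam3 & SKETCH_PRIVILEGE_OWNER) == 0)
        return 0;               /* monitor-only: the call cannot be answered */
    slot->hCall = hCall;
    slot->haveCall = 1;
    return 1;
}

/* True when the call status says answering is possible right now;
   used to enable or disable a menu item. */
int canAnswer(DWORD dwCallFeatures)
{
    return (dwCallFeatures & SKETCH_FEATURE_ANSWER) != 0;
}
```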
dwCallFeatures is convenient for managing menus. It contains flags that specify which TAPI features are available for the call in its current state. COMDIAL checks that LINECALLFEATURE_ANSWER is set; if not, the Auto Answer menu item is disabled because the call cannot be answered in its current state.

Recording Voice Messages

You can record voice messages from the phone line the same way you record audio from other wave devices. Just open the wave/in device corresponding to the phone line by using waveInOpen with the device ID returned from lineGetID. You must provide a buffer for holding the recorded information and wait for the Wave system to fill it with audio data. The recordMessage function in my example program shows one way to do this. When recording is complete, Windows sends a MM_WIM_DATA message to COMDIAL's main window procedure. There, I save the recorded data from the supplied buffer, then free the buffer. COMDIAL unceremoniously disconnects the caller after exactly one minute by calling lineDrop (see Figure 10).

In addition to the new voice functionality, Unimodem V has other new features that are not related directly to voice. These include support for call forwarding, logical phone devices, caller ID, G3 fax media, pass-through bearer modes, and flashhook in canonical addresses. Support for the G3 fax media mode is especially significant because, prior to the release of Unimodem V, TAPI apps could not process fax calls. Instead, fax calls were answered and processed through the Messaging API (MAPI). Now TAPI apps can directly make, answer, and process fax calls.

The Operator Agent

Unimodem V comes with a new program called the Operator Agent (see Figure 11). The Operator Agent resides in the Accessories folder. When you run it, a telephone icon appears on the Windows 95 system tray. The Operator Agent performs centralized call routing in heterogeneous telephony environments.
In plain English, this means the Operator Agent will answer all incoming calls and route them to the appropriate application depending on the type of call: fax, voice mail, or data (terminal and file transfer apps).

Figure 11 "Is this the party to who I am speaking?"

The Operator Agent takes on one of the thorniest problems in computer telephony and does a reasonably good job of solving it. Current modem and fax communication protocols don't take into consideration the possibility of multiple devices sharing a phone line. These protocols assumed the line was used by either a fax machine, a modem, or a human. Before there was line sharing, each type of device was free to implement whatever protocol worked best for that device. Most modems and fax machines implemented a protocol whereby the answering party would send a tone indicating whether it was a modem or fax machine. Since the line was dedicated to either modem or fax, it was safe for each device to assume that only a modem or fax was calling on the line.

Things aren't so simple in heterogeneous environments. It isn't practical for the answering party to send special tones when the caller might be a modem, fax, or a human. Instead, the caller should identify itself as a modem, fax, or human so the call can be routed to the appropriate application. There are several ways of implementing this; each has drawbacks.

First, you can make assumptions about the caller. For example, when the call is first answered, respond with a fax tone. If there is no response from a fax machine, try a data modem tone. If there is no response from a data modem, assume the caller is a human and take a message. The disadvantage is that data modems may hang up when they receive the fax tone, and human callers will often hang up when they hear fax and modem tones in their ears.

Another option is to use the telephone ring pattern to determine the nature of the call before answering.
For example, when a fax machine is calling, the phone could use a long-long ring pattern; when a modem calls, the phone could use a short-short ring pattern; and when a human calls, the phone could use a long-short pattern. This is an elegant solution because the computer can confidently answer the phone with the correct tone or with no tone if the caller is human. The problem is that modulating the ring pattern (known as distinctive ringing) requires a special phone service from the phone company, and most users currently do not have this service. Also, not all modems support distinctive ring detection.

Finally, you can have the caller identify itself with a touch tone. The answering app must wait for a tone from the caller and respond in the appropriate manner. For example, a tone for digit 1 could indicate that the caller is a fax machine and the answering app could send out a fax tone. The problem with this approach is that there is no standard for what digit corresponds to what type of caller.

None of these solutions is perfect, but the Operator Agent doesn't force you to use any one of them. Instead, it lets you choose whichever scheme works best in your environment. The Operator Agent may answer calls on behalf of all running telephony applications. It can then either play a greeting (prompting for touch tones to identify the caller as a fax, modem, or human) or route the call immediately according to a selected routing priority. The routing priorities are selected by clicking the Properties button and then the Call Routing Priorities button on the Properties dialog (see Figure 11). The Operator Agent does not get involved in calls identified by a distinctive ring pattern because the type of call is known by Unimodem before the call is answered and Unimodem can route the call to the appropriate app.

Conclusion

With the release of Unimodem V, Windows takes a major step toward becoming an industrial-strength telephony platform.
The voice extensions enable an entirely new category of apps: voice mail and answering machines. They also include solutions, albeit imperfect ones, for call routing in heterogeneous telephony environments through support for distinctive ringing and the Operator Agent.

While support for heterogeneous telephony environments has come a long way, there is still room for improvement. Currently, there is no easy way for an app to toggle the media mode of an existing call. A caller might want to end a voice message and send a file to be included with the message. To support this functionality, you have to answer the call in voice mode, record the message, toggle the modem into data mode, receive the file, and attach the file to the voice message. Such functionality would be tricky to implement, especially the part that synchronizes the activities of modems on both ends of the connection as the media mode changes between voice and data. However, computers are good at making the tricky and complex appear simple and elegant. It will be interesting to see what the next release of the Unimodem driver offers.

If you want to start coding with Unimodem V, or you just want to see the full specification, you can download it via the Internet at ftp.microsoft.com in the \developr\TAPI directory.

From the August 1996 issue of Microsoft Systems Journal.