Research Technical WhitePaper

Note  This is a draft version of the documentation for the Microsoft Office Research Service SDK and may contain inaccuracies. The complete, detailed version of the documentation for the Microsoft Office Research Service SDK will be available in late 2003.

Abstract:    The Research and Reference feature in Office 2003 applications provides rich, integrated search functionality. Out of the box, Office 2003 includes a number of Microsoft and third party services. Research and Reference is also a platform for organizations to build their own research and reference services and for third party research providers to build subscription services. This paper describes the technology in more detail and discusses how administrators can ensure that their organizations get its maximum value.

Published: April 2003

Research and Reference in Microsoft Office 2003 Technical White Paper

Key Concepts and Definitions

Service/Source -    A service or source is a collection of research data displayed in the Research and Reference task pane. This document uses the term 'service'.

Provider -    A provider is a source of research and reference content typically accessed via an external URL or internal server address. A provider may offer one or more research sources.

Query -    Query is the general term for a Research and Reference search.

XML -    eXtensible Markup Language is a metadata definition language used to describe data in a structured open format.

SOAP -    Simple Object Access Protocol is an XML/HTTP-based protocol for accessing services, objects and servers in a platform-independent manner.

Research and Reference Overview

The Research and Reference feature in Microsoft® Office 2003 lets information workers quickly locate and use the information they need without leaving the application in which they are working. Research and Reference expands upon the search implementation in Microsoft Office XP, which provides integrated Windows® Explorer-based search functionality from within Office XP applications. It is powerful and broad enough to provide information workers with a search experience that they will choose over web-based research and reference sites.

Before Office 2003, information workers had to do similar research and quick reference tasks by switching repeatedly between applications. For research, information workers could open a browser, retrieve relevant information, and return to an Office application to incorporate that information. For quick reference tasks, such as looking up a definition, information workers could, again, use a browser-based dictionary service, or they could use Microsoft's Bookshelf®. The first design priority for Research and Reference within Office 2003 was to dramatically streamline this process by combining and improving existing tools and by providing:

  1. Aggregation.    Information workers can search multiple services at the same time. Results are gathered from multiple sources of information, and many more sources can be added.
  2. A Modeless, Integrated User Interface.    The interface is tightly coupled with all Office 2003 applications, allowing the tool to gather context from the application, insert content into it, and so on. For example, information workers can click a word or phrase while holding down the ALT key to automatically run a search on that word or phrase. After viewing the results in the task pane, which is displayed side by side with the document within the Office 2003 application window, the information worker can interact with search results, which may be formatted in any way the provider specifies or tagged for action with smart tags.

The Office 2003 product includes numerous sources of Research and Reference "right out of the box", including Dictionary, Thesaurus, MSN Search, and Encarta® Encyclopedia, and a number of third party services. Research and Reference is also a platform for organizations to build their own research and reference services and for third party research providers to build subscription services.

This paper describes the Research and Reference feature in more detail, including its architecture and infrastructure requirements. The paper goes on to describe how organizations and third parties can create custom services, and finishes with a discussion of the options for configuring clients to use both built-in and custom services.

Using Research and Reference Service

All Office 2003 applications provide the same Research and Reference task pane. Figure 1 shows the task pane in Microsoft Office Word 2003, with a few results from a keyword search. This search was run by right-clicking the word "remedy" within the document and clicking Look Up, one of several methods for initiating a search.

Notice in the figure that the Thesaurus results provide smart tags that allow the information worker to copy a word, insert it into the document (replacing the current selection and taking on the proper formatting), or do a look up of that word. Developers can integrate smart tag functionality into their research and reference services in a variety of ways. Given the dramatic improvements in smart tag technology in Office 2003, this integration is a powerful aspect of Research and Reference.

When using Research and Reference, the information worker selects which services to search from a pull-down menu. In this example, All Reference Books was selected. Selecting All Research Sites returns results from all installed internally and externally hosted research services. Again, the results are well organized (with collapsible sections) and, as mentioned above, can have smart tag intelligence built in.

Figure 2 shows a simple query for the term "Blood Pressure", with results from eLibrary, MSN® Search, Factiva News Search, and Encarta Encyclopedia.

Image of screen Research

Figure 1: Task pane

Image of screen NewProducts

Figure 2: Simple research search

Research and Reference Architecture

Research and Reference uses XML for all communications and for the display and manipulation of search results. The layout of results is very flexible, because developers can use XML and smart tags to provide rich formatting, collapsible lists, intelligent content-based actions, and so on.

Since Research and Reference is built into Office 2003, it works "out of the box" with no specific customization required. From a network perspective, all communications take place over HTTP as XML (more specifically, SOAP), so no special firewall configuration is required. Research and Reference services can be hosted either internally or externally.

The following sections describe the architecture from the perspective of IT professionals concerned with how Research and Reference functions within their existing IT infrastructure:

  1. Client/Provider Communications - describes the simple process by which clients add services.
  2. Client/Service Communications - explores the salient points of Research and Reference's XML-based client query and provider service response.
  3. Developer Options for Results Layout - explores key aspects of the platform that allow developers to customize the XML results data stream to appear in the task pane in the most usable way.

Client/Provider Communications

Research and Reference services are made available through a provider, which can host multiple services. Office 2003 applications connect to a provider via its URL and receive from the provider a list of available services. By default, all Office 2003 clients are configured to check Microsoft's provider (http://office.microsoft.com/research/query.asmx) for new Microsoft services and for third party services that Microsoft lists. Organizations can also create their own providers, exposing whatever services they wish.

All client/provider communications, as well as client/service communications, take place over HTTP. Hence, as far as clients are concerned, it makes no difference whether the provider or service is located within the firewall or on the Internet (see figure 3).

There are three basic scenarios in which a research service can be used: in an intranet, through the Internet, or on a client machine (running a service locally on a machine has limitations that are discussed in the SDK). Information workers positioned behind a corporate firewall can access services on the Internet directly through a client application, such as Word 2003 or Microsoft Office Excel 2003, or can access research services indirectly through a server within their corporate intranet.

Image of client/provider locations

Figure 3: Possible client/provider locations

Service Installation

The basic sequence of events for service installation is as follows:

  1. A client connects to a Provider by URL (methods of getting the URL to clients are described below).
  2. The Provider sends the client a list of available services.
  3. The information worker chooses from the list of available services.
  4. The Provider installs the service to the client by writing per-user registry entries that point to the service. Keys are written to HKEY_CURRENT_USER\Software\Microsoft\Office\11.0\Common\Research\Sources\<servicename> and consist of entries such as those shown in figure 4 (the MSN Search Service).

Image of registry entries

Figure 4: Registry entries for a service
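
The installation sequence above can be sketched as a small simulation. This is an illustration only, not the actual Office implementation: the Provider and Client classes, the "ServiceUrl" value name, and the example URLs are hypothetical, and a plain dictionary stands in for the per-user registry hive. Only the key path is taken from this paper.

```python
from dataclasses import dataclass, field

# Key path quoted in this paper; everything stored beneath it here is illustrative.
SOURCES_KEY = r"HKEY_CURRENT_USER\Software\Microsoft\Office\11.0\Common\Research\Sources"


@dataclass
class Provider:
    """Hypothetical stand-in for a provider reachable at a URL."""
    url: str
    services: dict[str, str]  # service name -> service URL (assumed shape)

    def list_services(self) -> list[str]:
        # Step 2: the provider sends the list of available services.
        return sorted(self.services)


@dataclass
class Client:
    """Hypothetical client; `registry` is a dict standing in for the registry."""
    registry: dict[str, dict[str, str]] = field(default_factory=dict)

    def install(self, provider: Provider, chosen: list[str]) -> None:
        # Steps 3 and 4: each chosen service is recorded as a per-user entry.
        for name in chosen:
            if name not in provider.services:
                raise KeyError(f"provider does not offer {name!r}")
            key = SOURCES_KEY + "\\" + name
            # "ServiceUrl" is an assumed value name, not one from the SDK.
            self.registry[key] = {"ServiceUrl": provider.services[name]}


provider = Provider(
    url="http://research.contoso.example/query.asmx",
    services={
        "Products": "http://research.contoso.example/products.asmx",
        "Policies": "http://research.contoso.example/policies.asmx",
    },
)
client = Client()
available = provider.list_services()    # steps 1 and 2
client.install(provider, ["Products"])  # steps 3 and 4
```

After the call, the client's dictionary holds a single entry under the Sources path for the chosen service, mirroring how a real installation leaves one key per service.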

For smart tag integration, the service provider incorporates a separate setup process from within a search result in the Research task pane.

Note that IT professionals can and should incorporate Research and Reference service installation into their deployment strategy for the company. See the "Service Deployment" section below for more details.

Client/Service Communications

Once a service is registered, information workers can initiate searches for that service. During a search, Office 2003 sends query packets to the service, which sends a response packet containing search results. All communications take place with formatted XML packets, and each segment of the communication adheres to a schema. Figure 5 shows the order of the XML schema packets that pass between client and service:

Image of client server communications

Figure 5: Client/service communications

When the Office 2003 application receives a response from a service with the results of the search, it displays the results in the Research and Reference task pane.
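
To make the exchange more concrete, the following minimal sketch wraps a search term in a SOAP 1.1 envelope and reads it back on the service side. The SOAP envelope namespace is the standard one; the service namespace and the QueryText element name are placeholders invented for this illustration, because the actual packet schemas are defined in the SDK and are not reproduced in this paper.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
SVC_NS = "urn:example:research"  # hypothetical; the real schema is defined in the SDK


def build_query_envelope(term: str) -> bytes:
    """Client side: wrap a search term in a SOAP envelope that calls `Query`."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    query = ET.SubElement(body, f"{{{SVC_NS}}}Query")
    # "QueryText" is an assumed element name used only for this sketch.
    ET.SubElement(query, f"{{{SVC_NS}}}QueryText").text = term
    return ET.tostring(envelope, encoding="utf-8")


def read_query_text(payload: bytes) -> str:
    """Service side: pull the search term back out of the envelope."""
    root = ET.fromstring(payload)
    node = root.find(f"{{{SOAP_NS}}}Body/{{{SVC_NS}}}Query/{{{SVC_NS}}}QueryText")
    if node is None or node.text is None:
        raise ValueError("no query text in request")
    return node.text


packet = build_query_envelope("Blood Pressure")
term = read_query_text(packet)
```

In a real deployment the packet would be sent in the body of an HTTP POST to the service URL, and the service would answer with a response packet in the same envelope style.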

Developer Options for Results Layout

Research service providers are able to specify custom actions for the content they return. These actions are presented to information workers using the same mechanism used to expose the built-in content actions such as inserting and copying.

The actions themselves are carried out via smart tags provided by the service provider. This way the Research and Reference framework does not need to maintain information about smart tags. For example, a research service may return a response containing a smart tag that gives information workers the ability to grab additional live data, transform the response text, or some other action. An Insert action can also place content into Word 2003 and Excel 2003 documents as XML, and then, for example, additional intra-document actions may become available.

Built-In Research and Reference Services

Office 2003 includes a rich offering of research and reference services out of the box. There are a number of Microsoft services as well as third party services provided by partners. Figure 6 shows the Research Options pane, which displays installed services and allows information workers to activate and deactivate services.

Image of installed research and references

Figure 6: Installed research and reference services

The following services are installed by default:

The Thesaurus and Translation services, listed above in the Reference Books section, are locally installed, which means that offline searches will yield results. All other default services are not locally installed, so that offline searches will not yield results.

Custom-Built Research and Reference Services

In addition to providing better, broader, and more integrated searching via the built-in services described above, Research and Reference is also a platform for organizations to build their own research and reference services and for third parties to build subscription services.

For example, a pharmaceutical company with a huge internal database (or multiple databases) containing information on their products (R&D information, insurance information, sales information, and so on) could create a service that makes all this information available to designated information workers in a powerful way while they work within Office 2003 applications. The service would consist of a SOAP function named "Query" that handles client queries, retrieves the information from the database, and returns the information to the client. The Research and Reference Solution Developers Kit provides detailed information on building a custom service.
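
As a rough sketch of what the body of such a "Query" function might do, the following example searches a small in-memory data set and returns the matches as XML. The product records, the function signature, and the Response, Result, Title, and Summary element names are all hypothetical; a real service would query the company's databases and format its output according to the schema described in the SDK.

```python
import xml.etree.ElementTree as ET

# Hypothetical product data standing in for the company's internal database.
PRODUCTS = {
    "Contoso BP-100": "Blood pressure medication; R&D phase complete.",
    "Contoso Sleep Aid": "Over-the-counter sleep aid; sales data available.",
}


def query(term: str) -> str:
    """Return matching records as an XML string (element names are assumed)."""
    response = ET.Element("Response")
    needle = term.lower()
    for name, summary in PRODUCTS.items():
        # Case-insensitive substring match against the name and the summary.
        if needle in name.lower() or needle in summary.lower():
            result = ET.SubElement(response, "Result")
            ET.SubElement(result, "Title").text = name
            ET.SubElement(result, "Summary").text = summary
    return ET.tostring(response, encoding="unicode")


xml_out = query("blood pressure")
```

A term with no matches yields an empty Response element rather than an error, which lets the client display "no results" for this service while still showing results from others.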

Continuing the example from figure 2 above, our sample company (Contoso Pharmaceuticals) has created a custom-built research service with information on its products. An information worker can add the service from the Research Options pane by clicking Add Services and typing in the URL, as shown in figure 7. The company could also use one of the deployment methods described in the "Service Deployment" section below.

Image of adding a service manually

Figure 7: Adding a service manually

After adding the new service, the information worker's search for "Blood Pressure" yields the results shown in figure 8. Key corporate data is now available where the information worker needs it.

Service Deployment

All services are defined by registry entries (figure 4 shows the registry entries for the MSN Search Service), and service deployment consists in getting those registry entries onto the client machine. There are a few ways to do this:

  1. By having information workers manually type in the URL (as in the example above). No administrator action is required.
  2. By including registry entries in the desktop image as part of a desktop rollout.
  3. By registering the service with the Microsoft Office Marketplace. Users will have easy access to a directory of available services through a link in the Research task pane.
  4. By hosting the service on an internal discovery server. Organizations can provide the specific URL to clients, or they can configure an Arbitrary Discovery Server, which will prompt information workers when a new service is available. Up to five Arbitrary Discovery Server pointers can be configured on a client machine.

    With these pointers in place, Office 2003 will frequently check whether new services are available and, based on how the administrator has configured clients, either notify the user or automatically install the new services. See "Controlling Service Installation Options" for more details.

Image of results from the custom service

Figure 8: Results from the custom service

If information workers are to have full control over which services they want to install, administrators do not need to do anything more.
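
The discovery check described in method 4 can be pictured as the following decision logic. This is a sketch under stated assumptions, not Office's actual behavior: the function name, the InstallPolicy enumeration, the data shapes, and the example URL are hypothetical. Only the limit of five pointers and the choice between notifying the user and installing automatically come from this paper.

```python
from enum import Enum

MAX_DISCOVERY_SERVERS = 5  # limit stated in this paper


class InstallPolicy(Enum):
    """Hypothetical names for the two administrator-configured behaviors."""
    NOTIFY = "notify"
    AUTO_INSTALL = "auto-install"


def check_discovery_servers(servers, installed, policy):
    """Compare advertised services with the ones already installed.

    servers:   dict mapping a discovery-server URL to the set of service
               names it advertises (assumed shape).
    installed: set of service names already on the client; updated in
               place when the policy is AUTO_INSTALL.
    Returns (newly_installed, to_prompt) as sorted lists.
    """
    if len(servers) > MAX_DISCOVERY_SERVERS:
        raise ValueError("at most five discovery server pointers are allowed")
    new_services = set()
    for advertised in servers.values():
        new_services |= set(advertised) - installed
    if policy is InstallPolicy.AUTO_INSTALL:
        installed |= new_services
        return sorted(new_services), []
    return [], sorted(new_services)


servers = {"http://discovery.contoso.example/research": {"Products", "Policies"}}
installed = {"Products"}
prompted = check_discovery_servers(servers, installed, InstallPolicy.NOTIFY)
auto = check_discovery_servers(servers, installed, InstallPolicy.AUTO_INSTALL)
```

With the NOTIFY policy the new service is only reported for prompting; with AUTO_INSTALL it is added to the installed set without user action.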

Controlling Service Installation Options

Optionally, administrators can control which services are installed by default, whether information workers will be able to add services manually, and whether they will automatically connect to an internal Arbitrary Discovery Server.

First, to specify which services are installed by default, the administrator can create a set of keys under HKEY_LOCAL_MACHINE that define the services. By default, no keys exist at HKEY_LOCAL_MACHINE\Software\Microsoft\Office\11.0\Common\Research\Sources\. Creating a list of keys, each representing a service (as shown in figure 9), will cause Office to bypass its normal procedure of initially installing services. Specifically, instead of looking to Microsoft's discovery server and installing default services, the Office client will find these keys written to HKEY_LOCAL_MACHINE and copy these to HKEY_CURRENT_USER, with the result that the services specified by the administrator are now installed for the user. This process is called "propagation."

Image of HKEY_LOCAL_MACHINE keys

Figure 9: HKEY_LOCAL_MACHINE keys will propagate to users
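
Propagation can be modeled as a simple copy from one hive to the other. In the sketch below a dictionary stands in for the registry so that the example runs on any platform; the propagate function, the "ServiceUrl" value name, and the example service are hypothetical, while the two key paths are the ones given in this paper.

```python
HKLM_SOURCES = r"HKEY_LOCAL_MACHINE\Software\Microsoft\Office\11.0\Common\Research\Sources"
HKCU_SOURCES = r"HKEY_CURRENT_USER\Software\Microsoft\Office\11.0\Common\Research\Sources"


def propagate(registry: dict[str, dict[str, str]]) -> list[str]:
    """Copy administrator-defined service keys from HKLM to HKCU.

    Returns the names of the services that were copied. An empty list
    means no administrator keys exist, in which case Office would follow
    its normal procedure for installing default services.
    """
    prefix = HKLM_SOURCES + "\\"
    copied = []
    # Iterate over a snapshot because the loop adds keys to the dict.
    for key in list(registry):
        if key.startswith(prefix):
            name = key[len(prefix):]
            registry[HKCU_SOURCES + "\\" + name] = dict(registry[key])
            copied.append(name)
    return copied


# "ServiceUrl" and the service itself are assumptions made for this sketch.
registry = {
    HKLM_SOURCES + "\\Contoso Products": {
        "ServiceUrl": "http://research.contoso.example/products.asmx"
    }
}
copied = propagate(registry)
```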

There are a few additional registry entries that allow administrators to further control user options. The three available keys (which do not exist by default) are:

Figure 10 shows the new keys, with NoAdd turned on and NoDiscovery turned off.

Image of New Keys

Figure 10: New keys

Administrators may use the Office Resource Kit to deploy these registry settings, or they may manually install them or create a batch file.
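
For administrators who script the deployment, one possible approach is to generate a .reg file that can be imported on each client. The sketch below builds such a file for service keys under HKEY_LOCAL_MACHINE. The helper names, the "ServiceUrl" value name, and the example service are assumptions made for illustration; the header line and the escaping rules follow the standard .reg file format, and the key path is the one given in this paper.

```python
HKLM_SOURCES = r"HKEY_LOCAL_MACHINE\Software\Microsoft\Office\11.0\Common\Research\Sources"


def escape(value: str) -> str:
    """Escape backslashes and double quotes as the .reg string syntax requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_reg_file(services: dict[str, dict[str, str]]) -> str:
    """Render one registry key per service, each holding string values."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for name, values in services.items():
        lines.append(f"[{HKLM_SOURCES}\\{name}]")
        for value_name, data in values.items():
            lines.append(f'"{escape(value_name)}"="{escape(data)}"')
        lines.append("")  # blank line between keys
    return "\r\n".join(lines)


# The value name and URL below are hypothetical.
reg_text = build_reg_file(
    {"Contoso Products": {"ServiceUrl": "http://research.contoso.example/products.asmx"}}
)
```

The resulting text can be saved to a file and imported from a batch file or login script; the real value names for a given service should be taken from an existing installation or from the SDK.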

Security

There are no special security considerations for Research and Reference usage, per se. Since a registered service may only transmit data that adheres to the Research and Reference schema, there is no danger of malicious content within the XML stream that is displayed in the task pane. However, a response may contain a link to an installation program for integrated smart tag functionality - as with any code, it is highly recommended that only signed code be allowed.

Some services may require authentication, for which Research and Reference provides the following models:

As mentioned in the Architecture section, communications take place over HTTP, but developers may also use HTTPS for secure connections. Developers can place all communications over HTTPS or they can specify that some specific action uses HTTPS (for example, submitting a form).

Conclusion

Research and Reference provides a powerful, integrated, and extensible solution for gathering information. As more companies build their own services and more third parties build subscription services, Research and Reference will become even more powerful.

Frequently Asked Questions

What is the difference between Smart Documents and the Research and Reference features of Office 2003?

Smart documents are another feature of Office 2003. Both features use XML to make data available to information workers. One key functional difference is that smart documents are more suited to manipulating data (both in retrieving data and in saving data back to a database or other location), while Research and Reference is suited more to gathering information. Another key difference is that smart document solutions are attached to a specific document, while Research and Reference is independent of any specific document.

What if I don't have an internet connection?

Of the built-in services, the Thesaurus and Translation services store their information on the client so that, even without an internet connection, they will work. Other built-in services will not yield results offline. Depending on how they are developed, third party services may also store data locally and work without an internet connection.

Can I restrict default services when I deploy Office 2003 throughout my organization?

Yes. Administrators can prevent users from adding services, and they can configure Office's server side sources (including third party ones).

Find Additional Information

For more information, see the following web sites:

Office http://www.microsoft.com/office

©2003-2004 Microsoft Corporation. All rights reserved. Permission to copy, display and distribute this document is available at: http://msdn.microsoft.com/library/en-us/odcXMLRef/html/odcXMLRefLegalNotice.asp