Document: Module 6: Adding and Managing External Content


Contents

Overview
Components of a SharePoint Portal Server Search
Adding Content Sources
Managing Content Sources
Lab A: Adding External Content to a Workspace
Review

Module 6: Adding and Managing External Content

Information in this document is subject to change without notice. The names of companies, products, people, characters, and/or data mentioned herein are fictitious and are in no way intended to represent any real individual, company, product, or event, unless otherwise noted. Complying with all applicable copyright laws is the responsibility of the user. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Microsoft Corporation. If, however, your only means of access is electronic, permission to print one copy is hereby granted.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2001 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, ActiveX, FrontPage, JScript, MS-DOS, NetMeeting, Outlook, PowerPoint, SharePoint, Windows, Windows NT, Visio, Visual Basic, Visual SourceSafe, Visual Studio, and Win32 are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries. Other product and company names mentioned herein may be the trademarks of their respective owners.

Instructor Notes

This module provides students with the information necessary to add and manage a Microsoft® SharePoint™ Portal Server content source.
After completing this module, students will be able to:
• Describe the components that are used in the searching and indexing features of SharePoint Portal Server.
• Define content source and describe the types of content that are supported, how a content source is used, and how to add a content source.
• Manage a content source by setting schedules, scope, and rules, and describe additional functions that apply to content sources.

Materials and Preparation

This section provides the materials and preparation tasks that you need to teach this module.

Required Materials

To teach this module, you need the Microsoft PowerPoint® file 2095a_6.ppt.

Preparation Tasks

To prepare for this module, you should:
• Read all of the materials for this module.
• Complete the lab.

Instructor Setup for a Lab

This section provides setup instructions that are required to prepare the instructor computer or classroom configuration for a lab.

Lab A: Adding External Content to a Workspace

To prepare for the lab:
• Configure the classroom according to the setup guide for Course 2095A.

Presentation: 60 minutes
Lab: 30 minutes

Module Strategy

Use the following strategy to present this module:
• Components of a SharePoint Portal Server Search
  Describe the five components of a SharePoint Portal Server search, which include the Gatherer, IFilters, word breakers and noise words, plug-ins, and indexing databases. Describe the function of each of these components and then briefly explain how each component works.
• Adding Content Sources
  Explain that SharePoint Portal Server provides access to content that is stored outside the workspace and that this content is referred to as a content source. Describe the basic features of content sources and then explain how to add various content sources to a Content Sources folder.
• Managing Content Sources
  Explain that once a content source has been added, it must be managed to ensure that it is used effectively during searches. Discuss how to manage a content source by configuring crawl settings, search scopes, index updates, rules, Gatherer log files, and discussion settings, as well as other management functions.

Customization Information

This section identifies the lab setup requirements for a module and the configuration changes that occur on student computers during the labs. This information is provided to assist you in replicating or customizing Training and Certification courseware.

Important: The lab in this module is also dependent on the classroom configuration that is specified in the Customization Information section in the Classroom Setup Guide for Course 2095A, Implementing Microsoft® SharePoint™ Portal Server 2001.

Lab Setup

The following list describes the setup requirements for the lab in this module.

Setup Requirement 1

The lab in this module requires no additional configuration. To prepare student computers to meet this requirement, perform the following actions:
• Configure the instructor computer according to the classroom setup guide for Course 2095A.
• Configure the student computers according to the classroom setup guide for Course 2095A.

Lab Results

There are no configuration changes on student computers that affect replication or customization.

Overview

• Components of a SharePoint Portal Server Search
• Adding Content Sources
• Managing Content Sources

***************************** ILLEGAL FOR NON-TRAINER USE *****************************

Microsoft® SharePoint™ Portal Server 2001 stores content that is both internal and external to the workspace. A content source is used to specify a set of content that is stored outside the workspace.
The Microsoft Search (MSSearch) service is a full-text indexing and search engine that is used to crawl and retrieve this content and to create and update indexes for it. This module discusses this process and examines the use of content sources for accessing content that is external to the SharePoint Portal Server computer.

After completing this module, you will be able to:
• Describe the components that are used in the searching and indexing features of SharePoint Portal Server.
• Define content source and describe the types of content that are supported, how a content source is used, and how to add a content source.
• Manage a content source by setting schedules, scope, and rules, and describe additional functions that apply to content sources.

Topic Objective: To provide an overview of the module topics and objectives.
Lead-in: In this module, you will learn about adding and managing content with SharePoint Portal Server.

Components of a SharePoint Portal Server Search

• The Gatherer
• IFilters
• Word Breakers and Noise Words
• Plug-Ins
• Indexing Database

This topic provides an overview of the technology that is used in the searching and indexing features of SharePoint Portal Server. These components are used to create and manage content sources.

Topic Objective: To outline this topic.
Lead-in: In this topic, we will examine the components of MSSearch.

The Gatherer

[Slide figure: the filter daemon process, with stages labeled Accessing, Filtering, and Indexing]

• Core Component of MSSearch
• Manages How Content Is Accessed, Filtered, and Indexed
• Includes Native and Registered Protocol Handlers

The Microsoft Gatherer performance object is the core component of MSSearch.
As SharePoint Portal Server processes transactions on your system, it generates performance data that Windows 2000 can track and log. This data is described as a performance object and is typically named for the component generating the data. The Gatherer manages the way that content is accessed, filtered, and indexed.

Topic Objective: To explain the function of the Gatherer.
Lead-in: In this topic, we will examine the Gatherer, a core component of SharePoint Portal Server MSSearch.

How the Gatherer Works

The Gatherer runs inside MSSearch and interacts with a separate filter daemon process (mssdmn.exe) that performs data access and content filtering. The following steps describe how the Gatherer works:

1. The filter daemon uses protocol handlers and IFilters to extract data. These filters are data type-specific components that SharePoint Portal Server uses to communicate with and filter the documents in the content source.
2. The Gatherer runs the data through a series of plug-ins to process and filter the data. Plug-ins are used to interpret the data and properties as they are pulled from the documents in a content source.
3. The data passes through the plug-ins before the index is created and the document properties are saved to an index database (Microsoft Jet property store).

Note: A Jet property store is separate from the Microsoft Web Storage System used by SharePoint Portal Server.

Using Protocol Handlers to Access Data Store Content

The Gatherer accesses documents in a data store by using the appropriate protocol by way of a protocol handler interface. The protocol handler, which has no relation to a network protocol, is an interface between the index and SharePoint Portal Server.
When the Gatherer processes a Uniform Resource Locator (URL) during indexing, the filter daemon determines which protocol handler to use based on the URL prefix, loads the associated dynamic-link library (DLL), and passes the URL and security credentials to the protocol handler.

Native Protocol Handlers

SharePoint Portal Server includes native protocol handlers, or handlers that ship with the product, for Hypertext Transfer Protocol (HTTP), file, Microsoft Exchange 5.5, Microsoft Exchange 2000 Server, and Lotus Notes. Exchange 2000 and SharePoint Portal Server share the Web Storage System technology and the same protocol handler. This protocol handler accesses a local Web Storage System by using Microsoft OLE DB Provider for Exchange 2000 Server (EXOLEDB) and uses Web Distributed Authoring and Versioning (WebDAV) to access the Web Storage System on a remote Exchange or SharePoint Portal Server computer.

Registered Protocol Handlers

The following table lists the registered protocol handlers that are included with SharePoint Portal Server.

Prefix         DLL             ProgID
File           Mssph.dll       MSSearch.FileHandler.1
HTTP           Mssph.dll       MSSearch.HttpHandler.1
Exch           Mssexph.dll     MSSearch.MapiHandler.1
PKM Exstore    Pkmexsph.dll    PKM.ExstoreHandler.1
Notes          Notesph.dll     MSSearch.NotesHandler.1

Gatherer Project

A search application can have one or more Microsoft Gatherer Projects performance objects. Gatherer Projects are located inside a search application, such as SharePoint Portal Server. SharePoint Portal Server has one Gatherer Project for each internal or external workspace. These workspaces have their own settings, such as indexing schedules. The Search service uses Gatherer Projects to keep each workspace separate so that it can have its own schedule. A SharePoint Portal Server workspace is a Gatherer Project with its own index. Each Gatherer Project contains its own set of build parameters, crawl restrictions, and plug-ins.
Each Gatherer Project contains its own run-time transaction log containing all URLs to be crawled and maintains its own statistics.

IFilters

• Extract Content and Properties from Documents
• Open Data Streams and Expose the Data as Indexable Chunks
• SharePoint Portal Server Provides IFilters for: Office (offfilt.dll), HTML (nlhtml.dll), Text (query.dll), MIME (mimefilt.dll), TIFF (mspfilt.dll), Null Filter (tquery.dll)

Topic Objective: To explain the function of IFilters.
Lead-in: In this topic, we will examine how filters extract content and properties from documents for indexing.

IFilters are the components of MSSearch that extract a document's content and its properties.

How IFilters Work

During the filter daemon process, IFilters open data streams and expose the data so that it can be indexed. In particular, the Hypertext Markup Language (HTML) filter strips a document of all HTML tags and emits various HTML syntactic elements as properties, such as author or title, and also emits the body text. Each file type, indicated by its file extension, has an IFilter associated with it. SharePoint Portal Server provides IFilters for HTML, Microsoft Office, text, Multipurpose Internet Mail Extensions (MIME), and Tagged Image File Format (TIFF).

Note: You should convert documents created using Office applications to Office 95 or later. The Office IFilter does not expose document properties of older Office documents.

IFilter DLLs

The following table lists the IFilters that are included with SharePoint Portal Server.

IFilter        DLL
Office         offfilt.dll
HTML           nlhtml.dll
Text           query.dll
MIME           mimefilt.dll
TIFF           mspfilt.dll
Null filter    tquery.dll

[...]
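The HTML filter behavior described under "How IFilters Work" (strip the tags, emit elements such as author and title as properties, and emit the body text) can be illustrated with a small Python sketch. This is only an analogy built on Python's standard html.parser module; it is not the nlhtml.dll implementation or the IFilter COM interface, and the class name HtmlFilterSketch, the function filter_html, and the sample document are all hypothetical.

```python
from html.parser import HTMLParser


class HtmlFilterSketch(HTMLParser):
    """Toy stand-in for an HTML filter: collects properties and body text."""

    def __init__(self):
        super().__init__()
        self.properties = {}    # document properties, e.g. title and author
        self.body_chunks = []   # body text with all tags stripped
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "author":
            # <meta name="author" content="..."> is emitted as a property
            self.properties["author"] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.properties["title"] = text
        elif self._in_body:
            self.body_chunks.append(text)


def filter_html(document):
    """Return (properties, body_text) for an HTML string."""
    parser = HtmlFilterSketch()
    parser.feed(document)
    parser.close()
    return parser.properties, " ".join(parser.body_chunks)


# Hypothetical sample document used only to exercise the sketch.
sample = (
    "<html><head><title>Quarterly Report</title>"
    '<meta name="author" content="A. Author"></head>'
    "<body><h1>Summary</h1><p>Sales <b>rose</b> this quarter.</p></body></html>"
)
properties, body = filter_html(sample)
print(properties)
print(body)
```

The split into a properties dictionary and a separate body string mirrors the distinction the module draws between document properties, which go to the property store, and the text that is indexed.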
... lists and shadow indexes now exists only in the master index.

Adding Content Sources

Topic Objective: To outline this topic.
Lead-in: In this section, you will learn about the basic procedure for adding a content source.

• Adding a Content Source
• Adding a Web Content Source
• Adding an Exchange 5.5 Content Source
• Adding an Exchange 2000 Content Source
• Adding ...

... basic features of content sources and how to add them to your Content Sources folder.

Adding a Content Source

Topic Objective: To describe the function of a content source as well as how to add a content source.
Lead-in: In this topic, you will learn how to prepare for adding a content source.

[Slide figure: Index Management, Content Sources, Content, Users]

... you crawl content or create an index.

Adding a Content Source to the Workspace

To add a content source, you use the Content Source Wizard in the Content Sources folder under the Management folder. Before you can add a content source to your workspace, you must have read access to the source, know where the content source files are stored, and know how ...

... workspace index and is available for users to search for and view on the dashboard site.

Note: For information about content access accounts, see Module 9, "Managing SharePoint Portal Server," in Course 2095A, Implementing Microsoft® SharePoint™ Portal Server 2001.

Adding a Web Content Source

Topic Objective: To describe how to add a Web content source ...
... database
• The address, a URL containing the host name and a path, that is required to locate the content
• Additional parameters that control how the index of the content is created

Types of Content Sources

When you add a content source to the Content Sources folder, you must provide an address or URL for that content. The following table lists the types of ...

... http://serverA/public/folderA and it is redirected to http://serverB/public, you must create an additional site path rule for http://serverB.

Adding a Lotus Notes Content Source

Topic Objective: To describe how to add a Lotus Notes content source.
Lead-in: In this topic, we will explore how to add a Lotus Notes content source.

• Special Planning and Configuration
• Special Planning and ...

... site to follow all links, make sure that you are aware of the depth and size of the site. You might use excessive bandwidth and not have enough disk space to crawl large sites.

Adding an Exchange 5.5 Content Source

Topic Objective: To describe how to add an Exchange 5.5 content source.
Required: The Outlook 2000 client must be installed.
Lead-in: The ...

... Exchange 5.5 content source. However, Site Server required MSSearch to run in the context of the Exchange Administrator account. With SharePoint Portal Server, the service runs as the local system account and impersonates the Exchange account only when crawling and performing security validations on search results.

Adding an Exchange 2000 Content Source ...
... stop and start MSSearch for the changes to be effective.

Managing Content Sources

Topic Objective: To outline this topic.
Lead-in: In this topic, you will learn how to manage a content source to facilitate effective searching.

• Configuring Crawl Settings
• Configuring Search Scopes
• Configuring Index Updates
• Configuring Rules
• Configuring Gatherer Log File and ...

Word Breakers and Noise Words

Topic Objective: To explain the function of word breakers and noise words.
Lead-in: In this topic, we will examine how word breakers and noise words are used to facilitate indexing.

Word Breakers
• Break words apart
• Remove punctuation and symbols
• Follow language-specific rules
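As a rough illustration of the word-breaker behavior listed above (break words apart, remove punctuation and symbols), the following Python sketch splits text into lowercase terms and then drops noise words, that is, common words that a search engine typically leaves out of its index. The regular expression is a simplistic English-oriented rule, and the NOISE_WORDS set and both function names are hypothetical; the actual word breakers and noise-word lists used by MSSearch are language-specific and are not reproduced here.

```python
import re

# Hypothetical noise-word list, for illustration only.
NOISE_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def break_words(text):
    """Split text into lowercase words, discarding punctuation and symbols."""
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", text)]


def index_terms(text, noise_words=NOISE_WORDS):
    """Return the words that would be kept for indexing in this sketch."""
    return [word for word in break_words(text) if word not in noise_words]


sample = "The Gatherer crawls, filters, and indexes the content."
print(break_words(sample))
print(index_terms(sample))
```

Keeping the two steps as separate functions reflects that word breaking depends on the language of the text, while the noise-word list is simply data that can be swapped out.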

Date posted: 10/12/2013, 16:15
