The TCS web page lists a number of search engines. Some are specialized, searching for businesses, people, etc., but most are general-purpose, and those are the ones we will focus on here.
There are really two categories of tools that people can use to search for information on the Internet: Web Directories and true Search Engines. Yahoo (http://www.yahoo.com) is the best known Web Directory; it is a subject-tree style catalog that organizes web pages into 14 major topics, each with subtopics, and each of those with further subtopics, as one moves from the more general to the more specific. When a person wants his web page listed by Yahoo, he can look at this "outline-structure" of the world and pick the two places in that structure where he feels his page most "belongs", and he provides a limited set of search terms that he feels people searching for his page would be likely to choose. Thus we have human intelligence selecting the search terms and the position in the catalog. A person searching with Yahoo will probably not have to wade through a number of pages that really have nothing to do with what he is looking for, but he will be less likely to find pages that just touch on his subject. If there are a lot of pages that specialize in the topic he is interested in, that is fine, but if no pages specialize in it, the Yahoo searcher may be left with "no pages found" even though there are pages out there that touch on the subject.
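As a rough illustration of the directory idea, here is a minimal Python sketch of a subject tree in which a page is filed under the places its author picks and is found only through the search terms its author supplies. The class name, the category names, and the example URL are all made up for illustration; this is not how Yahoo itself is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical subject tree."""
    name: str
    subtopics: dict = field(default_factory=dict)   # name -> Category
    listings: list = field(default_factory=list)    # (url, keyword set) pairs

    def child(self, name):
        """Return the named subtopic, creating it if it does not exist yet."""
        if name not in self.subtopics:
            self.subtopics[name] = Category(name)
        return self.subtopics[name]


def submit(root, url, paths, keywords):
    """File a page under the category paths its author chose.

    The limit of two places follows the description above; the keywords
    are whatever the author supplies, not words read off the page.
    """
    if len(paths) > 2:
        raise ValueError("a page may be listed in at most two places")
    terms = {k.lower() for k in keywords}
    for path in paths:
        node = root
        for name in path:
            node = node.child(name)
        node.listings.append((url, terms))


def search(node, term, trail=()):
    """Return (category path, url) pairs whose author keywords include term."""
    term = term.lower()
    here = trail + (node.name,)
    hits = [(" > ".join(here), url) for url, kws in node.listings if term in kws]
    for sub in node.subtopics.values():
        hits.extend(search(sub, term, here))
    return hits


# Made-up example data.
root = Category("Top")
submit(
    root,
    "http://example.com/users-group",
    [("Computers", "User Groups"), ("Regional", "Oklahoma")],
    ["computer", "Tulsa"],
)
```

Searching this tree for "tulsa" finds the page in both places it was filed, while searching for a word that appears on the page but was not among the author's terms finds nothing, which is the "no pages found" problem described above.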
A true search engine uses software programs known as robots, spiders, worms, or crawlers, which either are asked by a page's author to visit the page or find it on their own as they follow hyperlinks from one document to another around the web. Once a search engine finds a page, it reads all of the information on the page, selects the words that it deems important, and uses them to construct its index. Some search engines ignore commonly occurring stop words such as "a", "an", "the", "is", "and", etc. (I would not suggest using one of them to search for the phrase "To be or not to be"), while others index every word, including the stop words. Words that appear toward the top of a document and words that are repeated several times throughout it are more likely to be deemed important, as are words used in titles, headings, subheadings, etc. Some web page authors try to take advantage of that fact; I have run across adult-oriented sites with a page that repeats the word sex over and over and over again, because the author is hoping that will cause the search engines to list the page first when someone searches on that term. I would not have thought that many people would use search engines to locate adult-oriented pages when there are pages like www.persiankitty.com that seem to list so many of them, but I once ran across a page that listed the words most often used as search terms on search engines, and I was surprised to find that more than half of the words on that list appeared to be looking for naughty pages. I hope those people found what they were searching for; I know that I have found the search engines very helpful for the things I have looked for (mostly things that would not have upset self-appointed web censors like State Rep Fred Perry and Senator Exon).
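The indexing behavior described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the stop-word list, the weights (1 per occurrence, 1 extra for a word among the first ten words of the body, 3 extra for a title word), and the sample pages are all invented for illustration, and no real search engine is claimed to use these numbers.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; each real engine chose its own.
STOP_WORDS = {"a", "an", "the", "is", "and", "to", "be", "or", "not"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score_words(title, body, drop_stop_words=True):
    """Return {word: score} for one page using the assumed weights."""
    scores = defaultdict(int)
    for position, word in enumerate(tokenize(body)):
        scores[word] += 1              # every repetition counts
        if position < 10:
            scores[word] += 1          # bonus for appearing near the top
    for word in tokenize(title):
        scores[word] += 3              # bonus for appearing in the title
    if drop_stop_words:
        for word in STOP_WORDS:
            scores.pop(word, None)
    return dict(scores)


def build_index(pages, drop_stop_words=True):
    """Build an inverted index: word -> {url: score}."""
    index = defaultdict(dict)
    for url, (title, body) in pages.items():
        for word, score in score_words(title, body, drop_stop_words).items():
            index[word][url] = score
    return index


def search(index, query, drop_stop_words=True):
    """Return pages containing every query word, highest total score first."""
    words = tokenize(query)
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if not words:
        return []                      # nothing left to search for
    candidates = None
    for word in words:
        urls = set(index.get(word, {}))
        candidates = urls if candidates is None else candidates & urls
    return sorted(candidates, key=lambda u: (-sum(index[w][u] for w in words), u))


# Made-up sample pages: url -> (title, body).
PAGES = {
    "hamlet.html": ("Hamlet quotes", "To be or not to be, that is the question."),
    "stuffed.html": ("Cars", "cars cars cars cars cars cars cars cars"),
    "honest.html": ("Used cars in Tulsa", "We sell used cars and trucks."),
}
```

With stop words dropped, a search for "To be or not to be" has no words left and returns nothing, while an index built with drop_stop_words=False finds the Hamlet page. A search for "cars" ranks the page that merely repeats the word ahead of the honest one, which is exactly the weakness that keyword-stuffing authors exploit.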
Most search engines basically use keyword indexing: if the exact word, or perhaps a variant such as the plural or the past tense, is used on a page, you will find the page, but if you searched on heart and the page used the term cardiac, you would not find it. This means that you need to think about other words that might be used to describe what you are looking for. Some search engines attempt concept-based indexing. If such an engine found heart used on the same pages as words like coronary, artery, lung, stroke, cholesterol, pump, blood, attack, and arteriosclerosis, it would place heart in the medical/health context and would be likely to pull pages that referenced one of the related concept terms; if it found heart used along with words like flowers, candy, love, passion, and valentine, it would place it in the context of romance and would be likely to pull pages associated with that context. Concept-based indexing is a good idea, but it frequently falls down in practice, since the concept associations are made by a computer program and not by a human being. With those search engines it is even more important to include several terms, so that the search engine will know which subject you are looking for.
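The difference between the two approaches can be sketched in Python. Everything here is an assumption made for illustration: the concept groups are typed in by hand (a real engine would try to derive them from which words occur together), the variant rule is deliberately crude, and the two sample pages are invented.

```python
import re

# Hand-written concept groups, standing in for associations that a real
# engine would try to learn from word co-occurrence.
CONCEPTS = {
    "medical": {"heart", "cardiac", "coronary", "artery", "stroke",
                "cholesterol", "blood"},
    "romance": {"heart", "flowers", "candy", "love", "passion", "valentine"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def variants(word):
    """Crude variants: the word itself, a plural, and a past tense."""
    return {word, word + "s", word + "ed"}


def keyword_search(pages, query):
    """Return pages containing any query word or one of its crude variants."""
    wanted = set()
    for word in tokenize(query):
        wanted |= variants(word)
    return sorted(url for url, text in pages.items()
                  if wanted & set(tokenize(text)))


def guess_concept(query_words):
    """Pick the concept sharing the most words with the query.

    Returns None when nothing overlaps or when two concepts tie,
    that is, when the query does not say which subject is meant.
    """
    overlaps = {name: len(query_words & terms) for name, terms in CONCEPTS.items()}
    best = max(overlaps.values())
    winners = [name for name, count in overlaps.items() if count == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


def concept_search(pages, query):
    """Expand the query with its concept's terms, then rank pages by matches."""
    words = set(tokenize(query))
    concept = guess_concept(words)
    expanded = words | CONCEPTS[concept] if concept else words
    scored = []
    for url, text in pages.items():
        score = len(expanded & set(tokenize(text)))
        if score:
            scored.append((-score, url))
    return [url for _, url in sorted(scored)]


# Made-up sample pages.
PAGES = {
    "clinic.html": "Cardiac care: coronary artery disease and cholesterol.",
    "valentine.html": "Give your heart with flowers and candy.",
}
```

A keyword search for "heart" finds only the valentine page, since the clinic page says cardiac instead. The concept search does no better on "heart" alone, because the word fits both groups equally; adding a second term, as in "heart artery", tips it toward the medical group and brings the clinic page to the top.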
For more information on the Tulsa Computer Society click here