04 Search Techniques And Strategies
We go over several standard search techniques and strategies.
Class held on 09/15/2008. Student notes are available on this page. Possible questions are available on this page.
Class structure
- Go through “At beginning of class” info
- Lecture through the slides.
- Talk through the examples
- Go through “At end of lecture”
At beginning of class
- Today's office hours are cancelled; I have to take my wife to the hospital for a CT scan.
- Check who is doing class notes for today
- If you have questions, please post them to the class's discussion forum; I check it frequently — much more frequently than I check email on the weekends.
- Students should go over announcements made since the previous class
- Collect assignments due today
- Remind about assignment due in the next class
- Your first possible blog entry (on today's exercises) could be turned in next class (see the schedule for details on the timing of blog entries)
- Sentence added to the Search Tool Data Analysis assignment: Proper use of statistical tests certainly would strengthen your arguments.
- Industry updates:
- Search industry update: Lesley's industry-update-9-15-2008
- Brian's Google update
My notes
- Special search syntax: These operators let you target a search at specific parts of a document. Because text in different parts of a document means different things and performs different functions, restricting a query to one part can raise its precision.
- Full text search engines
- Title — intitle:
- Site — site:
- Top-level domain — site:
- URL contents — inurl:
- Links — link:
- Unique words and phrases: Using multiple unique words and phrases is key both to reducing the number of documents retrieved and to raising the precision of your queries. It also increases the chances of retrieving content-filled (“meaty”) documents.
- They can be used to focus in on more specialized pages that would use those terms
- Gather related words using summaries
- Use search engines to find related words
- Example at Ask.com (both “Narrow your search” and “Expand your search”)
- Google
- Google Suggest feature
- “Related searches” at bottom of search results window
- Yahoo
- Yahoo Search Assist feature
- “Also try” at top or bottom of search results window
- Yahoo Directory (we'll cover this in a future class) can point in the right direction
- Use "means" queries
- Query specificity
- Narrow to more general: this is when you have a very good idea of what you're looking for.
- More general to narrow: this is when you don't know what you're looking for.
- Alternative naming
- People
- Using different name forms can return different information
- Sometimes you have to use other information to differentiate two identically named people
- Also, search specifiers can help target the information (intitle, site type, include, exclude)
- Places
- Use addresses (streets, zips, area codes, phone numbers)
- Use "official"
In-class examples
Special search syntax
- Tigers
- Tigers but not Detroit Tigers.
- Information from an organization
- Information from an organization or a government
- Information from a zoo
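As a rough illustration, the goals above could be written as queries using the operators from the notes. This is a minimal Python sketch: `build_query` and its parameters are hypothetical helpers that only assemble text, and the four queries are assumptions about how each goal might be expressed, not necessarily the ones used in class.

```python
def build_query(terms, exclude=(), sites=(), intitle=()):
    """Assemble a search query string from plain terms and operators.

    Hypothetical helper for illustration: it only builds text and does not
    contact any search engine.
    """
    parts = list(terms)
    # intitle: restricts a word to the page title.
    parts += [f"intitle:{word}" for word in intitle]
    # A leading minus sign excludes pages that contain the word.
    parts += [f"-{word}" for word in exclude]
    # site: accepts a domain or a top-level domain such as org or gov.
    if len(sites) == 1:
        parts.append(f"site:{sites[0]}")
    elif len(sites) > 1:
        parts.append("(" + " OR ".join(f"site:{s}" for s in sites) + ")")
    return " ".join(parts)


# Assumed queries for the four goals listed above.
tigers_not_detroit = build_query(["tigers"], exclude=["detroit"])
from_organization = build_query(["tigers"], sites=["org"])
from_org_or_gov = build_query(["tigers"], sites=["org", "gov"])
from_zoo = build_query(["tigers"], intitle=["zoo"])

for query in (tigers_not_detroit, from_organization, from_org_or_gov, from_zoo):
    print(query)
```

Keeping the operators as separate arguments makes it easy to change one restriction at a time and compare the result counts.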
Unique words and phrases
- Bunch of birds example
- Use "means" and "definition" queries: Hydrocephalus
- Ask — hydrocephalus — look at "Narrow your search" and "Expand your search"
- Yahoo directory — hydrocephalus
- Google — hydrocephalus — 2.34 million documents (2.26 million in 2007); note the "Refine results" part of the page. Also note the “definition” link near the top of the page.
- Google — hydrocephalus means — 385,000 documents (789,000 in 2007)
- Google — "hydrocephalus means" — 844 documents (415 in 2007)
- Google — intitle:hydrocephalus (intitle:means OR intitle:definition) — 470 documents (200 in 2007)
- Google — "hydrocephalus means" (site:edu OR site:org OR site:gov) — 44 documents (131 in 2007).
- Related words: Investment guidance
- investment guidance — 487,000 documents (4.48 million in 2007)
- "investment guidance" — 82,800 documents (71,700 in 2007)
- investment guidance financial goals stocks bonds portfolio — 235,000 documents (1.62 million in 2007)
- "investment guidance" financial goals stocks bonds portfolio — 13,100 documents (10,900 in 2007)
- Fun with quotes
- "statistical analysis" means — 26 million documents (21.5 million in 2007)
- "statistical analysis" "means" — 4.73 million documents (7.04 million in 2007)
- Lyrics
- Google — "big rock stars" nickelback lyrics "we all just" "drugs come cheap" — 34 results (6 results in 2007, and they were all good)
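To see how sharply each refinement narrows the result set, the 2008 counts from the hydrocephalus examples can be compared with the plain one-word query. The sketch below is a minimal Python illustration; the name `share_of_baseline` is hypothetical, phrase queries are written with double quotes, and the counts are the ones reported in the notes, not fresh measurements.

```python
# 2008 result counts taken from the hydrocephalus examples above.
counts_2008 = [
    ("hydrocephalus", 2_340_000),
    ("hydrocephalus means", 385_000),
    ('"hydrocephalus means"', 844),
    ("intitle:hydrocephalus (intitle:means OR intitle:definition)", 470),
    ('"hydrocephalus means" (site:edu OR site:org OR site:gov)', 44),
]


def share_of_baseline(counts):
    """Return (query, count, percent of the first query's count) tuples."""
    baseline = counts[0][1]
    return [(query, n, 100.0 * n / baseline) for query, n in counts]


for query, n, percent in share_of_baseline(counts_2008):
    print(f"{n:>9,}  {percent:8.4f}%  {query}")
```

Since search engines report estimated counts, the percentages are best read as orders of magnitude rather than exact figures.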
Query specificity
- Dog breed information
- Google — dog breed cavalier king charles spaniel — 355,000 documents (888,000 in 2007)
- Google — dog breed "cavalier king charles spaniel" — 890,000 documents (535,000 in 2007)
- Google — dog breed intitle:"cavalier king charles spaniel" — 26,200 documents (15,400 in 2007)
- Yahoo Directory — dog breed "cavalier king charles spaniel" — 69 documents (same as in 2007)
- Dog breed disease information
- Google — "cavalier king charles spaniel" "heart problem" OR "heart murmur" OR "mitral valve" — 7,710 documents (22,900 in 2007)
- Google — intitle:"cavalier king charles spaniel" "heart problem" OR "heart murmur" OR "mitral valve" — 250 documents
- Yahoo — dog breed "cavalier king charles spaniel" "heart problem" — no documents
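The progression in the dog breed examples (all words, then an exact phrase, then the phrase in the title) can be generated mechanically. The following is a minimal Python sketch, assuming double quotes mark a phrase; `specificity_ladder` and `any_of` are hypothetical helper names, and the code only builds query strings.

```python
def specificity_ladder(context_terms, phrase):
    """Return queries ordered from most general to most narrow."""
    context = " ".join(context_terms)
    return [
        f"{context} {phrase}",            # all words, in any order
        f'{context} "{phrase}"',          # words must appear as an exact phrase
        f'{context} intitle:"{phrase}"',  # exact phrase must appear in the title
    ]


def any_of(phrases):
    """Join quoted phrases with OR so that any one of them may match."""
    return " OR ".join(f'"{p}"' for p in phrases)


breed = "cavalier king charles spaniel"
ladder = specificity_ladder(["dog", "breed"], breed)
disease_query = f'"{breed}" ' + any_of(
    ["heart problem", "heart murmur", "mitral valve"]
)

for query in ladder + [disease_query]:
    print(query)
```

Working down the ladder is the "more general to narrow" direction; reading it bottom-up gives "narrow to more general".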
Alternative naming
People
- George Washington information
- "George Washington" biography -site:com -"Carver" — this returns 1.22 million documents (1.06 million in 2007).
- intitle:"George Washington" biography -site:com -"Carver" — 218,000 documents (240,000 in 2007)
- "George Washington" biography -Carver — one whole category on George Washington, plus 76 other categories (74 in 2007) that are related to this query.
- Stephen Hawking (as a name example)
- Stephen Hawking — 3.61 million documents (2.27 million in 2007)
- "Stephen Hawking" — 3.86 million documents (2.12 million in 2007)
- Note that this makes no sense when compared with the previous result, at least not given my understanding of how Google should operate: the quoted phrase should match only a subset of the pages that match the two unquoted words, so its count should not be higher.
- intitle:"Stephen Hawking" — 61,300 documents (63,100 in 2007)
- intitle:"Stephen * Hawking" — 9,310 documents (9,190 in 2007)
- intitle:"Stephen * Hawking" OR intitle:"Stephen Hawking" — 62,900 documents (75,200 in 2007)
- "Hawking, Stephen" — 535,000 documents (241,000 in 2007) — library and books, mostly
- "Hawking, Stephen W." — 72,100 documents (53,200 in 2007) — again, library and books, mostly.
- "Hawking, Stephen William" — 20,400 documents (13,900 in 2007) — lots of encyclopedia type entries.
- Levi Strauss (since there are two/three of them)
- "Levi Strauss" — 3.97 million documents (2.24 million in 2007)
- "Levi Strauss" -french -france -philosopher — 2.21 million documents (2.06 million in 2007)
- intitle:"Levi Strauss" — 78,100 documents (68,200 in 2007)
- intitle:"Levi Strauss" -french -france -philosopher — 66,300 documents (53,700 in 2007)
- intitle:"Levi Strauss" (french OR france OR philosopher) — 9,680 documents (15,900 in 2007)
- intitle:"Levi Strauss" bavaria germany — 241 documents (48 in 2007)
- intitle:"Levi Strauss" claude (french OR france OR philosopher) — 1,160 documents (556 in 2007)
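The name forms tried in the Stephen Hawking examples, and the exclusions used to separate the different people named Levi Strauss, follow a pattern that can be sketched in code. This is a minimal Python illustration; `name_variants` and `exclude_terms` are hypothetical helpers, and the set of variants is limited to the forms that appear in the examples above.

```python
def name_variants(first, last, middle=None):
    """Return quoted phrase queries for common forms of a person's name."""
    variants = [
        f'"{first} {last}"',    # natural order
        f'"{first} * {last}"',  # wildcard stands in for a middle name or initial
        f'"{last}, {first}"',   # catalog order, common in library records
    ]
    if middle:
        variants.append(f'"{last}, {first} {middle[0]}."')  # middle initial
        variants.append(f'"{last}, {first} {middle}"')      # full middle name
    return variants


def exclude_terms(query, unwanted):
    """Append a minus-prefixed term for each word that marks the wrong person."""
    return " ".join([query] + [f"-{word}" for word in unwanted])


hawking = name_variants("Stephen", "Hawking", middle="William")
levi = exclude_terms('"Levi Strauss"', ["french", "france", "philosopher"])

for query in hawking + [levi]:
    print(query)
```

Each variant can also be wrapped in intitle: to restrict it to page titles, as several of the examples do.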
Places
- Pizza places in Ann Arbor
- pizza "ann arbor" — 1.19 million documents (887,000 in 2007). This brings up the following sites right at the top:
- pizza "ann arbor" william — 547,000 documents (629,000 in 2007)
- (734) 669-6973
- pizza 734 — 1.81 million documents (1.36 million in 2007)
- The Sears Tower (as a landmark)
- "sears tower" — 1.44 million documents (1.49 million in 2007)
- "sears tower" official — 1.11 million documents (257,000 in 2007)
- intitle:"sears tower" — 28,800 documents (19,000 in 2007)
- intitle:"sears tower" official — 28,700 documents (1,440 in 2007)
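The place examples combine a subject with pieces of an address, or a landmark name with the word "official". A minimal Python sketch of that pattern follows; `place_queries` and `landmark_queries` are hypothetical helpers that reproduce the query shapes shown above and do not run any searches.

```python
def place_queries(what, city, street=None, area_code=None):
    """Build queries that pin a subject to a place using address details."""
    queries = [f'{what} "{city}"']
    if street:
        # A street name narrows the city-wide query.
        queries.append(f'{what} "{city}" {street}')
    if area_code:
        # An area code can stand in for the city name.
        queries.append(f"{what} {area_code}")
    return queries


def landmark_queries(name):
    """Build phrase and title queries for a landmark, with and without 'official'."""
    return [
        f'"{name}"',
        f'"{name}" official',
        f'intitle:"{name}"',
        f'intitle:"{name}" official',
    ]


pizza = place_queries("pizza", "ann arbor", street="william", area_code="734")
tower = landmark_queries("sears tower")

for query in pizza + tower:
    print(query)
```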
At end of lecture
- Start working on today's exercises. The exercises are on this page. You should work on them for no more than another hour outside of class; we will have more time in the next class after the lecture to continue working on them before going on to that day's exercises.
- If you are late turning in today's assignment, you still should go through the effort of posting the information to the results page — the analysis assignment that you will be doing depends on having this information.
- If you are going to write a blog related to today's exercises, be sure to review the blogging guidelines before doing so.
page revision: 38, last edited: 16 Sep 2008 20:00