The due date for this assignment is 9/17. This looks like a ton of work, but it's actually not that bad; I was just really obsessive-compulsive with the directions. (I hope they end up being clear enough for you; if not, I hope you'll let me know sooner rather than later.)
In this assignment you will determine where the top 5 (and then the top 10 and top 20) results of one search engine appear in the results of the other search engine. The idea is to determine whether a top result in one search engine is more likely than a lower result to appear in the other search engine. It's a reasonable conjecture, but we want to see if it's actually the case.
 Web Search Engines
 Google and Yahoo Web
 Blog Search Engines
 Google Blog Search and Bloglines
1. Defining the question and queries
You will use the same questions and the same queries that you used in the previous assignment.
2. Gathering your data
You will use the same data and the same data printouts that you used in the previous assignment.
3. Analyzing your data
Report on the results in the following way:
Web search engines
 For the first 5 results in Google, if the result appears in the Yahoo results, then write G5 next to the Yahoo result.
 For the next 5 results in Google (that is, results 6-10), if the result appears in the Yahoo results, then write G10 next to the Yahoo result.
 For the next 10 results in Google (that is, results 11-20), if the result appears in the Yahoo results, then write G20 next to the Yahoo result.
 For the first 5 results in Yahoo, if the result appears in the Google results, then write Y5 next to the Google result.
 For the next 5 results in Yahoo (that is, results 6-10), if the result appears in the Google results, then write Y10 next to the Google result.
 For the next 10 results in Yahoo (that is, results 11-20), if the result appears in the Google results, then write Y20 next to the Google result.
 Pick up the Yahoo results list. You are going to determine values for the GY table.
 In the top 5 results of the Yahoo results,
 over(5,5): count the number of times G5 appears.
 over(10,5): count the number of times either G5 or G10 appears.
 over(20,5): count the number of times either G5 or G10 or G20 appears.
 In the top 10 results of the Yahoo results,
 over(5,10): count the number of times G5 appears.
 over(10,10): count the number of times either G5 or G10 appears.
 over(20,10): count the number of times either G5 or G10 or G20 appears.
 In the top 20 results of the Yahoo results,
 over(5,20): count the number of times G5 appears.
 over(10,20): count the number of times either G5 or G10 appears.
 over(20,20): count the number of times either G5 or G10 or G20 appears.
 Pick up the Google results list. You are going to determine values for the YG table.
 In the top 5 results of the Google results,
 over(5,5): count the number of times Y5 appears.
 over(10,5): count the number of times either Y5 or Y10 appears.
 over(20,5): count the number of times either Y5 or Y10 or Y20 appears.
 In the top 10 results of the Google results,
 over(5,10): count the number of times Y5 appears.
 over(10,10): count the number of times either Y5 or Y10 appears.
 over(20,10): count the number of times either Y5 or Y10 or Y20 appears.
 In the top 20 results of the Google results,
 over(5,20): count the number of times Y5 appears.
 over(10,20): count the number of times either Y5 or Y10 appears.
 over(20,20): count the number of times either Y5 or Y10 or Y20 appears.
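The marking and counting steps above can also be sketched in code as a way to double-check your hand counts. This is only an illustrative Python sketch, not part of what you turn in. The function names tag_results and overlap_table and the one-letter stand-ins for result URLs are made up for this example, and it assumes each URL appears at most once in each results list.

```python
TIERS = (5, 10, 20)


def tag_results(source, target, prefix):
    """Map each target rank to a tag such as 'G5' when that result
    also appears in the top 20 of the source list."""
    tags = {}
    target_top = target[:20]
    for src_rank, url in enumerate(source[:20], start=1):
        # Tier is 5 for ranks 1-5, 10 for ranks 6-10, 20 for ranks 11-20.
        tier = next(t for t in TIERS if src_rank <= t)
        if url in target_top:
            tags[target_top.index(url) + 1] = f"{prefix}{tier}"
    return tags


def overlap_table(tags, prefix):
    """Count tags as in the steps above: over(a, b) counts tags of
    tier a or better within the top b ranks of the marked-up list."""
    table = {}
    for a in TIERS:
        accepted = {f"{prefix}{t}" for t in TIERS if t <= a}
        for b in TIERS:
            table[(a, b)] = sum(
                1 for rank, tag in tags.items() if rank <= b and tag in accepted
            )
    return table


# Hypothetical, shortened results lists (letters stand in for URLs).
google = ["a", "b", "c", "d", "e", "f", "g"]
yahoo = ["b", "x", "f", "y", "z", "a", "w"]

tags = tag_results(google, yahoo, "G")  # G tags written on the Yahoo list
gy = overlap_table(tags, "G")           # values for the GY table
for (a, b), count in sorted(gy.items()):
    print(f"over({a},{b}) = {count}")
```

Calling tag_results(yahoo, google, "Y") and passing the result to overlap_table with the prefix "Y" gives the YG table values in the same way.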
Blog search engines
For the blog search engines you'll go through the same steps.
 For the first 5 results in Google Blog Search, if the result appears in the Bloglines results, then write G5 next to the Bloglines result.
 For the next 5 results in Google Blog Search (that is, results 6-10), if the result appears in the Bloglines results, then write G10 next to the Bloglines result.
 For the next 10 results in Google Blog Search (that is, results 11-20), if the result appears in the Bloglines results, then write G20 next to the Bloglines result.
 For the first 5 results in Bloglines, if the result appears in the Google Blog Search results, then write B5 next to the Google Blog Search result.
 For the next 5 results in Bloglines (that is, results 6-10), if the result appears in the Google Blog Search results, then write B10 next to the Google Blog Search result.
 For the next 10 results in Bloglines (that is, results 11-20), if the result appears in the Google Blog Search results, then write B20 next to the Google Blog Search result.
 Pick up the Bloglines results list. You are going to determine values for the GB table.
 In the top 5 results of the Bloglines results,
 over(5,5): count the number of times G5 appears.
 over(10,5): count the number of times either G5 or G10 appears.
 over(20,5): count the number of times either G5 or G10 or G20 appears.
 In the top 10 results of the Bloglines results,
 over(5,10): count the number of times G5 appears.
 over(10,10): count the number of times either G5 or G10 appears.
 over(20,10): count the number of times either G5 or G10 or G20 appears.
 In the top 20 results of the Bloglines results,
 over(5,20): count the number of times G5 appears.
 over(10,20): count the number of times either G5 or G10 appears.
 over(20,20): count the number of times either G5 or G10 or G20 appears.
 Pick up the Google Blog Search results list. You are going to determine values for the BG table.
 In the top 5 results of the Google Blog Search results,
 over(5,5): count the number of times B5 appears.
 over(10,5): count the number of times either B5 or B10 appears.
 over(20,5): count the number of times either B5 or B10 or B20 appears.
 In the top 10 results of the Google Blog Search results,
 over(5,10): count the number of times B5 appears.
 over(10,10): count the number of times either B5 or B10 appears.
 over(20,10): count the number of times either B5 or B10 or B20 appears.
 In the top 20 results of the Google Blog Search results,
 over(5,20): count the number of times B5 appears.
 over(10,20): count the number of times either B5 or B10 appears.
 over(20,20): count the number of times either B5 or B10 or B20 appears.
4. Summarize your data
Create two tables for each pair of search engines (four tables in all) with the following structures:
This table provides a measure of how much of Google's responses are reproduced by Yahoo.

This table provides a measure of how much of Yahoo's responses are reproduced by Google.


This table provides a measure of how much of Bloglines' responses are reproduced by Google Blog Search.

This table provides a measure of how much of Google Blog Search's responses are reproduced by Bloglines.

over(a,b) is the overlap between the top a results of the left search engine and the top b results of the top search engine. In this case, we're not going to do percentages; each overlap value should simply be a count. The diagonal terms in the top two tables should be the same, and the diagonal terms in the bottom two tables should be the same as well. Further, the over(20,20) term should be the same as the overlap figure that you calculated in the previous assignment.
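One way to sanity-check your tables is to compute over(a,b) directly as the size of a set intersection and confirm the diagonal property described above. The following is a minimal Python sketch under stated assumptions: each results list is a ranked list of unique URLs, and the names over and build_table and the sample data are hypothetical.

```python
SIZES = (5, 10, 20)


def over(left, top, a, b):
    """Count results shared by the top a of `left` and the top b of `top`."""
    return len(set(left[:a]) & set(top[:b]))


def build_table(left, top):
    """Return all nine over(a, b) values keyed by (a, b)."""
    return {(a, b): over(left, top, a, b) for a in SIZES for b in SIZES}


# Hypothetical, shortened results lists (letters stand in for URLs).
google = ["a", "b", "c", "d", "e", "f", "g"]
yahoo = ["b", "x", "f", "y", "z", "a", "w"]

gy = build_table(google, yahoo)  # Google on the left, Yahoo on the top
yg = build_table(yahoo, google)  # Yahoo on the left, Google on the top

# The diagonal terms of the two tables should agree.
diagonal_matches = all(gy[(n, n)] == yg[(n, n)] for n in SIZES)
print("Diagonals match:", diagonal_matches)
```

Because a set intersection does not depend on order, each table is also the transpose of the other: the over(a,b) entry in one table equals the over(b,a) entry in the other. If your hand counts break either pattern, recheck your markup.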
5. What you need to do
You need to add your data to this page. You also need to turn in the following pages to me:
 A title page with the following:
 Search Tool Overlap Data
 BIT330: Fall 2008
 The date
 Your name
 Your uniqname
 One page containing the following information. Keep a copy of this page for your records, as you will need it for your analysis writeup.
 The text description of the query that you submitted to the Web search engines.
 The search queries that you submitted to each of the three Web search engines.
 The text description of the query that you submitted to the blog search engines.
 The search queries that you submitted to each of the three blog search engines.
 An appendix to this report containing the three sets of your original (and fully marked-up) query results. This is the key piece of this assignment that you must turn in so that you can get credit for this assignment.
 It should be stapled in the upper left corner. I do not want any clips or folded papers or report covers. I will bring a stapler to class that day.