Search Tool Overlap Data

The due date for this assignment is 9/17. This looks like a ton of work, but it's actually not that bad; I was just really obsessive-compulsive with the directions. (I hope they end up being clear enough for you; if not, please let me know sooner rather than later.)

In this assignment you will determine where the top 5, top 10, and top 20 results of one search engine appear in the results of the other search engine. The goal is to test whether a top-ranked result in one search engine is more likely than a lower-ranked result to also appear in the other engine's results. It's a reasonable conjecture, but we want to see whether it actually holds.

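You will compare two pairs of search engines:

• Web search engines: Google and Yahoo
• Blog search engines: Google Blog Search and Bloglines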

### 1. Defining the question and queries

You will use the same questions, queries, data, and data print-outs that you used in the previous assignment.

Report on the results in the following way:

#### Web search engines

1. For the first 5 results in Google, if the result appears in the Yahoo results, then write G5 next to the Yahoo result.
2. For the next 5 results in Google (that is, results 6-10), if the result appears in the Yahoo results, then write G10 next to the Yahoo result.
3. For the next 10 results in Google (that is, results 11-20), if the result appears in the Yahoo results, then write G20 next to the Yahoo result.
4. For the first 5 results in Yahoo, if the result appears in the Google results, then write Y5 next to the Google result.
5. For the next 5 results in Yahoo (that is, results 6-10), if the result appears in the Google results, then write Y10 next to the Google result.
6. For the next 10 results in Yahoo (that is, results 11-20), if the result appears in the Google results, then write Y20 next to the Google result.
7. Pick up the Yahoo results list. You are going to determine values for the GY table.
   • In the top 5 results of the Yahoo results:
      • over(5,5): count the number of times G5 appears.
      • over(10,5): count the number of times either G5 or G10 appears.
      • over(20,5): count the number of times either G5 or G10 or G20 appears.
   • In the top 10 results of the Yahoo results:
      • over(5,10): count the number of times G5 appears.
      • over(10,10): count the number of times either G5 or G10 appears.
      • over(20,10): count the number of times either G5 or G10 or G20 appears.
   • In the top 20 results of the Yahoo results:
      • over(5,20): count the number of times G5 appears.
      • over(10,20): count the number of times either G5 or G10 appears.
      • over(20,20): count the number of times either G5 or G10 or G20 appears.
8. Pick up the Google results list. You are going to determine values for the YG table.
   • In the top 5 results of the Google results:
      • over(5,5): count the number of times Y5 appears.
      • over(10,5): count the number of times either Y5 or Y10 appears.
      • over(20,5): count the number of times either Y5 or Y10 or Y20 appears.
   • In the top 10 results of the Google results:
      • over(5,10): count the number of times Y5 appears.
      • over(10,10): count the number of times either Y5 or Y10 appears.
      • over(20,10): count the number of times either Y5 or Y10 or Y20 appears.
   • In the top 20 results of the Google results:
      • over(5,20): count the number of times Y5 appears.
      • over(10,20): count the number of times either Y5 or Y10 appears.
      • over(20,20): count the number of times either Y5 or Y10 or Y20 appears.
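If you also keep the result lists in a file, the marking and counting above can be double-checked with a short script. The following is a minimal Python sketch, not a required part of the assignment. It assumes each list is an ordered list of URL strings (rank 1 first), that "appears in" means an exact string match, and that the helper names `band_label`, `mark`, and `overlap_table` and the sample URLs are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SIZES = (5, 10, 20)

def band_label(rank: int, prefix: str) -> Optional[str]:
    """Label for a 1-based rank: 1-5 -> prefix+'5', 6-10 -> prefix+'10', 11-20 -> prefix+'20'."""
    if 1 <= rank <= 5:
        return prefix + "5"
    if 6 <= rank <= 10:
        return prefix + "10"
    if 11 <= rank <= 20:
        return prefix + "20"
    return None

def mark(source: List[str], target: List[str], prefix: str) -> List[Optional[str]]:
    """Steps 1-3 (or 4-6): the label to write next to each of target's top 20 results."""
    rank_in_source: Dict[str, int] = {}
    for rank, url in enumerate(source[:20], start=1):
        rank_in_source.setdefault(url, rank)  # if a URL repeats, keep its best rank
    labels: List[Optional[str]] = []
    for url in target[:20]:
        rank = rank_in_source.get(url)
        labels.append(band_label(rank, prefix) if rank is not None else None)
    return labels

def overlap_table(source: List[str], target: List[str], prefix: str) -> Dict[Tuple[int, int], int]:
    """Step 7 (or 8): over(a, b) counts labels from source's top a within target's top b."""
    labels = mark(source, target, prefix)
    counted = {5: {prefix + "5"},
               10: {prefix + "5", prefix + "10"},
               20: {prefix + "5", prefix + "10", prefix + "20"}}
    return {(a, b): sum(1 for label in labels[:b] if label in counted[a])
            for a in SIZES for b in SIZES}

# Made-up URLs standing in for the printed result lists (rank 1 first).
google = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
yahoo = ["c", "x", "a", "y", "f", "z", "k", "q", "j", "w"]

print(mark(google, yahoo, "G"))  # ['G5', None, 'G5', None, 'G10', None, None, None, 'G10', None]
gy = overlap_table(google, yahoo, "G")  # GY table: Google results found in Yahoo
print(gy[(5, 5)], gy[(10, 5)], gy[(10, 10)])  # 2 3 4
```

The YG table comes from the same function with the roles swapped, for example `overlap_table(yahoo, google, "Y")`.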

#### Blog search engines

For the blog search engines you'll go through the same steps.

1. For the first 5 results in Google Blog Search, if the result appears in the Bloglines results, then write G5 next to the Bloglines result.
2. For the next 5 results in Google Blog Search (that is, results 6-10), if the result appears in the Bloglines results, then write G10 next to the Bloglines result.
3. For the next 10 results in Google Blog Search (that is, results 11-20), if the result appears in the Bloglines results, then write G20 next to the Bloglines result.
4. For the first 5 results in Bloglines, if the result appears in the Google Blog Search results, then write B5 next to the Google Blog Search result.
5. For the next 5 results in Bloglines (that is, results 6-10), if the result appears in the Google Blog Search results, then write B10 next to the Google Blog Search result.
6. For the next 10 results in Bloglines (that is, results 11-20), if the result appears in the Google Blog Search results, then write B20 next to the Google Blog Search result.
7. Pick up the Bloglines results list. You are going to determine values for the GB table.
   • In the top 5 results of the Bloglines results:
      • over(5,5): count the number of times G5 appears.
      • over(10,5): count the number of times either G5 or G10 appears.
      • over(20,5): count the number of times either G5 or G10 or G20 appears.
   • In the top 10 results of the Bloglines results:
      • over(5,10): count the number of times G5 appears.
      • over(10,10): count the number of times either G5 or G10 appears.
      • over(20,10): count the number of times either G5 or G10 or G20 appears.
   • In the top 20 results of the Bloglines results:
      • over(5,20): count the number of times G5 appears.
      • over(10,20): count the number of times either G5 or G10 appears.
      • over(20,20): count the number of times either G5 or G10 or G20 appears.
8. Pick up the Google Blog Search results list. You are going to determine values for the BG table.
   • In the top 5 results of the Google Blog Search results:
      • over(5,5): count the number of times B5 appears.
      • over(10,5): count the number of times either B5 or B10 appears.
      • over(20,5): count the number of times either B5 or B10 or B20 appears.
   • In the top 10 results of the Google Blog Search results:
      • over(5,10): count the number of times B5 appears.
      • over(10,10): count the number of times either B5 or B10 appears.
      • over(20,10): count the number of times either B5 or B10 or B20 appears.
   • In the top 20 results of the Google Blog Search results:
      • over(5,20): count the number of times B5 appears.
      • over(10,20): count the number of times either B5 or B10 appears.
      • over(20,20): count the number of times either B5 or B10 or B20 appears.
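Counting the marks in one engine's top b that come from the other engine's top a gives the number of URLs that are in both of those top lists, so each cell can also be computed directly as a set intersection. This is an equivalent shortcut for double-checking hand counts, not the marking procedure itself. The Python sketch below assumes exact string matching and no duplicate URLs within a single list (a duplicate would be counted twice by hand but once by a set); `over_direct`, `table`, and the sample Bloglines and Google Blog Search lists are hypothetical.

```python
SIZES = (5, 10, 20)

def over_direct(left, top, a, b):
    """over(a, b): URLs in both the left engine's top a and the top engine's top b."""
    return len(set(left[:a]) & set(top[:b]))

def table(left, top):
    """All nine over(a, b) cells, keyed by (a, b)."""
    return {(a, b): over_direct(left, top, a, b) for a in SIZES for b in SIZES}

# Made-up URLs standing in for the printed blog result lists (rank 1 first).
bloglines = ["u1", "u2", "u3", "u4", "u5", "u6", "u7"]
gblog = ["u3", "v1", "u6", "v2", "u1", "v3"]

bg = table(bloglines, gblog)  # BG: rows = Bloglines top a, columns = Google Blog Search top b
gb = table(gblog, bloglines)  # GB: rows = Google Blog Search top a, columns = Bloglines top b
print(bg[(5, 5)], bg[(10, 5)], gb[(5, 10)])  # 2 3 3
```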

Create four tables with the following structures:

This table measures how many of Google's results are reproduced by Yahoo.

| GY | Yahoo 5 | Yahoo 10 | Yahoo 20 |
|---|---|---|---|
| Google 5 | over(5,5) | over(5,10) | over(5,20) |
| Google 10 | over(10,5) | over(10,10) | over(10,20) |
| Google 20 | over(20,5) | over(20,10) | over(20,20) |

This table measures how many of Yahoo's results are reproduced by Google.

| YG | Google 5 | Google 10 | Google 20 |
|---|---|---|---|
| Yahoo 5 | over(5,5) | over(5,10) | over(5,20) |
| Yahoo 10 | over(10,5) | over(10,10) | over(10,20) |
| Yahoo 20 | over(20,5) | over(20,10) | over(20,20) |

This table measures how many of Bloglines' results are reproduced by Google Blog Search.

| BG | GBlog 5 | GBlog 10 | GBlog 20 |
|---|---|---|---|
| Bloglines 5 | over(5,5) | over(5,10) | over(5,20) |
| Bloglines 10 | over(10,5) | over(10,10) | over(10,20) |
| Bloglines 20 | over(20,5) | over(20,10) | over(20,20) |

This table measures how many of Google Blog Search's results are reproduced by Bloglines.

| GB | Bloglines 5 | Bloglines 10 | Bloglines 20 |
|---|---|---|---|
| GBlog 5 | over(5,5) | over(5,10) | over(5,20) |
| GBlog 10 | over(10,5) | over(10,10) | over(10,20) |
| GBlog 20 | over(20,5) | over(20,10) | over(20,20) |

over(a,b) is the overlap between the top a results of the left search engine (the row labels) and the top b results of the top search engine (the column labels). We're not using percentages here: each overlap value is simply a count. The diagonal terms should be the same in the two web tables (GY and YG), and the diagonal terms should likewise be the same in the two blog tables (BG and GB). Further, the over(20,20) term in each table should equal the overlap figure that you calculated in the previous assignment.
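These consistency checks can be run on your hand-filled tables before you hand them in. The Python sketch below checks the diagonal rule stated above, plus a slightly stronger rule that follows from the definition of over(a,b): the cell over(a,b) in one table of a pair counts the same URLs as over(b,a) in the other, so the two tables should be transposes of each other. It also compares over(20,20) with the previous assignment's figure. The helpers `from_rows` and `check_pair` and all the numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SIZES = (5, 10, 20)
Table = Dict[Tuple[int, int], int]

def from_rows(rows: List[List[int]]) -> Table:
    """Turn a hand-copied 3x3 table (rows a = 5, 10, 20; columns b = 5, 10, 20) into a dict."""
    return {(a, b): rows[i][j] for i, a in enumerate(SIZES) for j, b in enumerate(SIZES)}

def check_pair(xy: Table, yx: Table, previous: Optional[int] = None) -> List[str]:
    """Compare the two tables for one engine pair and list any inconsistencies."""
    problems = []
    # The rule stated in the assignment: the diagonal terms must match.
    for a in SIZES:
        if xy[(a, a)] != yx[(a, a)]:
            problems.append(f"diagonal over({a},{a}) differs: {xy[(a, a)]} vs {yx[(a, a)]}")
    # Stronger rule from the definition: over(a,b) in one table equals over(b,a) in the other.
    for a in SIZES:
        for b in SIZES:
            if a != b and xy[(a, b)] != yx[(b, a)]:
                problems.append(f"over({a},{b}) = {xy[(a, b)]} but transposed over({b},{a}) = {yx[(b, a)]}")
    # over(20,20) should match the overlap found in the previous assignment.
    if previous is not None and xy[(20, 20)] != previous:
        problems.append(f"over(20,20) = {xy[(20, 20)]} but the previous assignment found {previous}")
    return problems

# Hypothetical hand-filled tables for the web pair.
gy = from_rows([[2, 2, 2],
                [3, 4, 4],
                [3, 4, 4]])
yg = from_rows([[2, 3, 3],
                [2, 4, 4],
                [2, 4, 4]])
yg_typo = from_rows([[2, 3, 3],
                     [2, 5, 4],
                     [2, 4, 4]])

print(check_pair(gy, yg, previous=4))  # []
print(check_pair(gy, yg_typo))  # ['diagonal over(10,10) differs: 4 vs 5']
```

The same function works for the blog pair by passing the BG and GB tables.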

### 5. What you need to do

• A title page with the following:
   • Search Tool Overlap Data
   • BIT330: Fall 2008
   • The date