CPS 49S Freshman Seminar
Google: The Computer Science Within and its Impact on Society

Outline of topics

  1. The Internet and World Wide Web Today
    • Basics of Internet architecture, Web sites, and Web pages.
    • Main uses and users of the Web. Different types of online data sources available.

  2. Web Search: The Gateway to Instant Knowledge
    • Different ways to search the Web: simple keyword-based interfaces, advanced interfaces, queries in a natural language, directories, catalogs, meta-search.
    • Web search before and after Google.
    • The role and impact of Web search on e-commerce, media, dissemination of scientific knowledge, health, dating, travel, job hunting, civil liberties, and just about any other sphere of human interest.

  3. Google's Technology
    • Crawling Web pages.
    • Google's cache of Web pages.
    • Indexing billions of Web pages and documents for efficient access.
    • Ranking search results.
    • Google services: Media (news, images, video), Desktop, Earth, and Maps.
    • Service-oriented computing, the new paradigm spearheaded by Google as an alternative to the desktop-computing paradigm that is dominant today.
    • Managing structured, semistructured, and unstructured data sources.
    • Relevant advertisements and similar pages.
    • Massively parallel system of more than 150,000 servers (Google as a supercomputer).
    • Limitations of Google's technology.
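
As a rough illustration of the indexing and ranking items above, the following minimal sketch builds a toy inverted index and scores pages with a basic term-frequency / inverse-document-frequency weight. It is not Google's implementation; the function names (`build_inverted_index`, `search`), the sample pages, and the scoring formula are assumptions chosen for teaching purposes.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to {doc_id: number of times the word occurs in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index


def search(index, num_docs, query):
    """Return doc ids ranked by a simple tf * idf score (hypothetical scoring)."""
    scores = defaultdict(float)
    for word in query.lower().split():
        postings = index.get(word, {})
        if not postings:
            continue  # word appears in no document
        # Rare words get a higher weight than words found in most documents.
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up sample pages, for illustration only.
docs = {
    "page1": "duke computer science seminar",
    "page2": "web search engines index the web",
    "page3": "computer science behind web search",
}
index = build_inverted_index(docs)
results = search(index, len(docs), "web search")
print(results)
```

The index is built once, so answering a query only touches the lists for the query words rather than scanning every page.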

  4. The Computer Science behind Google
    • Modeling: The Web as a graph. Web pages as word vectors. Web pages as semistructured data.
    • Data structures, organization, and storage: Web indexes based on Inverted Lists. Library classification and search systems (e.g., Dewey decimal system).
    • Algorithms for ranking: Information-retrieval-based techniques for ranking text documents (e.g., term frequency, inverse document frequency). Exploiting the information in Web links for ranking. Google's PageRank algorithm. Recursive computation of PageRank.
    • Distributed Systems: Notions of clustering, scalability, availability, resilience to failures, and parallel processing in a cluster of computers.
    • Personalized search: The role of machine learning and data mining in Web search.
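
The recursive computation of PageRank mentioned above can be sketched as a repeated update on a small link graph. This is a simplified teaching version, not Google's production algorithm; the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the four-page toy Web are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy Web: A links to B and C, B and D link to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Each pass recomputes a page's rank from the ranks of the pages linking to it, which is why the definition is recursive; repeating the pass lets the values settle.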

  5. Impact of Search Technology on the Economy
    • Behavior of Web searchers: Who is searching, what are they searching for, why are they searching?
    • Search as a new sales channel: Shopping as an application of search. The small head and long tail of the Internet: few big and many small online stores. The role and impact of search in businesses. How Google has made and broken businesses.
    • The evolution of advertising and marketing. Web search has much lower customer acquisition costs than banner advertisements, email campaigns, catalogs and direct-mail marketing, and television.
    • The problems looming: Search-engine spam, click fraud, aggressive affiliate networks.

  6. Privacy in the Google Era
    • The massive, evolving store of personally identifiable information in Google's search logs. Google's search logs aggregate the current thoughts and intentions of our society.
    • Drawing the line: Gmail's placement of targeted advertisements alongside emails. Privacy implications of Google's desktop search. The privacy and personalization tradeoff. Corporate privacy policies and their enforcement. Google's corporate motto of "Don't Be Evil".
    • The Government's ability and responsibility: PATRIOT Act, Electronic Privacy Information Center (EPIC), Google's dealings with the U.S. Department of Justice over access to search data.

  7. Regulating User Access to Information: Editing and Censorship
    • Controlling the contents of Google's search index. The "Google profile" of any person or organization.
    • Relevant search results versus paid listings.
    • Ramifications of Google's cache.
    • The role of the Government: Google's history in China.

 

Shivnath Babu 2007.