\documentclass[12pt]{article}
\usepackage{html}
\latex{\usepackage{fullpage}}
\usepackage{/u/mlittman/papers/tex/newalg}
\usepackage{epsfig}
\newcommand{\ul}{\begin{itemize}}
\newcommand{\li}{\item}
\newcommand{\el}{\end{itemize}}
\renewcommand{\ni}{\noindent}
\newcommand{\href}[2]{\htmladdnormallink{#2}{#1}}
\begin{document}
\title{Ordered Trees (8)}
\author{Michael L. Littman}
\date{September 25th, 1997}
\maketitle
\section{ANNOUNCEMENTS}
\ni First exam is a week from Tuesday.
\ni We'll have a review session next Thursday on recurrences and
stuff. Bring questions. If there aren't questions, I'll talk about
B-trees.
\section{DYNAMIC PROBLEMS}
\subsection{Dynamic Problems}
\ni The applications we've talked about so far have the basic form:
input, output, thank you computer.
\ni Many applications have a different flavor: computer maintains some
data in a data structure, various operations are performed on the
data:
\ul
\li adding new information,
\li removing outdated information,
\li querying the current set of data.
\el
\ni Examples: elevator simulation, customer lists.
\subsection{Elevator Simulation}
\ni We are trying to design an algorithm to control a bank of 10
elevators in a 100-floor office building. At any moment in time, each
elevator is on some floor (or between floors), and contains some
number of passengers with some set of buttons pushed.
\ni The elevators need to make a decision each time a button is
pressed and/or an elevator stops and opens its doors.
\ni The simulator needs to keep track of all this and correctly
simulate all events in their proper order.
\ni This is a kind of {\em discrete event simulator\/}.
\subsection{Dynamic Operations}
\ni Some operations our elevator-simulator data structure needs to
support:
\ul
\li schedule an event for a time in the future,
\li find next event,
\li remove current event,
\li merge two simulations together.
\el
\ni And these will need to happen fast because there will be a lot of
events (button pushes and elevator arrivals) over the course of the
simulation.
\subsection{Customer Lists}
\ni Earlier, we discussed sorting as an important operation in
telephone-book publishing. This application is kind of special in
that the telephone book is only published once or twice a year, so all
the work can be done right before the publishing date.
\ni Consider, on the other hand, the database kept by
\href{http://www.americanexpress.com/cards/}{American Express} for
tracking ``cardmembers.'' This needs to be much more dynamic.
\ni Each customer has a record: name, customer id number, address,
perhaps some billing information (!).
\subsection{Dynamic Operations}
\ni We need to be able to quickly:
\ul
\li add new customers to the database,
\li look up a customer's record by name,
\li delete a customer,
\li list all customers sorted by name,
\li list all customers in a range of names,
\li add a group of customers,
\li split an entire group of infinitives (oops).
\el
\subsection{Ordered Trees}
\ni It turns out that binary trees are a wonderful way to flexibly
store dynamic information.
\ni Here's a tree $T$. Some terminology:
\ul
\li root: top of the tree
\li node: some place in the tree
\li key: the value stored in a node
\li left child, right child: the nodes immediately beneath a node
\li parent: the node immediately above a node
\li leaf: childless node
\li depth: distance from node to root (depth of a tree is the depth of
the deepest leaf)
\li height: distance to furthest leaf (height of a tree is the height
of the root)
\el
\ni What do we know about the height and depth of a tree?
\subsection{Representing Trees}
\ni There are a few ways to represent a tree $T$ with $n$ nodes.
\ni One particularly flexible way is in terms of three arrays. For $1
\le i \le n$:
\ul
\li $T_v[i]$: The key for node $i$.
\li $T_p[i]$: The index of the parent of node $i$ (0 for root).
\li $T_l[i]$: The index of the left child of node $i$ (0 if there is none).
\li $T_r[i]$: The index of the right child of node $i$ (0 if there is none).
\el
\ni How can we test if node $i$ is a leaf?
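\ni A quick sketch in Python (arrays are 1-indexed, with an unused slot 0 so node $i$ lives at position $i$; the five-node tree here is made up for illustration):

```python
# Hypothetical tree:     10       (node 1)
#                       /  \
#                      5    15    (nodes 2 and 3)
#                     /
#                    2            (node 4)
T_v = [None, 10, 5, 15, 2]   # key stored at node i
T_p = [None, 0, 1, 1, 2]     # parent of node i (0 for the root)
T_l = [None, 2, 4, 0, 0]     # left child of node i (0 if none)
T_r = [None, 3, 0, 0, 0]     # right child of node i (0 if none)

def is_leaf(i):
    # A node is a leaf exactly when both of its child pointers are 0.
    return T_l[i] == 0 and T_r[i] == 0
```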
\section{HEAPS}
\subsection{Priority Queue}
\ni Rule \#3 of Good Algorithm Design: Keep complex information
organized in a good data structure.
\ni A data structure that supports insert and delete-min is called a
``priority queue.''
\subsection{Priority Queue for Elevator}
\ni These are the core operations we need for the elevator-simulator
application. In the elevator simulator:
\ul
\li The records are events.
\li The keys are the time at which those events will occur.
\li We always want to execute the event with the ``smallest time''
(closest to the present).
\li Executing an event may result in other future events (elevator
arriving at lobby and picking up passengers results in elevator
planning to arrive at 5th floor at time $t+20$ seconds).
\el
\ni One nice implementation of a priority queue is the {\em heap\/}.
\subsection{Heap Definition}
\ni Complete $n$-node binary tree.
\ni All nodes (other than root) satisfy the ``heap property'':
$\mbox{key}[i] \ge \mbox{key}[\mbox{parent}[i]]$.
\ni Example: 1, 3, 7, 9, 10, 9, 14, 15, 13, 16.
\subsection{Some Properties of Heaps}
\ni Height is $\Theta(\log(n))$.
\ni Smallest item easy to identify.
\ni Where's the largest item? Second smallest? Third smallest?
% (CLR Exercise 7.1-4)
\ni Like a bunch of sorted lists knitted together.
\ni How many nodes have one child? Just a left child? Just a right
child?
\subsection{Representing a Heap}
\ni Can use the multiple-array representation.
\ni Because the binary tree is ``complete'' (packed to the left),
there is an easier way.
\ni Read off the keys by depth, left to right.
\ni Where are the two children of a node $i$? How about the parent of
$i$?
\ni How do we recognize a leaf? How do we find a node's parent?
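\ni In Python, with the keys read off by depth into a 1-indexed list (slot 0 unused), the index arithmetic answers these questions; here it is checked against the example heap from the earlier subsection:

```python
heap = [None, 1, 3, 7, 9, 10, 9, 14, 15, 13, 16]  # example heap, by depth
n = len(heap) - 1

def left(i):   return 2 * i        # children of node i sit at 2i and 2i+1...
def right(i):  return 2 * i + 1
def parent(i): return i // 2       # ...so its parent is at floor(i/2)

def is_leaf(i):
    return left(i) > n             # no left child implies no right child

# Heap property: every non-root node's key is at least its parent's key.
assert all(heap[i] >= heap[parent(i)] for i in range(2, n + 1))
```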
\subsection{Maintaining the Heap Property}
\ni A heap is a dynamic data structure, so we're going to have to
contend with changes.
\ni The most important changes can be handled by two basic operations
that I call:
\ul
\li Up-heapify: The entire heap is valid, except for item $n$, which
may violate the heap property.
\li Down-heapify: The entire heap is valid, except for the root, which
might be too big. (Who violates the heap property, then?)
\el
\ni These don't change the shape of the tree, just which records are
at which nodes.
\subsection{Down-Heapify}
\ni Look at node $i$ (initially the root).
\ul
\li If $\mbox{key}[i] \le \mbox{key}[\mbox{left}[i]]$ and
$\mbox{key}[i] \le \mbox{key}[\mbox{right}[i]]$, all is well. (Why?)
\li Similarly if $i$ is a leaf. (Why?)
\li Otherwise, swap $i$ and the smaller of its children.
\li Start again on the child that changed (i.e., move $i$ down).
\el
\ni We know that the only node that may violate the heap property is a
child of the new $i$. (Why?)
\ni Kind of like bubblesort or insertion sort in a tree. The root
value trickles down until it stops.
\ni Running Time? (Maximum number of comparisons and swaps.)
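\ni The steps above can be sketched in Python on the 1-indexed array representation:

```python
def down_heapify(heap, i, n):
    """Everything is a valid heap except that node i's key may be too big.
    heap is 1-indexed (heap[0] unused); n is the number of elements."""
    while True:
        smallest = i
        if 2 * i <= n and heap[2 * i] < heap[smallest]:          # left child
            smallest = 2 * i
        if 2 * i + 1 <= n and heap[2 * i + 1] < heap[smallest]:  # right child
            smallest = 2 * i + 1
        if smallest == i:    # no smaller child (or i is a leaf): all is well
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest         # only the swapped child can now be in violation
```

Each iteration moves down one level of the tree, so there are at most $O(\log n)$ comparisons and swaps.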
\subsection{Up-Heapify}
\ni A bit simpler.
\ni Look at node $i$ (initially $i=n$).
\ul
\li If $\mbox{key}[i] \ge \mbox{key}[\mbox{parent}[i]]$, all is well. (Why?)
\li Similarly if $i$ is the root. (Why?)
\li Otherwise, swap $i$ and its parent.
\li Start again on the parent that changed (i.e., move $i$ up).
\el
\ni We know that the only node that may violate the heap property is
the new $i$. (Why?)
\ni Kind of like bubblesort or insertion sort in a tree. The last
leaf value trickles up until it stops.
\ni Running Time? (Maximum number of comparisons and swaps.)
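\ni The corresponding Python sketch, again on the 1-indexed array:

```python
def up_heapify(heap, i):
    """Everything is a valid heap except that node i's key may be too small
    relative to its ancestors. heap is 1-indexed (heap[0] unused)."""
    while i > 1 and heap[i] < heap[i // 2]:   # parent of node i is at i // 2
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2              # only the new position can still be in violation
```

Each swap halves $i$, so again at most $O(\log n)$ comparisons and swaps.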
\subsection{Building a Heap from a List}
\ni Let $A$ be an $n$ element list. We will turn $A$ into a heap (in
place).
\begin{algorithm}{Build-Heap}{A,i}
\begin{IF}{i \ge n}
\RETURN
\end{IF}\\
\mbox{\sc Build-Heap}(A,\mbox{left}[i]) \\
\mbox{\sc Build-Heap}(A,\mbox{right}[i]) \\
\mbox{\sc Down-Heapify}(A, i) \\
\RETURN
\end{algorithm}
\ni Recall that $\mbox{left}[i]$ is $2i$ and $\mbox{right}[i]$ is
$2i+1$.
\ni Also, $\mbox{\sc Down-Heapify}(A, i)$ performs the down-heapify
action starting at node $i$.
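\ni A direct Python transcription (with a loop version of the down-heapify operation from the earlier subsection included, so the sketch runs on its own):

```python
def down_heapify(A, i, n):
    # Down-heapify node i of the 1-indexed, n-element heap A.
    while 2 * i <= n:
        c = 2 * i                          # c = smaller of the two children
        if c + 1 <= n and A[c + 1] < A[c]:
            c += 1
        if A[i] <= A[c]:
            return
        A[i], A[c] = A[c], A[i]
        i = c

def build_heap(A, i=1):
    """Heapify the subtree rooted at node i of 1-indexed array A, in place."""
    n = len(A) - 1
    if i >= n:                   # node i is the last node (a leaf) or absent
        return
    build_heap(A, 2 * i)         # left subtree
    build_heap(A, 2 * i + 1)     # right subtree
    down_heapify(A, i, n)        # then fix up the root of this subtree
```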
\subsection{Build-Heap Analysis}
\ni Rough analysis... we call down-heapify for about half the elements
in the list: $O(n \log n)$.
\ni Actually tighter than that: the recurrence is $T(n) = 2 T(n/2) +
O(\log n)$, which solves to $T(n) = O(n)$!
\subsection{Extract Min}
\ni Now that we can create a heap, how can we remove things from it?
\ni Clearly the minimum element is at the top; remove and return it.
\ni Now the number of elements in the heap has decreased by one.
\ni Which node should we move to the root?
\ni How do we restore the heap property?
\ni Running Time?
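\ni One way to code the answers to these questions (a sketch; the current size is passed in and returned explicitly):

```python
def extract_min(heap, n):
    """Remove and return the smallest key from a 1-indexed heap of size n.
    Returns (min_key, new_size)."""
    smallest = heap[1]           # the minimum lives at the root
    heap[1] = heap[n]            # move the last leaf up to fill the hole
    n -= 1
    i = 1                        # ...then down-heapify it back into place
    while 2 * i <= n:
        c = 2 * i                # c = smaller of the two children
        if c + 1 <= n and heap[c + 1] < heap[c]:
            c += 1
        if heap[i] <= heap[c]:
            break
        heap[i], heap[c] = heap[c], heap[i]
        i = c
    return smallest, n
```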
\subsection{Insert}
\ni Now we'll look at putting a new element into the heap.
\ni Where should it go?
\ni How do we restore the heap property?
\ni Running time?
\ni We can use insert repeatedly to build a heap from a set of $n$
elements. What recurrence do we get for the running time? How does
the running time of this compare to that of build-heap?
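\ni In Python: grow the array by one, placing the new key at the last leaf, then up-heapify it:

```python
def heap_insert(heap, key):
    """Add key to a 1-indexed heap stored in a Python list (slot 0 unused)."""
    heap.append(key)                          # new key becomes the last leaf
    i = len(heap) - 1
    while i > 1 and heap[i] < heap[i // 2]:   # up-heapify it into place
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2
```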
\subsection{Heap Sort}
\ni We can use heaps to sort.
\begin{algorithm}{HeapSort}{A}
\mbox{\sc Build-Heap}(A,1) \\
\begin{FOR}{i \= 1 \TO n}
\mbox{print } \mbox{\sc Extract-Min}(A)
\end{FOR} \\
\RETURN
\end{algorithm}
\ni Running Time? How is this nicer than merge sort? Quicksort?
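\ni Python's standard library packages exactly these pieces: heapq.heapify is build-heap ($O(n)$) and heapq.heappop is extract-min ($O(\log n)$ each), so heap sort is two lines:

```python
import heapq

def heap_sort(items):
    heap = list(items)       # copy, so the input is left alone
    heapq.heapify(heap)      # build-heap: O(n)
    return [heapq.heappop(heap) for _ in range(len(heap))]  # n extract-mins
```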
\section{BINARY SEARCH TREES}
\subsection{Binary Search Tree Definition}
\ni An $n$-node binary tree.
\ni All nodes satisfy the ``binary-search-tree property'': every key
in the left subtree of node $i$ is at most $\mbox{key}[i]$, and every
key in the right subtree is at least $\mbox{key}[i]$. (Note: this is
stronger than just comparing each node to its two children.)
\ni Example...
\subsection{Some Properties of Binary Search Trees}
\ni Height is $O(n)$ (imbalanced) and $\Omega(\log n)$ (balanced).
\ni Where are largest and smallest elements?
\ni Like the sequence of pivots in quicksort (how do we know we have a
partition?).
\subsection{Finding a Key}
\ni Search for node with key $x$. Start with $i$ as the root.
\ul
\li Stop if $x = \mbox{key}[i]$.
\li Go left if $x < \mbox{key}[i]$.
\li Go right if $x > \mbox{key}[i]$.
\el
\ni Running time in terms of height of tree? How does this compare to
doing the same thing in a heap?
\ni Easily extended to insert $x$ if it is not found.
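\ni A sketch in Python, using the array representation from earlier (the four-node example tree is made up for illustration):

```python
# Hypothetical BST:     8        (node 1)
#                      / \
#                     3   10     (nodes 2 and 3)
#                      \
#                       6        (node 4)
T_v = [None, 8, 3, 10, 6]   # key of node i (1-indexed, slot 0 unused)
T_l = [None, 2, 0, 0, 0]    # left child of node i (0 if none)
T_r = [None, 3, 4, 0, 0]    # right child of node i (0 if none)

def bst_search(x, i=1):
    """Return the index of the node with key x, or 0 if x is absent."""
    while i != 0:
        if x == T_v[i]:
            return i
        i = T_l[i] if x < T_v[i] else T_r[i]   # go left or right
    return 0
```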
% Handle tree-successor next time.
\subsection{Listing Items in Order}
\begin{algorithm}{Sort-Tree}{A, i}
\begin{IF}{i = 0}
\RETURN
\end{IF}\\
\mbox{\sc Sort-Tree}(A, \mbox{left}[i]) \\
\mbox{print }\mbox{key}[i] \\
\mbox{\sc Sort-Tree}(A, \mbox{right}[i])\\
\RETURN
\end{algorithm}
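\ni The same in-order traversal in Python, collecting keys in a list instead of printing (the small BST here is made up for illustration):

```python
T_v = [None, 8, 3, 10, 6]   # hypothetical BST: 8 at the root, 3 and 10
T_l = [None, 2, 0, 0, 0]    # below it, and 6 as the right child of 3
T_r = [None, 3, 4, 0, 0]

def sort_tree(i, out):
    """In-order traversal of the subtree at node i: appends keys in order."""
    if i == 0:
        return
    sort_tree(T_l[i], out)   # everything smaller comes first...
    out.append(T_v[i])       # ...then this node's key...
    sort_tree(T_r[i], out)   # ...then everything larger
```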
\subsection{Where is the Successor?}
\ni Can we derive a simple (?) rule for determining the element in the
binary search tree that immediately follows element $i$?
\ni Running time in terms of the height of the tree?
\ni How could this be used to delete an element from the tree?
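\ni One such rule, sketched in Python (a standard answer, not spelled out in the notes): if node $i$ has a right child, the successor is the leftmost node of the right subtree; otherwise, climb until you arrive from a left child. The small BST is made up for illustration.

```python
T_v = [None, 8, 3, 10, 6]   # hypothetical BST: 8 at the root, 3 and 10
T_p = [None, 0, 1, 1, 2]    # below it, and 6 as the right child of 3
T_l = [None, 2, 0, 0, 0]
T_r = [None, 3, 4, 0, 0]

def successor(i):
    """Index of the node whose key follows node i's in sorted order (0 if none)."""
    if T_r[i] != 0:
        i = T_r[i]                 # smallest key in the right subtree
        while T_l[i] != 0:
            i = T_l[i]
        return i
    while T_p[i] != 0 and T_r[T_p[i]] == i:   # climb past right-child edges
        i = T_p[i]
    return T_p[i]                  # first ancestor entered from the left, or 0
```

Both cases walk at most one root-to-leaf path, so the running time is $O(h)$.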
\subsection{Successive Insertions and Deletions}
\ni We know that nearly balanced trees are best because finding an
element, insertion, and deletion all run in $O(h)$ time, which is
$O(\log n)$ if the tree isn't too stringy.
\ni But even if we start off with a nice balanced tree, it might not
be balanced anymore after a sequence of insertions and deletions.
\ni Next time: look at a way of implementing these operations so that
the tree stays balanced (on average).
\section{LISTING LARGEST ELEMENTS}
\ni Let's think about solving the following problem. A web search
engine matches queries against all the documents in a database and
computes a score for each. It then needs to present the best
documents to the user, in order.
\ul
\li[Step 0]: Formalize problem.
\li[Step 1]: Propose algorithms.
\li[Step 2]: Prove correctness.
\li[Step 3]: Prove running-time bounds.
\el
\section{HOMEWORK}
\subsection{Homework 4}
\ni Background: Ch. 10, 7.
\ni Due October 2nd:
\ul
\li 4.1: Given a list $L$ of $n$ numbers, we want to compute $\min_x
\sum_{i = 1}^n |L[i] - x|$. Prove that setting $x$ equal to the median of $L$
minimizes the expression. [Extra Credit!]
\li 4.2: Given a list $L$ of $n$ numbers, we want to compute $\min_x
\sum_{i = 1}^n (L[i] - x)^2$. Prove that setting $x$ equal to the mean of $L$
minimizes the expression. [Extra Credit!]
\li 4.3: Give an algorithm for efficiently computing the mode of a
list of numbers (not necessarily sorted).
\li 4.4: Give an algorithm for efficiently computing the rank of GPA
$x$ both when the list $L$ of GPAs is sorted and when it is not.
Give tight worst-case running-time bounds.
\li 4.5: Show that the expected running time of {\sc Select} is
$O(n)$. Hint: Modify the quicksort recurrence.
\li 4.6: CLR Exercise 10.3-3 (pg. 192).
\li 4.7: CLR Exercise 10.3-7 (pg. 192).
\li 4.8: Say we have two heaps, $A$ and $B$, each of size $n$, and we
want to create a third heap $C$. Here are two ways to do that.
Method 1 runs through the elements of $A$ and {\sc insert}s each one
into $B$. Method 2 simply copies all the elements of $A$ and $B$ into
a new array $C$ and calls ``build-heap'' on the result. Give
asymptotic upper bounds on the running time of these two choices.
Which do you prefer? (Note that there are alternate implementations
of heaps that support this ``merge heap'' operation in $O(\log n)$
time!)
\li 4.9: We've got a set of $n$ distinct positive integers, e.g., (7,
10, 9, 19, 8, 6). There are $2^n$ subsets. Here's one: (7, 10,
19, 8). We define the {\em sum\/} of
a subset to be the sum of the integers in that subset (59 and 44 for
the two subsets here). Give an efficient algorithm to list the $n$
subsets with the largest sums. Hint: use a priority queue to store
the subsets that might have the largest sums. Give a bound on the
running time of your algorithm.
\li 4.10: CLR Exercise 7.5-2 (pg. 150).
\li 4.11: CLR Exercise 7.5-5 (pg. 151).
\li 4.12: CLR Exercise 13.2-1 (pg. 249).
\el
\end{document}