Math 176 - Homework #2
Set Operations using a Hash Table with Linear Probing
Due date: Friday, October 27, Midnight.
This programming assignment is covered by special Academic Integrity Guidelines.
Overview: This homework assignment requires you to implement the basic operations for a
hash table with linear probing, including the ability to create an iterator that allows
sequential access to the elements stored in the hash table. You must implement the hash
table so that it extends the Java class AbstractSet. The associated iterator must
implement the Java specification of Iterator. You will also implement non-lazy deletion
from the hash table. You will compare your linear-probing hash table implementation of a
set with the built-in Java 1.2 classes for data structures. A special class
CountComparable extends Comparable has been written that will allow you to easily gather
statistics on the number of comparisons performed during the data structure operations,
allowing you to indirectly compare the performance of your hash table against the built-in
Java data structures. For hash tables, we shall be using the number of collisions as the
measure of how good the algorithm is.
These instructions may change somewhat or be augmented: please watch for announcements on this, or check back to this page.
The outline of the homework assignment is as follows:
Write a hash table with linear probing implementation of a set. This must be a Java class named HashSetLinear and it must extend AbstractSet. The basic operations it must support are:
Your implementation must obey the Java implementation standards for AbstractSet; namely,
the return values of add and remove indicate whether the set was changed as a result of
the operation. The add function will not add another copy of an object that is already
present. The iterator must implement hasNext() and next() exactly as specified by the
Java 1.2 specifications for Iterator. For a little extra credit, you may implement the
remove() method in the iterator if you wish.
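The return-value convention described above can be seen with any existing Set from the Java library. The following is a minimal sketch; the class name SetReturnCodes is made up for illustration, and the code uses generics, which postdate the Java 1.2 library this assignment targets.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the supplied homework files.
class SetReturnCodes {
    /** Returns the results of add, duplicate add, remove, and repeated remove. */
    static boolean[] run() {
        Set<String> set = new TreeSet<>();
        boolean firstAdd = set.add("apple");        // set changed
        boolean secondAdd = set.add("apple");       // already present, set unchanged
        boolean firstRemove = set.remove("apple");  // set changed
        boolean secondRemove = set.remove("apple"); // not present, set unchanged
        return new boolean[] { firstAdd, secondAdd, firstRemove, secondRemove };
    }

    public static void main(String[] args) {
        for (boolean changed : run()) {
            System.out.println(changed);
        }
    }
}
```

Your HashSetLinear should report the same pattern of results for the same sequence of calls.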
Use the supplied CountComparable class to wrap objects so as to keep track of the number
of comparisons (equals() tests) used when inserting and removing objects from Sets. Run
tests with the supplied data file, first with the Java TreeSet class (which is based on
Red-Black trees), then with the supplied class HashSetSeparateChaining, and then with your
HashSetLinear implementation of a hash table with linear probing. Gather statistics and
prepare a table reporting the average number of comparisons used per operation. You will
need to run your tests with several different load factors and with large data sets.
To do the homework you should do the following steps:
In the directory ../public/ProgHomework2, there is a main program MainHw2. MainHw2 shows
examples of how to use the Java classes HashSet and TreeSet. You should learn how to use
these classes if you are not already familiar with them: good ways to learn this are to
look at the appendix in the text book and to read the online Sun java documentation at
www.javasoft.com. MainHw2 is very similar to the former MainHw1, but with several
enhancements to make it a little easier to use.
Important: You should get the new class CountComparable from the same directory: it
overrides hashCode() and must be used for this assignment in place of the old
CountComparable class.
There are classes HashSetSeparateChaining and HashMapSeparateChaining in the same
directory. You should copy these over to your own directory for use in comparing your
implementation's efficiency with the efficiency of a hash table based on separate
chaining. For the "SeparateChaining" classes, copy all the .class files (there are six of
them) to your own directory.
Documentation for these classes in HTML format is available from the
directory ../public/ProgHomework1, or on the web via ftp, at http://math.ucsd.edu/~sbuss/Math176/ProgHomework1/,
or go directly to the following HTML files for documentation: MainHw2.html,
CountComparable.html, HashSetLinear.html, HashSetSeparateChaining.html, and HashMapSeparateChaining.html.
The program MainHw1 should provide you with a good
skeleton for a main program for testing your Hash Table with Linear Probing
implementation.
Later, you will need to read commands from a file to gather statistics on the behavior of
your HashSetLinear, and on the Java classes of red-black trees and hash sets with separate
chaining. Look at the program MainHw1IO from programming assignment #1, which shows how to
read from files and how to parse an input line into tokens with a StringTokenizer.
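As a rough illustration of tokenizing one input line, here is a minimal sketch. It assumes each line holds a one-letter command followed by a string (the format used by the hw2Data file); the class and method names are made up for this example and are not taken from MainHw1IO.

```java
import java.util.StringTokenizer;

// Hypothetical helper, not part of the supplied homework files.
class ParseCommand {
    /**
     * Splits a line such as "A hello" into {command, argument}.
     * Returns null if the line has fewer than two whitespace-separated tokens.
     */
    static String[] parse(String line) {
        StringTokenizer st = new StringTokenizer(line); // default delimiters: whitespace
        if (st.countTokens() < 2) {
            return null;
        }
        String command = st.nextToken();
        String argument = st.nextToken();
        return new String[] { command, argument };
    }

    public static void main(String[] args) {
        String[] parts = parse("A hello");
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```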
Write and debug your HashSetLinear class and iterator. At first, do not try to implement
any remove methods. The iterator must be implemented as an inner class named HslIterator.
Use the built-in Java k.hashCode() to generate hash codes. When you rehash a table of
size S, make the new table have size 2*S+1.
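To make the probing and rehashing rules concrete, here is a minimal sketch of insertion and lookup with linear probing. It is only an illustration: it does not extend AbstractSet, has no iterator and no remove, and does not accept null keys. The initial size of 11, the load factor threshold of 0.5, and the sign-bit masking of the hash code are assumptions of this sketch; only k.hashCode() and the 2*S+1 growth rule come from the assignment.

```java
// Hypothetical sketch, not the required HashSetLinear class.
class LinearProbeSketch {
    private Object[] table = new Object[11]; // assumed initial size
    private int count = 0;
    private final double maxLoad = 0.5;      // assumed load factor threshold

    /** Home slot of k: non-negative hash code reduced modulo the table size. */
    private static int home(Object k, int size) {
        return (k.hashCode() & 0x7fffffff) % size;
    }

    boolean contains(Object k) {
        int i = home(k, table.length);
        while (table[i] != null) {           // probe until an empty slot
            if (table[i].equals(k)) return true;
            i = (i + 1) % table.length;      // step to the next slot, wrapping around
        }
        return false;
    }

    boolean add(Object k) {
        if (contains(k)) return false;       // no duplicates in a set
        if (count + 1 > maxLoad * table.length) rehash();
        insert(table, k);
        count++;
        return true;
    }

    /** Places k in the first empty slot at or after its home slot. */
    private static void insert(Object[] t, Object k) {
        int i = home(k, t.length);
        while (t[i] != null) i = (i + 1) % t.length;
        t[i] = k;
    }

    /** Grows a table of size S to size 2*S+1 and re-inserts every element. */
    private void rehash() {
        Object[] bigger = new Object[2 * table.length + 1];
        for (Object k : table) {
            if (k != null) insert(bigger, k);
        }
        table = bigger;
    }

    int size() { return count; }
    int capacity() { return table.length; }
}
```

Because the table never fills beyond the threshold, every probe sequence reaches an empty slot and the loops terminate.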
Extend your HashSetLinear class to support the remove operations. Implement non-lazy deletion as described in class. The algorithm is also available online.
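For orientation only, here is a sketch of one standard way to delete without tombstones: empty the slot, then remove and re-insert every element in the rest of the contiguous cluster. This is not necessarily the algorithm presented in class, so check it against the class description before relying on it. The class name and the fixed-size table (with no rehashing) are assumptions of this sketch.

```java
// Hypothetical sketch of eager (non-lazy) deletion; fixed-size table, no rehashing.
// The caller must always leave at least one slot empty, or probing will not terminate.
class EagerDeleteSketch {
    private final Object[] table;

    EagerDeleteSketch(int size) { table = new Object[size]; }

    private int home(Object k) {
        return (k.hashCode() & 0x7fffffff) % table.length;
    }

    /** Index of k if present, otherwise the empty slot that ends its probe sequence. */
    private int find(Object k) {
        int i = home(k);
        while (table[i] != null && !table[i].equals(k)) {
            i = (i + 1) % table.length;
        }
        return i;
    }

    boolean contains(Object k) { return table[find(k)] != null; }

    boolean add(Object k) {
        int i = find(k);
        if (table[i] != null) return false;  // already present
        table[i] = k;
        return true;
    }

    boolean remove(Object k) {
        int i = find(k);
        if (table[i] == null) return false;  // not present, set unchanged
        table[i] = null;                     // open a hole
        // Re-insert each element that follows in the same cluster, so that no
        // element is left unreachable behind the hole.
        int j = (i + 1) % table.length;
        while (table[j] != null) {
            Object moved = table[j];
            table[j] = null;
            table[find(moved)] = moved;
            j = (j + 1) % table.length;
        }
        return true;
    }
}
```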
This step should not be left until the very last minute! Once
you have completed step 3 (or step 2, if you are unable to finish step 3), gather
statistics. There is a file named hw2Data in the same
directory ../public/ProgHomework2. This contains a series of lines
with the format: "A xxxxx" or "D xxxxx" where "xxxxx"
denotes a string of symbols. These lines are commands to either add or delete the
corresponding string from the set. (If you have not implemented remove methods, then
just skip over the delete commands.) Sometimes, the delete commands will ask to
delete a word that is not present (this happens about 10% of the time): in this case the
set is not to be changed. Sometimes an add command will ask you to add a string that
is already present in the set: again, in this case the set is not to be changed, since
sets do not support the presence of duplicate objects.
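The bookkeeping for these commands might look like the following sketch, which applies one already-parsed command to any Set and counts attempts and failures using the boolean results of add and remove. The class name, field names, and method signature are made up for illustration; counting comparisons and collisions is not shown.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical bookkeeping class, not part of the supplied homework files.
class CommandRunner {
    int adds, failedAdds, deletes, failedDeletes;

    /** Applies one command ("A" = add, "D" = delete) to the set and updates the counters. */
    void apply(Set<String> set, String command, String word) {
        if (command.equals("A")) {
            adds++;
            if (!set.add(word)) failedAdds++;        // word was already present
        } else if (command.equals("D")) {
            deletes++;
            if (!set.remove(word)) failedDeletes++;  // word was not present
        }
    }

    public static void main(String[] args) {
        CommandRunner runner = new CommandRunner();
        Set<String> set = new TreeSet<>();
        runner.apply(set, "A", "cat");
        runner.apply(set, "A", "cat");
        runner.apply(set, "D", "dog");
        System.out.println(runner.adds + " adds, " + runner.failedAdds + " failed; "
                + runner.deletes + " deletes, " + runner.failedDeletes + " failed");
    }
}
```

Since the method takes a Set, the same code can drive TreeSet, HashSetSeparateChaining, or your HashSetLinear.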
Run these commands on the data structures of (1) HashSetSeparateChaining, (2) TreeSet,
and (3) your implementation of HashSetLinear. Do this for the first N add commands (and
the delete commands which appear before the N-th add command), letting N equal 100, then
1000, then 10000, then 100000, then 1000000; but stop whenever the algorithms become so
slow as to require more than 5-10 minutes of total running time. You can use larger data
sets by increasing the heap size of the Java virtual machine, which is controlled by the
-Xmx command line option to java. (Run java -help and java -X for information on the java
machine command line options.)
In addition, for the larger size (500,000 or 1,000,000 if possible),
run the test for four different load factors --- select load factors that range from about
0.2 to 0.75 or even higher. Include the results of these tests in your table, or
make a second table with the results of these tests.
Write a short report or tables giving for each test: the number of adds attempted, the
number of adds which failed due to trying to insert duplicates, the number of deletes
attempted, the number of deletes which failed since the element was not present, the total
number of comparisons performed, the average number of comparisons per attempt to add or
delete (i.e., per line processed from the file), and the number of collisions in total and
the average number of collisions per operation. You may include additional information in
the table if you wish, but you must include at least the items mentioned. Your
tables/report should be prepared as a plain text file.
You must turn in:
Testing suggestions. We will provide you with a
program that checks whether your class definitions are correct. It will be called CheckHslSignature.
You should be able to test your HashSetLinear by comparing it against a
HashSetSeparateChaining dictionary or a TreeSet (red-black tree) from the Java library and
checking whether it gives the same results as your hash set with linear probing
implementation. However, note that the iterators will return the elements in
different orders. (If you are using large sets, try sorting them. Alternatively,
compare the sums of the hash codes of the objects returned by the iterators.)
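Both suggestions can be sketched as follows. The class and method names are made up for illustration, and the code uses generics, which postdate Java 1.2. Note that equal hash-code sums are only a quick sanity check: two different sets can have the same sum, whereas the sorted comparison is exact.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical testing helpers, not part of the supplied homework files.
class SetCompare {
    /** Sums the hash codes of all elements returned by the iterator (order independent). */
    static long hashSum(Iterator<?> it) {
        long sum = 0;
        while (it.hasNext()) {
            sum += it.next().hashCode();
        }
        return sum;
    }

    /** Copies both sets into lists, sorts them, and compares element by element. */
    static <T extends Comparable<T>> boolean sameElements(Set<T> a, Set<T> b) {
        List<T> x = new ArrayList<>(a);
        List<T> y = new ArrayList<>(b);
        Collections.sort(x);
        Collections.sort(y);
        return x.equals(y);
    }
}
```

For example, hashSum(mySet.iterator()) can be compared with hashSum(referenceSet.iterator()) after running the same commands on both sets.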
I am looking into the possibility of writing code that will do a more
thorough testing of your code (like the code used to auto-grade the first programming
assignment). The code still needs to be more robust than what I have written so far,
so it is not yet available. Watch this space for further announcements.
It is OK to do your program development on a machine other than ieng9; however, the final
version must run on ieng9, and it would behoove you to allow a day or two of extra time to
make sure it runs there. It is also OK to report your results in step 4 as run on another
machine, but in this case, please also report the machine type and especially how much RAM
it has.
Extra credit: For 10 points extra credit, devise a method for implementing remove() in the iterator. Describe your method in the README file. Implement the remove. What is the big-O runtime of your remove() code? How much extra runtime is needed? How much extra memory does it use?
All programming work must be your own. You may get help from TAs, from fellow students, etc., but must do your own work, and especially must "internalize" all advice, i.e., be able to understand everything well enough that you could re-implement it on your own. In particular, you should not use code that is taken verbatim from any source or that is a straightforward translation of someone else's code. More information on what kinds of assistance are permitted can be found in the Academic Integrity Guidelines. If you are not sure what kind of outside assistance is allowed, discuss it with me or a TA.
Grading: The grade for your programming assignment will be based on the following (percentages and categories are preliminary and I reserve the right to change them based on the class performance).