Compsci 101, Fall 2012, Data Transformation Assignment

See the howto pages for details on creating projects, files, and so on. The pages here describe in broad strokes what this assignment is about.

See the Assignment in a Nutshell for an explanation that will help in understanding what to do.

Data transformation plays a key role in many applications: identifying audio/music files by transforming audio data into identifiable fingerprints (e.g., SoundHound or Shazam), using DNA/genomic data to identify diseases, encryption, and so on. In this assignment you'll use three simple string transformations to learn about transforming and untransforming data, as well as about interacting Python modules.

The paragraph below has circulated on the Internet for a while; the video to the left debunks it. According to researchers at Cambridge University (see, e.g., snopes for information):

Aoccdrnnig to rscheearch at Cmabrigde uinervtisy, it deosn't mttaer waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteres are at the rghit pclae. The rset can be a tatol mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.

Alternatively, consider Piglatin: ig-pay atin-lay has a long history. According to editors Lewis and Onuf in the book Sally Hemings & Thomas Jefferson (see Amazon for the book), Jefferson used pig-latin to code passages in his diary that he wanted "hidden", though others apparently disagree.

Google will provide a pig-latin interface if you use the special URL http://www.google.com/intl/xx-piglatin.

There are three parts of this programming assignment:

  1. Complete the transform_ functions to transform and untransform words using piglatin and rot13 (see the howto for details). Each function is in a section of FileTransform.py that you'll add to and modify.

  2. Complete the function write_words in module FileTransform.py to write to a file and not just to the console. Complete the function transform as well to apply a transforming function to every word in a file.

  3. A-credit/challenge: write a module UntransformFile.py that tries to recognize whether a file has been encoded and then decodes it accordingly. This means you write code to recognize whether a file has been encoded via piglatin or via rot13 and, depending on your analysis, untransform the file accordingly. Please document in your README file the general idea behind your method for determining how the file you read is encoded. Your program should write the output file and print a message to the user indicating what transform it decided on (including none if the file hasn't been transformed).
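
As a rough illustration of parts 1 and 2, the sketch below shows rot13 together with a helper that applies any word-level function to every word in a list of lines and then writes the result to a file. It assumes Python 3. The names rot13_word, apply_to_words, and write_lines are hypothetical; they are not the functions in FileTransform.py, and the howto remains the authority on what those functions should do.

```python
import string

# Translation table: each letter maps to the letter 13 places later,
# wrapping around the alphabet. Case is preserved.
_LOWER = string.ascii_lowercase
_UPPER = string.ascii_uppercase
_ROT13 = str.maketrans(
    _LOWER + _UPPER,
    _LOWER[13:] + _LOWER[:13] + _UPPER[13:] + _UPPER[:13],
)


def rot13_word(word):
    """Return word with every letter shifted 13 places; other characters are unchanged."""
    return word.translate(_ROT13)


def apply_to_words(lines, func):
    """Return new lines with func applied to each whitespace-separated word.

    Note: in this sketch, runs of whitespace collapse to single spaces.
    """
    return [" ".join(func(w) for w in line.split()) for line in lines]


def write_lines(lines, filename):
    """Write each line to filename, one per line, instead of printing to the console."""
    with open(filename, "w") as out:
        for line in lines:
            out.write(line + "\n")


if __name__ == "__main__":
    encoded = apply_to_words(["Hello, World!"], rot13_word)
    print(encoded)                               # ['Uryyb, Jbeyq!']
    print(apply_to_words(encoded, rot13_word))   # ['Hello, World!']
```

Because the alphabet has 26 letters, applying rot13 twice returns the original text, so the same function serves as both the transform and the untransform.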

See the howto pages for specifics on the rules we are using for Piglatin and the other transforms.
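
To make the idea of transforming and untransforming concrete, here is a minimal Piglatin sketch. The rules in it are an assumption chosen to match the "ig-pay atin-lay" example above: a word starting with a vowel gets "-way" appended, and otherwise the leading consonants move after a hyphen and "ay" is added. The rules in the howto may differ and take precedence. The function names piglatin_word and unpiglatin_word are hypothetical.

```python
VOWELS = "aeiouAEIOU"


def piglatin_word(word):
    """Transform one word under the assumed rules described above."""
    if not word.isalpha():
        return word  # this sketch leaves words with punctuation or digits alone
    if word[0] in VOWELS:
        return word + "-way"
    for i, ch in enumerate(word):
        if ch in VOWELS:
            # Move the leading consonants after the hyphen, then add "ay".
            return word[i:] + "-" + word[:i] + "ay"
    return word + "-ay"  # no vowel found, e.g. "hmm"


def unpiglatin_word(word):
    """Reverse piglatin_word where possible."""
    head, sep, tail = word.rpartition("-")
    if not sep or not tail.endswith("ay"):
        return word  # does not look transformed
    if tail == "way":
        # Ambiguous: "ill-way" could come from "ill" or from "will".
        # This sketch always assumes the vowel-initial reading.
        return head
    return tail[:-2] + head


if __name__ == "__main__":
    for w in ["pig", "latin", "apple", "string"]:
        encoded = piglatin_word(w)
        print(w, encoded, unpiglatin_word(encoded))
```

Under these assumed rules "will" becomes "ill-way", which this sketch decodes as "ill". Cases like that, where untransforming cannot recover the original word, are the kind of limitation worth noting in your Analysis.txt.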

Submit and Grading

Submit your source code FileTransform.py and, optionally, UntransformFile.py. You must also submit a README.txt file; the information you should include in it is specified on the general assignment page.

You should also submit a file named "Analysis.txt" in which you document any bugs or problems you notice in your program. For example, if there are words that you don't handle properly (according to the piglatin specification in the howto) then document that in your "Analysis.txt" file. If you have bugs that you can't fix, but that you tried to fix, document those in your file. Known and documented bugs will affect your grade far less than bugs we discover in running your program.

Your grade will be based on how well your program runs, whether it conforms to the Piglatin specs, and whether you've included appropriate README and Analysis.txt files.

Submit using the submit name transform.