Compsci 101, Fall 2014, Data Transformation

See the REVISED HOWTO page for more details on using files and modules, and for the rules about the specific transforms you will write. The revised HOWTO also includes suggested steps if you don't know how to get started with this assignment.

Here is the OLD HOWTO page, which you probably won't need any more.

Data transformation plays a key role in many applications, ranging from identifying audio/music files by transforming audio data into identifiable fingerprints (e.g., SoundHound or Shazam), to using DNA/genomic data to identify diseases, to encryption and more. In this assignment, you will simply transform strings to learn about transforming and un-transforming data in general, as well as about interacting Python modules.

The following paragraph has circulated on the Internet for a while; the video to the left debunks it. According to researchers at Cambridge University (see, e.g., snopes for information):

Aoccdrnnig to rscheearch at Cmabrigde uinervtisy, it deosn't mttaer waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteres are at the rghit pclae. The rset can be a tatol mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.
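
The scrambling in that paragraph is itself a simple string transformation: keep the first and last letter of each word and shuffle whatever lies between them. The sketch below illustrates the idea only and is not part of the assignment. The names scramble_word and scramble_text are hypothetical, and the code assumes words are separated by single spaces and treats attached punctuation as part of the word.

```python
import random

def scramble_word(word, rng=random):
    """Hypothetical helper: shuffle the interior letters of one word."""
    # Words of three characters or fewer have at most one interior letter,
    # so shuffling cannot change them.
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)  # shuffles the list in place
    return word[0] + ''.join(middle) + word[-1]

def scramble_text(text, rng=random):
    """Hypothetical helper: scramble every space-separated word in text."""
    return ' '.join(scramble_word(w, rng) for w in text.split(' '))

if __name__ == "__main__":
    # The output differs from run to run because the shuffle is random.
    print(scramble_text("the first and last letters stay in place"))
```

Unlike the transforms in this assignment, this one cannot be un-transformed mechanically, because the shuffle discards the original order of the interior letters.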

Alternatively, consider pig-latin. According to editors Lewis and Onuf, in the book Sally Hemings & Thomas Jefferson, Jefferson used pig-latin to code passages in his diary that he wanted "hidden", though others apparently disagree. Amazingly, Google will provide a pig-latin interface if you use this special URL.

 

Your Task

Snarf the files for assignment4. You can also see the files here:

There are five parts to this programming assignment:

  1. Create a small file called simple.txt that will be useful in testing your program. This file should be short (perhaps 4-10 lines) and should contain data that will help you test the functions you are to write. Put careful thought into what goes in this file, as it will likely be the most helpful file in debugging your program.

  2. Complete the module Transforms.py by writing functions to transform, and un-transform, strings in two different ways (pig-latin and rot13).

  3. Complete the module FileTransform.py to encode any file using one of the encodings you wrote in Transforms.py (currently, it prints its output to the Console; you will change this to write to a file).

  4. Write a text file ANALYSIS.txt that documents the general idea behind recognizing whether a file has been encoded and how to decode it. That is, given a file, how would you determine whether it had been encoded via pig-latin or rot13, and then how would you un-transform it? Do not write code here; just describe in words how you would tackle this problem.

  5. Extra credit challenge: write a module FileUntransform.py that tries to recognize whether a file has been encoded and then decodes it accordingly. This means writing code to recognize whether a file has been encoded via pig-latin or rot13 and, depending on your analysis, un-transform the file accordingly. In addition to writing the untransformed words to a new file, your program should print a message to the Console indicating what un-transform was used (including "none" if the file cannot be un-transformed).
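
As a rough illustration of parts 2 and 3, here is a minimal sketch of rot13 and of writing transformed output to a file instead of printing it. It is not the provided code. The function names rot13 and transform_file and their signatures are assumptions made for this example, and the HOWTO page remains the authority on the rules your transforms must follow. Rotating a letter by 13 places twice brings it back to where it started, so the same function serves as both the transform and the un-transform.

```python
def rot13(text):
    """Hypothetical helper: rotate each ASCII letter 13 places, keep the rest."""
    result = []
    for ch in text:
        if 'a' <= ch <= 'z':
            # Shift within the lowercase alphabet, wrapping past 'z'.
            result.append(chr((ord(ch) - ord('a') + 13) % 26 + ord('a')))
        elif 'A' <= ch <= 'Z':
            # Shift within the uppercase alphabet, wrapping past 'Z'.
            result.append(chr((ord(ch) - ord('A') + 13) % 26 + ord('A')))
        else:
            # Digits, spaces, punctuation, and newlines are left unchanged.
            result.append(ch)
    return ''.join(result)

def transform_file(in_path, out_path, transform):
    """Hypothetical helper: write transform(line) for each line of in_path."""
    with open(in_path) as source, open(out_path, 'w') as dest:
        for line in source:
            dest.write(transform(line))

if __name__ == "__main__":
    print(rot13("Hello, World!"))         # prints: Uryyb, Jbeyq!
    print(rot13(rot13("Hello, World!")))  # prints: Hello, World!
```

Calling transform_file("simple.txt", "simple-rot13.txt", rot13) would write the encoded lines to a new file; the output file name here is only an example.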

DO NOT modify the other Python files we have provided.

What to Submit

Please submit the following:

Submit the items to the folder assign4-transform using Eclipse/Ambient or websubmit.

Your grade will be based on how well your program functions and whether you have included the appropriate additional files listed above.