Compsci 6/101, Fall 2011, Data Transformation

Data transformation plays a key role in many applications, ranging from identifying audio/music files by transforming audio data into identifiable fingerprints (e.g., SoundHound or Shazam), to using DNA/genomic data to identify diseases, to encryption and more. In this assignment, you will apply simple transformations to strings to learn about transforming and un-transforming data in general, as well as about how Python modules interact.

The paragraph below has circulated on the Internet for a while; the video to the left debunks it. It is attributed to researchers at Cambridge University (see, e.g., snopes for information):

Aoccdrnnig to rscheearch at Cmabrigde uinervtisy, it deosn't mttaer waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteres are at the rghit pclae. The rset can be a tatol mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.

Alternatively, consider pig-latin. According to editors Lewis and Onuf, in the book Sally Hemings & Thomas Jefferson, Jefferson used pig-latin to code passages in his diary that he wanted "hidden", though others apparently disagree. Amazingly, Google will provide a pig-latin interface if you use this special URL.

There are three parts to this programming assignment:

  1. Complete the module Transforms.py by writing functions to transform, and un-transform, strings in two different ways (pig-latin and rot13).

  2. Complete the module FileTransform.py to encode any file using one of the encodings you wrote in Transforms.py (currently it prints its output to the Console; you will change this to write to a file).

  3. A-credit/challenge: write a module FileUntransform.py that tries to recognize whether a file has been encoded and then decodes it accordingly. This means writing code to recognize whether a file has been encoded via pig-latin or rot13 (or via base64 for more credit) and, depending on your analysis, un-transform the file accordingly. In addition to writing the untransformed words to a new file, your program should print a message to the Console indicating what un-transform was used (including "none" if the file cannot be un-transformed).
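As an illustration of what "transform" and "un-transform" mean in part 1, here is a minimal sketch of rot13, which rotates each letter 13 places through the alphabet. The function name `rot13` is hypothetical and the code assumes Python 3; the rules and function names required for the assignment are the ones given on the HOWTO page, not these.

```python
import string

def rot13(text):
    """Rotate each ASCII letter 13 places; leave all other characters alone."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    # Map a-z onto n-z,a-m and A-Z onto N-Z,A-M.
    table = str.maketrans(lower + upper,
                          lower[13:] + lower[:13] + upper[13:] + upper[:13])
    return text.translate(table)

encoded = rot13("Hello, World!")
decoded = rot13(encoded)
print(encoded)  # Uryyb, Jbeyq!
print(decoded)  # Hello, World!
```

Because the alphabet has 26 letters, applying rot13 twice returns the original text, so the same function serves as both the transform and the un-transform. Pig-latin does not have this property and needs a separate un-transform.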

See the HOWTO page for more details on using files and modules, and for the rules of the specific transforms you will write.
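For part 2, the change from printing to the Console to writing a file could look roughly like the sketch below. It is not the provided FileTransform.py: the function name `transform_file` and its parameters are hypothetical, it assumes Python 3, and it assumes the transform is applied one whitespace-separated word at a time.

```python
def transform_file(in_path, out_path, transform):
    """Apply transform to every word of in_path and write the result to out_path."""
    with open(in_path) as source, open(out_path, "w") as dest:
        for line in source:
            # Transform word by word, keeping one output line per input line.
            words = [transform(word) for word in line.split()]
            dest.write(" ".join(words) + "\n")
```

Any function that takes a string and returns a string can be passed as `transform`, for example `transform_file("input.txt", "output.txt", str.upper)`, where the file names are placeholders. Note that this sketch collapses runs of whitespace within a line into single spaces.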

Submitting and Grading

Submit your source code (Transforms.py, FileTransform.py, and optionally FileUntransform.py) as well as a README file and an ANALYSIS file, using the submit name transform.

In your ANALYSIS file, document the general idea behind your algorithm for determining how the file you read is encoded so that it can be un-transformed (do this even if you were not able to code your idea). Additionally, document any bugs or problems in your program that you were not able to resolve (e.g., there may be certain kinds of words that you know are not handled properly). If you document bugs that you cannot fix, and how you tried to fix them, they will affect your grade far less than bugs we discover in running your program.
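As an example of the kind of idea the ANALYSIS file might describe, one possible heuristic for spotting rot13 is to check whether decoding the text produces more common English words than the text already contains. This is only a sketch of one approach, not the expected solution: the names `COMMON_WORDS`, `common_word_count`, and `looks_like_rot13` are hypothetical, the word list is an arbitrary small sample, and the code assumes Python 3.

```python
import codecs

# A small, arbitrary sample of frequent English words (an assumption).
COMMON_WORDS = {"the", "and", "a", "of", "to", "in", "is", "it", "that", "you"}

def common_word_count(text):
    """Count whitespace-separated words that appear in COMMON_WORDS."""
    return sum(1 for word in text.lower().split() if word in COMMON_WORDS)

def looks_like_rot13(text):
    """Guess rot13 if decoding yields more common words than the raw text has."""
    decoded = codecs.decode(text, "rot_13")
    return common_word_count(decoded) > common_word_count(text)
```

A heuristic like this can guess wrong on short files or on words with attached punctuation, which is exactly the sort of limitation worth documenting.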

Your grade will be based on how well your program functions and whether you have included appropriate README and ANALYSIS files. You can get bonus points for well-designed functions in Transforms.py.