Compsci 101, Spring 2016, Transform HOWTO

We recommend that you complete the Piglatin APT and the Caesar Cipher APT before starting this assignment!

You can see the files here.

Part 1: Piglatin Howto

Getting Started

Snarf the assignment and run it. The program reads the file romeo.txt, counts the number of words in it, and writes the file pig-romeo.txt, which is a copy of romeo.txt with extra whitespace removed. The files to start the project are also available via the link above, in addition to being accessible via snarfing.

After running the program for the first time, right-click the PyDev project spring16-compsci101-assign4 and select Refresh to see the file pig-romeo.txt listed in your project.

Pig-latin rules

Use the rules below to convert a word into pig-latin. We use a hyphen to make it possible to translate back from pig-latin to English. When creating pig-latin you don't need to worry about punctuation: treat every character as either a vowel or a non-vowel, and count punctuation as a non-vowel. The rules below refer to lowercase characters, but your code must work with both uppercase and lowercase letters; for example, 'A' is a vowel just as 'a' is. Your code should not change the case of any letters.

  1. If a word begins with 'a', 'e', 'i', 'o', or 'u', then append the string "-way" to form the pig-latin equivalent. Examples:
    Word        Pig-latin equivalent
    anchor      anchor-way
    elegant     elegant-way
    oasis       oasis-way
    isthmus     isthmus-way
    only        only-way

  2. If a word begins with a non-vowel (we will call this a consonant, but it could be a number, punctuation, or something else), move the prefix before the first vowel to the end of the word, separated by a hyphen, and append "ay". For this rule, treat 'y' as a vowel, except that a 'y' that is the first letter of a word is a consonant.
    Word        Pig-latin equivalent
    computer    omputer-cay
    slander     ander-slay
    spa         a-spay
    pray        ay-pray
    yesterday   esterday-yay
    strident    ident-stray
    rhythm      ythm-rhay

  3. Words that begin with 'qu' should be treated as though the 'u' is a consonant.
    Word        Pig-latin equivalent
    quiet       iet-quay
    queue       eue-quay
    quay        ay-quay

A few words will not translate naturally under these rules, but you should always apply the rules. If a word contains no vowels, treat it as though it starts with a vowel; for example, "zzz" is translated to "zzz-way".

It is possible that different words will be transformed to the same pig-latin form. For example, "it" is "it-way", but "wit" is also "it-way" using the rules above.
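The three rules, plus the no-vowel case, can be combined into one possible pigifyword. This is a minimal sketch, not the official solution: VOWELS is a name introduced here for illustration, and the loop simply finds the first character that counts as a vowel under the rules ('y' only when it is not the first letter, and a leading 'qu' skipped).

```python
VOWELS = "aeiouAEIOU"  # hypothetical constant; 'y' is handled separately


def pigifyword(word):
    # Rule 3: skip a leading "qu" (any case) so its 'u' acts as a consonant.
    start = 0
    if word[:2].lower() == "qu":
        start = 2
    for i in range(start, len(word)):
        ch = word[i]
        # Rule 2: 'y' counts as a vowel unless it is the first letter.
        if ch in VOWELS or (ch in "yY" and i > 0):
            if i == 0:
                # Rule 1: the word starts with a vowel, so append "-way".
                return word + "-way"
            # Move the prefix before the first vowel to the end, plus "ay".
            return word[i:] + "-" + word[:i] + "ay"
    # No vowels at all: treat the word as if it starts with a vowel.
    return word + "-way"
```

Because only slicing and concatenation are used, the case of every letter is left as it was, e.g. "Only" becomes "Only-way".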

Required functions

In the Python module Pigify.py you will need to write the following functions.

NOTE: Functions below were renamed 4/11 due to typos. pigifyworld should be pigifyword and unpigifyworld should be unpigifyword. That should make much more sense!

  1. def pigifyall(phrase) - This function has one string parameter, phrase. It returns a string that is the pig-latin translation of the original phrase.
  2. def pigifyword(word) - This function has one string parameter, word. It returns a string that is the pig-latin translation of the original word.
  3. def unpigifyall(phrase) - This function has one string parameter, phrase, which is in pig-latin. It returns a string that is the English translation of the pig-latin phrase.
  4. def unpigifyword(word) - This function has one string parameter, word, which is in pig-latin. It returns a string that is the English translation of the pig-latin word.

Assume words in the phrases are separated by whitespace.

For example, here is the body of pigifyall, which will work once you write the function pigifyword.

def pigifyall(phrase):
    all = []
    for word in phrase.split():
        all.append(pigifyword(word))
    return ' '.join(all)

Make sure you understand how pigifyall works, the purpose of .join, and what the function pigifyword does; by extension, you should then see how unpigifyall and unpigifyword will work too.
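By analogy with pigifyall, the reverse direction can be sketched as follows. This sketch assumes its input came from pigifyword and splits at the last hyphen, so a word such as "1-10," (pigified to "1-10,-way") comes back intact. As noted above, some pig-latin forms are ambiguous ("it-way" could be "it" or "wit"), so any unpigifyword must pick one reading; this one always picks the vowel-rule reading for "-way".

```python
def unpigifyword(word):
    # Split at the LAST hyphen: the part after it is the suffix that
    # pigifyword added ("way", or the moved prefix followed by "ay").
    pos = word.rfind("-")
    if pos == -1:
        return word  # no hyphen: not pig-latin, return it unchanged
    stem = word[:pos]
    suffix = word[pos + 1:]
    if suffix == "way":
        # Vowel rule (or no vowels). Ambiguous: "it-way" could also
        # come from "wit"; this sketch always chooses "it".
        return stem
    if not suffix.endswith("ay"):
        return word  # unexpected form: return it unchanged
    # Consonant rule: put the moved prefix back in front of the stem.
    return suffix[:-2] + stem


def unpigifyall(phrase):
    # Same shape as pigifyall: translate word by word, then re-join.
    words = []
    for word in phrase.split():
        words.append(unpigifyword(word))
    return ' '.join(words)
```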

The program you submit should have the main block below (part of the Pigify.py module you'll snarf) that you will have to add some code to:


if __name__ == '__main__':
    # start with reading in data file
    words = readFile("romeo.txt")
    print "read",len(words),"words"
    result = ' '.join(words)
    # convert to piglatin and write to file
    pigstr = pigifyall(result)
    writeFile(pigstr.split(),"pig-romeo.txt")
    print "PIGIFIED romeo.txt"
    print pigstr[0:100]

    # ****** replace comments with code ******
    # read in pigified file
    # ADD CODE HERE
    # unpigify file that was read
    # ADD CODE HERE
    # write to file "unpig-romeo.txt"
    # ADD CODE HERE
    print "UNPIGIFIED romeo.txt"
    # ADD CODE HERE

Part 2: Caesar Cipher Encryption

You'll need to be able to encode (or decode) text using a Caesar cipher. That is, you will read a file, encrypt it, and then write a new encrypted file. You'll also read an encrypted file, decrypt it, and write a new decrypted file. This is similar to what you did with pig-latin above. Details on encrypting and decrypting are given below. Start by creating a new Python module, Caesar.py, with a main block like the one in Pigify.py.

if __name__ == '__main__':
    pass  # replace with code that calls your functions

Copy the functions readFile and writeFile from the Pigify.py module into Caesar.py. When you're done writing the functions below, add code to the main block that uses them as described below.

We include basic instructions and hints here, but you should absolutely do some further reading about Caesar ciphers online.

For full credit you must use the approach outlined in this document, which uses the string method .find and does not add numbers to characters with the chr and ord functions.

To convert a letter to its rotated Caesar equivalent, create a string of the letters in alphabetical order, and then a shifted version of that string. We show an example below with a shift of four, but the idea generalizes to any shift. You can create the shifted version automatically with the slicing operator, as shown.

Using a shift of 4 you'll get the strings below.

Note that alph[:4] == "ABCD" and alph[4:] == "EFGHIJKLMNOPQRSTUVWXYZ", so the shifted string is caesar = alph[4:] + alph[:4].


  alph   = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  caesar = "EFGHIJKLMNOPQRSTUVWXYZABCD"

To encode the letter 'S', use alph.find('S'), which returns 18; caesar[18] is 'W', so the encoding of 'S' with a shift of 4 is 'W'. Using this approach will help you write code to translate any alphabetic character.
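The slicing and .find steps just described can be checked directly in Python. This small sketch handles only uppercase letters, matching the example strings; the variable names follow the example.

```python
alph = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
shift = 4

# Rotate the alphabet left by `shift` places with slicing.
caesar = alph[shift:] + alph[:shift]
print(caesar)         # EFGHIJKLMNOPQRSTUVWXYZABCD

# Look up a letter's position in alph, then read the letter at the
# same position in caesar.
index = alph.find('S')
print(index)          # 18
print(caesar[index])  # W
```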

There are other ways to implement the Caesar cipher, using the functions chr and ord and the % operator, but the approach above, which indexes into these strings, is much easier to get working. You must use the approach above with .find and [ ] for full credit.

You can determine whether a string is made up of alphabetic characters using the boolean method .isalpha(). For example, 'S'.isalpha() returns True, while '7'.isalpha() and ','.isalpha() return False.

Your program should transform only alphabetic characters and leave all other characters (digits, punctuation, and so on) unchanged.
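Combining the .find lookup with .isalpha gives one possible shiftword, the helper that encrypt relies on. This is a sketch under two assumptions that this HOWTO does not spell out: uppercase and lowercase letters are shifted within their own alphabets so that case is preserved (as the eyeball example suggests), and letters outside A-Z/a-z such as 'é' are left alone. UPPER and LOWER are names introduced here.

```python
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def shiftword(word, shift):
    # Shifted alphabets built with slicing, exactly as for alph/caesar.
    upper_shifted = UPPER[shift:] + UPPER[:shift]
    lower_shifted = LOWER[shift:] + LOWER[:shift]
    result = []
    for ch in word:
        if not ch.isalpha():
            # Digits, punctuation, and hyphens are copied unchanged.
            result.append(ch)
        elif UPPER.find(ch) != -1:
            # Uppercase letter: same position in the shifted uppercase string.
            result.append(upper_shifted[UPPER.find(ch)])
        elif LOWER.find(ch) != -1:
            # Lowercase letter: same position in the shifted lowercase string.
            result.append(lower_shifted[LOWER.find(ch)])
        else:
            # A letter outside A-Z/a-z (e.g. 'é') is left unchanged.
            result.append(ch)
    return ''.join(result)
```

For example, shiftword("Sometimes", 9) returns "Bxvncrvnb", the first word of the encrypted string in the eyeball example, and shiftword("Bxvncrvnb", 17) turns it back into "Sometimes" because 9 + 17 = 26.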

To encrypt all the words in a string, write a function encrypt:

def encrypt(str, shift):

where str is a string and shift is an integer in the range 0-25. The function encrypt should return a string containing the words of str, each encrypted using a Caesar cipher. Here's the code. It assumes you have a helper function shiftword that does the work of encrypting one word: it breaks the string into a list of words, encrypts each word, and reassembles the encrypted words into a string.

Note: You can change the names of parameters and variables if you want. For example, you could use answer instead of all and phrase instead of str. str and all are not reserved words, but they are the names of Python built-ins, so PyDev may warn that you are shadowing them; the code still works. If you run into any issues, rename them.

def encrypt(str, shift):
    all = []
    for word in str.split():
        all.append(shiftword(word,shift))
    return ' '.join(all)

Summary: you'll write code to encrypt a string using a Caesar cipher. This code goes in the module Caesar.py, which should include at least these functions:

  1. encrypt
  2. shiftword

The code for encrypt is given above; it relies on the shiftword function that you write.

Caesar Cipher Decryption

You'll need to write code to decrypt a file that's been encrypted with a Caesar cipher. You'll do this using two different methods. The first method will require you, an intelligent person, to determine the shift needed for decrypting. You'll do this by trying each possible shift from 0-25, and looking at the output. For the second method, you'll implement a completely automated approach to finding the shift to decrypt text that was encrypted using a Caesar cipher.

Eyeball Decryption

You can find the shift value for decrypting a Caesar-encrypted string simply by eyeballing the results of trying each possible shift. For example, suppose you were trying to decrypt this string:

"Bxvncrvnb rc'b njbh cx lxdwc oaxv 1-10, kdc wxc jufjhb"

Using each shift-value from 1-25 generates the output below (a shift of zero results in no change to the string being encrypted). Can you find the original string that was encrypted?

1 Cywodswoc sd'c okci dy myexd pbyw 1-10, led xyd kvgkic
2 Dzxpetxpd te'd pldj ez nzfye qczx 1-10, mfe yze lwhljd
3 Eayqfuyqe uf'e qmek fa oagzf rday 1-10, ngf zaf mximke
4 Fbzrgvzrf vg'f rnfl gb pbhag sebz 1-10, ohg abg nyjnlf
5 Gcashwasg wh'g sogm hc qcibh tfca 1-10, pih bch ozkomg
6 Hdbtixbth xi'h tphn id rdjci ugdb 1-10, qji cdi palpnh
7 Iecujycui yj'i uqio je sekdj vhec 1-10, rkj dej qbmqoi
8 Jfdvkzdvj zk'j vrjp kf tflek wifd 1-10, slk efk rcnrpj
9 Kgewlaewk al'k wskq lg ugmfl xjge 1-10, tml fgl sdosqk
10 Lhfxmbfxl bm'l xtlr mh vhngm ykhf 1-10, unm ghm teptrl
11 Migyncgym cn'm yums ni wiohn zlig 1-10, von hin ufqusm
12 Njhzodhzn do'n zvnt oj xjpio amjh 1-10, wpo ijo vgrvtn
13 Okiapeiao ep'o awou pk ykqjp bnki 1-10, xqp jkp whswuo
14 Pljbqfjbp fq'p bxpv ql zlrkq colj 1-10, yrq klq xitxvp
15 Qmkcrgkcq gr'q cyqw rm amslr dpmk 1-10, zsr lmr yjuywq
16 Rnldshldr hs'r dzrx sn bntms eqnl 1-10, ats mns zkvzxr
17 Sometimes it's easy to count from 1-10, but not always
18 Tpnfujnft ju't fbtz up dpvou gspn 1-10, cvu opu bmxbzt
19 Uqogvkogu kv'u gcua vq eqwpv htqo 1-10, dwv pqv cnycau
20 Vrphwlphv lw'v hdvb wr frxqw iurp 1-10, exw qrw dozdbv
21 Wsqixmqiw mx'w iewc xs gsyrx jvsq 1-10, fyx rsx epaecw
22 Xtrjynrjx ny'x jfxd yt htzsy kwtr 1-10, gzy sty fqbfdx
23 Yuskzosky oz'y kgye zu iuatz lxus 1-10, haz tuz grcgey
24 Zvtlaptlz pa'z lhzf av jvbua myvt 1-10, iba uva hsdhfz
25 Awumbquma qb'a miag bw kwcvb nzwu 1-10, jcb vwb iteiga

By examining the output from trying each shift from 1-25, you can determine that a shift of 17 decrypts the string: the only output that looks like English is the one labeled 17. (Equivalently, the original was encrypted with a shift of 26 - 17 = 9.) You'll do this by writing a function that takes an encrypted string as a parameter and tries all possible shifts from 0-25, printing the results. Print only the first 80 characters of each result, using slicing; that's enough to eyeball. We include the zero shift in case the string passed in wasn't encrypted.

In Caesar.py, write this function.


def eyeball(encrypted):

This function should NOT return a value. Instead, it should print 26 rows, each labeled with the shift being applied (an int from 0-25 inclusive) followed by the first 80 characters of the string that results from applying that Caesar shift to the string parameter. You'll then use your own cryptological judgment to determine the original message, that is, to decrypt the encrypted text. Use this approach to find the original text of the files file1.txt and file2.txt that you snarfed for this assignment. The original string in the example above, which a shift of 17 reveals, can be recovered by eyeballing all shifts and then running encrypt with the identified shift.
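A sketch of eyeball under stated assumptions: to keep the example self-contained it defines a hypothetical helper, shift_text, that applies the same .find-based shift to a whole string (in Caesar.py you would call your own encrypt instead). Unlike encrypt, shift_text does not split and re-join, so spacing is preserved exactly.

```python
ALPH = "abcdefghijklmnopqrstuvwxyz"


def shift_text(text, shift):
    # Hypothetical stand-in for encrypt: shift every ASCII letter by
    # `shift` using the .find lookup, preserving case; other characters
    # (digits, spaces, punctuation) are copied unchanged.
    rotated = ALPH[shift:] + ALPH[:shift]
    out = []
    for ch in text:
        pos = ALPH.find(ch.lower())
        if ch.isalpha() and pos != -1:
            new = rotated[pos]
            if ch.isupper():
                new = new.upper()
            out.append(new)
        else:
            out.append(ch)
    return ''.join(out)


def eyeball(encrypted):
    # Print one row per shift 0-25: the shift, then the first 80
    # characters of the shifted text. Returns nothing (None).
    for shift in range(26):
        print("%d %s" % (shift, shift_text(encrypted, shift)[:80]))


eyeball("Bxvncrvnb rc'b njbh cx lxdwc oaxv 1-10, kdc wxc jufjhb")
```

Row 17 of the printed output is the English sentence; the other rows match the listing above.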

Decryption with Word Comparison

To automate decryption, read in a file of words, melville.txt, and create a list of English words from the file. Then try to decrypt a phrase by applying every shift value from 1-25 and, for each resulting string, checking how many of its words are real words from melville.txt. The shift value that produces the most real words is most likely the key.

In Caesar.py, write the function decrypt that has one parameter encrypted.

def decrypt(encrypted):

This function uses the method described above to determine the correct shift value and then returns the decrypted phrase.
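The word-comparison approach can be sketched as follows. To keep the example testable, this decrypt takes the set of English words as a second parameter, whereas the assignment's decrypt has only the encrypted parameter and reads melville.txt itself. load_words, normalize, count_real_words, and shift_text are hypothetical helpers, and the loop tries all 26 shifts (including 0, as eyeball does), which is harmless and also handles unencrypted input.

```python
import string

ALPH = "abcdefghijklmnopqrstuvwxyz"


def shift_text(text, shift):
    # Hypothetical stand-in for encrypt (same .find-based shift, case kept).
    rotated = ALPH[shift:] + ALPH[:shift]
    out = []
    for ch in text:
        pos = ALPH.find(ch.lower())
        if ch.isalpha() and pos != -1:
            out.append(rotated[pos].upper() if ch.isupper() else rotated[pos])
        else:
            out.append(ch)
    return ''.join(out)


def normalize(word):
    # Lowercase and strip leading/trailing punctuation: "Always," -> "always".
    return word.strip(string.punctuation).lower()


def load_words(filename):
    # Hypothetical helper: the set of normalized words in a file such as
    # melville.txt. A set makes each membership test fast.
    with open(filename) as f:
        return set(normalize(w) for w in f.read().split())


def count_real_words(text, english):
    # How many whitespace-separated words of text are known English words.
    count = 0
    for word in text.split():
        if normalize(word) in english:
            count += 1
    return count


def decrypt(encrypted, english):
    # Try every shift and keep the one whose output has the most real words.
    best_shift = 0
    best_count = -1
    for shift in range(26):
        count = count_real_words(shift_text(encrypted, shift), english)
        if count > best_count:
            best_shift = shift
            best_count = count
    return shift_text(encrypted, best_shift)
```

In Caesar.py you might call load_words("melville.txt") once and reuse the resulting set for every phrase you decrypt.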

Try these out on file1.txt or file2.txt or a phrase that you have encrypted.

OUTPUT

You should have clearly understandable output that demonstrates that your program works.