Compsci 101, Fall 2012, Transform Howto

See assignment in a nutshell

Snarfing

This assignment provides some starter files you can use, specifically FileTransform.py and some data files to convert to Piglatin. See snarfing help for information on how to use the Ambient/Snarf functionality in Eclipse. Note that the snarf URL for this semester is http://www.cs.duke.edu/courses/cps101/fall12/snarf

code is here and data is here

The howto has two parts: information on transforms and information on writing to files.

Overview

This section provides a general overview of FileTransform.py in which you'll write several functions.

You can run the program and it will print the contents of a file to the Eclipse console window. You'll need to transform the data read from a file and print to a file rather than to the console window. Then you'll need to write new transforms for piglatin and rot13. These are described after the Getting Started section below.

Getting Started

You'll run the provided program FileTransform.py, which will make calls to the transform_ functions at the top of the file. You'll need to add a transform_rot13 here.

The main function transform_file performs three tasks by calling functions; you will write code that helps with the second and third of these tasks.

  1. Get a file to open (using a file dialog)
  2. Get a transform to apply to each word and apply it
  3. Get a file to save transformed data to and save it

Each task/phase of the program is described below:

Open a file

When the file dialog opens, a small rocketship icon will likely appear on your screen. If the file dialog shown below doesn't appear, choosing that icon will bring it up:

open file

Choose a transform and transform

You'll choose a file to transform. The data directory that comes with this assignment has five files in it; you can choose to transform those or other files you have. You can also create your own files for testing.

Reading the file creates a list of the lines in the file. Each line is represented by a list of strings/words on that line. This list of lists is returned by the function get_words, which is already written for you. The code then prompts the user for a transform by displaying a text-based menu in the Eclipse console:

transform choice

This transform should be applied to each word in the file being transformed. This means you must write code in the function transform to create a new list of lists, with each word in the parameter words being transformed by applying func. For example, if the parameter words is the two lines from a file represented by the two lists below:

[ ["This", "is", "the", "story"], ["of", "a", "streetcar", "named", "Desire"]]

Then if the parameter func represents piglatin, you'll write code to return this list:

[ ["is-Thay", "is-way", "e-thay", "ory-stay"], ["of-way", "a-way", "eetcar-stray", "amed-nay", "esire-Day"]]

The code you're given in transform returns a copy of the list of words. You should modify it to return a transformed copy.

The suggestion given in a program comment is to reference copy[i], which is a list of words on one line from the read file, and replace this with a transformed list, e.g.,

    copy[i] = transformedcopy

Here the right-hand side (RHS) of the assignment statement could be a function call, a list comprehension, or something else. Whatever form it takes, the RHS is a list in which each word has had func applied to it: func("apple") would be "apple-way" if func is the pig-latin function.
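The replacement of copy[i] described above can be sketched with a list comprehension. This is one possible shape, not the starter code itself: the parameter order (words, func) and the way the copy is made are assumptions, and str.upper stands in for a real transform.

```python
def transform(words, func):
    # Assumed signature: words is a list of lists of strings,
    # func takes one word and returns the transformed word.
    copy = words[:]  # copy the outer list so the caller's list is untouched
    for i in range(len(copy)):
        # Replace the list of words on line i with a transformed list.
        copy[i] = [func(w) for w in copy[i]]
    return copy

lines = [["This", "is", "the", "story"], ["of", "a", "streetcar"]]
shouted = transform(lines, str.upper)
```

Because each copy[i] is rebound to a new list, the original lines are left unchanged.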

Writing the file

After the data from the file has been transformed, a new transformed file should be written. The program uses a file dialog to ask the user for a filename to save the data in. This is done by the code below, which results in the dialog box shown after the code.

    file = get_file_to_save()
    if file == None:
        return
    write_words(file,twords)
    file.close();

save file

The program calls write_words with a file open for writing and a list of transformed data/words, twords. (The parameter is named file, which is a poor name because file is also the name of a type in Python, but we'll leave it for now.) You must complete write_words so that it writes the transformed data to a file.

There's information below on writing to a file. You'll need to test the functions you write in FileTransform.py that write a file.


Transform functions

You'll need to modify the transform_ functions at the top of FileTransform.py. In this module you must ultimately write functions transform_pig and transform_unpig that each have a single string parameter and return a string: either the pig-latin equivalent (pigify) or the normal text recovered from piglatin (unpigify). You're given some transform functions, named transform_identity and transform_upper.

You'll need to test the module yourself, by running code you write to see that the two functions dealing with piglatin work as intended.

APT testing

You can test both piglatin and rot13 via APTs (piglatin for APT-2, rot13 in APT-3). Be sure to get credit for the APTs by doing both the pig-latin and ROT13 APTs as part of doing this assignment.

When you're reasonably sure that your Piglatin functions work, you can start working on the rot13 function. This function both encodes and decodes a string, so one function fills two roles. See the rot13 section for details on this encoding.

You'll need to use FileTransform.py to test whether your transforms work with files too. This means you'll need to add your transform functions, e.g., Transforms.piglatin to the list of functions in choose_transform and you'll need to add a corresponding string so the user can choose the function. These will be added in the lists funcs and names respectively.

Adding Functions for the User to Choose

After you write pigify, unpigify, and rot13 functions you'll need to add these functions and a text prompt for them to FileTransform.py. You may need to add functions and prompts to the code in choose_transform so that the transforms you write can be called to transform data.
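The pairing of the funcs and names lists can be pictured with the sketch below. It is only an illustration: the bodies of transform_identity and transform_upper are guesses at what the starter functions do, the menu wording is invented, and menu_text and lookup are hypothetical helpers that are not part of FileTransform.py.

```python
def transform_identity(word):
    # Assumed body: return the word unchanged.
    return word

def transform_upper(word):
    # Assumed body: return the word in upper case.
    return word.upper()

# Parallel lists: names[i] is the menu text for funcs[i].
funcs = [transform_identity, transform_upper]
names = ["identity", "uppercase"]

def menu_text(names):
    # Hypothetical helper: one numbered menu line per transform name.
    return "\n".join("%d\t%s" % (i, name) for i, name in enumerate(names))

def lookup(choice):
    # Hypothetical helper: map the number the user typed to a function.
    return funcs[int(choice)]
```

Adding a transform then means appending the function to funcs and its prompt to names at the same position, so the two lists stay aligned.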


Part I: Transforms

Piglatin rules

These are the rules you should use to convert a word into piglatin. We're using a hyphen to facilitate translating back from piglatin to English.

In creating piglatin you will not be concerned with punctuation, so treat every character as either a vowel or not-a-vowel, and punctuation falls into the second category.

  1. If a word begins with 'a', 'e', 'i', 'o', or 'u', then append the string "-way" to form the piglatin equivalent. Examples:

    Word        Piglatin Equivalent
    anchor      anchor-way
    elegant     elegant-way
    oasis       oasis-way
    isthmus     isthmus-way
    only        only-way

  2. If a word begins with a non-vowel (we'll call this a consonant, but it could be a number, punctuation, or something else), move the prefix before the first vowel to the end of the word, after a hyphen, with "ay" appended. Treat 'y' as a vowel, except when 'y' is the first letter of a word, in which case it should be considered a consonant.

    Word        Piglatin Equivalent
    computer    omputer-cay
    slander     ander-slay
    spa         a-spay
    pray        ay-pray
    yesterday   esterday-yay
    strident    ident-stray
    rhythm      ythm-rhay

  3. Words that begin with 'qu' should be treated as though the 'u' is a consonant.

    Word        Piglatin Equivalent
    quiet       iet-quay
    queue       eue-quay
    quay        ay-quay

A few words won't conform to these rules, but the rules should always be used. If a word contains no vowels it should be treated as though it starts with a vowel; for example, "zzz" will be translated to "zzz-way".

It's possible that different words will be transformed to the same piglatin form. For example, "it" is "it-way", but "wit" is also "it-way" using the rules above.
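One way to turn the three rules above into code is to find the index of the first vowel and split the word there. The sketch below is an illustration under assumptions rather than a reference solution: first_vowel_index is a hypothetical helper, the comparison is done on a lower-cased copy so capitalized words like "This" follow the same rules, and only a 'qu' at the very start of a word gets the special treatment of rule 3.

```python
def first_vowel_index(word):
    # Hypothetical helper: index of the first vowel, or -1 if there is none.
    lower = word.lower()
    for i, ch in enumerate(lower):
        if ch in "aeiou":
            # Rule 3: the 'u' in a leading "qu" counts as a consonant.
            if ch == "u" and i == 1 and lower[0] == "q":
                continue
            return i
        # Rule 2: 'y' is a vowel unless it is the first letter.
        if ch == "y" and i > 0:
            return i
    return -1

def transform_pig(word):
    i = first_vowel_index(word)
    if i <= 0:
        # Rule 1 (starts with a vowel) or no vowel at all: append "-way".
        return word + "-way"
    # Rule 2: move the consonant prefix after a hyphen and append "ay".
    return word[i:] + "-" + word[:i] + "ay"
```

With these definitions, transform_pig("strident") gives "ident-stray" and transform_pig("quiet") gives "iet-quay", matching the tables above.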


Rot13

You'll write a function named rot13 to use a (Wikipedia) ROT13 cipher to encode/decode a string, and then use this to encode every word in a file.

The function rot13 returns a rotated form of its string parameter:

    def rot13(w):
        s = ""
        # write code to concatenate characters to s
        return s

To convert a letter character to its ROT13 equivalent we suggest using these strings, the .find method that returns an index, and the string indexing operator.

   a = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
   b = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm"

For example, the letter 'S' in the first string (labeled a above) is found at index 18. Note that a.find("S") will evaluate to 18. Then b[18] represents the encoding for 'S', namely 'F'. You can reverse the roles of the strings a and b since the ROT13 cipher is symmetric.
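The lookup just described can be sketched as follows. This is one possible way to fill in the rot13 skeleton, not the required one; it assumes that characters not found in a (where find returns -1) should be copied through unchanged.

```python
a = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
b = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm"

def rot13(w):
    s = ""
    for ch in w:
        index = a.find(ch)   # position of ch in a, or -1 if ch is not a letter
        if index == -1:
            s += ch          # assumption: leave non-letters as they are
        else:
            s += b[index]    # the letter 13 places along, same case
    return s
```

Applying the function twice returns the original string, which is why the same function both encodes and decodes.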

There are other ways to do the ROT13 cipher using the functions chr, ord and the % operator, but the approach suggested by using indexing and the strings above is much easier to get working.
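For comparison, a sketch of the chr/ord/% alternative is shown here. The name rot13_ord is hypothetical, chosen only to keep it distinct from the rot13 function you are asked to write, and it assumes plain ASCII letters.

```python
def rot13_ord(w):
    s = ""
    for ch in w:
        if "a" <= ch <= "z":
            # Shift within the 26 lowercase letters, wrapping with %.
            s += chr((ord(ch) - ord("a") + 13) % 26 + ord("a"))
        elif "A" <= ch <= "Z":
            # Same shift for uppercase letters.
            s += chr((ord(ch) - ord("A") + 13) % 26 + ord("A"))
        else:
            # Non-letters are copied unchanged.
            s += ch
    return s
```

The arithmetic has more places to go wrong (two cases, two offsets, the wrap-around), which is part of why the indexing approach tends to be easier to get working.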

You can identify non-letters by using the return value of the string find method which will be -1 for non-letters. Alternatively you can use the Python string module to identify letters. For example, the code below generates the output that follows it.

    import string
    for a in "ABCDefg123!,#":
        if not a in string.letters:
            print a

OUTPUT
    1
    2
    3
    !
    ,
    #

Note that you must import the string module to use its functions and constants. See the Python string docs for full information on the module.

Writing to Files

Whenever you print something to the console, you should also write it to the file open for output. This way the transformed words will be written to the console in Eclipse and saved to a file. The current version of function write_words writes to the console only, but it takes a file parameter that you'll use when you modify the function. This is the code you're given:

    for line in words:
        for w in line:
            print w+" ",
        print

Note the inner loop has a comma in the print statement, which keeps the output on a single line. The print statement after the inner loop moves to the next line, because one sub-list of words, which is one line of the transformed file, has been written completely to the console.

You must write the words to a file as well as to the console. To write to a file, you use the file .write method, which takes a string as a parameter. Two uses are shown below for a variable named outfile:

    outfile.write(word)
    # to create a line, write the newline character
    outfile.write("\n")

To make sure the output file is completely written, the last line of your code must close the file as below:

    outfile.close()

This will ensure that all writing to the file happens, that the file is flushed and closed properly.

This means that completing write_words requires mirroring each print call with a matching call to file.write, so that everything printed to the console is also written to the file.
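A minimal sketch of that mirroring is shown here, under a few stated assumptions. It uses the print function so that it runs on current Python interpreters, whereas the starter code uses the print statement. The parameter is called outfile rather than file. An in-memory io.StringIO object stands in for the file chosen in the save dialog, and closing the file is left to the caller.

```python
import io

def write_words(outfile, words):
    for line in words:
        for w in line:
            print(w + " ", end="")   # console, staying on the same line
            outfile.write(w + " ")   # same text to the file
        print()                      # console moves to the next line
        outfile.write("\n")          # file moves to the next line

# Stand-in for a real file opened for writing.
buffer = io.StringIO()
write_words(buffer, [["is-Thay", "is-way"], ["of-way"]])
saved = buffer.getvalue()
```

Each inner list ends up on its own line in the file, with a space after every word, just as it appears on the console.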

Advice

Don't work on the output file until you've successfully written piglatin to the console. You can always create your own files to test your program; you don't need to run on large files. Create a file just like you create a README, with New>Other>General>Untitled Text File from the Eclipse "new" menu.