CS 1 : Lecture 7

Lecture 7: Memory and Primitive Java Variables

Memory in a computer & data representation in Java
Introduction to Variables and Identifiers
Primitive Java Types
Data Type Conversion

Computer Memory & Java Data Representation

Today we'll start talking about the use of variables in Java - an essential part of any programming language. Before we start that, a quick look at computer memory and how data is represented in Java should provide a better foundation for understanding how things work.

Inside a computer's memory, everything is represented as a big collection of 1's and 0's. This means that computers operate on a base-2 or binary number system (with only 0 and 1 as the digits) rather than the base-10 or decimal system you're used to (with 0-9 as the digits). Since computers use binary digits, a single 0/1 entry in memory is called a bit. A collection of 8 bits is called a byte and some larger numbers of bits and bytes also have special names, but I won't go into them here.

One big thing to remember about these bits in computer memory is that the same set of 0's and 1's can represent lots of different things. It all depends on how you interpret them. The same group of bits could denote a number, a word or sentence, or it might not be data at all. It could actually be describing code to be executed for some program. Properly interpreting bits in memory can be a big hassle. Thankfully, Java pretty much handles memory management for us, and so you'll only have to worry about the specific data type of each variable you create (more on data types later in this lecture). You can check out this page at HowStuffWorks.com to get a more detailed introduction to bits and bytes. Their explanation of the topic is also where I got some of my material.

One last thing to note: how your code is read by the computer. As mentioned above, today's computers speak the language of 0's and 1's. Since you're writing Java code, something has to eventually translate that down into a binary format the computer will accept. This is the job of something called the compiler. Some of the details of how this is done will be covered later in this course (specifically, in the topics of computer architecture or compilers).

Variables and Identifiers

Now that we've seen a little bit about how computer memory works, we can see how that memory is used for variables in Java. The analogy I used before was to think of a variable like a labeled bucket that holds values for you. You can take the value out, and put different values in. Now we can see that for a "bucket" the computer is actually setting aside a place in memory for our variable. As quoted from an old textbook of mine: "A variable is a name for a location in memory used to hold a data value." (Textbook: Java Software Solutions by Lewis and Loftus.)

I'll go into detail about all the different variable data types in just a second. First, let's take a quick aside to look at variable names and "identifiers" in general. Each of the words used in a computer program is called an identifier. Recall the first Java program that was introduced: Basic1.java. Each of the words in that program is an identifier: public, class, Basic1, static, void, main, String, args, System, out, println. These words fall into 3 categories:

Identifiers that we choose: Basic1, args
Identifiers that some other programmer chose: String, System, out, println, main
Identifiers that are reserved for special purposes in this programming language: public, class, static, void

The reserved words have special meanings in the Java language and so you cannot use them for anything that you would like to name (like a class, method, or variable). The textbook has a complete listing of all of the reserved words in the Java language.

The words that were chosen as identifiers by other programmers are just like the names that you will choose for things you create. It's just that these other programmers have already created a lot of useful code to perform all sorts of functions for you. In order to use the classes, methods, and variables that those programmers created, we've got to use the names that those programmers chose.

When you go to choose your own identifiers there is still a set of rules to follow. Identifiers in the Java language can be made up of any combination of letters, digits, the underscore character "_", and the dollar sign "$". Identifiers cannot, however, begin with a digit. So, my_new_variable or all_4_u are valid identifiers, but 2_cool or new&invalid are not valid identifiers. Also note that Java is case sensitive, so the variable names unique, Unique, and UnIqUe are all... unique. Finally, note that there are some "suggested" guidelines for naming certain things, like capitalizing the first letter of class identifiers. Ignoring these coding conventions will not cause your program to fail. However, it can make your program much harder for other programmers to read and understand.
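The character rules above can be checked with a short program. This is only a sketch: the class name IdentifierCheck and the method looksLikeIdentifier are made up for illustration, and the check relies on the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods. It looks only at which characters are used, so it does not notice reserved words.

```java
// Hypothetical helper for illustration; not part of the course's example files.
class IdentifierCheck {

    // Returns true if name obeys the character rules for identifiers:
    // a legal first character, followed by legal identifier characters.
    // Reserved words such as "class" are NOT rejected by this check.
    static boolean looksLikeIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIdentifier("my_new_variable")); // true
        System.out.println(looksLikeIdentifier("all_4_u"));         // true
        System.out.println(looksLikeIdentifier("2_cool"));          // false (starts with a digit)
        System.out.println(looksLikeIdentifier("new&invalid"));     // false (& is not allowed)
    }
}
```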

Another thing to note is that in all of the programming examples we've seen so far, a variable always had to be declared before it was used. For example the line

int answer;

declares a new variable of type int with the name "answer". Java requires that all variables be declared before they are actually used. So what happens if you suddenly start using a variable without declaring it first? The compiler that tries to make sense of your code will not be happy and will let you know about it. This is a good place to mention that a common programming mistake is to accidentally mistype the name of a variable when you try to use it in a program (remember that Java is case sensitive). So if your compiler complains about a variable not being declared but you thought you had done so, check to make sure you've typed the variable name exactly the same every time you've used it.

Primitive Data Types vs. Objects

The data types in the Java language mostly fall into 2 main categories: (1) primitive data types and (2) objects. Technically, using objects also means using a third type, references, but we'll ignore objects and references for the moment. When starting out, why not start with the primitive types?

Primitive data types are basic things like numbers, letter characters, and true/false values. There are 8 primitive data types total: byte, short, int, long, float, double, char, and boolean. There are 4 types for holding integer values (byte, short, int, long), 2 types for holding floating point or real numbers (float, double), 1 type for holding a single character (char), and 1 type for holding a single true/false value (boolean). While I'll cover the data types here so that you know why they exist, you probably will only use 3 types: int, double, and boolean.

Integers and Floating Point Numbers

Let's begin by looking at the 6 number types. As mentioned above, Java has two basic kinds of numeric values: integers (whole numbers without fractions), and floating point values (with fractions/decimals). The difference between each of the 4 integer types (byte, short, int, long) is that they each reserve a different amount of space in memory for that particular data type. This is also the difference between the two floating point types (float, double). As you may recall from the brief discussion about computer memory given at the start of this lecture, a byte is 8 bits (with each bit in memory being able to hold a single value of 1 or 0). So, it should come as no surprise that the byte data type reserves 8 bits of memory. By comparison, the other integer data types continue to double in size: short 16 bits, int 32 bits, and long 64 bits.

The reason for all of the different sized data types is a tradeoff between how much memory is used and what size value that data type can store. Since a single bit can have 2 different values, each extra bit that a data type uses doubles the number of different values it can represent. So, 2 bits would allow for 2*2 = 4 different values, 3 bits allows 8 different values, etc. In general, this means that n bits can store 2ⁿ different values. With the byte data type using 8 bits, it is able to store 2⁸ = 256 different values.

One other thing to take into account with these numeric data types is that they all store signed values. This means that they cover both positive and negative values. Since the byte data type can store 256 different values, this gives a number range from -128 to +127 (remember that the "0" value is also included in that range). So, you can see that while the byte data type succeeds in using only a small amount of memory, its range of values is also small. If you need to use as little space as possible and you know that the value for some integer variable will never exceed this range, then a byte data type is for you. Often memory space is plentiful, though, and figuring out the range of values for each variable can be very tedious. So, you'll usually see programs using the int data type (with a range from -2,147,483,648 to 2,147,483,647).
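If you'd like to see these sizes and ranges for yourself, here is a small sketch. The class name IntegerRanges and the method distinctValues are made up for this illustration; the SIZE, MIN_VALUE, and MAX_VALUE constants come from Java's standard Byte, Short, Integer, and Long classes.

```java
// Hypothetical demo class for illustration; not one of the course's example files.
class IntegerRanges {

    // Number of different values that n bits can represent (2 to the power n).
    // Only meaningful here for bit counts from 0 to 62, so the result fits in a long.
    static long distinctValues(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        System.out.println("2 bits: " + distinctValues(2) + " values"); // 4
        System.out.println("3 bits: " + distinctValues(3) + " values"); // 8
        System.out.println("8 bits: " + distinctValues(8) + " values"); // 256

        // Size in bits and signed range of each integer type.
        System.out.println("byte:  " + Byte.SIZE + " bits, " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.SIZE + " bits, " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.SIZE + " bits, " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.SIZE + " bits, " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
    }
}
```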

Of the two floating point data types, float uses 32 bits and double uses 64 bits. Most programs tend to use the double data type, but again you can make the choice. The key difference between the integer numeric types and the floating point numeric types is the way in which they store their value. The integer data types simply store their values in binary notation, using all of their bits to make a 1-to-1 correlation to decimal numbers. Floating point data types, however, divide their bits up to store 3 different parts:

a sign,
a mantissa,
and an exponent.

You've probably seen this form before with scientific notation, like 2.99792458x10⁸ for example. The sign is positive or negative (+ or -), the mantissa is the numeric value in the middle (here, 2.99792458), and the exponent is the power of 10 (in this case, 8). Java's floating point types work the same way, except that the exponent is a power of 2 rather than a power of 10.

Finally, one last bit of terminology. As quoted from the same old textbook mentioned before, "a literal is an explicit data value used in a program". In the case of the number data types described above we would have integer literals and floating point literals. Note that a literal would be an actual number, like 7041776, which is different from a variable like importantDate which might store a number value.

Arithmetic Expressions and Operator Precedence

One common and essential part of most programs is expressions. Expressions are simply a collection of operands (things like literals and variables) and operators that act on them. Arithmetic expressions in Java for both integer and floating point values work much like expressions you would see in a math class. The typical operators of addition, subtraction, multiplication and division (+, -, *, /) are defined for both integer and floating point numbers. Each of these takes a value on either side (a first and second operand) and acts left to right. For example, in the following line of code

int answer = 10 - 4;

the subtraction operator (-) would subtract the second operand from the first (4 from 10), resulting in a value of 6. That value of 6 is then placed in the int variable answer. Note that the division operator (/) acts slightly differently depending on whether we are dealing with integer or floating point numbers. If both of the values given to the operator are integers (byte, short, int, or long) then we have integer division. This means that any remainders or fractions left over by the division operation are truncated or dropped. So, in this example:

int answer = 5 / 2;

the operation of 5 / 2 would result in the value 2, with that value being placed into the variable answer. However, if one or both of the values given to the division operator are floating point values, then the operator performs floating point division. This means that the fractional part of the division is saved as part of the resulting value. So, in this expression:

double answer = 5 / 2.0;

the expression 5 / 2.0 evaluates to 2.5, and that value is placed in the variable answer. The different values you get from floating point vs. integer division are subtle, but you will likely come across some case in your programming career where it matters. So, if you're supposed to get a floating point answer out of an arithmetic expression, always be sure to check your division operations in that expression to make sure at least one of the operands for each division operator is a floating point value. If you've got a division with two integers stuck in there somewhere, your remainder will be lost (and when trying to find that bug, your patience may be lost as well).

One other common operator that is a little less known is the remainder operator (%), which is sometimes also called the modulo or mod operation. This divides the first operand by the second operand, but then returns the remainder. (In a sense, this means that it is performing integer division, but is keeping the remainder instead of the quotient.) So this line of code:

int answer = 10 % 3;

evaluates 10 % 3 to give the value of 1. This value is then placed into the variable answer.
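The three operators just described can be compared side by side. This is a minimal sketch with made-up names (DivisionDemo and its three methods); the values are the same ones used in the examples above.

```java
// Hypothetical demo class for illustration; not one of the course's example files.
class DivisionDemo {

    // Both operands are ints, so this is integer division: the fraction is dropped.
    static int integerQuotient(int first, int second) {
        return first / second;
    }

    // One operand is a double, so this is floating point division.
    static double floatingQuotient(int first, double second) {
        return first / second;
    }

    // The remainder operator keeps what integer division throws away.
    static int remainder(int first, int second) {
        return first % second;
    }

    public static void main(String[] args) {
        System.out.println(integerQuotient(5, 2));    // 2
        System.out.println(floatingQuotient(5, 2.0)); // 2.5
        System.out.println(remainder(10, 3));         // 1
    }
}
```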

One other important note about arithmetic operators in expressions is the order in which they are evaluated. The operator precedence hierarchy is again very similar to what you would find in a math class, with multiplication (*), division (/), and the remainder operator (%) all having precedence over addition (+) and subtraction (-). At the bottom of the hierarchy (meaning that it is evaluated last) is the equals sign (=) which actually assigns the completely computed value to something (usually a variable). In general, to avoid confusion about how expressions are evaluated, it is very helpful to use parentheses to designate portions of the expression to be evaluated first. So, in the following example:

double answer = ((5.0 - 3.0) / 4) + (8 * 2);

we've got a complicated arithmetic expression involving parentheses. Just as with an expression in a math class, we evaluate the expression left-to-right and we start by evaluating the innermost parentheses. This means that we first take 5.0 - 3.0 to get the value 2.0, then divide by 4 to get the value 0.5. For the last two operators, the multiplication is evaluated first, then the addition. This gives 0.5 + 16 = 16.5 for our final value, which is then finally placed into the variable answer. Note that removing the parentheses around 8 * 2 would not change the value of the expression, but they do make it simpler for a human to understand the expression. Lastly, note that left and right parentheses do have to match up and be properly nested (just as with left and right curly braces, {}, which are used to define the bodies of methods, classes, etc).
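To see how much the parentheses around the subtraction matter, here is a small sketch that evaluates the expression with and without them. The class and method names (PrecedenceDemo, withParentheses, withoutParentheses) are made up for this illustration.

```java
// Hypothetical demo class for illustration; not one of the course's example files.
class PrecedenceDemo {

    // The subtraction is forced to happen first: 2.0 / 4 = 0.5, then 0.5 + 16.
    static double withParentheses() {
        return (5.0 - 3.0) / 4 + (8 * 2);
    }

    // With no parentheses, division and multiplication go first:
    // 3.0 / 4 = 0.75 and 8 * 2 = 16, then 5.0 - 0.75 + 16.
    static double withoutParentheses() {
        return 5.0 - 3.0 / 4 + 8 * 2;
    }

    public static void main(String[] args) {
        System.out.println(withParentheses());    // 16.5
        System.out.println(withoutParentheses()); // 20.25
    }
}
```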

Overflow

We've mentioned that each of the different data types has a different size. This means that a certain numeric data type can only store values up to a certain point (like the byte data type only holding numbers as large as 127). So what happens if we try to store a number that's too large into a variable that can't hold it? Overflow. Assume that I had a variable that could only store one decimal digit, 0-9. What happens if I try to store the value 10? No matter how I try, I can't accurately represent the value 10 with just a single 0-9 digit.

Since the data types in Java are working on a base-2 number system, the boundaries of what each type can hold are a little harder to remember. The effect is just the same, though. If I've got a byte variable holding the value 127, and I try to increment it, my variable can't accurately represent the value 128. What actually happens, at least with integer variables, is that the value wraps around and becomes negative. You may have noticed this if you played around with the Math4.java factorial code from the first assignment. To more clearly see a comparison of the actual values and the wrap-around values check out Math5.java.
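The wrap-around is easy to reproduce. The sketch below is not the course's Math5.java; OverflowDemo and its methods are made-up names used only to show the effect on a byte and on an int.

```java
// Hypothetical demo class for illustration; this is NOT Math5.java.
class OverflowDemo {

    // Adds one to a byte. The ++ operator stores the result back into a byte,
    // so 127 wraps around to -128.
    static byte nextByte(byte value) {
        value++;
        return value;
    }

    // Adds one to an int. Integer.MAX_VALUE wraps around to Integer.MIN_VALUE.
    static int nextInt(int value) {
        return value + 1;
    }

    public static void main(String[] args) {
        byte small = 127;
        System.out.println(nextByte(small));            // -128
        System.out.println(nextInt(Integer.MAX_VALUE)); // -2147483648
    }
}
```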

The `char` data type

Now a word about the char data type. A character set is just a list of characters given in a specific order, and the Java language supports the Unicode character set. The ASCII character set is a better known one that you may have heard of, and it is basically included as part of the Unicode character set. This accounts for most or all of the characters you're used to: upper- and lowercase letters, digits, punctuation, and special characters (usually made by holding down shift or option on your keyboard). There are also lots of other characters included in this set, and your book describes this a bit more in chapter 2 as well as in an appendix at the back of the book.

You can declare and initialize a char variable using the following lines of example code:

char myChar;
myChar = 's';

The myChar variable could also be declared and initialized on the same line as we've seen earlier. Also, note that single quotation marks around the letter s are used to denote a character literal (remember that a literal is just an explicit data value). The use of single quotation marks for characters is different from using double quotation marks which are used to delineate String literals. We saw examples of string literals with each use of the println method in the first lecture. Since the String data type is not a primitive data type I won't explain everything about it right now. It will be covered a few lectures from now when objects and references are introduced.
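Since a character set is an ordered list, every char has a position in it, and Java will show you that number if you store the char in an int. Here is a minimal sketch; CharDemo and unicodeValue are made-up names for this illustration.

```java
// Hypothetical demo class for illustration; not one of the course's example files.
class CharDemo {

    // Storing a char in an int gives its position (code) in the Unicode character set.
    static int unicodeValue(char letter) {
        return letter;
    }

    public static void main(String[] args) {
        char myChar = 's';     // single quotes: a character literal
        String myString = "s"; // double quotes: a String literal, even with one character

        System.out.println(myChar);               // s
        System.out.println(unicodeValue(myChar)); // 115
        System.out.println(unicodeValue('A'));    // 65
        System.out.println(myString.length());    // 1
    }
}
```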

The `boolean` data type

The boolean data type is very simple - it only takes the values true or false - but it can be very useful. We'll see the boolean type being used a lot when we cover program statements. For now, just know that the reserved words true and false can be used as boolean literals, and can be used to initialize a boolean variable like this:

boolean areWeThereYet = false;

Data Type Conversion

We've just been introduced to each of the 8 primitive data types in Java. These different types are particularly important because the Java language is a strongly typed language, meaning that each data value is associated with a particular data type. What happens in cases where we need to mix types, like in arithmetic expressions involving both integer and floating point variables? The answer is data type conversion.

In data type conversion, we temporarily treat a value of one data type as though it were another type so that we can perform some operation. More specifically, a copy is actually made of the value of the first type. This copy that is made has the same type as the second value. Since each data type uses a different amount of memory, and thus can save data with more or less precision, we have two types of data conversion: narrowing and widening conversions. Narrowing conversions go from a data type that uses more space, to one that uses less space while widening conversions go the other direction (from less space to more space).

When you're narrowing you're losing some bits of space, so the narrowing conversion is more likely to lose some of your data information. Here's a glance at the conversions (widening left to right): byte < short < int < long < float < double. Note that there is no danger of losing precision or data when you're widening from one integer type to another, or from one floating point type to another. However, there is a chance of losing data or precision when you convert from one of the integer data types to one of the floating point data types. This is because the two types store their information differently (floating point types dividing their bits into a sign, mantissa, and exponent while integer types use all of their bits for a single number).

Still, this may sound confusing. So here are some examples for those who'd like them. The long data type uses 64 bits while the float data type uses only 32 bits. While the conversion from long to float is regarded as 'widening' (that's just what the sources say), it's clear that you could have a value stored in a long that is far too much information for a float to handle. The same is true of going from long to double. Both types use 64 bits of storage space, but assume that a value in a long variable uses up all of its bits. An example would be if it stores the maximum possible value for a long: 9,223,372,036,854,775,807. The double data type also uses 64 bits, but because some of those bits go to a sign and an exponent (and not just the mantissa) the double data type will have to drop some of the least significant digits of our great big number.
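Here is a small sketch of that loss of precision. The names (WideningPrecision, throughDouble, throughFloat) are made up, and the test values are ones I picked because they sit just past what a double or float can hold exactly: 2⁵³ + 1 for double and 2²⁴ + 1 for float. The (long) and (int) casts, covered later in this lecture, are used only to turn the value back into an integer so the damage is visible.

```java
// Hypothetical demo class for illustration; not one of the course's example files.
class WideningPrecision {

    // Widens a long to a double (allowed automatically), then converts it back.
    static long throughDouble(long value) {
        double widened = value; // widening conversion: no cast needed
        return (long) widened;
    }

    // Widens an int to a float (allowed automatically), then converts it back.
    static int throughFloat(int value) {
        float widened = value; // widening conversion: no cast needed
        return (int) widened;
    }

    public static void main(String[] args) {
        long bigLong = 9007199254740993L; // 2^53 + 1
        int bigInt = 16777217;            // 2^24 + 1

        System.out.println(throughDouble(bigLong)); // 9007199254740992 (the last digit changed)
        System.out.println(throughFloat(bigInt));   // 16777216 (the last digit changed)
        System.out.println(throughDouble(100L));    // 100 (small values survive intact)
    }
}
```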

Data type conversion occurs in one of 3 forms: assignment conversion, arithmetic promotion, and casting. Assignment conversion occurs when you assign a value of one type to a variable of another type. Java only allows widening conversions of this type. So, a conversion from an int variable to a double variable as in this example is permissible. Here Java will automatically convert the int value to a double value before placing the value in the double variable.

The arithmetic promotion conversion type occurs when you use an arithmetic operator, like division (/), on two operands of different types. One example of this occurs in the following program Convert2.java. Here we have one variable of type double and one variable of type int. When the division operation is performed, because one of the two values is a floating point numeric value, the result of the division is a floating point value. This means that the int value has to first be temporarily converted to a double, then the division operation may be executed.
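The first two forms of conversion can be sketched in a few lines. This is not the course's Convert2.java; PromotionDemo and its method names are made up for this illustration.

```java
// Hypothetical demo class for illustration; this is NOT Convert2.java.
class PromotionDemo {

    // Assignment conversion: the int value is widened to a double automatically.
    static double assignToDouble(int whole) {
        double widened = whole;
        return widened;
    }

    // Arithmetic promotion: the int operand is temporarily converted to a double
    // before the division, so the result keeps its fractional part.
    static double divide(double top, int bottom) {
        return top / bottom;
    }

    public static void main(String[] args) {
        System.out.println(assignToDouble(7)); // 7.0
        System.out.println(divide(5.0, 2));    // 2.5

        // The narrowing direction is not allowed by assignment alone:
        // int narrowed = 7.5;   // would not compile
    }
}
```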

Finally, the most general form of data type conversion is casting. This Java operator is used by placing the intended data type name in parentheses in front of a value as in this example line taken from Convert3.java:

int smallGuy = (int) bigGuy;

Here the value of the bigGuy variable is turned into an int value before that value is stored in the smallGuy variable. Since bigGuy was a variable of type double and its value had a fractional part, that fractional part is truncated and lost when the value is turned into an int. Note, however, that this does not change the value that is stored in the bigGuy variable. The end result is that smallGuy = 100 and that bigGuy = 100.36 after the cast and assignment take place.
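A runnable sketch of the same cast follows. It is not the course's Convert3.java; the class name CastDemo and the method truncate are made up, while the variable names and the value 100.36 follow the description above.

```java
// Hypothetical demo class for illustration; this is NOT Convert3.java.
class CastDemo {

    // Casting a double to an int drops the fractional part (it does not round).
    static int truncate(double value) {
        return (int) value;
    }

    public static void main(String[] args) {
        double bigGuy = 100.36;
        int smallGuy = (int) bigGuy; // narrowing conversion: the cast is required

        System.out.println(smallGuy);         // 100
        System.out.println(bigGuy);           // 100.36 (unchanged by the cast)
        System.out.println(truncate(99.99));  // 99
        System.out.println(truncate(-100.36)); // -100
    }
}
```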

Program Statements and Flow of Control

After discussing several different data types, we now shift gears to talk about program statements. These important tools help direct the flow of control for your program. They give it the ability to do more than just execute code in a straight top-down, line-by-line manner. One way to alter the top-down flow of control is to call a method, which causes the flow to jump to the code for that method. Another is to use program statements to make decisions about what piece of code should be run next.

We've already seen one program statement in Math5.java: the for-loop. The for-loop program statement had a header section and a body section. The header section updated a counter variable and ran through the body section of code any number of times. It also made a decision about whether to run that body of code each time or drop out of the loop. This was a restricted use of the for-loop. What if I wanted to have my variable count down instead of up? What if I wanted it to do more than just count up by one? Could I have a different condition operator than just <=? To work up to all of the capabilities of for-loops, we'll first start out with the most basic program statement: the if statement.

The `if` statement

The if statement is fairly simple in that it controls a single block of code, deciding if that code should be executed or skipped. The following code example shows this:

   if(value1 > value2)
      System.out.println("value1 is larger");

Here we just make a decision of true or false for the statement that value1 > value2. The statement inside the parentheses must always evaluate to a true/false boolean value. This could be done with a boolean variable, or with an expression that evaluates to a true/false value as in the above example. We say that these are boolean expressions simply because they are expressions that evaluate to boolean values. Along with the greater than operator (>), we could also use the operators for less than (<), equal to (==), not equal to (!=), greater than or equal (>=), and less than or equal (<=). The only tricky operator in this group is the equal to operator (==) since it looks similar to the assignment operator (=). These two operators act quite differently. Using two equals signs causes us to compare two values or variables, while the assignment operator takes the value on its right side and assigns it to the variable on its left side. A common mistake is to accidentally use the assignment operator when you meant to compare two operands with the equal to operator.

What happens if the operators I just provided aren't enough? What if you want to execute a line of code only if two different conditions are both met? How about if we want at least one of two different conditions to be met? How about if we want the negation of some condition, meaning that it didn't evaluate to true? These cases are taken care of by the logical operators for AND (&&), OR (||), and NOT (!). This could allow us to create a complicated expression like the following:

   if((value1 == 0) && ((value1 > value2) || !(value1 <= value3)))
      System.out.println("Either value2 or value3 is negative.");
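To convince yourself that the message in that example is accurate, you can wrap the same condition in a method and try a few values. This is only a sketch: LogicDemo and the method oneIsNegative are made-up names, and the test values are ones I picked.

```java
// Hypothetical demo class for illustration; not one of the course's example files.
class LogicDemo {

    // The same boolean expression as in the if statement above.
    // When value1 is 0, it is true exactly when value2 or value3 is negative.
    static boolean oneIsNegative(int value1, int value2, int value3) {
        return (value1 == 0) && ((value1 > value2) || !(value1 <= value3));
    }

    public static void main(String[] args) {
        System.out.println(oneIsNegative(0, -1, 5));  // true  (value2 is negative)
        System.out.println(oneIsNegative(0, 1, -3));  // true  (value3 is negative)
        System.out.println(oneIsNegative(0, 1, 3));   // false (neither is negative)
        System.out.println(oneIsNegative(2, -1, -1)); // false (value1 is not 0, so && fails)
    }
}
```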

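The boolean expressions that we've been creating so far seem to make sense for integer variable comparisons. What if we're dealing with char variables, or floating point values, or String objects? The same comparison operators work on char values (which are compared by their Unicode values) and on floating point values, although testing two floating point values for exact equality with == is risky because of rounding. Strings are determined to be equal if all of their characters are equal. The methods equals and equalsIgnoreCase can be used to test the equality of two different Strings.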
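Here is a short sketch of the two String methods just mentioned. The class name StringEqualityDemo, its helper methods, and the sample words are made up for this illustration; equals and equalsIgnoreCase are the standard String methods.

```java
// Hypothetical demo class for illustration; not one of the course's example files.
class StringEqualityDemo {

    // True only if both Strings contain exactly the same characters.
    static boolean sameText(String first, String second) {
        return first.equals(second);
    }

    // True if the Strings match once upper/lowercase differences are ignored.
    static boolean sameTextAnyCase(String first, String second) {
        return first.equalsIgnoreCase(second);
    }

    public static void main(String[] args) {
        System.out.println(sameText("Java", "Java"));        // true
        System.out.println(sameText("Java", "JAVA"));        // false
        System.out.println(sameTextAnyCase("Java", "JAVA")); // true
        System.out.println('a' < 'b');                       // true (chars compare by Unicode value)
    }
}
```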

In the first example given previously, we printed out a string literal if value1 was larger than value2. If we also wanted to print out a different statement if value1 was not larger (meaning that value1 <= value2) we could do this using an if-else statement as shown below.

   if(value1 > value2)
      System.out.println("value1 is larger");
   else
      System.out.println("value1 <= value2");

If the condition for the if statement evaluates to true, then the line of code immediately below it is run. Otherwise, the line of code immediately below the else reserved word is run. If we wanted to run more than a single line of code, we would have to use curly braces.

   if(value1 > value2) {
      System.out.println("Which value is larger?");
      System.out.println("value1 is larger.");
   }
   else {
      System.out.println("Actually...");
      System.out.println("value1 <= value2.");
   }

These curly braces create something called a block statement. This is the same as for the body definitions of methods and classes as we've seen earlier. If we leave out the curly braces, odd things can happen.

   if(value1 > value2) {
      System.out.println("Which value is larger?");
      System.out.println("value1 is larger.");
   }
   else
      System.out.println("value1 <= value2.");
      System.out.println("This is always printed.");

Without the curly braces to mark a block statement of grouped code the last two println statements are separated. If the value1 > value2 statement evaluates to false, only the first line of code immediately following the else reserved word will be run. The other lines coming after this are unaffected by the if-else statement, so the last println statement will always be run and print the text: This is always printed.

One case where if-else statements can get particularly tricky is with nested if-else statements. This code example illustrates that:

   if (num1 < num2)   
      if(num1 < num3) 
         min = num1;
      else          
         min = num3;
   else              
      if (num2 < num3) 
         min = num2;
      else               
         min = num3;

The goal is to find the minimum of the three "num" variables and assign its value to the min variable. The rule to follow when figuring which else statements go with which if statement is just like we would have for nested parentheses or nested curly braces. An else statement is always matched up with the nearest (in the code above it) unmatched if statement. Comments with matching numbers are used to match up if-else pairs below.

   if (num1 < num2)      // pair 1
      if(num1 < num3)    // pair 2
         min = num1;
      else               // pair 2
         min = num3;
   else                  // pair 1
      if (num2 < num3)   // pair 3
         min = num2;
      else               // pair 3
         min = num3;

While knowing the rule for matching up if-else pairs is important (you should know it) this confusion can be avoided. Just like using parentheses in arithmetic expressions reduces the confusion of operator precedence, curly braces can be used to show which lines of code go with which if or else statement. You're encouraged to try out this if-else code for yourself to be certain that everything makes sense to you.
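One way to try it out is to write the nested if-else with curly braces everywhere and wrap it in a method. This is a sketch rather than an official solution; the class name MinOfThree and the method smallest are made up, while num1, num2, num3, and min are the names used above.

```java
// Hypothetical demo class for illustration; not one of the course's example files.
class MinOfThree {

    // Same logic as the nested if-else above, with braces marking every block.
    static int smallest(int num1, int num2, int num3) {
        int min;
        if (num1 < num2) {
            // num2 is ruled out; compare num1 and num3
            if (num1 < num3) {
                min = num1;
            } else {
                min = num3;
            }
        } else {
            // num1 is ruled out; compare num2 and num3
            if (num2 < num3) {
                min = num2;
            } else {
                min = num3;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(smallest(3, 7, 5)); // 3
        System.out.println(smallest(7, 3, 5)); // 3
        System.out.println(smallest(7, 5, 3)); // 3
        System.out.println(smallest(4, 4, 4)); // 4
    }
}
```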