Data types and statements

Data types and statements#

In this session, you will learn about the different data types Python can handle, about storing data in variables and modifying their contents through assignment operators, and about some of the essential basic functions in Python. At the end of this session, you should be able to write simple programs that assign, manipulate and return different types of data.

Data types#

When speaking of data types, we are refering to different classes that a value can belong to. Values always belong to exactly one data type, that is, if a value is of the integer type, for instance, it cannot simultaneously be of the float type. Integer values can be transformed into float values, however, as we will see below.

Boolean#

The Boolean data type only encodes the logical values True and False, encodable in bytes as 1 and 0, respectively. The Boolean data type is highly useful in programming. Many operators and functions return Boolean values to indicate whether a statement fulfills a certain condition (e.g., is x bigger than y?)

a = False
print(type(a))

<class 'bool'>

3 > 2 #the 'bigger than' operator is an example for an operator return a Boolean value

True

Question: Why does the following not work?

false = 1
print(a == 1)

False

Capitalization matters! Just as natural languages have rules on capitalization (e.g. requiring capitalization of nouns in German), Python will only recognize True and False, with capital T/F as boolean values.

Here, false is assumed to refer to a variable named false, not the Boolean value False, and it sets the value of that new variable to 1. So the variable a (whose value is False) does not have the same value as the variable false.

Numbers#

Values that are numbers can belong to three distinct categories: integers, floating-point numbers, or complex numbers.

Integer#

Integer values are whole numbers, e.g. -300, 0, 300.

Integer values can be stored in bits. In a 32-bit encoding, integers up to the size of 2.147.483.648 can be respresented.

Binary encoding uses 2’s complement. Let’s take a look at binary encodings of integers in a 16-bit system:

decimal	16-bit binary
	32768	16384	8192	4096	2048	1024	512	256	128	64	32	16	\(2^3\) = 8	\(2^2\) = 4	\(2^1\) = 2	\(2^0\) = 1
0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
10	0	0	0	0	0	0	0	0	0	0	0	0	1	0	1	0
150	0	0	0	0	0	0	0	0	1	0	0	1	0	1	1	0
32000	0	1	1	1	1	1	0	1	0	0	0	0	0	0	0	0

b = 300

print(type(b))

<class 'int'>

Float#

Positive and negative numbers with a decimal point are called floating-point numbers. Floating-point numbers are expressed by their sign (positive/negative), their mantissa, and their exponent to the power of two.

15.5 = +1.9375 \(\times\) \(2^3\)

This type of representation allows for efficient encoding and representations of very large numbers in binary form. Let’s see how this is achieved:

In a 16-bit encoding, 1 bit is allocated to the sign (0 = positive, 1 = negative), 6 are allocated to the exponent (in binary encoding using 2’s complement), and 9 are allocated to the mantissa (in binary encoding using 2’s complement with negative exponent).

Since there may be multiple matching mantissa and exponents for representing a number like 15.5, we assume a normalized mantissa that always carries a 1 before its decimal point, i.e. 1 \(\leq\) mantissa < 2. Since it is redundant to encode the 1 and decimal point, they are left out of the encoded form, but can always be understood to be present at the left of the mantissa (see below).

Given x = 15.5

Determine the sign: positive sign = 0
Determine the exponent: find the largest 2^e \(\leq\) x
Determine the mantissa: m = x / 2^e with e according to step 2.
Transform to binary encoding:

sign	(pre-decimal)	reduced mantissa									exponent
pos.	(1.)	2^-1 = \(\frac{1}{2}\)	2^-2 = \(\frac{1}{4}\)	2^-3	2^-4	2^-5	2^-6	2^-7	2^-8	2^-9	32	16	8	4	2	1
0		1	1	1	1	0	0	0	0	0	0	0	0	0	1	1

c = 15.5

print("c is of type",type(c))

c is of type <class 'float'>

#Note that we can transform integer numbers into float numbers and the other way around.

#1. We can transform our int variable b into type float as follows:

d = float(b)
print("d equals",d)
print("d is of type",type(d))

d equals 300.0
d is of type <class 'float'>

#2. We can transform our float variable c into an integer as follows:
e =int(c)
print("e equals",e)
print("CAREFUL! Note that tranforming floats into integers may change the value of your variable due to the loss of decimals. You will not receive a warning when this happens!")

e equals 15
CAREFUL! Note that tranforming floats into integers may change the value of your variable due to the loss of decimals. You will not receive a warning when this happens!

Complex#

As you may remember from maths classes, complex numbers employ the specific element i, called the imaginary unit, where i² = −1; every complex number can be expressed in the form a + bi, where a and b are real numbers.

For some reason (feel free to go down the rabbithole on that one…), Python uses j instead of i to represent the imaginary unit. Complex numbers can thus be written as x = a + bj.

f = 1 + 2j
print("f is of type",type(f))

print(f)

f is of type <class 'complex'>
(1+2j)

What happens if we leave out either the real or imaginary part of the numbers? Let’s try it out:

g = 2j
print("g is of type",type(g))

print(g)

g is of type <class 'complex'>
2j

h = 1 + 0j
print("h is of type",type(h))

print(h)

h is of type <class 'complex'>
(1+0j)

Maths operators#

We can perform a range of simple mathematical operations on numerical data types, such as subtraction, addition, multiplication, and so on…

Operator	Operation	Example	Evaluates to
-	Subtraction	5 - 2	3
+	Addition	2 + 2	4
*	Multiplication	2 * 2	4
/	Division	5 / 2	2.5
**	Exponent	2 ** 3	8
%	Modulus/remainder	22 % 8	6
//	Integer division/floored quotient	22 // 8	2

End of lecture 1 (October 25, 2022).#

Strings#

String values are sequences of characters (length \(\geq\) 0). Internally, they are stored as a sequence of letters, each of which has a specific bit encoding. Strings have to obey a number of rules:

must be surrounded by single ( ‘string’ ) or double quotes ( “string” )
only the other type of quotes is allowed inside a string:
- ‘I am a “string”’ is a string
- “I am a ’string’” is also a string
- “I am a “string”” is not a valid string
the backslash has special significance as an escape character, “some\time\ago” will look different from what you think
to produce an actual backslash, you need to have it twice: “some\time\ago”
Other common escape characters are:
- \b - Backspace
- \r - Carriage Return
- \n - New Line
- \’ - Single Quote
- \t - Tab

i = "Hello world!"

print(i)
print(type(i))

Hello world!
<class 'str'>

Simple string operations#

You cannot perform mathematical operations on strings, with two exceptions:

The + operator represents concatenation of strings, not addition. Concatenation means joining two strings by their ends.
Strings can be replicated by mutiplication with an integer number.

j = "\nWhere\nare\nyou?"

print(i+j)

Hello world!
Where
are
you?

print(j*2)

Where
are
you?
Where
are
you?

There are, however, a number of functions with specific functionalities for string operations.

len() : returns the length of a string in number of characters. len() is an example of a built-in function which can also be applied to other objects representing sequences

len(i)

startswith() and endswith(): tests whether a string starts/ends with a specific string.
- Note that these are methods specific to string objects. They are therefore called on the object itself, typed as str.startswith(“x”).
- They are sensitive to capitalization.
- You can feed a string of arbitrary length into startswith()/endswith(), even the full string whose contents you are testing.

print(i.startswith("He"))

True

print(i.startswith("he"))

False

print(j.endswith("!"))

False

print(j.endswith(j))

True

in and index: in is an operator that asks whether an object is contained in a sequence, whereas index() is a method that returns the position of that object in a sequence
- in will only return True or False, but will not tell you how many times an object is contained within the sequence
- index() starts counting at 0

print("mäh" in "Rasenmäher")

True

print("Rasenmäher".index("mäh"))

lower() and upper(): methods than return lower- or uppercase copies of the strings they operate on

print(i.lower())

hello world!

print(j.upper())

WHERE
ARE
YOU?

replace(): method that replaces all instances of the first type with instances of the second
- an optional third argument can limit how many instances of the first type should be replaced

k = "Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun."

print(k)

Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.

print(k.replace(" ","_"))

Far_out_in_the_uncharted_backwaters_of_the_unfashionable_end_of_the_western_spiral_arm_of_the_Galaxy_lies_a_small_unregarded_yellow_sun.

print(k.replace("a","AAA",2))

FAAAr out in the unchAAArted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.

Slicing: To return just part of a string, you can indicate which part by way of its numerical indices
- str[start : end] will return a copy of the string starting with the character at start and ending with the character at end, where start and end are integer indices, respectively
- str[start : ] will return a copy of the string starting with the character at start
- str[ : end] will return a copy of the string ending with the character at end
- a position -x will be interpreted as len(string)-s
- providing a third number k will return every kth letter: str[start : end : k]

print(k[0:7]) #returns letters 0-7

Far out

print(k[:7]) #returns letters 0-7

Far out

print(k[45:]) #returns all letters from the 45th letter to the end of the string

fashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.

print(k[::5]) #returns every 5th letter

Fu urba hfoe hs amta  anryw.

print(k[+20:-20:5]) #returns every 5th letter, starting 20 letters after the start of the string and ending 20 letters before its end

rba hfoe hs amta  an

Summary table#

Operator	Operation	Example
+	Append	“a “+ “ball” = “a ball”
*	Replication	“a” * 3 = “aaa”
len(str)	Length	len(“ball”) = 4
str.startswith(str)/str.endswith(str)	Checking start and end of string	“ball”.startswith(“b”) = True
str in str	Checking containment in string	“a” in “ball” = True
str.index(str)	Checking position of string in string	“ball”.index(“a”) = 1
str.upper()/str.lower()	Return lower- or uppercase copy of string	“ball”.upper() = “BALL”
str.replace(str,str,int)	Replace number of instances of the first string with instances of the second	“ball”.replace(“l”,”i”,1) = “bail”

Data types that we will cover in later sessions#

Sequences#

List, Tuple, Range

Mappings#

Dictionary, Set

Variables and assignment#

Creating and modifying variables is one of the essential aspects to programming. Variables are names that refer to values. Variables are created upon their first assignment through an assignment operator (=). Attempting to evaluate a variable that has not yet been assigned will result in an error.

Rules and conventions:

Important variables should get meaningful names (e.g., city = “Tübingen”).
Throwaway/Temporary variables get single-letter names (e.g., a, b, i, j, …).
Variable names must start with a letter and can contain only letters, numbers, and underscores. Although they can start with an uppercase letter, it is convention to use lowercase variable names.

l = "a"
m = 2
n = False

print(o)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[31], line 5
      2 m = 2
      3 n = False
----> 5 print(o)

NameError: name 'o' is not defined

Variable assignment#

Note that the assignment operator (=) is not the same as the equality operator known from maths! The assignment statement binds a name, on the left-hand side of the operator, to a value, on the right-hand side. To evaluate whether two sides of an equation, or alternatively two variables, are identical in their expressed value, use the == operator.

The following example illustrates:

o = 17

17 = o

  File "C:\Users\julia\AppData\Local\Temp\ipykernel_5632\2853922412.py", line 1
    17 = o
    ^
SyntaxError: cannot assign to literal

17 == o

True

Once you assign a new value to a variable, you lose (access to) its previous value. It is often wise to create temporary copies of important variables, which you can modify without losing knowledge of the original variable’s value.

o = o + 1

print(o)

Shorthand notation for changing variable values:

a += b increases the value of a by the value of b
analogously, a -= b, *a = b, a /= b
remember that + and * also have functionality for strings!

o += 1

print(o)

study_program = "Linguistics"
city = "Tübingen"
study_program += " in " + city

print(study_program)

Linguistics in Tübingen

Some variable names are illegal. This is because Python uses a set of keywords with special functionality (you have already seen some of them earlier:True, False, in). These keywords define that language’s syntax and structure; therefore they can’t be used as variable names.

Examples	of	keywords
and	as	assert	break	class	continue
def	del	elif	else	except	exec
finally	for	from	global	if	import
in	is	lambda	nonlocal	not	or
pass	raise	return	try	while	with
yield	True	False	None

Expressions vs. statements#

A basic distinction to be aware of when programming is whether you are working on an (evaluating) expression or a statement.

Expressions are pieces of code that evaluate to an object and do nothing else but to perform this evaluation. They can be (any combination of) values, variables, operators, and calls to functions. Typing expressions into the command prompt will cause Python to evaluate it and return its result.

(20 + o) * 3

While expressions ARE something, statements DO something. Statements are instructions for Python to execute. Examples that we have already encountered include variable assignments and the print() function. We will see other examples in later sessions. Typing statements into the command prompt will cause Python to execute that statement. This will not result in the display of results.

p = "I am a statement" #no output from executing this line

Comparison operators#

As we’ve seen, equality can be checked with the built-in == operator. This is not the only comparison operator:

Operator	Operation	Example
==	equality	3 == 2 + 1 (True)
!=	inequality	3 != 1 (True)
< / >	strictly smaller/bigger than	1 < 2 (True)
<= / >=	smaller/bigger than or equal to	1 <= 1 (True)
is	identity (in terms of memory location)	a = 3; b = 3; a is b (True)

Question for practice: How do you think these operators will behave if we use them to compare string or boolean variables? What if we compare two variables of different data types? Try it out!