
Strings#

Indexing#

Python has some great ways to deal with strings. First off, you should recognize that Python allows indexing on strings, and that the first character sits at index 0.

hello = 'Hello World'

hello[0]
'H'

Since a string is a sequence of characters, you can also loop over it directly in a for loop, one character at a time!

for letter in hello:
  print(letter)
H
e
l
l
o
 
W
o
r
l
d
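
If you also need the position of each character, the built-in enumerate pairs every letter with its index. Here is a minimal sketch; hello is redefined so the cell runs on its own, and pairs is just a name I picked for illustration.

```python
hello = 'Hello World'

# enumerate yields (index, character) pairs, counting from 0
pairs = list(enumerate(hello))

# print only the first five pairs to keep the output short
for index, letter in pairs[:5]:
  print(index, letter)
```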

There are of course many more exotic ways to access parts of strings.

Below I get the last character with negative indexing, which starts at the back and works its way forward.

hello[-1]
'd'

Next I get everything but the last character.

hello[:-1]
'Hello Worl'

How about every other character?

hello[::2]
'HloWrd'

Perhaps a little more explanation is needed here. Indexing starts at 0, and the slice x[a:b] returns the characters from index \(a\) up to, but not including, index \(b\), so the last one returned is at index \(b-1\). You can also write x[a:b:n] to pick only every \(n^{th}\) character from that range. I am not sure how important this is for data cleaning, but it is available. When a or b is left out, the slice runs all the way to that end of the string; when n is left out, the step is 1.

hello[::]
'Hello World'
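
To make the x[a:b:n] notation concrete, here are a few slices with explicit numbers. This is only a sketch; the variable names are mine, and the comments list which indices each slice touches.

```python
hello = 'Hello World'

first_word = hello[0:5]       # indices 0, 1, 2, 3, 4
middle = hello[3:8]           # indices 3 through 7
every_third = hello[1:10:3]   # indices 1, 4, 7
from_six = hello[6:]          # index 6 to the end of the string

print(first_word, middle, every_third, from_six, sep='|')
```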

Reversing a string is a snap too!

hello[::-1]
'dlroW olleH'

String Operations#

There are some essential string operations!

Addition will combine two strings.

goodbye = 'Goodbye Cruel World'

hello + " " + goodbye
'Hello World Goodbye Cruel World'

I actually combined three strings there, putting a space in between the variables for ease of reading.

Multiplying a string by an integer is allowed too; it repeats the string that many times.

hello * 2
'Hello WorldHello World'

If you want to combine a non-string with a string, make sure to convert it with str first. Adding a string and a number directly raises a TypeError.

x = 3

hello + str(x)
'Hello World3'
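
Formatting is another way to mix strings and numbers, because it does the conversion for you. The sketch below assumes Python 3.6 or later for the f-string; the variable names are only for illustration.

```python
hello = 'Hello World'
x = 3

# hello + x would raise a TypeError because x is an int
with_str = hello + str(x)

# f-strings and str.format convert the number automatically
with_fstring = f'{hello}{x}'
with_format = '{}{}'.format(hello, x)

print(with_str, with_fstring, with_format, sep='\n')
```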

In and Not In#

In and not in are very useful in string manipulations.

if 'Hello' in hello:
  print('Hello is in the string "{0}"'.format(hello))
Hello is in the string "Hello World"
if 'hello' not in hello:
  print('hello is not in the string "{}"'.format(hello))
hello is not in the string "Hello World"

The difference here is very slight: just whether ‘Hello’ has been capitalized or not!

Capitalize#

Capitalizing can make a big difference in how we see a string. Most likely you capitalize parts of your name, but does that make the word different? To Python it does, and this could be a big issue for us (we actually saw this in the Iowa Liquor data set!).

capitalize will capitalize the first letter and lowercase the rest.

'hello'.capitalize()
'Hello'

upper and lower will convert the entire string to upper case or lower case.

'hello'.upper()
'HELLO'
'Hello'.lower()
'hello'
'HELLO'.lower()
'hello'
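
Lowercasing both sides is a simple way to compare strings without regard to case, which helps with the kind of mismatch described above. Here is a small sketch; the two store names are made up for illustration and are not taken from the Iowa Liquor data.

```python
# Hypothetical store names that differ only in capitalization
name_a = 'Main Street Liquor'
name_b = 'MAIN STREET LIQUOR'

same_raw = name_a == name_b                    # compares exact characters
same_lower = name_a.lower() == name_b.lower()  # ignores case differences

# capitalize lowercases everything after the first letter
capitalized = 'hello WORLD'.capitalize()

print(same_raw, same_lower, capitalized)
```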

Spaces#

Sometimes spaces will creep into your data. Be aware that they affect whether two strings compare as equal!

hello_with_space = ' Hello World'

hello == hello_with_space
False

To fix this, we can remove leading spaces with lstrip.

hello_with_space.lstrip()
'Hello World'
hello_with_space.lstrip() == hello
True
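
lstrip has two siblings: rstrip removes trailing whitespace and strip removes it from both ends. A quick sketch, where padded is a made-up example string:

```python
# A made-up string with whitespace on both sides, including a newline
padded = '  Hello World \n'

left_clean = padded.lstrip()    # leading whitespace removed
right_clean = padded.rstrip()   # trailing whitespace removed
both_clean = padded.strip()     # both ends removed

print(repr(left_clean), repr(right_clean), repr(both_clean))
```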

You can also check if a string is just a space. This command will return True for other, more exotic whitespace characters too, such as tabs and newlines.

" ".isspace()
True
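
Here are a few more cases worth knowing, sketched with names I chose for illustration. Note that the empty string does not count as whitespace.

```python
tabs_and_newlines = '\t\n '.isspace()   # every character is whitespace
empty = ''.isspace()                    # no characters at all
mixed = ' a '.isspace()                 # contains a non-space character

print(tabs_and_newlines, empty, mixed)
```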

Find, Replace and Split#

We can find the index in a string of another string.

hello.find('Wor')
6

Of course this is just where ‘Wor’ starts!
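
When the substring is missing, find returns -1 rather than raising an error, so check for that value before using the result as an index. The sketch below also shows the optional start position and rfind, which searches from the right.

```python
hello = 'Hello World'

missing = hello.find('xyz')      # not present, so the result is -1
first_o = hello.find('o')        # first 'o' from the left
next_o = hello.find('o', 5)      # start searching at index 5
last_o = hello.rfind('o')        # first 'o' from the right

print(missing, first_o, next_o, last_o)
```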

If we want to replace that string with another, we can.

hello.replace('Wor', 'worrrrrr')
'Hello worrrrrrld'

This will replace EVERY occurrence of ‘Wor’. If we want to limit how many times it is done, we give the replace command one more integer argument saying how many replacements to make. Note that replace returns a new string and leaves hello unchanged.
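
The letter ‘l’ appears three times in hello, which makes the effect of the count argument easy to see. A minimal sketch:

```python
hello = 'Hello World'

all_replaced = hello.replace('l', 'L')      # every 'l' is replaced
one_replaced = hello.replace('l', 'L', 1)   # only the first 'l' is replaced

# strings are immutable, so hello itself is untouched
print(all_replaced, one_replaced, hello, sep='\n')
```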

split will create new strings out of the old one, breaking it wherever the separator you define appears. It returns a list of the new strings. Below I have split at the space.

hello.split(' ')
['Hello', 'World']
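
Two details are handy for data cleaning: calling split with no argument breaks on any run of whitespace, and join puts the pieces back together. In this sketch, messy is a made-up string with extra spaces.

```python
# A made-up string with extra spaces
messy = '  Hello   World '

on_space = messy.split(' ')    # every single space is a break, so empty strings appear
on_any = messy.split()         # runs of whitespace are treated as one break
rejoined = ' '.join(on_any)    # glue the words back together with one space

print(on_space, on_any, rejoined, sep='\n')
```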

Your Turn#

Write a program to print the ‘Happy Birthday’ song including your name. Try to do it with as few strings as possible.

Continued Reading#