Strings in Python
Computers understand bytes, but humans don’t. Machines can read data in binary, but humans can’t. So, along with creating data structures for machines, computer scientists created data structures for humans too. Among all the data types in any programming language, the string is the most user friendly and the one humans can read with bare eyes.
Everything is an object in Python, and the string is no exception. A string is nothing but an ordered collection of characters. Python does not have a separate character data type. To represent a character we use a string that contains only one character.
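To make the "no character type" point concrete, here is a small sketch. The variable names are only for illustration.

```python
# A single character is just a string of length 1;
# there is no separate character type in Python.
letter = "a"
word = "Python"
first = word[0]  # indexing a string gives back another string

print(type(letter).__name__)  # str
print(type(first).__name__)   # str
print(len(first))             # 1
```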
Creating Strings
The most basic way of creating a string in Python is to write some characters inside quotation marks in the source code. We call it a string literal. For example,
a_string = "Hey, you have done it! I am the string you are looking for."
String literals can be created with single or double quotes.
a_single_quoted_string = 'I am a single quoted string.'
A string can also be created by placing triple quotes at the start and at the end of the string. I am going to discuss triple quoted strings in detail later in this article.
These are not the only ways you can create strings. There are many other functions and methods that return a string as a result. If you open a file for reading in text mode and call the read() method, you will get a string in return.
my_str = ""
with open("info.txt", "r", encoding="utf-8") as f:
    my_str = f.read()
Calling the decode() method on a bytes object also returns a string.
my_bytes = b'I am not a string, I am a byte object'
my_str = my_bytes.decode(encoding='utf-8')
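The opposite direction works too. The sketch below is only an illustration with a made-up sample text: encode() turns a string into bytes and decode() turns it back, as long as both sides use the same encoding.

```python
# Sample text with two non-ASCII characters, written as escapes
text = "na\u00efve caf\u00e9"

data = text.encode("utf-8")      # str -> bytes
restored = data.decode("utf-8")  # bytes -> str

print(type(data).__name__)       # bytes
print(len(text), len(data))      # 10 characters, 12 bytes
print(restored == text)          # True
```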
There are a lot of other ways in which you can create or get a string. You will learn many of them in your journey to the awesome land of Python.
Modifying Strings
Unlike in many other languages, strings are immutable in Python. That means you cannot modify a string in place. There are many string manipulation functions and methods in Python, but none of them can modify the original string. The string on which the function or method is called is kept intact, and a new string object is returned. For example,
my_str = "The word THIS will be replaced by the word THAT"
my_str2 = my_str.replace("THIS", "THAT")
print("The original string: ", my_str)
print("The string after manipulation: ", my_str2)
print("The id of my_str: ", id(my_str))
print("The id of my_str2: ", id(my_str2))
Run the above code to see output similar to this:
The original string: The word THIS will be replaced by the word THAT
The string after manipulation: The word THAT will be replaced by the word THAT
The id of my_str: 2204934842288
The id of my_str2: 2204934841904
Look at the result after replacing the word ‘THIS’ in my_str. After executing the manipulation method, the original string was intact and the method returned a new string. Also look at the ids of the string objects in memory: they are different. The ids will be different on each run of the program and on different machines. So, don’t worry if they look different on your machine than on mine.
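If you want to see immutability bite, try to change one character in place. This is a minimal sketch with made-up variable names: the assignment raises a TypeError, so we build a new string instead.

```python
my_str = "Hello"

try:
    my_str[0] = "J"  # strings do not support item assignment
except TypeError as err:
    error_message = str(err)
    print("Cannot modify in place:", error_message)

# Build a new string from the pieces we want to keep
new_str = "J" + my_str[1:]

print(new_str)  # Jello
print(my_str)   # Hello (the original is untouched)
```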
Concatenation
Concatenation means making a new string by joining two or more string objects. The strings are placed side by side and a new string object is returned. In Python we concatenate strings with the + operator.
str1 = "Hello, "
str2 = "world!"
result_str = str1 + str2
print(result_str)
Output:
Hello, world!
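The + operator can be chained for more than two strings. When there are many pieces, the join() method of a string is a common alternative; it is shown here only as a side note, and the variable names are made up.

```python
first = "Hello"
second = "world"

# + can be chained; every operand must already be a string
greeting = first + ", " + second + "!"

# join() glues a list of strings together with a separator
joined = ", ".join([first, second]) + "!"

print(greeting)       # Hello, world!
print(joined)         # Hello, world!
print(first, second)  # the originals are unchanged
```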
Slices
Slicing a string means extracting parts of it with the help of its character indexes. String slicing in Python is very easy and fun. You have to put the start index and the end index, separated by a colon, in square brackets. The character at the start index is included in the slice, and the character at the end index is not. For example,
my_str = "Knock knock.\nRace condition.\nWho's there?"
slice1 = my_str[:12]
slice2 = my_str[13:28]
slice3 = my_str[29:]
print("The joke is:\n" + my_str)
print("Slice 1: " + slice1)
print("Slice 2: " + slice2)
print("Slice 3: " + slice3)
Output:
The joke is:
Knock knock.
Race condition.
Who's there?
Slice 1: Knock knock.
Slice 2: Race condition.
Slice 3: Who's there?
Notice that I omitted the start index for the first slice and the end index for the last slice. When a slice begins at the start of the string or runs to its end, you can leave out that index in the same way.
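Here is a smaller sketch of the same idea on a short word, so the indexes are easy to count. The negative index at the end is an extra that the example above does not use: it counts from the end of the string.

```python
word = "Python"

head = word[0:2]       # indexes 0 and 1; index 2 is excluded
tail = word[2:]        # from index 2 to the end
last_two = word[-2:]   # negative indexes count from the end
whole = word[:]        # both indexes omitted: the full string

print(head)      # Py
print(tail)      # thon
print(last_two)  # on
print(whole)     # Python
```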
IN/NOT IN
in is a membership checking operator in Python. It is widely used with lists, dictionaries and sets. You can use it in the same way for strings too, and not in is its negation. Let’s say we want to check whether the word ‘stupid’ is present in a string.
my_str = "Python programmers are not stupid."
if 'stupid' in my_str:
    print("Caution, a stupid is in there!")
else:
    print("Ah, no stupid present!")
Output:
Caution, a stupid is in there!
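The not in operator works the other way round. The sketch below uses the same sentence and a made-up word to search for; note that the check is case-sensitive.

```python
my_str = "Python programmers are not stupid."

has_lazy = "lazy" in my_str           # False: the word is absent
lacks_lazy = "lazy" not in my_str     # True: the negated check
lowercase_hit = "python" in my_str    # False: "Python" starts with a capital P

if lacks_lazy:
    print("Good, no lazy present!")

print(has_lazy, lacks_lazy, lowercase_hit)
```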
Formatting
Every time you want to put values from different variables into a string, you need to convert all the non-string objects to strings and concatenate everything with the + operator. But it gets dirty soon. We need a better way to do it to make the world of Python a better place for programmers. Let’s see some dirty code first.
a = 1
b = 3.1416
c = "pi"
my_str = str(a) + ") The value of " + c + " is: " + str(b)
Just imagine what you would do if you had 10 or more such values to put in a string.
Don’t worry, you do not have to make your code so dirty with all those extra + operators, quotation marks and str() calls. Python is a better land for programmers. You can make your life easier with string formatting in Python.
String formatting with the % operator in Python is similar to printf-style formatting in the C programming language. But in this article we are not going to learn C or string formatting in C. We are going to use Python. String formatting in Python is easy and fun.
Before going into theory, let’s see an example:
a = 1
b = 3.1416
c = "pi"
my_str = "%d) The value of %s is: %f" % (a, c, b)
print(my_str)
Output:
1) The value of pi is: 3.141600
See the magic? Your code is cleaner now.
Python uses % as the indicator of a format. The character(s) after it indicate which type of object is going to be inserted at that position. By d we mean an integer, by f a floating point number, and by s a string. Look at the official Python reference documentation for other formatting characters. The values that you want to put in the string are placed after the string, following a % sign. If you have more than one value to put in the string, you should put them in a tuple.
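The difference between one value and several values can be sketched like this. The .2 precision in %.2f is an extra not covered above; it limits the float to two decimal places.

```python
name = "pi"
value = 3.1416

# One value: it can follow the % sign directly
single = "Name: %s" % name

# Several values: wrap them in a tuple, in the order they appear
multiple = "%s is about %.2f" % (name, value)

print(single)    # Name: pi
print(multiple)  # pi is about 3.14
```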
Triple Quoted Strings
In the string slicing section of this article we had a joke in a string. But there was a problem. The joke was multiline, and to represent the multiline string we had to write the whole joke on a single line and create newlines with the escape sequence \n. It is really a pain if we always need to use escape sequences to represent newlines. Just imagine, you have thousands of lines of text to be processed and you have to put that whole big thing on a single line and replace all the newlines with the escape sequence \n or \r\n. Python is well aware of your tension and it has a cool feature to represent multiline strings in code. You have to put three single or double quotes at the start and at the end of the string. Now you are free to write or copy-paste any text with newlines right inside your Python source code. Let’s see an example with our ‘race condition’ joke.
my_str = '''Knock knock.
Race condition.
Who's there?
'''
print(my_str)
Output:
Knock knock.
Race condition.
Who's there?
You can do the same with triple double quotes.
my_str = """Knock knock.
Race condition.
Who's there?
"""
print(my_str)
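A triple quoted string is an ordinary string once it is created. As a quick check, the sketch below compares it with the same text written with \n escapes; the variable names are made up. Notice that the line break before the closing quotes becomes a trailing newline.

```python
escaped = "Knock knock.\nRace condition.\nWho's there?\n"

triple = '''Knock knock.
Race condition.
Who's there?
'''

print(escaped == triple)      # True: both spell the same string
print(triple.endswith("\n"))  # True: newline before the closing quotes
print(repr(triple))           # shows the \n characters explicitly
```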
startswith(), endswith() Methods
In programming we often need to check whether a string has something at the beginning or at the end. The string object in Python has builtin methods for that. Let’s say we want to check whether a web URL is secure or not.
my_url1 = "https://example.com"
if my_url1.startswith("https://"):
    print("You are on a secure connection")
else:
    print("Caution! You are on an insecure connection")
Output:
You are on a secure connection
Now, say you have a content writer who often puts multiple dots at the end of lines, which does not look professional. So, you write a Python program that checks whether any line ends with multiple dots. The checking code could look like the following:
line = "Twinkle, Twinkle, Little Star........."
if line.endswith(".."):
    print("Warning! more than one dot present at the end of this line")
else:
    print("Clean! move on")
Output:
Warning! more than one dot present at the end of this line
We checked whether at least two dots are present at the end of the line. In this way we can check whether the content writer was really careless or not.
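Both methods also accept a tuple of strings, which saves you from chaining several checks with or. The URL and the file extensions below are made up for this sketch.

```python
url = "http://example.com/report.pdf"

# True if the string starts with ANY of the prefixes in the tuple
is_web_url = url.startswith(("http://", "https://"))

# True if the string ends with ANY of the suffixes in the tuple
is_document = url.endswith((".pdf", ".docx"))

is_secure = url.startswith("https://")

print(is_web_url)   # True
print(is_document)  # True
print(is_secure)    # False
```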
find() Method
The find() method of a string is used to find the starting index of a sub-string within it.
my_str = "I am WANTED"
start_idx = my_str.find("WANTED")
print("The start index of the word WANTED in my_str is:", start_idx)
Output:
The start index of the word WANTED in my_str is: 5
If the sub-string is not found, the find() method returns -1.
my_str = "I am WANTED"
start_idx = my_str.find("NOT WANTED")
print("The start index of the sub string NOT WANTED in my_str is:", start_idx)
Output:
The start index of the sub string NOT WANTED in my_str is: -1
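find() returns only the first match. It also takes an optional start index, which is one way to look for the next occurrence. The sketch below uses a made-up string and always compares the result with -1 before using it.

```python
text = "knock knock"

first = text.find("knock")              # first occurrence
second = text.find("knock", first + 1)  # search again, after the first hit
missing = text.find("door")             # not present

print(first)    # 0
print(second)   # 6
print(missing)  # -1

if missing == -1:
    print("No door here")
```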
isalpha(), isdigit() Methods
The isalpha() method of a string is used to check whether the string contains only alphabetic characters. It returns the boolean True or False.
my_str = "AllTheCharactersInThisStringAreAlphabetic"
if my_str.isalpha():
    print("Characters in the string are alphabetic")
else:
    print("Characters in the string are not alphabetic")
Output:
Characters in the string are alphabetic
Now, put some spaces in the string and check it again. A space is not an alphabetic character.
my_str = "All The CharactersIn This String Are NOT Alphabetic"
if my_str.isalpha():
    print("Characters in the string are alphabetic")
else:
    print("Characters in the string are not alphabetic")
Output:
Characters in the string are not alphabetic
The isdigit() method is used to check whether all the characters in the string are digits. This method returns the boolean True or False.
my_str = "1232345234"
if my_str.isdigit():
    print("The string contains only digits")
else:
    print("The string does not contain only digits")
Output:
The string contains only digits
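Keep in mind that isdigit() looks at every single character, so a sign or a decimal point makes it return False even when the text looks like a number. The sample values below are made up for this sketch.

```python
samples = ["2024", "12.5", "-3", ""]

# Map each sample to the result of isdigit()
results = {sample: sample.isdigit() for sample in samples}

for sample, is_digits in results.items():
    print(repr(sample), is_digits)

# '2024' True
# '12.5' False  (the dot is not a digit)
# '-3'   False  (the minus sign is not a digit)
# ''     False  (an empty string has no digits at all)
```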
isupper(), islower() Methods
isupper() is used to check whether all the alphabetic characters in the string are in uppercase. It returns the boolean True or False.
my_str = "ALL THESE ARE IN UPPERCASE"
if my_str.isupper():
    print("The string contains uppercase characters only")
else:
    print("The string does not contain uppercase characters only")
Output:
The string contains uppercase characters only
In the same way, the islower() method is used to check whether all the alphabetic characters in the string are in lowercase. Like isupper(), it returns a boolean value.
my_str = "this string contains only lowercase alphabetic characters"
if my_str.islower():
    print("The string contains lowercase characters only")
else:
    print("The string does not contain lowercase characters only")
Output:
The string contains lowercase characters only
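Digits, spaces and punctuation are ignored by both methods, but the string must contain at least one letter for them to return True. A small sketch with made-up sample strings:

```python
shouting = "HELLO 123!"
mixed = "Hello"
numbers = "123"

print(shouting.isupper())  # True: digits and punctuation are ignored
print(mixed.isupper())     # False: it contains lowercase letters
print(numbers.isupper())   # False: there are no letters to check
print(numbers.islower())   # False: same reason
```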
strip(), rstrip(), lstrip() Methods
We often find unwanted characters at the start or at the end of strings. Think about the multiple dots at the end of the line that we discussed earlier in this article. Python strings have three builtin methods for stripping those unwanted characters off. All three methods strip whitespace (newlines, spaces, and tab characters) if no parameter is given. When you provide a string as a parameter, it is treated as a set of characters, and any of those characters found at the ends are stripped off the target string.
strip() is used to strip characters from both ends of the target string.
my_str = "...I have unwanted dots on both side of me..."
stripped_str = my_str.strip(".")
print("Stripped string is: " + stripped_str)
Output:
Stripped string is: I have unwanted dots on both side of me
But if you only need to strip from the left side of the string, you can use the lstrip() method.
my_str = "...I have unwanted dots on both side of me..."
stripped_str = my_str.lstrip(".")
print("Stripped string is: " + stripped_str)
Output:
Stripped string is: I have unwanted dots on both side of me...
In the same way, if you want to strip characters from the right side of the string, you have to use rstrip().
my_str = "...I have unwanted dots on both side of me..."
stripped_str = my_str.rstrip(".")
print("Stripped string is: " + stripped_str)
Output:
Stripped string is: ...I have unwanted dots on both side of me
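Two details are easy to miss: calling strip() with no parameter removes whitespace, and a multi-character parameter is a set of characters rather than a word to match. The sample strings in this sketch are made up.

```python
padded = "  \tsome text\n"
clean = padded.strip()        # no parameter: whitespace is stripped

messy = "xyxhelloyx"
trimmed = messy.strip("xy")   # removes any mix of 'x' and 'y' at both ends

dotted = "..a.b.."
inner = dotted.strip(".")     # the dot in the middle is kept

print(repr(clean))    # 'some text'
print(repr(trimmed))  # 'hello'
print(repr(inner))    # 'a.b'
```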
isspace() Method
The isspace() method is used to check whether the target string contains only whitespace characters. Usually that means newline \n, carriage return \r, space, tab \t, and vertical tab \v. But those are not the only characters that this method detects. Python is fully Unicode aware. All the characters that are defined as whitespace in the Unicode spec are recognized by this method.
my_str = "\n "
if my_str.isspace():
    print("This string has space characters")
else:
    print("This string doesn't have space characters")
Output:
This string has space characters
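A few edge cases, sketched with made-up values: one visible character is enough to make isspace() return False, and so is an empty string.

```python
only_space = "\t\n "
with_letter = " a "
empty = ""

print(only_space.isspace())   # True: every character is whitespace
print(with_letter.isspace())  # False: 'a' is not whitespace
print(empty.isspace())        # False: there are no characters at all
```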
len() Function
The builtin len() function can be used on different objects. When used with a string, it tells us how many characters are inside the target string.
my_str = "This string contains 34 characters"
no_of_char = len(my_str)
print(no_of_char)
Output:
34
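Spaces count as characters, and an escape sequence such as \n counts as one character even though we type two in the source code. A short sketch with made-up values:

```python
empty = ""
newline = "\n"
tabbed = "a\tb"

print(len(empty))    # 0
print(len(newline))  # 1: \n is a single character
print(len(tabbed))   # 3: 'a', a tab, and 'b'
```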
splitlines() Method
Remember that race condition joke? We separated the lines by slicing. But to do that we had to count the character positions and put the start and end indexes accordingly. We do not want to do that. We know that the lines were separated by newlines, and we want to get the different lines with the help of that marker.
joke = '''Knock knock.
Race condition.
Who's there?'''
joke_lines = joke.splitlines()
print(joke_lines)
Output:
['Knock knock.', 'Race condition.', "Who's there?"]
Look at the result. Now we have all the lines separated and returned as a list of strings with the help of splitlines().
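You might wonder why not simply call split("\n"). The sketch below, with made-up sample text, shows two differences: splitlines() does not produce an empty item for a trailing newline, and it also understands \r\n line endings.

```python
text = "one\ntwo\n"

by_splitlines = text.splitlines()  # no empty item at the end
by_split = text.split("\n")        # trailing newline leaves an empty string

windows_text = "one\r\ntwo"
windows_lines = windows_text.splitlines()  # \r\n is treated as one line break

print(by_splitlines)  # ['one', 'two']
print(by_split)       # ['one', 'two', '']
print(windows_lines)  # ['one', 'two']
```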
For a Python programmer, life is impossible without strings. To get the full potential and power of them, you have to know each and every way of working with them in the language. You have to keep practicing every day before you become an expert. After you are quite comfortable with the various string operations, you should learn regular expressions to go far and beyond.
Note: this article of mine was also published on: mypythonquiz.com