- General
- Indexing and slicing
- String is a sequence
- Reversing
- Sorting
- Operators
- Escape characters and raw strings
- Escape characters
- Raw strings
- Built-in functions
- ascii()
- chr()
- format()
- input()
- ord()
- print()
- str()
- Methods
- Case conversion
- Search and replace
- Classification
- Formatting
- Conversion
- Mappings
- f-strings
- Formatting
- expression
- conversion
- format
- string module
- io.StringIO
Strings are immutable!
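A minimal sketch of what immutability means in practice: item assignment raises TypeError, and "modifying" a string always produces a new object. The variable names are only illustrative.

```python
s = 'abcdefgh'

# Item assignment is not supported by str objects.
try:
    s[0] = 'X'
    assigned = True
except TypeError:
    assigned = False

# "Changing" a string means building a new one.
t = 'X' + s[1:]          # slicing + concatenation
u = s.replace('a', 'X')  # replace() returns a new string

print(assigned)  # False
print(t, u)      # Xbcdefgh Xbcdefgh
print(s)         # abcdefgh (the original is unchanged)
```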
s = 'abcdefgh'
t = type(s) # t = <class 'str'>
a = s[1] # a = 'b'
b = s[-2] # b = 'g'
c = len(s) # c = 8
d = s[1:-2:2] # d = 'bdf'

l = list('abcdefg') # l = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
m = [i for i in 'abcdefg'] # m = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
# see str.split()

s = 'abcdefgh'
r = s[::-1] # r = 'hgfedcba'
r = ''.join(reversed(s)) # r = 'hgfedcba'

s = 'tbyjdfasertgert'
r = ''.join(sorted(s, reverse=True)) # r = 'ytttsrrjgfeedba'

# +
a, b = '123', 'abc'
c = a + b # c = '123abc'
# see str.join()
# *
a = 'a'
b = a * 10 # b = 'aaaaaaaaaa'
# in
a = ('bc' in 'abcd') # a = True

Escape sequences:
\<newline> Backslash and newline ignored
\\ Backslash (\)
\' Single quote (')
\" Double quote (")
\a ASCII Bell (BEL)
\b ASCII Backspace (BS)
\f ASCII Formfeed (FF)
\n ASCII Linefeed (LF)
\r ASCII Carriage Return (CR)
\t ASCII Horizontal Tab (TAB)
\v ASCII Vertical Tab (VT)
\ooo Character with octal value ooo
\xhh Character with hex value hh
Escape sequences only recognized in string literals:
\N{name} Character named name in the Unicode database
\uxxxx Character with 16-bit hex value xxxx
\Uxxxxxxxx Character with 32-bit hex value xxxxxxxx
Unicode codes and characters...
Universal newlines:
\n          Line Feed                      Unix
\r          Carriage Return                Macintosh (classic Mac OS)
\r\n        Carriage Return + Line Feed    Windows
\v or \x0b  Line Tabulation
\f or \x0c  Form Feed
\x1c        File Separator
\x1d        Group Separator
\x1e        Record Separator
\x85        Next Line (C1 Control Code)
\u2028      Line Separator
\u2029      Paragraph Separator
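A short sketch of how these boundaries matter in practice: str.splitlines() breaks at every character in the list above, while split('\n') only breaks at the line feed. The sample text is made up for illustration.

```python
text = 'one\ntwo\r\nthree\x1cfour\u2028five'

# splitlines() recognizes all universal newline boundaries.
lines = text.splitlines()

# split('\n') only knows about the line feed character.
parts = text.split('\n')

print(lines)  # ['one', 'two', 'three', 'four', 'five']
print(parts)  # ['one', 'two\r', 'three\x1cfour\u2028five']
```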
import string
h = string.whitespace
# h = ' \t\n\r\x0b\x0c'

A raw string treats backslashes (\) as literal characters.
Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to the rules given above.
s = 'abc\tdef\nghi' # s = 'abc\tdef\nghi'
print(s)
#output:
# abc def
# ghi
s = r'abc\tdef\nghi' # s = 'abc\\tdef\\nghi'
print(s)
#output
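# abc\tdef\nghi

s1 = r'abc\tdef\nghi'
s2 = 'abc\\tdef\\nghi'
a = s1 == s2 # a = True
b = s1 is s2 # b = often True in CPython (equal literals compiled together may share one object), but this is an implementation detail; compare strings with ==

a = len('\n') # a = 1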
b = len(r'\n') # b = 2

A raw string cannot end with an odd number of backslashes!
s = r'\\\' # SyntaxError: unterminated string literal (Python 3.10+; older versions report "EOL while scanning string literal")

Raw strings are useful for low-level Windows path handling:
path = 'c:\user\task\new'
# SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \uXXXX escape

Correct:
path = r'c:\user\task\new'
# or
path = 'c:\\user\\task\\new'

Be careful with \ at the end!
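One possible way to deal with a trailing backslash, sketched under the assumption that the path is built from literals: keep the raw literal without the final backslash and append it from a regular literal. The path itself is only an example.

```python
# A raw string literal cannot end in a single backslash, so add it separately.
folder = r'c:\user\task'
with_sep = folder + '\\'  # regular literal supplies the final backslash

# An even number of trailing backslashes is accepted, but both are kept.
double = r'c:\user\task\\'

print(with_sep)                   # c:\user\task\
print(len(double) - len(folder))  # 2
```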
Raw strings with repr():
s = 'abc\tdef\nghi'
v = repr(s) # v = "'abc\\tdef\\nghi'"

See also f-strings for f'string' and bytes for b'string'.

ascii() returns a printable representation of an object that contains only ASCII characters; all other characters are escaped.
t = """ąĄĆć
Ńńżź"""
s = ascii(t) # s = "'\\u0105\\u0104\\u0106\\u0107\\n\\u0143\\u0144\\u017c\\u017a'"
print(s)
# output: '\u0105\u0104\u0106\u0107\n\u0143\u0144\u017c\u017a'
q = ascii('∰') # q = "'\\u2230'"
w = ascii('a') # w = "'a'"

chr() returns the one-character string for a given Unicode code point (an integer).
a = chr(65) # a = 'A'
b = chr(0x104) # b = 'Ą'
c = chr(0x2230) # c = '∰'

format(): see f-strings...

input() reads a line from standard input and returns it as a string.
s = input('>')
#> 46
# s = '46'

ord() returns the Unicode code point of a given character.
a = ord('A') # a = 65
b = ord('Ą') # b = 260 '0x104'
c = ord('€') # c = 8364 '0x20ac'

print() writes objects to the text stream file, separated by sep and followed by end. All non-keyword arguments are converted to strings.
print() # prints only 'end', which is '\n' by default
print(1, 2, 3, 4) # 1 2 3 4\n
print(1, 2, 3, 4, sep=',', end='...') # 1,2,3,4...

file keyword:
import sys
print()
# is equivalent to
print(file=sys.stdout)

with open('text.txt', 'w') as f:
    print('test...', file=f)
# text.txt: test...

flush keyword:
If true, the stream is forcibly flushed after printing.
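The keywords above can be combined in one call. A small sketch, using an io.StringIO buffer in place of a real file so the result can be inspected; the buffer name is an assumption for illustration.

```python
import io

buffer = io.StringIO()  # in-memory text stream standing in for a file

print('a', 'b', 'c', sep='-', end='!\n', file=buffer)
print(1, 2.5, None, file=buffer, flush=True)  # non-strings are converted with str()

captured = buffer.getvalue()
print(repr(captured))  # 'a-b-c!\n1 2.5 None\n'
```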
str() returns a string representation of an object (via its __str__() method).
a = str() # a = ''
b = str(12) # b = '12'

For bytes:
s = 'ĄĄĄĄ'
a = s.encode('utf-8')
# !! or !!
a = bytes('ĄĄĄĄ', encoding='utf-8') # a = b'\xc4\x84\xc4\x84\xc4\x84\xc4\x84'
b = str(a) # b = "b'\\xc4\\x84\\xc4\\x84\\xc4\\x84\\xc4\\x84'"
c = str(a, encoding='utf-8') # c = 'ĄĄĄĄ'
d = str(a, encoding='ascii', errors='ignore') # d = ''
# !! or !!
e = a.decode('utf-8') # e = 'ĄĄĄĄ'

capitalize()
lower()
swapcase()
title()
upper()
casefold() - more aggressive than lower(); intended for caseless matching
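A brief illustration of the case conversion methods listed above; the sample strings are arbitrary. The German 'ß' is a common way to show where casefold() and lower() differ.

```python
s = 'hello WORLD'

print(s.capitalize())  # Hello world
print(s.lower())       # hello world
print(s.upper())       # HELLO WORLD
print(s.swapcase())    # HELLO world
print(s.title())       # Hello World

# casefold() is meant for caseless comparison.
word = 'Straße'
print(word.lower())     # straße
print(word.casefold())  # strasse
same = word.casefold() == 'STRASSE'.casefold()
print(same)             # True
```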
count()
s = 'ababababababababababa'
v = s.count('ab') # v = 10
v = s.count('ab', 10, -2) # v = 4
endswith()
find() - returns the lowest index of the substring, or -1 if it is not found; to only check for presence, use 'in'
index() - like find(), but raises ValueError when the substring is not found
rfind()
rindex()
replace()
startswith()
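A compact sketch of the search and replace methods above, on a made-up sample string. It shows the difference between find() and index() when nothing is found.

```python
s = 'hello world hello'

first = s.find('hello')    # lowest index
last = s.rfind('hello')    # highest index
missing = s.find('xyz')    # -1 when not found

try:
    s.index('xyz')         # index() raises instead of returning -1
    raised = False
except ValueError:
    raised = True

once = s.replace('hello', 'bye', 1)   # replace only the first occurrence
starts = s.startswith(('he', 'wo'))   # a tuple of prefixes is accepted
ends = s.endswith('world')

print(first, last, missing, raised)  # 0 12 -1 True
print(once)                          # bye world hello
print(starts, ends)                  # True False
```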
isalnum()
isalpha()
isdigit()
isidentifier()
islower()
isprintable()
isspace() - whitespace
istitle()
isupper()
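A few sample calls for the classification methods above; each returns a bool, and the inputs are chosen only to show typical True and False cases.

```python
print('abc123'.isalnum())       # True
print('abc 123'.isalnum())      # False, a space is not alphanumeric
print('abc'.isalpha())          # True
print('123'.isdigit())          # True
print('-1'.isdigit())           # False, the sign is not a digit
print('my_var'.isidentifier())  # True
print('1abc'.isidentifier())    # False, identifiers cannot start with a digit
print('abc\n'.isprintable())    # False, newline is not printable
print(' \t\n'.isspace())        # True
print('Hello World'.istitle())  # True
print('abc'.islower(), 'ABC'.isupper())  # True True
print(''.isalpha())             # False for the empty string
```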
format()
format_map()
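A minimal sketch of the difference between format() and format_map(): the latter uses the mapping directly, so a dict subclass can supply values for missing keys. The Default class is a hypothetical helper written for this example, not part of the standard library.

```python
template = '{name} is {age} years old'

a = template.format(name='Ann', age=30)

# format_map() uses the mapping as-is, so __missing__ is honored.
class Default(dict):  # hypothetical helper for illustration
    def __missing__(self, key):
        return '?'

b = template.format_map(Default(name='Bob'))

print(a)  # Ann is 30 years old
print(b)  # Bob is ? years old
```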
center()
expandtabs()
s = '1\t2\t3'
v = s.expandtabs(4) # v = '1   2   3'
ljust()
lstrip()
rjust()
rstrip()
strip()
s = ' abc '
v = s.strip() # v = 'abc'
s = '# .............. abceef #. qwert........#.'
v = s.strip('# .') # v = 'abceef #. qwert'
zfill()
v = '34'.zfill(5) # v = '00034'
v = '-34'.zfill(5) # v = '-0034'
removeprefix() - Python 3.9+
removesuffix() - Python 3.9+
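A short sketch of the padding and stripping methods above, assuming Python 3.9 or newer for removeprefix() and removesuffix(). It also shows a common pitfall: the argument of strip(), lstrip() and rstrip() is a set of characters, not a substring.

```python
s = 'abc'
print(repr(s.center(7, '*')))  # '**abc**'
print(repr(s.ljust(5, '.')))   # 'abc..'
print(repr(s.rjust(5)))        # '  abc'

padded = '  abc  '
print(repr(padded.lstrip()))   # 'abc  '
print(repr(padded.rstrip()))   # '  abc'

# lstrip('test_') removes any leading t, e, s or _ characters,
# while removeprefix('test_') removes exactly that substring once.
name = 'test_settings'
print(name.lstrip('test_'))               # ings
print(name.removeprefix('test_'))         # settings
print('report.txt'.removesuffix('.txt'))  # report
```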
join()
Return a string which is the concatenation of the strings in iterable.
l = ['a', 'b', 'c']
j = '_'
s = j.join(l) # s = 'a_b_c'
partition()
Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.
s = 'asd asd d asda XXX dfgg d dfgd XXX df XXX'
v = s.partition('XXX') # v = ('asd asd d asda ', 'XXX', ' dfgg d dfgd XXX df XXX')
rpartition() - like partition(), but splits at the last occurrence of sep
rsplit() - like split(), but splits from the right side
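To make the left/right difference concrete, here is a small comparison on an example file name (the name is arbitrary):

```python
path = 'archive.tar.gz'

print(path.partition('.'))   # ('archive', '.', 'tar.gz')
print(path.rpartition('.'))  # ('archive.tar', '.', 'gz')
print(path.split('.', 1))    # ['archive', 'tar.gz']
print(path.rsplit('.', 1))   # ['archive.tar', 'gz']

# When the separator is missing, rpartition() puts the whole string last.
print('noext'.rpartition('.'))  # ('', '', 'noext')
```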
split()
Return a list of the words in the string, using sep as the delimiter string.
s = 'ab cd ef'
v = s.split() # v = ['ab', 'cd', 'ef']
s = 'ab<>cd<>ef<>gh'
v = s.split('<>', maxsplit=2) # v = ['ab', 'cd', 'ef<>gh']
splitlines()
Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.
See list of universal newlines.
s = 'ab c\n\nde fg\rkl\r\n'
v = s.splitlines() # v = ['ab c', '', 'de fg', 'kl']
v = s.splitlines(keepends=True) # v = ['ab c\n', '\n', 'de fg\r', 'kl\r\n']
encode()
Return an encoded version of the string as a bytes object. Default encoding is 'utf-8'.
s = 'ąćźabc'
b = s.encode(encoding='utf-8') # b = b'\xc4\x85\xc4\x87\xc5\xbaabc'
b = s.encode(encoding='ascii')
# UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
b = s.encode(encoding='ascii', errors='replace') # b = b'???abc'
Possible values of the errors keyword:
strict, ignore, replace, xmlcharrefreplace, backslashreplace, namereplace
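The effect of each error handler can be compared side by side. A small sketch that encodes the same sample string to ASCII with every handler except strict (which raises UnicodeEncodeError); the results dict is just a convenience for this example.

```python
s = 'ąćźabc'
handlers = ('ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace', 'namereplace')

results = {h: s.encode('ascii', errors=h) for h in handlers}

for handler, encoded in results.items():
    print(handler, encoded)
# ignore b'abc'
# replace b'???abc'
# xmlcharrefreplace b'&#261;&#263;&#378;abc'
# backslashreplace b'\\u0105\\u0107\\u017aabc'
# namereplace b'\\N{LATIN SMALL LETTER A WITH OGONEK}\\N{LATIN SMALL LETTER C WITH ACUTE}\\N{LATIN SMALL LETTER Z WITH ACUTE}abc'
```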
Standard encodings
translate()
maketrans()
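A minimal sketch of how the two mapping methods work together: str.maketrans() builds a translation table and translate() applies it character by character. The sample strings are invented for illustration.

```python
# Map a->1, b->2, c->3 and delete every '-'.
table = str.maketrans('abc', '123', '-')
x = 'a-b-c-abc'.translate(table)
print(x)  # 123123

# The dict form maps single characters to strings, or to None to delete them.
table2 = str.maketrans({'ä': 'ae', 'ö': 'oe', ' ': None})
y = 'schön wär'.translate(table2)
print(y)  # schoenwaer
```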
Other (older) methods of string formatting:
- str.format()
- %-formatting
- string.Template
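For comparison, the three older styles listed above produce the same result as an f-string in this small sketch (the variable names are illustrative):

```python
from string import Template

name, count = 'apples', 3

a = '{} x {}'.format(count, name)  # str.format()
b = '%d x %s' % (count, name)      # %-formatting
c = Template('$count x $name').substitute(count=count, name=name)  # string.Template
d = f'{count} x {name}'            # f-string

print(a, b, c, d, sep='\n')  # every line reads: 3 x apples
```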
f-strings are usually faster than the methods above.
They can be multiline and nested.
The expression part can't be empty, and before Python 3.12 it couldn't contain a backslash (\).
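General format:
print(f'{expression!conversion:format}')

expression: a variable, an object, or any expression

conversion:
!s - str()
!r - repr()
!a - ascii()
Without a conversion, the value is formatted with format(), which for most objects gives the same result as str().

format:
:[[<fill>]<align>][<sign>][#][0][<width>][<group>][.<prec>][<type>]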
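A few worked examples of the pattern above, as a sketch; the values are arbitrary and the expected results are shown as comments.

```python
value = 3.14159
name = 'Zoë'
width = 10

print(f'{value:.2f}')         # 3.14
print(f'{value:{width}.3f}')  # '     3.142' (nested field supplies the width)
print(f'{1234567:,}')         # 1,234,567
print(f'{255:#x}')            # 0xff
print(f'{-3.5:+08.2f}')       # -0003.50
print(f'{"abc":*^9}')         # ***abc***
print(f'{name!r}')            # 'Zoë'
print(f'{name!a}')            # 'Zo\xeb'
print(f'{name!s:>6}')         # '   Zoë' (right-aligned in 6 columns)
```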
Constants:
import string
a = string.ascii_letters
# a = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'; lowercase + uppercase
b = string.ascii_lowercase
# b = 'abcdefghijklmnopqrstuvwxyz'
c = string.ascii_uppercase
# c = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
d = string.digits
# d = '0123456789'
e = string.hexdigits
# e = '0123456789abcdefABCDEF'
f = string.octdigits
# f = '01234567'
g = string.punctuation
# g = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
h = string.whitespace
# h = ' \t\n\r\x0b\x0c'; \x0b - \v - line tabulation; \x0c - \f - form feed
i = string.printable
# i = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
# digits + ascii_letters + punctuation + whitespace

And...
- class Formatter -> str.format
- class Template
- capwords()
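A brief sketch of two of the helpers above. string.capwords() splits, capitalizes and rejoins the words, so unlike str.title() it collapses runs of whitespace when no sep is given. string.Formatter exposes the same formatting behavior as str.format().

```python
import string

print(string.capwords('  hello   wORLD '))  # Hello World
print(string.capwords('a-b-c', sep='-'))    # A-B-C
print('hello   wORLD'.title())              # Hello   World (whitespace kept)

fmt = string.Formatter()
print(fmt.format('{0}-{name}', 1, name='x'))  # 1-x
```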
StringIO(initial_value='', newline='\n')
A text stream using an in-memory text buffer.
Methods and attributes:
'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'getvalue', 'isatty', 'line_buffering', 'newlines', 'read', 'readable', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines'
getvalue() - returns the entire contents of the buffer as a string.
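A minimal usage sketch for io.StringIO, assuming the buffer is used like a small text file: write to it, read the whole content with getvalue(), then rewind and read line by line. The variable names are illustrative.

```python
import io

buf = io.StringIO('first line\n')  # initial_value; the position starts at 0
buf.seek(0, io.SEEK_END)           # move to the end before appending
buf.write('second line\n')
print('third line', file=buf)      # print() can target the buffer too

whole = buf.getvalue()             # entire contents, independent of position

buf.seek(0)                        # rewind to read line by line
first = buf.readline()
rest = buf.readlines()
buf.close()

print(repr(whole))  # 'first line\nsecond line\nthird line\n'
print(repr(first))  # 'first line\n'
print(rest)         # ['second line\n', 'third line\n']
```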