Remove special and control characters from a string using Python


Today I was working with a database table whose records contained special and control characters (such as DLE, DC1, ACK and DC3). After some searching, I found a way to replace those characters with their Unicode escape sequences, and to turn the escapes back into the original characters. Once again, Python comes to the rescue!

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# A string with special and control characters: it starts with DLE (U+0010)
# and ends with DLE, DC1, ACK, DC3, DLE. The control characters are written
# as \x.. escapes because they are invisible and easily lost when copied.
bob = "\x10`Αδεια διεξαγωγής αγώνων κυνηγετικών ικανοτήτων σκύλων δεικτών.\x10\x11\x06\x13\x10"

# The bob string as an escaped Unicode string. The r prefix keeps the
# backslashes literal, so Python does not decode the escapes by itself.
escapedbob = r"\u0010`\u0391\u03b4\u03b5\u03b9\u03b1 \u03b4\u03b9\u03b5\u03be\u03b1\u03b3\u03c9\u03b3\u03ae\u03c2 \u03b1\u03b3\u03ce\u03bd\u03c9\u03bd \u03ba\u03c5\u03bd\u03b7\u03b3\u03b5\u03c4\u03b9\u03ba\u03ce\u03bd \u03b9\u03ba\u03b1\u03bd\u03bf\u03c4\u03ae\u03c4\u03c9\u03bd \u03c3\u03ba\u03cd\u03bb\u03c9\u03bd \u03b4\u03b5\u03b9\u03ba\u03c4\u03ce\u03bd.\u0010\u0011\u0006\u0013\u0010"


def main():
    # Encode special characters to unicode-escape. This returns bytes, so
    # decode them as ASCII for printing. Every non-ASCII character is escaped
    # (the Greek letters too); characters below U+0100 come out as \xNN.
    print(bob.encode('unicode-escape').decode('ascii'))

    # Decode unicode-escape back to the original characters
    print(escapedbob.encode('ascii').decode('unicode-escape'))


if __name__ == '__main__':
    main()


# used tutorial
# http://stackoverflow.com/questions/10268518/python-string-to-unicode
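The unicode-escape codec escapes every non-ASCII character, so the Greek letters end up as \uXXXX sequences as well, and nothing is actually removed. If you only want to touch the control characters, one option is to check each character's Unicode general category with the standard library's unicodedata module: DLE, DC1, ACK and DC3 all belong to category "Cc". Here is a minimal Python 3 sketch, assuming that "control character" means category Cc; the helper names escape_control_chars and remove_control_chars are my own, hypothetical ones.

```python
import unicodedata


def escape_control_chars(text: str) -> str:
    """Replace each control character (category Cc) with a \\uXXXX escape."""
    return "".join(
        f"\\u{ord(ch):04x}" if unicodedata.category(ch) == "Cc" else ch
        for ch in text
    )


def remove_control_chars(text: str) -> str:
    """Drop every control character (category Cc) from the text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


# Same sample string as above: DLE at the start, DLE DC1 ACK DC3 DLE at the end
bob = "\x10`Αδεια διεξαγωγής αγώνων κυνηγετικών ικανοτήτων σκύλων δεικτών.\x10\x11\x06\x13\x10"

# Greek text stays readable; only the control characters change
print(escape_control_chars(bob))
print(remove_control_chars(bob))
```

Keep in mind that tab, newline and carriage return are also in category Cc, so this sketch escapes or removes them too; whitelist them if your records need them. Also, unlike the unicode-escape codec, escape_control_chars does not escape existing backslashes, so its output cannot always be decoded back unambiguously.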

Good luck tampering with and hacking your database strings!
