Remove special and control characters from string using Python
Today I had a database table whose records contained special and control characters (such as DLE, DC1, ACK and DC3). After some searching I found a way to substitute those characters with their respective Unicode escape values. And here Python comes to the rescue again!
#!/usr/bin/python
# -*- coding: utf-8 -*-
# DLE, DC1, ACK, DC3 characters
# a string with special and control characters
# (the control characters are written as \x escapes so they stay visible)
bob = """\x10`Αδεια διεξαγωγής αγώνων κυνηγετικών ικανοτήτων σκύλων δεικτών.\x10\x11\x06\x13\x10"""
# bob string encoded in escaped unicode string
escapedbob = "\u0010`\u0391\u03b4\u03b5\u03b9\u03b1 \u03b4\u03b9\u03b5\u03be\u03b1\u03b3\u03c9\u03b3\u03ae\u03c2 \u03b1\u03b3\u03ce\u03bd\u03c9\u03bd \u03ba\u03c5\u03bd\u03b7\u03b3\u03b5\u03c4\u03b9\u03ba\u03ce\u03bd \u03b9\u03ba\u03b1\u03bd\u03bf\u03c4\u03ae\u03c4\u03c9\u03bd \u03c3\u03ba\u03cd\u03bb\u03c9\u03bd \u03b4\u03b5\u03b9\u03ba\u03c4\u03ce\u03bd.\u0010\u0011\u0006\u0013\u0010"
def main():
    # encode special characters to unicode-escape
    print bob.decode('utf-8').encode('unicode-escape')
    # decode unicode-escape to simple characters
    print escapedbob.decode('unicode-escape')
    # exit
    exit(0)

if __name__ == '__main__':
    main()
# used tutorial
# http://stackoverflow.com/questions/10268518/python-string-to-unicode
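The script above relies on Python 2 idioms (the print statement and `str.decode`). For anyone on Python 3, here is a rough sketch of the same round trip, plus one possible way to actually drop the control characters instead of escaping them. The helper names (`escape_text`, `unescape_text`, `strip_control_chars`) and the shortened sample string are my own assumptions for illustration, not part of the original script.

```python
import unicodedata


def escape_text(text):
    """Return text with control and non-ASCII characters as backslash escapes."""
    # unicode_escape yields ASCII-only bytes, so decoding as ASCII is safe.
    return text.encode("unicode_escape").decode("ascii")


def unescape_text(escaped):
    """Turn backslash escapes back into characters.

    Assumes the input is ASCII-only, as produced by escape_text.
    """
    return escaped.encode("ascii").decode("unicode_escape")


def strip_control_chars(text):
    """Drop characters in Unicode category Cc, keeping tab, newline and CR."""
    keep = {"\t", "\n", "\r"}
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )


# Hypothetical sample: DLE at the start, then DLE, DC1, ACK, DC3, DLE at the end.
sample = "\x10`\u0391\u03b4\u03b5\u03b9\u03b1.\x10\x11\x06\x13\x10"

escaped = escape_text(sample)
print(escaped)
print(unescape_text(escaped) == sample)
print(strip_control_chars(sample))
```

Note that Python writes code points below 256 as `\xNN` rather than `\u00NN`, so the escaped output will not look exactly like the `escapedbob` literal above, although both decode to the same text. Whether to escape or strip depends on whether you need to get the original bytes back later.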
Good luck tampering with and hacking your database strings!