UniCode
computer-text-internationalization standard
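I'm not quite clear on whether UniCode and UTF should be used as different terms. As far as I can tell they should: UniCode is the standard that assigns a code point to each character, while UTF-8, UTF-16, and UTF-32 are encodings that turn those code points into bytes.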
- apparently UTF-8 was invented in 1992 by Rob Pike and Ken Thompson at a coffee shop around the corner from my old high school, which isn't shocking since that's near Bell Labs in Murray Hill.
http://en.wikipedia.org/wiki/Unicode
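On the Unicode-versus-UTF question, a small sketch may help. It assumes Python 3 (where `str` is already a sequence of code points), and the sample string is made up for illustration. The same text has one set of code points but different byte sequences depending on which UTF encoding is chosen.

```python
# Illustrative sample text (not from the original notes).
text = "café €"

# Unicode level: each character is an abstract code point.
code_points = [hex(ord(ch)) for ch in text]

# Encoding level: the same code points serialized as bytes in two ways.
utf8_bytes = text.encode("utf-8")       # 1 to 4 bytes per code point
utf16_bytes = text.encode("utf-16-le")  # 2 or 4 bytes per code point

# Decoding with the matching encoding recovers the original text.
round_trip = utf8_bytes.decode("utf-8")

print(code_points)
print(len(text), len(utf8_bytes), len(utf16_bytes))
```

The character count stays the same while the byte counts differ, which is the practical reason to keep the two terms apart.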
Python encoding/decoding
- WikiGraph scraping uses this to decompose accented characters and then drop anything that isn't ASCII:
page_text = unicodedata.normalize('NFKD', page_text).encode('ascii','ignore')
- https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize
- but I think there's something nicer...
- Unicode Zen in Python 2.x - The Long Version
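The normalize-then-encode line above can be sketched as a small helper. This is a minimal sketch in Python 3 syntax, not the WikiGraph code itself: the function name `to_ascii` and the sample strings are hypothetical, and the trailing `.decode('ascii')` is only needed because in Python 3 `encode` returns `bytes` (in Python 2 the original line already yields a byte `str`).

```python
import unicodedata


def to_ascii(text):
    """Return a lossy ASCII approximation of text (hypothetical helper)."""
    # NFKD splits characters such as 'é' into a base letter plus a
    # combining accent, and expands compatibility forms such as ligatures.
    decomposed = unicodedata.normalize("NFKD", text)
    # 'ignore' silently drops every character that has no ASCII form,
    # including the combining accents produced by the step above.
    return decomposed.encode("ascii", "ignore").decode("ascii")


accented = to_ascii("Café naïve résumé")
lossy = to_ascii("straße")

print(accented)
print(lossy)
```

Accented Latin letters come through as their base letters, but characters with no decomposition (the 'ß' here) simply vanish, so this approach only suits text where that loss is acceptable.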