python3.x - How should maketrans in Python be used with UTF-8 files?

Views: 223, Date: 2022-07-05 10:59:36

Problem description

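I wrote a text-processing script that replaces every symbol in a text with a space, using Python's maketrans and translate. It works fine on ASCII-encoded files, but on a UTF-8 file it raises an error saying that the maketrans arguments are not the same length, even though they clearly are: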

  File "/Users/lgq/Desktop/p3.py", line 10, in text_to_words
    "abcdefghijklmnopqrstuvwxyz ")
ValueError: the first two maketrans arguments must have equal length

I looked it up and read that maketrans cannot be used with UTF-8. How should I replace characters in UTF-8 text, then? Any pointers would be appreciated.

def text_to_words(the_text):
    """ Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    my_substitutions = the_text.maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz ")

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds


def get_words_in_book(filename):
    """ Read a book from filename, and return a list of its words. """
    f = open(filename, "r", encoding="utf-8")
    content = f.read()
    f.close()
    wds = text_to_words(content)
    return wds


book_words = get_words_in_book("alice.txt")
print("There are {0} words in the book, the first 100 are\n{1}".format(
    len(book_words), book_words[:100]))

Answers

Answer 1:

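First of all, the two strings are not the same length: \" is a single character, and \\ is also a single character. You can check this with len(). Also, when asking about strings, it is best to state which Python version you are using.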
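The len() check the answer suggests can be sketched as follows. This assumes the first argument is the character set from the question with \" and \\ written as escape sequences; the variable names are illustrative and do not come from the original code.

```python
# An escape sequence takes two characters in the source code,
# but it denotes a single character in the resulting string.
assert len("\"") == 1  # escaped double quote
assert len("\\") == 1  # escaped backslash

# The characters to be replaced, as in the question (assumed reconstruction).
to_replace = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\"

# 26 uppercase letters map to 26 lowercase letters; every remaining
# character needs exactly one space in the second argument.
padding_needed = len(to_replace) - 26

print(len(to_replace))   # 68
print(padding_needed)    # 42
```

Counting the characters by eye in the editor gives 70, because each escape looks like two characters. Padding the second argument on that basis would produce the ValueError shown in the question.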

The maketrans arguments differ in length:

my_substitutions = the_text.maketrans(
    # If you find any of these
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
    # Replace them by these
    "abcdefghijklmnopqrstuvwxyz ")

Test code:

# -*- coding: utf-8 -*-
from string import translate, maketrans


def text_to_words(the_text):
    """ Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    my_substitutions = maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these: 26 lowercase letters, then one space
        # for each of the 42 digits and punctuation characters above
        "abcdefghijklmnopqrstuvwxyz" + " " * 42)

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds


print text_to_words('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!"#$%&()*+,-./:;<=>?@[]^_`{|}~\'測試')

Output:

['abcdefghijklmnopqrstuvwxyz', '\xe6\xb5\x8b\xe8\xaf\x95']

This is the result when run under Python 2.
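For Python 3, where str.maketrans operates on Unicode strings and so does not depend on the file being UTF-8, a minimal sketch might look like the following. Using string.digits and string.punctuation instead of the hand-typed character set is an assumption made here for illustration, not the asker's code. The padding is computed from the length, so the two arguments always match.

```python
import string


def text_to_words(the_text):
    """Return the words of the_text in lowercase, with ASCII digits
    and ASCII punctuation replaced by spaces."""
    # Characters that should become spaces (ASCII only; an assumption).
    to_blank = string.digits + string.punctuation

    # Both arguments are built to the same length, so maketrans
    # cannot complain about unequal lengths.
    table = str.maketrans(
        string.ascii_uppercase + to_blank,
        string.ascii_lowercase + " " * len(to_blank),
    )
    return the_text.translate(table).split()


words = text_to_words("Hello, World! 測試 123")
print(words)  # ['hello', 'world', '測試']
```

Non-ASCII characters such as 測試 pass through unchanged because they are not in the table. Non-ASCII punctuation (for example ’) is also left alone unless it is added to to_blank.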
