
Exporting Multiple Tables from a Database to Excel in Batches with Python

Date: 2022-07-25 14:15:56

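I. Use case

The goal is to avoid repeatedly exporting certain tables from the back-end database to Excel files by hand, and to produce multiple offline copies of the data efficiently.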

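II. Features

Export several source tables in one run, picking up each table's column names automatically.

Control the batch size used when writing. For example, write to Excel in batches of 5,000 rows.

Export tables that share the same structure into the same Excel file. This suits distributed tables that have been horizontally partitioned.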
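As a rough illustration of the batching idea in the list above, the following sketch pulls rows from a cursor a fixed number at a time with the DB-API fetchmany() call. This is not the article's code, which loads everything with fetchall() and slices the result afterwards. The helper name iter_batches, the sample contents of tb1, and the use of the standard-library sqlite3 module as a stand-in for SQL Server are all assumptions made so the sketch runs anywhere.

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of at most batch_size rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# sqlite3 is only a stand-in here; the article itself connects through pymssql.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO tb1 VALUES (?, ?)",
    [(i, "row%d" % i) for i in range(12)],
)

cur = conn.execute("SELECT id, name FROM tb1 ORDER BY id")
batch_sizes = [len(batch) for batch in iter_batches(cur, 5)]
conn.close()

print(batch_sizes)  # 12 rows in batches of 5 -> [5, 5, 2]
```

Fetching in batches keeps memory use bounded for large tables, at the cost of holding the connection open while the export runs.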

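III. Main implementation

1. Overview

A[Create class] -->|Method 1| B(Create database connection)
A[Create class] -->|Method 2| C(Fetch query result set)
A[Create class] -->|Method 3| D(Write to Excel through a shared handle)
A[Create class] -->|Method 4| E(Read multiple source tables)

B(Create database connection) --> U(Usage example)
C(Fetch query result set) --> U(Usage example)
D(Write to Excel through a shared handle) --> U(Usage example)
E(Read multiple source tables) --> U(Usage example)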

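2. Main methods

First install the third-party library pymssql, which provides the connection to SQL Server. The custom method __getConn() relies on five settings: the server host, the login user name user, the login password pwd, the target database db, and the character encoding charset (the code hard-codes charset to 'utf8'). Once the connection succeeds, cursor() returns a cursor object, which is used to execute SQL and to obtain the result set and the total row count.

Source code for creating the database connection: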

import pymssql


class MSSQL(object):
    def __init__(self, host, user, pwd, db):
        self.host = host
        self.user = user
        self.pwd = pwd
        self.db = db

    def __getConn(self):
        # Fail early if no database name was supplied.
        if not self.db:
            raise NameError('No database information has been set')
        self.conn = pymssql.connect(host=self.host, user=self.user,
                                    password=self.pwd, database=self.db,
                                    charset='utf8')
        cur = self.conn.cursor()
        if not cur:
            raise NameError('Failed to connect to the database')
        else:
            return cur
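The feature list mentions picking up column names automatically, and the text above mentions the result set and row count. A minimal sketch of one way to get all three from a DB-API 2.0 cursor is shown below, using the cursor's description attribute. The function name fetch_with_columns is hypothetical, and sqlite3 is used only as a stand-in so the example runs without a SQL Server instance; pymssql is expected to expose description in the same way as a DB-API driver, but that was not exercised here.

```python
import sqlite3


def fetch_with_columns(conn, sql):
    """Run sql and return (column_names, rows, row_count)."""
    cur = conn.cursor()
    cur.execute(sql)
    # DB-API 2.0: description holds one 7-item sequence per result column,
    # and the first item of each is the column name.
    columns = [col[0] for col in cur.description]
    rows = cur.fetchall()
    # len(rows) avoids depending on driver-specific rowcount behavior.
    return columns, rows, len(rows)


# Stand-in database; table and column names are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO tb1 VALUES (?, ?)", [(1, "a"), (2, "b")])

columns, rows, count = fetch_with_columns(conn, "SELECT id, name FROM tb1 ORDER BY id")
conn.close()

print(columns, count)
```

The returned names can then be passed as the columns argument when building a pandas DataFrame, so the exported sheet carries real headers instead of 0, 1, 2.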

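3. When method 3 writes to Excel, be sure to go through a shared pandas ExcelWriter object, writer. When data is written to the same file in several batches, calling to_excel() directly on the file makes each later batch overwrite the earlier ones. With the shared handle, each later write is appended after the rows already written instead of replacing everything.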

writer = pd.ExcelWriter(file)
df_fetch_data[rs_startrow:i*N].to_excel(writer, header=isHeader, index=False, startrow=startRow)

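The other parameter to watch when writing batches to the target Excel file is startrow, the row at which writing begins. After each write it must be reset to the starting position of the next batch. Each batch's rows also record which batch they belong to.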
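The startrow bookkeeping described above can be sketched as a small planning function plus a writer loop. This is an illustrative sketch, not the article's implementation: plan_batches and write_batches are hypothetical names, the sheet name "Sheet1" is an assumption, and write_batches needs pandas together with an Excel engine such as openpyxl. It also assumes the DataFrame has at least one row.

```python
def plan_batches(total_rows, batch_size, first_row=0, with_header=True):
    """Return (src_start, src_stop, startrow, header) for each batch.

    startrow is the 0-based sheet row where the batch begins. Only the
    first batch writes a header, which takes up one extra sheet row.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    plan = []
    next_row = first_row
    for src_start in range(0, total_rows, batch_size):
        src_stop = min(src_start + batch_size, total_rows)
        header = with_header and src_start == 0
        plan.append((src_start, src_stop, next_row, header))
        next_row += (src_stop - src_start) + (1 if header else 0)
    return plan


def write_batches(df, path, batch_size, first_row=0, with_header=True):
    """Write df to one sheet in batches through a single shared ExcelWriter."""
    import pandas as pd  # imported lazily: plan_batches() needs no pandas

    plan = plan_batches(len(df), batch_size, first_row, with_header)
    # Leaving the with-block closes the writer, which saves the file.
    with pd.ExcelWriter(path) as writer:
        for src_start, src_stop, startrow, header in plan:
            df.iloc[src_start:src_stop].to_excel(
                writer, sheet_name="Sheet1", header=header,
                index=False, startrow=startrow,
            )


plan = plan_batches(12, 5)
print(plan)
```

Separating the row arithmetic from the writing makes the offsets easy to check by hand before any file is touched.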

The keyword arguments **args specify the source tables together with the export settings (path, start row, header flag, and batch size).

def exportToExcel(self, **args):
    # Export each source table in turn with the shared settings.
    for sourceTB in args['sourceTB']:
        arc_dict = dict(
            sourceTB=sourceTB,
            path=args['path'],
            startRow=args['startRow'],
            isHeader=args['isHeader'],
            batch=args['batch'],
        )
        print('\nCurrently exporting table: %s' % (sourceTB))
        self.writeToExcel(**arc_dict)
    return 'success'
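The fan-out performed by exportToExcel, one shared argument set expanded into one call per table, can be seen in isolation in the sketch below. The helper name per_table_args and the path value "out/" are assumptions for illustration; no database or Excel access is involved.

```python
def per_table_args(**args):
    """Expand one shared argument dict into one dict per source table."""
    # Everything except the table list is shared by all per-table calls.
    shared = {key: value for key, value in args.items() if key != "sourceTB"}
    return [dict(sourceTB=table, **shared) for table in args["sourceTB"]]


calls = per_table_args(
    sourceTB=["tb2", "tb1"],
    path="out/",
    startRow=1,
    isHeader=False,
    batch=5,
)

for call in calls:
    print(call)
```

Unlike the explicit dict in exportToExcel, this version forwards any extra keys (for example columns or fname) without listing them one by one.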

IV. Create an object from the MSSQL class, define the keyword arguments args, and then call the export method to complete the export.

#!/usr/bin/env python
# coding: utf-8
# Purpose: export large tables with identical structure to Excel in batches.
# Each table is exported to its own file.
# Known issue: even with batched writes, to_excel overwrites the data written
# by the previous call. The fix is a shared pandas ExcelWriter handle, which
# keeps the data already written while new data is added and is closed once
# all data has been written.
import datetime
import math

import pandas as pd
import pymssql


class MSSQL(object):
    def __init__(self, host, user, pwd, db):
        self.host = host
        self.user = user
        self.pwd = pwd
        self.db = db

    def __getConn(self):
        if not self.db:
            raise NameError('No database information has been set')
        self.conn = pymssql.connect(host=self.host, user=self.user,
                                    password=self.pwd, database=self.db,
                                    charset='utf8')
        cur = self.conn.cursor()
        if not cur:
            raise NameError('Failed to connect to the database')
        else:
            return cur

    def executeQuery(self, sql):
        cur = self.__getConn()
        cur.execute(sql)
        # fetchall() returns all remaining rows of the result set.
        # Open question: for very large tables, should rows be fetched in batches?
        resList, rowcount = cur.fetchall(), cur.rowcount
        self.conn.close()
        return (resList, rowcount)

    # Export a single table to Excel
    def writeToExcel(self, **args):
        sourceTB = args['sourceTB']
        columns = args.get('columns')
        path = args['path']
        fname = args.get('fname')
        startRow = args['startRow']
        isHeader = args['isHeader']
        N = args['batch']

        # Select the requested source columns (all columns by default)
        if columns is None:
            columns_select = ' * '
        else:
            columns_select = ','.join(columns)

        if fname is None:
            fname = sourceTB + '_exportData.xlsx'
        file = path + fname

        # Shared handle: earlier data is kept when new data is written
        writer = pd.ExcelWriter(file)

        sql_select = 'select ' + columns_select + ' from ' + sourceTB
        fetch_data, rowcount = self.executeQuery(sql_select)
        # print(rowcount)
        df_fetch_data = pd.DataFrame(fetch_data)

        # There are rowcount rows in total; every N rows form one batch
        times = math.floor(rowcount / N)
        i = 1
        rs_startrow = 0
        # Case: total row count > rows per batch
        print(i, times)
        is_while = 0
        while i <= times:
            is_while = 1
            # First batch: write from row 1; later batches: move down by N
            if i == 1:
                # isHeader = True
                startRow = 1
            else:
                # isHeader = False
                startRow += N
            # Slice out the rows of this batch (start inclusive, stop exclusive).
            # startrow: first row written in the target file; 0 is row 1, 1 is row 2, ...
            df_fetch_data['batch'] = 'batch' + str(i)
            df_fetch_data[rs_startrow:i*N].to_excel(writer, header=isHeader, index=False, startrow=startRow)
            print('Loop', str(i), ': source rows', rs_startrow, 'to', i*N, 'written starting at row', startRow)
            print('Data written in batch', str(i), ':', df_fetch_data[rs_startrow:i*N])
            # Move the source read position to the next batch
            rs_startrow = i * N
            i += 1

        # First target row for the remaining rows.
        # If the loop never ran, writing still starts from the initial row.
        if is_while == 0:
            startRow = startRow
        else:
            startRow += N
        df_fetch_data['batch'] = 'batch' + str(i)
        print('Read {0}: starting from source row {1}, written starting at row {2}!'.format(str(i), str(rs_startrow), str(startRow)))
        print('Data written in batch', str(i), ':', df_fetch_data[rs_startrow:i*N])
        df_fetch_data[rs_startrow:i*N].to_excel(writer, header=isHeader, index=False, startrow=startRow)

        # Note: the writer must be closed so the buffered data reaches the disk.
        # close() replaces save(), which was removed in pandas 2.0.
        writer.close()

    # Export several tables with the same structure to Excel
    def exportToExcel(self, **args):
        for sourceTB in args['sourceTB']:
            arc_dict = dict(sourceTB=sourceTB, path=args['path'], startRow=args['startRow'],
                            isHeader=args['isHeader'], batch=args['batch'])
            print('\nCurrently exporting table: %s' % (sourceTB))
            self.writeToExcel(**arc_dict)
        return 'success'


start_time = datetime.datetime.now()

if __name__ == '__main__':
    ms = MSSQL(host='localhost', user='test', pwd='test', db='db_jun')
    args = dict(
        sourceTB=['tb2', 'tb1'],  # tables to export
        path='D:myPCPython',  # target path prefix; the file name is appended directly
        startRow=1,  # first row written in the file; row 2 is the first data row
        isHeader=False,  # whether to include the source column headers
        batch=5,
    )
    # Export multiple files
    ms.exportToExcel(**args)

