數(shù)據(jù)挖掘 - 如何用python實(shí)現(xiàn)《多社交網(wǎng)絡(luò)的影響力最大化問(wèn)題分析》中的算法?
問(wèn)題描述
作為一名python小白,導(dǎo)師讓我用python實(shí)現(xiàn)論文中的算法,對(duì)于其中所要求的技術(shù)點(diǎn)以及如何實(shí)現(xiàn)算法顯得一頭霧水。目前python過(guò)完廖老師的python教程,正在看networkx文檔。望各位幫我解決以下問(wèn)題:1.實(shí)現(xiàn)算法所要求技術(shù)點(diǎn)2.如何應(yīng)對(duì)此類論文3.數(shù)據(jù)挖掘方向?qū)W習(xí)建議
論文地址 : http://cjc.ict.ac.cn/online/o...
問(wèn)題解答
回答1:經(jīng)過(guò)一周,現(xiàn)已初步完成,其中多出代碼不夠美觀以及效率不高,還請(qǐng)指點(diǎn)
# _*_ coding:utf-8 _*_# ==================================================================================## Description: Influence Maximization on Multiple Social Networks## ==================================================================================import matplotlib.pyplot as plt import networkx as nximport heapq#總圖G = nx.DiGraph()def load_graph(file): ’’’ 加載文件為列表格式,并得到G,畫出圖結(jié)構(gòu) ’’’#將總列表設(shè)成全局格式 global gllist#迭代文件中每個(gè)元素 with open(file) as f:lines = f.readlines() mylist = [line.strip().split() for line in lines]gllist = [] #將字符串型轉(zhuǎn)換為整型 for i in mylist:gllist.append(i[:-2]+map(lambda x: float(x), i[-2:])) print ’初始全局列表:’ print gllist drawlist=[] #提取二維列表mylist每行前三個(gè)元素,賦給新的列表drawlist for i in range(len(mylist)):drawlist.append([])for j in range(3): drawlist[i].append(mylist[i][j]) #將列表drawlist加載為有向加權(quán)圖 G.add_weighted_edges_from(drawlist) nx.draw(G, with_labels=True, width=1, node_color=’y’, edge_color=’b’) plt.show() print ’G圖中所有節(jié)點(diǎn):’,G.nodes() print ’G圖中所有邊:’,G.edges() print ’n’def get_self_node(gllist, target=None): ’’’ 獲取目標(biāo)節(jié)點(diǎn)的自傳播節(jié)點(diǎn),返回selflist并包含目標(biāo)節(jié)點(diǎn) ’’’ #初始化自傳播節(jié)點(diǎn)列表 selflist = [target]#存放已傳播節(jié)點(diǎn)列表 haslist = []flag = 0while (flag != 0): flag = 0 for target in selflist: if target not in haslist:for i in range(len(gllist)): #判斷二維列表中,每行第三個(gè)元素是否為1,若為1,則為自傳播節(jié)點(diǎn) if ((gllist[i][0] == target)or(gllist[i][1]==target))and(gllist[i][3]==1.0):if gllist[i][0] == target: if gllist[i][1] not in haslist:selflist.append(gllist[i][1])haslist.append(gllist[i][1])flag += 1else: if gllist[i][0] not in haslist:selflist.append(gllist[i][0])haslist.append(gllist[i][0])flag += 1#去除重復(fù)元素haslist = set(haslist) selflist = set(selflist)#去除重復(fù)元素 selflist = set(selflist) return selflistdef longest_path(gllist,source=None,target=None): ’’’ 獲取起始點(diǎn)到實(shí)體的最大路徑集合,返回為longestpath列表 ’’’ longestpath = [] newlist = [] for i in range(len(gllist)):newlist.append([])for j in range(3): newlist[i].append(gllist[i][j]) #構(gòu)建圖結(jié)構(gòu) G1 = nx.DiGraph() #添加帶權(quán)有向邊 G1.add_weighted_edges_from(newlist) #獲取目標(biāo)節(jié)點(diǎn)的所有自傳播街邊,并存入selflist中 selflist = get_self_node(gllist, target) max_path = 0 val_path = 1 #獲取初始節(jié)點(diǎn)到目標(biāo)節(jié)點(diǎn)及目標(biāo)節(jié)點(diǎn)的自傳播節(jié)點(diǎn)的最大路徑 for v in selflist:if v != source: #遍歷兩點(diǎn)之間所有路徑,并進(jìn)行比對(duì) for path in nx.all_simple_paths(G1,source=source,target=v):#判斷路徑后兩個(gè)元素是否為相同實(shí)體(如:b1->b2)if is_self_transmit_node(path[-2], v) == 0: for i in range(0, len(path)-1):val_path *= G1.get_edge_data(path[i], path[i+1])[’weight’] if max_path < val_path:max_path = val_path val_path = 1#若目標(biāo)節(jié)點(diǎn)為起始節(jié)點(diǎn)則直接跳出else: continue ############ 有待商榷 ##############longestpath.append(max_path) #返回初始節(jié)點(diǎn)到實(shí)體的最大路徑 return longestpathdef is_self_transmit_node(u, v): ’’’ 判斷目標(biāo)節(jié)點(diǎn)不為起始節(jié)點(diǎn)的自傳播點(diǎn) ’’’ flag = 0 #獲得起始節(jié)點(diǎn)的所有自傳播點(diǎn) selflist = get_self_node(gllist, v) for x in selflist:if u == x: flag = 1 return flagdef single_strong_infl(longestpath): ’’’ 計(jì)算起始點(diǎn)到實(shí)體的傳播概率(影響強(qiáng)度),返回影響強(qiáng)度stronginfl ’’’ temp = 1 for x in longestpath:temp *= 1-x stronginfl = 1-temp return stronginfldef all_strong_infl(G): ’’’ 獲得每個(gè)節(jié)點(diǎn)對(duì)實(shí)體的影響概率 ’’’ allstrong = [] #初始化所有節(jié)點(diǎn)的加權(quán)影響范圍列表 gnodes = [] #初始化節(jié)點(diǎn)列表 tempnodes = [] #初始化臨時(shí)節(jié)點(diǎn)列表gnodes = G.nodes()for u in gnodes:strong = 0 #存儲(chǔ)初始節(jié)點(diǎn)對(duì)每個(gè)實(shí)體的影響范圍加權(quán),初始化為0 #重置臨時(shí)節(jié)點(diǎn)列表tempnodes = G.nodes()for v in tempnodes: #非自身節(jié)點(diǎn) if u != v: #判斷目標(biāo)節(jié)點(diǎn)不為起始節(jié)點(diǎn)的自傳播點(diǎn)if is_self_transmit_node(v, u) == 0: #獲取起始節(jié)點(diǎn)到實(shí)體間最大加權(quán)路徑,并存入longestpath longestpath = longest_path(gllist, u, v)#去除已遍歷目標(biāo)節(jié)點(diǎn)的所有自傳播節(jié)點(diǎn) renode = get_self_node(gllist, v) for x in renode:if x != v: tempnodes.remove(x) #計(jì)算起始節(jié)點(diǎn)到實(shí)體間傳播概率(影響強(qiáng)度) stronginfl = single_strong_infl(longestpath) strong += stronginfl #添加單個(gè)節(jié)點(diǎn)到所有實(shí)體的加權(quán)影響范圍 allstrong.append([u, round(strong, 2)])#返回每個(gè)節(jié)點(diǎn)到所有實(shí)體的加權(quán)影響范圍 return allstrong #output allstrong : [[’a1’, 2.48], [’a2’, 1.6880000000000002], [’b1’, 0.7], [’b2’, 0], [’c1’, 0], [’d2’, 0.6]]def uS_e_uppergain(u, ev, S): ’’’ 獲取節(jié)點(diǎn)u在集合S的基礎(chǔ)上對(duì)實(shí)體ev的影響增益, 傳入候選節(jié)點(diǎn),上界gain(u|S, ev) ’’’#獲取目前實(shí)體的所有自傳播節(jié)點(diǎn) selflist = get_self_node(gllist, ev) stronglist = [] #遍歷自傳遍節(jié)點(diǎn) for v in selflist:’’’判斷節(jié)點(diǎn)v是否存在種子集合S中其中v為單個(gè)節(jié)點(diǎn),如v(ev, Gi)S為種子節(jié)點(diǎn)集合,如[’a1’,’a2’,’b1’,’b2’,’c1’,’d2’]’’’if v in S: ppSv = 1else: longestpath = [] #遍歷種子集合 for s in S:#初始化路徑權(quán)值與最大路徑權(quán)值val_path = 1max_path = 0#遍歷兩點(diǎn)之間所有路徑,并進(jìn)行比對(duì)for path in nx.all_simple_paths(G,source=s,target=v): #判斷路徑后兩個(gè)元素是否為相同實(shí)體(如:b1->b2) if is_self_transmit_node(path[-2], v) == 0: for i in range(0, len(path)-1): val_path *= G.get_edge_data(path[i], path[i+1])[’weight’]if max_path < val_path: max_path = val_path#重置路徑權(quán)值為1val_path = 1#將最大加權(quán)路徑存入longestpath列表longestpath.append(max_path) #得到上界pp(S,v)的影響概率,上界pp(S,v) ppSv = single_strong_infl(longestpath)stronglist.append(ppSv) #得到上界pp(S,ev)的影響概率,上界pp(S,ev) ppSev = single_strong_infl(stronglist)#獲取pp(u,ev) ppuev = single_strong_infl(longest_path(gllist, u, ev))#計(jì)算上界gain(u|S,ev) uSevgain = (1 - ppSev) * ppuev return uSevgaindef uppergain(u, emu, ems, S): ’’’ 在已有種子集合S的基礎(chǔ)上,求得節(jié)點(diǎn)u的影響增益上界, 其中傳進(jìn)參數(shù)ems為二維列表,如[[’a1’,2.48],[’a2’,1.688]],S則為[’a1’,’a2’] ’’’ uSgain = 0.0 #遍歷emu得到列表形式,得到如[’a1’,2.48]形式 for ev in emu:#判斷節(jié)點(diǎn)是否存在種子集合中if ev[0] in S: uSgain += uS_e_uppergain(u, ev[0], S)else: uSgain += ev[1] #返回上界gain(u|S)return uSgain def bound_base_imms(G, k): ’’’ 完全使用影響增益上界的方式選擇top-k個(gè)種子節(jié)點(diǎn)的過(guò)程 ’’’ #初始化emu,H,初始化ems=空集,S=空集 Htemp = [] Htemp = all_strong_infl(G) H = [] #遍歷Htemp=[[’a1’,2.48],[’a2’,1.688]],得到如[’a1’,2.48]形式 for x in Htemp:#逐個(gè)獲取二維列表中每一行,形式為[’a1’,2.48,0]H.append([x[0],x[1],0]) emu = [] emu = all_strong_infl(G)ems = [] S = []for i in range(k):#提取堆頂元素,tnode的形式為[’a1’,2.48,0]tnode = heapq.nlargest(1, H, key=lambda x: x[1])#將[[’b2’, 3.1, 0]]格式改為[’b2’, 3.1, 0]格式tnode = sum(tnode, [])while (tnode[2] != i): gain = 0.0 #獲取節(jié)點(diǎn)u的影響增益上界 gain = uppergain(tnode, emu, ems, S) #賦值影響范圍 tnode[1] = gain #修改status tnode[2] = i#對(duì)堆進(jìn)行排序 H = heapq.nlargest(len(H), H, key=lambda x: x[1])#獲取堆頂元素tnode = heapq.nlargest(1, H, key=lambda x: x[1])tnode = sum(tnode, [])#添加node到種子集合S.append([tnode[0]])#更新ems,添加新節(jié)點(diǎn)及節(jié)點(diǎn)對(duì)每個(gè)實(shí)體的影響范圍加權(quán)ems.append([tnode[0], tnode[1]])#刪除堆頂元素H.remove(tnode) print ems return sum(S, [])if __name__==’__main__’: #大小為k的種子集合S k = 60#加載文件數(shù)據(jù),得到圖G和初始列表gllist load_graph(’test.txt’)#完全使用影響增益上界值的計(jì)算過(guò)程函數(shù),打印種子集合S print ’種子集合:’,bound_base_imms(G, k)
test.txta1 b1 0.2 0a1 c1 0.8 0a2 b2 0.4 0a2 d2 1 0b1 c1 0.7 0c2 a2 0.8 0d2 b2 0.6 0a1 a2 1 1a2 a1 0.1 1....a1 l1 0.5 0a1 m1 0.5 0a1 q1 0.5 0a1 v1 0.5 0a1 z1 0.5 0a1 s1 0.5 0a1 w1 0.5 0a1 u1 0.5 0其中前兩列為傳播實(shí)體,第三列為實(shí)體間傳播概率,最后一列為0代表同一網(wǎng)絡(luò)傳播,為1代表網(wǎng)絡(luò)間自傳播。
下來(lái)要進(jìn)行優(yōu)化: 1.采用獨(dú)立級(jí)聯(lián)模型,設(shè)置閾值 2.將最大路徑改為最短路徑,利用log
相關(guān)文章:
1. php - 淘寶訂單拆單表設(shè)計(jì)2. 實(shí)現(xiàn)bing搜索工具urlAPI提交3. 如何用筆記本上的apache做微信開(kāi)發(fā)的服務(wù)器4. mysql優(yōu)化 - MySQL如何為配置表建立索引?5. 冒昧問(wèn)一下,我這php代碼哪里出錯(cuò)了???6. MySQL主鍵沖突時(shí)的更新操作和替換操作在功能上有什么差別(如圖)7. 關(guān)于mysql聯(lián)合查詢一對(duì)多的顯示結(jié)果問(wèn)題8. 數(shù)據(jù)庫(kù) - Mysql的存儲(chǔ)過(guò)程真的是個(gè)坑!求助下面的存儲(chǔ)過(guò)程哪里錯(cuò)啦,實(shí)在是找不到哪里的問(wèn)題了。9. 我在網(wǎng)址中輸入localhost/abc.php顯示的是not found是為什么呢?10. windows誤人子弟啊
