Reference: http://www.cnblogs.com/zhangmengqin/p/9167358.html
1. Different domain suffixes may or may not share the same whois server.
2. Different whois servers may or may not return their responses in the same format.
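Title: whois servers for every domain suffix in the world
URL: https://www.iana.org/domains/root/db
Goal: use a Python crawler to scrape the whois server data and store it locally as a dictionary; at query time, look up the domain suffix as the key to get the whois server to query.
Tool: BeautifulSoup. Its advantage is that you do not have to write your own regular expressions to pick elements out of the page; you only need to follow its lookup syntax.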
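The dictionary lookup described in the goal above can be sketched roughly as follows. This is a minimal illustration rather than the original author's code: `find_whois_server` and the small `servers` dictionary are hypothetical names, and the sketch assumes the keys are stored as dotted, lowercase suffixes such as `.com`.

```python
def find_whois_server(domain, servers):
    """Return the whois server for the domain's last label, or None."""
    # Normalize: drop surrounding whitespace and a trailing root dot, lowercase
    name = domain.strip().rstrip('.').lower()
    if '.' not in name:
        return None
    suffix = '.' + name.rsplit('.', 1)[-1]
    return servers.get(suffix)


# Tiny hand-written sample; the real dictionary would come from the scraped list
servers = {
    '.com': 'whois.verisign-grs.com',
    '.net': 'whois.verisign-grs.com',
}
print(find_whois_server('www.Example.COM', servers))
print(find_whois_server('example.invalid', servers))
```

The two sample entries also illustrate point 1 above: different suffixes can share one whois server.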
1. Scrape the list of domain suffixes
import requests
from bs4 import BeautifulSoup

iurl = 'https://www.iana.org/domains/root/db'
res = requests.get(iurl, timeout=600)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Each suffix is read from a <span class="domain tld"> element, e.g. ".com"
for tag in soup.find_all('span', class_='domain tld'):
    d_suffix = tag.get_text()
    print(d_suffix)
2. Scrape the whois server for each domain suffix
import re
import time

import requests
from bs4 import BeautifulSoup

iurl = 'https://www.iana.org/domains/root/db'
res = requests.get(iurl, timeout=600)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Matches the "WHOIS Server:" line in the raw HTML of a suffix's detail page
retxt = re.compile(r'<b>WHOIS Server:</b> (.*?)\n')

with open('suffixList.txt', 'a', encoding='utf-8') as my_file:
    for tag in soup.find_all('span', class_='domain tld'):
        d_suffix = tag.get_text().strip()    # e.g. ".com"
        print(d_suffix)
        n_suffix = d_suffix.split('.')[1]    # e.g. "com"
        new_url = iurl + '/' + n_suffix
        server = ''
        try:
            res2 = requests.get(new_url, timeout=600)
            res2.encoding = 'utf-8'
            arr = retxt.findall(res2.text)
            if len(arr) > 0:
                server = arr[0].strip()
                print(server)
            time.sleep(1)  # pause between requests
        except requests.RequestException as e:
            print('Request failed:', e)
        # One "suffix:server" line per suffix; server stays empty if none was found
        my_file.write(n_suffix + ':' + server + '\n')
        my_file.flush()
print('Scraping finished!')
This program takes a long time to run, so you may prefer to leave it running in the background:
nohup python servers.py &
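Once the scraper has finished, suffixList.txt can be turned into the dictionary that the goal describes. The following is a sketch under stated assumptions: `parse_suffix_lines` is a hypothetical helper name, each line is assumed to have the `suffix:server` shape written by the script above, and keys are stored with a leading dot. Because the file is opened in append mode, a rerun can repeat a suffix; here the most recent entry wins.

```python
def parse_suffix_lines(lines):
    """Turn "suffix:server" lines into a dict keyed by dotted suffix."""
    servers = {}
    for line in lines:
        suffix, sep, server = line.strip().partition(':')
        # Skip blank lines and suffixes for which no server was found
        if not sep or not suffix or not server:
            continue
        # Later lines overwrite earlier ones for the same suffix
        servers['.' + suffix.lower()] = server
    return servers


# Hand-written sample lines in the same shape as suffixList.txt
sample = ['com:whois.verisign-grs.com\n', 'example:\n', '\n']
print(parse_suffix_lines(sample))
```

A file object is also an iterable of lines, so the real file can be passed in directly: open suffixList.txt with encoding='utf-8' and hand the file object to `parse_suffix_lines`.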
3. Enter a domain with any suffix to look up its whois information
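import socket

# Load the "suffix:server" lines written in step 2 into a dictionary
# keyed by dotted suffix, e.g. ".com"
dictionary = {}
with open('suffixList.txt', encoding='utf-8') as f:
    for line in f:
        suffix, _, server = line.strip().partition(':')
        if server:
            dictionary['.' + suffix] = server

temp = input('Enter the domain to look up: ').strip()
# Use the last label as the suffix, so "www.example.com" maps to ".com"
r_suf = '.' + temp.rsplit('.', 1)[-1].lower()
print(r_suf)
whois_server = dictionary.get(r_suf)
print(whois_server)
if whois_server is None:
    print(r_suf + ': no whois server is known for this suffix')
else:
    # A whois query is the domain name followed by CRLF, sent to TCP port 43
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((whois_server, 43))
    s.sendall((temp + '\r\n').encode())
    response = b''
    while True:
        data = s.recv(4096)
        if not data:
            break
        response += data
    s.close()
    # Not every server replies in UTF-8, so replace undecodable bytes
    print(response.decode(errors='replace'))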
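Because the response format can differ from one whois server to another (point 2 at the top), any post-processing has to be tolerant. As a rough sketch, many servers return lines of the form `Key: value`; the hypothetical helper `parse_whois_fields` below collects such lines and skips comment lines, assuming that a leading `%` or `#` marks a comment. The sample text is made up for illustration and is not real server output.

```python
def parse_whois_fields(text):
    """Collect "Key: value" lines into a dict of lists (keys may repeat)."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks and comment lines (assumed to start with % or #)
        if not line or line[0] in '%#':
            continue
        key, sep, value = line.partition(':')
        value = value.strip()
        # Ignore lines without a colon or without a value
        if not sep or not value:
            continue
        fields.setdefault(key.strip().lower(), []).append(value)
    return fields


# Made-up sample in a common "Key: value" layout
sample = (
    "% Illustrative text only\r\n"
    "Domain Name: EXAMPLE.TEST\r\n"
    "Name Server: ns1.example.test\r\n"
    "Name Server: ns2.example.test\r\n"
)
print(parse_whois_fields(sample))
```

Values are kept as lists because fields such as name servers often appear more than once. Servers that use a different layout would need their own handling.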