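Reference: http://www.cnblogs.com/zhangmengqin/p/9167358.html
1. Different domain suffixes may or may not share the same whois server.
2. Different whois servers may or may not return the same response format.
Title: whois servers for domain suffixes from all over the world
URL: https://www.iana.org/domains/root/db
Goal: use a Python crawler to scrape the whois server data and store it locally as a dictionary; at query time, match the domain suffix against the dictionary keys to get the corresponding whois server.
Tool: BeautifulSoup. Its advantage is that you do not have to write your own regular expressions; you only need to follow its query syntax.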
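The lookup described in the goal above can be sketched as follows. This is only an illustration: the helper name `find_whois_server` and the two sample entries are assumptions, and the real dictionary would be filled from the scraped data.

```python
# Minimal sketch of the suffix -> whois server lookup (sample data only).
SAMPLE_SERVERS = {
    '.com': 'whois.verisign-grs.com',
    '.cn': 'whois.cnnic.cn',
}


def find_whois_server(domain, servers):
    """Return the whois server for the domain's last label, or None."""
    # Use the last dot-separated label, so "www.example.com" maps to ".com".
    suffix = '.' + domain.strip().lower().rsplit('.', 1)[-1]
    return servers.get(suffix)


if __name__ == '__main__':
    print(find_whois_server('www.example.com', SAMPLE_SERVERS))
    print(find_whois_server('example.xyz', SAMPLE_SERVERS))
```

Taking the last label rather than the second one keeps the lookup correct for names with subdomains.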
1. Scrape the list of domain suffixes
import requests
from bs4 import BeautifulSoup

iurl = 'https://www.iana.org/domains/root/db'
res = requests.get(iurl, timeout=600)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Each suffix on the page sits inside <span class="domain tld">.
for tag in soup.find_all('span', class_='domain tld'):
    d_suffix = tag.get_text()
    print(d_suffix)
2. Scrape the whois server for each domain suffix
import re
import time

import requests
from bs4 import BeautifulSoup

iurl = 'https://www.iana.org/domains/root/db'
res = requests.get(iurl, timeout=600)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Pattern for the "WHOIS Server" line on each suffix's detail page.
retxt = re.compile(r'<b>WHOIS Server:</b> (.*?)\n')

for tag in soup.find_all('span', class_='domain tld'):
    d_suffix = tag.get_text()            # e.g. ".com"
    print(d_suffix)
    n_suffix = d_suffix.split('.')[1]    # drop the leading dot
    new_url = iurl + '/' + n_suffix
    server = ''                          # stays empty if no server is found
    try:
        res2 = requests.get(new_url, timeout=600)
        res2.encoding = 'utf-8'
        arr = retxt.findall(res2.text)
        if len(arr) > 0:
            server = arr[0].strip()
            print(server)
        time.sleep(1)                    # pause between requests
    except requests.RequestException:
        print('Request failed or timed out')
    # Append one "suffix:server" line per suffix.
    with open('suffixList.txt', 'a', encoding='utf-8') as my_file:
        my_file.write(n_suffix + ':' + server + '\n')
print('Scraping finished!')
This program takes a long time to run, so you may prefer to leave it running in the background:
nohup python servers.py &
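The extraction step of the scraper can be tried offline before starting the long run. The sketch below applies the same regular expression to a small page fragment; the fragment and the helper name `extract_whois_server` are hypothetical and only imitate the layout that the pattern expects.

```python
import re

# Same pattern as in the scraper: capture the text after the label, up to the newline.
WHOIS_RE = re.compile(r'<b>WHOIS Server:</b> (.*?)\n')


def extract_whois_server(html):
    """Return the first whois server found in the page text, or '' if none."""
    found = WHOIS_RE.findall(html)
    return found[0].strip() if found else ''


# Hypothetical fragment, not copied from a real IANA page.
SAMPLE_PAGE = (
    '<p>\n'
    '<b>URL for registration services:</b> http://registry.example\n'
    '<br>\n'
    '<b>WHOIS Server:</b> whois.nic.example\n'
    '</p>\n'
)

if __name__ == '__main__':
    print(extract_whois_server(SAMPLE_PAGE))
```

Returning an empty string when nothing matches mirrors the scraper, which writes an empty server value for suffixes without a whois server.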
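3. Query whois information for a domain with any suffix
import socket

# Load the "suffix:server" lines written in step 2, keyed by ".suffix".
dictionary = {}
with open('suffixList.txt', encoding='utf-8') as f:
    for line in f:
        suffix, _, server = line.strip().partition(':')
        if suffix and server:
            dictionary['.' + suffix] = server

temp = input('Enter the domain name to query: ').strip()
# Use the last label as the suffix, so "www.example.com" maps to ".com".
r_suf = '.' + temp.rsplit('.', 1)[-1].lower()
print(r_suf)
whois_server = dictionary.get(r_suf)
print(whois_server)
if whois_server is None:
    print(r_suf + ': no whois server found for this suffix')
else:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((whois_server, 43))
    # A whois query is the domain name followed by CRLF.
    s.sendall((temp + '\r\n').encode())
    response = b''
    while True:
        data = s.recv(4096)
        if not data:
            break
        response += data
    s.close()
    print(response.decode(errors='replace'))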
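Because response formats differ between whois servers, the raw text usually needs some post-processing. The sketch below assumes the common "Key: value" line layout and skips comment lines starting with `%` or `#`; the function name `parse_whois_fields` and the sample response are hypothetical, and servers that use another layout would need their own parser.

```python
def parse_whois_fields(text):
    """Collect 'Key: value' lines into a dict of lists, skipping comments."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(('%', '#')):
            continue
        key, sep, value = line.partition(':')
        # Ignore lines without a colon or without a value.
        if not sep or not value.strip():
            continue
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


# Hypothetical response text, used only to exercise the parser.
SAMPLE_RESPONSE = (
    '% Hypothetical whois output\r\n'
    'Domain Name: EXAMPLE.COM\r\n'
    'Name Server: NS1.EXAMPLE.COM\r\n'
    'Name Server: NS2.EXAMPLE.COM\r\n'
    '\r\n'
)

if __name__ == '__main__':
    for key, values in parse_whois_fields(SAMPLE_RESPONSE).items():
        print(key, values)
```

Values are kept in lists because fields such as name servers often appear more than once.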