How to Compile and Install MySQL

October 28, 2016, by Amon

Compile and install CMake

Reference: https://amon.org/cmake

Compile and install Bison

Reference: https://amon.org/bison

Compile and install the Boost library

Reference: https://amon.org/boost

Compile and install MySQL

Download: http://dev.mysql.com/downloads/mysql/

Latest version: mysql-5.7.20

wget -c http://cdn.mysql.com/Downloads/MySQL-5.7/mysql-5.7.20.tar.gz && tar -zxvf mysql-5.7.20.tar.gz && cd mysql-5.7.20
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/mysql -DMYSQL_DATADIR=/usr/local/mysql/data -DSYSCONFDIR=/etc -DWITH_MYISAM_STORAGE_ENGINE=1 -DWITH_INNOBASE_STORAGE_ENGINE=1 -DWITH_MEMORY_STORAGE_ENGINE=1 -DWITH_READLINE=1 -DMYSQL_UNIX_ADDR=/var/lib/mysql/mysql.sock -DMYSQL_TCP_PORT=3306 -DENABLED_LOCAL_INFILE=1 -DWITH_PARTITION_STORAGE_ENGINE=1 -DEXTRA_CHARSETS=all -DDEFAULT_CHARSET=utf8 -DDEFAULT_COLLATION=utf8_general_ci
make && make install

The build and installation are complete.

Check whether the mysql user and group already exist:

cat /etc/passwd # list users
cat /etc/group  # list groups

If they do not exist, create them:

groupadd mysql && useradd -g mysql mysql
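The check above can also be done programmatically. A minimal sketch (the helper name and sample data are hypothetical, not part of any MySQL tooling) that scans /etc/passwd-style text for an account name:

```python
# Illustrative helper: return True when passwd/group-style text contains
# an entry whose first colon-separated field matches the given name.
def has_account(db_text: str, name: str) -> bool:
    return any(line.split(":", 1)[0] == name
               for line in db_text.splitlines() if line.strip())

sample = "root:x:0:0:root:/root:/bin/bash\nmysql:x:27:27::/var/lib/mysql:/bin/false"
print(has_account(sample, "mysql"))   # → True
print(has_account(sample, "apache"))  # → False
```

In practice, `getent passwd mysql` performs the same lookup, and also consults NSS sources beyond /etc/passwd.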

Change the ownership of /usr/local/mysql:

chown -R mysql:mysql /usr/local/mysql

Add the bin directory produced by the MySQL build to the system PATH:

echo -e '\n\nexport PATH=/usr/local/mysql/bin:$PATH\n' >> /etc/profile && source /etc/profile

Initialize MySQL's system databases

In the bin/ directory under the MySQL installation path, run mysqld to initialize MySQL's system databases.

Parameter notes: user: the system user to run as; basedir: the MySQL installation path; datadir: the path where database files are stored.

cd /usr/local/mysql/bin/
mysqld --initialize-insecure --user=mysql --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --explicit_defaults_for_timestamp 

Run the following command to verify that MySQL's database files were generated:

ls -lrt /usr/local/mysql/data/

Configure automatic start at boot

cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysqld
chmod +x /etc/init.d/mysqld
chkconfig --add mysqld 
chkconfig mysqld on

Create the MySQL log directories

The log directories and permissions set up below must match what my.cnf specifies (edited in the next step); keep the two consistent.

mkdir -p /var/log/mysql
chown -R mysql:mysql /var/log/mysql
mkdir -p /var/lib/mysql/
chown -R mysql:mysql /var/lib/mysql/

Edit /etc/my.cnf

[client]
port=3306
socket=/var/lib/mysql/mysql.sock

[mysqld]
user = mysql
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
port=3306
server-id = 1
socket=/var/lib/mysql/mysql.sock

character-set-server = utf8
log-error = /var/log/mysql/error.log
pid-file = /var/log/mysql/mysql.pid
general_log = 1
back_log = 300

max_connections = 1000
max_connect_errors = 6000
open_files_limit = 65535
table_open_cache = 128 
max_allowed_packet = 4M
binlog_cache_size = 1M
max_heap_table_size = 8M
tmp_table_size = 16M

read_buffer_size = 2M
read_rnd_buffer_size = 8M
sort_buffer_size = 8M
join_buffer_size = 28M
key_buffer_size = 4M

thread_cache_size = 8

query_cache_type = 1
query_cache_size = 8M
query_cache_limit = 2M

ft_min_word_len = 4

log_bin = mysql-bin
binlog_format = mixed
expire_logs_days = 30

performance_schema = 0
explicit_defaults_for_timestamp

myisam_sort_buffer_size = 8M
myisam_repair_threads = 1

interactive_timeout = 28800
wait_timeout = 28800

symbolic-links=0

sql-mode="NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"

[mysqldump]
quick
max_allowed_packet = 16M

[myisamchk]
key_buffer_size = 8M
sort_buffer_size = 8M
read_buffer = 4M
write_buffer = 4M
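Several sections of my.cnf must agree on paths, such as the socket in [client] and [mysqld], and a mismatch is a common cause of "can't connect through socket" errors. A hedged sketch of a consistency check using Python's configparser (the embedded snippet is an abbreviated stand-in for the real file):

```python
import configparser

# Abbreviated stand-in for /etc/my.cnf; in real use, cfg.read("/etc/my.cnf").
MY_CNF = """
[client]
port=3306
socket=/var/lib/mysql/mysql.sock

[mysqld]
user = mysql
port=3306
socket=/var/lib/mysql/mysql.sock
"""

# allow_no_value accepts bare flags such as "quick" under [mysqldump].
cfg = configparser.ConfigParser(allow_no_value=True)
cfg.read_string(MY_CNF)

assert cfg["client"]["socket"] == cfg["mysqld"]["socket"]
print("client and server agree on", cfg["mysqld"]["socket"])
```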

Start the MySQL service

Start the MySQL server process:

mysqld_safe --user=mysql --datadir=/usr/local/mysql/data --log-error=/var/log/mysql/error.log &

Output:

[1] 10274
150513 21:28:16 mysqld_safe Logging to '/var/log/mysql/error.log'.
150513 21:28:16 mysqld_safe Starting mysqld daemon with databases from /usr/local/mysql/data

Press Enter; the command continues running in the background.

Start MySQL via the init script:

service mysqld start

If startup succeeds, the output is:

Starting MySQL SUCCESS!

If startup fails, the output is:

Starting MySQL... ERROR! The server quit without updating PID file (/var/log/mysql/mysql.pid).

This points to a mismatch in the pid-file configuration: the paths used at compile time and during initialization must agree with those in my.cnf, since MySQL otherwise falls back to its default locations.

If that happens, open /usr/local/mysql/data/, delete the files generated earlier, and rerun the mysqld initialization command.

Check the MySQL server process:

ps -ef | grep mysql

Check whether MySQL is listening on its port:

netstat -tunpl | grep 3306

Set the MySQL root password

The mysql_secure_installation wizard walks through settings that harden a MySQL installation.
MySQL ships with a default root user; set its password as part of this security configuration.
MySQL 5.7 adds a password validation plugin (VALIDATE PASSWORD PLUGIN) with three policy levels: LOW, MEDIUM, and STRONG. If you enable the plugin, MEDIUM is a reasonable choice.

During the dialog you can simply answer y to each prompt, but it is worth reading the explanations it prints.

mysql_secure_installation

Output:

Securing the MySQL server deployment.

Connecting to MySQL using a blank password.

VALIDATE PASSWORD PLUGIN can be used to test passwords [whether to enable the password validation plugin; press Enter here to skip it]
and improve security. It checks the strength of password
and allows the users to set only those passwords which are
secure enough. Would you like to setup VALIDATE PASSWORD plugin?

Press y|Y for Yes, any other key for No:

Please set the password for root here. [set the root password; enter the new password twice]

New password:

Re-enter new password:

By default, a MySQL installation has an anonymous user, [remove anonymous users; enter y]
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.

Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
Success.


Normally, root should only be allowed to connect from [restrict root to local logins, disallowing remote access; enter y]
'localhost'. This ensures that someone cannot guess at
the root password from the network.

Disallow root login remotely? (Press y|Y for Yes, any other key for No) : y
Success.

By default, MySQL comes with a database named 'test' that [remove the test database and access to it; enter y]
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.


Remove test database and access to it? (Press y|Y for Yes, any other key for No)  : y
 - Dropping test database...
Success.

 - Removing privileges on test database...
Success.

Reloading the privilege tables will ensure that all changes [reload the privilege tables; enter y]
made so far will take effect immediately.

Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
Success.

All done!

Share MySQL's dynamic libraries with the system linker

Services such as PHP link against MySQL's client libraries, so add the lib directory from the MySQL build to the system linker configuration under /etc/ld.so.conf.d/, so that other services can locate them.

echo "/usr/local/mysql/lib" > /etc/ld.so.conf.d/mysql.conf
ldconfig
ldconfig -v |grep mysql

Output:

/usr/lib64/mysql:
        libmysqlclient.so.18 -> libmysqlclient.so.18.0.0
/usr/local/mysql/lib:
        libmysqlclient.so.20 -> libmysqlclient_r.so.20.0.0

Check the MySQL version

mysql -v -p
Enter password:

Enter the MySQL password; the output is:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.20-log Source distribution

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Reading history-file /root/.mysql_history
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

MySQL log management

MySQL's logs can consume a great deal of space, so they need a management policy.

Files such as mysql-bin.000002 are the binary logs: they record every data change across all databases. How can the mysql-bin.0000X files be removed?

mysql -u root -p
mysql> reset master;
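Note that RESET MASTER deletes every binary log and restarts numbering, so it is a blunt instrument. MySQL also offers PURGE BINARY LOGS TO 'mysql-bin.000003', which removes only the logs before the named one. As a toy illustration (the helper is hypothetical, not a MySQL API), this is the selection such a purge performs:

```python
# Hypothetical helper: list binlog files that sort strictly before the
# named one - the set that PURGE BINARY LOGS TO '...' would delete.
def purgeable(binlogs, keep_from):
    return [b for b in sorted(binlogs) if b < keep_from]

logs = ["mysql-bin.000003", "mysql-bin.000001", "mysql-bin.000002"]
print(purgeable(logs, "mysql-bin.000003"))
# → ['mysql-bin.000001', 'mysql-bin.000002']
```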

Reference: "Solving the problem of mysql-bin.000001 taking up huge amounts of space"
Reference: "Incremental backup and restore with the MySQL binlog, by example"
Reference: "Deleting MySQL binlog files"
Reference: "mysql-bin preventing MySQL from starting"
Reference: http://blog.csdn.net/alishun/article/details/5084318

Errors: MySQL space usage vs. disk space

Deleting a MySQL database that occupied 2.3 GB freed 2.3 GB of disk space.

When disk space runs out, MySQL reports:

Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.

Errors: PHP 7 connection settings for MySQL

Reference: https://amon.org/mysqli-real-connect-hy000-2002

How to Compile and Install Apache

October 28, 2016, by Amon

Check the current version:

/usr/local/apache2/bin/apachectl -v

Uninstall older versions:

If Apache was previously installed via yum/rpm, remove it first:

rpm -e --nodeps httpd
rpm -e --nodeps httpd-devel

Compile and install from source:

Reference: http://www.linuxfromscratch.org/blfs/view/svn/server/apache

Download: http://httpd.apache.org/download.cgi
Mirrors: http://www.apache.org/mirrors/
US: http://www-us.apache.org/dist/httpd/httpd-2.2.34.tar.gz
China: http://mirrors.tuna.tsinghua.edu.cn/apache//httpd/httpd-2.2.34.tar.gz

2.2 branch: Apache 2.2.34 (2017-07-11).

wget http://apache.fayea.com//httpd/httpd-2.2.34.tar.gz && tar -zxvf httpd-2.2.34.tar.gz && cd httpd-2.2.34
./configure --prefix=/usr/local/apache2 --enable-mods-shared=most --with-crypto --enable-rewrite --enable-headers --enable-deflate --enable-socache-shmcb --enable-negotiation --enable-ssl --with-ssl=/usr/lib --enable-http2 --with-nghttp2=/usr/local/lib
make && make install

2.4 branch: Apache 2.4.29.

wget http://www-us.apache.org/dist/httpd/httpd-2.4.29.tar.gz && tar -zxvf httpd-2.4.29.tar.gz && cd httpd-2.4.29
./configure --prefix=/usr/local/apache2 --enable-mods-shared=most --with-crypto --enable-rewrite --enable-headers --enable-deflate --enable-socache-shmcb --enable-negotiation --enable-ssl --with-ssl=/usr/lib --enable-http2 --with-nghttp2=/usr/local/lib
make && make install

Output:

...
mkdir /usr/local/apache2/manual
make[1]: Leaving directory `/root/apache/httpd-2.4.29'

Check the version:

/usr/local/apache2/bin/httpd -v

Output:

Server version: Apache/2.4.29 (Unix)
Server built:   Jul 14 2016 15:03:44

Install CGI support:

Reference: https://amon.org/perl

cd modules/generators

Compile and install the cgi module (run from the extracted httpd source tree):

/usr/local/apache2/bin/apxs -i -a -c mod_cgi.c

Compile and install the cgid module:

/usr/local/apache2/bin/apxs -i -a -c mod_cgid.c

Check httpd.conf; the cgi and cgid modules are now loaded:

...
LoadModule cgi_module         modules/mod_cgi.so
LoadModule cgid_module        modules/mod_cgid.so
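A quick sanity check along the same lines (the snippet and the embedded file content are illustrative; point it at your real httpd.conf) confirms that the expected LoadModule directives are present:

```python
# Parse LoadModule directives out of httpd.conf-style text (illustrative).
conf_text = """
LoadModule cgi_module         modules/mod_cgi.so
LoadModule cgid_module        modules/mod_cgid.so
"""

loaded = [line.split()[1] for line in conf_text.splitlines()
          if line.strip().startswith("LoadModule")]

assert "cgi_module" in loaded and "cgid_module" in loaded
print(loaded)  # → ['cgi_module', 'cgid_module']
```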

Create the www user

groupadd www && useradd -g www -s /sbin/nologin -M www

Set up the startup command:

Register Apache as a system service:

cp /usr/local/apache2/bin/apachectl /etc/init.d/httpd

Edit /etc/init.d/httpd and insert at line 2:

# chkconfig: 2345 85 15
# description: httpd is a web server

Add Apache to the system's list of startup services:

chkconfig --add httpd && chkconfig --level 35 httpd on

Run:

service httpd start

Output:

AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using localhost.localdomain. Set the 'ServerName' directive globally to suppress this message

This is harmless: it appears because ServerName is not yet set in httpd.conf; we will deal with it later.

Enable start at boot:

chkconfig httpd on

Check the Apache version:

/usr/local/apache2/bin/apachectl -v

Output:

Server version: Apache/2.4.29 (Unix)
Server built:   Aug  9 2016 09:37:48

The build and installation are complete.

How to Use an SSH Tunnel

October 12, 2016, by Amon

How do you use a VPS to tunnel through network restrictions?

1. Download

http://www.coolapk.com/apk/org.sshtunnel

2. Install

Connect the Android phone with 360 Mobile Assistant, then right-click the APK file on the PC and choose "Install to phone".

3. Configure

参考:http://feiyang.me/2011/04/android-use-ssh-tunnel/
参考:http://www.chenboy.com/andrews-proxy-access-using-ssh-tunnel


How to Compile and Install PHP 5.3.*

September 30, 2016, by Amon

Debugging an old application forced a PHP downgrade.

Verified:

1. Works with php-5.2.17, but because httpd 2.4.* no longer supports PHP 5.2.*, Apache must also be downgraded to httpd-2.2.31.

2. Works with php-5.3.*

Download: http://museum.php.net/php5/

Compile and install php-5.2.17:

wget http://museum.php.net/php5/php-5.2.17.tar.gz && tar zxvf php-5.2.17.tar.gz && cd php-5.2.17
./configure --prefix=/usr/local/php --with-config-file-path=/usr/local/php/etc --with-apxs2=/usr/local/apache2/bin/apxs
make
make install

Compile and install php-5.3.3:

wget http://museum.php.net/php5/php-5.3.3.tar.gz && tar zxvf php-5.3.3.tar.gz && cd php-5.3.3
./configure --prefix=/usr/local/php --with-config-file-path=/usr/local/php/etc --with-apxs2=/usr/local/apache2/bin/apxs
make ZEND_EXTRA_LIBS='-liconv'
make install

Compile and install php-5.3.29:

wget http://museum.php.net/php5/php-5.3.29.tar.gz && tar zxvf php-5.3.29.tar.gz && cd php-5.3.29
./configure --prefix=/usr/local/php --with-config-file-path=/usr/local/php/etc --with-apxs2=/usr/local/apache2/bin/apxs
make ZEND_EXTRA_LIBS='-liconv'
make install

Error 1:

make fails with:

...
/root/php-5.3.3/ext/dom/node.c: In function ‘dom_canonicalization’:
/root/php-5.3.3/ext/dom/node.c:1903:21: error: dereferencing pointer to incomplete type
    ret = buf->buffer->use;
                     ^
In file included from /root/php-5.3.3/main/php.h:38:0,
                 from /root/php-5.3.3/ext/dom/node.c:26:
/root/php-5.3.3/ext/dom/node.c:1905:40: error: dereferencing pointer to incomplete type
     RETVAL_STRINGL((char *) buf->buffer->content, ret, 1);
                                        ^
/root/php-5.3.3/Zend/zend_API.h:545:20: note: in definition of macro ‘ZVAL_STRINGL’
   const char *__s=(s); int __l=l;  \
                    ^
/root/php-5.3.3/ext/dom/node.c:1905:5: note: in expansion of macro ‘RETVAL_STRINGL’
     RETVAL_STRINGL((char *) buf->buffer->content, ret, 1);
     ^
make: *** [ext/dom/node.lo] Error 1

Reference: http://blog.csdn.net/moqiang02/article/details/19699557

A patch fixes the problem:

cd /root/php-5.3.3/
curl -o php-5.x.x.patch https://mail.gnome.org/archives/xml/2012-August/txtbgxGXAvz4N.txt
patch -p0 -b < php-5.x.x.patch

Output:

patching file ext/dom/node.c
Hunk #1 succeeded at 1900 (offset 5 lines).
patching file ext/dom/documenttype.c
patching file ext/simplexml/simplexml.c
Hunk #1 succeeded at 1385 (offset -32 lines).

Then run make again.

Error 2:

make fails with:

...
ext/iconv/iconv.o: In function `php_iconv_stream_filter_dtor':
/root/php-5.3.3/ext/iconv/iconv.c:2440: undefined reference to `libiconv_close'
ext/iconv/iconv.o: In function `_php_iconv_appendl':
/root/php-5.3.3/ext/iconv/iconv.c:337: undefined reference to `libiconv'
/root/php-5.3.3/ext/iconv/iconv.c:374: undefined reference to `libiconv'
...
collect2: error: ld returned 1 exit status

Reference: http://blog.csdn.net/21aspnet/article/details/6925644

Edit the Makefile and append -liconv to the end of line 81 (the EXTRA_LIBS line):

EXTRA_LIBS = -lcrypt -lresolv -lcrypt -lrt -lm -ldl -lnsl -lxml2 -lz -llzma -lm -ldl -lxml2 -lz -llzma -lm -ldl -lxml2 -lz -llzma -lm -ldl -lcrypt -lxml2 -lz -llzma -lm -ldl -lxml2 -lz -llzma -lm -ldl -lxml2 -lz -llzma -lm -ldl -lcrypt -liconv

Run make again; this time it succeeds. make install then completes the build.
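The exact line number varies between builds, so matching on the EXTRA_LIBS key is more robust than counting to line 81. A sketch of that edit in Python (the variable names and the shortened sample line are illustrative):

```python
# Append -liconv to the EXTRA_LIBS line of a Makefile (illustrative sample).
makefile = "CC = gcc\nEXTRA_LIBS = -lcrypt -lresolv -lm -ldl -lxml2 -lz\n"

patched = "\n".join(
    line + " -liconv" if line.startswith("EXTRA_LIBS") else line
    for line in makefile.splitlines()
)
print(patched.splitlines()[1])
# → EXTRA_LIBS = -lcrypt -lresolv -lm -ldl -lxml2 -lz -liconv
```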

Create symbolic links:

ln -s /usr/local/php/bin/php /usr/local/bin/
ln -s /usr/local/php/bin/phpize /usr/local/bin/
ln -s /usr/local/php/bin/php-config /usr/local/bin/

Check the version:

php -v

Output:

PHP 5.2.17 (cli) (built: Jan  8 2017 04:31:43)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies

How to Use Scrapy

September 29, 2016, by Amon

Reference: http://scrapy-chs.readthedocs.io/zh_CN/latest/intro/tutorial

Create a project:

scrapy startproject tutorial

Output:

New Scrapy project 'tutorial', using template directory '/usr/local/lib/python2.7/dist-packages/scrapy/templates/project', created in:
    /root/tutorial

You can start your first spider with:
    cd tutorial
    scrapy genspider example example.com

This command creates a tutorial directory with the following contents:

tutorial/
    scrapy.cfg # the project's configuration file
    tutorial/ # the project's Python module; your code goes here
        __init__.py
        items.py # the project's Item definitions
        pipelines.py # the project's pipelines
        settings.py # the project's settings
        spiders/ # directory for spider code
            __init__.py
            ...

Define an Item:

Items are containers for the scraped data. They are used much like Python dicts, but add protection against the undefined-field errors that typos would otherwise cause.
Define an Item by subclassing scrapy.Item and declaring class attributes of type scrapy.Field.

Each Item is one scraped object; each URL yields a set of Items.

First, model the Item on the data we want from dmoz.org: each site's name, URL, and description.

Define the corresponding fields by editing the items.py file in the tutorial directory:

import scrapy

class DmozItem(scrapy.Item):
    title = scrapy.Field() # title
    link = scrapy.Field() # link
    desc = scrapy.Field() # description
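Scrapy may not be installed where you are reading this, but the protection described above is easy to picture: assigning to a field that was never declared raises an error instead of silently creating a misspelled key. A hypothetical stand-in class (MiniItem is not part of Scrapy) mimics that behaviour:

```python
# Stand-in (not Scrapy) showing dict-like access plus declared-field checking.
class MiniItem(dict):
    fields = ("title", "link", "desc")

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{key!r} is not a declared field")
        super().__setitem__(key, value)

item = MiniItem()
item["title"] = "Python Books"
print(item["title"])  # → Python Books

try:
    item["titel"] = "typo"  # misspelled field name is rejected
except KeyError as e:
    print("rejected:", e)
```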

Writing the first Spider

A Spider is a class you write to scrape data from a single site (or a group of sites). It holds the initial URLs to download, rules for following links in pages, and the logic for parsing page content into Items.

To create a Spider, subclass scrapy.Spider and define the following:

name: identifies the Spider. It must be unique; two Spiders cannot share a name.

start_urls: the list of URLs the Spider crawls at startup. The first pages fetched come from this list; subsequent URLs are extracted from the data those pages return.

parse(): a Spider method. When called, it receives as its only argument the Response object produced by downloading one of the initial URLs. It is responsible for parsing the response data, extracting items, and generating Request objects for further URLs to process.

Here is our first Spider, saved as dmoz_spider.py in the tutorial/spiders directory:

import scrapy

class DmozSpider(scrapy.spiders.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        with open(filename, 'wb') as f:
            f.write(response.body)

Crawl:

From the project's root directory, start the spider with:

scrapy crawl dmoz

crawl dmoz starts the spider for dmoz.org; you should see output like:

2016-09-29 09:34:20 [scrapy] INFO: Scrapy 1.1.3 started (bot: tutorial)
2016-09-29 09:34:20 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'tutorial'}
2016-09-29 09:34:20 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-09-29 09:34:20 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-09-29 09:34:20 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-09-29 09:34:20 [scrapy] INFO: Enabled item pipelines:
[]
2016-09-29 09:34:20 [scrapy] INFO: Spider opened
2016-09-29 09:34:20 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-09-29 09:34:20 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-09-29 09:34:21 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/robots.txt> (referer: None)
2016-09-29 09:34:21 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2016-09-29 09:34:21 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2016-09-29 09:34:21 [scrapy] INFO: Closing spider (finished)
2016-09-29 09:34:21 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 734,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 3,
 'downloader/response_bytes': 15997,
 'downloader/response_count': 3,
 'downloader/response_status_count/200': 3,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 9, 29, 9, 34, 21, 909921),
 'log_count/DEBUG': 4,
 'log_count/INFO': 7,
 'response_received_count': 3,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2016, 9, 29, 9, 34, 20, 533029)}
2016-09-29 09:34:21 [scrapy] INFO: Spider closed (finished)

In the lines containing [dmoz], the log shows each of the initial URLs defined in start_urls being crawled, one log line per URL, and (referer: None) shows that no other page linked to them.

Beyond that, something more interesting happened: exactly as our parse method specifies, two files holding the content of the two URLs were created: Books and Resources.

What just happened?

Scrapy created a scrapy.Request object for each URL in the Spider's start_urls attribute, assigning the parse method to each Request as its callback.

Once scheduled, each Request is executed, producing a scrapy.http.Response object that is handed back to the spider's parse() method.
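The cycle described in these two paragraphs can be sketched in a few lines of plain Python (a toy model, not Scrapy's actual internals): requests carry a callback, a scheduler loop "downloads" each one, and the resulting response is fed back to parse():

```python
# Toy model of Scrapy's request/response cycle (not the real implementation).
class Request:
    def __init__(self, url, callback):
        self.url, self.callback = url, callback

class Response:
    def __init__(self, url, body=b""):
        self.url, self.body = url, body

def crawl(start_urls, parse):
    queue = [Request(u, parse) for u in start_urls]  # one Request per URL
    scraped = []
    for req in queue:                                # scheduling, in miniature
        resp = Response(req.url)                     # stand-in for the download
        scraped.extend(req.callback(resp))           # hand Response to parse()
    return scraped

def parse(response):
    yield {"url": response.url}

print(crawl(["http://example.com/a", "http://example.com/b"], parse))
# → [{'url': 'http://example.com/a'}, {'url': 'http://example.com/b'}]
```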