Citeulike: A Researcher’s Social Bookmarking Service

18 December 2022 | Category: Business

Kevin Emamy and Richard Cameron describe a tool that helps researchers gather, collect and share papers.

This article describes Citeulike, a fusion of Web-based social bookmarking services and traditional bibliographic management tools. It discusses how Citeulike turns the linear ‘gather, collect, share’ process inherent in academic research into a circular ‘gather, collect, share and network’ process, enabling the sharing and discovery of academic literature and research papers.

What is Citeulike?

Citeulike is a Web-based tool to help scientists, researchers and academics store, organise, share and discover links to academic research papers. It has been available as a free Web service since November 2004 and like many successful software tools, it was written to solve a problem the authors were experiencing themselves:

‘Collecting material for a bibliography is something which appeared to require an amazing amount of drudgery… So, the obvious idea was that if I use a web browser to read articles, the most convenient way of storing them is by using a web browser too. This becomes even more interesting when you consider the process of jointly authoring a paper.’ [1]

The basic functionality of the tool is simple; when a researcher sees a paper on the Web that interests them, they can click a button and have a link to it added to their personal library.

Figure 1. Citeulike front page

Figure 2. Posting page

When a user posts a paper, Citeulike automatically extracts the citation details and stores a link to the paper, along with a set of user-defined tags. The user is then returned to the original Web page, where they can continue reading.

Citeulike has a flexible filing system based on tags [2]. Tags provide an open, quick and user-defined classification model that can produce interesting new categorisations.

‘The beauty of tagging is that it taps into an existing cognitive process without adding much cognitive cost. At the cognitive level, people already make local, conceptual observations. Tagging decouples these conceptual observations from concerns about the overall categorical scheme.’ - Rashmi Sinha [3]

By tagging the papers they post, users are building a domain-specific ‘folksonomy’ that describes the papers they are bookmarking in terms that are meaningful to themselves and, in Citeulike’s case, usually to other specialist researchers.

Because everyone’s library is stored on the server, it is accessible from any computer, enabling users to share their link library with others and see who else has bookmarked the same papers (their Citeulike ‘neighbours’). They can then click through to see the rest of these other users’ libraries and in this way discover literature that is relevant to their field but of which they may have been unaware. Tags also provide another simple mechanism whereby users can navigate the libraries and discover new papers.
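
The ‘neighbour’ idea can be illustrated with a minimal sketch. This is not Citeulike's actual implementation (the service itself is built on Tcl and PostgreSQL); the data layout, the function names and the sample users are all assumptions made for illustration. Each library is treated as a set of paper identifiers, a neighbour is any other user who shares at least one paper, and papers held by neighbours but not by the user are offered as candidates for discovery.

```python
from collections import Counter

# Hypothetical data: each user's library is a set of paper identifiers.
libraries = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p2", "p3", "p4"},
    "carol": {"p5"},
}


def neighbours(user, libraries):
    """Map each other user who shares at least one paper to the overlap size."""
    mine = libraries[user]
    overlap = {}
    for other, papers in libraries.items():
        if other == user:
            continue
        shared = len(mine & papers)
        if shared:
            overlap[other] = shared
    return overlap


def suggestions(user, libraries):
    """Papers held by neighbours but not by the user, most widely held first."""
    mine = libraries[user]
    counts = Counter()
    for other in neighbours(user, libraries):
        counts.update(libraries[other] - mine)
    return [paper for paper, _ in counts.most_common()]
```

In this toy data, alice and bob share two papers, so bob is alice's only neighbour and his remaining paper becomes her only suggestion; carol shares nothing and has no neighbours.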

RSS feeds and Watchlists allow users to track tags and users’ libraries that interest them, showing the latest additions to these chosen categories.

As well as browsing their neighbours’ tags and libraries, users can discover papers on the Citeulike front page, where the latest papers to have been posted are displayed (see Figure 1 above).

Another point of discovery is the set of subject-specific pages, where the latest links to papers posted are displayed according to the subject under which users have classified them. This is currently a simple closed classification consisting of Computer Science, Biological Science, Social Science, Medicine, Engineering, Economics/Business, Arts/Humanities, Mathematics, Physics, Chemistry, Philosophy and Earth/Environmental Science.

Within these categories, as well as display of papers by latest addition, there is a voting system whereby users can vote on a particular paper that they find interesting, resulting in that paper’s promotion up the list of latest papers.

Figure 3. Latest papers in Computer Science

The emerging dataset of papers, tags and the relations between the two offers many intriguing avenues for investigation and data mining. Clusters of tags can be investigated for patterns, and papers with similar tags can be grouped and relations between them exposed. There are already several independent projects under way to produce analyses from the Citeulike dataset.
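
As one illustration of how papers with similar tags might be grouped, the sketch below compares tag sets using Jaccard similarity. The choice of measure, the threshold of 0.5 and the sample data are assumptions for the sake of the example; the article does not say which methods the independent projects use.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_papers(paper_tags, threshold=0.5):
    """Return (paper, paper, score) for every pair scoring at least `threshold`."""
    ids = sorted(paper_tags)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = jaccard(paper_tags[first], paper_tags[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# Hypothetical tag sets, keyed by paper identifier.
paper_tags = {
    "p1": {"evolution", "genomics"},
    "p2": {"evolution", "genomics", "phylogeny"},
    "p3": {"folksonomy", "tagging"},
}
```

Here only p1 and p2 are paired, since they share two of three distinct tags. The pairwise loop is quadratic in the number of papers, so a dataset the size of Citeulike's would need an index from tags to papers rather than this brute-force comparison.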

It is important to note that tagging is initially done for the individual user’s personal benefit, and that the community benefits arise as a consequence of this behaviour. That said, contributing to the community, or at least to a group, is clearly also an important part of many users’ motivation, which can create a tension in the choice of tag words between personal and generic terms.

Gather, Collect, Share

Citeulike fuses together two separate categories of software: the new ‘Web 2.0’ breed of social bookmarking services (del.icio.us [4] etc) and traditional bibliographic management software (EndNote etc). While Web bookmarks are simple URLs, citations are a bit more complex and include metadata like journal names, authors, page numbers etc.

The gather-collect-share model found in traditional bibliographic software is a linear process. Gathering literature is conducted by querying an OPAC database or a scientific publisher’s site using a Web browser.

Desktop software such as EndNote will then allow the user to collect the articles which he or she wishes to keep for future reference, and the collection process stores sufficient metadata for the article (the title, authors, journal name, page number) in a format which allows for its ultimate sharing with others by citing it in the author’s own publication.

When the publication appears in print, the whole gather-collect-share process starts again with a different researcher in a different institution.

Citeulike fulfils two roles:

Firstly, it makes the existing model of collecting information easier for the end-user. A Web browser is the natural tool for exploring lists of publications, and our premise was that it ought equally to be the natural tool with which to collect bibliographic records. Social bookmarking services such as del.icio.us allow the user to store links to Web pages in an online account – all at the click of a button.

On the other hand, academic users have traditionally had to switch back-and-forth between Web browser and external application as they alternate between gathering and collecting modes. This process is time-consuming and error-prone.

Citeulike solves this problem by operating like a standard social bookmarking service (the user clicks a bookmarklet in order to post an article to his or her account), but it also extracts all the relevant metadata required to create a proper bibliographic record automatically from the publisher’s site. Citeulike supports most of the major publishers [5], and the ‘gather’ and ‘collect’ steps of the process work seamlessly for the user without having to leave the Web browser, with none of the drudgery traditionally associated with keeping one’s personal bibliographic database up to date.
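
The posting step can be pictured as a dispatch from the article's URL to a publisher-specific extractor. The sketch below is a guess at the general shape only, not Citeulike's code: the `Citation` fields, the `extractor` registry, the host name `journal.example.org` and the placeholder values are all hypothetical, and a real extractor would fetch and parse the publisher's page rather than return fixed data.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Citation:
    """A bookmark plus the bibliographic metadata a plain URL lacks."""
    url: str
    title: str = ""
    authors: list = field(default_factory=list)
    journal: str = ""
    pages: str = ""
    tags: set = field(default_factory=set)


# Hypothetical registry mapping a publisher's host name to its extractor.
EXTRACTORS = {}


def extractor(hostname):
    """Decorator that registers an extractor function for one host name."""
    def register(func):
        EXTRACTORS[hostname] = func
        return func
    return register


@extractor("journal.example.org")
def extract_example(url):
    # Placeholder values: a real extractor would download and parse the page.
    return Citation(url=url, title="Placeholder title", authors=["A. Author"],
                    journal="Example Journal", pages="1-10")


def post(url, tags):
    """Build a record for `url`, falling back to a bare bookmark for unknown hosts."""
    func = EXTRACTORS.get(urlparse(url).hostname)
    citation = func(url) if func else Citation(url=url)
    citation.tags = set(tags)
    return citation
```

Posting a URL from a registered host yields a full record with the user's tags attached, while an unsupported site degrades to a simple bookmark that the user could complete by hand.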

Gather, Collect, Share, Repeat

The second role fulfilled by Citeulike is that it has actually changed the traditional method for discovering and sharing information. The linear gather-collect-share process has turned into a virtuous circle. Because users’ collections are now stored on a Web server rather than in a proprietary bibliographic database locked away on a desktop computer, it is now possible for users of Citeulike to browse each other’s collections.

Figure 4. Browsing a user’s library

Users of Citeulike can browse or search through collections of articles bookmarked by other people with similar interests.

Figure 5. Searching a user’s library

Due to its specialisation in a particular niche (only catering for academic articles), Citeulike has value to researchers beyond a service aimed at a generalised audience. Within this niche, new papers are more easily discovered, relevant clusters of interest are naturally formed and the tags are likely to be meaningful to users.

Tagging within a niche is also more specific. For example, a tag consisting of the term ‘evolution’ applied to the World Wide Web as a whole could correspond to many possible interpretations of the word. Within the context of peer-reviewed articles, the scope of such a tag is likely to be much narrower, and users searching on that term will retrieve far more targeted results.

An interesting question arises: how far should this specialisation be taken? Is there a requirement for a separate bookmarking service for each individual academic discipline or sub-discipline?

It could be argued that a single unified bookmarking system filtered by tags is the answer to this problem, but the utility of a specialist service versus a generalised one as demonstrated by Citeulike weighs against this. Separate services would not be of benefit to cross-disciplinary fields or users, or indeed the discovery of relations between aspects of separate disciplines. Perhaps in choosing the overall category of academic research as a specialisation Citeulike has fallen naturally into a rational balance in this regard.

As noted above, Citeulike partly addresses this issue with the user classification of papers into subject areas (primarily to enhance discovery). At the time of writing, we believe that Citeulike is the only social bookmarking service that has attempted to combine tagging with a closed classification like this (albeit a simple one).

It is worth noting that, from users’ point of view, it is far more practical to have a single social bookmarking service to store all their research papers, from whatever source (rather than several services). This is also true from the sharing and collaborative point of view. For this reason, we would argue that journal publishers and database providers should link to services like Citeulike in order to provide bookmarking functionality for their subscribers. Additionally, the natural sharing and discovery of papers amongst users on Citeulike has obvious promotional benefits for the output of publishers. Because Citeulike is an independent service, papers posted to Citeulike are more likely to reach an audience beyond a particular publisher’s natural constituency.

Gather, Collect, Share, Network

As already noted, a further consequence of the ‘everything in your browser’ model of Citeulike is that users will inevitably discover others with similar interests. The fact that two users read similar literature suggests that they may well have a professional interest in each other. The bibliographic data forms a fabric binding people together.

Professional networks build up between researchers in the same field. Rather than requiring the facility to chat to friends, researchers welcome tools which let them carry out tasks associated with their work collaboratively. To further serve that end, Citeulike provides groups, allowing people who already have working relationships and are, say, collaborating on a publication to share their bibliographic databases.

Additionally, it allows for globally distributed researchers with a common theme to build up a shared collection of literature they find relevant.

Publisher-specific Data

As well as browsing Citeulike themselves, publishers can obtain feeds of the tags that Citeulike users are applying to articles from their particular journals.

Figure 6. A list of articles from the journal ‘Science’ which have been tagged ‘bioinformatics’

Publishers can encourage the sharing of papers on Citeulike by adding a ‘post to Citeulike’ link at the article level on their publications. Alongside the link, publishers could also choose to display the tags used and the number of users tagging at the article level on their sites.

Architecture and Future Directions

In terms of technical architecture, the software is built on PostgreSQL, Tcl and Memcached. The database and Web servers reside on scalable, redundant, professionally hosted Linux servers, and the database is backed up every 15 minutes. The design of the Citeulike Web site is simple, clear and functional. This should not be lost through careless addition of excessive extra functionality.

Unlike other services, Citeulike is remarkably free of spam links and the technical design decisions that have prevented spammers’ invasions will continue to be a focus of activity.

The current development schedule includes building on the existing groups functionality, to make it easier for users to separate personal and group-associated bookmarks, and extending and improving the tagging tools by introducing things like tag bundling, which would give a further level of user-defined classification and enable mass tag-editing operations. Private bookmarks are a feature request that has been resisted so far in order to keep with Citeulike’s community-orientated philosophy. However, there are probably good reasons why certain researchers wish to keep their bookmarks private, and this is under review.

Using open source components, it costs surprisingly little to build and run Web services today which in the past would have required millions of pounds of investment, and this is surely a trend that will continue. Citeulike benefits from this trend; however, the authors do intend to make it a self-sustaining resource. There are a number of ways in which this could be achieved, but they all depend on a continued expansion of the user base, which is where efforts are concentrated at the moment. Citeulike has grown virally through word of mouth to its current size: 33,000 users generating 200,000 distinct visits per month (see Current Statistics below). The network effects that created this scale continue to accelerate. The obvious and least intrusive way to promote its use is through library and information management professionals, as well as the links from publishers alluded to above.

The most intriguing area for future experimentation is mining the tag and article data that is being created. Is it possible that large-scale datasets from bookmarking and tagging can be used to supplement traditional peer review and citation analysis? This is a hard problem to solve, but there must be some implicit crowd knowledge in the patterns formed. Dario Taraborelli has written a thought-provoking post on this:

‘Collaborative metadata cannot offer the same guarantees as standard selection processes (insofar as they do not rely on experts’ reviews and are less immune to biases and manipulations). However, they are an interesting solution for producing evaluative representations of scientific content on a large scale.’ [6]

Conclusion

Citeulike is a tool that has gained a significant audience in the academic community. Through helping users keep track of their own bibliographies, it naturally creates an environment that facilitates sharing and consumption of academic literature. Publishers can encourage Citeulike’s use amongst their readers, thereby benefiting from enhanced exposure for their content and greater user engagement with content. Many are doing this by placing posting links at the article level on their content, and will soon be displaying statistics as well as popular tags from Citeulike on their sites.

Insights can be gained through analysis of the emerging dataset of tags, and given a sufficiently large dataset, supplemental forms of discovery and rating of scientific literature could emerge.

Ultimately, Citeulike works because it is useful to its users. It automates a repetitive bibliographic management task, and it offers a complementary alternative to search engines and databases of academic literature through socially mediated retrieval and discovery of papers.

Current Statistics

As of 13 March 2007, Citeulike has 33,000 registered users and is gaining new registrations at the rate of 100 per day (up from 50 per day 6 months ago). Of those 33,000, 45% go on to post articles to the site, many simply ‘lurk’ (i.e. browse other users’ libraries but do not post themselves), and some disappear.

Citeulike receives in excess of 200,000 distinct visits per month (a visit is defined by Google Analytics as a set of page views by a unique user, with a timeout after 30 minutes of inactivity), with each visit generating an average of 2.77 page views. Of those 200,000, around 40,000 are visits from unique users who have previously visited the site on multiple occasions.
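
The visit definition quoted above can be made concrete with a short sketch. It is an approximation of the stated rule, not Google Analytics' actual algorithm: the function name and the sample page views are hypothetical, and treating a gap of exactly 30 minutes as still part of the same visit is an assumption.

```python
from datetime import datetime, timedelta

# Inactivity window taken from the definition in the text.
TIMEOUT = timedelta(minutes=30)


def count_visits(page_views):
    """Count visits in an iterable of (user_id, timestamp) page views.

    A user's page view starts a new visit when it is their first, or when
    more than TIMEOUT has passed since their previous page view.
    """
    last_seen = {}
    visits = 0
    for user, when in sorted(page_views, key=lambda view: view[1]):
        previous = last_seen.get(user)
        if previous is None or when - previous > TIMEOUT:
            visits += 1
        last_seen[user] = when
    return visits


# Hypothetical log: u1 returns after a 40-minute gap, which starts a new visit.
views = [
    ("u1", datetime(2007, 3, 13, 9, 0)),
    ("u2", datetime(2007, 3, 13, 9, 5)),
    ("u1", datetime(2007, 3, 13, 9, 10)),
    ("u1", datetime(2007, 3, 13, 9, 50)),
]
```

For this log the four page views fall into three visits, so the page views per visit would be 4 divided by 3, the same kind of ratio as the 2.77 figure reported above.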

There are currently 505,402 items posted in the database (counting n if n people post the same article); 1,676,130 tags (counting n if there are ‘n’ tags applied to an article); and 130,548 distinct words used as tags. These numbers are growing exponentially.
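
The counting rules in the previous paragraph can be sketched as follows. The tuple layout and the sample posts are hypothetical; the point is only that the same article posted by n people counts n times, every tag applied to a post counts once, and distinct tag words are counted across the whole dataset.

```python
# Hypothetical posts: (user, paper identifier, tags applied by that user).
posts = [
    ("alice", "p1", ["evolution", "genomics"]),
    ("bob", "p1", ["evolution"]),
    ("bob", "p2", ["tagging"]),
]


def library_statistics(posts):
    """Return (items posted, tag applications, distinct tag words)."""
    items = len(posts)  # p1 counts twice because two users posted it
    tag_applications = sum(len(tags) for _, _, tags in posts)
    distinct_tags = len({tag for _, _, tags in posts for tag in tags})
    return items, tag_applications, distinct_tags
```

Applied to the sample data this gives three items, four tag applications and three distinct tags. Dividing the article's own figures in the same way, 1,676,130 tag applications over 505,402 posted items comes to roughly 3.3 tags per post.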

There are over 800 user-created special interest groups.

Citeulike has an international audience and has been translated (by enthusiastic users) into 8 different languages including Japanese and Chinese (the largest single group of users by country is the USA).

References

    1. Citeulike FAQ http://www.citeulike.org/faq/all.adp
    2. Wikipedia entry on Tags http://en.wikipedia.org/wiki/Tags
    3. Rashmi Sinha blog post ‘A cognitive analysis of tagging’ http://www.rashmisinha.com/archives/05_09/tagging-cognitive.html
    4. del.icio.us, the grandfather of social bookmarking sites: http://www.del.icio.us
    5. Supported sites:
      AIP Scitation; Amazon; American Chem. Soc. Publications; American Geophysical Union; American Meteorological Society; Anthrosource; Association for Computing Machinery (ACM) portal; BMJ; BioMed Central; Blackwell Synergy; CSIRO Publishing; CiteSeer; Cryptology ePrint Archive; HighWire; IEEE Xplore; Ingenta; IngentaConnect; IoP Electronic Journals; JSTOR; MathSciNet; MetaPress; NASA Astrophysics Data System; Nature; PLoS; PLoS Biology; Physical Review Online Archive; Project MUSE; PubMed; PubMed Central; Science; ScienceDirect; Scopus; SpringerLink; Wiley InterScience; arXiv.org e-Print archive.
    6. Dario Taraborelli (2007). ‘Soft Peer review, social software and distributed evaluation’
      http://www.academicproductivity.com/blog/2007/soft-peer-review-social-software-and-distributed-scientific-evaluation/

Author Details

Kevin Emamy
Citeulike

Email: kevin@citeulike.org
Web site: http://www.citeulike.org/

Richard Cameron
Citeulike

Email: Richard@citeulike.org
Web site: http://www.citeulike.org/
