PubMed Central

PubMed Central is a free digital database of full-text scientific literature in biomedical and life sciences. It can be reached at.

It grew out of the online Entrez PubMed biomedical literature search system. PubMed Central was developed by the U.S. National Library of Medicine (NLM) as an online archive of biomedical journal articles. The full text of every PubMed Central article is free to read, although some participating journals delay the release of their articles for a set period after print publication (often six months).

As of June 2007, the archive contained approximately 1,000,000 items, including articles, editorials, letters, and other material. It appears to be growing by at least 7% per year.
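For a sense of scale, a constant compound growth rate r doubles a collection in T = ln 2 / ln(1 + r) years. Taking the 7% lower bound and assuming the rate stays constant (an assumption made here, not a claim of the source):

```latex
T_2 = \frac{\ln 2}{\ln(1.07)} \approx \frac{0.6931}{0.0677} \approx 10.2 \text{ years}
```

So at 7% or more per year, the archive would double in at most about ten years.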

The minutes of the NIH Board of Regents at indicate that, as of September 2004, PubMed Central, PubMed, and related NLM services were handling approximately 1,300 hits per second and supplying 1.3 terabytes of data per day. The figures are presumably higher today, since hit rates would be expected to rise faster than archive size.
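As a rough check on what these figures mean, dividing the daily data volume by the daily number of hits gives the average amount served per hit. This assumes decimal terabytes (10^12 bytes) and traffic spread evenly over 24 hours; neither assumption comes from the source.

```latex
\frac{1.3 \times 10^{12}\ \text{bytes/day}}{1300\ \text{hits/s} \times 86\,400\ \text{s/day}}
  = \frac{1.3 \times 10^{12}}{1.1232 \times 10^{8}}
  \approx 1.16 \times 10^{4}\ \text{bytes/hit}
```

That is roughly 11.6 kB per hit, and 1.3 × 10^12 bytes spread over 86,400 seconds corresponds to a sustained average of about 15 MB/s (about 120 Mbit/s).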

Adoption
The repository has grown rapidly, partly because the U.S. NIH's "Policy on Enhancing Public Access to Archived Publications Resulting from NIH-Funded Research" is designed to make all NIH-funded research freely accessible to anyone, and partly because many publishers are working cooperatively with the NIH to provide free access to their works.

Some publishers may be reluctant to make works freely available online in any form, including through archives such as PubMed Central, fearing that doing so may harm sales of the print journal. However, freely accessible journals have been found to (a) be cited more often, (b) be cited sooner, and (c) generally include better papers (that is, stronger authors appear to prefer open access for their work). These benefits may in turn increase sales of the print journal. Users also appear to prefer electronic journals (see Sathe et al., "Print versus electronic journals: a preliminary investigation into the effect of journal format on research processes", J Med Libr Assoc. 2002 Apr; 90(2): 235–243). See also Open Access for more information on this debate.

A UK version of the PubMed Central system, UK PubMed Central (UKPMC), has been developed by the Wellcome Trust and the British Library as part of a nine-strong group of UK research funders. The system went live in January 2007.

The National Library of Medicine's "NLM Journal Publishing DTD" is freely available and appears to be the most widely used journal article markup language. The Association of Learned and Professional Society Publishers has held seminars titled "A Standard XML Document Format: The case for the adoption of NLM DTD?", commenting that "it is likely to become the standard for preparing scholarly content for both books and journals." A related DTD is available for books.

The Library of Congress and the British Library have announced support for the NLM DTD. It has also been popular with journal service providers.

PubMed Central Technology
(not all information in this section has been verified)

Articles are sent to PubMed Central by publishers in XML or SGML, using a variety of article DTDs. Older and larger publishers may have their own established in-house DTDs, but many publishers use the NLM Journal Publishing DTD (see above).
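To illustrate what article markup of this kind looks like, the Python sketch below parses a simplified fragment and pulls out a few bibliographic fields. The element names (article-meta, article-title, contrib-group, and so on) are taken from the NLM tag set, but the fragment itself is invented, simplified, and not validated against any DTD; extract_metadata is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A simplified, invented article fragment using element names from the NLM
# journal tag set. It is illustrative only and not validated against the DTD.
SAMPLE_ARTICLE = """\
<article article-type="research-article">
  <front>
    <journal-meta>
      <journal-title>Example Journal of Biology</journal-title>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A hypothetical study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>Some text.</p></sec>
  </body>
</article>
"""


def extract_metadata(xml_text: str) -> dict:
    """Pull the title, journal name, and author names out of an NLM-style article."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-meta/title-group/article-title")
    journal = root.findtext("front/journal-meta/journal-title")
    authors = [
        f"{c.findtext('name/given-names')} {c.findtext('name/surname')}"
        for c in root.iterfind("front/article-meta/contrib-group/contrib")
        if c.get("contrib-type") == "author"
    ]
    return {"title": title, "journal": journal, "authors": authors}


print(extract_metadata(SAMPLE_ARTICLE))
# {'title': 'A hypothetical study', 'journal': 'Example Journal of Biology', 'authors': ['Jane Doe', 'Richard Roe']}
```

Because the markup names each field explicitly, the same few path lookups work for any article that follows the tag set, which is part of why a shared DTD simplifies processing.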

Received articles are converted via XSLT to the very similar NLM Archiving and Interchange DTD. This process may reveal errors, which are reported back to the publisher for correction. Graphics are also converted to standard formats and sizes. Both the original and converted forms are archived. The converted form is loaded into a relational database, along with files for graphics, multimedia, and other associated data. Many publishers also provide PDF versions of their articles, and these are made available unchanged.
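The ingest steps described above can be sketched in Python as follows. This is a minimal sketch, not NLM's pipeline: Python's standard library has no XSLT processor, so a hypothetical tag-renaming table (TAG_MAP) stands in for the XSLT conversion, the single validation rule (REQUIRED_PATHS) is invented for illustration, and SQLite plays the role of the relational database. The table layout and the choice to store nothing until errors are fixed are likewise assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping from a publisher's in-house tag names to archiving-format
# tags. This stands in for the XSLT step; the standard library has no XSLT.
TAG_MAP = {"headline": "article-title", "para": "p"}

# Hypothetical validation rule: every article must have a title.
REQUIRED_PATHS = ["front/article-meta/title-group/article-title"]


def convert(root: ET.Element) -> ET.Element:
    """Rename in-house tags in place to their archiving-format equivalents."""
    for el in root.iter():
        el.tag = TAG_MAP.get(el.tag, el.tag)
    return root


def ingest(db: sqlite3.Connection, article_id: str, xml_text: str,
           files: dict[str, bytes]) -> list[str]:
    """Convert, check, and store one article; return errors for the publisher."""
    try:
        root = convert(ET.fromstring(xml_text))
    except ET.ParseError as exc:
        return [f"malformed XML: {exc}"]
    errors = [f"missing {p}" for p in REQUIRED_PATHS if root.find(p) is None]
    if errors:
        return errors  # in this sketch, nothing is stored until errors are fixed
    with db:  # one transaction: original, converted form, and files together
        db.execute("INSERT INTO articles VALUES (?, ?, ?)",
                   (article_id, xml_text, ET.tostring(root, encoding="unicode")))
        db.executemany("INSERT INTO files VALUES (?, ?, ?)",
                       [(article_id, name, data) for name, data in files.items()])
    return []


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id TEXT PRIMARY KEY, original TEXT, converted TEXT)")
db.execute("CREATE TABLE files (article_id TEXT, name TEXT, data BLOB)")

good = ("<article><front><article-meta><title-group>"
        "<headline>A title</headline></title-group></article-meta></front></article>")
print(ingest(db, "a1", good, {"fig1.png": b"\x89PNG"}))  # []
print(ingest(db, "a2", "<article/>", {}))
# ['missing front/article-meta/title-group/article-title']
```

Keeping the original text alongside the converted form mirrors the source's point that both are archived, so a conversion can be redone later if the target format changes.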

Bibliographic citations are parsed and automatically linked to the relevant abstracts in PubMed, articles in PubMed Central, and resources on publishers' websites. PubMed links also lead to PubMed Central. Unresolvable references, such as those to journals or articles not yet available from any of these sources, are tracked in the database and automatically go "live" when the resources become available.
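A minimal Python sketch of how unresolved references could be tracked and later made live. The citation key (journal, year, volume, first page), the citation_key helper, the ReferenceLinker class, and the example.org URL are all hypothetical; the source does not describe how PubMed Central actually matches citations.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def citation_key(journal: str, year: int, volume: int, first_page: int) -> tuple:
    """Hypothetical normalized key for a parsed citation."""
    return (journal.strip().lower(), year, volume, first_page)


@dataclass
class ReferenceLinker:
    """Toy resolver: links citations it knows and remembers those it cannot."""
    known: dict = field(default_factory=dict)    # citation key -> URL
    pending: dict = field(default_factory=dict)  # citation key -> set of citing ids

    def link(self, citing_id: str, key: tuple) -> str | None:
        """Return a URL for the citation, or record it as unresolved."""
        url = self.known.get(key)
        if url is None:
            self.pending.setdefault(key, set()).add(citing_id)
        return url

    def add_resource(self, key: tuple, url: str) -> set[str]:
        """Register a newly available resource; return articles whose link goes live."""
        self.known[key] = url
        return self.pending.pop(key, set())


linker = ReferenceLinker()
key = citation_key("J Med Libr Assoc", 2002, 90, 235)
print(linker.link("article-1", key))                                # None
print(linker.add_resource(key, "https://example.org/placeholder"))  # {'article-1'}
print(linker.link("article-1", key))  # https://example.org/placeholder
```

Returning the set of waiting articles from add_resource is one way to know which stored pages need their links refreshed once a cited work appears.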

An in-house indexing system provides search capability and is aware of biological and medical terminology, such as generic versus proprietary drug names and alternative names for organisms, diseases, and anatomical parts.
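One common way to make an index aware of alternative names is to map every variant to a canonical term both when indexing and when searching. The Python sketch below does this with a tiny hand-written synonym table; the table, the SynonymIndex class, and the document ids are hypothetical, and the source does not say how the PubMed Central indexer handles terminology.

```python
import re
from collections import defaultdict

# Hypothetical synonym table: canonical (generic or preferred) name -> variants.
SYNONYMS = {
    "acetaminophen": {"paracetamol", "tylenol"},
    "ibuprofen": {"advil", "motrin"},
    "varicella": {"chickenpox"},
}
CANONICAL = {v: canon for canon, variants in SYNONYMS.items() for v in variants}


def normalize(text: str) -> list[str]:
    """Lowercase, split into word tokens, and replace variants with canonical terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [CANONICAL.get(t, t) for t in tokens]


class SynonymIndex:
    """Inverted index whose terms are normalized on the way in and out."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in normalize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = normalize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = SynonymIndex()
index.add("pmc1", "Paracetamol dosing in children")
index.add("pmc2", "Tylenol and liver injury")
index.add("pmc3", "Ibuprofen versus placebo")
print(sorted(index.search("acetaminophen")))  # ['pmc1', 'pmc2']
```

Normalizing at both index time and query time means a search for a brand name also finds papers that use only the generic name, and vice versa.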

Participating journals can also arrange to have their back issues scanned and added to PubMed Central. Scanned issues are delivered as page images, but they are also converted to text internally so that the full text is searchable.

When a user accesses a journal issue, a table of contents is generated automatically by retrieving all the articles, letters, editorials, and other items in that issue. When the user opens an individual item such as an article, PubMed Central converts its NLM markup to HTML for delivery and provides links to related data objects. This is feasible because the varied incoming data has already been converted to standard DTDs and graphic formats.
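The two delivery steps, assembling an issue's table of contents and rendering stored markup as HTML, can be sketched in Python with the standard library. The element-to-tag mapping, the record fields (volume, issue, first_page, type, title), and the sample data are hypothetical; the source does not describe the real system's rendering rules.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few NLM body elements to HTML tags; elements
# not listed are unwrapped (their content is kept, the tag itself is dropped).
TAG_TO_HTML = {"sec": "section", "title": "h2", "p": "p",
               "italic": "em", "bold": "strong"}


def to_html(el: ET.Element) -> str:
    """Recursively render an NLM-style element as an HTML string."""
    inner = html.escape(el.text or "")
    for child in el:
        inner += to_html(child) + html.escape(child.tail or "")
    tag = TAG_TO_HTML.get(el.tag)
    return f"<{tag}>{inner}</{tag}>" if tag else inner


def table_of_contents(items: list[dict], volume: int, issue: int) -> list[str]:
    """List every item in one issue, in page order."""
    in_issue = [it for it in items if (it["volume"], it["issue"]) == (volume, issue)]
    in_issue.sort(key=lambda it: it["first_page"])
    return [f'{it["type"]}: {it["title"]} (p. {it["first_page"]})' for it in in_issue]


items = [  # hypothetical records, e.g. rows fetched from the article database
    {"volume": 1, "issue": 1, "first_page": 9, "type": "Letter", "title": "A reply"},
    {"volume": 1, "issue": 1, "first_page": 1, "type": "Article", "title": "A study"},
    {"volume": 1, "issue": 2, "first_page": 1, "type": "Editorial", "title": "Later"},
]
print(table_of_contents(items, 1, 1))
# ['Article: A study (p. 1)', 'Letter: A reply (p. 9)']

body = ET.fromstring("<body><sec><title>Results</title>"
                     "<p>Growth was <italic>fast</italic> &amp; steady.</p></sec></body>")
print(to_html(body))
# <section><h2>Results</h2><p>Growth was <em>fast</em> &amp; steady.</p></section>
```

Rendering at request time from one normalized format means a single converter serves every journal, which is the practical benefit of converting all incoming data to standard DTDs first.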