Public health informatics

Public Health Informatics has been defined as the systematic application of information and computer science and technology to public health practice, research, and learning.

It is one of the subdomains of (bio)medical or health informatics.

In the same way that public health as a distinct field relates to healthcare generally, public health informatics is distinguished from healthcare informatics by its emphasis on data about populations rather than data about individuals. The activities of public health informatics can be broadly divided into the collection, storage, and analysis of data of interest to the various activities of public health.

United States
In the United States, public health informatics is practiced by individuals in public health agencies at the federal and state levels and in the larger local health jurisdictions. Additionally, research and training in public health informatics take place at a variety of academic institutions.

At the federal Centers for Disease Control and Prevention in Atlanta, Georgia, the National Center for Public Health Informatics (NCPHI) is charged with providing national leadership in public health informatics. The major initiative in this area at the beginning of the 21st century has been to promote and fund the definition, coordination, and implementation of the Public Health Information Network (PHIN).

The bulk of the work of public health informatics in the United States, as with public health generally, takes place at the state and local level, in the state departments of health and the county or parish departments of health. At a state health department the activities may include: collection and storage of vital statistics (birth and death records); collection of reports of communicable disease cases from doctors, hospitals, and laboratories, used for infectious disease surveillance; display of infectious disease statistics and trends; collection of child immunization and lead screening information; daily collection and analysis of emergency room data to detect early evidence of biological threats; collection of hospital capacity information to allow for planning of responses in case of emergencies. Each of these activities presents its own information processing challenge.
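The daily analysis of emergency room data mentioned above can be illustrated with a very simple threshold rule. The following is a minimal sketch, not a description of any system actually used by a health department: the function name, the seven-day baseline, the two-standard-deviation threshold, and the visit counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def flag_unusual_days(daily_counts, baseline_days=7, threshold_sd=2.0):
    """Return the indexes of days whose count exceeds the mean of the
    preceding baseline window by more than threshold_sd standard deviations."""
    flagged = []
    for day in range(baseline_days, len(daily_counts)):
        baseline = daily_counts[day - baseline_days:day]
        limit = mean(baseline) + threshold_sd * stdev(baseline)
        if daily_counts[day] > limit:
            flagged.append(day)
    return flagged

# Hypothetical daily counts of emergency room visits for one symptom group.
visits = [12, 15, 11, 14, 13, 12, 14, 13, 30, 14]
print(flag_unusual_days(visits))  # day index 8 (the count of 30) is flagged
```

A rule this simple ignores day-of-week and seasonal patterns, so it is best read as an outline of the idea rather than a usable detector.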

Collection of public health data
Before the advent of the internet, public health data in the United States, like healthcare data and other business data, were collected on paper forms and stored centrally at the relevant public health agency. If the data were to be computerized, they required a separate data entry process, were stored in the various file formats of the day, and were analyzed on mainframe computers using standard batch processing.

Later, the CDC provided DOS- and desktop-based systems for particular programs, such as TIMSS for tuberculosis, STDMIS for sexually transmitted diseases, and Epi-Info for epidemiological investigations.

Since the beginning of the World Wide Web, public health agencies with sufficient information technology resources have been transitioning to web-based collection of public health data and, more recently, to automated messaging of the same information. Between roughly 2000 and 2005 the Centers for Disease Control and Prevention, under its National Electronic Disease Surveillance System (NEDSS), built and provided free to states a comprehensive web- and message-based reporting system called the NEDSS Base System (NBS). Many states and some larger counties have built their own electronic disease surveillance systems, such as Pennsylvania's PA-NEDSS.

To promote interoperability, the CDC has encouraged the adoption in public health data exchange of several standard vocabularies and messaging formats from the health care world. The most prominent of these are: the Health Level 7 (HL7) standards for health care messaging; the LOINC system for encoding laboratory test and result information; and the Systematized Nomenclature of Medicine (SNOMED) vocabulary of health care concepts.
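To make the combination of a messaging format and a coded vocabulary concrete, the sketch below pulls a LOINC-coded result out of a single HL7 version 2 observation (OBX) segment. It is a simplified illustration: the sample segment is made up, the function name is hypothetical, and the code assumes the default HL7 v2 delimiters (`|` between fields, `^` between components) without handling escape sequences, repetitions, or delimiters declared in the message header.

```python
def parse_obx(segment):
    """Extract the coded test and its result from one HL7 v2 OBX segment.
    Assumes default delimiters and no escaped characters."""
    fields = segment.split("|")
    if fields[0] != "OBX":
        raise ValueError("not an OBX segment")
    # OBX-3 is the observation identifier: code ^ text ^ coding system.
    code, name, system = (fields[3].split("^") + ["", "", ""])[:3]
    return {
        "value_type": fields[2],   # OBX-2, e.g. NM for numeric
        "code": code,
        "name": name,
        "coding_system": system,   # "LN" identifies LOINC
        "value": fields[5],        # OBX-5
        "units": fields[6],        # OBX-6
    }

# Made-up example segment: a hemoglobin result coded with LOINC 718-7.
sample = "OBX|1|NM|718-7^Hemoglobin^LN||13.2|g/dL|||||F"
result = parse_obx(sample)
print(result["code"], result["coding_system"], result["value"], result["units"])
```

Because the test is identified by a shared code rather than a local name, a receiving agency can recognize the same test from different laboratories, which is the semantic side of interoperability.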

Since about 2005, the CDC has promoted the idea of the Public Health Information Network to facilitate the transmission of data from various partners in the health care industry and elsewhere (hospitals, clinical and environmental laboratories, doctors' practices, pharmacies) to local health agencies, then to state health agencies, and then to the CDC. At each stage the entity must be capable of receiving the data, storing it, aggregating it appropriately, and transmitting it to the next level. A typical example is infectious disease data: hospitals, laboratories, and doctors are legally required to report cases to local health agencies, local health agencies must report to their state public health department, and the states must report in aggregate form to the CDC. Among other uses, the CDC publishes the Morbidity and Mortality Weekly Report (MMWR) based on these data, which are acquired systematically from across the United States.
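The aggregation step at each level of this reporting chain can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the `aggregate` function, and the county, state, and disease values are all hypothetical and do not reflect any actual PHIN message format.

```python
from collections import Counter

# Hypothetical line-level case reports as a local health agency might hold them.
# County, state, and disease values are made up for illustration.
case_reports = [
    {"county": "County A", "state": "State X", "disease": "pertussis"},
    {"county": "County A", "state": "State X", "disease": "pertussis"},
    {"county": "County B", "state": "State X", "disease": "pertussis"},
    {"county": "County B", "state": "State X", "disease": "salmonellosis"},
]

def aggregate(reports, level):
    """Count cases per (jurisdiction, disease) at the given level,
    where level is 'county' or 'state'."""
    return Counter((report[level], report["disease"]) for report in reports)

county_counts = aggregate(case_reports, "county")  # what a state might receive
state_counts = aggregate(case_reports, "state")    # what might go to the national level

for (jurisdiction, disease), count in sorted(state_counts.items()):
    print(jurisdiction, disease, count)
```

The same function serves both levels because each step only coarsens the jurisdiction key; the individual records never need to leave the level that collected them.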

Major issues in the collection of public health data are: lack of awareness of the need to report data; lack of resources on the part of either the reporter or the collector; lack of interoperability of data interchange formats, which can occur at the purely syntactic level or at the semantic level; and variation in reporting requirements across the states, territories, and localities.

Storage of public health data
Storage of public health data raises the same data management issues found in other industries. As in other industries, the details of how these issues play out are affected by the nature of the data being managed.

Because public health data, like health care data generally, are complex and variable, data modeling presents a particular challenge. While a generation ago flat data sets for statistical analysis were the norm, today's requirements for interoperability and for integrated sets of data across the public health enterprise demand more sophistication. The relational database is increasingly the norm in public health informatics. Designers and implementers of the many data sets required for various public health purposes must find a workable balance between very complex and abstract data models, such as HL7's Reference Information Model (RIM) or CDC's Public Health Logical Data Model, and simplistic, ad hoc models that public health practitioners without data modeling training devise and feel capable of working with.
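As a small illustration of the middle ground between a flat file and a highly abstract model, the sketch below uses Python's built-in `sqlite3` module to keep people and case reports in separate related tables. The table names, columns, and sample rows are assumptions made for this example and are not drawn from the RIM or the Public Health Logical Data Model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two related tables instead of one flat record per report:
# a person can have several case reports without repeating demographics.
conn.executescript("""
CREATE TABLE person (
    person_id  INTEGER PRIMARY KEY,
    birth_year INTEGER,
    county     TEXT
);
CREATE TABLE case_report (
    report_id   INTEGER PRIMARY KEY,
    person_id   INTEGER NOT NULL REFERENCES person(person_id),
    disease     TEXT NOT NULL,
    report_date TEXT NOT NULL
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, 1980, "County A"), (2, 2015, "County B")])
conn.executemany(
    "INSERT INTO case_report (person_id, disease, report_date) VALUES (?, ?, ?)",
    [(1, "pertussis", "2024-01-05"),
     (2, "pertussis", "2024-01-07"),
     (2, "salmonellosis", "2024-02-01")])

# A join recovers the flat, analysis-ready view when it is needed.
rows = conn.execute("""
    SELECT p.county, c.disease, COUNT(*)
    FROM case_report AS c
    JOIN person AS p ON p.person_id = c.person_id
    GROUP BY p.county, c.disease
    ORDER BY p.county, c.disease
""").fetchall()
print(rows)
```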

Due to the variability of the incoming data to public health jurisdictions, data quality assurance is also a major issue.
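A basic form of data quality assurance is to check each incoming record against a few rules before it is stored. The sketch below is a minimal example under assumed conventions: the required field names, the ISO date format, and the `find_problems` function are all hypothetical, and real jurisdictions would apply their own rules.

```python
from datetime import date

# Assumed set of fields every incoming report must carry.
REQUIRED_FIELDS = ("disease", "report_date", "county")

def find_problems(record, today):
    """Return a list of human-readable problems found in one incoming record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    report_date = record.get("report_date")
    if report_date:
        try:
            parsed = date.fromisoformat(report_date)
        except ValueError:
            problems.append("report_date is not a valid ISO date")
        else:
            if parsed > today:
                problems.append("report_date is in the future")
    return problems

# Made-up records: one clean, one with a missing county and a malformed date.
today = date(2024, 3, 1)
good = {"disease": "pertussis", "report_date": "2024-02-10", "county": "County A"}
bad = {"disease": "pertussis", "report_date": "02/10/2024", "county": ""}
print(find_problems(good, today))
print(find_problems(bad, today))
```

Returning a list of problems, rather than rejecting a record at the first failure, lets the receiving agency send the reporter a complete account of what needs correcting.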

Analysis of public health data
The need to extract usable public health information from the mass of data available requires the public health informaticist to become familiar with a variety of analysis tools, ranging from business intelligence tools that produce routine or ad hoc reports, to sophisticated statistical analysis tools such as SAS and SPSS, to Geographic Information Systems (GIS) that expose the geographical dimension of public health trends.
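One of the simplest calculations behind such reports is a rate that makes case counts comparable across populations of different sizes. The sketch below computes cases per 100,000 population; the function name and all figures are made up for illustration, and a real analysis would typically also adjust for age and other factors.

```python
def rate_per_100k(cases, population):
    """Return the number of cases per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population

# Hypothetical counts and populations for two jurisdictions.
jurisdictions = {
    "County A": {"cases": 25, "population": 50_000},
    "County B": {"cases": 40, "population": 400_000},
}

rates = {name: rate_per_100k(d["cases"], d["population"])
         for name, d in jurisdictions.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 100,000")
```

In this made-up example County B has more cases but a lower rate, which is the kind of distinction that raw counts alone would hide.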