Secondary data

In research, secondary data is data collected and possibly processed by people other than the researcher in question. Common sources of secondary data for social science include censuses, large surveys, and organizational records. In sociology, primary data is data you have collected yourself, and secondary data is data you have gathered from primary sources to create new research. In historical research, these two terms have different meanings: a primary source is a book or set of archival records, and a secondary source is a summary of a book or set of records.

Secondary data analysis
Two types of sources need to be distinguished in order to conduct a good analysis. The first is the primary source, which is the initial material collected during the research process. Primary data is data that researchers collect themselves, using methods such as surveys, direct observations, interviews, and logs (objective data sources). Collecting primary data is a reliable approach because the researcher knows where the data came from and how it was collected and analyzed, having done it personally. Secondary sources, on the other hand, are based on the data collected from the primary source. They analyze, explain, and combine the information from the primary source with additional information.

Secondary data analysis is commonly known as second-hand analysis. It is the analysis of preexisting data in a different way, or to answer a different question, than originally intended. Secondary data analysis uses data collected by someone else to advance a study that you are interested in completing.

Common sources of secondary data are social science surveys and data from government agencies, including the Bureau of the Census, the Bureau of Labor Statistics, and various other agencies. These data are most often gathered through survey research methods. Data from experimental studies may also be used.

Sources of secondary data
Sources of secondary data may be classified as qualitative or quantitative. Examples of qualitative sources are biographies, memoirs, and newspapers. Quantitative sources include published statistics (e.g., census and survey data), data archives, and market research. Today, thanks to the internet, thousands of large-scale datasets are available to secondary data analysts at the click of a mouse. Many such sources exist worldwide, ranging from data compiled by government and private organizations to data collected by individual social researchers. Secondary data analysis is a growing research tool, and social scientists have the opportunity to explore massive amounts of secondary data. Although the sources are vast, several major ones deserve mention:

U.S. Bureau of the Census

The United States Government has conducted a census of the population for over two hundred years. Beyond population, the census covers housing, the labor force, manufactures, business, agriculture, foreign trade, and more. Census data can be used for many research questions. For example, a researcher can study the characteristics of people not only in a single state or region but in an area as small as one city block. Anyone can access these statistics, along with information on the nearly one hundred surveys the bureau conducts, on its website (http://www.census.gov).

Integrated Public Use Microdata Series

Samples of U.S. census data spanning more than one hundred years, along with historical census files from other countries, are available through this microdata series. The samples are at the individual level and are maintained by the University of Minnesota’s Minnesota Population Center. The data provide codes and names for all samples in an easy-to-use format. Samples can include demographic, educational, occupational, and other work-related measures. You can view the data at www.ipums.umn.edu.

Bureau of Labor Statistics

This source collects data on employment, industrial relations, prices, earnings, living conditions, occupational safety, technology, and productivity. The Bureau publishes reports monthly, and they can be viewed at (http://stats.bls.gov).

International Data Sources

This is a strong source for comparative researchers and covers economic and political topics, including political events across many nations. In the U.S., reports from the Social Security Administration can be used to classify other nations. In Europe, the Eurobarometer Survey Series publishes reports on social and political attitudes.

ICPSR

Outside of the federal government, the ICPSR is the most extensive source of secondary data. Based at the University of Michigan, it includes over 325 colleges and universities in North America as well as hundreds of institutions on other continents. Its datasets can be viewed at http://www.icpsr.umich.edu.

Qualitative Data Sources

Qualitative datasets are far less available as secondary data sources. Several university-based archives hold qualitative data, but access to most of them is limited. The Human Relations Area Files, a Yale University source, make cross-cultural research data available. ICPSR also holds some qualitative sources, but the data in them can be difficult to interpret.

Combining Data with Secondary Data
Where It's Used

For what purposes can data from archives be used? The first and simplest is descriptive, as with a phone book. Data archives can make a particular contribution to comparative research, both across nations and over time. In the early years of data archives, when secondary analysis was not yet a popular research strategy, comparative research based on archival data was already being promoted at conferences some 40 years ago. Repeating a survey in the same society allows for comparative analysis over time; fielding the same survey in several societies allows for comparative analysis across societies or nations. Therefore, the design of comparative surveys is crucial for making empirical knowledge cumulative over space and time.

Combining Data From a Different Source with Different Time Periods

Equally important are longitudinal studies, which are compiled over time. For example, in a research project on "Attitudes Towards Technology," it is crucial to include data collected in the 1950s and 1960s in order to answer whether potential threats from new technologies have decreased the overall level of technology acceptance, or whether tendencies to reject new developments are concentrated on particular technologies only and, if so, under what circumstances.

Combining Existing Secondary Data Sources with New Primary Data Sources

Imagine that we could obtain a good collection of surveys taken in earlier years, including detailed studies of the changes under way in the current period and, ideally, further studies in the years to come. Analyzing this database over time could give us a good picture of what changes have actually taken place in the orientation of the population and of the extent to which new technical concepts have had an impact on subgroups of the population. Furthermore, data archives can help prepare studies of change over time by monitoring which questions have been asked in earlier years and alerting principal investigators to important questions that should be repeated in planned research projects. Data archives might even consider budgeting funds to collect data on relevant questions so that important time series are not interrupted.
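To make the longitudinal idea concrete, here is a minimal Python sketch of pooling survey waves from different years. It assumes the waves have already been coded so that an identical question ID (hypothetical names such as `q_tech_accept`) refers to an equivalent question, which is exactly the assumption that the comparability requirements in the next section call into question; all records are invented for illustration. The sketch keeps only questions asked in every wave, tags each row with its survey year, and reports which questions were dropped, the kind of information that could alert investigators to questions worth repeating.

```python
# Hypothetical survey waves: each record maps a question ID to a coded answer.
# Assumes the same question ID denotes the same (equivalent) question in every wave.
waves_by_year = {
    1965: [{"q_tech_accept": 3, "q_age": 41}, {"q_tech_accept": 4, "q_age": 29}],
    1995: [{"q_tech_accept": 2, "q_age": 35, "q_internet": 1}],
}


def common_questions(waves_by_year):
    """Return the question IDs present in every record of every wave."""
    question_sets = [set(rec) for wave in waves_by_year.values() for rec in wave]
    return set.intersection(*question_sets) if question_sets else set()


def pool_waves(waves_by_year):
    """Stack all waves into one list, keeping shared questions plus the survey year."""
    shared = sorted(common_questions(waves_by_year))
    pooled = []
    for year in sorted(waves_by_year):
        for rec in waves_by_year[year]:
            row = {q: rec[q] for q in shared}
            row["survey_year"] = year
            pooled.append(row)
    return pooled


pooled = pool_waves(waves_by_year)

# Questions asked in some waves but not all: candidates for repetition in future surveys.
all_questions = {q for wave in waves_by_year.values() for rec in wave for q in rec}
dropped = all_questions - common_questions(waves_by_year)

for row in pooled:
    print(row)
print("not asked in every wave:", sorted(dropped))
```

Dropping non-shared questions is a conservative choice; an analyst could instead keep them and mark missing waves explicitly, at the cost of a less directly comparable time series.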

Technical Challenges in Combining Data Sets

A number of methodological and technical requirements have to be observed and implemented rigorously. The most important are the following. First, some methodologists require that the questions be functionally equivalent, whereas others claim that the question texts must be phrased identically. Frequently, linguistic identity is not what matters; it is often much more important that respondents understand the questions in the same way. Thus, a thermometer or scale used to represent the intensity of attitudes in more developed societies may be replaced by a ladder in less developed societies. Both the thermometer and the ladder would still measure the same dimension in the conceptual world of the respective respondents. A second requirement is comparability of samples: a cross-national representative random sample, for example, would be hard to compare with a local quota sample from one community in a different nation. Several other factors have to be controlled as well, in particular contextual influences at the time of fieldwork, such as political or environmental events related to the topic of the research.
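The thermometer-versus-ladder example can be illustrated with a minimal Python sketch of one common harmonization step: linearly rescaling scores from each instrument onto a shared 0-1 range before pooling. The 0-100 thermometer and 0-10 ladder ranges, and the responses, are assumptions made for illustration. Rescaling only aligns the numbers; it does not establish that respondents understood the two instruments in the same way, which remains the substantive judgment described above. The linear mapping is itself a simplifying assumption, since equal distances on one instrument need not correspond to equal distances on the other.

```python
def rescale(value, low, high):
    """Linearly map a score on the closed range [low, high] onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range [{low}, {high}]")
    return (value - low) / (high - low)


# Hypothetical responses: a 0-100 feeling thermometer in one survey and a
# 0-10 ladder in another, both assumed to measure the same attitude dimension.
thermometer_scores = [50, 75, 100]
ladder_scores = [5, 7.5, 10]

harmonized = (
    [("thermometer", rescale(v, 0, 100)) for v in thermometer_scores]
    + [("ladder", rescale(v, 0, 10)) for v in ladder_scores]
)

for instrument, score in harmonized:
    print(f"{instrument:<12} {score:.2f}")
```

Here both instruments yield 0.50, 0.75, and 1.00, so pooled analyses can treat the scores on one scale; out-of-range values raise an error rather than being silently clipped.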

Collecting, reviewing, and analyzing secondary data
The Design and Purpose of Research

Secondary data analysis consists of collecting data that was compiled through research by another person and using that data to better understand a concept. A good way to begin research that uses secondary data is to clearly define the goals of the research and the design you anticipate using. When defining your plan, establish both the kind of data you plan to use and the exact goal of the research. Establishing the research design is also an important component. When using secondary data, it helps to create an outline of the final product, covering all the types of data to be used along with a list of the sources used to compile the research. In order to use secondary data, three steps must be completed:

1. Locate the data.

2. Evaluate the data.

3. Verify the data.

Thanks to online search tools, locating data is now relatively easy. However, researchers need to pay attention to details when searching online, since pages can be out of date or poorly put together. Therefore, use caution, judge whether an online data source is reliable, and check when it was last updated. To evaluate the data, a researcher must carefully examine the secondary data under consideration to ensure that it meets the needs and purpose of the study. This includes looking at the population and at the sampling strategy and type. It is also important to look at when the data was collected, how it was collected, how it was coded and edited, and the operational definitions of the measures that were used. Finally, the data must be verified to ensure that it is of good enough quality to be used in new research.

Determining the Types of Data and Information Needed to Conduct Analysis

The data and information collected for secondary data analysis depend entirely on the central subject of the study. The purpose of conducting secondary data analysis is to develop an improved understanding of the subject matter at hand. Important types of data and information to collect and summarize include demographic information, information gathered by government agencies (e.g., the Census), and social science surveys. It is also possible to reanalyze data collected in experimental studies, or data collected with qualitative measures, as part of a secondary data analysis. Most important, the data and information being collected must relate to the subject of study.

Determine the Quality of Sources of Data

In secondary data analysis, even individuals without much research training or technical expertise can be trained to do the work. However, this advantage is not without difficulty, as the individual must be able to judge the quality of the data or information that has been gathered. The following tips will help in assessing data quality: determine the original purpose of the data collection; try to discover the credentials of the source(s) or author(s) of the information; consider whether the document is a primary or secondary source; verify that the source is well referenced; and, finally, find out the date of publication, the intended audience, and the coverage of the report or document.

Challenges of secondary data analysis
Advantages

Using secondary data can allow for the analysis of social processes in settings that would otherwise be inaccessible. It also saves time and money, since the work of collecting the data has already been done, and it lets the researcher avoid problems with the data collection process. Using someone else's data can also facilitate comparison with other data samples and allow multiple data sets to be combined. Other variables may also be available for inclusion, resulting in a more diverse sample than would otherwise have been feasible.

Disadvantages

There are several things to take into consideration when using preexisting data. Secondary data does not permit the progression from formulating a research question to designing methods to answer that question. Nor is it feasible for a secondary data analyst to engage in the usual process of making observations and developing concepts. These limitations can make it harder for the researcher to keep the analysis focused on their own research question. Data quality is always a concern because the source may not be trustworthy. Even data from official records may be flawed, because the data is only as good as the records themselves. There are six questions that a secondary analyst should be able to answer about the data they wish to analyze:

1. What were the agency's or researcher's goals when collecting the data?

2. What data was collected and what is it supposed to measure?

3. When was the data collected?

4. What methods were used? Who was responsible and are they available for questions?

5. How is the data organized?

6. What information is known about the success of that data collection? How consistent is the data with data from other sources?

Examples of Secondary Data in Current Research
Gapminder

Gapminder is a website that makes excellent use of secondary data, compiling previously collected data to show trends across the world between different populations or social situations, depending on the topic of interest.

http://www.gapminder.org/

Pew Internet & American Life Project data page

The Pew Internet & American Life Project's data page links to the project's survey datasets, which are available in SPSS format. Researchers can use these raw SPSS datasets (secondary data) to enhance a study that answers a different question from the one the original surveys addressed.

http://www.pewinternet.org/

Substance Abuse and Dependence Treatment in Outpatient Physicians' Offices, 1997-2004 Study

Secondary data is used in a variety of ways to enhance current research projects. One example of secondary data being used to develop new research is this study examining the patient, physician, and visit characteristics associated with substance abuse treatment during outpatient physician visits. The researchers obtained their data from the 1997-2004 National Ambulatory Medical Care Survey (secondary data, since it had already been collected).

This survey is conducted annually by the National Center for Health Statistics on randomly selected office-based physicians in the U.S. By performing statistical analysis on selected components of the 1997-2004 surveys, the researchers found that women and the elderly were less likely to receive substance abuse treatment, and that African Americans and Hispanics were more likely than Whites to have no access to substance abuse treatment. The researchers concluded that increased screening, especially of existing patients, would reduce gender, age, and racial disparities in diagnosis and treatment.

Ethnicity, discrimination and health outcomes: a secondary analysis of hospital data from Victoria, Australia

In this study, secondary data was used in the form of hospital discharge abstracts for the state of Victoria in Australia. The variables examined were a patient's country of birth and the quality of care the patient received in a universal health care system. The data was secondary because it had already been collected by the hospitals in the form of patient charts and discharge abstracts. The researchers simply examined the relationship between the recorded country of birth and the type of care recorded. The goal of the research was to explore the relationship between a person's ethnic background and the amount of care received from the hospital. The researchers were interested in developing a preliminary set of data that would allow them to develop methods to study the issue further.

The discharge abstracts contained demographic and clinical information about each patient. From the abstracts, the researchers separated the patients into three groups. The first consisted of Australian or English-speaking patients. The second consisted of patients who did not visibly appear to be minorities, e.g., people from Europe and South and Central America. The third contained patients who were visible minorities, e.g., people from the Middle East, Asia, Africa, and the Pacific Islands.
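The grouping step amounts to recoding a recorded variable (country of birth) into analytic categories. A minimal Python sketch of such a recode follows; the country-to-group mapping and the records are hypothetical and chosen only to match the examples given in the text, since the study's actual coding scheme is not described here. Unmapped countries raise an error so that every borderline case receives an explicit coding decision rather than being silently dropped.

```python
from collections import Counter

# Hypothetical country-to-group mapping for illustration only; the study's
# actual coding scheme is not given in the text.
GROUP_BY_COUNTRY = {
    # Group 1: Australian or English-speaking
    "Australia": 1, "United Kingdom": 1, "New Zealand": 1,
    # Group 2: not visibly a minority (e.g., Europe, South and Central America)
    "Italy": 2, "Greece": 2, "Argentina": 2, "Guatemala": 2,
    # Group 3: visible minority (e.g., Middle East, Asia, Africa, Pacific Islands)
    "Lebanon": 3, "Vietnam": 3, "Somalia": 3, "Samoa": 3,
}


def assign_group(country_of_birth):
    """Return the group (1-3) for a recorded country of birth."""
    try:
        return GROUP_BY_COUNTRY[country_of_birth]
    except KeyError:
        # Fail loudly so that uncoded countries get an explicit coding decision.
        raise ValueError(f"no group assigned for {country_of_birth!r}") from None


# Hypothetical discharge-abstract records containing only the field needed here.
abstracts = [
    {"patient_id": "P001", "country_of_birth": "Australia"},
    {"patient_id": "P002", "country_of_birth": "Greece"},
    {"patient_id": "P003", "country_of_birth": "Vietnam"},
    {"patient_id": "P004", "country_of_birth": "New Zealand"},
]

group_sizes = Counter(assign_group(rec["country_of_birth"]) for rec in abstracts)
print(sorted(group_sizes.items()))
```

Keeping the mapping in a single table makes the coding decisions easy to inspect and revise, which matters when the grouping is itself a judgment about who counts as a visible minority.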