Data curation is a subset of activities within the greater scope of e-science.
Because e-science generates information that is often desirable to organize, share, and preserve for future use, it follows that information professionals are affected by a shift in how this research is conducted. Librarians can now become involved further “upstream” in the research process, helping faculty create data management plans, offering advice on how to organize and curate data, and consulting on ways to make data accessible and/or archived.
Assisting with data and the research process can be viewed as a shift in how information science can be applied to research. It can offer elements of collection development, cataloging, reference, integrated systems and more. Collaborations with researchers related to their data can also open up new partnerships related to grant proposals, teaching, and publication. As other aspects of librarianship may have less importance in the digital age, we should take this opportunity to migrate or evolve into these new roles.
No - data can live in analog form, such as lab notebooks. It must, however, be in the form of a digital file in order to be uploaded to a repository like PURR.
Subject specialists have worked or are working with the College of Agriculture, College of Engineering, College of Science, College of Liberal Arts, and School of Veterinary Medicine, as well as various research centers. This is NOT a comprehensive list.
There always needs to be a primary point person, but multiple people can be involved with taking care of data. This can include subject specialists, data services specialists, information technology/digital initiatives specialists, administrators, departmental faculty, and departmental affiliates (staff, students, etc).
FAQ for Librarians/Specialists and Data Projects
From “Guideline for Sabbaticals in Purdue University Libraries”:
Operative Definition of LIS Research (must meet one of the following)
To put it simply, collaborative or embedded research is a means by which we can apply library science principles to other disciplines’ problems. In this way, we are contributing to the interdisciplinary nature of the research.
Almost any research nowadays involves generating data, and quite likely researchers in many domains have problems organizing, archiving, accessing or disseminating data. This is especially true since computers and the Internet make data more immediately accessible, and more and more the state of scholarly communication allows for dissemination of data sooner in the research lifecycle. Thus, library science principles can help researchers with data or information organization or management needs (e.g., by applying descriptive metadata, archiving in a repository, or disseminating via an information resource or portal).
Examples of applying library science to research:
Given that the Libraries defined a number of Guiding Principles for our modified strategic plan in 2009 of:
And that these Guiding Principles informed our Overarching Goals of:
It can easily be seen that an important goal is for librarians to become more collaborative with our patrons and not tied to their locations. Through these actions one hope is to build relationships with disciplinary faculty and to develop a robust research and scholarship program. This FAQ, the Research Council, and the Libraries’ Research Department will assist in furthering these goals.
First of all, research isn’t meant to displace the service function in the role librarians play in helping people with information needs. For instance, it is quite possible that simply finding a metadata schema for someone in a specific discipline is all that’s needed. However, as faculty, librarians have a research role of investigating problems through the dissemination of knowledge. Thus, research will involve a methodological approach to identifying problems and formal sharing of results (i.e., publication) that comes about from investigation. Becoming further involved with the research aspects within various departments and research groups will help librarians transition into more of a collaborative role instead of a supportive role.
By attending meetings, callouts, seminars, etc., you can introduce yourself as a library science professor who is interested in participating in research. While the Libraries will continue to provide traditional services, it is important to distinguish research as an activity which involves grant proposals and sponsored funding to articulate, fund and support research. Initially, there may not be an obvious opportunity to contribute, but you can emphasize that the Libraries faculty interdisciplinary research initiative encourages such collaborations. Other researchers will understand when you say you are looking to collaborate as a research investigator—PI or co-PI (defined below).
Often, after a research seminar or presentation, you can follow up by talking with a researcher further about information or data needs related to the project. If you start with a general discussion, a researcher is more likely to describe her or his needs. A more specific level of conversation may lead to topics such as archiving data, organizing information on a website/portal for dissemination, or teaching others to use such a resource. Ultimately, conversations like this are similar to those one would have in a subject specialist or liaison role, and can also lead to discussions related to reference or collection building or access.
In some cases, subject expertise will also be valuable, but not necessarily required. Your particular subject knowledge may further enhance the research by allowing you to relate further with the faculty and his/her research group or simply allowing you to more easily interpret the research actually being conducted.
Once a relationship has been established, the conversation may turn to submitting a proposal for a grant or other sponsored research. A simple rule of thumb, or shortcut for cutting to the chase regarding desired involvement on a proposal, is to be able to tell people what you have done that relates to this research, what you could contribute to a particular proposal, and what it would take to do that. A concise explanation of your previous accomplishments establishes that you have done work in this area. A compelling argument of how you can help make the proposal stronger is always worth listening to. And being able to qualify and quantify needs to accomplish outcomes demonstrates your grasp of the project and ability to plan.
At the point of developing a proposal, you should already have some idea of what you can do, through discussions of what your role would be, what research questions you can address and what outcomes you can help achieve. For the proposal, you will likely be asked to write a section, page or paragraph describing your effort and contribution. You will need to contribute to the budget if you are a co-PI. You may also be asked to review and contribute to other sections and/or the list of references used to support the document. You will likely be asked to provide a short c.v., a current and pending statement and/or a research statement (described below).
The Research Council (RC), made up of a diverse cross-section of the Libraries’ faculty in terms of rank, discipline, and research experience, and chaired by the ADR, is the Libraries’ organizational body for coordinating the research efforts throughout the Libraries. The RC is charged with creating policies and procedures related to research; promoting, supporting, and encouraging Libraries’ staff to participate in LIS and interdisciplinary research; and funneling feedback from Libraries’ staff to the ADR regarding issues related to research. Any Libraries’ staff member is encouraged to contact any member of the RC to bring agenda items to the RC.
The official charge of the Research Council (RC) is: chaired by the Associate Dean for research (ADR). Responsible for coordination of research activities for the Libraries, creation of policies and procedures for research activity including sabbatical, research leave, and research grants. Members to represent various research interests throughout the Libraries. Composition would include each of the faculty levels (full, assoc and asst prof), and a variety of specialized, subject librarianship. Total of six members plus AD. Initial members appointed for staggered one, two and three year terms. After initial period, terms will be three years. Associate dean for learning will be ex officio member of the RC serving as liaison between Learning Council and Research Council.
The role of the Research Department is to engage and encourage research programming for the Libraries, ensuring local practices meet relevant and appropriate policies and guidelines. The Associate Dean for Research (ADR) represents the Libraries to constituencies involved in research, both internally to the university (Sponsored Program Service (SPS), Office of the Vice President for Research (OVPR), other ADRs) and externally (funding agencies, program directors, corporate sponsored research, etc.). The ADR leads the effort to support, facilitate and promote Libraries' research through Research Council, AdCom, etc., including developing policy, guidelines and best practices. The Interdisciplinary Research Librarian (IRL) serves as liaison to and engages in interdisciplinary research programs and efforts, and serves as lead contact for Discovery Park. The IRL, because of its focus on sponsored interdisciplinary research, serves as a resource in this area to other librarians. The Data Research Scientist serves as initiator, collaborator and project manager for research efforts specifically focused on data, data collection, data archiving, etc., and likewise serves as a resource in this area to other librarians.
Sponsored research is any research which requests and receives funding from an agency, foundation, corporation, etc. where auditable accounts must be maintained (by Sponsored Program Service (SPS) for the university). Awards are usually “extramural,” from outside the university, but may also be “intramural,” such as seed grants from a center (or ITaP), a sub-contract from another award, or allocated for research from elsewhere on campus, such as the Provost Office, Office of the Vice President for Research (OVPR), etc.
The Principal Investigator (PI) is the person who is ultimately responsible for the overall grant and project, including making sure everything is done right for submission, and all responsibilities of the grant are met (e.g., how money is spent, reports are sent, etc.). Sometimes there are restrictions on who can be a PI (faculty, PhD, etc.), but Libraries faculty can be PIs on NSF and NIH grants, in addition to IMLS, NARA, etc. The PI also has a line for salary on the grant (note, this is not additional salary, but creates a salary savings for the dean, who can use the money to hire temps or GSAs to cover other work as negotiated). A co-PI is usually someone who does significant work on the project and has a line in the budget (same as above), and who is responsible to the PI for doing her or his part. Senior personnel are often named on a grant as people who support a project and/or can be consulted as needed for specific problems. (This might equate to being named on a grant, but providing a service, not research.)
There are basically two answers to that question—one relates to abilities and your area of expertise, and the other to capabilities, such as how much time/resources you have to get involved.
In terms of your expert contribution, be realistic about what you can do. For instance, developing and implementing a thesaurus structure takes research to identify sources, effort to combine or reconcile terms, and time to extract pertinent language to be used for describing content in a web portal. Break down the goal (developing and implementing a thesaurus) into objectives (researching schemes, constructing specific thesaurus, testing results) that you are able to do. Will you need to consult with another specialist on thesauri in similar fields? Will you need to learn XML to construct the classification scheme, or will someone else do that?
Once you have a good grasp on what you can do (and to some extent, how), you should try to sketch out a budget, and both the Research and Business Offices can assist with this. It helps to identify the various activities you’ll need to accomplish (searching for thesauri, examining thesauri, examining portal documents, etc.) so you can get a feel for how much time would be needed to do the work. Once you have a “gross” estimate, you can decide if 5, 10 or 15% effort is needed for the work. If you decide on 10%, keep in mind that the time can be distributed over a week, month or year. For instance, 10% could roughly translate as three and a half weeks over the summer—if you can negotiate the release time, it might be useful to work with the PI who is also working on the project over the summer. As noted elsewhere, be sure to coordinate such schedules with a supervisor.
Keep in mind that while 10% represents half a day per week of a 40-hour week, as exempt personnel you still need to be able to spend the time it takes to get other work done. So, while you may spend 4 hours Friday afternoon at work doing research, if an annual report is due Monday, you may need to spend time over the weekend writing it.
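The arithmetic behind percent effort can be sketched in a few lines of Python. This is only an illustration: the function name `effort_hours` is hypothetical, and the 40-hour week is the assumption used in the text above. It does not reproduce the university program that calculates salary cost.

```python
def effort_hours(percent_effort, hours_per_week=40.0):
    """Return the weekly hours implied by a percent effort.

    hours_per_week defaults to the 40-hour week assumed in the FAQ text.
    """
    if not 0 <= percent_effort <= 100:
        raise ValueError("percent_effort must be between 0 and 100")
    return hours_per_week * percent_effort / 100.0


# The standard effort levels mentioned in the FAQ: 5, 10 and 15 percent.
for pct in (5, 10, 15):
    print(f"{pct}% effort = {effort_hours(pct):.1f} hours per week")
```

At 10% effort this gives 4 hours, the half day per week described above.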
As noted above, once you decide on your effort, it is most useful to convert this to a standard percentage—5, 10, or 15% (or maybe more, depending on the project or your availability and what you work out within the organization). Usually, the business office of the PI takes that percentage and uses a university program to calculate your “salary cost” (the program automatically accounts for fringe benefits). Sometimes you may be asked to produce a budget, in which case the Libraries’ business office can work up the figures.
In addition to salary, there may be other costs to put into a budget. One of them may be for technical personnel (so you don’t have to learn XML or Java); the data research scientist position is designed to help in such cases. Or you may want to utilize a graduate student assistant for a semester. The business office can help calculate these costs. Sometimes you may need to budget for travel, either to a location to do work or to a conference to present findings (dissemination of findings is often a requirement). Sometimes you may need to include supplies and expenses (be sure to include any books or related materials). Rarely will you be afforded the opportunity to budget for computers, unless it is for hardware that is needed to host a resource, or something similar. Don’t presume you can get the latest laptop for your desktop, as most grants won’t pay for it.
The short curriculum vitae (c.v.) is a 1-2 page document listing your “credentials,” such as rank, title, education, publications, awards, etc. The current and pending statement is a list of grants with which you are associated (either as PI, co-PI, collaborator, or senior personnel), along with your percentage of obligation on each (e.g., 10% or <5%). Usually, the information is formatted to suit the granting agency's needs, and the PI’s office (e.g., administrative assistant) will usually be able to work with you to ensure application requirements are met. A research interest statement is simply a statement that explains how your current research relates to the research being proposed.
You must coordinate with your supervisor, because you need to negotiate time or responsibilities to be able to take on most research projects. In particular, you and your supervisor must work out what will and won’t get done regarding your current roles and duties if you are required to devote additional time to a research project. Further discussions with others will also be needed if some of your roles and duties have to be covered by other staff.
Also, the ADR needs to be able to track various aspects of involvement for statistics and annual reports, and to avoid duplication or unintentional competition. Ultimately, the Business Office will have to know when there are funds involved, and of course, the Dean of Libraries will want to know the overall track of research—usually by contacting the ADR these two will be kept in the loop. The sooner you can alert the ADR in the research discussion, the better.
The Libraries’ Research Department keeps track of some of this information through the LCRISP (Libraries Current Research Information System Public). LCRISP keeps track of Projects, Ideas, and Interactions related to collaborative research that librarians have submitted to the system. You can enter new data or view a list of current information in the system. This tool is also meant to assist in internal collaboration and referrals by making the information regarding Ideas and Interactions more “public.”
See: “Roles and Responsibilities of an investigator (PI or co‐PI) once a grant has been awarded.” [PDF]
Also see: “Roles and Responsibilities of Players Involved in Proposal Submission at Purdue.” [Web site] http://www.purdue.edu/sps/proposals/roles.html
*This glossary reflects the terminology used by Purdue University Libraries and is not an official set of standardized definitions*
A research center based in the Purdue Libraries, the D2C2 aims to address curation issues and work on problems related to unorganized, disparate, heterogeneous and distributed data, data workflow and environments.
A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. Examples of data include a sequence of bits, a table of numbers, the characters on a page, the recording of sounds made by a person speaking, or a moon rock specimen. (OAIS Reference Model)
The OAIS entity that contains the services and functions which make the archival information holdings and related services visible to Consumers. (OAIS Reference Model)
“The activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose and available for discovery and re-use.” (Lord, Macdonald, Lyon & Giaretta (2004), “From data deluge to data curation,” Proceedings of the UK e-Science All Hands Meeting 2004, 31st August - 3rd September, Nottingham UK.)
Data Curation Profiles are designed to capture requirements for specific data generated by a researcher as articulated by the researcher himself or herself. They are also intended to enable librarians and others to make informed decisions in working with data of this form, from this research area or sub-discipline. Data Curation Profiles employ a standardized set of fields to enable comparison. They are also designed to be flexible enough for use in any domain or discipline.
A profile is based on the scientist/scholar’s reported needs and preferences for these data. Profiles are derived from several sources of information, including interviews, documentation, publications, or other relevant materials. (http://www4.lib.purdue.edu/dcp/purpose)
Someone who engages in the act of data curation (see definition for 'Data Curation')
A data management plan is written at the beginning of a research proposal or project to document how the data that are produced by the research will be collected, archived, and shared. Many funding agencies such as the National Science Foundation require data management plans in grant proposals. (https://purr.purdue.edu/kb/AboutPURR/whatisadatamanagementplan)
The stages of a data lifecycle can be broadly categorized into Raw, Processed, Analyzed, and Published. For an example of a data lifecycle, see the following graphic.
Humphrey, Charles. “e-Science and the Life Cycle of Research” (2006)
Retrieved 4/20/10: http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc
A set of services offered by the Research Unit
A package of files that include metadata describing content and content media files (work in progress).
Enabling the use and reuse of data (no official working definition found)
Under the digital humanities rubric, I would include topics like open access to materials, intellectual property rights, tool development, digital libraries, data mining, born-digital preservation, multimedia publication, visualization, GIS, digital reconstruction, study of the impact of technology on numerous fields, technology for teaching and learning, sustainability models, and many others. (“Why the Digital Humanities?”, Brett Bobley, Director, Office of Digital Humanities, National Endowment for the Humanities, 7/24/2008)
A term used to describe computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology in 1999 and was used to describe a large funding initiative starting in November 2000. (Wikipedia via the DCC)
In the context of the Institute, e-science encompasses the breadth of e-research activities applied across all disciplines (and including all interdisciplinary teams), but with a particular focus on the sciences. Its scope is not limited to the types of scientific research requiring very large-scale computation (i.e., computational science or high-performance computing research) but includes all aspects and types of research that are performed digitally, such as data production and curation, social interaction, publishing and scholarly communication, and the use of physical space for specialized group activities. (ARL E-science Institute)
Term used in place of “e-Science” by ARL and also includes the following: We assume the continuation of the traditional library mission to collect, preserve, and make available to scholars a documented record of research, and to provide environments suitable for study and learning. (ARL E-science Institute)
eScholarship is the CDL Publishing Group’s open access scholarly publishing platform, providing digital publishing services to the University of California and delivering a dynamic research platform to scholars worldwide. (http://www.cdlib.org/services/publishing/escholarship.html)
A software platform used to create dynamic web sites for scientific research and educational activities. With HUBzero, you can easily publish your research software and related educational materials on the web.
Data about other data. (OAIS Reference Model)
The standards to be used for data and metadata format and content (https://research.hub.purdue.edu/overview)
The lead investigator and project manager for a grant-funded initiative. The PI is usually a tenured or tenure-track faculty member, although exceptions may be made. PIs are usually responsible for ownership of a project within PURR. See FAQ for Librarians/Specialists and Data Projects.
A repository project co-sponsored by the Libraries, ITaP and the OVPR, PURR provides an online, collaborative working space and data-sharing platform to support the data management needs of Purdue researchers and their collaborators. The system is based on HUBzero technology.
Organizational unit within the Purdue Libraries responsible for furthering the Libraries research agenda. The Libraries Associate Dean for Research is the head of this unit. http://www.lib.purdue.edu/libraries/rsrch/
Scholarly Communication is the process of conducting research and sharing the results: from creation, to dissemination, to preservation of knowledge, for teaching, research, and scholarship. (http://www.lib.purdue.edu/scholarly/)
See FAQ for Librarians/Specialists and Data Projects.
SPS has responsibility for: Proposals, Award Management, Contract Negotiation, Data Access and Support Services, Research Administration, Regulatory Compliance, and Agricultural and International Programs.