203 add nsf curation project #242

Merged: 3 commits, Dec 18, 2024
2 changes: 1 addition & 1 deletion content/Research/datacuration_about.md
@@ -5,7 +5,7 @@ headless = true # This file represents a page section.
active = true # Activate this widget? true/false
weight = 100 # Order that this section will appear in.

title = "Synthetic Biology Data Curation"
title = "Accelerating Synthetic Biology Discovery through Integrated Curation"

# Choose the user profile to display
# This should be the username of a profile in your `content/authors/` folder.
7 changes: 4 additions & 3 deletions content/authors/Chunxiao Liao/_index.md
@@ -36,8 +36,8 @@ organizations:
#bio: My research interests include distributed robotics, mobile computing and programmable matter.

interests:
- Synthetic Biology
- Computational Biology
- Synthetic Biology, Computational Biology
- Software Engineering
- Machine Learning

education:
@@ -95,6 +95,7 @@ user_groups:
display_groups:
- SynBioHub Tool
- SBOLExplorer
- SeqImprove
---

Chunxiao Liao is a first-year doctoral student in the computer science department at CU Boulder, studying under Dr. Chris Myers. Before arriving in Colorado, Chunxiao completed a Master’s degree in Computer Science from Rice University. Her research is centered around enhancing the usability and discoverability of synthetic biological data to adhere to the FAIR principles for data sharing, which stands for Findable, Accessible, Interoperable, and Reusable. More specifically, her research endeavors to apply techniques from Software Engineering, Bioinformatics, and Data Science to create an integrated curation workflow and a corresponding search methodology.
Chunxiao Liao is a second-year doctoral student in the computer science department at CU Boulder, studying under Dr. Chris Myers. Before arriving in Colorado, Chunxiao completed a Master’s degree in Computer Science from Rice University. Her research is centered around enhancing the usability and discoverability of synthetic biological data to adhere to the FAIR principles for data sharing, which stands for Findable, Accessible, Interoperable, and Reusable. More specifically, her research endeavors to apply techniques from Software Engineering, Bioinformatics, and Data Science to create an integrated curation workflow and a corresponding search methodology.
6 changes: 4 additions & 2 deletions content/authors/NSF_Data_Curation/_index.md
@@ -1,6 +1,6 @@
---
# Display name
title: Synthetic Biology Data Curation
title: Accelerating Synthetic Biology Discovery through Integrated Curation

#Use 1 for PI, 100 for Current Postdocs, 200 for current phds, 300 for current masters, 400 for current undergrads, 800 for alum postdocs, 810 for alum phds, 820 for alum masters, and 830 for alum undergrads, 900 for tools, 1000 for projects, 900 for tools, 1000 for projects
weight: 1000
@@ -58,4 +58,6 @@ user_groups:
- SBKS Project
---

Synthetic biology designed systems have many applications in areas including environmental, manufacturing, sensor development, defense, and medicine. However, currently the progress and usefulness of synthetic biology is impeded by the time required for literature studies and the replication of existing but poorly documented work. The Synthetic Biology Knowledge System (SBKS) project endeavored to address these challenges by integrating data from parts repositories with information extracted from literature into a unified knowledge system. However, this form of post-hoc curation requires the extraction of knowledge from manuscript and supplemental text files after publication by curators separate from the original authors. To handle large amounts of data, machines are used to scour free text and attempt to recognize key words and work out their meaning from context. This tests the limits of named entity recognition and entity classification. Additionally, it leaves ambiguous entities that only the original authors might disambiguate. For example, yeast may refer to many different strains of yeast. Furthermore, the SBKS project also extracted sequences provided as supplemental information in publications. However, these sequences, even when they are provided, are typically poorly annotated, incomplete, and provided in non-machine readable formats. Taken together, the SBKS project demonstrated that reconstruction of this important design information through post-hoc curation is extremely noisy and error prone.
Synthetic biology designed systems have many applications in areas including environmental, manufacturing, sensor development, defense, and medicine. However, currently the progress and usefulness of synthetic biology is impeded by the time required for literature studies and the replication of existing but poorly documented work. The Synthetic Biology Knowledge System (SBKS) project endeavored to address these challenges by integrating data from parts repositories with information extracted from literature into a unified knowledge system. However, this form of post-hoc curation requires the extraction of knowledge from manuscript and supplemental text files after publication by curators separate from the original authors. To handle large amounts of data, machines are used to scour free text and attempt to recognize key words and work out their meaning from context. This tests the limits of named entity recognition and entity classification. Additionally, it leaves ambiguous entities that only the original authors might disambiguate. For example, yeast may refer to many different strains of yeast. Furthermore, the SBKS project also extracted sequences provided as supplemental information in publications. However, these sequences, even when they are provided, are typically poorly annotated, incomplete, and provided in non-machine readable formats. Taken together, the SBKS project demonstrated that reconstruction of this important design information through post-hoc curation is extremely noisy and error prone.

This project is funded by National Science Foundation Grant No. 2231864.
2 changes: 2 additions & 0 deletions content/authors/SBKS/_index.md
@@ -59,3 +59,5 @@ user_groups:
---

The scientific challenge for this project is to accelerate discovery and exploration of the synthetic biology design space. In particular, many parts used in synthetic biology come from or are initially tested in a simple bacteria, E. coli, but many potential applications in energy, agriculture, materials, and health require either different bacteria or higher level organisms (yeast for example). Currently, researchers use a trial-and-error approach because they cannot find reliable information about prior experiments with a given part of interest. This process simply cannot scale. Therefore, to achieve scale, a wide range of data must be harnessed to allow confidence to be determined about the likelihood of success. The quantity of data and the exponential increase in the publications generated by this field is creating a tipping point, but this data is not readily accessible to practitioners. To address this challenge, our multidisciplinary team of biological engineers, machine learning experts, data scientists, library scientists, and social scientists will build a knowledge system integrating disparate data and publication repositories in order to deliver effective and efficient access to collectively available information; doing so will enable expedited, knowledge-based synthetic biology design research.

This project was funded by National Science Foundation Grant No. 1939892.