Jen Final Project idea

October 4, 2011

For my final project I was thinking about creating some sort of nexus for scientific databases and web applications. With the advancement of DNA sequencing and other technologies, biological datasets are getting larger and larger. Analysis becomes a problem, particularly for biologists with little computer savvy. There are a large number of scientific databases and tools available in many different areas, but apart from those run by a few very large organizations such as the NCBI, a division of the NLM, many of these tools are not well known outside a core group of researchers and can be difficult to find for researchers working in other fields. I propose to build a site that provides a guide to databases and tools specific to particular areas of research, thereby providing access to a wide range of tools in one spot.

5 Comments
  1. October 5, 2011 8:24 pm

    Neato. I have been wondering how closely library classification systems match scientific classifications.

    How many of the databases are available to nonsubscribers? This could be interesting. A friend works for NMFS and I wonder if his data for fish counts and wave patterns, for example, is available to someone studying, I don’t know, garbage in the ocean or something.

  2. October 6, 2011 7:13 am

    Well, in my experience a lot of these tools are available free of charge. They are typically made by scientists for scientists, and I am pretty sure that if a paper is published describing a tool or a database, some version has to be freely available. My current lab uses a tool that has both a free version for academic institutions and a paid version. I believe the paid version is more powerful. The tools that I am thinking about are less about classifying data and more about manipulating data. I am thinking about sequence data like DNA and protein, or microarray data (which measures the expression levels of thousands of genes in a cell at a particular time), because this information is required to be deposited in freely available public databases and is also what I am most familiar with. For example, there are a lot of different web tools available for comparing protein sequences or identifying domains in a protein.

    For other types of data like what you mentioned, I don’t believe there are the same types of public repositories. Typically, smaller-scale data becomes available when it is published; people can then access what is published and, if they want more information, contact one of the authors directly.

    In terms of classification of scientific data, it is actually kind of a big mess. There is no standard for metadata and not much curation of the data deposited in these publicly available databases. Basically, the user puts in whatever metadata they want or have, along with a citation (though many lack citations), and then the sequence. So some data have a lot of metadata and others have virtually none. Perhaps that would be a good addition to my page: a little guide describing a minimum of metadata that should be attached to a deposited sequence. This is currently a bit of an issue in many different research fields.

    In addition to spotty metadata, there are also issues with nomenclature. Some biological entities have multiple names, and which one is used depends on the field of research. Nomenclature also isn’t always very descriptive. In fruit fly genetics they like to use clever names for genes; one of my favorites has always been sonic hedgehog, named after the Sega character. In other fields they are much more methodical and use names like c-Jun N-terminal kinase, which is really descriptive of the protein. Anyway, that was probably a little too in-depth about nomenclature, but it is kind of fun 8)

    Thanks for the comment Justine!

  3. derek
    October 13, 2011 1:52 pm


    This has the feel of something that could get out of hand in terms of scope and scale. Not that I am in any way against the idea; it just seems (maybe my ignorance talking here) that this could turn into a life’s work of systematizing scientific data through evaluation of tools. Can you give some thought to narrowing the topic to something like a “use these tools for this, those for that” kind of high-level pathfinder?

  4. October 17, 2011 3:07 pm

    Hey Derek,

    It definitely could turn into a life’s work! In fact there are some bioinformatics labs that devote a significant amount of research time to classification of scientific data.

    I probably made it sound too broad in my description, and particularly in my follow-up comment. I was thinking more along the lines of what you wrote: a “use these tools for this, those for that” kind of site. I would probably have three or four sections for tools, separated by type of input data, such as protein, genomic, ribosomal RNA, and microarray data. It can be difficult for newbies to a field to find appropriate tools.

    The idea about metadata standards just kind of popped into my head after reading Justine’s comment. In reality, what I had in mind was more of an additional guide that would provide links to or references for metadata standards that have already been published in different fields, not for me to write such standards de novo.

    Let me know if that sounds better.


    • derek
      October 18, 2011 10:59 am

      Great! A nice narrow path so you can get something done and have good evidence for the e-portfolio.
