WWW


You know by now that I love literature search tools (check out the small but growing “Links: Search” category in the right-hand column on the main page). I am strongly motivated by a desire to filter the huge and growing biological literature so that I can find the most relevant papers with the least amount of effort. Therefore, I’m always curious when I hear of a tool that purports to do an old task (searching Medline) in a new and unusual way.

A company called Cognition (“Giving technologies new meaning”) claims to enable the user to use semantic natural language processing to search the literature. Here’s their elevator pitch:

Cognition’s Semantic Natural Language Processing (NLP) technologies add word and phrase meaning and understanding to computer applications, providing a technology and/or end-user with actionable content based upon semantic knowledge. This understanding results in simultaneously much higher precision and recall of salient data within the universe of possible results. Cognition’s Semantic NLP™ makes technologies and applications more human-like in their understanding of language, thereby resulting in more robust applications, greater user satisfaction and new capabilities available for exploitation. On the Web in particular, powering applications with Cognition’s semantic understanding technology drives these applications ever closer to Web 3.0 (the semantic Web).

They have various commercial applications for sale but their semantic MEDLINE product is freely available on the web.

I’m not going to lie to you — it’s pretty great. You can ask the interface a real English question, like “Which genes are expressed in senescent fibroblasts?” and get real answers. (OK, to be fair, it’s fine with just “genes expressed senescent fibroblasts”, but I enjoy being able to use my native language when I talk to a computer.) I encourage you to play around with it; it’s fun.

One feature that seemed promising at first didn’t seem to work well at all. On the right-hand side of the search results screen is a series of dropdown menus; each menu contains several different meanings for a keyword within the query. The idea is that one could refine a search by choosing the specific meaning of an ambiguous term, rather than having to slog through a search result that allows all meanings of the term in question. Unfortunately, this feature doesn’t deliver. Allow me to illustrate.

In the query example mentioned above, the dropdowns allowed for six meanings for the word “express”. The results had initially come back with one of these meanings (“6. to make a protein in bacteria or cells in culture”) already selected (I assume because this meaning gave the largest number of hits: ~12 papers, a totally manageable number, all of which were good answers to the question).

That definition is OK but I felt like another meaning (“5. to make a protein from a gene”) was slightly closer to the original intent, so I chose that definition and resubmitted. This culled the list down to only 1 paper, which wasn’t a very good match, and eliminated all the excellent answers from the earlier version of the search.

I can’t even begin to guess how the “sense” of a word is determined algorithmically by the Cognition software, but I do know that the outcome of my twiddling didn’t conform to my intuitive understanding of the words involved — which, after all, is the whole point of natural language processing. So I have to list this under “room for improvement”.
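For anyone who wants to picture what sense-based refinement is supposed to do, here is a toy sketch in Python. It is emphatically not Cognition’s algorithm, which isn’t public as far as I know; the paper IDs, the sense numbers attached to each hit, and the `refine_by_sense` function are all made up for illustration. The only assumption is that some tagger has already assigned one sense of “express” to each matching paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    paper_id: str  # hypothetical identifier, not a real PubMed ID
    sense: int     # sense of "express" assigned by an imagined tagger


def refine_by_sense(hits, chosen_sense):
    """Keep only the hits whose tagged sense matches the user's choice."""
    return [h for h in hits if h.sense == chosen_sense]


# Invented data: most good answers were tagged with sense 6,
# only one paper with the closely related sense 5.
hits = [Hit("paper-A", 6), Hit("paper-B", 6), Hit("paper-C", 5), Hit("paper-D", 6)]

broad = refine_by_sense(hits, 6)
narrow = refine_by_sense(hits, 5)
print(len(broad), len(narrow))  # prints "3 1"
```

If the tagger splits near-synonymous senses this sharply, choosing the sense that feels closer to your intent can throw away most of the good answers. That would be one possible explanation for what I saw, though it is only a guess.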

Which is all just to say that this search engine isn’t perfect yet — but please don’t let that stop you from checking it out. I like a lot of things about Cognition semantic Medline, and I’m going to be using it a lot.

What do you think? I’d love to hear about other people’s experience with the software.

(Hat tip to Code-Itch. Yes, I’ve had that post bookmarked since September.)

I’ve finally gotten around to registering this website at ResearchBlogging.org:

ResearchBlogging.org is a system for identifying the best, most thoughtful blog posts about peer-reviewed research.

Since many blogs combine serious posts with more personal or frivolous posts, our site offers a way to find only the most carefully-crafted posts about cutting-edge research, often written by experts in their respective fields.

Originally I resisted signing up, on the grounds that — after all! — every post on Ouroboros was about peer-reviewed papers (or at least papers from peer-reviewed journals; many reviews, despite their name, don’t undergo the same type of review as primary research papers). Hence, flagging posts about the peer-reviewed literature would be redundant. Simple inspection proves that this premise was false, so I’ve stopped worrying and learned to love yet another blog post aggregator.

The only difference to the user is that there will be a citation block at the bottom of any post about a scholarly paper (e.g., see here).

For those of you who are interested, there’s a ResearchBlogging.org link in the “Feeds etc.” section on the right-hand margin of the front page, which will take you to a tabulated list of all Ouroboros posts about scholarly papers. I am probably not going to go back and retroactively tag every qualified post, but as of now, future posts about papers will end up on that list.

I would also encourage readers of Ouroboros to go and explore ResearchBlogging.org’s main site: The overall scientific caliber of the participating blogs is quite high, and it’s an increasingly user-configurable experience: You can use tags to read posts from the fields of greatest interest to you (e.g., biology). Likewise, you can subscribe to an aggregated RSS feed consisting of all posts in that field (again, e.g., biology). The system isn’t perfect yet; you can’t yet aggregate only those posts about sub-categories like “Biochemistry” or “Aging,” and they’ve made the weird decision to aggregate all posts tagged as Other, but in general it’s a nice service and I think readers of this blog will find it useful.

Lately I’ve been frequenting The Life Scientists room at the social networking/microblogging/forum site FriendFeed. I’ve been getting a lot of value out of it (I even used it to research an article I’m writing about social networking in science), so I wanted to mention it to other biologists who might be looking to take the plunge into the Web 2.0 world.

The room’s population is highly enriched in bioinformaticists, computational biologists and open-science advocates, so if those are interests of yours then you’ll find the discussions especially interesting. But if you’re not one of those things, don’t let that stop you; I’m hoping to see more experimental biologists join in. There are a lot of conversations going on about new tools for science, but sometimes I feel like there aren’t enough experimentalists taking the opportunity to find out about the latest developments.

At the moment there’s no FF room devoted to biogerontology, but I suspect there’s not a big enough population of likely participants to give such a room momentum. Besides, we’re part of the larger edifice of biology; I’d rather talk with lots of different kinds of scientists than actively seek out isolation (and thereby risk becoming provincial).

(I’m also helping FriendFeed debug a feature that allows automated republishing of blog posts to Twitter, so I thought it would be appropriate to make an entry about FriendFeed while I test the feature’s settings.)

As I mentioned earlier this week, I’m collaborating on a series of articles about the future of scientific communication. For the next piece (as of this moment, very much a work in progress), we’ve taken as our (loose) theme the role of the “interactive web” (FriendFeed, Twitter, blogging, social networking) in scientific publishing, communication and in the doing of science itself.

We’re especially interested in finding real-world examples of ways that the social web has been used by scientists to initiate and/or implement collaborations, but really we’d like to hear about any cases in which Web 2.0 tools have been used in scientific communication, writ large (manuscript preparation, disseminating results shared at a conference, etc.).

I thought it would be deliciously recursive to use a blog to solicit such examples — I’ve also been posting in The Life Scientists room at FriendFeed, tweeting on Twitter, and preparing a post for the relevant forum on Nature Network.

So, if you have a scientific Web 2.0 story you’d like to share: please post a Comment here, get in touch using one of the networking systems mentioned in the last paragraph, or, if you’re feeling traditional, email me.

I’ve added a category to the right-hand column of the Ouroboros main page: “Links: Search”.

Recently I’ve been thinking a lot about ways to filter the literature. For scientists attempting to keep abreast of relevant knowledge, efficient search is necessary — though certainly not sufficient; even the best modern search tools either yield an unacceptably high false-positive rate (i.e., results that aren’t really of interest) or require a large investment of time and effort in order to tune searches for the needs of a specific user.

Search is improving, however, and I enjoy playing with the newest technologies. I’ve been a longtime adherent of HubMed, which is simply a better skin for PubMed. Lately I’ve been enjoying the intuitive interface of novo|seek, which allows the user to rapidly configure their search based on a group of related concepts, which the search engine delivers automatically based on the initial query terms.

So I’ve started a list of useful or otherwise noteworthy search tools. Check them out, share your thoughts, and let us know whether you’ve got a favorite that’s not already listed.

Over at her new consolidated blog I was lost but now I live here, Shirley Wu has a thoughtful piece about the coming changes in academic publishing, the institutional disincentives against engaging in barrier-free publishing (e.g., blogging), and one clever way of breaking what she terms “the silicon ceiling”. It will be of particular interest if (like me) you’re interested in promoting and participating in open science.

I echo many of Shirley’s thoughts regarding the attitude of older/more established scientists about “frivolous” activities like blogging — my last two advisors are benignly mystified by my involvement in this activity, and I suspect that (if they thought I’d listen) they would advise me to spend “more time at the bench” (as though the two things traded off in that simple way). Like many science bloggers, I blog in part because I feel like it will do my field some good, and in part because I feel some compulsion to do it. Of course I hope that someday it will “pay off” in a professional sense but, as Shirley points out, the Powers That Be™ are pretty happy with the way things are, and they’re generally suspicious of activities that don’t fit into the standard model of careerized, professional science.

In the meantime, some things will continue to change slowly, one retirement party at a time.

As I mentioned, I spent most of last week and weekend attending two unconferences, BioBarCamp and Scifoo.

By their very nature, unconferences tend not to converge on a single topic; over the past week, I participated in discussions whose topics ranged from the importance of database annotation to how mushrooms could save the world to the current technical considerations involved in settling Mars. Nonetheless, even in the anarchic environs of an unconference, self-reinforcing trends arise over the course of the discussions, and themes do emerge (though each participant might perceive different patterns and come away with a completely different report of an event’s most important themes).

For me, the most powerful and important theme emerging from the week was the idea of “open science.” This term refers not to any one initiative or project, but to the cloud of concepts that includes open access publication, use of open source solutions (especially for protocols and software), commons-based licensing, and full publication of all raw data (including “failed” experiments). It also incorporates more radical ideas like opening one’s notebook in real time, prepublishing unreviewed results, replacing current models of peer review with annotation and user ratings, and redesigning (or ditching) impact factors. The world implied by these concepts is one of radical sharing, in which credit still goes where credit is due but by dramatically different mechanisms.

Open science isn’t so much “pay it forward” (though there is a bit of that) as an effort to create a (scientific) world in which no one is paying at all, a world in which there’s no incentive to withhold or protect ownership of data. The science fiction writer Iain M. Banks once wrote that “money implies poverty” — indeed, many of the current models of data ownership and publication, and their accompanying “currencies” of proprietorship, prestige and closed-access publication, imply a world in which data is scarce and must be hoarded. But data is not scarce anymore.

Given a suitable set of one-to-one and one-to-many agreements between the stakeholders, then, the benefits of sharing could come to outweigh any conceivable advantage derived from secrecy. Perhaps “open science” could be defined (for the moment) as the quest to design and optimize such agreements, along with the quest to design the best tools and licenses to empower scientists as they move from the status quo into the next system — because (and this is very important) if it is to ever succeed, open science has to work not because of governmental fiat or because a large number of people suddenly start marching in lockstep to an unnatural tune, but because it works better than competing models. Proof of that particular pudding will be entirely in the eating.

During the meetings, I met quite a few people involved in this mission, and I want to mention their organizations and projects here:

  • OpenWetWare, “an effort to promote the sharing of information, know-how, and wisdom among researchers and groups who are working in biology & biological engineering” – including tools for protocol sharing and open notebooks;
  • Epernicus, a social networking site for scientists that automatically connects peers based on institution, history, skills and research focus;
  • JournalFire, “a centralized location for you to share, discuss, and evaluate published journal articles” (still in beta);
  • Science Commons, the scientific wing of the Creative Commons, which “designs strategies and tools for faster, more efficient web-enabled scientific research. We identify unnecessary barriers to research, craft policy guidelines and legal agreements to lower those barriers, and develop technology to make research data and materials easier to find and use.”;
  • Nature Precedings, “a free online service launched in 2007 enabling researchers in the life sciences to rapidly share, discuss and cite preliminary (unpublished) findings”; and
  • UnPubDatabase, a discussion of ways for scientists to rapidly and efficiently publish “negative” results, both to allow re-analysis of data and to prevent the scientific community from following the same blind alley more than once.

Academic scientists aren’t the only ones to potentially benefit, by the way — pharmaceutical companies routinely run the same experiments as one another and often find that expensive trials could be avoided if they’d only had access to data mouldering in a competitor’s vault — so open science can benefit the profit sector as well, and there are already plans underway to make that possible.

I’m enthusiastic about bringing open science into my own project and my own laboratory — indeed, in a fit of post-conference ecstasy I basically put myself on record promising to do so. For reasons that have everything to do with available energy levels, I suspect that full-blown openness is probably easier to accomplish when it’s present from the beginning of a project, so I’m especially eager to put these ideas to the test in a large-scale collaboration that is just getting underway. I have no idea how it will go — I expect to meet resistance, especially to the more radical ideas like open notebooks — but it’s nonetheless an exciting time. Will I be able to convince my collaborators to try out open science approaches? Once implemented, will they work? I don’t know, but I am convinced that it’s a hypothesis worth testing.
