There have been other comments about Wolfram Alpha and its support for Chemistry (1,2 and others) but I have remained rather quiet until now about my experiences with Alpha for a couple of reasons. First of all, I’d rather let the service settle down a bit before poking at it too hard. My experience of going live with ChemSpider was definitely that it takes a while to stabilize the system and address some of the earliest feedback. Also, knowing that I would be at Scifoo and aware that Theodore Gray would be there, I had hoped to see Alpha in action. I wasn’t disappointed. Yesterday Theodore drove the system in front of an audience including a number of interested scientists, members of Google, and Peter Murray-Rust and myself from Chemistry. Theo had no fear…essential for live demos. He was asked questions and he took the plunge, did the search and, with the rest of us, celebrated a successful search, a weird result and one that was just plain wrong. It was ALL good. I am impressed. I am impressed by what they are out to achieve with Wolfram Alpha. I am convinced that what they are doing with Alpha will contribute to science and mathematics in general and that Chemists will be using this system when they have more awareness of it.

For a general intro to Alpha see the presentation here.

So, some examples of interesting searches:

1) A guy in the room had asked the question “What is the largest land mammal?” and had not received an answer a few weeks earlier. Now Theo posed that question and got the answer here. Nice! Now, I took that to mean that they were keeping logs of failed queries and tweaking…confirmed by Theo. VERY nice.

2) Peter Murray-Rust had previously blogged about bad results from his searches (searching on dibromoethane for example). When he repeated his searches in the session hosted by Theo he acknowledged that he was pleased that they had fixed the issues he had previously blogged about. This is how modern systems should be…moving quickly.

3) Searching on names…for example, what is the number of people with my name…my spelling is Antony NOT Anthony. See here for the results.

4) What is the return per employee for Google versus IBM? It’s in this query: http://www35.wolframalpha.com/input/?i=GOOG+IBM

5) What are the chemical structures of Taxol? Methamphetamine? Cholesterol? Buckminsterfullerene? You get answers for all. The organic molecules all give images of chemical structures. The connections in all cases are correct but I see no evidence of stereochemistry anywhere across the chemical structures in the database. That doesn’t mean it’s not there, but I couldn’t find it.

So, for chemistry, am I impressed? Yes I am. I’m not worried right now that Alpha is not dealing with stereochemistry…I am sure they will layer that on later. It is clear, based on most of the results that I have seen, that there is some GOOD curation of the data going on. According to Theo there are chemists on staff and they are curating the data coming in. Hallelujah! If you look in the Source Information for Taxol you see a LONG list of sources of chemical information and the primary source is the Wolfram Alpha Curated Data.

There is much that can be done to help Wolfram Alpha to have better Chemistry. They have a HARD job ahead of them if they are going to sample the Public Databases to grab quality chemistry. It’s in there for sure but it’s hard to find. What could come out of ChemSpider and Wolfram Alpha working together?

1) If we could get the list of “compounds” in Wolfram Alpha then we can provide chemical compound connection tables with all necessary stereochemistry etc.

2) When we pass back the compound list then we can pass back ChemSpider IDs and get them listed as identifiers alongside the PubChem CID. In theory it would be good to get these linked back to ChemSpider so that a user can come and find associated articles, analytical data, the wikipedia article, predicted and experimental properties and so on. This is where ChemSpider’s integration would be of value.

3) There is an opportunity to expand the chemistry in Wolfram Alpha by passing a subset of ChemSpider compounds to be added to Alpha. Certainly I don’t think that Alpha should host all 21.5 million of our compounds for the reasons I have enumerated many times on this blog. See my last post about the 54 versions of the Taxol skeleton…there should be only one Taxol. But, there may be a way to subset “important chemistry” and get it into Alpha. OR, maybe they do want it all?

There are clearly opportunities to help expand the chemistry and I hope we have the chance. I think Alpha is incredibly ambitious. But why not be ambitious? ChemSpider was ambitious too and look what we have done with three servers in a basement…that’s a whole lot fewer resources than Wolfram is throwing at Alpha. I want them to be successful…a computational engine for the public. Why not? So many of us are asking questions using search engines right now and can’t get anywhere near an answer…

Let’s start off where I intend to finish. Bigger does not necessarily mean better. A large database of unique chemical entities does not necessarily mean a good database and accurate chemical representations of chemical entities can be pretty hard to find.

Few people realize how these simple statements are impacting the quality of what’s available online for chemists to use and how curation of data must occur in order to improve what’s available.

Now…what’s the basis for me to initiate this discussion and WHY would I prefer that ChemSpider was actually a smaller database?

Today on CHMINF Steve Heller posted the following review:

“From:http://www.ala.org/ala/mgrps/divs/rusa/sections/mars/marspubs/marsbestfreewebsites/marsbestfree2009.cfm

Title: PubChem
URL: http://pubchem.ncbi.nlm.nih.gov/

PubChem is a search tool for chemical information, divided into three areas: Compounds, Substances, and BioAssays. Full entries provide detailed information with the most basic information - a general description, the molecular weight and formula, the structure, plus a Table of Contents (ToC) for the full entry - all easily found above the fold. Use the ToC or scroll down to retrieve more advanced information, such as bioactivity results, synonyms, chemical actions, detailed properties, and more. Each module is fully interlinked with the other sections of PubChem as well as resources in ToxNet and PubMed, providing full access to toxicology resources and the medical literature, and allowing users access to as much or as little of the chemical information as they need.

Author/Publisher: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
Date reviewed: February 16, 2009

PS. PubChem now has 37,326,949 DIFFERENT structures”.

Bob Buntrock made the following statement: “Re the PS below, I find it difficult to believe that PubChem has 37.3 million “different” compounds. The figures from the CAS website show 48 million organic and inorganic compounds which excludes sequences but includes polymers, alloys, coordination compounds, minerals, and mixtures. Since PubChem aims to cover “small molecules”, it would seem that many compounds in these last 5 categories would not be present. Therefore, I assume that a significant number of the 37.3 million PubChem compounds are redundant.” All hell broke loose with lots of posts discussing the uniqueness of chemical entities and the fact that PubChem compounds WERE unique. Okay, I’m not going to argue this for the moment but I am going to agree with Bob that a significant number of the compounds are likely redundant. It is ALSO true of ChemSpider. Why?

I could write a multipage post, and I have already discussed this issue many times on this blog, but I am clearly failing to communicate it. I refer you to previous posts about Taxol (1,2,3), Vancomycin (4) and Ginkgolide B (5,6). I suggest you read these earlier posts but I will try to explain again anyway.

Some general statements. Many complex chemical compounds, especially natural products, have timelines. A compound when initially elucidated can give the connectivity only and get reported. Then stereochemistry might be layered on later, and reported. Then stereochemistry might be adjusted, and reported. Through this whole timeline the compound might be referred to by a particular chemical name…let’s call it Afonwenium. So, based on the timeline for this molecule, there can be anywhere between one and four “versions” of the structure by that name. They are all unique chemical entities but the “final structure” is the one that people will want. It’s the one that should be represented on Wikipedia, the one that should correctly be drawn in all publications following the final elucidation report and assertion of structure, and the one that should be found on many of the “reference” databases such as KEGG, DrugBank etc.

Search Taxol on ChemSpider and Taxol on PubChem and compare the number of structures you get. I judge that there are MANY unique chemical entities on PubChem that are MEANT to be Taxol but are not. And I don’t mean the ones that are named as “Taxol derivative”, I mean the ones that may have the SAME molecular weight, formula and connectivity but have DIFFERENT stereo - no stereo, MULTIPLE partial stereo and MULTIPLE full stereo. These issues exist for compounds like Ginkgolide B and Vancomycin and many more structures.  There is of course only one Taxol, a compound registered by Bristol Myers Squibb and asserted to have a specific constitution.

Just out of interest, let’s see how many compounds are on ChemSpider with a specific skeleton (ignoring stereo).

There are 54 compounds with the skeleton of Taxol: http://www.chemspider.com/InChIKey/RCINICONZNJXQF. These are all UNIQUE chemical entities but some are C-11 and C-14 labeled, Deuterium and Tritium labeled and so on. Even so, there are over 30 compounds without isotopically labeled sites that still have the Taxol skeleton. Maybe some of these are meant to be Taxol with different stereochemistry but I judge that MOST of these are meant to be Taxol and are labeled as such but differ in terms of no, partial and full stereo at least. This is ONE example. To Bob’s question…is this redundancy? I say yes. How does this get solved? Curation will do it but it’s expensive and time consuming and the only way forward in my judgment is to crowdsource it. This problem is not going away anytime soon in PubChem or ChemSpider. We HAVE curated the name associations and removed the name Taxol from all structures sharing the skeleton that are not the asserted form of Taxol. But the structures do remain on the database and link back to the original sources. We will be working on ways to show on every search that there are associated skeletons, compounds related by isotopic labeling and the status of no, partial and full stereochemistry. All to come…
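To make the skeleton search a little more concrete, here is a minimal Python sketch of how records could be grouped by skeleton. It assumes each record carries a Standard InChIKey and relies on the fact that the first 14-character block of the key hashes the connectivity, while the second block reflects stereochemistry and isotopic labeling. The function names are my own illustrations, not part of any ChemSpider service, and the three keys are just sample inputs.

```python
from collections import defaultdict

# Second block of a Standard InChIKey when the structure carries
# no stereo or isotope layers at all.
PLAIN_SECOND_BLOCK = "UHFFFAOYSA"


def skeleton_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    key = inchikey.strip()
    if key.startswith("InChIKey="):
        key = key[len("InChIKey="):]
    return key.split("-")[0]


def has_plain_second_block(inchikey: str) -> bool:
    """True when the key indicates no stereo and no isotopic labels."""
    parts = inchikey.strip().split("-")
    return len(parts) > 1 and parts[1] == PLAIN_SECOND_BLOCK


def group_by_skeleton(inchikeys):
    """Group full InChIKeys under their shared connectivity block."""
    groups = defaultdict(list)
    for key in inchikeys:
        groups[skeleton_block(key)].append(key)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "RCINICONZNJXQF-MZXODVADSA-N",  # Taxol skeleton with stereo
        "RCINICONZNJXQF-UHFFFAOYSA-N",  # same skeleton, no stereo
        "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",  # ethanol
    ]
    for skeleton, members in group_by_skeleton(sample).items():
        print(skeleton, len(members))
```

Grouping like this only tells you that records share a skeleton. Deciding which member of a group is the asserted structure still takes a curator.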

The ongoing “Bigger is Better” arguments for Public Compound Databases are irrelevant at this point in my opinion. We can add 50 million new compounds with a simple enumeration exercise but would it bring any value? I say no. We can add virtual libraries from a number of our collaborators but I judge it to be of very limited value. The value of the Public Compound Databases is in what they connect to and whether there is an answer to a question at the end of the chain. If I search on a chemical and find it on ChemSpider but I cannot find a vendor for it, no analytical data, no properties of value, no manuscripts, no patents linked etc., then I have just done a search, found it on ChemSpider but have derived no value. We are working on increasing the VALUE of our content: linking compounds to rich data sources, layering on additional properties, links to papers, blog entries and discussions and so on. If the result of a search is a hit but with no value, who cares? If the result of a search is a hit but with links to the wrong information, that’s worse. If I ask the question “What is Taxol” and get one hit I need it to be right. If I ask the question and get tens of hits, now what?

Curation has been underway for 2 years. We’re not finished. It’s a massive task. In reality it will NEVER be finished - new chemistry comes in every day and more information gets associated. We don’t have answers to all of the issues that exist around these diverse datasets but we are not naive in our understanding that our database is polluted with issues inherited from many other sources. We have marked tens of thousands of structures for deprecation. We have likely added information into PubChem that has contributed to the issue of data quality. But we are working on it.

Meanwhile, errors that exist in PubChem are proliferating. A simple example is that of methane in PubChem that I have blogged about many times…one example here. Here are some of the names associated with the structure of methane on PubChem: 1,3-DICHLORO-PROPAN-2-ONE, diamond, charcoal and many tens of other incorrect names.

The National Cancer Institute’s Chemical Structure Lookup Service has over 46 million unique chemical entities and they have offered a series of services to search by InChI, name and many other queries. A posting to CHMINF outlined the service:

“Chemical Identifier Resolver (beta):
—————————-
http://cactus.nci.nih.gov/chemical/structure

This service is a resolver for different chemical structure representations and identifiers, including those that do not carry any information about the structure itself. For instance, it can work as a Standard InChIKey Resolver, an NCI/CADD Identifier Resolver or a Chemical Name Resolver. The service also allows one to convert a given structure identifier into another representation or structure identifier.

Representations/identifiers supported are: Standard InChI/InChIKey, NCI/CADD Identifiers (FICuS, FICTS, uuuuu), SMILES, SDF, names, and a few other types of IDs. See the web page for more information.

For those identifiers that require lookup, the underlying database currently contains about 67 million unique structure records, from which the respective Standard InChIKeys and NCI/CADD Identifiers have been calculated. For lookup by chemical names, 68 million names associated with 16 million unique structure records are currently available in the database. The database continues to grow.

Closely related are the new capabilities of resolving/converting chemical structure identifiers by simply using a URL adhering to the following scheme: http://cactus.nci.nih.gov/chemical/structure/”structure identifier”/”representation”[/xml]

We just list a few examples here that should give you an idea of what’s possible with this service.  For more detailed explanations, see the above web page.

Example: Standard InChI for chemical name string “aspirin”: http://cactus.nci.nih.gov/chemical/structure/aspirin/stdinchi

Example: Standard InChIKey of “ethanol” specified as SMILES string “CCO”: http://cactus.nci.nih.gov/chemical/structure/CCO/stdinchikey

Example: Unique SMILES string of chemical name string “benzene”: http://cactus.nci.nih.gov/chemical/structure/benzene/smiles

Example: SD File for chemical name string “morphine”: http://cactus.nci.nih.gov/chemical/structure/morphine/sdf

Example: Chemical names for Standard InChIKey “InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N” (Standard InChIKey of “ethanol”): http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/names

Example: Synonyms for chemical name string “aspirin”: http://cactus.nci.nih.gov/chemical/structure/aspirin/names”
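For anyone who wants to script against the URL scheme described in that posting, here is a small Python sketch that builds resolver URLs of the form structure identifier / representation, with an optional /xml suffix. The function name and the choice to percent-encode the identifier are my own assumptions rather than anything stated in the posting. The sketch only builds the URL strings and does not call the service, so it says nothing about what the service returns today.

```python
from urllib.parse import quote

# Base address taken from the CHMINF posting quoted above.
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier: str, representation: str, xml: bool = False) -> str:
    """Build a resolver URL following the scheme
    BASE_URL/<structure identifier>/<representation>[/xml].

    Percent-encoding the identifier is an assumption: it stops
    characters such as '#' or '/' in a SMILES string from being read
    as URL syntax. '=' is left alone so that 'InChIKey=...' stays
    readable, as in the posting's own example.
    """
    encoded_id = quote(identifier, safe="=")
    encoded_rep = quote(representation, safe="")
    url = f"{BASE_URL}/{encoded_id}/{encoded_rep}"
    return url + "/xml" if xml else url


if __name__ == "__main__":
    examples = [
        ("aspirin", "stdinchi"),
        ("CCO", "stdinchikey"),
        ("benzene", "smiles"),
        ("morphine", "sdf"),
        ("InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "names"),
    ]
    for identifier, representation in examples:
        print(resolver_url(identifier, representation))
```

The examples reproduce the URLs listed in the posting. Fetching them is left out deliberately, since results depend on the current state of the service.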

Unfortunately polluted names are finding their way across all of these databases which is why a lookup on methane gives us: http://cactus.nci.nih.gov/chemical/structure/methane/names including in the list:
1-Chlorobenzylethyl-3,5,7,9,11,13,15-heptaisobutylpentacyclo[9.5.1.1(3,9).1(5,15).1(7,13)]octasiloxane, mixture of isomers
673323_ALDRICH
PSS-[2-[(Chloromethyl)phenyl]ethyl]-Heptaisobutyl substituted
675342_SIAL
(2R,3R)-Butanediol bis(methanesulfonate)
(2R,3R)-Butanediol dimesylate

and DIAMOND…

The CAS database is highly curated, not without errors, and built up using robots and eyes. Public Compound Databases are built with the best intent and are useful. But they are not curated and are polluted. Bigger does NOT mean better and care is warranted. ChemSpider will likely stay smaller than many of the other Public Compound Databases moving forward as we remain focused on adding value and addressing the issues of inherited and future quality. It’s a long journey…

ChemMobi, an application written by James Jack from Symyx, has finally been posted to the App Store and can be downloaded for free, enabling your iPhone to search both Symyx’s Discovery Gate and ChemSpider (using our web services). I’ve posted before about the work done by James (1,2) and it has now come to fruition with the first version of ChemMobi. If you are an iPhone user try it out and give us your feedback!

Since it was easy to do, we will bring ChemSpider back online in Read Only mode for you to continue using if you need it. This means that the web services will all be restored too. The only things that will not be enabled are deposition, annotation and curation. In order to block these we have disabled login. While it will be possible to add comments, please note that these will be dealt with on the RSC system following rollover to their systems.

Over the past few months there have been a number of discussions in the blogosphere about the future importance and value of classical Impact Factors but for now they remain the industry’s primary measure of, well, impact. With that in mind it was good to see this announcement from the RSC regarding recently published impact factors for chemistry-related journals. Nice…it’s good to know that we joined an organization that continues to focus on its core mission and strengths and is acknowledged by the readers and the industry for its efforts.

25 June 2009
Publication of the 2008 impact factors, calculated by ISI, once again brought good news for authors and readers of RSC journals. 

Nearly all the RSC journals increased in impact factor, immediacy index and article influence, with an impressive average impact factor increase of 8.2%. Overall, the average impact factor for the RSC portfolio now stands at 4.7, equal to that of the ACS collection. 

RSC journals feature in the top 10 rankings (by impact factor and immediacy index) in 6 of the 7 core chemistry categories* as listed on ISI, and of the top 100 chemistry journals, ranked by impact factor, 15 are from RSC Publishing. 

Three individual journals to highlight include: 

ChemSocRev - with a 33% increase, its impact factor now stands at 17.419, confirming its position as a leading international chemistry journal. This flagship journal now contains the greatest number of chemical reviews published in 2008 of any chemistry review journal - making it truly, first in its class. 

Lab on a Chip - celebrates a 28% rise taking its impact factor to 6.48, placing it within the top ten journals in the multidisciplinary chemistry category. 

PCCP - rises over 20% to its highest ever value of 4.06. Additionally, its new immediacy index (0.81) remains the highest value for any journal publishing general primary research in the fields of physical chemistry and chemical physics. 

Editorial Director, Dr James Milne, reflected on the outstanding performance of the RSC journals, ‘the impressive increases in impact factor for the RSC portfolio of journals are a direct reflection on the world class authors who regularly publish in these prestigious international titles. Over the last five years, RSC journals have attracted a significant increase in submissions, with nearly 60% more material published during this same short period.’ He continues, ‘to provide more articles and also higher quality articles, is a clear reflection of the dedicated support the journals receive from authors, editors and referees throughout the world; for this contribution, I would like to sincerely thank all the scientists involved.’ 

Other RSC highlights include: 

Analyst - impact factor now 3.761 (32% rise over 3 years)

ChemComm - impact factor now 5.340 (21% rise over 3 years)

CrystEngComm - impact factor now 3.535, and leading journal for immediacy (0.684) in the crystallography category

Dalton Transactions - highest ever impact factor of 3.580 (an 11.5% increase)

Faraday Discussions - impressive impact factor of 4.604

Green Chemistry - impact factor of 4.542, and the leading journal in its field

JAAS - a 20% rise in impact factor to 4.028 (confirming its position as the leading journal in atomic spectrometry)

Journal of Environmental Monitoring - impact factor now 1.989 (26% rise over 3 years)

Journal of Materials Chemistry - highest ever impact factor of 4.646 (5th consecutive increase)

Molecular BioSystems - records an impressive second impact factor of 4.236

Natural Product Reports - at 7.450, the second highest impact journal in the medicinal and organic chemistry categories

New Journal of Chemistry - impact factor now 2.942, an 11% increase

Organic & Biomolecular Chemistry - enjoys its highest ever IF at 3.550 (a 12% rise)

Photochemical & Photobiological Sciences - the society-owned journal has an impact factor of 2.144

Soft Matter - impact factor now 4.586, and still the #1 journal for impact (and immediacy) in the field.

RSC is committed to providing a world-class publishing service to its authors, and to deliver cutting-edge chemical science to researchers throughout the world. The rise in citations, impact factors and immediacy indices provide a clear indication that more researchers than ever before are recognising journals from the RSC as a key resource to access the very best research.

Journals from RSC Publishing provide exceptional value for money, with high impact science, investment in award-winning technologies and flexible pricing models. To add RSC Journals to your collection, please contact our Sales team via the form below.

* The 7 Chemistry journal subject-categories as listed by ISI: Chemistry, Analytical; Chemistry, Applied; Chemistry, Inorganic & Nuclear; Chemistry, Medicinal; Chemistry, Multidisciplinary; Chemistry, Organic; Chemistry, Physical.

Footnote:

The Impact Factor provides an indication of the average number of citations per paper. Produced annually by ISI®, Impact Factors are calculated by dividing the number of citations in a year to articles published in the preceding two years by the number of citeable articles published in those two years.

The immediacy index is a measure of how topical and urgent the papers published by a journal are. It is calculated by dividing the number of citations to articles published in a given year by the number of articles published in that year.
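As a quick illustration of the two definitions in this footnote, here is a minimal Python sketch. The function names and the counts in the example are hypothetical and chosen only to make the arithmetic easy to follow. They are not ISI data for any journal.

```python
def impact_factor(citations_to_prior_two_years: int,
                  citable_items_prior_two_years: int) -> float:
    """Citations received this year to items published in the two
    preceding years, divided by the citable items published in
    those two years."""
    if citable_items_prior_two_years <= 0:
        raise ValueError("need at least one citable item")
    return citations_to_prior_two_years / citable_items_prior_two_years


def immediacy_index(citations_to_same_year: int,
                    articles_same_year: int) -> float:
    """Citations received in a year to articles published in that
    same year, divided by the number of articles published that year."""
    if articles_same_year <= 0:
        raise ValueError("need at least one article")
    return citations_to_same_year / articles_same_year


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Hypothetical journal: 940 citations this year to 200 citable
    # items published over the previous two years.
    print(impact_factor(940, 200))
    # Hypothetical journal: 81 citations this year to 100 articles
    # published this year.
    print(immediacy_index(81, 100))
    # Hypothetical move in impact factor from 4.0 to 5.0.
    print(percent_change(4.0, 5.0))
```

Both measures are simple ratios, so a journal publishing few items can see large swings from one year to the next.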

Data based on 2008 Impact Factors, calculated by ISI®, released June 2009.

ChemSpider will go offline today for the next 24 hours. We will switch the servers off at around 11am today (give or take some latitude). We will do a differential backup and restore to the RSC servers all changes to the database and switch over to their systems overnight. Testing performed over the weekend has proceeded rather well and we are hoping for a seamless transition, acknowledging that we will have this one day of downtime.

We apologize in advance for any disruptions. We know that a lot of people now use ChemSpider services to feed their own systems. We expect improved service for all when this transition is complete.

We’ll see you on the other side of this transition in just over 24 hours. Wish us luck…

The ChEMBL blog is an excellent read and if you’re interested in “Open Access Drug Discovery And Medicinal Chemistry Data” this is one for you. We are shamelessly, and WITH permission, taking some of the blog posts about New Drug Approvals and adding them into the descriptions on ChemSpider. Some examples are here and here. To date, in all cases where we have added the description, the compound itself was already on ChemSpider and with the correct name. That’s good news based on some of our subjective measures of coverage for the database.

The Spectral Game at www.spectralgame.com is powered by chemical structures and spectra from ChemSpider. A provisional form of our manuscript about the game is now online at the Journal of Cheminformatics here:

The Spectral Game: leveraging Open Data and crowdsourcing for education

Jean-Claude Bradley , Robert J Lancashire , Andrew SID Lang and Antony J Williams

Journal of Cheminformatics 2009, 1:9 doi:10.1186/1758-2946-1-9

 
Published: 26 June 2009

Abstract (provisional)

We report on the implementation of the Spectral Game, a web-based game where players try to match molecules to various forms of interactive spectra including 1D/2D NMR, Mass Spectrometry and Infrared spectra. Each correct selection earns the player one point and play continues until the player supplies an incorrect answer. The game is usually played using a web browser interface, although a version has been developed in the virtual 3D environment of Second Life. Spectra uploaded as Open Data to ChemSpider in JCAMP-DX format are used for the problem sets together with structures extracted from the website. The spectra are displayed using JSpecView, an Open Source spectrum viewing applet which affords zooming and integration. The application of the game to the teaching of proton NMR spectroscopy in an undergraduate organic chemistry class and a 2D Spectrum Viewer are also presented.

Since the RSC acquired ChemSpider we have been working hard with the IT team in Cambridge to transfer ChemSpider from our servers and onto the RSC servers. This has been quite a significant undertaking as now we will be dealing with development servers, staging servers and live servers. This is a significant departure from the environment we have been working in for the past couple of years where code was published to the live environment for testing. Some would say this was risky but with the limited resources we had available at the time it was what it was….oh, and it worked!

We have already started testing the system on the RSC servers that will go live sometime early next week. At present the intended schedule is that we will be switching over sometime between Monday and Wednesday. Of course, this is an intention at present and, based on testing, this may change. For right now we have stopped depositions onto ChemSpider. If curation activities continue we will sync these over to the live server next week so no issues there. ChemSpider will go offline next week sometime and, as the actual date becomes clearer, the announcements will be updated.

Watch this space…ChemSpider is moving to the RSC servers and there will be disruptions in the next few days.

When I present on ChemSpider and talk about community participation one of the common questions is “how many people curate? deposit? annotate? records on ChemSpider”. It’s a low number for each but, in my estimation, it is in keeping with how we operate as individuals. If you compare the number of people reading Wikipedia articles to the number writing them I judge it has to be a pretty high ratio, likely >5000:1. Even if it’s 1000:1 you get the point. More people use than contribute. It is the same for most everything that we use…Amazon book reviews, Netflix DVD reviews, things like that. It’s only when it’s “about us” that the majority of us tend to contribute - to our blogs, our LinkedIn profiles, our Twitter account, our Friendfeed discussions, our Facebook pages etc. I judge this is because it makes us directly visible…we are showing what we are interested in and taking ownership of our comments, activities etc. This is of course human nature…the majority of us have that “look at me” mentality and want to “connect with like minds” and it is, in many cases, that need for incoming voyeurism and participation that has driven the incredible shift to social networking we are encountering.

There are then the “servants for the community”. In this case I mean servants with the most positive connotation. Those who slave away on Wikipedia articles and don’t immediately have their names up in lights. You actually have to dig under an article to find out who wrote/contributed to it. It’s not upfront and center. On Wikipedia chemistry there are a very small number of dedicated individuals who contribute large blocks of time to working on Wikipedia to improve its quality and content. There is a Long Tail of contribution of course but you might be quite surprised by the small number of “primary” contributors. If you check out their Wiki pages however these individuals are recognized and commended within their own community of participation yet may never be known by the readers of the articles.

On ChemSpider we have a similar situation. There are a very small number of primary curators (I will name them: Myself, Heinz Kolshorn and Barrie Walker - these people are enhancing ChemSpider literally daily). We have a small number of secondary contributors who add a spectrum once in a while, annotate a record occasionally or curate out bad data. I would say this is about 30 other people. We also have people who provide us data to deposit and they do it willingly but don’t want to take a hands-on approach to depositing data onto the database.

When I was in the UK recently during my first week of employment with the RSC I gave a number of presentations. There was a lot of interest in what ChemSpider could bring to the organization and offer the community and a lot of discussions regarding “what if”. Of the audiences I would suggest that only a small portion actually laid their hands on the system to investigate its capability and an even smaller fraction chose to jump in, feet first, and use the system and participate fully. There was one spike in particular. During the evening after one of the presentations I noticed that one individual was adding comments to individual records, questioning names, suggesting that structure layouts be changed and examining links to external resources. The first evening there were a few edits. The next night, even more, and since then this individual has continued, unabated, making edits and now enhancing the articles with new information, in this case YouTube videos.

David Sharpe is fairly new to the RSC and is one of those people who just cares. A silent contributor in the background (until today!) who is cleaning and enhancing ChemSpider for the sake of the community. To be clear, his work on these activities has been done in the evenings and weekends and this past weekend he was exchanging emails with me about adding “Element Videos” to the elements on ChemSpider. David’s been moving across the elements on ChemSpider and using the YouTube embed functionality to put the Periodic Table videos from the University of Nottingham into the Description section of the appropriate records.

Check out for example the video for Sulphur here. As we move forward we will layer on a recognition system for individuals contributing to ChemSpider so that we can track the spectral depositions, curations and so on. We believe that such efforts warrant recognition and applause. Of course some will choose to be anonymous and remain in the background making their difference in a silent manner. We honor you all.

SciFoo is just a few weeks away and I was reviewing the list of attendees this evening to see who I would be sharing space with.

I am especially looking forward to spening time with Andrew Lang, one of the brains behind the Spectral Game. We’ve spoken on the phone, exchanged many emails and worked on a couple of projects together. But we get to meet at SciFoo!

Last time I was at SciFoo I spent time talking with Cameron Neylon and JC Bradley about Open Notebook Science. At that time I had lots of ideas about what we could do to support Open Notebook Science. We actually have done quite well, but at that time we were severely resource constrained. Things are a little different now that we have been acquired by the RSC and I am looking forward to talking about what’s necessary and possible now.

Nicko Goncharoff from SureChem will be there. Nicko and I have spent a lot of time together over the past few years, mostly by phone and over email as we worked to integrate SureChem into ChemSpider and use their software development kit under our ChemMantis semantic chemistry markup tool. It’s always good to see him.

Other people I hope to spend some time talking to: Peter Murray Rust from the University of Cambridge, Timo Hannay, Alf Eaton and Terry Sheppard from the Nature Publishing Group and Theodore Gray.


I have set up a LinkedIn Users and Advisors group today and welcome any LinkedIn users interested in ChemSpider to join the group and stay informed about our activities on ChemSpider. I hope that it also provides a useful environment for discussion and collaboration around ChemSpider.

The ChemSpider LinkedIn Group can be accessed here.


Oh boy do we have a lot of things to do with ChemSpider. Not only now, while shifting ChemSpider to the RSC infrastructure, but in the future as we do the work necessary to make ChemSpider the primary internet resource for structure-based chemistry. We don’t have small eyes in terms of what we want to deliver to the community. Far from it…we have big eyes and big ideas regarding what is possible and even, in most cases, how to get there. What is clear is that we need the appropriate skill sets to make it happen. At present all ChemSpider platform development work is done by our team over here in the US. We are looking to add a team member into the RSC Offices in Cambridge. We’re looking for someone with established Cheminformatics skills to work with us. They need to have an established track record of working in the field of Cheminformatics, a deep knowledge of handling chemical structures, experience in working with web-based systems and, of course, a big appetite for making a difference and for working with a fast-moving team. If you’re interested in talking with us about the opportunity ping me at antonyDOTwilliamsATchemspiderDOTcom.


There are a small number of primary chemical vendors serving the industry. These include companies such as Sigma Aldrich, Spectrum Chemical, Alfa Aesar, ThermoFisher and many others. There are also thousands of smaller companies serving the industry with their chemicals. Their catalogs can vary from a dozen to a few hundred chemicals but rarely number into the tens of thousands offered by the larger companies. The large chemical companies offer excellent services in terms of delivery of catalogs to the door and circulation of updated CDs of information. I find the Aldrich catalog an excellent tool and have one on my desk, underneath my Merck Index.

Those smaller chemical companies are in the long tail of suppliers that the majority of chemists will never even hear of. Not unless there is some way for those suppliers to deliver their message regarding their list of products, availability and overall their existence, to interested parties. In China specifically there are many hundreds of small chemical companies popping up now. They cannot afford to market themselves via CD distribution and catalogs to their potential userbase and have to depend on their website to market their wares. They likely deposit their collections to the Available Chemicals Directory from Symyx (a GREAT product and with a lot of quality work going into it in the background!), maybe into ChemACX from CambridgeSoft, onto ChemExper or onto the eMolecules site. Some of these offer up-to-date pricing and procurement systems while others offer simply “Get me a Quote” services whereby a chemist can request a quote directly from the vendor for the material of interest.

ChemSpider has been depositing chemical compound collections for chemical vendors, both large and small, for many months. The word seems to have got out that there is value to doing this. Despite the fact that we do not have, at present, the ability to list real-time pricing or availability for compounds, chemical vendors appear to be deriving value from the listings and chemists are finding chemicals for purchase via ChemSpider.

If there is a certain small molecule chemical vendor that you think we should list on ChemSpider let them know to contact us OR point us to their URL and we will contact them. One example of data added just today is the data set, small though it is, from Asiaron. They offer rich compound pages like this and are a good addition to the database.


I have ChemMobi running on my iPhone now and, I am happy to say, it looks just like it should. While visiting the RSC in Cambridge a couple of weeks ago I had a chance to hang out with James Jack, the Symyx consultant responsible for developing ChemMobi. That’s him on the left. No, that’s not him trying to hunt sharks with hand-held harpoons, it’s him driving the “ChemSpider punt” in a race against the IT team from the RSC. Since we weren’t locals it seemed appropriate to challenge us to a speed punt down the river. This was of course preceded by the imbibing of adequate amounts of flavored water and juices.

Strangely enough all of us in the ChemSpider punt did appear to have some undiscovered talents for punting. We very quickly lost the IT team back at the “juice house” and found them when we had finished our loop back from our destination. We realized that we had an unfair advantage since we had adopted a strategy of punting from the surface of the vessel. They had not told us that they were doing the whole race in their own way…pushing with a pole while immersed. That’s our colleague Doug Spooner from the IT team showing us how to do it “IT style”.

ChemMobi will soon be posted to the App Store for you all to download and use. I’ll let you know when…hopefully within a week. All glory, love and adoration for the App should go to James Jack and to Symyx for allowing him to do what he does best…get creative with software and structures!


I have given a number of talks regarding ChemSpider over the past few months and generally comment “ChemSpider hosts almost 21.5 million unique chemical entities from over 200 data sources.” As of today it is over 21.5 million chemical entities. We have deposited data from a number of new contributors of late, many of them smaller chemical vendors such as Bridge Organics and ExtraSynthese. However, we recently crossed the 21.5 million mark because we have started to take advantage of the eMolecules dataset made available as a downloadable set. There are over 5 million structures in the dataset.

Many, but not all of these, deduplicate onto the ChemSpider database. The 21.5 millionth structure links to this record on eMolecules as shown below.

[Image: eMolecules record]

When the data are added onto ChemSpider we automatically add SMILES, InChIs, MW, MF and a series of predicted physicochemical properties. This is for the new structures from eMolecules. In many cases however eMolecules is simply one more data source among many and information such as spectra, Wikipedia links, experimental data etc are all integrated. In this case though eMolecules can help you source a vendor for the material as is their strength.
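To make the relationship between two of those derived fields concrete, here is a minimal Python sketch that computes a molecular weight (MW) from a simple molecular formula (MF). It is only an illustration and not a description of how ChemSpider generates its values: the function names, the small rounded atomic-weight table and the restriction to flat formulas (no brackets, charges or isotopes) are all assumptions made for this example.

```python
import re

# Rounded standard atomic weights for a handful of elements.
# This table is an assumption for the sketch; a real tool would cover
# the whole periodic table.
ATOMIC_WEIGHTS = {
    "H": 1.008,
    "C": 12.011,
    "N": 14.007,
    "O": 15.999,
    "S": 32.06,
    "Cl": 35.45,
    "Br": 79.904,
}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
_WHOLE = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(formula):
    """Return {element: count} for a flat formula such as 'C2H6O'."""
    if not _WHOLE.fullmatch(formula):
        raise ValueError(f"Unsupported formula: {formula!r}")
    counts = {}
    for symbol, number in _TOKEN.findall(formula):
        # A missing number means a count of one, e.g. the O in C2H6O.
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights over the parsed formula."""
    total = 0.0
    for symbol, count in parse_formula(formula).items():
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"No atomic weight for element {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * count
    return total


if __name__ == "__main__":
    print(parse_formula("C2H6O"))
    print(round(molecular_weight("C2H6O"), 3))
```

Production cheminformatics toolkits derive both MF and MW from the structure itself rather than from a typed formula, which is why a sketch like this is useful mainly as a sanity check.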


ChemSpider has been around for about two and a half years. Based on the feedback we have received from the community regarding our humble offering, users like it. In most cases they “get it” too. They understand that we are working to provide information to them that can assist their work. We are hoping to provide some glimpse of data, some snippet of information, some link of value which can enable their studies/research/inquiry. And, in some cases, people want more. Let’s be honest…WE want more. We want to deliver more value, provide more impact and integrate more data for you, the community.

Some of the things that we have been asked for over the past few months are more web services to tap into the experimental data available on ChemSpider (grabbing experimental properties for QSAR modeling for example), more reaction syntheses to peruse, improved speed for substructure searching, similarity searching, integration with more publishers’ literature and an easier-to-navigate website. Good list!

We have a set of priorities for the near term and will be doing our utmost to deliver them in time for the IUPAC congress in Glasgow in August and the ACS meeting in Washington later that month. But we want to hear from you. What do you, our users, want to see on ChemSpider? If you had your wishes, and resources were no object, there were no barriers to integration with any data source and you got to define the path forward for ChemSpider, what would it be?

Feel free to share it here on the blog or, if you’d prefer to be more anonymous with your comments, feel free to drop me an email at infoATchemspiderDOTcom. We want your input. Please don’t be shy…engage us and you might just get what you want (though some things might take a while!)


It’s been a long time since I blogged here on the ChemSpider blog. Now I am officially an employee of the Royal Society of Chemistry and have spent a week in Cambridge meeting my new colleagues, discussing the transfer of ChemSpider to their servers for hosting and working on plans for a relaunch of ChemSpider later in the year. More about that later. I’ll be back in action on this blog in the coming week.

I actually write on two blogs. This one will now be dedicated to ChemSpider activities specifically and focus on new functionality, plans and vision for ChemSpider as a service. My other blog, the ChemConnector blog (www.chemconnector.com/chemunicating), will be more of a personal blog: my views of cheminformatics, activities in Chemistry and Science, Open Science, Open Access and Open Data and other things that interest me.

Glad to be back and looking forward to connecting with everyone again.


A couple of days ago I asked whether readers could see any issues with the structure of Micrococcin P1 published in the C&E News article this week. A few people took a stab on blog and off blog but only Stuart Cantrill from the Nature Publishing Group got it right. One double bond in the wrong place. Subtle, but rather important. General structure drawing tools will help with things like this. For example, a human might not see the issue in the structure of Taxol to the left very easily. Software tools designed to flag valency issues will show the issue easily.

In the expanded image the pentavalent carbon is marked. The same type of tools would have shown a positive charge on the sulphur in the ring for the incorrect structure of Micrococcin. In the same way, software tools can recognize charge imbalances and incomplete stereochemistry.
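As a rough illustration of what such a valency check does, here is a minimal Python sketch. It is not the algorithm used by any particular drawing package: the function name, the tiny table of maximum valences for neutral atoms and the plain list-based representation of atoms and bonds are assumptions made for this example. Elements missing from the table are simply skipped, and charges are ignored.

```python
# Maximum total bond order for a few neutral elements.
# A deliberately small, simplified table assumed for this sketch.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def find_overvalent_atoms(atoms, bonds):
    """Flag atoms whose summed bond orders exceed the table limit.

    atoms: list of element symbols, indexed by position.
    bonds: list of (atom_index, atom_index, bond_order) tuples.
    Returns a list of (atom_index, symbol, total_bond_order).
    """
    totals = [0] * len(atoms)
    for first, second, order in bonds:
        totals[first] += order
        totals[second] += order

    flagged = []
    for index, (symbol, total) in enumerate(zip(atoms, totals)):
        limit = MAX_VALENCE.get(symbol)
        if limit is not None and total > limit:
            flagged.append((index, symbol, total))
    return flagged


if __name__ == "__main__":
    # A carbon drawn with a C=O double bond plus three single bonds:
    # total bond order 5, i.e. a pentavalent carbon.
    atoms = ["C", "O", "C", "C", "C"]
    bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1), (0, 4, 1)]
    print(find_overvalent_atoms(atoms, bonds))
```

Real tools go further, accounting for formal charges, implicit hydrogens and elements with several allowed valences, which is how they can also report charge imbalances.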

I sent an email to the editor of C&E News when I noticed the structure issue but didn’t get a response. Nevertheless it is an advantage of online publications that images can be swapped out easily. This has been done for the online article here at this point and the change, while subtle, is there (shown below).

[Image: new and old structures of Micrococcin P1]

The structure is now on the ChemSpider database here.


Drawing accurate representations of chemical structures is difficult. Copying them from publications can be fraught with errors, and it is common to see structures in publications that are incomplete in their definitions of stereochemistry or that are missing groups altogether. Such is the nature of the beast. I have blogged recently about an observation of a structure drawing error in C&E News and the editor was kind enough to comment. Here’s an image of a structure from a C&E News article about Micrococcin P1 from this week’s magazine. Check out the structure…can you see any issues?

[Image: structure of Micrococcin P1 from C&E News]

Now that ChemSpider is part of the RSC we will be able to offer some of our experiences in identifying potential errors in structures before they are published. There are ways to do this so that both authors and editors alike get flagged to such issues. This is way down the road from migrating ChemSpider to RSC servers but would definitely bring value to helping to ensure quality of data in Chemistry.

Feel free to post your comments regarding any issues you see with the structure as drawn.


PhysChim62 (PC) is someone I meet with regularly on the Wikipedia Chemistry IRC chats. We’ve never met but I judge we have mutual respect, earned through many hours of working to improve the chemistry on Wikipedia. PC has been at it for a long time and has a broad reach in the WP community…I’m focused primarily on structure validation and delivering tools which can be of value to Wikipedians. If you have an interest in Chemistry on Wikipedia, PC’s blog is one to add to your blogroll/reader as PC will likely touch on this quite regularly, as well as other things of interest. The blog is at http://phoscarb.blogspot.com/.


We’ve received a lot of kudos, congratulations and praise for our decision to become a part of the RSC. We thank everyone who has gone out of their way to acknowledge the shift in our circumstances. We did have some concern that some people would judge us on “selling out” rather than going it alone. Based on the feedback to date our worries were unfounded.

Tonight the comments of Warren DeLano, developer of the Open Source platform PyMOL (more details here), truly struck a chord with me. His comments are below.

“DeLano Scientific LLC congratulates Antony Williams et al. on the acquisition of ChemSpider by the Royal Society of Chemistry. This historic event provides a compelling example of how an independent open-minded project (open-access, open-data, open-source, etc.) can increase its resources and extend its longevity without compromising on its core mission, as is always necessary when a project “sells out” to a for-profit company beholden to narrow fiduciary objectives.

We hope that the ChemSpider / RSC example will both inspire more open-minded individuals to strike out on their own with similarly ambitious efforts and encourage various non-profit and government entities to actively recruit successful projects back into “the establishment” in ways which do not compromise project integrity and yet can enable even greater long-term positive societal impacts.”

Specifically the statement “without compromising on its core mission” hit me. It’s exactly why the fit with the RSC felt right. RSC are focused on Advancing the Chemical Sciences and look upon ChemSpider as a way to help the community to access information, data and knowledge and bring together chemists, publishers, vendors and other parties. It’s been our mission all along. So, we are not compromised as we have the same intentions. A great match.

Thanks to Warren for the recognition. Much appreciated.


I’m heading over to the UK shortly for a week-long meeting with the RSC. In case there is any confusion I WILL be an employee of the RSC working on ChemSpider and we are building our ChemSpider team at present. I’m really looking forward to the meeting as I have already met many of the people and they are skilled, focused and yet lighthearted and funny. Yes, funny. Maybe it comes with the territory of working with a young, passionate team of people. One thing about the RSC that I enjoyed during my last visit was the ENERGY in the building. The place is buzzing. There is a lot of young passionate energy with mature skills in the building and it is focused on growing the reputation and impact of the society. Even the “older guys” of which I am now one (!) have this youthful spirit that they bring to RSC. It’s great.

BUT, enough is enough. Okay, I might still run 5km a few days a week, and I might still lift weights a few times a week but gravity is not my friend and I do not have the lithe, supple physique that I had as a 30-year-old. Add to that twin boys tearing me apart and bilateral rotator cuff injuries from said boys and I have not been able to stay in shape to the level I had hoped this past year. So, imagine my surprise when I am told that for the inaugural ChemSpider presentation to RSC staff in June I will be expected to dress appropriately. Here’s me thinking that meant a shirt and tie (and best behavior) but no…here comes a package with a “party dress” for me. Sure…make fun of the ChemSpiderman moniker why don’t you! Look at that costume. I wouldn’t wear it when I was young and lithe. Not my thing that. Sorry guys, I have my limits…it’ll be shirt and tie and maybe best behavior but no Lycra Spandex Spidey suit for me for my presentation at RSC!


An article in the latest C&E News discusses the acquisition of ChemSpider by the Royal Society of Chemistry. I certainly appreciate the comments of Robert Massie, President of CAS, quoted in the article as follows: “CAS has worked with Williams in the past,” CAS President Robert J. Massie notes. “We join everyone who is interested in the advance of chemical information in recognizing his considerable contributions. We are delighted to see that his creativity and enthusiasm will continue to benefit the chemical enterprise.”

I worked a lot with CAS while I was at ACD/Labs (I spent over 10.5 years there and left as their Chief Science Officer). I was intimately involved in the development and deployment of a number of software tools and visited Columbus many times. I have many fond memories of working with the CAS team and there are some great people working at the organization. I hope that in my new role at the Royal Society of Chemistry I will have the opportunity to work with CAS again in a collaborative and cross-publisher manner to the benefit of the Chemistry Community.


With the news about the RSC acquiring ChemSpider assets starting to settle it is time to get back to work. One of the things we are noticing is that people are really starting to take advantage of the ability to integrate their articles to ChemSpider via the Add DOI function that is available to registered users. If you want to associate a paper with a single chemical structure then it is very easy: the function uses the CrossRef service to fetch the result of a DOI lookup and deposit it directly to ChemSpider. The four images below outline the way in which this can be done. In this example I want to associate two particular articles with the record for 1,2-dioxetanedione shown below.

It’s easy. Navigate to the record of interest, make sure that the structure is the correct one and simply click the Add DOI button above the chemical structure to the left. Don’t forget you must be logged in! Now enter the DOI, click on LookUp and confirm that the title retrieved is the correct publication. Then click on OK. The publication will then be submitted for a curator to confirm that it is appropriate and it will show up online under the supplementary information when approved.

There are also processes for depositing an SDF file with a single publication, and the SAME process is applied to connecting via PubMed IDs (Add PMID). Try it out. Help the community discover publications by adding appropriate DOIs to particular records. Look at how many there are associated with cholesterol already.

[Images: the four steps of the Add DOI process]
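For readers curious about what a DOI lookup of this kind involves, here is a minimal Python sketch that resolves a DOI to a title through the public CrossRef REST API (`https://api.crossref.org/works/<DOI>`). It is not the code behind ChemSpider’s Add DOI button, whose internals are not described here. The function names and the loose DOI pattern are assumptions made for this example, and the live lookup needs network access.

```python
import json
import re
import urllib.parse
import urllib.request

CROSSREF_WORKS_URL = "https://api.crossref.org/works/"

# Loose syntactic check for modern DOIs; an assumption for this sketch,
# not a complete validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def crossref_url(doi):
    """Build the CrossRef works URL for a DOI, rejecting obvious typos."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    return CROSSREF_WORKS_URL + urllib.parse.quote(doi, safe="/")


def title_from_response(payload):
    """Pull the first title out of a decoded CrossRef works response."""
    titles = payload.get("message", {}).get("title", [])
    return titles[0] if titles else None


def lookup_title(doi, timeout=10):
    """Fetch the record for a DOI and return its title (needs network)."""
    with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as response:
        return title_from_response(json.load(response))


if __name__ == "__main__":
    # Offline demonstration with a hand-made response of the same shape.
    sample = {"status": "ok", "message": {"title": ["An example article"]}}
    print(crossref_url("10.1000/xyz123"))
    print(title_from_response(sample))
```

Showing the retrieved title before anything is deposited mirrors the confirmation step described above: a mistyped DOI can still resolve, just to the wrong paper.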
