Thursday, August 11, 2011

Open Melting Point Collection Book Edition 1

Several months of work in a collaboration between Andrew Lang, Antony Williams, Evan Curtin and myself have culminated in the publication of an Open Melting Point Collection Book. As with our other books on solubility and Reaction Attempts, converting the data from a database format to a PDF has several advantages.

Now that the book has been accepted by Nature Precedings, it provides a convenient mechanism for citation via DOI, a formal author list, version control, etc. The book is also now available from LuLu.com either as a free PDF download or as a physical copy. Because the book runs 699 pages (it covers 2706 unique compounds), the lowest price we could get is $30.96, which just covers printing and shipping.


Even though we have melting points for about 20,000 unique compounds, most of these are from single sources. Unless we can get another major donation of melting points (not using any of the sources we already have), progress in curating single values manually will take time.

As described in the abstract:
This book represents a PDF version of Dataset ONSMP029 (2706 unique compounds, 7413 measurements) from a project to collect and curate melting points made available as Open Data. This particular collection was selected from the application of a threshold to favor the likelihood of reliability. Specifically, the entire range of averaged values for a data point was set to 0.01 C to 5 C, with at least two different measurements within this range. Measurements were pooled and processed from the following sources: Alfa Aesar, MDPI, Bergstrom, PhysProp, DrugBank, Bell, Oxford MSDS, Hughes, Griffiths and the Chemical Information Validation Spreadsheet. Links to all the information sources and web services are available from the Open Melting Point Resource page: http://onswebservices.wikispaces.com/meltingpoint
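To make the selection rule in the abstract concrete, here is a minimal sketch of how such a threshold could be applied. It is not the code used to build ONSMP029. It assumes that "range" means the highest minus the lowest pooled value for a compound, and that "at least two different measurements" means at least two distinct values. The function name, the constants and the sample data are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Thresholds taken from the abstract (degrees Celsius).
MIN_RANGE_C = 0.01
MAX_RANGE_C = 5.0


def double_validated(measurements):
    """Return {compound: average melting point} for compounds passing the filter.

    `measurements` is an iterable of (compound, melting_point_c) pairs.
    Assumption: a compound passes when it has at least two distinct values
    and the spread (max - min) lies between MIN_RANGE_C and MAX_RANGE_C.
    """
    by_compound = defaultdict(list)
    for compound, mp in measurements:
        by_compound[compound].append(mp)

    kept = {}
    for compound, values in by_compound.items():
        spread = max(values) - min(values)
        if len(set(values)) >= 2 and MIN_RANGE_C <= spread <= MAX_RANGE_C:
            kept[compound] = mean(values)
    return kept


# Hypothetical sample data, not taken from the real dataset.
sample = [
    ("compound A", 121.0), ("compound A", 123.5),  # spread 2.5 C: kept
    ("compound B", 80.0), ("compound B", 95.0),    # spread 15 C: rejected
    ("compound C", 42.0),                          # single value: rejected
    ("compound D", 10.0), ("compound D", 10.0),    # identical values: rejected
]

result = double_validated(sample)
print(result)
```

Under this reading, the lower bound of 0.01 C rejects compounds whose values are identical, which are likely to be the same number copied between sources rather than independent measurements.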

This filtering for double validated melting point measurements within a range of 5 C is an attempt to provide a "reasonably" good source. It is imperative to understand that this is not a "trusted source" - as I've mentioned several times, there is no such thing. However, since absolute trusted sources do not exist, this double validated dataset of 2706 compounds is probably the best we can do for now. In fact, use of this double validated dataset to build melting point models has led to some excellent models, which are far superior to models constructed from the entire database of 20,000 compounds.



Creative Commons Attribution Share-Alike 2.5 License