LIVIVO - The Search Portal for Life Sciences

Search results

Result 1 - 10 of total 104

  1. Article: Opening up connectivity between documents, structures and bioactivity.

    Southan, Christopher

    Beilstein journal of organic chemistry

    2020  Volume 16, Page(s) 596–606

    Abstract Bioscientists reading papers or patents strive to discern the key relationships reported within a document "D" where a bioactivity "A" with a quantitative result "R" (e.g., an IC
    Language English
    Publishing date 2020-04-02
    Publishing country Germany
    Document type Journal Article ; Review
    ZDB-ID 2192461-2
    ISSN 1860-5397
    DOI 10.3762/bjoc.16.54
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: Opening up connectivity between documents, structures and bioactivity

    Christopher Southan

    Beilstein Journal of Organic Chemistry, Vol 16, Iss 1, Pp 596-606

    2020  Volume 16, Page(s) 596–606

    Abstract Bioscientists reading papers or patents strive to discern the key relationships reported within a document “D” where a bioactivity “A” with a quantitative result “R” (e.g., an IC50) is reported for chemical structure “C” that modulates (e.g., inhibits) a protein target “P”. A useful shorthand for this connectivity thus becomes DARCP. The problem at the core of this article is that the community has spent millions effectively burying these relationships in PDFs over many decades but must now spend millions more trying to get them back out. The key imperative for this is to increase the flow into structured open databases. The positive impacts will include expanded data mining opportunities for drug discovery and chemical biology. Over the last decade commercial sources have manually extracted DARCP from ≈300,000 documents encompassing ≈7 million compounds interacting with ≈10,000 targets. Over a similar time, the Guide to Pharmacology, BindingDB and ChEMBL have carried out analogous DARCP extractions. Although their expert-curated numbers are lower (i.e., ≈2 million compounds against ≈3700 human proteins), these open sources have the great advantage of being merged within PubChem. Parallel efforts have focused on the extraction of document-to-compound (D-C-only) connectivity. In the absence of molecular mechanism of action (mmoa) annotation, this is of less value but can be automatically extracted. This has been significantly accomplished for patents (e.g., by IBM, SureChEMBL and WIPO) for over 30 million compounds in PubChem. These have recently been joined by 1.4 million D-C submissions from three major chemistry publishers. In addition, both the European and US PubMed Central portals now add chemistry look-ups from abstracts and full-text papers. However, the fully automated extraction of DARCP has not yet been achieved. This stands in contrast to the ability of biocurators to discern these relationships in minutes.
Unfortunately, no journals have yet instigated a flow of author-specified DARCP directly into ...
    Keywords activity data ; databases ; drug discovery ; chemical structures ; protein targets ; Science ; Q ; Organic chemistry ; QD241-441
    Subject code 020
    Language English
    Publishing date 2020-04-01T00:00:00Z
    Publisher Beilstein-Institut
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Article ; Online: Caveat Usor: Assessing Differences between Major Chemistry Databases.

    Southan, Christopher

    ChemMedChem

    2018  Volume 13, Issue 6, Page(s) 470–481

    Abstract The three databases of PubChem, ChemSpider, and UniChem capture the majority of open chemical structure records with February 2018 totals of 95, 63, and 154 million, respectively. Collectively, they constitute a massively enabling resource for cheminformatics, chemical biology, and drug discovery. As meta-portals, they subsume and link out to the major proportion of public bioactivity data extracted from the literature and screening center assay results. Therefore, they not only present three different entry points, but the many subsumed independent resources present a fourth entry point in the form of standalone databases. Because this creates a complex picture it is important for users to have at least some appreciation of differential content to enable utility judgments for the tasks at hand. This turns out to be challenging. By comparing the three resources in detail, this review assesses their differences, some of which are not obvious. This includes the fact that coverage is significantly different between the 587, 282, and 38 contributing sources, respectively. This not only presents the "who-has-what" question, but also the reason "why" any particular inclusion is considered valuable is rarely made explicit. Also confusing is that sources nominally in common (i.e., having the same submitter name) can have significantly different structure counts, not only in each of the three but also from their standalone instantiations. Assessing a series of examples indicates that differences in loading dates and structural standardization are the main causes of this inter-portal discordance.
    MeSH term(s) Databases, Chemical ; Databases, Factual ; Humans ; Proteins/chemistry ; Proteins/metabolism
    Chemical Substances Proteins
    Language English
    Publishing date 2018-02-23
    Publishing country Germany
    Document type Journal Article ; Research Support, Non-U.S. Gov't ; Review
    ZDB-ID 2218496-X
    ISSN 1860-7187 ; 1860-7179
    ISSN (online) 1860-7187
    ISSN 1860-7179
    DOI 10.1002/cmdc.201700724
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article: Last rolls of the yoyo: Assessing the human canonical protein count.

    Southan, Christopher

    F1000Research

    2017  Volume 6, Page(s) 448

    Abstract In 2004, when the protein estimate from the finished human genome was only 24,000, the surprise was compounded as reviewed estimates fell to 19,000 by 2014. However, variability in the total canonical protein counts (i.e. excluding alternative splice forms) of open reading frames (ORFs) in different annotation portals persists. This work assesses these differences and possible causes. A 16-year analysis of Ensembl and UniProtKB/Swiss-Prot shows convergence to a protein number of ~20,000. The former had shown some yo-yoing, but both have now plateaued. Nine major annotation portals, reviewed at the beginning of 2017, gave a spread of counts from 21,819 down to 18,891. The 4-way cross-reference concordance (within UniProt) between Ensembl, Swiss-Prot, Entrez Gene and the Human Gene Nomenclature Committee (HGNC) drops to 18,690, indicating methodological differences in protein definitions and experimental existence support between sources. The Swiss-Prot and neXtProt evidence criteria include mass spectrometry peptide verification and also cross-references for antibody detection from the Human Protein Atlas. Notwithstanding, hundreds of Swiss-Prot entries are classified as non-coding biotypes by HGNC. The only inference that protein numbers might still rise comes from numerous reports of small ORF (smORF) discovery. However, while there have been recent cases of protein verifications from previous mis-annotation of non-coding RNA, very few have passed the Swiss-Prot curation and genome annotation thresholds. The post-genomic era has seen both advances in data generation and improvements in the human reference assembly. Notwithstanding, current numbers, while persistently discordant, show that the earlier yo-yoing has largely ceased. Given the importance to biology and biomedicine of defining the canonical human proteome, the task will need more collaborative inter-source curation combined with broader and deeper experimental confirmation
    Language English
    Publishing date 2017-04-07
    Publishing country England
    Document type Journal Article ; Review
    ZDB-ID 2699932-8
    ISSN 2046-1402
    DOI 10.12688/f1000research.11119.1
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: Retrieving GPCR data from public databases.

    Southan, Christopher

    Current opinion in pharmacology

    2016  Volume 30, Page(s) 38–43

    Abstract Improvements in databases have already impacted GPCR research. The purpose of the review is to give a snapshot of the GPCR data available and provide utility examples. Consequently, this review covers a small set of major databases, including UniProt for proteins, Ensembl for genes, ChEMBL for bioactive chemistry and SureChEMBL for patents. In addition, two portals are outlined, GPCRdb and the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb) that are based on expert annotation. The former has an emphasis on structures, sequences, point mutations, analysis tools and visualisation. The latter focuses on endogenous GPCR ligands, pharmacological modulation, approved drugs, clinical candidates and tool compounds. Since data growth is accelerating, those embarking on GPCR projects should not only check databases but also recent journal and patent publications.
    MeSH term(s) Databases, Factual ; Drug Discovery/methods ; Humans ; Ligands ; Pharmaceutical Preparations/metabolism ; Point Mutation ; Receptors, G-Protein-Coupled/chemistry ; Receptors, G-Protein-Coupled/drug effects ; Receptors, G-Protein-Coupled/metabolism
    Chemical Substances Ligands ; Pharmaceutical Preparations ; Receptors, G-Protein-Coupled
    Language English
    Publishing date 2016-10
    Publishing country England
    Document type Journal Article ; Review
    ZDB-ID 2037057-X
    ISSN 1471-4973 ; 1471-4892
    ISSN (online) 1471-4973
    ISSN 1471-4892
    DOI 10.1016/j.coph.2016.07.002
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Expanding opportunities for mining bioactive chemistry from patents.

    Southan, Christopher

    Drug discovery today. Technologies

    2015  Volume 14, Page(s) 3–9

    Abstract Bioactive structures published in medicinal chemistry patents typically exceed those in papers by at least twofold and may precede them by several years. The Big-Bang of open automated extraction since 2012 has contributed to over 15 million patent-derived compounds in PubChem. While mapping between chemical structures, assay results and protein targets from patent documents is challenging, these relationships can be harvested using open tools and are beginning to be curated into databases.
    MeSH term(s) Chemistry, Pharmaceutical ; Data Mining ; Databases, Factual ; Patents as Topic
    Language English
    Publishing date 2015-02-11
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't ; Review
    ISSN 1740-6749
    ISSN (online) 1740-6749
    DOI 10.1016/j.ddtec.2014.12.001
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Last rolls of the yoyo: Assessing the human canonical protein count [version 1; referees: 1 approved, 2 approved with reservations]

    Christopher Southan

    F1000Research, Vol 6

    2017  Volume 6

    Abstract In 2004, when the protein estimate from the finished human genome was only 24,000, the surprise was compounded as reviewed estimates fell to 19,000 by 2014. However, variability in the total canonical protein counts (i.e. excluding alternative splice forms) of open reading frames (ORFs) in different annotation portals persists. This work assesses these differences and possible causes. A 16-year analysis of Ensembl and UniProtKB/Swiss-Prot shows convergence to a protein number of ~20,000. The former had shown some yo-yoing, but both have now plateaued. Nine major annotation portals, reviewed at the beginning of 2017, gave a spread of counts from 21,819 down to 18,891. The 4-way cross-reference concordance (within UniProt) between Ensembl, Swiss-Prot, Entrez Gene and the Human Gene Nomenclature Committee (HGNC) drops to 18,690, indicating methodological differences in protein definitions and experimental existence support between sources. The Swiss-Prot and neXtProt evidence criteria include mass spectrometry peptide verification and also cross-references for antibody detection from the Human Protein Atlas. Notwithstanding, hundreds of Swiss-Prot entries are classified as non-coding biotypes by HGNC. The only inference that protein numbers might still rise comes from numerous reports of small ORF (smORF) discovery. However, while there have been recent cases of protein verifications from previous mis-annotation of non-coding RNA, very few have passed the Swiss-Prot curation and genome annotation thresholds. The post-genomic era has seen both advances in data generation and improvements in the human reference assembly. Notwithstanding, current numbers, while persistently discordant, show that the earlier yo-yoing has largely ceased.
Given the importance to biology and biomedicine of defining the canonical human proteome, the task will need more collaborative inter-source curation combined with broader and deeper experimental confirmation in vivo and in vitro of proteins predicted in silico. The eventual closure ...
    Keywords Protein Chemistry & Proteomics ; Medicine ; R ; Science ; Q
    Subject code 572
    Language English
    Publishing date 2017-04-01T00:00:00Z
    Publisher F1000 Research Ltd
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Article ; Online: Will the chemical probes please stand up?

    Škuta, Ctibor / Southan, Christopher / Bartůněk, Petr

    RSC medicinal chemistry

    2021  Volume 12, Issue 8, Page(s) 1428–1441

    Abstract In 2005, the NIH Molecular Libraries Program (MLP) undertook the identification of tool compounds to expand biological insights, now termed small-molecule chemical probes. This inspired other organisations to initiate similar efforts from 2010 onwards. As a central focus of the Probes & Drugs portal (P&D), we have standardised, integrated and compared sets of declared probe compounds harvested from 12 different sources. This turned out to be challenging and revealed unexpected anomalies. Results in this work address key questions including: a) individual and total structure counts, b) overlaps between sources, c) comparisons with selected PubChem sources and d) the probe coverage of druggable targets. In addition, we developed new high-level scoring schemes to filter collections down to probes of higher quality. This generated 548 high-quality chemical probes (HQCP) covering 447 distinct protein targets. This HQCP collection has been added to the P&D portal and will be regularly updated as established sources expand and new ones release data.
    Language English
    Publishing date 2021-07-16
    Publishing country England
    Document type Journal Article
    ISSN 2632-8682
    ISSN (online) 2632-8682
    DOI 10.1039/d1md00138h
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: BACE2 as a new diabetes target: a patent review (2010 - 2012).

    Southan, Christopher

    Expert opinion on therapeutic patents

    2013  Volume 23, Issue 5, Page(s) 649–663

    Abstract Introduction: When two novel aspartyl proteases were published in 1999 and 2000, beta-site APP-cleaving enzyme 1 (BACE1) was confirmed as the long sought after beta-secretase and Alzheimer's disease drug target. However, the role of its paralogue, BACE2, proved elusive until a 2011 publication implicated it as a Collectrin (TMEM27) secretase controlling pancreatic beta-cell proliferation and a new therapeutic intervention for diabetes.
    Areas covered: This review, using SureChemOpen, encompasses early validation compounds and small-molecule BACE2 inhibitors for diabetes. Since 2010, one assay patent and several chemical series have been published by Roche but these were followed by filings from Novartis and Schering in 2012. The patents from these three companies include BACE2-only filings but also some specifying both BACE1 and BACE2 inhibitors.
    Expert opinion: Roche's early collaborative target validation has given them a lead in BACE2 medicinal chemistry. However, the extensive data output for BACE1 in patents and papers over the last decade, plus liganded crystal structures for both proteases, should expedite the design of BACE2 inhibitors by other organisations. This may also shorten the development time for clinical candidates that, unlike those now entering Phase I trials for BACE1, would not need to be brain-penetrant.
    MeSH term(s) Amyloid Precursor Protein Secretases/antagonists & inhibitors ; Amyloid Precursor Protein Secretases/metabolism ; Animals ; Aspartic Acid Endopeptidases/antagonists & inhibitors ; Aspartic Acid Endopeptidases/metabolism ; Diabetes Mellitus, Type 2/drug therapy ; Diabetes Mellitus, Type 2/physiopathology ; Drug Design ; Humans ; Hypoglycemic Agents/pharmacology ; Insulin-Secreting Cells/metabolism ; Molecular Targeted Therapy ; Patents as Topic
    Chemical Substances Hypoglycemic Agents ; Amyloid Precursor Protein Secretases (EC 3.4.-) ; Aspartic Acid Endopeptidases (EC 3.4.23.-) ; BACE2 protein, human (EC 3.4.23.45) ; BACE1 protein, human (EC 3.4.23.46)
    Language English
    Publishing date 2013-05
    Publishing country England
    Document type Journal Article ; Review
    ZDB-ID 1186201-4
    ISSN 1744-7674 ; 0962-2594 ; 1354-3776
    ISSN (online) 1744-7674
    ISSN 0962-2594 ; 1354-3776
    DOI 10.1517/13543776.2013.780032
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article: InChI in the wild: an assessment of InChIKey searching in Google.

    Southan, Christopher

    Journal of cheminformatics

    2013  Volume 5, Issue 1, Page(s) 10

    Abstract While chemical databases can be queried using the InChI string and InChIKey (IK) the latter was designed for open-web searching. It is becoming increasingly effective for this since more sources enhance crawling of their websites by the Googlebot and consequent IK indexing. Searchers who use Google as an adjunct to database access may be less familiar with the advantages of using the IK as explored in this review. As an example, the IK for atorvastatin retrieves ~200 low-redundancy links from a Google search in 0.3 of a second. These include most major databases and a very low false-positive rate. Results encompass less familiar but potentially useful sources and can be extended to isomer capture by using just the skeleton layer of the IK. Google Advanced Search can be used to filter large result sets. Image searching with the IK is also effective and complementary to open-web queries. Results can be particularly useful for less-common structures as exemplified by a major metabolite of atorvastatin giving only three hits. Testing also demonstrated document-to-document and document-to-database joins via structure matching. The necessary generation of an IK from chemical names can be accomplished using open tools and resources for patents, papers, abstracts or other text sources. Active global sharing of local IK-linked information can be accomplished via surfacing in open laboratory notebooks, blogs, Twitter, figshare and other routes. While information-rich chemistry (e.g. approved drugs) can exhibit swamping and redundancy effects, the much smaller IK result sets for link-poor structures become a transformative first-pass option. The IK indexing has therefore turned Google into a de-facto open global chemical information hub by merging links to most significant sources, including over 50 million PubChem and ChemSpider records. 
The simplicity, specificity and speed of matching make it a useful option for biologists or others less familiar with chemical searching. However, compared to rigorously maintained major databases, users need to be circumspect about the consistency of Google results and provenance of retrieved links. In addition, community engagement may be necessary to ameliorate possible future degradation of utility.
    Language English
    Publishing date 2013-02-11
    Publishing country England
    Document type Journal Article
    ZDB-ID 2486539-4
    ISSN 1758-2946
    DOI 10.1186/1758-2946-5-10
    Database MEDical Literature Analysis and Retrieval System OnLINE
