"Prostate cancer proteomics" database.

A database of Prostate Cancer Proteomics (PCP, http://ef.inbi.ras.ru) has been created using the results of a proteomic study of human prostate carcinoma and benign hyperplasia tissues, and of several cultured human cell lines. PCP consists of seven interrelated modules, each containing four levels of proteomic and biomedical data on the proteins in the corresponding tissues or cells. The first data level, on which each module is based, is a 2DE proteomic reference map marking the proteins that were separated by 2D electrophoresis and subsequently identified by mass spectrometry. The results of the proteomic experiments form the second data level. The third level contains protein data from published articles and existing databases. The fourth level consists of direct Internet links to information on the corresponding proteins in the NCBI and UniProt databases. PCP contains data on 359 proteins in total, including 17 potential biomarkers of prostate cancer, particularly AGR2, annexins, S100 proteins, PRO2675, and PRO2044. The database will be useful in a wide range of applications, including studies of the molecular mechanisms of the aetiology and pathogenesis of prostate diseases and the search for new diagnostic markers.


INTRODUCTION
The first decade of the post-genome era was marked by rapid development in the field of bioinformatics, the extension of major databases (such as NCBI and UniProt), and the creation of specialised information resources for biomedical research in many countries [1][2][3][4]. The impressive resources created in Ireland (UCD-2DPAGE, http://proteomics-portal.ucd.ie:8082/cgi-bin/2d/2d.cgi) [2] and India (Human Proteinpedia, www.humanproteinpedia.org) [3] make the state of things in Russia pale in comparison.
Currently, one of the most important tasks for biomedical research is to find efficient prostate cancer (PCa) biomarkers that would enable new diagnostic methods [5][6][7][8]. In recent years the PCa incidence rate has increased dramatically worldwide [9,10], and particularly in Russia, making PCa the most frequent male oncological disease in some countries [11,12]; this alone is reason enough to pay close attention to the disease. At present, early diagnostics of PCa depends largely on detecting one of the most studied biomarkers, prostate-specific antigen (PSA), in the blood. The test, however, is known to produce a significant number of false-positive and false-negative results, leading to adverse clinical and financial outcomes [5,7]. Therefore, in the U.S. and in other Western countries, new PCa biomarkers are being actively sought, an initiative recently stimulated by the development of proteomic and other post-genome technologies [6,8,13].
Since 2005, the Bach Institute of Biochemistry, in collaboration with other research and medical institutions, has been searching for new PCa biomarkers using various proteomic technologies [14,15]. In 2009, the national "Prostate Cancer Proteomics" database (PCP, http://ef.inbi.ras.ru/) was created to facilitate this research; it summarises experimental data and data from published sources and provides links to several other biomedical Internet databases. This paper describes the structure and capabilities of the new, extended PCP version.
was performed using clinical, histological, and immunochemical (PSA level) tests. Histological verification was performed via ultrasound-guided transrectal multifocal needle biopsy; up to 18 tissue samples from various prostate zones were taken per patient [16,17]. All PCa cases were found to be adenocarcinoma. The Gleason score was determined by following the standard procedure [16,17].
In parallel tests, we analysed the proteins of the PC-3 (ACC 465), DU-145 (ACC 261), and BPH-1 (ACC 143) cell cultures purchased from the German Collection of Microorganisms and Cell Cultures, as well as the proteins of cultured cells of the LNCaP line provided by Dr. I. G. Shemyakin (Obolensk National Science Centre for Applied Microbiology and Biotechnology). The cells were cultured in RPMI-1640 medium with HEPES, sodium pyruvate, gentamicin and 20% fetal bovine serum (FBS) [18], using cell culture plastic (Costar, USA and Nunc, Denmark) in a CO2 incubator (Sanyo, Japan). In addition, we studied proteins from the cultured cells of two lines of human rhabdomyosarcoma (A-204 and RD) purchased from the Ivanovsky Virology Institute, RAMS, and proteins from cultured normal human myoblasts kindly provided by Dr. T. B. Krohina [19].
The preparation of protein extracts, their O'Farrell 2DE fractioning, Coomassie Blue R-250 and silver nitrate staining, and 2DE analysis were performed following the techniques described in [20,21]. In addition, we used a 2DE procedure with isoelectric focusing using IPG-PAGE and the Ettan IPGphor 3 kit (GE Healthcare), according to the manufacturer's protocol.
Proteins were identified with MALDI-TOF MS and MS/MS using an Ultraflex instrument (Bruker) with a 336-nm UV laser in positive-ion (cation) mode over the 500-8000 Da range, calibrated using reference trypsin autolysis peaks; the spectra were processed with Mascot software, Peptide Fingerprint option (Matrix Science, USA) [21,22]. The proteins were identified by matching experimental masses with the masses of proteins listed in the NCBI Protein and SwissProt/TrEMBL databases. The accuracy of monoisotopic masses measured in the reflector mode calibrated with autolytic trypsin peaks was 0.005%, and the accuracy of the fragment masses was ±1 Da. MS/MS was used to reveal cases in which hypothetical proteins identified by MALDI-TOF MS corresponded to fragments of full-size proteins, the products of the corresponding genes. The molecular masses of protein fractions were determined using the ultrapure recombinant protein sets SM0661 (10-200 kDa) and SM0671 (10-170 kDa) (Fermentas). The optical density of 2DE images and/or their fragments was measured after scanning (Epson Expression 1680) or digital photography (Nikon 2500 or Canon PowerShot A1000 IS).
[Figure: workflow scheme. Prepare human biomaterials for study. Step 1: collect biopsy and prostatectomy sample series from patients with PCa and BPH; select several cultured human cell lines (LNCaP, PC-3, DU-145, BPH-1, etc.) for a comparative study. Step 2 ... Step 4: create the four-level PCP database using the protein maps of human prostate tissue samples and cultured cell lines (http://ef.inbi.ras.ru).]
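The stated mass accuracy of 0.005% (equivalent to 50 ppm) can be read as a matching tolerance when experimental peptide masses are compared with database values. The following is a minimal sketch of such a check, not the Mascot algorithm; the function names and the example masses are hypothetical and serve only to illustrate the arithmetic.

```python
def ppm_error(measured_da: float, theoretical_da: float) -> float:
    """Signed relative mass error in parts per million."""
    return (measured_da - theoretical_da) / theoretical_da * 1e6


def within_tolerance(measured_da: float, theoretical_da: float,
                     rel_tol_percent: float = 0.005) -> bool:
    """True if the measured mass lies within the relative tolerance.

    The default of 0.005% is the accuracy quoted in the text; using it
    as a matching window is an assumption made for this sketch.
    """
    allowed_da = theoretical_da * rel_tol_percent / 100.0
    return abs(measured_da - theoretical_da) <= allowed_da


# Hypothetical peptide masses (Da), chosen only to show both outcomes.
theoretical = 1500.0
for measured in (1500.06, 1500.09):
    verdict = "match" if within_tolerance(measured, theoretical) else "no match"
    print(f"{measured:.2f} Da: {ppm_error(measured, theoretical):.1f} ppm, {verdict}")
```

At 1500 Da the 0.005% window is ±0.075 Da, so a deviation of 0.06 Da is accepted and one of 0.09 Da is rejected.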
Digital image processing with densitometry of the protein fractions was performed with Melanie ImageMaster, versions 6 and 7 (Genebio). Data logging and processing for the Prostate Cancer Proteomics multilevel database were done with various software packages, including MapThis!, Molly Penguin Software, Mozilla Firefox, and some Microsoft Office applications. The database itself is an interactive MySQL-based resource that can be updated and modified online from any computer with an Internet connection. The BIOSTAT and Microsoft Office Excel 2003 software packages were used for statistical analysis.

RESULTS AND DISCUSSION
Following the conventional proteomics strategy developed in the late 20th century, the national PCP database was created in several consecutive steps, which involved systematic characterisation of proteins in prostate tissue samples obtained from benign and malignant tumors (Fig. 1) [23,24]. Proteins from several cultured human cell lines were studied in parallel experiments (Fig. 1). The first step was to make series of 2DE protein samples (50 or more) by fractioning dozens of biopsy specimens or prostate tissue samples (from 30 or more patients). Figure 2A illustrates a typical 2DE of the PCa prostate tissue proteins. The 2DE series for cell line proteins were created from 20 2DE gels, taking into account the homogeneity of the analyte.
The distribution of 2DE protein fractions was registered and stored as graphic *.tif files. The images of the entire 2DE and (in some cases) their segments were produced by scanning and/or digital photography. The relevance of the selected 2DE was assessed by comparing protein fractioning results using digital image matching [23,24].
The second step was to construct synthetic 2D maps of the proteins. 2DE images from each series were standardised with Melanie ImageMaster software using 15 selected reference points corresponding to easily identifiable major protein fractions. Figure 2B shows reference points on the 2DE images of proteins from prostate tissue samples.
Each image was analysed using the Cummings technique [25] with some modifications [20,24]. The analysis was based on dividing the images into 49 rectangular fragments bounded by six horizontal and six vertical standard lines and by the edges of the 2DE image itself. The positions of the horizontal lines were determined by special molecular-weight-marker proteins, which were placed on each gel plate before fractioning in the second direction (SDS-PAGE). Thus, the protein fractions located on the same horizontal line have identical molecular weights. For plotting the vertical lines, protein markers with previously measured pI values were used [20,24]. As a result, each analysed image consisted of 49 rectangular fragments, each usually containing no more than 10 protein fractions (only 4 fragments contained more than 20 fractions). Image fragmenting significantly simplified the image comparison and the construction of synthetic 2D maps.
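The fragmenting step can be illustrated with a short sketch: six vertical and six horizontal lines split an image into 7 x 7 = 49 rectangles, and each protein fraction is assigned to the rectangle containing its coordinates. The function name, the pixel coordinates, and the row-by-row numbering of fragments from 1 to 49 are assumptions made for illustration; the source does not describe how fragments were indexed.

```python
from bisect import bisect_right


def fragment_index(x: float, y: float,
                   vertical_lines: list[float],
                   horizontal_lines: list[float]) -> tuple[int, int, int]:
    """Locate the grid fragment that contains the point (x, y).

    Returns (row, column, number), where row and column run from 0 to 6
    and number runs from 1 to 49 (row-major; a hypothetical convention).
    """
    if len(vertical_lines) != 6 or len(horizontal_lines) != 6:
        raise ValueError("expected six vertical and six horizontal lines")
    # bisect_right counts how many lines lie at or before the coordinate,
    # which is exactly the index of the interval the point falls into.
    column = bisect_right(sorted(vertical_lines), x)
    row = bisect_right(sorted(horizontal_lines), y)
    return row, column, row * 7 + column + 1


# Hypothetical line positions in pixels (marker-derived in the real procedure).
lines = [100, 200, 300, 400, 500, 600]
print(fragment_index(250, 150, lines, lines))  # (1, 2, 10)
```

Comparing two maps fragment by fragment then reduces to comparing the small sets of fractions that share the same fragment number.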
The described procedure was performed with the 60 best 2DE images of proteins from BPH samples and with 70 2DE images of proteins from PCa samples. The comparison of standardised BPH and PCa 2DE images showed that the coordinates of more than 95% of protein fractions were constant. Quantitative or qualitative variations in the coordinates were observed for less than 5% of the fractions. The variations could be caused by genetic factors (e.g. single nucleotide polymorphism) or by different expression of the corresponding genes, as well as by differences in the tissue composition of the samples and in the severity of the pathology.
Having made sure that the positions of the majority of the protein fractions were constant, we were able to construct the 2D maps of prostate tissue proteins from patients with BPH and PCa. After that, we performed a fragment-by-fragment comparison of the constructed 2D maps. The fraction patterns in the BPH and PCa maps were found to be quite similar, the difference being that about 20 fractions present in the PCa map were either present in a much smaller amount in the BPH map or absent altogether. We paid particular attention to those fractions in further study, as described below. As a result of our analysis, an integrated synthetic 2D map of human prostate proteins was constructed that contains more than 200 protein fractions in the ranges of Mw 8.5-450 kDa and pI 4.5-11.5 (Fig. 2C). Each fraction was assigned a unique seven-digit number: the first four digits represent the logarithm of the fraction's molecular weight, and the next three digits represent the value of the isoelectric point expressed in the units described in [20,24].
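The seven-digit numbering can be sketched as follows. The exact scaling is defined in [20,24] and is not reproduced in this paper, so the encoding below is purely an assumption chosen to fit the stated digit counts and ranges: the first four digits are taken as 1000 x log10 of the molecular weight in Da, and the last three as 10 x pI. The function name is hypothetical.

```python
import math


def fraction_id(mw_da: float, pi: float) -> str:
    """Build a seven-digit fraction number from Mw (in Da) and pI.

    Assumed encoding (not taken from the source):
      digits 1-4: round(1000 * log10(Mw in Da))
      digits 5-7: round(10 * pI), zero-padded
    """
    if mw_da <= 0:
        raise ValueError("molecular weight must be positive")
    mw_part = round(math.log10(mw_da) * 1000)
    pi_part = round(pi * 10)
    if not 1000 <= mw_part <= 9999 or not 0 <= pi_part <= 999:
        raise ValueError("value outside the range this encoding can represent")
    return f"{mw_part:04d}{pi_part:03d}"


# A hypothetical 20 kDa fraction with pI 6.3.
print(fraction_id(20_000, 6.3))  # 4301063
```

Under this assumed scaling the limits of the map, 8.5 kDa at pI 4.5 and 450 kDa at pI 11.5, give 3929045 and 5653115, so every fraction on the map receives exactly seven digits.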
The same procedure was applied to construct other synthetic maps of proteins from cultured human cell lines, although far fewer 2DE images were used for those maps.
... information is entered. The filled Level 2 fields for one of the potential PCa biomarkers, protein NANS (N-acetylneuraminic acid phosphate synthase), are shown in Fig. 5.
The same protein could be present in more than one object. Therefore, within Level 2, one can use the control panel to create cross-links between identical proteins in different modules. An example of such cross-referencing for protein DJ-1 is presented in Fig. 6. The majority of the 359 identified fractions were well-known proteins (and/or their electrophoretic isoforms) described in the literature and various databases. Information on those proteins that falls within the PCP scope constitutes the third information Level of the database. Level 3 is a standardized system of 23 fields for the entry of text and graphical data. The first twelve fields hold information about the protein, the next six fields hold information about the gene coding for the protein, and the next three fields hold information about the protein's polymorphism.
The remaining fields hold selected references to publications about the general properties of the proteins, as well as their oncological properties (Fig. 7). The Level 3 text fields can contain hyperlinks to various Internet databases, such as NCBI's Protein, OMIM and PubMed, and SwissProt. This feature allowed the creation of the fourth information Level, providing the user with prompt access to international databases containing, in particular, the results of human genome sequencing.
The PCP database is an interactive MySQL-based web resource located at http://ef.inbi.ras.ru and can be accessed from any computer connected to the Internet using the Mozilla Firefox or Microsoft Internet Explorer browsers. There are three access permission categories: "Guest," "Manager," and "Administrator," each giving certain rights for working with PCP. Users with "Guest" access can browse all fields but cannot edit them. Users with "Manager" access can make entries into and correct the Level 2 and 3 fields, while users with "Administrator" access can expand the database by creating new modules and new functional elements.
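The three access categories can be summarised as a simple role-to-permission table. This is an illustrative sketch only, not the actual PCP implementation; the action names are hypothetical, and the assumption that an "Administrator" also holds the "Manager" editing rights is ours.

```python
from enum import Enum


class Role(Enum):
    GUEST = "Guest"
    MANAGER = "Manager"
    ADMINISTRATOR = "Administrator"


# Hypothetical action names. Administrator inheriting the Manager's
# editing rights is an assumption, not stated in the text.
PERMISSIONS = {
    Role.GUEST: {"browse"},
    Role.MANAGER: {"browse", "edit_levels_2_3"},
    Role.ADMINISTRATOR: {"browse", "edit_levels_2_3", "create_module"},
}


def can(role: Role, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS[role]


for role in Role:
    allowed = ", ".join(sorted(PERMISSIONS[role]))
    print(f"{role.value}: {allowed}")
```

Keeping the rights in a single table makes it straightforward to check an action before a database field is modified, for example `can(Role.GUEST, "edit_levels_2_3")`, which is false here.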
In conclusion, our work resulted in the creation of an original multi-module national database entitled "Prostate Cancer Proteomics," which summarizes data on the proteins in prostate tissue collected from patients with BPH and PCa, as well as on proteins in several human cell lines. This resource is promising for the further use of proteomic and other biochemical data. We are hopeful that the PCP database will be useful to biochemists and other biomedical scientists, making their research on PCa more efficient.